High-Reliability Organizations

I’m writing this blog from the inside of an airplane high above the Atlantic. I have total confidence that my plane will deliver me safely to Europe. It’s astonishing. The people who run every aspect of this plane are ordinary folk. I knew a guy in college who spent his entire career as a pilot for the very airline I’m flying today. He was competent, smart, and very, very careful. But he was not expected to make things up as he went along. He liked to repeat an old saying: “There are old pilots and there are bold pilots, but there are no old, bold pilots.”

When I was younger, I recall that airplane crashes were relatively common. These were always prominently reported in the news. But today, airplane disasters not caused by terrorists or crazy people are extremely rare. The reason is that air disasters are so catastrophic that airlines have adopted procedures in every aspect of their operation to ensure that planes arrive safely at their destinations. Every system important to safety is checked and rechecked, with technology and humans backing each other up. I happen to have a nephew who is studying to be an aircraft mechanic. His course is extremely rigorous. Most people don’t make it through. His final test, he says, will have 80 questions. The minimum acceptable score: 80. His brother is a nuclear engineer on a navy submarine. Same kind of training, same requirement for success. No room for error. The need for such care in airplanes and submarines is obvious. But why not in education?

My friend and colleague Sam Stringfield had this idea many years ago. Based on it, he and a Welsh colleague, David Reynolds, created what they called “high-reliability schools.” They evaluated them in Wales, and found substantially greater gains in schools using this approach than in control schools.

Despite its success, the high-reliability idea did not catch hold in education. Yet any student who is unnecessarily failing in school is a catastrophe waiting to happen. You don’t need a lot of data tables to be convinced that students not reading well by third grade are headed for big trouble. They are disproportionately likely to end up in special education, to repeat one or more grades, to drop out of high school, and to get into behavioral difficulties and problems with the law. Each of these outcomes is hugely damaging to the student and hugely expensive to the taxpayer.

Yet there is no problem in all of education that is better researched than early reading failure. There are many proven strategies known to greatly reduce reading failure: whole school methods, small group, individual tutoring, technology, and more. Our Evidence for ESSA web site lists dozens of proven approaches. It is probably already the case that any school could identify students at risk of reading failure in kindergarten or first grade and then apply proven, easily available methods conscientiously to ensure that virtually every child will succeed in reading.

The point here is that if we wanted to, we could treat early reading the way airlines and submarines treat safety, as a life or death issue.

If schools accepted the high-reliability challenge for early reading, here is what they would do. First, they’d adopt proven pre-reading programs for pre-kindergarten, and then proven beginning reading programs for grades K-3. Teachers of these grades would receive extensive professional development and then in-class coaching to help them use these proven strategies as well as they were used in the research that validated them, or better.

Starting in kindergarten, we’d start to assess students in early reading skills, so we’d know which students need assistance in which specific skills. We’d continue to assess all students over time to be sure that all are on a path to success. The assessments would include vision and hearing so that problems in these areas are solved.

Each school would have staff trained and equipped to provide an array of services for students who are in need of additional help. These would include small-group tutoring for students with mild problems, and one-to-one tutoring for more serious problems. Multiple proven programs, each focusing on distinct problems, would be ready to deploy for students who need them. Students who need eyeglasses, hearing accommodations, or other health assistance would be treated. Students who are English learners would receive assistance with language and reading.

The point is, each school would be committed to ensuring the success of every child, and would be prepared to do so. Like my high-reliability nephews, the goal of every person in every school would be zero failures. Not just fewer. Zero.

There is no question that this goal could be accomplished. The only issue is whether it could be accomplished at a cost that would be politically acceptable. My guess is that a full-scale, replicable schoolwide strategy to ensure zero reading failures in high-poverty schools could add about $200 per child per year, from grades pre-K to 3. A lot of money, you say? Recall from a previous blog that the average per-pupil cost in the U.S. is approximately $11,000. What if it were $11,200, just for a few years? The near-term savings in special education and retentions, not to mention the longer-term savings from reduced delinquency and dropout, would more than return this investment.
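To put the $200 figure in proportion, here is a back-of-the-envelope sketch in Python. The two dollar amounts come from the paragraph above; counting pre-K through grade 3 as five school years is my own assumption, and the function name is just illustrative.

```python
# Rough arithmetic only. The dollar figures are the estimates quoted in the post;
# YEARS = 5 assumes pre-K, K, 1, 2, and 3 each count as one school year.
BASE_PER_PUPIL = 11_000   # approximate average U.S. per-pupil cost
ADDED_PER_PUPIL = 200     # guessed added cost per child per year
YEARS = 5                 # assumption: pre-K through grade 3


def added_cost_summary(base=BASE_PER_PUPIL, added=ADDED_PER_PUPIL, years=YEARS):
    """Return the new per-pupil cost, the percent increase, and the total added per child."""
    return {
        "new_per_pupil": base + added,
        "percent_increase": round(100 * added / base, 2),
        "total_added_per_child": added * years,
    }


summary = added_cost_summary()
print(summary)
```

Under these assumptions the increase is a little under 2 percent of per-pupil spending in each of those years.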

But more than cost-effectiveness, there is a moral imperative here. Failing children who could succeed is simply wrong. We could greatly reduce or eliminate this problem, just as the aircraft industry has done. Our society must come to see school failure as the catastrophe that it is, and to use whatever proven methods are needed to make reading failure a problem of the past.

This blog is sponsored by the Laura and John Arnold Foundation

Transforming Transformation (and Turning Around Turnaround)

At the very end of the Obama Administration, the Institute of Education Sciences (IES) released the final report of an evaluation of the outcomes of the federal School Improvement Grant program. School Improvement Grants (SIG) are major investments to help schools with the lowest academic achievement in their states to greatly improve their outcomes.

The report, funded by the independent and respected IES and carried out by the equally independent and respected Mathematica Policy Research, found that SIG grants made essentially no difference in the achievement of the students in schools that received them.

Bummer.

In Baltimore, where I live, we believe that if you spend $7 billion on something, as SIG has so far, you ought to have something to show for it. The disappointing findings of the Mathematica evaluation are bad news for all of the usual reasons. Even if there were some benefits, SIG turned out to be a less-than-compelling use of taxpayers’ funds.  The students and schools that received it really needed major improvement, but improved very little. The findings undermine faith in the ability of very low-achieving schools to turn themselves around.

However, the SIG findings are especially frustrating because they could have been predicted, were in fact predicted by many, and were apparent long before this latest report. There is no question that SIG funds could have made a substantial difference. Had they been invested in proven programs and practices, they would have surely improved student outcomes just as they did in the research that established the effectiveness of the proven programs.

But instead of focusing on programs proven to work, SIG forced schools to choose among four models that had never been tried before and were very unlikely to work.

Three of the four models were so draconian that few schools chose them. One involved closing the school, and another, conversion to a charter school. These models were rarely selected unless schools were on the way to doing these things anyway. Somewhat more popular was “turnaround,” which primarily involved replacing the principal and 50% of the staff. The least restrictive model, “transformation,” involved replacing the principal, using achievement growth to evaluate teachers, using data to inform instruction, and lengthening the school day or year.

The problem is that very low achieving schools are usually in low achieving areas, where there are not long lines of talented applicants for jobs as principals or teachers. A lot of school districts just swapped principals between SIG and non-SIG schools. None of the mandated strategies had a strong research base, and they still don’t. Low achieving schools usually have limited capacity to reform themselves under the best of circumstances, and SIG funding required replacing principals, good or bad, thereby introducing instability in already tumultuous places. Further, all four of the SIG models had a punitive tone, implying that the problem was bad principals and teachers. Who wants to work in a school that is being punished?

What else could SIG have done?

SIG could have provided funding to enable low-performing schools and their districts to select among proven programs. This would have maintained an element of choice while ensuring that whatever programs schools chose would have been proven effective, used successfully in other low-achieving schools, and supported by capable intermediaries willing and able to work effectively in struggling schools.

Ironically, SIG did finally introduce such an option, but it was too little, too late.  In 2015, SIG introduced two additional models, one of which was an Evidence-Based, Whole-School Reform model that would allow schools to utilize SIG funds to adopt a proven whole-school approach. The U.S. Department of Education carefully reviewed the evidence and identified four approaches with strong evidence and the ability to expand that could be utilized under this model. But hardly any schools chose to utilize these approaches because there was little promotion of the new models, and few school, district, or state leaders to this day even know they exist.

The old SIG program is changing under the Every Student Succeeds Act (ESSA). In order to receive school improvement funding under ESSA, schools will have to select from programs that meet the strong, moderate, or promising evidence requirements defined in ESSA. Evidence for ESSA, the free web site we are due to release later this month, will identify more than 90 reading and math programs that meet these requirements.

This is a new opportunity for federal, state, and district officials to promote the use of proven programs and build local capacity to disseminate proven approaches. Instead of being seen as a trip to the woodshed, school improvement funding might be seen as an opportunity for eager teachers and administrators to do cutting edge instruction. Schools using these innovative approaches might become more exciting and fulfilling places to work, attracting and retaining the best teachers and administrators, whose efforts will be reflected in their students’ success.

Perhaps this time around, school improvement will actually improve schools.

The Maryland Challenge

As the Olympic Games earlier this summer showed, Americans love to compare ourselves with other countries. Within the U.S., we like to compare our states with other states. When Ohio State plays the University of Michigan, it’s not just a football game.

In education, we also like to compare, and we usually don’t like what we see. Comparisons can be useful in giving us a point of reference for what is possible, but a point of reference doesn’t help if it is not seen as a peer. For example, U.S. students are in the middle of the pack of developed nations on Program for International Student Assessment (PISA) tests for 15-year-olds, but Americans expect to do a lot better than that. The National Assessment of Educational Progress (NAEP) allows us to compare scores within the U.S., and unless you’re in Massachusetts, which usually scores highest, you probably don’t like those comparisons either. When we don’t like our ranking, we explain it away as best we can. Countries with higher PISA scores have fewer immigrants, or pay their teachers better, or have cultures that value education more. States that do better are richer, or have other unfair advantages. These explanations may or may not have an element of truth, but the bottom line is that comparisons on such a grand scale are just not that useful. There are far too many factors that are different between nations or states, some of which are changeable and some not, at least in the near term.

If comparisons among unequal places are not so useful, what point of reference would be better?

Kevan Collins, Director of the Education Endowment Foundation in England (England’s equivalent to our Investing in Innovation (i3) program), has an answer to this dilemma, which he explained at a recent conference I attended in Stockholm. His idea is based on a major, very successful initiative of Tony Blair’s government beginning in 2003, called the London Challenge. Secondary schools in the greater London area were put into clusters according to students’ achievement at the end of primary (elementary) school, levels of poverty, numbers of children speaking languages other than English at home, size, and other attributes. Examination of the results being achieved by schools within the same cluster showed remarkable variation in test scores. Even in the poorest clusters there were schools performing above the national average, and in the wealthiest clusters there were schools below the average. Schools low in their own clusters were given substantial resources to improve, with a particular emphasis on leadership. Over time, London went from being one of the lowest-achieving areas of England to scoring among the highest. Later versions of this plan in Manchester and in the Midlands did not work as well, but they did not have much time before the end of the Blair government meant the end of the experiment.

Fast forward to today, and think about states in the U.S. as the unit of reform. Imagine that Maryland, my state, categorized its Title I elementary, middle, and high schools according to percent free lunch, ethnic composition, percent English learners, urban/rural, school size, and so on. Each of Maryland’s Title I schools would be in a cluster of perhaps 50 very similar schools. As in England, there would be huge variation in achievement within clusters.

Just forming clusters to shame schools low in their own cluster would not be enough. The schools need help to greatly improve their outcomes.

This being 2016, we have many more proven programs than were available in the London Challenge. Schools scoring below the median of their cluster might have the opportunity to choose proven programs appropriate to their strengths and needs. The goal would be to assist every school below the median in its own cluster to at least reach the median. School staffs would have to vote by at least 80% in favor to adopt various programs. The school would also commit to use most of its federal Title I funds to match supplemental state or federal funding to pay for the programs. Schools above the median would also be encouraged to adopt proven programs, but might not receive matching funds.
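For readers who like to see the mechanics, here is a minimal sketch in Python of the two rules just described: flagging schools that score below the median of their own cluster, and checking the 80% staff vote. The school names, cluster labels, and scores are invented purely for illustration, and the function names are mine, not part of any actual state system.

```python
from collections import defaultdict
from statistics import median


def schools_below_cluster_median(schools):
    """Return names of schools scoring below the median of their own cluster."""
    scores_by_cluster = defaultdict(list)
    for school in schools:
        scores_by_cluster[school["cluster"]].append(school["score"])
    medians = {c: median(scores) for c, scores in scores_by_cluster.items()}
    return [s["name"] for s in schools if s["score"] < medians[s["cluster"]]]


def staff_vote_passes(yes_votes, total_staff, threshold_percent=80):
    """True if at least threshold_percent of staff voted in favor."""
    # Integer comparison avoids floating-point rounding at exactly 80%.
    return total_staff > 0 and yes_votes * 100 >= threshold_percent * total_staff


# Hypothetical data: two clusters of similar schools with made-up scores.
example_schools = [
    {"name": "School A", "cluster": "urban-high-poverty", "score": 40},
    {"name": "School B", "cluster": "urban-high-poverty", "score": 55},
    {"name": "School C", "cluster": "urban-high-poverty", "score": 70},
    {"name": "School D", "cluster": "rural-mid-poverty", "score": 60},
    {"name": "School E", "cluster": "rural-mid-poverty", "score": 80},
]

eligible = schools_below_cluster_median(example_schools)
print(eligible)
```

In a real state the clusters would hold perhaps 50 schools each, and the hard work would be deciding which attributes define a fair peer group, not the arithmetic.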

Imagine what could happen. Principals and staffs could no longer argue that it is unfair for their schools to be compared to dissimilar schools. They might visit schools performing at the highest levels in their clusters, and perhaps even form coalitions across district lines to jointly select proven approaches and help each other implement them.

Not all schools would likely participate in the first years, but over time, larger numbers might join in. Because schools would be implementing programs already known to work in schools just like theirs, and would be held accountable within a fair group of peers, schools should see rapid growth toward and beyond their cluster median, and more importantly, the entire clusters should advance toward state goals.

A plan like this could make a substantial difference in performance among all Title I schools statewide. It would focus attention sharply where it is needed, on improved teaching and learning in the schools that need it most. Within a few years, Maryland, or any other state that did the same, might blow past Massachusetts, and a few years after that, we’d all be getting visits from Finnish educators!

Rx for School Improvement

School Improvement Grants, or SIG, are supposed to be strong medicine for the most difficult ailments in the American school system: schools performing in the lowest 5% of their states. SIG provides proportional funding to states, which then hold competitions among low-achieving elementary and secondary schools. SIG funding is quite substantial, yet evaluations of SIG recipients find modest impacts on achievement. Most SIG schools gain about as much as non-SIG schools in the same state on state achievement measures.

Part of the problem with SIG is that until recently, schools had to choose among four models. All required draconian changes in governance (such as closure or charterization) or personnel (firing the principal and/or half of the staff). Worse, the grants were for only three years, so many schools spent most of that time recovering from SIG-inflicted disruptions. Not to mention that none of the solutions had any evidence of effectiveness.

Last year, SIG changed for the better. Schools could choose among three additional models, including a proven whole-school reform model, in which schools could implement an externally-developed model with at least one large, randomized study indicating positive achievement effects. This and one other model were continued into the current year, after which SIG will transition into a remodeled School Improvement program under the Every Student Succeeds Act (ESSA).

The proven whole-school option should have been a major advance, but it is not yet making much of a difference. Few SIG schools applied under this option for the 2015-2016 school year. One problem is that only four proven programs qualified: two elementary (our Success for All program and Positive Action), and two secondary (Institute for Student Achievement and New York City’s small high schools program). Things may pick up this year, but none of us see any indication yet that this will be the case. More likely, schools will once again choose among the four original models, because they are familiar and known to reliably bring in the grants.

As ESSA requires states and districts to transition to the new School Improvement program, perhaps things will be different. ESSA does not require any particular models, but does require that schools receiving School Improvement funding use programs that meet “strong,” “moderate,” or “promising” standards of evidence of effectiveness.

ESSA could have the same problem as the previous SIG proven whole-school reform option. Not many programs will meet the evidence standards at first. With the traditional four SIG models swept away and more federal emphasis on research, perhaps there will be more use of proven programs in low performing schools, but perhaps not.

Here is an additional idea that could greatly increase the use of proven programs in low performing schools. Consistent with ESSA regulations, School Improvement leaders at the federal and state levels might encourage qualifying schools to either adopt proven whole-school models, as in the current whole-school reform model, or to build their own model, using proven components. For example, a qualifying school might adopt a proven reading program, a proven math program, and a proven tutoring approach for struggling readers. Because there are several proven models of each kind, this would give schools much more flexibility and choice. Coordinating multiple programs takes some care, but the coordination itself would be part of the School Improvement plan. Imagine, for example, that the U.S. Department of Education created a recommended list of components that could be fulfilled by one partner organization (a proven whole-school program) or by several providers of proven approaches. Here is a simple checklist that might be suggested for an elementary school:
 Proven reading program
 Proven math program
 Proven tutoring program for struggling readers
 Proven social-emotional learning/behavior management approach

ESSA allows for a considerable range of evidence, from “strong” (at least one randomized study) to “promising” (one correlational study with statistical controls for pretests). The law is what it is, but I wonder if states and local districts, and perhaps even the U.S. Department of Education, might encourage schools to choose programs that meet the highest standards. The four programs approved by the U.S. Department of Education that meet the current SIG whole-school reform model in the current law had to meet what amounts to the “strong” evidence standard. For schools in major trouble, why would we encourage use of weaker evidence? Stronger evidence increases certainty of effectiveness, and certainty is the goal.

We have spent many years and billions of dollars failing to turn around thousands of schools that demonstrably need a lot of help. These are places where proven programs should make a substantial difference. Can anyone think of a reason we shouldn’t try?

Helping Struggling Schools

Illustration by James Bravo

There are a lot of schools in the U.S. that need to be achieving much better outcomes. However, there is a much smaller group of schools in which achievement levels are appalling. The solutions for garden-variety low-achieving schools are arguably different from those for schools with the very worst levels of performance.

In recent years, a key resource for very low-achieving schools has been the School Improvement Grants (SIG) program. SIG provides substantial funding to schools in the lowest 5 percent of their states on achievement measures. Up until this year, schools receiving SIG funds had to choose among four models.

This year, three new models were added to SIG. These include an option to use an evidence-proven whole-school reform model; such a model has to have been successfully evaluated in a study meeting What Works Clearinghouse (WWC) standards. This is potentially a significant advance. Perhaps, if many schools choose proven models, this option will enhance the effectiveness and image of the much-maligned SIG program. (Full disclosure: Our Success for All program is one of four approaches approved thus far by the U.S. Department of Education for use under the evidence-proven whole-school option for SIG.)

However, stop for a moment and consider what’s going on here. The lowest-achieving schools in America are being offered substantial funding (currently, up to $2 million over five years) to turn themselves around. These schools need the very best programs provided by organizations with a proven track record. Of all schools, why should these very needy schools receive unproven approaches, perhaps invented from scratch for the SIG proposal and therefore never even piloted before? When your car breaks down, do you tow it to a mechanic who has never fixed a car before? When you have medical problems, do you want to be someone’s first patient?

There should always be a fund separate from ordinary Title I to provide intensive assistance to schools in the lowest 5 percent of their states on state assessments. However, instead of focusing SIG on governance, personnel, and untested prescriptions, as it has done up to now, SIG (or its successor) should focus on helping schools select and effectively implement proven programs. In addition to the four “evidence-proven whole-school reform” models identified recently, SIG schools might be funded to implement a mix of reading approaches, math approaches, tutoring models, and social-emotional approaches, for example, each of which has convincing evidence of effectiveness.

The recent changes in SIG, allowing proven whole-school reforms, are a big step in the right direction, but additional steps in the same direction are needed to make this crucial investment a model of wise use of federal funds to solve serious problems in education.

Stop the Churn: How Federal Policy Adds Chaos to Schools

I just read a very interesting book called Improbable Scholars, by David Kirp of the University of California at Berkeley. In it, Kirp tells stories of his more than two years of observing schools in Union City, New Jersey, a mostly poor, Hispanic district that has done well on state tests for many years. He describes a caring, planful, well-organized district led for many years by an outstanding educator, Sandy Sanger. Kirp also gives brief descriptions of two other outstanding districts, Aldine, Texas and Montgomery County, Maryland.

Kirp notes some commonalities across these different districts, including an emphasis on high-quality preschools and intelligent use of data. However, most importantly, he describes them as tortoises rather than hares, places that build up great staff and great schools bit by bit over long periods.

Necessary for the “tortoise” approach, however, is something in very short supply in urban districts: Stability. All of Kirp’s outstanding districts have had superintendents who stayed in office for a decade or more, far more than the national urban average of 2 ½ years.

Our experience working with urban districts is very much the same. Districts that consistently do well with disadvantaged and minority students – Steubenville, Ohio; Alhambra, Arizona; Geary County, Kansas; Victoria, Texas and many more – are not the big headline districts that change superintendents the way Donald Trump changes interns. Instead, these are places in which dedicated educators work for a decade or more to progressively improve outcomes for all children.

As I mentioned recently in Power to the Schools, the problem is that federal, state, and local policies promote churn rather than stability. Heavy pressure on superintendents to boost scores right now or risk firing leads to surface solutions (or cheating of various kinds) rather than long-term planning and coalition building. Policies such as state takeovers and school closures rarely work and they add immeasurably to churn. Michelle Rhee and her broom, which swept out 90% of principals in the District of Columbia, is just one example.

The federal School Improvement Grants program requires that schools close down, become charters, or fire substantial proportions of their staff and principal to qualify for large grants. The result? We don’t know yet, but early reports focus on the difficulties this churn introduces to a school community.

I am not suggesting complacency about poor performance. Some school and district leaders have to go. But there is a difference between pruning bad apples and constantly uprooting trees. Wholesale and indiscriminate firings give whole districts a short-term, fearful mentality instead of a loving, strategic, and planful mentality.

Instead of relying on threats and firings, federal and state policies need to focus on assisting low-performing schools and districts to learn about and adopt proven models. This allows the good people already in the schools to work at high-quality implementation of strategies known to work, rather than dodging and weaving to avoid punishment. Incompetent administrators and teachers can be weeded out, but the ones who are doing their best can make better use of support than fear.

The first principle learned by every physician is “first, do no harm.” In education policy, this should also be a starting point. Churn itself undermines school success, so before introducing massive personnel changes, consider whether the personnel who are already there could be aided to do a better job. People don’t go into education to deliberately harm children. Education policies can and must use accountability to recognize and support progress without becoming a source of terror, cynicism, and churn.

Innovation Step by Step

Here’s an astonishing statistic. Apparently, dairy cows today each produce six times as much milk as they did in 1950. Consumption of dairy products per person is about the same as it was then, so if milk per cow were the same as in 1950, we’d need six times as many cows per person; vastly greater acreage and other resources would be needed. The same pattern is true for almost any area of agriculture.

Yet do you recall any breakthroughs in agriculture in the past 60 years? I don’t. Instead, the steady gains in agricultural productivity are due to hundreds or thousands of small advances. In the case of dairy cows, it’s advances in breeding, feed, veterinary care, milking technology, and so on.

In education, we often act as though we’re waiting for breakthroughs: New technologies, new assessments, radically new teaching methods, and so on. When breakthroughs do not materialize, we lose faith in research and development as a path to reform.

Yet in medicine, technology, agriculture, and other fields that base progress on evidence, progress is constant and cumulative. Breakthroughs may take place, but more often it’s small, step-by-step improvements with evidence of effectiveness that move the field forward. When education finally embraces R&D as a basis for adoption of innovation, progress in each subject and grade level will probably also be steady rather than remarkable. Programs and practices found to make a modest but meaningful difference in student learning outcomes will accumulate over time, as took place in dairy farming and so many other fields that have seen substantial progress over the years.

As my colleague Jon Baron recently wrote in a New York Times article, “Scientifically rigorous studies – particularly, the “gold standard” of randomized controlled trials – are a mainstay of medicine, providing conclusive evidence of effectiveness for most major medical advances in recent history. In social spending, by contrast, such studies have only a toehold. Where they have been used, however, they have demonstrated the same ability to produce important, credible evidence about what works – and illuminated a path to major progress.”

Precisely because genuine progress in educational programs and practice is likely to be gradual, it is especially critical that support for the R&D process be sustained and steady over time, since exciting headlines will be rare. If someone comes up with a “smart pill” or a new technology that doubles learning rates, all the better; the same R&D process that supports evolutionary change could also produce revolutionary change. But don’t count on it.

Let’s be clear. Reading scores in the U.S. have been virtually unchanged since 1980. Achievement gaps by social class and race have been about the same for 30 years. We should be outraged by this, but we need to turn that outrage into a commitment both to use the proven programs and practices available now and to engage in research and development that leads over time to truly transformative innovation.