Miss Evers’ Boys (And Girls)

Most people who have ever been involved with human subjects’ rights know about the Tuskegee Syphilis Study. This was a study of untreated syphilis, in which 622 poor, African American sharecroppers, some with syphilis and some without, were evaluated over 40 years.

The study, funded and overseen by the U.S. Public Health Service, started in 1932. In 1940, researchers elsewhere discovered that penicillin cured syphilis. By 1947, penicillin was “standard of care” for syphilis, meaning that patients with syphilis received penicillin as a matter of course, anywhere in the U.S.

But not in Tuskegee. Not in 1940. Not in 1947. Not until 1972, when a whistle-blower made the press aware of what was happening. In the meantime, many of the men died of syphilis, 40 of their wives contracted the disease, and 19 of their children were born with congenital syphilis. The men were never even told the nature of the study; they were not informed in 1940 or 1947 that there was now a cure, and they were not offered that cure. Leaders of the U.S. Public Health Service were well aware that there was a cure for syphilis, but for various reasons they did not stop the study. Not in 1940, not in 1947, not even when whistle-blowers told them what was going on. They stopped it only when the press found out.


In 1997 a movie on the Tuskegee Syphilis Study was released. It was called Miss Evers’ Boys. Miss Evers (actually, Eunice Rivers) was the African-American public health nurse who was the main point of contact for the men over the whole 40 years. She deeply believed that she, and the study, were doing good for the men and their community, and she formed close relationships with them. She believed in the USPHS leadership, and thought they would never harm her “boys.”

The Tuskegee study was such a crime and scandal that it utterly changed procedures for medical research in the U.S. and most of the world. Today, participants in research with any level of risk, or their parents if they are children, must give informed consent for participation in research, and even if they are in a control group, they must receive at least “standard of care”: currently accepted, evidence-based practices.

If you’ve read my blogs, you’ll know where I’m going with this. Failure to use proven educational treatments, unlike medical ones, is rarely fatal, at least not in the short term. But otherwise, our profession carries out Tuskegee crimes all the time. It condemns failing students to ineffective programs and practices when effective ones are known. It fails to even inform parents or children, much less teachers and principals, that proven programs exist: Proven, practical, replicable solutions for the problems they face every day.

Like Miss Rivers, front-line educators care deeply about their charges. Most work very hard and give their absolute best to help all of their children to succeed. Teaching is too much hard work and too little money for anyone to do it for any reason but for the love of children.

But somewhere up the line, where the big decisions are made, where the people are who know or who should know which programs and practices are proven to work and which are not, this information just does not matter. There are exceptions, real heroes, but in general, educational leaders who believe that schools should use proven programs have to fight hard for this position. The problem is that the vast majority of educational expenditures—textbooks, software, professional development, and so on—lack even a shred of evidence. Not a scintilla. Some have evidence that they do not work. Yet advocates for those expenditures (such as sales reps and educators who like the programs) argue strenuously for programs with no evidence, and it’s just easier to go along. Whole states frequently adopt or require textbooks, software, and services of no known value in terms of improving student achievement. The ESSA evidence standards were intended to focus educators on evidence and incentivize use of proven programs, at least for the lowest-achieving 5% of schools in each state, but so far it’s been slow going.

Yet there are proven alternatives. Evidence for ESSA (www.evidenceforessa.org) lists more than 100 PK-12 reading and math programs that meet the top three ESSA evidence standards. The majority meet the top level, “Strong.” And most of the programs were researched with struggling students. Yet I am not perceiving a rush to find out about proven programs. I am hearing a lot of new interest in evidence, but my suspicion, growing every day, is that many educational leaders do not really care about the evidence, but are instead just trying to find a way to keep using the programs and providers they already have and already like, and are looking for evidence to justify keeping things as they are.

Every school has some number of struggling students. If these children are provided with the same approaches that have not worked with them or with millions like them, it is highly likely that most will fail, with all the consequences that flow from school failure: Retention. Assignment to special education. Frustration. Low expectations. Dropout. Limited futures. Poverty. Unemployment. There are 50 million children in grades PK to 12 in the U.S. This is the grinding reality for perhaps 10 to 20 million of them. Solutions are readily available, but not known or used by caring and skilled front-line educators.

In what way is this situation unlike Tuskegee in 1940?

Photo credit: National Archives Atlanta, GA (U.S. government), originally from the National Archives [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Response to Proven Instruction (RTPI)

Response to Intervention (RTI) is one of those great policy ideas that caring policymakers come up with: carefully crafted and enthusiastically announced, then inconsistently implemented, evaluated at great cost, and found to have minimal impacts, if any. In the case of RTI, the policy is genuinely sensible, but the 2015 MDRC evaluation (Balu et al., 2015) found that implementation was poor and outcomes were nil, at least as measured in a much-criticized regression discontinuity design (see Fuchs & Fuchs, 2017). An improvement on RTI, multi-tier systems of support (MTSS), adds some good ideas, but I don’t think it will be enough.

The problem, I think, relates to something I wrote about at the time the MDRC study appeared. In fact, I gave the phenomenon a name: Bob’s Law, which states that any policy or intervention that is not well defined will not be well implemented, and therefore will not work, no matter how sensible it may be. In the case of RTI/MTSS, everyone has a pretty good idea what “tier 1, tier 2, and tier 3” are in concept, but no one knows what they are actually composed of. So each district, school, and teacher makes up its own strategies for general teaching, followed by remediation if needed, followed by intensive services if necessary. Since the actual programs provided in each tier are not specified, everyone does pretty much what they would have done if RTI had not existed. And guess what? If RTI and non-RTI teachers are drawing from the same universally accepted basket of teaching methods, there is no reason to expect RTI outcomes to be any better than ordinary practice. This is not to say that standard methods are deficient, but why would we expect outcomes to differ if practices don’t?

Response to Proven Instruction (RTPI).

I recently wrote an article proposing a new approach to RTI/MTSS (Slavin, Inns, Pellegrini, & Lake, 2018).  The idea is simple. Why not insist that struggling learners receive tier 1, tier 2, and (if necessary) tier 3 services, each of which is proven to work in rigorous research?  In the article I listed numerous tier 2 and tier 3 services for reading and math that have all been successfully evaluated, with significant outcomes and effect sizes in excess of +0.20.  Every one of these programs involved tutoring, one to one or one to small group, by teachers or paraprofessionals. I also listed tier 1 services found to be very effective for struggling learners.  All of these programs are described at www.evidenceforessa.org.

Figure 1

If there are so many effective approaches for struggling learners, these should form the core of RTI/MTSS services. I would argue that tier 1 should be composed of proven whole class or whole school programs; tier 2, one-to-small group tutoring by well-qualified paraprofessionals using proven approaches; and tier 3, one-to-one tutoring by paraprofessionals or teachers using proven approaches (see Figure 1).

The result would have to be substantial improvements in the achievement of struggling learners, and reductions in special education and retentions.  These outcomes are assured, as long as implementation is strong, because the programs themselves are proven to work.  Over time, better and more cost-effective programs would be sure to appear, but we could surely do a lot better today with the programs we have now.

Millions of children live in the cruel borderlands between low reading groups and special education. These students are perfectly normal, except from 9:00 to 3:00 on school days. They start school with enthusiasm, but then slide over the years into failure, despair, and then dropout or delinquency.  If we have proven approaches and can use them in a coherent system to ensure success for all of these children, why would we not use them?

Children have a right to have every chance to succeed.  We have a moral imperative to see that they receive what they need, whatever it takes.

References

Balu, R., Zhu, P., Doolittle, F., Schiller, E., Jenkins, J., & Gersten, R. (2015). Evaluation of response to intervention practices for elementary school reading (NCEE 2016-4000). Washington, DC: U.S. Department of Education, Institute of Education Sciences.

Fuchs, D., & Fuchs, L.S. (2017). Critique of the National Evaluation of Response to Intervention: A case for simpler frameworks. Exceptional Children, 83 (3), 1-14.

Slavin, R.E., Inns, A., Pellegrini, M., & Lake, C. (2018). Response to proven instruction (RTPI): Enabling struggling learners. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Two Years of Second Grade? Really?

In a recent blog, Mike Petrilli, President of the Fordham Institute, floated an interesting idea. Given the large numbers of students in high-poverty schools who finish elementary school far behind, what if we gave them all a second year of second grade? (He calls it “2.5.”) This, he says, would give disadvantaged schools another year to catch kids up, without all the shame and fuss of retaining them.


At one level, I love this idea, but not on its merits. One more year of second grade would cost school districts or states the national average per-pupil cost of $11,400. So would I like to have $11,400 more for every child in a school district serving many disadvantaged students? You betcha. But another year of second grade is not in the top hundred things I’d do with it.

Just to give you an idea of what we’re talking about, my state, Maryland, has about 900,000 students in grades K-12. Adding a year of second grade for all of them would cost about $10.26 billion. If half of them are, say, in Title 1 schools (one indicator of high poverty), that’s roughly $5 billion and change. Thanks, Mike! To be fair, this $5 billion would be spent over a 12-year period, as successive cohorts go through year 2.5, so call it roughly a half billion a year.
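For readers who like to check the arithmetic, the back-of-envelope estimate above can be reproduced in a few lines. The enrollment and per-pupil figures are simply the ones quoted in this post, not fresh data:

```python
# Rough cost of adding one extra year of second grade in Maryland,
# using the figures quoted in this post (illustrative, not official data).
students_k12 = 900_000    # approximate Maryland K-12 enrollment
per_pupil_cost = 11_400   # national average per-pupil cost, USD

total_cost = students_k12 * per_pupil_cost  # if every student got grade 2.5
title1_cost = total_cost / 2                # if half are in Title 1 schools
annual_cost = title1_cost / 12              # phased in over 12 cohorts

print(f"All students:    ${total_cost:,.0f}")   # $10,260,000,000
print(f"Title 1 only:    ${title1_cost:,.0f}")  # $5,130,000,000
print(f"Per year (~12y): ${annual_cost:,.0f}")  # $427,500,000
```

The per-year figure comes out a bit under a half billion, consistent with the rounding in the text.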

What could Maryland’s schools do with a half billion dollars a year? Actually, I wrote them a plan, arguing that if Maryland were realistically planning to ensure the success of every child on the state tests, it could be done, but it would not be cheap.

What Maryland, or any state, could do with serious money would be to spend it on proven programs, especially for struggling learners. As one example, consider tutoring. The well-known Reading Recovery program uses a very well-trained tutor working one-to-one with a struggling first grader for about 16 weeks, at a cost Hollands et al. (2016) estimated at roughly $4,600 per child. So Petrilli’s second-grade offer could instead buy about two and a half rounds of such tutoring, not just for struggling first graders, but for every single student in a high-poverty school. And there are much less expensive forms of tutoring. It would be easy to figure out how every single student in, say, Baltimore could receive tutoring every single year of elementary school, using paraprofessionals and small groups for students with less serious problems and one-to-one tutoring for those with more serious problems (see Slavin, Inns, Pellegrini, & Lake, 2018).
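Using the two per-child figures cited in this post, the trade works out to roughly two and a half Reading Recovery rounds per pupil, and more with cheaper tutoring models:

```python
# How many rounds of one-to-one tutoring does one extra school year buy?
# Both figures are the ones cited in this post.
extra_year_cost = 11_400       # national average per-pupil cost, USD
reading_recovery_cost = 4_600  # one ~16-week Reading Recovery round (Hollands et al., 2016)

rounds_per_pupil = extra_year_cost / reading_recovery_cost
print(f"{rounds_per_pupil:.1f} tutoring rounds per pupil")  # 2.5
```

Less expensive models, such as small-group tutoring by paraprofessionals, would stretch the same money across even more rounds.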

Our Evidence for ESSA website lists many proven, highly effective approaches in reading and math. These are all ready to go; the only reason that they are not universally used is that they cost money, or so I assume. And not that much money, in the grand scheme of things.

I don’t understand why, even in this thought experiment, Mike Petrilli is unwilling to consider the possibility of spending serious money on programs and practices that have actually been proven to work. But in case anyone wants to follow up on his idea, or at least pilot it in Maryland, please mail me $5 billion, and I will make certain that every student in every high-poverty school in the state does in fact reach the end of elementary school performing at or near grade level. Just don’t expect to see double when you check in on our second graders.

References

Hollands, F. M., Kieffer, M. J., Shand, R., Pan, Y., Cheng, H., & Levin, H. M. (2016). Cost-effectiveness analysis of early reading programs: A demonstration with recommendations for future research. Journal of Research on Educational Effectiveness, 9(1), 30-53.

Slavin, R. E., Inns, A., Pellegrini, M., & Lake, C. (2018). Response to proven instruction (RTPI): Enabling struggling learners. Manuscript submitted for publication.

Photo credit: By Petty Officer 1st Class Jerry Foltz (https://www.dvidshub.net/image/383907) [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

New Findings on Tutoring: Four Shockers

One-to-one and one-to-small group tutoring have long existed as remedial approaches for students who are performing far below expectations. Everyone knows that tutoring works, and nothing in this blog contradicts this. Although different approaches have their champions, the general consensus is that tutoring is very effective, and the problem with widespread use is primarily cost (and for tutoring by teachers, availability of sufficient teachers). If resources were unlimited, one-to-one tutoring would be the first thing most educators would recommend, and they would not be wrong. But resources are never unlimited, and the numbers of students performing far below grade level are overwhelming, so cost-effectiveness is a serious concern. Further, tutoring seems so obviously effective that we may not really understand what makes it work.

In recent reviews, my colleagues and I examined what is known about tutoring. Beyond the simple conclusion that “tutoring works,” we found some big surprises, four “shockers.” Prepare to be amazed! Further, I propose an explanation to account for these unexpected findings.

We have recently released three thorough, up-to-date reviews of research on tutoring. One is a review of programs for struggling readers in elementary schools, by Amanda Inns and colleagues (2018). Another is a review of programs for secondary readers, by Ariane Baye and her colleagues (2017). Finally, there is a review of elementary math programs, by Marta Pellegrini et al. (2018). All three use essentially identical methods, from the Best Evidence Encyclopedia (www.bestevidence.org). In addition to sections on tutoring strategies, all three also include other, non-tutoring methods directed at the same populations and outcomes.

What we found challenges much of what everyone thought they knew about tutoring.

Shocker #1: In all three reviews, tutoring by paraprofessionals (teaching assistants) was at least as effective as tutoring by teachers. This was found for reading and math, and for one-to-one and one-to-small group tutoring.  For struggling elementary readers, para tutors actually had higher effect sizes than teacher tutors. Effect sizes were +0.53 for paras and +0.36 for teachers in one-to-one tutoring. For one-to-small group, effect sizes were +0.27 for paras, +0.09 for teachers.

Shocker #2: Volunteer tutoring was far less effective than tutoring by either paras or teachers. Some programs using volunteer tutors provided them with structured materials and extensive training and supervision. These found positive impacts, but far smaller than those for paraprofessional tutors: volunteers tutoring one-to-one had an effect size of +0.18, while paras had an effect size of +0.53. Because of the need for recruiting, training, supervision, and management, and also because the more effective tutoring models provide stipends or other pay, volunteers were not much less expensive than paraprofessionals as tutors.

Shocker #3:  Inexpensive substitutes for tutoring have not worked. Everyone knows that one-to-one tutoring works, so there has long been a quest for approaches that simulate what makes tutoring work. Yet so far, no one, as far as I know, has found a way to turn lead into tutoring gold. Although tutoring in math was about as effective as tutoring in reading, a program that used online math tutors communicating over the Internet from India and Sri Lanka to tutor students in England, for example, had no effect. Technology has long been touted as a means of simulating tutoring, yet even when computer-assisted instruction programs have been effective, their effect sizes have been far below those of the least expensive tutoring models, one-to-small group tutoring by paraprofessionals. In fact, in the Inns et al. (2018) review, no digital reading program was found to be effective with struggling readers in elementary schools.

Shocker #4: Certain whole-class and whole-school approaches work as well as or better than tutoring for struggling readers, on average. In the Inns et al. (2018) review, the average effect size for one-to-one tutoring approaches was +0.31, and for one-to-small group approaches it was +0.14. Yet the mean for whole-class approaches, such as Ladders to Literacy (ES = +0.48), PALS (ES = +0.65), and Cooperative Integrated Reading and Composition (ES = +0.19), was +0.33, similar to one-to-one tutoring by teachers (ES = +0.36). The mean effect size for comprehensive tiered school approaches, such as Success for All (ES = +0.41) and Enhanced Core Reading Instruction (ES = +0.22), was +0.43, higher than any category of tutoring (note that these models include tutoring as part of an integrated response-to-intervention approach). Whole-class and whole-school approaches work with many more students than do tutoring models, so these impacts are obtained at a much lower cost per pupil.

Why does tutoring work?

Most researchers and others would say that well-structured tutoring models work primarily because they allow tutors to fully individualize instruction to the needs of students. Yet if this were the only explanation, then other individualized approaches, such as computer-assisted instruction, would have outcomes similar to those of tutoring. Why is this not the case? And why do paraprofessionals produce at least equal outcomes to those produced by teachers as tutors? None of this squares with the idea that the impact of tutoring is entirely due to the tutor’s ability to recognize and respond to students’ unique needs. If that were so, other forms of individualization would be a lot more effective, and teachers would presumably be a lot more effective at diagnosing and responding to students’ problems than would less highly trained paraprofessionals. Further, whole-class and whole-school reading approaches, which are not completely individualized, would have much lower effect sizes than tutoring.

My theory to account for the positive effects of tutoring in light of the four “shockers” is this:

  • Tutoring does not work due to individualization alone. It works due to individualization plus nurturing and attention.

This theory begins with the fundamental and obvious assumption that children, perhaps especially low achievers, are highly motivated by nurturing and attention, perhaps far more than by academic success. They are eager to please adults who relate to them personally.  The tutoring setting, whether one-to-one or one-to-very small group, gives students the undivided attention of a valued adult who can give them personal nurturing and attention to a degree that a teacher with 20-30 students cannot. Struggling readers may be particularly eager to please a valued adult, because they crave recognition for success in a skill that has previously eluded them.

Nurturing and attention may explain the otherwise puzzling equality of outcomes obtained by teachers and paraprofessionals as tutors. Both types of tutors, using structured materials, may be equally able to individualize instruction, and there is no reason to believe that paras will be any less nurturing or attentive. The assumption that teachers would be more effective as tutors depends on the belief that tutoring is complicated and requires the extensive education a teacher receives. This may be true for very unusual learners, but for most struggling students, a paraprofessional may be as capable as a teacher in providing individualization, nurturing, and attention. This is not to suggest that paraprofessionals are as capable as teachers in every way. Teachers have to be good at many things: preparing and delivering lessons, managing and motivating classes, and much more. However, in their roles as tutors, teachers and paraprofessionals may be more similar.

Volunteers certainly can be nurturing and attentive, and can readily be trained in structured programs to individualize instruction. The problem, however, is that studies of volunteer programs report difficulties in getting volunteers to attend every day and to avoid dropping out when they get a paying job. This may be less of a problem when volunteers receive a stipend; paid volunteers are much more effective than unpaid ones.

The failure of tutoring substitutes, such as individualized technology, is easy to predict if the importance of nurturing and attention is taken into account. Technology may be fun, and may be individualized, but it usually separates students from the personal attention of caring adults.

Whole-Class and Whole-School Approaches.

Perhaps the biggest shocker of all is the finding that for struggling readers, certain non-technology approaches to instruction for whole classes and schools can be as effective as tutoring. Whole-class and whole-school approaches can serve many more students at much lower cost, of course. These classroom approaches mostly use cooperative learning, phonics-focused teaching, or both, and the whole-school models, especially Success for All, combine these approaches with tutoring for students who need it.

The success of certain whole-class programs, of certain tutoring approaches, and of whole-school approaches that combine proven teaching strategies with tutoring for students who need more, argues for response to intervention (RTI), the policy the federal government has promoted since the 1990s. So what’s new? What’s new is that the approach I’m advocating is not just RTI. It’s RTI done right, where each component of the strategy has strong evidence of effectiveness.

The good news is that we have powerful and cost-effective tools at our disposal that we could be putting to use on a much more systematic scale. Yet we rarely do this, and as a result far too many students continue to struggle with reading, even ending up in special education due to problems schools could have prevented. That is the real shocker. It’s up to our whole profession to use what works, until reading failure becomes a distant memory. There are many problems in education that we don’t know how to solve, but reading failure in elementary school isn’t one of them.

Practical Implications.

Perhaps the most important practical implication of this discussion is the realization that benefits similar to or greater than those of one-to-one tutoring by teachers can be obtained in other ways that can be cost-effectively extended to many more students: using paraprofessional tutors, using one-to-small group tutoring, or using whole-class and whole-school tiered strategies. It is no longer possible to say with a shrug, “Of course tutoring works, but we can’t afford it.” The “four shockers” tell us we can do better, without breaking the bank.

 

References

Baye, A., Lake, C., Inns, A., & Slavin, R. E. (2017). Effective reading programs for secondary students. Manuscript submitted for publication. Also see: Baye, A., Lake, C., Inns, A., & Slavin, R. E. (2017, August). Effective reading programs for secondary students. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by Westsara (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

 

High-Reliability Organizations

I’m writing this blog from the inside of an airplane high above the Atlantic. I have total confidence that my plane will deliver me safely to Europe. It’s astonishing. The people who run every aspect of this plane are ordinary folk. I knew a guy in college who spent his entire career as a pilot for the very airline I’m flying today. He was competent, smart, and very, very careful. But he was not expected to make things up as he went along. He liked to repeat an old saying: “There are old pilots and there are bold pilots, but there are no old, bold pilots.”

When I was younger, I recall that airplane crashes were relatively common. These were always prominently reported in the news. But today, airplane disasters not caused by terrorists or crazy people are extremely rare. The reason is that air disasters are so catastrophic that airlines have adopted procedures in every aspect of their operation to ensure that planes arrive safely at their destinations. Every system important to safety is checked and rechecked, with technology and humans backing each other up. I happen to have a nephew who is studying to be an aircraft mechanic. His course is extremely rigorous. Most people don’t make it through. His final test, he says, will have 80 questions. The minimum acceptable score: 80. His brother is a nuclear engineer on a navy submarine. Same kind of training, same requirement for success. No room for error. The need for such care in airplanes and submarines is obvious. But why not in education?

My friend and colleague Sam Stringfield had this idea many years ago. Based on it, he and a Welsh colleague, David Reynolds, created what they called “high-reliability schools.” They evaluated them in Wales, and found substantially greater gains in schools using this approach than in control schools.

Despite its success, the high-reliability idea never took hold in education. Yet any student who is unnecessarily failing in school is a catastrophe waiting to happen. You don’t need a lot of data tables to be convinced that students not reading well by third grade are headed for big trouble. They are disproportionately likely to end up in special education, to repeat one or more grades, to drop out of high school, and to get into behavioral difficulties and problems with the law. Each of these outcomes is hugely damaging to the student and hugely expensive to the taxpayer.

Yet there is no problem in all of education that is better researched than early reading failure. There are many proven strategies known to greatly reduce reading failure: whole-school methods, small-group and one-to-one tutoring, technology, and more. Our Evidence for ESSA web site lists dozens of proven approaches. It is probably already the case that any school could identify students at risk of reading failure in kindergarten or first grade, and then conscientiously apply proven, easily available methods to ensure that virtually every child succeeds in reading.

The point here is that if we wanted to, we could treat early reading the way airlines and submarines treat safety, as a life or death issue.

If schools accepted the high-reliability challenge for early reading, here is what they would do. First, they’d adopt proven pre-reading programs for pre-kindergarten, and then proven beginning reading programs for grades K-3. Teachers of these grades would receive extensive professional development and then in-class coaching to help them use these proven strategies as well as they were used in the research that validated them, or better.

Starting in kindergarten, we’d start to assess students in early reading skills, so we’d know which students need assistance in which specific skills. We’d continue to assess all students over time to be sure that all are on a path to success. The assessments would include vision and hearing, so that any problems in these areas can be addressed.

Each school would have staff trained and equipped to provide an array of services for students who are in need of additional help. These would include small-group tutoring for students with mild problems, and one-to-one tutoring for more serious problems. Multiple proven programs, each focusing on distinct problems, would be ready to deploy for students who need them. Students who need eyeglasses, hearing accommodations, or other health assistance would be treated. Students who are English learners would receive assistance with language and reading.

The point is, each school would be committed to ensuring the success of every child, and would be prepared to do so. Like my high-reliability nephews, the goal of every person in every school would be zero failures. Not just fewer. Zero.

There is no question that this goal could be accomplished. The only issue is whether it could be accomplished at a cost that would be politically acceptable. My guess is that a full-scale, replicable schoolwide strategy to ensure zero reading failures in high-poverty schools could add about $200 per child per year, from grades pre-K to 3. A lot of money, you say? Recall from a previous blog that the average per-pupil cost in the U.S. is approximately $11,000. What if it were $11,200, just for a few years? The near-term savings in special education and retentions, much less longer-term costs of delinquency and dropout, would more than return this investment.
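The break-even point implied by these numbers can be sketched explicitly. The assumptions below are mine, for illustration: pre-K through grade 3 spans about five years, and one prevented grade retention saves roughly one year of per-pupil spending.

```python
# Break-even sketch for the proposed $200/child/year early-reading add-on.
# Assumptions (illustrative, not from the post): pre-K through grade 3 is
# 5 years, and one prevented retention saves one year of per-pupil spending.
add_on_per_year = 200
years = 5               # pre-K, K, 1, 2, 3
per_pupil_cost = 11_000 # approximate U.S. average, from the post

investment_per_child = add_on_per_year * years  # $1,000 total per child
# Fraction of children who must avoid one retained year for the add-on
# to pay for itself, ignoring special education and long-term savings:
break_even_rate = investment_per_child / per_pupil_cost
print(f"Break-even: {break_even_rate:.1%} of children")  # 9.1%
```

On these assumptions, the add-on pays for itself if it prevents retention for fewer than one child in ten, before counting any savings on special education, delinquency, or dropout.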

But more than cost-effectiveness, there is a moral imperative here. Allowing children to fail when they could succeed is simply wrong. We could greatly reduce or eliminate this problem, just as the aircraft industry has done. Our society must come to see school failure as the catastrophe that it is, and to use whatever proven methods are needed to make reading failure a problem of the past.

This blog is sponsored by the Laura and John Arnold Foundation

The Maryland Challenge

As the Olympic Games earlier this summer showed, Americans love to compare ourselves with other countries. Within the U.S., we like to compare our states with other states. When Ohio State plays the University of Michigan, it’s not just a football game.

In education, we also like to compare, and we usually don’t like what we see. Comparisons can be useful in giving us a point of reference for what is possible, but a point of reference doesn’t help if it is not seen as a true peer. For example, U.S. students are in the middle of the pack of developed nations on Program for International Student Assessment (PISA) tests for 15-year-olds, but Americans expect to do a lot better than that. The National Assessment of Educational Progress (NAEP) allows us to compare scores within the U.S., and unless you’re in Massachusetts, which usually scores highest, you probably don’t like those comparisons either. When we don’t like our ranking, we explain it away as best we can. Countries with higher PISA scores have fewer immigrants, or pay their teachers better, or have cultures that value education more. States that do better are richer, or have other unfair advantages. These explanations may or may not have an element of truth, but the bottom line is that comparisons on such a grand scale are just not that useful. There are far too many factors that differ between nations or states, some of which are changeable and some not, at least in the near term.

If comparisons among unequal places are not so useful, what point of reference would be better?

Kevan Collins, Director of the Education Endowment Foundation in England (England’s equivalent to our Investing in Innovation (i3) program), has an answer to this dilemma, which he explained at a recent conference I attended in Stockholm. His idea is based on a major, very successful initiative of Tony Blair’s government beginning in 2003, called the London Challenge. Secondary schools in the greater London area were put into clusters according to students’ achievement at the end of primary (elementary) school, levels of poverty, numbers of children speaking languages other than English at home, size, and other attributes. Examination of the results being achieved by schools within the same cluster showed remarkable variation in test scores. Even in the poorest clusters there were schools performing above the national average, and in the wealthiest clusters there were schools below the average. Schools low in their own clusters were given substantial resources to improve, with a particular emphasis on leadership. Over time, London went from being one of the lowest-achieving areas of England to scoring among the highest. Later versions of this plan in Manchester and in the Midlands did not work as well, but they did not have much time before the end of the Blair government meant the end of the experiment.

Fast forward to today, and think about states in the U. S. as the unit of reform. Imagine that Maryland, my state, categorized its Title I elementary, middle, and high schools according to percent free lunch, ethnic composition, percent English learners, urban/rural, school size, and so on. Each of Maryland’s Title I schools would be in a cluster of perhaps 50 very similar schools. As in England, there would be huge variation in achievement within clusters.
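The clustering step described above can be sketched in a few lines of code. This is a hypothetical illustration only: the school records and the coarse bins used as cluster keys are invented, and a real state system would use many more attributes and a more careful matching method.

```python
# Hypothetical sketch: bucket Title I schools into peer clusters by
# binning a few demographic attributes. All data here is invented.
from collections import defaultdict

schools = [
    {"name": "A", "pct_free_lunch": 85, "pct_ell": 30, "setting": "urban"},
    {"name": "B", "pct_free_lunch": 82, "pct_ell": 28, "setting": "urban"},
    {"name": "C", "pct_free_lunch": 40, "pct_ell": 5,  "setting": "rural"},
    {"name": "D", "pct_free_lunch": 45, "pct_ell": 4,  "setting": "rural"},
]

def cluster_key(school):
    # Coarse bins: free-lunch % to the nearest 20 points, English-learner %
    # to the nearest 10, plus the urban/rural setting.
    return (round(school["pct_free_lunch"] / 20) * 20,
            round(school["pct_ell"] / 10) * 10,
            school["setting"])

clusters = defaultdict(list)
for s in schools:
    clusters[cluster_key(s)].append(s["name"])
# Schools A and B land in one cluster, C and D in another.
```

With real data, each cluster would hold dozens of demographically similar schools, and the interesting comparison would be the spread of achievement within each one.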

Just forming clusters to shame schools low in their own cluster would not be enough. The schools need help to greatly improve their outcomes.

This being 2016, we have many more proven programs than were available in the London Challenge. Schools scoring below the median of their cluster might have the opportunity to choose proven programs appropriate to their strengths and needs. The goal would be to assist every school below the median in its own cluster to at least reach the median. School staffs would have to vote by at least 80% in favor to adopt various programs. The school would also commit to use most of its federal Title I funds to match supplemental state or federal funding to pay for the programs. Schools above the median would also be encouraged to adopt proven programs, but might not receive matching funds.

Imagine what could happen. Principals and staffs could no longer argue that it is unfair for their schools to be compared to dissimilar schools. They might visit schools performing at the highest levels in their clusters, and perhaps even form coalitions across district lines to jointly select proven approaches and help each other implement them.

Not all schools would likely participate in the first years, but over time, larger numbers might join in. Because schools would be implementing programs already known to work in schools just like theirs, and would be held accountable within a fair group of peers, schools should see rapid growth toward and beyond their cluster median, and more importantly, the entire clusters should advance toward state goals.

A plan like this could make a substantial difference in performance among all Title I schools statewide. It would focus attention sharply where it is needed, on improved teaching and learning in the schools that need it most. Within a few years, Maryland, or any other state that did the same, might blow past Massachusetts, and a few years after that, we’d all be getting visits from Finnish educators!

Lighthouses

Visiting coastal lighthouses is one of the highlights of any beach vacation. Mostly relics of the 18th and 19th centuries, lighthouses played a vital role in helping harbor pilots and ship captains find channels and avoid hazards such as rocks and sandbars. They saved many ships and crews that would otherwise have been lost at night or in storms or fog.

In education, the idea of “lighthouse schools” comes up frequently. The notion here is to identify schools that are making impressive progress, especially in challenging circumstances, and then publicize their successes. The metaphor, of course, is that these “lighthouses” guide others to reach their goals.

Taken as a form of journalism, rather than a form of science, a policy of identifying and publicizing lighthouse schools in high-poverty neighborhoods makes a statement that success is actually possible. With stories of tough but loving principals, caring and hard-working staff and ringing praise from parents and politicians, lighthouse stories are irresistible feel-good press.

However, if what we want is evidence-based reform, identifying and describing lighthouse schools is less useful. The process is riddled with problems. First, you have to make sure that the lighthouse school is truly a shining example. In the 1980s, everyone was abuzz about District 2 in New York City, which was rapidly rising in academic performance using a variety of innovative methods and visionary leadership. I asked my New York friends at the time what was going on in District 2, and they fell on the floor laughing. It turned out that District 2 was quickly gentrifying. A former high-poverty area was now attracting many upper-middle class families eager for quality schools that did not charge tuition. Lighthouse schools often have such explanations.

Lighthouse schools are sometimes identified based on a single outstanding year, which may be in the past. This may mean that a temporarily outstanding school is now rather ordinary.

Even if a given school has made major, sustained gains serving the same population it always served, it is rarely clear what caused the change. Was it the principal? New teachers? Additional resources? Innovative programs? In a single school, it is impossible to pick out what made the difference because everything is intertwined with everything else. A journalistic approach simply cannot account for this.

Further, even if we have strong suspicions about what makes the lighthouse shine, that factor may not be replicable. What if it is the principal? Another school might hire away that principal, but that hardly moves the system forward. What if a local foundation gave the school a pile of money? What if the school managed to attract volunteer tutors from across the city? Such advantages are possible in some schools or even districts, but not in others.

The whole lighthouse school idea is undercut by the observation that, whatever the “secret sauce” might be, it does not travel well. If it did, we would find not only lighthouse schools but lighthouse districts and states, in which great ideas spread outward using replicable methods. The very fact that one school stands out from its neighbors should give us pause about whether the neighbors have the capacity and the willingness to imitate success.

Lighthouse schools can certainly contribute ideas or inspiration to evidence-based reform, but before a program can be considered effective and replicable, it needs to be clearly defined and then evaluated by rigorous methods. Such evaluation would normally compare at least 20 schools assigned at random to implement the program to 20 similar schools that continue with their practice as usual. If such a study finds that the schools using the innovative program did better, then we really know something worthwhile. In this scenario, we could have confidence because factors other than the innovation balance out. The experimental and control schools are likely to have equal numbers of good principals, equal funding (on average), equally qualified teachers (on average) and so on. The only difference between the experimental and control schools is the “secret sauce” itself, which, if it can work in 20 schools or more, is probably replicable.
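The random-assignment design described above can be sketched concretely. This is a minimal illustration, not a real study protocol: the school identifiers are invented, and a real trial would also stratify schools by demographics before assignment.

```python
# Minimal sketch of the evaluation design described above: 40 similar
# schools, half randomly assigned to implement the program, half
# continuing practice as usual. School IDs are invented.
import random

rng = random.Random(42)  # fixed seed so the assignment is reproducible

schools = [f"school_{i:02d}" for i in range(40)]
rng.shuffle(schools)

experimental = sorted(schools[:20])  # implement the new program
control = sorted(schools[20:])       # continue practice as usual

# Random assignment balances everything else, on average: principals,
# funding, teacher quality. Any later difference in outcomes can then
# be attributed to the program itself.
```

The key point the code makes visible: because chance alone decides which group a school lands in, the two groups are statistically equivalent before the program begins.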

Lighthouses once guided ships to safe harbors, but in education, policies limited to finding and celebrating lighthouse schools are unlikely to improve outcomes more broadly. They may lead policy in a good direction, but they may just as easily guide us onto the rocks.