The Farmer and the Moon Rocks: What Did the Moon Landing Do For Him?

Many, many years ago, during the summer after my freshman year in college, I hitchhiked from London to Iran.  This was the summer of 1969, so Apollo 11 was also traveling.   I saw television footage of the moon landing in Heraklion, Crete, where a television store switched on all of its sets and turned them toward the sidewalk.  A large crowd watched the whole thing.  This was one of the few times I recall when it was really cool to be an American abroad.

After leaving Greece, I went on to Turkey, and then Iran.  In Teheran, I got hold of an English-language newspaper.  It told an interesting story.  In rural Iran, many people believed that the moon was a goddess.  Obviously, a spaceship cannot land on a goddess, so many people concluded that the moon landing must be a hoax.

A reporter from the newspaper interviewed a number of people about the moon landing.  Some were adamant that the landing could not have happened.  However, one farmer was more pragmatic.  He asked the reporter, “I hear the astronauts brought back moon rocks.  Is that right?”

“That’s what they say!” replied the reporter.

“I am fixing my roof, and I could sure use a few of those moon rocks.  Do you think they might give me some?”

blog_8-1-19_moonfarmer_500x432 (002)

The moon rock story illustrates a daunting problem in the dissemination of educational research. Researchers do high-quality research on topics of great importance to the practice of education. They publish this research in top journals, and get promotions and awards for it, but in most cases, their research does not arouse even the slightest bit of interest among the educators for whom it was intended.

The problem relates to the farmer repairing his roof.  He had a real problem to solve, and he needed help with it.  A reporter comes and tells him about the moon landing. The farmer does not think, “How wonderful!  What a great day for science and discovery and the future of mankind!”  Instead, he thinks, “What does this have to do with me?”  Thinking back on the event, I sometimes wonder if he really expected any moon rocks, or if he was just sarcastically saying, “I don’t care.”

Educators care deeply about their students, and they will do anything they can to help them succeed.  But if they hear about research that does not relate to their children, or at least to children like theirs, they are unlikely to care very much.  Even if the research is directly applicable to their students, they are likely to reason, perhaps from long experience, that they will never get access to this research, because it costs money or takes time or upsets established routines or is opposed by powerful groups or whatever.  The result is status quo as far as the eye can see, or implementation of small changes that are currently popular but unsupported by evidence of effectiveness.  Ultimately, the result is cynicism about all research.

Part of the problem is that education is effectively a government monopoly, so entrepreneurship or responsible innovation are difficult to start or maintain.  However, the fact that education is a government monopoly can also be made into a positive, if government leaders are willing to encourage and support evidence-based reform.

Imagine that government decided to provide incentive funding to schools to help them adopt programs that meet a high standard of evidence.  This has actually happened under the ESSA law, but only in a very narrow slice of schools, those very low achieving schools that qualify for school improvement.  Imagine that the government provided a lot more support to schools to help them learn about, adopt, and effectively implement proven programs, and then gradually expanded the categories of schools that could qualify for this funding.

Going back to the farmer and the moon rocks, such a policy would forge a link between exciting research on promising innovations and the real world of practice.  It could cause educators to pay much closer attention to research on practical programs of relevance to them, and to learn how to tell the difference between valid and biased research.  It could help educators become sophisticated and knowledgeable consumers of evidence and of programs themselves.

One of the best examples of the transformation such policies could bring about is agriculture.  Research has a long history in agriculture, and from colonial times, government has encouraged and incentivized farmers to pay attention to evidence about new practices, new seeds, new breeds of animals, and so on.  By the late 19th century, the U.S. Department of Agriculture was sponsoring research, distributing information designed to help farmers be more productive, and much more.  Today, research in agriculture is a huge enterprise, constantly making important discoveries that improve productivity and reduce costs.  As a result, world agriculture, especially American agriculture, is able to support far larger populations at far lower costs than anyone ever thought possible.  The Iranian farmer talking about the moon rocks could not see how advances in science could possibly benefit him personally.  Today, however, in every developed economy, farmers have a clear understanding of the connection between advances in science and their own success.  Everyone knows that agriculture can have bad as well as good effects, as when new practices lead to pollution, but when governments decide to solve those problems, they turn to science. Science is not inherently good or bad, but if it is powerful, then democracies can direct it to do what is best for people.

Agriculture has made dramatic advances over the past hundred years, and continues to make rapid progress by linking science to practice.  In education, we are just starting to make the link between evidence and practice.  Isn’t it time to learn from the experiences of medicine, technology, and agriculture, among many other evidence based fields, to achieve more rapid progress in educational practice and outcomes?

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Advertisements

Educational Policies vs. Educational Programs: Evidence from France

Ask any parent what their kids say when they ask them what they did in school today. Invariably, they respond, “Nuffin,” or some equivalent. My four-year-old granddaughter always says, “I played with my fwends.” All well and good.

However, in educational policy, policy makers often give the very same answer when asked, “What did the schools not using the (insert latest policy darling) do?”

“Nuffin’”. Or they say, “Whatever they usually do.” There’s nothing wrong with the latter answer if it’s true. But given the many programs now known to improve student achievement (see www.evidenceforessa.org), why don’t evaluators compare outcomes of new policy initiatives to those of proven educational programs known to improve the same outcomes the policy innovation is supposed to improve, perhaps at far lower cost per student? The evaluations should also compare to “business as usual,” but adding proven programs to evaluations of large policy innovations would help avoid declaring policy innovations to be successful when they are in fact just slightly more effective than “business as usual,” and much less effective or less cost-effective than alternative proven approaches? For example, when evaluating charter schools, why not routinely compare them to whole-school reform models that have similar objectives? When evaluating extending the school day or school year to help high-poverty schools, why not compare these innovations to using the same amount of additional money to hiring tutors to use proven tutoring models to help struggling students? In evaluating policies in which students are held back if they do not read at grade level by third grade, why not compare these approaches to intensive phonics instruction and tutoring in grades K-3, which are known to greatly improve student reading achievement?

blog_7-25-19_LeoandAdaya_375x500
There is nuffin like a good fwend.

As one example of research comparing a policy intervention to a promising educational intervention, I recently saw a very interesting pair of studies from France. Ecalle, Gomes, Auphan, Cros, & Magnan (2019) compared two interventions applied in special priority areas with high poverty levels. Both interventions focused on reading in first grade.

One of the interventions involved halving class size, from approximately 24 students to 12. The other provided intensive reading instruction in small groups (4-6 children) to students who were struggling in reading, as well as less intensive interventions to larger groups (10-12 students). Low achievers got two 30-minute interventions each day for a year, while the higher-performing readers got one 30-minute intervention each day. In both cases, the focus of instruction was on phonics. In all cases, the additional interventions were provided by the students’ usual teachers.

The students in small classes were compared to students in ordinary-sized classes, while the students in the educational intervention were compared to students in same-sized classes who did not get the group interventions. Similar measures and analyses were used in both comparisons.

The results were nearly identical for the class size policy and the educational intervention. Halving class size had effect sizes of +0.14 for word reading and +0.22 for spelling. Results for the educational intervention were +0.13 for word reading, +0.12 for spelling, +0.14 for a group test of reading comprehension, +0.32 for an individual test of comprehension, and +0.19 for fluency.

These studies are less than perfect in experimental design, but they are nevertheless interesting. Most importantly, the class size policy required an additional teacher for each class of 24. Using Maryland annual teacher salaries and benefits ($84,000), that means the cost in our state would be about $3500 per student. The educational intervention required one day of training and some materials. There was virtually no difference in outcomes, but the differences in cost were staggering.

The class size policy was mandated by the Ministry of Education. The educational intervention was offered to schools and provided by a university and a non-profit. As is so often the case, the policy intervention was simplistic, easy to describe in the newspaper, and minimally effective. The class size policy reminds me of a Florida program that extended the school schedule by an hour every day in high-poverty schools, mainly to provide more time for reading instruction. The cost per child was about $800 per year. The outcomes were minimal (ES=+0.05).

After many years of watching what schools do and reviewing research on outcomes of innovations, I find it depressing that policies mandated on a substantial scale are so often found to be ineffective. They are usually far more expensive than much more effective, rigorously evaluated programs that are, however, a bit more difficult to describe, and rarely arouse great debate in the political arena. It’s not that anyone is opposed to the educational intervention, but it is a lot easier to carry a placard saying “Reduce Class Size Now!” than to carry one saying “Provide Intensive Phonics in Small Groups with More Supplemental Teaching for the Lowest Achievers Now!” The latter just does not fit on a placard, and though easy to understand if explained, it does not lend itself to easy communication. Actually, there are much more effective first grade interventions than the one evaluated in France (see www.evidenceforessa.org). At a cost much less than $3500 per student, several one-to-one tutoring programs using well-trained teaching assistants as tutors would have been able to produce an effect size of more than +0.50 for all first graders on average. This would even fit on a placard: “Tutoring Now!”

I am all in favor of trying out policy innovations. But when parents of kids in a proven-program comparison group are asked what they did in school today, they shouldn’t say “nuffin’”. They should say, “My tooter taught me to read. And I played with my fwends.”

References

Ecalle, J., Gomes, C., Auphan, P., Cros, L., & Magnan, A. (2019). Effects of policy and educational interventions intended to reduce difficulties in literacy skills in grade 1. Studies in Educational Evaluation, 61, 12-20.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

The Fabulous 20%: Programs Proven Effective in Rigorous Research

blog_4-18-19_girlcheer_500x333
Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Over the past 15 years, governments in the U.S. and U.K. have put quite a lot of money (by education standards) into rigorous research on promising programs in PK-12 instruction. Rigorous research usually means studies in which schools, teachers, or students are assigned at random to experimental or control conditions and then pre- and posttested on valid measures independent of the developers. In the U.S., the Institute for Education Sciences (IES) and Investing in Innovation (i3), now called Education Innovation Research (EIR), have led this strategy, and in the U.K., it’s the Education Endowment Foundation (EEF). Enough research has now been done to enable us to begin to see important patterns in the findings.

One finding that is causing some distress is that the numbers of studies showing significant positive effects is modest. Across all funding programs, the proportion of studies reporting positive, significant findings averages around 20%. It is important to note that most funded projects evaluate programs that have been newly developed and not previously evaluated. The “early phase” or “development” category of i3/EIR is a good example; it provides small grants intended to fund creation or refinement of new programs, so it is not so surprising that these studies are less likely to find positive outcomes. However, even programs that have been successfully evaluated in the past often do not replicate their positive findings in the large, rigorous evaluations required at the higher levels of i3/EIR and IES, and in all full-scale EEF studies. The problem is that positive outcomes may have been found in smaller studies in which hard-to-replicate levels of training or monitoring by program developers may have been possible, or in which measures made by developers or researchers were used, or where other study features made it easier to find positive outcomes.

The modest percentage of positive findings has caused some observers to question the value of all these rigorous studies. They wonder if this is a worthwhile investment of tax dollars.

One answer to this concern is to point out that while the percentage of all studies finding positive outcomes is modest, so many have been funded that the number of proven programs is growing rapidly. In our Evidence for ESSA website (www.evidenceforessa.org), we have found 111 programs that meet ESSA’s Strong, Moderate, or Promising standards in elementary and secondary reading or math. That’s a lot of proven programs, especially in elementary reading, where there were 62.

The situation is a bit like that in medicine. A very small percentage of rigorous studies of medicines or other treatments show positive effects. Yet so many are done that each year, new proven treatments for all sorts of diseases enter widespread use in medical practice. This dynamic is one explanation for the steady increases in life expectancy taking place throughout the world.

Further, high quality studies that fail to find positive outcomes also contribute to the science and practice of education. Some programs do not meet standards for statistical significance, but nevertheless they show promise overall or with particular subgroups. Programs that do not find clear positive outcomes but closely resemble other programs that do are another category worth further attention. Funders can take this into account in deciding whether to fund another study of programs that “just missed.”

On the other hand, there are programs that show profoundly zero impact, in categories that never or almost never find positive outcomes. I reported recently on benchmark assessments,  with an overall effect size of -0.01 across 10 studies. This might be a good candidate for giving up, unless someone has a markedly different approach unlike those that have failed so often. Another unpromising category is textbooks. Textbooks may be necessary, but the idea that replacing one textbook with another has failed many, many times. This set of negative results can be helpful to schools, enabling them to focus their resources on programs that do work. But giving up on categories of studies that hardly ever work would significantly reduce the 80% failure rate, and save money better spent on evaluating more promising approaches.

The findings of many studies of replicable programs can also reveal patterns that should help current or future developers create programs that meet modern standards of evidence. There are a few patterns I’ve seen across many programs and studies:

  1. I think developers (and funders) vastly underestimate the amount and quality of professional development needed to bring about significant change in teacher behaviors and student outcomes. Strong professional development requires top-quality initial training, including simulations and/or videos to show teachers how a program works, not just tell them. Effective PD almost always includes coaching visits to classrooms to give teachers feedback and new ideas. If teachers fall back into their usual routines due to insufficient training and follow-up coaching, why would anyone expect their students’ learning to improve in comparison to the outcomes they’ve always gotten? Adequate professional development can be expensive, but this cost is highly worthwhile if it improves outcomes.
  2. In successful programs, professional development focuses on classroom practices, not solely on improving teachers’ knowledge of curriculum or curriculum-specific pedagogy. Teachers standing at the front of the class using the same forms of teaching they’ve always used but doing it with more up-to-date or better-aligned content are not likely to significantly improve student learning. In contrast, professional development focused on tutoring, cooperative learning, and classroom management has a far better track record.
  3. Programs that focus on motivation and relationships between teachers and students and among students are more likely to enhance achievement than programs that focus on cognitive growth alone. Successful teaching focuses on students’ hearts and spirits, not just their minds.
  4. You can’t beat tutoring. Few approaches other than one-to-one or one-to-small group tutoring have consistent powerful impacts. There is much to learn about how to make tutoring maximally effective and cost-effective, but let’s start with the most effective and cost-effective tutoring models we have now and build out from there .
  5. Many, perhaps most failed program evaluations involve approaches with great potential (or great success) in commercial applications. This is one reason that so many evaluations fail; they assess textbooks or benchmark assessments or ordinary computer assisted instruction approaches. These often involve little professional development or follow-up, and they may not make important changes in what teachers do. Real progress in evidence-based reform will begin when publishers and software developers come to believe that only proven programs will succeed in the marketplace. When that happens, vast non-governmental resources will be devoted to development, evaluation, and dissemination of well-implemented forms of proven programs. Medicine was once dominated by the equivalent of Dr. Good’s Universal Elixir (mostly good-tasting alcohol and sugar). Very cheap, widely marketed, and popular, but utterly useless. However, as government began to demand evidence for medical claims, Dr. Good gave way to Dr. Proven.

Because of long-established policies and practices that have transformed medicine, agriculture, technology, and other fields, we know exactly what has to be done. IES, i3/EIR, and EEF are doing it, and showing great progress. This is not the time to get cold feet over the 80% failure rate. Instead, it is time to celebrate the fabulous 20% – programs that have succeeded in rigorous evaluations. Then we need to increase investments in evaluations of the most promising approaches.

 

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Benchmark Assessments: Weighing the Pig More Often?

There is an old saying about educational assessment: “If you want to fatten a pig, it doesn’t help to weigh it more often.”

To be fair, it may actually help to weigh pigs more often, so the farmer knows whether they are gaining weight at the expected levels. Then they can do something in time if this is not the case.

It is surely correct that weighing pigs does no good in itself, but it may serve a diagnostic purpose. What matters is not the weighing, but rather what the farmer or veterinarian does based on the information provided by the weighing.

blog_4-11-19_pigscale_500x432

This blog is not, however, about porcine policy, but educational policy. In schools, districts, and even whole states, most American children take “benchmark assessments” roughly three to six times a year. These assessments are intended to tell teachers, principals, and other school leaders how students are doing, especially in reading and math. Ideally, benchmark assessments are closely aligned with state accountability tests, making it possible for school leaders to predict how whole grade levels are likely to do on the state tests early enough in the year to enable them to provide additional assistance in areas of need. The information might be as detailed as “fourth graders need help in fractions” or “English learners need help in vocabulary.”

Benchmark assessments are only useful if they improve scores on state accountability tests. Other types of intervention may be beneficial even if they do not make any difference in state test scores, but it is hard to see why benchmark assessments would be valuable if they do not in fact have any impact on state tests, or other standardized tests.

So here is the bad news: Research finds that benchmark assessments do not make any difference in achievement.

High-quality, large scale randomized evaluations of benchmark assessments are relatively easy to do. Many have in fact been done. Use of benchmark assessments have been evaluated in elementary reading and math (see www.bestevidence.org). Here is a summary of the findings.

Number of Studies Mean Effect Size
Elementary Reading 6 -0.02
Elementary Math 4    .00
Study-weighted mean 10 -0.01

In a rational world, these findings would put an end to benchmark assessments, at least as they are used now. The average outcomes are not just small, they are zero. They use up a lot of student time and district money.

In our accountability-obsessed educational culture, how could use of benchmark assessments make no difference at all on the only measure they are intended to improve? I would suggest several possibilities.

First, perhaps the most likely, is that teachers and schools do not do much with the information from benchmark assessments. If you are trying to lose weight, you likely weigh yourself every day. But if you then make no systematic effort to change your diet or increase your exercise, then all those weighings are of little value. In education, the situation is much worse than in weight reduction, because teachers are each responsible for 20-30 students. Results of benchmark assessments are different for each student, so a school staff that learns that its fourth graders need improvement in fractions finds it difficult to act on this information. Some fourth graders in every school are excelling in fractions, some just need a little help, and some are struggling in fractions because they missed the prerequisite skills. “Teach more fractions” is not a likely solution except for some of that middle group, yet differentiating instruction for all students is difficult to do well.

Another problem is that it takes time to score and return benchmark assessments, so by the time a team of teachers decides how to respond to benchmark information, the situation has moved on.

Third, benchmark assessments may add little because teachers and principals already know a lot more about their students than any test can tell them. Imagine a principal receiving the information that her English learners need help in vocabulary. I’m going to guess that she already knows that. But more than that, she and her teachers know which English learners need what kind of vocabulary, and they have other measures and means of finding out. Teachers already give a lot of brief, targeted curriculum-linked assessments, and they always have. Further, wise teachers stroll around and listen in on students working in cooperative groups, or look at their tests or seatwork or progress on computer curriculum, to get a sophisticated understanding of why some students are having trouble, and ideas for what to do about it. For example, it is possible that English learners are lacking school-specific vocabulary, such as that related to science or social studies, and this observation may suggest solutions (e.g., teach more science and social studies). But what if some English learners are afraid or unwilling to express themselves in class, but sit quietly and never volunteer answers? A completely different set of solutions might be appropriate in this case, such as using cooperative learning or tutoring strategies to give students safe spaces in which to use the vocabulary they have, and gain motivation and opportunities to learn and use more.

Benchmark assessments fall into the enormous category of educational solutions that are simple, compelling, and wrong. Yes, teachers need to know what students are learning and what is needed to improve it, but they have available many more tools that are far more sensitive, useful, timely, and tied to actions teachers can take.

Eliminating benchmark assessments would save schools a lot of money. Perhaps that money could be redirected to professional development to help teachers use approaches actually proven to work. I know, that’s crazy talk. But perhaps if we looked at what students are actually doing and learning in class, we could stop weighing pigs and start improving teaching for all children.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Government Plays an Essential Role in Diffusion of Innovations

Lately I’ve been hearing a lot of concern in reform circles about how externally derived evidence can truly change school practices and improve outcomes. Surveys of principals, for example, routinely find that principals rarely consult research in making key decisions, including decisions about adopting materials, software, or professional development intended to improve student outcomes. Instead, principals rely on their friends in similar schools serving similar students. In the whole process, research rarely comes up, and if it does, it is often generic research on how children learn rather than high-quality evaluations of specific programs they might adopt.

Principals and other educational leaders have long been used to making decisions without consulting research. It would be difficult to expect otherwise, because of three conditions that have prevailed roughly from the beginning of time to very recently: a) There was little research of practical value on practical programs; b) The research that did exist was of uncertain quality, and school leaders did not have the time or training to determine studies’ validity; c) There were no resources provided to schools to help them adopt proven programs, so doing so required that they spend their own scarce resources.

Under these conditions, it made sense for principals to ask around among their friends before selecting programs or practices. When no one knows anything about a program’s effectiveness, why not ask your friends, who at least (presumably) have your best interests at heart and know your context? Since conditions a, b, and c have defined the context for evidence use nearly up to the present, it is not surprising that school leaders have built a culture of distrust for anyone outside of their own circle when it comes to choosing programs.

However, all three of conditions a, b, and c have changed substantially in recent years, and they are continuing to change in a positive direction at a rapid rate:

a) High-quality research on practical programs for elementary and secondary schools is growing at an extraordinary rate. As shown in Figure 1, the number of rigorous randomized or quasi-experimental studies in elementary and secondary reading and in elementary math have skyrocketed since about 2003, due mostly to investments by the Institute for Education Sciences (IES) and Investing in Innovation (i3). There has been a similar explosion of evidence in England, due to funding from the Education Endowment Foundation (EEF). Clearly, we know a lot more about which programs work and which do not than we once did.

blog_1-10-19_graph2_1063x650

b) Principals, teachers, and the public can now easily find reliable and accessible information on practical programs on the What Works Clearinghouse (WWC), Evidence for ESSA, and other sites. No one can complain any more that information is inaccessible or incomprehensible.

c) Encouragement and funding are becoming available for schools eager to use proven programs. Most importantly, the federal ESSA law is providing school improvement funding for low-achieving schools that agree to implement programs that meet the top three ESSA evidence standards (strong, moderate, or promising). ESSA also provides preference points for applications for certain sources of federal funding if they promise to use the money to implement proven programs. Some states have extended the same requirement to apply to eligibility for state funding for schools serving students who are disadvantaged or are ethnic or linguistic minorities. Even schools that do not meet any of these demographic criteria are, in many states, being encouraged to use proven programs.

blog_1-10-19_uscapitol_500x375

Photo credit: Jorge Gallo [Public domain], from Wikimedia Commons

I think the current situation is like that which must have existed in, say, 1910, with cars and airplanes. Anyone could see that cars and airplanes were the future. But I’m sure many horse-owners pooh-poohed the whole thing. “Sure there are cars,” they’d say, “but who will build all those paved roads? Sure there are airplanes, but who will build airports?” The answer was government, which could see the benefits to the entire economy of systems of roads and airports to meet the needs of cars and airplanes.

Government cannot solve all problems, but it can create conditions to promote adoption and use of proven innovations. And in education, federal, state, and local governments are moving rapidly to do this. Principals may still prefer to talk to other principals, and that’s fine. But with ever more evidence on ever more programs and with modest restructuring of funds governments are already awarding, conditions are coming together to utterly transform the role of evidence in educational practice.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence, Standards, and Chicken Feathers

In 1509, John Damian, an alchemist in the court of James IV of Scotland proclaimed that he had developed a way for humans to fly. He made himself some wings from chicken feathers and jumped from the battlements of Stirling Castle, the Scottish royal residence at the time. His flight was brief but not fatal.  He landed in a pile of manure, and only broke his thigh.  Afterward, he explained that the problem was that he used the wrong kind of feathers.  If only he had used eagle feathers, he could have flown, he asserted.  Fortunately for him, he never tried flying again, with any kind of feathers.

blog_11-15-18_humanornithopter_500x314

The story of John Damian’s downfall is humorous, and in fact the only record of it is a contemporary poem making fun of it. Yet there are important analogies to educational policy today from this incident in Scottish history. These are as follows:

  1. Damian proclaimed the success of his plan for human flight before he or anyone else had tried it and found it effective.
  2. After his flight ended in the manure pile, he proclaimed (again without evidence) that if only he’d used eagle feathers, he would have succeeded. This makes sense, of course, because eagles are much better flyers than chickens.
  3. He was careful never to actually try flying with eagle feathers.

All of this is more or less what we do all the time in educational policy, with one big exception.  In education, based on Damian’s experience, we might have put forward policies stating that from now on human powered flight must only be done with eagle feathers, not chicken feathers.

What I am referring to in education is our obsession with standards as a basis for selecting textbooks, software, and professional development, and the relative lack of interest in evidence. Whole states and districts spend a lot of time devising standards and then reviewing materials and services to be sure that they align with these standards. In contrast, the idea of checking to see that texts, software, and PD have actually been evaluated and found to be effective in real classrooms with real teachers and students has been a hard slog.

Shouldn’t textbooks and programs that meet modern standards also produce higher student performance on tests closely aligned with those standards? This cannot be assumed. Not long ago, my colleagues and I examined every reading and math program rated “meets expectations” (the highest level) on EdReports, a website that rates programs in terms of their alignment with college- and career-ready standards.  A not so grand total of two programs had any evidence of effectiveness on any measure not made by the publishers. Most programs rated “meets expectations” had no evidence at all, and a smaller number had been evaluated and found to make no difference.

I am not in any way criticizing EdReports.  They perform a very valuable service in helping schools and districts know which programs meet current standards. It makes no sense for every state and district to do this for themselves, especially in the cases where there are very few or no proven programs. It is useful to at least know about programs aligned with standards.

There is a reason that so few products favorably reviewed on EdReports have any positive outcomes in rigorous research. Most are textbooks, and very few textbooks have evidence of effectiveness. Why? The fact is that standards or no standards, EdReports or no EdReports, textbooks do not differ very much from each other in aspects that matter for student learning. Textbooks differ (somewhat) in content, but if there is anything we have learned from our many reviews of research on what works in education, what matters is pedagogy, not content. Yet since decisions about textbooks and software depend on standards and content, decision makers almost invariably select textbooks and software that have never been successfully evaluated.

Even crazy John Damian did better than we do. Yes, he claimed success in flying before actually trying it, but at last he did try it. He concluded that his flying plan would have worked if he’d used eagle feathers, but he never imposed this untested standard on anyone.

Untested textbooks and software probably don’t hurt anyone, but millions of students desperately need higher achievement, and focusing resources on untested or ineffective textbooks, software, and PD does not move them forward. The goal of education is to help all students succeed, not to see that they use aligned materials. If a program has been proven to improve learning, isn’t that a lot more important than proving that it aligns with standards? Ideally, we’d want schools and districts to use programs that are both proven effective and aligned with standards, but if no programs meet both criteria, shouldn’t those that are proven effective be preferred? Without evidence, aren’t we just giving students and teachers eagle feathers and asking them to take a leap of faith?

Photo credit: Humorous portrayal of a man who flies with wings attached to his tunic, Unknown author [Public domain], via Wikimedia Commons/Library of Congress

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 

Preschool is Not Magic. Here’s What Is.

If there is one thing that everyone knows about policy-relevant research in education, it is this: Participation in high-quality preschool programs (at age 4) has substantial and lasting effects on students’ academic and life success, especially for students from disadvantaged homes. The main basis for this belief is the findings of the famous Perry Preschool program, which randomly assigned 128 disadvantaged youngsters in Ypsilanti, Michigan, to receive intensive preschool services or not to receive these services. The Perry Preschool study found positive effects at the end of preschool, and long-term positive impacts on outcomes such as high school graduation, dependence on welfare, arrest rates, and employment (Schweinhart, Barnes, & Weikart, 1993).

blog_8-2-18_magicboy_500x333

But prepare to be disappointed.

Recently, a new study has reported a very depressing set of outcomes. Lipsey, Farran, & Durkin (2018) published a large, randomized study evaluating Tennessee’s statewide preschool program. 2990 four year olds were randomly assigned to participate in preschool, or not. As in virtually all preschool studies, children who were randomly assigned to preschool scored much better than those who were assigned to the control group. But these results diminished in kindergarten, and by first grade, no positive effects could be detected. By third grade, the control group actually scored significantly higher than the former preschool students in math and science, and non-significantly higher in reading!

Jon Baron of the Laura and John Arnold Foundation wrote an insightful commentary on this study, noting that when such a large, well-done, long-term, randomized study is reported, we have to take the results seriously, even if they disagree with our most cherished beliefs. At the end of Baron’s brief summary was a commentary by Dale Farran and Mark Lipsey, two the study’s authors, telling the story of the hostile reception to their paper in the early childhood research community and the difficulties they had getting this exemplary experiment published.

Clearly, the Tennessee study was a major disappointment. How could preschool have no lasting effects for disadvantaged children?

Having participated in several research reviews on this topic (e.g., Chambers, Cheung, & Slavin, 2016), as well as some studies of my own, I have several observations to make.

Although this may have been the first large, randomized evaluation of a state-funded preschool program in the U.S., there have been many related studies that have had the same results. These include a large, randomized study of 5000 children assigned to Head Start or not (Puma et al., 2010), which also found positive outcomes at the end of the pre-K year, but only scattered lasting effects after pre-K. Very similar outcomes (positive pre-k outcomes with little or no lasting impact) have been found in a randomized evaluation of a national program called Sure Start in England (Melhuish, Belsky, & Leyland, 2010), and one in Australia (Claessens & Garrett, 2014).

Ironically, the Perry Preschool study itself failed to find lasting impacts, until students were in high school. That is, its outcomes were similar to those of the Tennessee, Head Start, Sure Start, and Australian studies, for the first 12 years of the study. So I suppose it is possible that someday, the participants in the Tennessee study will show a major benefit of having attended preschool. However, this seems highly doubtful.

It is important to note that some large studies of preschool attendance do find positive and lasting effects. However, these are invariably matched, non-experimental studies of children who happened to attend preschool, compared to others who did not. The problem with such studies is that it is essentially impossible to statistically control for all the factors that would lead parents to enroll their child in preschool, or not to do so. So lasting effects of preschool may just be lasting effects of having the good fortune to be born into the sort of family that would enroll its children in preschool.

What Should We Do if Preschool is Not Magic?

Let’s accept for the moment the hard (likely) reality that one year of preschool is not magic, and is unlikely to have lasting effects of the kind reported by the Perry Preschool study (and no other randomized studies.) Do we give up?

No.  I would argue that rather than considering preschool magic-or-nothing, we should think of it the same way we think about any other grade in school. That is, a successful school experience should not be one terrific year, but fourteen years (pre-k to 12) of great instruction using proven programs and practices.

First comes the preschool year itself, or the two year period including pre-k and kindergarten. There are many programs that have been shown in randomized studies to be successful over that time span, in comparison to control groups of children who are also in school (see Chambers, Cheung, & Slavin, 2016). Then comes reading instruction in grades K-1, where randomized studies have also validated many whole-class, small group, and one-to-one tutoring methods (Inns et al., 2018). And so on. There are programs proven to be effective in randomized experiments, at least for reading and math, for every grade level, pre-k to 12.

The time has long passed since all we had in our magic hat was preschool. We now have quite a lot. If we improve our schools one grade at a time and one subject at a time, we can see accumulating gains, ones that do not require waiting for miracles. And then we can work steadily toward improving what we can offer children every year, in every subject, in every type of school.

No one ever built a cathedral by waving a wand. Instead, magnificent cathedrals are built one stone at a time. In the same way, we can build a solid structure of learning using proven programs every year.

References

Baron, J. (2018). Large randomized controlled trial finds state pre-k program has adverse effects on academic achievement. Straight Talk on Evidence. Retrieved from http://www.straighttalkonevidence.org/2018/07/16/large-randomized-controlled-trial-finds-state-pre-k-program-has-adverse-effects-on-academic-achievement/

Chambers, B., Cheung, A., & Slavin, R. (2016). Literacy and language outcomes of balanced and developmental-constructivist approaches to early childhood education: A systematic review. Educational Research Review 18, 88-111.

Claessens, A., & Garrett, R. (2014). The role of early childhood settings for 4-5 year old children in early academic skills and later achievement in Australia. Early Childhood Research Quarterly, 29, (4), 550-561.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Lipsey, Farran, & Durkin (2018). Effects of the Tennessee Prekindergarten Program on children’s achievement and behavior through third grade. Early Childhood Research Quarterly. https://doi.org/10.1016/j.ecresq.2018.03.005

Melhuish, E., Belsky, J., & Leyland, R. (2010). The impact of Sure Start local programmes on five year olds and their families. London: Jessica Kingsley.

Puma, M., Bell, S., Cook, R., & Heid, C. (2010). Head Start impact study: Final report.  Washington, DC: U.S. Department of Health and Human Services.

Schweinhart, L. J., Barnes, H. V., & Weikart, D. P. (1993). Significant benefits: The High/Scope Perry Preschool study through age 27 (Monographs of the High/Scope Educational Research Foundation No. 10) Ypsilanti, MI: High/Scope Press.

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.