# How Much Have Students Lost in the COVID-19 Shutdowns?

Everyone knows that school closures due to the COVID-19 pandemic are having a serious negative impact on student achievement, and that this impact is sure to be larger for disadvantaged students than for others. However, how large will the impact turn out to be? This is not a grim parlor game for statisticians, but could have real meaning for policy and practice. If the losses turn out to be modest, comparable to the “summer slide” we are used to (but which may not exist), then one might argue that when schools open, they might continue where they left off, and students might eventually make up their losses, as they do with summer slide. If, on the other hand, losses are very large, then we need to take emergency action.

Some researchers have used data from summer losses and from other existing data on, for example, teacher strikes, to estimate COVID losses (e.g., Kuhfeld et al., 2020). But now we have concrete evidence, from a country similar to the U.S. in most ways.

A colleague came across a study that has, I believe, the first actual data on this question. It is a recent study from Belgium (Maldonado & DeWitte, 2020) that assessed COVID-19 losses among Dutch-speaking students in that country.

The researchers obtained end-of-year test scores from all sixth graders who attend publicly funded Catholic schools, which are attended by most students in Dutch-speaking Belgium. Sixth grade is the final year of primary school, and while schools were mostly closed from March to June due to COVID, the sixth graders were brought back to their schools in late May to prepare for and take their end-of-primary tests. Before returning, the sixth graders had missed about 30% of the days in their school year. They were offered online teaching at home, as in the U.S.

The researchers compared the June test scores to those of students in the same schools in previous years, before COVID. After adjusting for other factors, students scored lower than earlier cohorts by an effect size of -0.19 in mathematics and -0.29 in Dutch (reading, writing, language). Schools serving many disadvantaged students had significantly larger losses in both subjects; inequality within schools increased by 17% in mathematics and 20% in Dutch, and inequality between schools increased by 7% in math and 18% in Dutch.

There is every reason to expect that the situation in the U.S. will be much worse than that in Belgium. Most importantly, although Belgium had one of the worst COVID-19 death rates in the world, it has largely conquered the disease by now (fall), and its schools are all open. In contrast, most U.S. schools are closed or partially closed this fall. Students are usually offered remote instruction, but many disadvantaged students lack access to technology and supervision, and even students who do have equipment and supervision do not seem to be learning much, according to anecdotal reports.

In many U.S. schools that have opened fully or partially, outbreaks of the disease are disrupting schooling, and many parents are refusing to send their children to school. Although this varies greatly by region of the U.S., the average American student is likely to have missed several more effective months of in-person schooling by the time schools return to normal operation.

But even if average losses turn out to be no worse than those seen in Belgium, the consequences are terrifying, for Belgium as well as for the U.S. and other COVID-afflicted countries.

Effect sizes of -0.19 and -0.29 are very large. From the Belgian data on inequality, we might estimate that for disadvantaged students (those in the lowest 25% of socioeconomic status), losses could have been -0.29 in mathematics and -0.39 in Dutch. What do we have in our armamentarium that is strong enough to overcome losses this large?
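For readers who want to see exactly what these numbers mean: an effect size here is a standardized mean difference, the gap between two group means divided by their pooled standard deviation. The sketch below uses made-up score data, not the Belgian data (the function name and the numbers are mine, purely for illustration):

```python
import math

def effect_size(treatment, control):
    """Standardized mean difference (Cohen's d):
    (mean of treatment - mean of control) / pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    mean_t = sum(treatment) / n_t
    mean_c = sum(control) / n_c
    # Sample variances (n - 1 in the denominator)
    var_t = sum((x - mean_t) ** 2 for x in treatment) / (n_t - 1)
    var_c = sum((x - mean_c) ** 2 for x in control) / (n_c - 1)
    pooled_sd = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical test scores: a post-shutdown cohort vs. an earlier cohort
post_covid = [48, 52, 45, 50, 47, 49]
pre_covid = [52, 55, 50, 54, 51, 53]
print(round(effect_size(post_covid, pre_covid), 2))  # negative: the later cohort scored lower
```

A negative value means the later cohort scored below the earlier one; in the Belgian study, the adjusted versions of these differences came to -0.19 and -0.29 standard deviations.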

In a recent blog, I compared average effect sizes from studies of various solutions currently being proposed to remedy students’ losses from COVID shutdowns: Extended school days, after-school programs, summer school, and tutoring. Only tutoring, both one-to-one and one-to-small group, in reading and mathematics, had an effect size larger than +0.10. In fact, there are several one-to-one and one-to-small group tutoring models with effect sizes of +0.40 or more, and averages are around +0.30. Research in both reading and mathematics has shown that well-trained teaching assistants using structured tutoring materials or software can obtain outcomes as good as those obtained by certified teachers as tutors. On the basis of these data, I’ve been writing about a “Marshall Plan” to hire thousands of tutors in every state to provide tutoring to students scoring far below grade level in reading and math, beginning with elementary reading (where the evidence is strongest).

I’ve also written about national programs in the Netherlands and in England to provide tutoring to struggling students. Clearly, we need a program of this kind in the U.S. And if our scores are like the Belgian scores, we need it as quickly as possible. Students who have fallen far below grade level cannot be left to struggle without timely and effective assistance, powerful enough to bring them at least to where they would have been without the COVID school closures. Otherwise, these students are likely to lose motivation, and to suffer lasting damage. An entire generation of students, harmed through no fault of their own, cannot be allowed to sink into failure and despair.

References

Kuhfeld, M., Soland, J., Tarasawa, B., Johnson, A., Ruzek, E., & Liu, J. (2020). Projecting the potential impacts of COVID-19 school closures on academic achievement. (EdWorkingPaper: 20-226). Retrieved from Annenberg Institute at Brown University: https://doi.org/10.26300/cdrv-yw05

Maldonado, J. E., & DeWitte, K. (2020). The effect of school closures on standardized student test outcomes. Leuven, Belgium: University of Leuven.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

# Extraordinary Gains: Making Them Last

One of the great frustrations of evidence-based reform in education is that while we do have some interventions that have a strong impact on students’ learning, these outcomes usually fade over time. The classic example is intensive, high-quality preschool programs. There is no question about the short-term impacts of quality preschool, but after fifty years, the Perry Preschool study remains the only case in which a randomized experiment found long-term positive impacts of preschool. I think the belief in the Perry Preschool’s long-term impacts conditioned many of us to expect amazing long-term impacts of early interventions of all kinds, but the Perry Preschool evaluation was flawed in several ways, and later randomized studies such as the Tennessee Voluntary Prekindergarten Program do not find such lasting impacts. There have been similar difficulties documenting long-term impacts of the Reading Recovery tutoring program. I have been looking at research on summer school (Neitzel et al., 2020), and found a few summer programs for kindergarteners and first graders that had exceptional impacts on end-of-summer reading, but these had faded by the following spring.

But what if we knew, as the evidence clearly suggests, that one year of Perry Preschool or 60 lessons of Reading Recovery or seven weeks of intensive reading summer school was not sufficient to ensure long-lasting gains in achievement? What could we do to see that successful investments in intensive early interventions are built upon in subsequent years, so that formerly at-risk students not only maintain what they learned, but continue afterwards to make exceptional gains?

Clearly, we could build on early gains by continuing to provide intensive intervention every year, if that is what is needed, but that would be extremely expensive. Instead, imagine that each school had within it a small group of teachers and teacher assistants, whose job was to provide initial tutoring for students at risk, and then to monitor students’ progress and to strategically intervene to keep students on track. For the moment, I’ll call them an Excellence in Learning Team (XLT). This team would keep close track of the achievement of all at-risk and formerly at-risk students on frequent assessments, at least in reading and math. These staff members would track students’ trajectories toward grade level performance. If students fall off of that trajectory, members of the XLT would provide tutoring to the students, as long as necessary. My assumption is that a student who made brilliant progress with 60 tutoring sessions, for example, would not need another 60 sessions each year to stay on track toward grade level, but that perhaps 10 or 20 sessions would be sufficient.

The XLT would need effective, targeted tools to quickly and efficiently help students whose progress is faltering. For example, XLT tutors might have available computer-assisted tutoring modules to assist students who have mastered phonics, but are having difficulty with fluency, or multi-syllabic words, or comprehension of narrative or factual text. In mathematics, they might have specific computer-assisted tutoring modules on place value, fractions, or word problems. The idea is precision and personalization, so that the time of every XLT member is used to maximum effect. From the students’ perspective, assistance from the XLT is not a designation (like special or remedial education), but rather time-limited assistance to enable all students to achieve ambitious and challenging goals.

XLT would be most effective, I believe, if students have started with intensive tutoring, intensive summer school, or other focused interventions that can bring about rapid progress. This is essential early in students’ progression. Rapid progress at the outset not only sets students up for success, in an academic sense, but it also convinces the student and his or her teachers that he or she is capable of extraordinary progress. Such confidence is crucial.

As an analogy to what I am describing here, consider how you cook a stew. You first bring the stew to a boil, and then simmer for a long time. If you only brought the stew to a boil and then turned off the stove, the stew would never cook. If you only set the stove on simmer, but did not first bring the stew to a boil, it might take hours to cook, if it ever did. It is the sequence of intense energy followed by less intense but lengthy support that does the job. Or consider a rocket to the moon, which needs enormous energy to reach escape velocity, followed by continued but less intense energy to complete the trip.  In education, high-quality preschool or tutoring or intensive summer school can play the part of the boil, but this needs to be followed by long-term, lower-intensity, precisely targeted support.

I would love to see a program of research designed to figure out how to implement long-term support to enable at-risk students to experience rapid success and then build on that success for many years. This is how we will finally leverage our demonstrated ability to make big differences in intensive early intervention, by linking it to multi-year, life-changing services that ensure students’ success in the long term, where it really matters.

References

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (2020). A synthesis of quantitative research on programs for struggling readers in elementary schools. Manuscript submitted for publication. This review had to be taken down from www.bestevidence.org because it is under review at a journal; for a copy of the current draft, contact Amanda Neitzel (aneitzel@jhu.edu).

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

# Preschool: A Step, Not a Journey

“A journey of a thousand miles begins with a single step.”

So said Lao Tzu (or Laozi), the great Chinese scholar who lived in the 6th century BC.

For many years, especially since the extraordinary long-term outcomes of the Perry Preschool became known, many educators have seen high-quality preschool as an essential “first step” in a quality education. Truly, a first step in a journey of a thousand miles. Further, due to the Perry Preschool findings, educators, researchers, and policy makers have maintained that quality preschool is not only the first step in a quality education, but it is the most important, capable of making substantial differences in the lives of disadvantaged students.

I believe, based on the evidence, that high-quality preschool helps students enter kindergarten and, perhaps, first grade, with important advantages in academic and social skills. It is clear that quality preschool can provide a good start, and for this reason, I’d support investments in providing the best preschool experiences we can afford.

But the claims of most preschool advocates go far beyond benefits through kindergarten. We have been led to expect benefits that last throughout children’s lives.

Would that this were so, but it is not. The problem is that randomized studies rarely find long-term impacts. In such studies, children are randomly assigned to receive specific, high-quality preschool services or to serve in a control group, in which children may remain at home or may receive various daycare or preschool experiences of varying quality. In randomized long-term studies comparing students randomly assigned to preschool or business as usual, the usual pattern of findings shows positive effects on many measures at the end of the preschool year, fading effects at the end of kindergarten, and no differences in later years. One outstanding example is the Tennessee Voluntary Prekindergarten Program (Lipsey, Farran, & Durkin, 2018). A national study of Head Start by Puma, Bell, Cook, & Heid (2010) found the same pattern, as did randomized studies in England (Melhuish et al., 2010) and Australia (Claessens & Garrett, 2014). Reviews of research routinely identify this consistent pattern (Chambers, Cheung, & Slavin, 2017; Camilli et al., 2009; Melhuish et al., 2010).

So why do so many researchers and educators believe that there are long-term positive effects of preschool? There are two answers. One is the Perry Preschool, and the other is the use of matched rather than randomized study designs.

The Perry Preschool study (Schweinhart & Weikart, 1997) did use a randomized design, but it had many features that made it an interesting pilot rather than a conclusive demonstration of powerful and scalable impacts. First, the Perry Preschool study had a very small sample (initially, 123 students in a single school in Ypsilanti, Michigan). It allowed deviations from random assignment, such as assigning children whose mothers worked to the control group. It provided an extraordinary level of services, never intended to be broadly replicable. Further, the long-term effects were never seen on elementary achievement, but only appeared when students were in secondary school. It seems unlikely that powerful impacts could be seen after there were no detectable impacts in all of elementary school. No one can fully explain what happened, but it is important to note that no one has replicated anything like what the Perry Preschool did, in all the years since the program was implemented in 1962-1967.

With respect to matched study designs, which do sometimes find positive longitudinal effects, a likely explanation is that with preschool children, matching fails to adequately control for initial differences. Families that enroll their four-year-olds in preschool tend, on average, to be more positively oriented toward learning and more eager to promote their children’s academic success. Well-implemented matched designs in the elementary and secondary grades invariably control for prior achievement, and this usually does a good job of equalizing matched samples. With four-year-olds, however, early achievement or IQ tests are not very reliable or well-correlated with outcomes, so it is impossible to know how much matching has equalized the groups on key variables.

Preparing for a Journey

Lao Tzu’s observation reminds us that any great accomplishment is composed of many small, simple activities. Representing a student’s educational career as a journey, this fits. One grand intervention at one point in that journey may be necessary, but it is not sufficient to ensure the success of the journey. In the journey of education, it is surely important to begin with a positive experience, one that provides children with a positive orientation toward school, skills needed to get along with teachers and classmates, knowledge about how the world works, a love for books, stories, and drama, early mathematical ideas, and much more. This is the importance of preschool. Yet it is not enough. Major make-or-break objectives lie in the future. In the years after preschool, students must learn to read proficiently, they must learn basic concepts of mathematics, and they must continue to build social-emotional skills for the formal classroom setting. In the upper elementary grades, they must learn to use their reading and math skills to learn to write effectively, and to learn science and social studies. Then they must make a successful transition to master the challenges of secondary school, leading to successful graduation and entry into valued careers or post-secondary education. Each of these accomplishments, along with many others, requires the best teaching possible, and each is as important and as difficult to achieve for every child as is success in preschool.

A journey of a thousand miles may begin with a single step, but what matters is how the traveler negotiates all the challenges between the first step and the last one. This is true of education. We need to find effective and replicable methods to maximize the possibility that every student will succeed at every stage of the learning process. This can be done, and every year our profession finds more and better ways to improve outcomes at every grade level, in every subject. Preschool is only the first of a series of opportunities to enable all children to reach challenging goals. An important step, to be sure, but not the whole journey.

###### Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action.

References

Camilli, G., Vargas, S., Ryan, S., & Barnett, S. (2009). Meta-analysis of the effects of early education interventions on cognitive and social development. Teachers College Record, 112 (3), 579-620.

Chambers, B., Cheung, A., & Slavin, R. E. (2016). Literacy and language outcomes of comprehensive and developmental-constructivist approaches to early childhood education: A systematic review. Educational Research Review, 18, 88-111.

Claessens, A., & Garrett, R. (2014). The role of early childhood settings for 4-5 year old children in early academic skills and later achievement in Australia. Early Childhood Research Quarterly, 29, (4), 550-561.

Lipsey, M., Farran, D., & Durkin, K. (2018). Effects of the Tennessee Prekindergarten Program on children’s achievement and behavior through third grade. Early Childhood Research Quarterly, 45 (4), 155-176.

Melhuish, E., Belsky, J., & Leyland, R. (2010). The impact of Sure Start local programmes on five year olds and their families. London: Department for Education.

Puma, M., Bell, S., Cook, R., & Heid, C. (2010). Head Start impact study: Final report. Washington, DC: U.S. Department of Health and Human Services.

Schweinhart, L. J., & Weikart, D. P. (1997). Lasting differences: The High/Scope Preschool curriculum comparison study through age 23 (Monographs of the High/Scope Educational Research Foundation No. 12). Ypsilanti, MI: High/Scope Press.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

# Preschool is Not Magic. Here’s What Is.

If there is one thing that everyone knows about policy-relevant research in education, it is this: Participation in high-quality preschool programs (at age 4) has substantial and lasting effects on students’ academic and life success, especially for students from disadvantaged homes. The main basis for this belief is the findings of the famous Perry Preschool program, which randomly assigned 123 disadvantaged youngsters in Ypsilanti, Michigan, to receive intensive preschool services or not to receive these services. The Perry Preschool study found positive effects at the end of preschool, and long-term positive impacts on outcomes such as high school graduation, dependence on welfare, arrest rates, and employment (Schweinhart, Barnes, & Weikart, 1993).

But prepare to be disappointed.

Recently, a new study has reported a very depressing set of outcomes. Lipsey, Farran, & Durkin (2018) published a large, randomized study evaluating Tennessee’s statewide preschool program, in which 2,990 four-year-olds were randomly assigned to participate in preschool, or not. As in virtually all preschool studies, children who were randomly assigned to preschool scored much better than those who were assigned to the control group. But these results diminished in kindergarten, and by first grade, no positive effects could be detected. By third grade, the control group actually scored significantly higher than the former preschool students in math and science, and non-significantly higher in reading!

Jon Baron of the Laura and John Arnold Foundation wrote an insightful commentary on this study, noting that when such a large, well-done, long-term, randomized study is reported, we have to take the results seriously, even if they disagree with our most cherished beliefs. At the end of Baron’s brief summary was a commentary by Dale Farran and Mark Lipsey, two of the study’s authors, telling the story of the hostile reception to their paper in the early childhood research community and the difficulties they had getting this exemplary experiment published.

Clearly, the Tennessee study was a major disappointment. How could preschool have no lasting effects for disadvantaged children?

Having participated in several research reviews on this topic (e.g., Chambers, Cheung, & Slavin, 2016), as well as some studies of my own, I have several observations to make.

Although this may have been the first large, randomized evaluation of a state-funded preschool program in the U.S., there have been many related studies that have had the same results. These include a large, randomized study of 5000 children assigned to Head Start or not (Puma et al., 2010), which also found positive outcomes at the end of the pre-K year, but only scattered lasting effects after pre-K. Very similar outcomes (positive pre-k outcomes with little or no lasting impact) have been found in a randomized evaluation of a national program called Sure Start in England (Melhuish, Belsky, & Leyland, 2010), and one in Australia (Claessens & Garrett, 2014).

Ironically, the Perry Preschool study itself failed to find lasting impacts, until students were in high school. That is, its outcomes were similar to those of the Tennessee, Head Start, Sure Start, and Australian studies, for the first 12 years of the study. So I suppose it is possible that someday, the participants in the Tennessee study will show a major benefit of having attended preschool. However, this seems highly doubtful.

It is important to note that some large studies of preschool attendance do find positive and lasting effects. However, these are invariably matched, non-experimental studies of children who happened to attend preschool, compared to others who did not. The problem with such studies is that it is essentially impossible to statistically control for all the factors that would lead parents to enroll their child in preschool, or not to do so. So lasting effects of preschool may just be lasting effects of having the good fortune to be born into the sort of family that would enroll its children in preschool.

What Should We Do if Preschool is Not Magic?

Let’s accept for the moment the hard (likely) reality that one year of preschool is not magic, and is unlikely to have lasting effects of the kind reported by the Perry Preschool study (and no other randomized studies). Do we give up?

No.  I would argue that rather than considering preschool magic-or-nothing, we should think of it the same way we think about any other grade in school. That is, a successful school experience should not be one terrific year, but fourteen years (pre-k to 12) of great instruction using proven programs and practices.

First comes the preschool year itself, or the two year period including pre-k and kindergarten. There are many programs that have been shown in randomized studies to be successful over that time span, in comparison to control groups of children who are also in school (see Chambers, Cheung, & Slavin, 2016). Then comes reading instruction in grades K-1, where randomized studies have also validated many whole-class, small group, and one-to-one tutoring methods (Inns et al., 2018). And so on. There are programs proven to be effective in randomized experiments, at least for reading and math, for every grade level, pre-k to 12.

The time has long passed since all we had in our magic hat was preschool. We now have quite a lot. If we improve our schools one grade at a time and one subject at a time, we can see accumulating gains, ones that do not require waiting for miracles. And then we can work steadily toward improving what we can offer children every year, in every subject, in every type of school.

No one ever built a cathedral by waving a wand. Instead, magnificent cathedrals are built one stone at a time. In the same way, we can build a solid structure of learning using proven programs every year.

References

Chambers, B., Cheung, A., & Slavin, R. E. (2016). Literacy and language outcomes of comprehensive and developmental-constructivist approaches to early childhood education: A systematic review. Educational Research Review, 18, 88-111.

Claessens, A., & Garrett, R. (2014). The role of early childhood settings for 4-5 year old children in early academic skills and later achievement in Australia. Early Childhood Research Quarterly, 29, (4), 550-561.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Lipsey, M., Farran, D., & Durkin, K. (2018). Effects of the Tennessee Prekindergarten Program on children’s achievement and behavior through third grade. Early Childhood Research Quarterly, 45 (4), 155-176. https://doi.org/10.1016/j.ecresq.2018.03.005

Melhuish, E., Belsky, J., & Leyland, R. (2010). The impact of Sure Start local programmes on five year olds and their families. London: Jessica Kingsley.

Puma, M., Bell, S., Cook, R., & Heid, C. (2010). Head Start impact study: Final report.  Washington, DC: U.S. Department of Health and Human Services.

Schweinhart, L. J., Barnes, H. V., & Weikart, D. P. (1993). Significant benefits: The High/Scope Perry Preschool study through age 27 (Monographs of the High/Scope Educational Research Foundation No. 10) Ypsilanti, MI: High/Scope Press.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

# Little Sleepers: Long-Term Effects of Preschool

In education research, a “sleeper effect” is not a way to get all of your preschoolers to take naps. Instead, it is an outcome of a program that appears not immediately after the end of the program, but some time afterwards, usually a year or more. For example, the mother of all sleeper effects was the Perry Preschool study, which found positive outcomes at the end of preschool but no differences throughout elementary school. Then positive follow-up outcomes began to show up on a variety of important measures in high school and beyond.

Sleeper effects are very rare in education research. To see why, consider a study of a math program for third graders that found no differences between program and control students at the end of third grade, but then a large and significant difference popped up in fourth grade or later. Long-term effects of effective programs are often seen, but how can there be long-term effects if there are no short-term effects on the way? Sleeper effects are so rare that many early childhood researchers have serious doubts about the validity of the long-term Perry Preschool findings.

I was thinking about sleeper effects because we recently added preschool studies to our Evidence for ESSA website. In reviewing the key studies, I was once again reading an extraordinary 2009 study by Mark Lipsey and Dale Farran.

The study randomly assigned Head Start classes in rural Tennessee to one of three conditions. Some were assigned to use a program called Bright Beginnings, which had a strong pre-literacy focus. Some were assigned to use Creative Curriculum, a popular constructivist/developmental curriculum with little emphasis on literacy. The remainder were assigned to a control group, in which teachers used whatever methods they ordinarily used.

Note that this design is different from the usual preschool studies frequently reported in the newspaper, which compare preschool to no preschool. In this study, all students were in preschool. What differed is only how they were taught.

The results immediately after the preschool program were not astonishing. Bright Beginnings students scored best on literacy and language measures (average effect size = +0.21 for literacy, +0.11 for language), though the differences were not significant at the school level. There were no differences at all between Creative Curriculum and control schools.

Where the outcomes became interesting was in the later years. Ordinarily in education research, outcomes measured after the treatments have finished diminish over time. In the Bright Beginnings/Creative Curriculum study, the outcomes were measured again when students were in third grade, four years after they left preschool. Most students could be located because the test was the Tennessee standardized test, so scores could be found as long as students were still in Tennessee schools.

On third grade reading, former Bright Beginnings students now scored significantly better than former controls, and the difference was statistically significant and substantial (effect size = +0.27).
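One way to make an effect size like +0.27 concrete is to translate it into percentile terms, assuming normally distributed scores. The short sketch below (the function names are mine, purely for illustration) shows where a formerly average student would land if the whole distribution shifted up by 0.27 standard deviations:

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal,
    computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def shifted_percentile(effect_size):
    """Percentile a 50th-percentile student would reach after a
    shift of `effect_size` standard deviations."""
    return 100.0 * normal_cdf(effect_size)

print(round(shifted_percentile(0.27)))  # about the 61st percentile
```

In other words, under this assumption, a gain of +0.27 moves a student from the 50th to roughly the 61st percentile, a substantial difference for a follow-up measured four years after the program ended.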

In a review of early childhood programs at www.bestevidence.org, our team found that across 16 programs emphasizing literacy as well as language, effect sizes did not diminish in literacy at the end of kindergarten, and they actually doubled on language measures (from +0.08 in preschool to +0.15 in kindergarten).

If sleeper effects (or at least maintenance on follow-up) are so rare in education research, why did they appear in these studies of preschool? There are several possibilities.

The most likely explanation is that it is difficult to measure outcomes among four-year-olds. They can be squirrelly and inconsistent. If a pre-kindergarten program had a true and substantial impact on children’s literacy or language, measures at the end of preschool may not detect it as well as measures a year later, because kindergartners and kindergarten skills are easier to measure.

Whatever the reason, the evidence suggests that effects of particular preschool approaches may show up later than the end of preschool. This observation, and specifically the Bright Beginnings evaluation, may indicate that in the long run it matters a great deal how students are taught in preschool. Until we find replicable models of preschool, or pre-k to 3 interventions, that have long-term effects on reading and other outcomes, we cannot sleep. Our little sleepers are counting on us to ensure them a positive future.

This blog is sponsored by the Laura and John Arnold Foundation

# You Can Step Twice in the Same River: Systems in Education

You can never step twice in the same river.  At least that is what Greek philosopher Heraclitus said a long time ago, when Socrates was just a pup.  What he meant, of course, was that a river is constantly changing, for reasons large and small, so the river you waded across yesterday, or even a minute ago, is not the same one you wade in now.

This proposition is both obvious and wrong.  Sure, rivers are never 100% the same.  But does it matter?  Imagine, for example, that you somehow drained all the water out of a river.  Within a few days or weeks, it would entirely revive itself.  The reason is that a river is not a “thing.”  It is a system.  In other words, a river exists because there is a certain level of rainfall or groundwater or water from upstream, and then a certain topography (rivers are in low-lying areas, compared to surrounding land).  Those factors create the river, and as long as they exist, the river exists.  So when you wade into a river, you are wading into a system, and (sorry, Heraclitus) it is always the same system, because even if the river is higher or lower or muddier or clearer than usual, the system is always the same, unless something pretty dramatic happens upstream.

So why am I rattling on about rivers?  The point I hope to make is that genuine and lasting change in a school depends on changing the system in which the school operates, not just small parts of the school that will be swept away if the system stays unchanged.

Here’s what I mean from an education reform perspective.  Teachers’ daily practices in classrooms are substantially determined by powerful systems.  Whatever innovations you introduce in a school, no matter how effective in the short term, will be eliminated and forgotten if the rest of the system does not change.  For example, if a school implements a great new math program but does not solve classroom management or attendance problems, the school may not maintain its math reform.  Lasting change in math, for example, might require attending to diversity in achievement levels by providing effective tutoring or small-group assistance.  It might require providing eyeglasses to children who need them.  It might require improving reading performance as well as math.  It might require involving parents.  It might require constant monitoring of students’ math performance and targeted responses to solve problems.  It might require recruiting volunteers, or making good use of after school or summer time.  It might require mobilizing department heads or other math leaders within the school to support implementation, and to help maintain the effective program when (predictable) turmoil threatens it.  Policy changes at the district, state, and national levels may also help, but I’m just focusing for the moment on aspects of the system that an individual school or district can implement on its own.  Attending to all of these factors at once may increase the chances that in five or ten years, the effective program remains in place and stays effective, even if the original principal, department head, teachers, and special funds are no longer at the school.

It’s not that every school has to do all of these things to improve math performance over time, but I would argue that lasting impact will depend on some constellation of supports that change the system in which the math reform operates.  Otherwise, the longstanding system of the school will return, washing away the reform and taking the school back to its pre-reform behaviors and policies.

A problem in all of this is that educational development and research often work against systemic change.  In particular, academic researchers are rewarded for publishing articles, and it helps if they evaluate approaches that purely represent a given theory.  Pragmatically, an approach with many components may be more expensive and more difficult to put in place.  As a result, a lot of proven programs available to educators are narrow, focused on the main objective but not on the broader system of the school.  This may be fine in the short run, but in the long run the narrowly focused treatment may not maintain over time.

Seen as a system, a river will never change its course until the key elements that determine its course themselves change.  Unless that happens, we’ll always be stepping into the same river, over and over again, and getting the same results.

# Keep Up the Good Work (To Keep Up the Good Outcomes)

I just read an outstanding study that contains a hard but crucially important lesson. The study, by Woodbridge et al. (2014), evaluated a behavior management program for students with behavior problems. The program, First Step to Success, has been successfully evaluated many times. In the Woodbridge et al. study, 200 children in grades 1 to 3 with serious behavior problems were randomly assigned to experimental or control groups. On behavior and achievement measures, students in the experimental group scored much higher, with effect sizes of +0.44 to +0.87. Very impressive.
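For readers unfamiliar with the statistic, an effect size like the +0.44 to +0.87 reported above is simply a standardized mean difference: the gap between treatment and control group averages, divided by their pooled standard deviation. A minimal sketch, using made-up scores (not data from the Woodbridge study):

```python
import statistics

def effect_size(treatment, control):
    """Standardized mean difference: (mean_T - mean_C) / pooled SD."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)   # sample variance of treatment group
    var_c = statistics.variance(control)     # sample variance of control group
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Illustrative (invented) scores on a behavior rating scale
treatment_scores = [72, 68, 75, 80, 70, 77]
control_scores = [65, 60, 70, 66, 62, 68]
print(round(effect_size(treatment_scores, control_scores), 2))
```

An effect size of +0.44 thus means the average treated student scored almost half a standard deviation above the average control student, which in education research is a large impact.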

The researchers came back a year later to see if the outcomes were still there. Despite the substantial impacts seen at posttest, positive outcomes remained on none of the three prosocial/adaptive behavior measures, only one of the three problem/maladaptive behavior measures, and none of the four academic achievement measures.

These findings were distressing to the researchers, but they contain a message. In this study, students passed from teachers who had been trained in the First Step method to teachers who had not. The treatment is well-established and inexpensive. Why should it ever be seen as a one-year intervention with a follow-up? Instead, imagine that all teachers in the school learned the program and all continued to implement it for many years. In this circumstance, it would be highly likely that the first-year positive impacts would be sustained and most likely improved over time.

Follow-up assessments are always interesting, and for interventions that are very expensive it may be crucial to demonstrate lasting impacts. But in education, effective treatments can often be maintained for many years, creating more effective school-wide environments and lasting impacts over time. Much as we might like to have one-shot treatments with long-lasting impacts, this does not correspond to the nature of children. The personal, family, or community problems that led children to have problems at a given point in time are likely to lead to problems in the future, too. But the solution is clear. Keep up the good work to keep up the good outcomes!

# Preschools and Evidence: A Child Will Lead Us

These are exciting times for people who care about preschool, for people who care about evidence, and especially for people who care about both. President Obama advocated for expanding high-quality preschool opportunities, Bill de Blasio, the new Mayor of New York City, is proposing new taxes on the wealthy for this purpose, and many states are moving toward universal preschool, or at least considering it. The recently passed Omnibus Budget had $250 million in it for states to add to or improve their preschool programs.

What is refreshing is that after thirty years of agreement among researchers that it’s only high-quality preschools that have long-term positive effects, the phrase “high quality” has become part of the political dialogue. At a minimum, “high quality” means “not just underpaid, poorly educated preschool teachers.” But beyond this, “high quality” is easy to agree on, difficult to define.

This is where evidence comes in. We have good evidence about long-term effects of very high-quality preschool programs compared to no preschool, but identifying exceptionally effective, replicable programs (in comparison to run-of-the-mill preschools) has been harder.

The importance of identifying preschool programs that actually work is being recognized not only in academia, but in the general press as well. In the January 29 New York Times, Daniel Willingham and David Grissmer advocated local and national randomized experiments to find out what works in preschool. On January 30, Nicholas Kristof wrote about rigorous research supporting long-term effects of preschool. Two articles on randomized experiments in education would be a good week for Education Week, much less the New York Times.

With President Obama, John Boehner, and the great majority of Americans favoring expansion of high-quality preschools, this might be an extraordinarily good time for the U.S. Department of Education to sponsor development and evaluation of promising preschool models. At the current rate it will take a long time to get to universal pre-K, so in the meantime let’s learn what works.

The U.S. Department of Education did such a study several years ago called Preschool Curriculum Evaluation Research (PCER), in which various models were compared to ordinary preschool approaches. PCER found that only a few models did better than their control groups, but there was a clear pattern to the ones that did. These were models that provided teachers with extensive professional development and materials with a definite structure designed to build vocabulary, phonemic awareness, early math concepts, and school skills. They were not just kindergarten introduced early; they focused on play, themes, rhymes, songs, stories, and counting games with specific purposes well understood by teachers.

In a new R & D effort, innovators might be asked to create new, practical models, perhaps based on the PCER findings, and evaluate them in rigorous studies. Within a few years, we’d have many proven approaches to preschool, ones that would justify the optimism being expressed by politicians of all stripes.

Historically, preschool is one of the few areas of educational practice or policy in which politicians and the public consider evidence to have much relevance. Perhaps if we get this one right, they will begin to wonder, if evidence is good for four year olds, why shouldn’t we consult it for the rest of education policy? If evidence is to become important for all of education, perhaps it has to begin with a small child leading us.

# Education Innovation: What It Is and Why We Need More of It

NOTE: This is a guest post from Jim Shelton, Assistant Deputy Secretary of the Office of Innovation and Improvement at the U.S. Department of Education.

Whether for reasons of economic growth, competitiveness, social justice or return on tax-payer investment, there is little rational argument over the need for significant improvement in U.S. educational outcomes. Further, it is irrefutable that the country has made limited improvement on most educational outcomes over the last several decades, especially when considered in the context of the increased investment over the same period. In fact, the total cost of producing each successful high school and college graduate has increased substantially over time instead of decreasing – creating what some argue is an inverted learning curve.

This analysis stands in stark contrast to the many anecdotes of teachers, schools and occasionally whole systems “beating the odds” by producing educational outcomes well beyond “reasonable” expectations. And, therein lies the challenge and the rationale for a very specific definition of educational innovation.

Education not only needs new ideas and inventions that shatter the performance expectations of today’s status quo; to make a meaningful impact, these new solutions must also “scale,” that is, grow large enough to serve millions of students and teachers or large portions of specific under-served populations. True educational innovations are those products, processes, strategies and approaches that improve significantly upon the status quo and reach scale.

Systems and programs at the local, state and national level, in their quest to improve, should be in the business of identifying and scaling what works. Yet, we traditionally have lacked the discipline, infrastructure, and incentives to systematically identify breakthroughs, vet them and support their broad adoption – a process referred to as a field scan. Programs like the Department of Education’s Investing in Innovation Fund (i3) are designed as field scans, but i3 is tiny in comparison to both the need and the opportunity. To achieve our objectives, larger funding streams will need to drive the identification, evaluation, and adoption of effective educational innovations.

Field scans are only one of three connected pathways to education innovation, and they build on the most recognized pathway – basic and applied research. The time to produce usable tools and resources from this pathway can be long – just as in medicine where development and approval of new drugs and devices can take 12-15 years – but, with more and better leveraged resources, more focus, and more discipline, this pathway can accelerate our understanding of teaching and learning and production of performance enhancing practices and tools.

The third pathway focuses specifically on accelerating transformational breakthroughs, which require a different approach – directed development. Directed development processes identify cutting edge research and technology (technology in the generic sense, not specifically software or hardware) and use a uniquely focused approach to accelerate the pace at which specific game-changing innovations reach learners and teachers. Directed development within the federal government is most associated with DARPA (the Defense Advanced Research Projects Agency), which used this unique and aggressive model of R&D to produce technologies that underlie the Internet, GPS, and unmanned aircraft (drones). Education presents numerous opportunities for such work. For example: (1) providing teachers with tools that identify each student’s needs and interests and match them to the optimal instructional resources or (2) cost-effectively achieving the 2 standard deviations of improvement that one-to-one human tutors generate. In 2010, the President’s Council of Advisors on Science and Technology recommended the creation of an ARPA for Education to pursue directed development in these and other areas of critical need and opportunity.

Each of these pathways – the field scan, basic and applied research, and directed development – will be essential to improving and ultimately transforming learning from cradle through career. If done well, we will redefine “the possible” and reclaim American educational leadership while addressing inequity at home and abroad. At that point, we may be able to rely on a simpler definition of innovation:

“An innovation is one of those things that society looks at and says, if we make this part of the way we live and work, it will change the way we live and work.”

-Dean Kamen

-Jim Shelton

Note: The Office of Innovation and Improvement at the U.S. Department of Education administers more than 25 discretionary grant programs, including the Investing in Innovation Program, Charter Schools Program, and Technology in Education.

# A Commitment to Research Yields Improvements in Charter Network

Note: This is a guest post by Richard Barth, CEO and President of the KIPP Foundation.

In his inaugural post for this blog, Robert Slavin wrote, “We did not manage our way to the moon, we invented our way to the moon.” I hear echoes of this statement throughout my work. Like other national charter school leaders, I am committed to making sure innovation can blossom and spread throughout our own network and public schools nationwide.

But along with innovation we must insist on research and results. Across the 31 KIPP regions nationally, for example, we give schools autonomy to innovate as they see fit, as long as they can demonstrate that they are producing results for our students.

So how does a charter network like ours make sure schools are producing results? Not only do we assess our own schools on a regular basis, with publications like our yearly Report Card, but we also make a practice of inviting independent researchers to evaluate our results.

By building a solid body of evidence for what works – including independent reports about student achievement in our schools – we are able to set and maintain a high bar for achievement in our schools. The evidence then helps us build on what is working and make adjustments where the research has identified areas where we need to improve. For example, a study by Mathematica found that KIPP middle school students make statistically significant gains in math and reading, even though students enter KIPP with lower average test scores than their peers in neighboring district schools. The same Mathematica report also found that KIPP schools are serving fewer special-education and Limited English Proficient (LEP) students than the average for neighboring district schools. This is a challenge for many charter schools and something we are making a priority throughout our network. So where we find we are doing well in both the number of students served and their results – like the KIPP Academy Lynn near Boston, Mass., which is highlighted in a 2010 working paper from the National Bureau of Economic Research – we have an opportunity to zero in on what’s working and spread this news to our network and to charter schools nationwide.

As more of our students move on to college, research can also help us keep tabs on how they are faring. We are just starting to examine the college completion rates of our students. In April we released our first-ever College Completion Report, which looked at the college graduation rates of KIPP’s earliest graduates from the mid-1990s. Thirty-three percent of these KIPP students had finished college by their mid-twenties, which is above the national average and four times the rate of their peers from low-income communities. This is far short of our goal of 75 percent, which is the average college completion rate for kids from affluent families.

By sharing these results we hope to encourage a national dialogue about how to improve college completion rates in America, especially among low-income students. But we need school districts and charter schools to start publicly reporting college completion rates fully – including those of eighth-grade graduates, not just high school graduates or college freshmen. Counting only the latter fails to give us a true picture.

This process of improvement is hard work; there’s no question. But by committing to research and accountability, we can set off a more vigorous and transparent conversation among public educators across the country about what we need to do to ensure success for all of our schools and students.

-Richard Barth

KIPP, the Knowledge Is Power Program, is a national network of free, open-enrollment, college-preparatory public charter schools. There are currently 109 KIPP schools in 20 states and the District of Columbia serving more than 32,000 students.