How do Textbooks Fit Into Evidence-Based Reform?

In a blog I wrote recently, “Evidence, Standards, and Chicken Feathers,” I discussed my perception that states, districts, and schools, in choosing textbooks and other educational materials, put a lot of emphasis on alignment with standards, and very little on evidence of effectiveness.  My colleague Steve Ross objected, at least in the case of textbooks.  He noted that it was very difficult for a textbook to prove its effectiveness, because textbooks so closely resemble other textbooks that showing a difference between them is somewhere between difficult and impossible.  Since the great majority of classrooms use textbooks (paper or digital) or sets of reading materials that collectively resemble textbooks, the control group in any educational experiment is almost certainly also using a textbook (or equivalents).  So as evidence becomes more and more important, is it fair to hold textbooks to such a difficult standard of evidence? Steve and I had an interesting conversation about this point, so I thought I would share it with other readers of my blog.


First, let me define a couple of key words.  Most of what schools purchase could be called commodities.  These include desks, lighting, carpets, non-electronic whiteboards, playground equipment, and so on. Schools need these resources to provide students with safe, pleasant, attractive places in which to learn. I’m happy to pay taxes to ensure that every child has all of the facilities and materials they need. However, no one should expect such expenditures to make a measurable difference in achievement beyond ordinary levels.

In contrast, other expenditures are interventions.  These include teacher preparation, professional development, innovative technology, tutoring, and other services clearly intended to improve achievement beyond ordinary levels.   Educators would generally agree that such investments should be asked to justify themselves by showing their effectiveness in raising achievement scores, since that is their goal.

By analogy, hospitals invest a great deal in their physical plants, furniture, lighting, carpets, and so on. These are all necessary commodities.   No one should have to go to a hospital that is not attractive, bright, airy, comfortable, and convenient, with plenty of parking.  These things may contribute to patients’ wellness in subtle ways, but no one would expect them to make major differences in patient health.  What does make a measurable difference is the preparation and training of staff, along with medicines, equipment, and procedures, all of which can be (and are) constantly improved through ongoing research, development, and dissemination.

So is a textbook a commodity or an intervention?  If we accept that every classroom must have a textbook or its equivalent (such as a digital text), then a textbook is a commodity, just an ordinary, basic requirement for every classroom.  We would expect textbooks-as-commodities to be well written, up-to-date, attractive, and pedagogically sensible, and, if possible, aligned with state and national standards.  But it might be unfair and perhaps futile to expect textbooks-as-commodities to significantly increase student achievement in comparison to business as usual, because they are, in effect, business as usual.

If, somehow, a print or digital textbook, with associated professional development, digital add-ons, and so forth, turns out to be significantly more effective than alternative, state-of-the-art textbooks, then a textbook could also be considered an intervention, and marketed as such.  It would then be considered in comparison to other interventions that exist only, or primarily, to increase achievement beyond ordinary levels.

The distinction between commodities and interventions would be academic but for the appearance of the ESSA evidence standards.  ESSA requires that schools seeking school improvement funding select and implement programs that meet one of the top three standards (strong, moderate, or promising). It gives preference points on other federal grants, especially Title II (professional development), to applicants who promise to implement proven programs. Some states have applied more stringent criteria, and some have extended use of the standards to additional funding initiatives, including state initiatives.  These are all very positive developments. However, they are making textbook publishers anxious. How are they going to meet the new standards, given that their products are not so different from others now in use?

My answer is that I do not think it was the intent of the ESSA standards to forbid schools from using textbooks that lack evidence of effectiveness. To do so would be unrealistic, as it would wipe out at least 90% of textbooks.  Instead, the purpose of the ESSA evidence standards was to encourage and incentivize the use of interventions proven to be effective.  The concept, I think, was to assume that other funding (especially state and local funds) would support the purchase of commodities, including ordinary textbooks.  In contrast, the federal role was intended to focus on interventions to boost achievement in high-poverty and low-achieving schools.  Ordinary textbooks that are no more effective than any others are clearly not appropriate for those purposes, where there is an urgent need for approaches proven to have significantly greater impacts than methods in use today.

It would be a great step forward if federal, state, and local funding intended to support major improvements in student outcomes were held to tough standards of evidence.  Programs that meet those standards should be eligible for generous and strategic funding from federal, state, and local sources dedicated to the enhancement of student outcomes.  But no one should limit schools in spending their funds on attractive desks, safe and fun playground equipment, and well-written textbooks, even though these necessary commodities are unlikely to accelerate student achievement beyond current expectations.

A Warm Welcome From Babe Ruth’s Home Town to the Registry of Efficacy and Effectiveness Studies (REES)

Every baseball season, many home runs are hit by various players across the major leagues. But in all of history, there is one home run that stands out for baseball fans. In the 1932 World Series, Babe Ruth (born in Baltimore!) pointed to the center field fence. He then hit the next pitch over that fence, exactly where he said he would.

Just 86 years later, the U.S. Department of Education, in collaboration with the Society for Research on Educational Effectiveness (SREE), launched a new (figurative) center field fence for educational evaluation. It’s called the Registry of Efficacy and Effectiveness Studies (REES). The purpose of REES is to ask evaluators of educational programs to register their research designs, measures, analyses, and other features in advance. This is roughly the equivalent of asking researchers to point to the center field fence, announcing their intention to hit the ball right there. The reason this matters is that all too often, evaluators carry out evaluations that do not produce desired, positive outcomes on some measures or some analyses. They then report outcomes only on the measures that did show positive outcomes, or they might use different analyses from those initially planned, or only report outcomes for a subset of their full sample. On this last point, I remember a colleague long ago who obtained and re-analyzed data from a large and important national study that studied several cities but only reported data for Detroit. In her analyses of data from the other cities, she found that the results the authors claimed were seen only in Detroit, not in any other city.

REES pre-registration will, over time, make it possible for researchers, reviewers, and funders to find out whether evaluators are reporting all of the findings and all of the analyses as they originally planned them.  I would assume that within a few years, review facilities such as the What Works Clearinghouse will start requiring pre-registration before accepting studies for its top evidence categories. We will certainly do so for Evidence for ESSA. As pre-registration becomes common (as it surely will, if IES is suggesting or requiring it), review facilities such as WWC and Evidence for ESSA will have to learn how to use the pre-registration information. Obviously, minor changes in research designs or measures may be allowed, especially small changes made before posttests are known. For example, if some schools named in pre-registration are not in the posttest sample, the evaluators might explain that the schools closed (not a problem if this did not upset pretest equivalence), but if they withdrew for other reasons, reviewers would want to know why, and would insist that withdrawn schools be included in any intent-to-treat (ITT) analysis. Other fields, including much of medical research, have been using pre-registration for many years, and I’m sure REES and review facilities in education could learn from their experiences and policies.
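To make the reviewer’s task a little more concrete, here is a minimal Python sketch of comparing a pre-registered plan with a final report. The record format, the field names, and the function audit_report are hypothetical illustrations only; they are not the REES data format or any review facility’s actual procedure.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical summary of a study plan or report (not the REES schema)."""
    outcome_measures: set[str]
    schools: set[str]
    analysis: str


def audit_report(plan: StudyRecord, report: StudyRecord,
                 closed_schools: frozenset[str] = frozenset()) -> list[str]:
    """List discrepancies a reviewer might ask the evaluators to explain."""
    issues = []
    # Pre-registered measures that were never reported suggest selective reporting.
    for measure in sorted(plan.outcome_measures - report.outcome_measures):
        issues.append(f"measure not reported: {measure}")
    # Measures added after registration were not part of the original plan.
    for measure in sorted(report.outcome_measures - plan.outcome_measures):
        issues.append(f"measure not pre-registered: {measure}")
    # Schools missing from the posttest sample: closures are noted, while other
    # withdrawals must still be included in an intent-to-treat (ITT) analysis.
    for school in sorted(plan.schools - report.schools):
        if school in closed_schools:
            issues.append(f"school closed (check pretest equivalence): {school}")
        else:
            issues.append(f"school withdrawn, include in ITT analysis: {school}")
    # A change of analysis method also needs an explanation.
    if plan.analysis != report.analysis:
        issues.append(f"analysis changed: {plan.analysis!r} -> {report.analysis!r}")
    return issues


plan = StudyRecord({"reading comprehension", "word attack"}, {"A", "B", "C"}, "HLM")
report = StudyRecord({"word attack", "fluency"}, {"A", "B"}, "HLM")
for issue in audit_report(plan, report):
    print(issue)
```

In this made-up example, the reviewer would see that one planned measure went unreported, one unplanned measure appeared, and one school dropped out without an explanation; a real review would of course weigh such discrepancies with judgment rather than flag them mechanically.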

What I find most heartening in REES and pre-registration is that it is an indication of how much and how rapidly educational research has matured in a short time. Ten years ago REES could not have been realistically proposed. There was too little high-quality research to justify it, and frankly, few educators or policy makers cared very much about the findings of rigorous research. There is still a long way to go in this regard, but embracing pre-registration is one way we say to our profession and ourselves that the quality of evidence in education can stand up to that in any other field, and that we are willing to hold ourselves accountable to the highest standards.


In baseball history, Babe Ruth’s “pre-registered” home run in the 1932 series is referred to as the “called shot.” No one had ever done it before, and no one ever did it again. But in educational evaluation, we will soon be calling our shots all the time. And when we say in advance exactly what we are going to do, and then do it, just as we promised, showing real benefits for children, then educational evaluation will take a major step forward in increasing users’ confidence in the outcomes.




How Tutor/Health Mentors Could Help Ensure Success for All Students

I’d like to introduce you to Janelle Wilson, a tutor/health mentor (THM) at a Baltimore elementary school.  She provides computer-assisted tutoring to groups of four to six second and third graders at a time, in seven daily forty-minute sessions. Another tutor/health mentor does similar work with grades K-1, and another, grades 4-5. But that’s not all they do.

As Ms. Wilson walks through the intermediate wing of the school, you notice something immediately.  She knows every kid, every teacher, and every parent she encounters. And they know and respect her.  As she walks down the hall, she greets kids by name, celebrating their successes in tutoring and gently teasing them.  But listen in on her conversation.  “Hey, Terrell! Super job on your math!  But wait a minute, where are your glasses?”  Terrell looks for them.  “Sorry, Ms. W.!” he says, “I left them in class.” “Well, go get them,” says Ms. Wilson. “You can’t become the superstar I know you can be without your glasses!”

What Ms. Wilson does, beyond her role as a tutor, is to make sure that all students who need glasses, hearing aids, asthma medication, or other specialized accommodations are consistently using them. She also keeps parents up-to-date to help them help their children succeed.


Ms. Wilson is not a teacher, not a school nurse, not a health aide, not a parent liaison, but she has aspects of all these roles.  A year ago, she was finishing her B.A. in theater at a local university.  But today, after intensive training and mentoring for her role, she is responsible for the unique educational and health needs of 140 students in grades 2-3, in partnership with their teachers, their parents, medical professionals, and others who care about the same kids she works with.  On any given day, she is tutoring about 35 of those students, but over time she will tutor or otherwise interact with many more.

Ms. Wilson is hard to catch, but finally you get a word with her. “What’s the difference between what you do and what teachers do?” you ask. Ms. Wilson smiles. “My job is to try to make sure that each child’s unique needs are being met. Teachers do a great job, but there are only so many hours in the day. I try to be an extra right arm for all of the teachers in grades 2 and 3, focused on making sure all students succeed at reading.  That is the most crucial task in the early grades. It is hard for a teacher with 25 or 30 students to make sure that every struggling reader is getting tutoring or wearing their eyeglasses or taking their medicine. I can help make sure that each child gets what he or she needs to be a successful reader. That means educational needs, especially tutoring, but also glasses, hearing aids, even asthma medication. If there is anything a child needs to succeed beyond classroom teaching, that’s my job!”

Ms. Wilson does not exist, and as far as I know, few if any educators anywhere do what I am describing. If Ms. Wilson’s role did exist, combining the use of proven tutoring approaches with a structured role in maintaining children’s health and well-being, she could make an enormous difference in increasing the achievement of struggling learners, and putting them on the path to success in school and beyond.

Beyond Tutoring

Constant readers may have noticed that I’ve been writing a lot in recent blogs about tutoring: One-to-one and one-to-small group, by teachers and by paraprofessionals.  This got started because I have been working with colleagues on quantitative syntheses of research on effective programs for students struggling with elementary reading (Inns et al., 2018), secondary reading (Baye et al., in press, 2018), and elementary math (Pellegrini et al., 2018). In every case, tutoring, including tutoring by paraprofessionals and tutoring of groups of two to six students, produced achievement outcomes far larger than anything else.  Since then, I’ve been writing about ways to enhance the cost-effectiveness and practicality of tutoring.  I even described a state-wide plan to use cost-effective tutoring to substantially reduce gaps and accelerate achievement.

I’ve also written a lot about the importance of ensuring that all students in high-poverty schools receive, wear, and maintain eyeglasses, if they need them.  We have been working in Baltimore and Chicago on plans to do this.  What we have found is that it is not enough to give children glasses.  The key is getting students to wear them every day, to take care of them, and to replace them if they are lost or broken.  All of this requires that someone keep track of who needs glasses and who is wearing them (or not). Today, only teachers can do this, because they are the only people who see every child every day. But it is not reasonable to add one more task on top of everything else teachers have to do.

What if schools recruited paraprofessionals and trained them to be responsible not only for tutoring small groups of students, but also for making sure that those who need glasses get them, wear them, and take care of them? A tutor/health mentor (THM) could, for example, work with parents to obtain the permissions needed for vision testing, and then work with the children they tutor to make sure they have and wear glasses. THMs might also attend to children who have hearing aids or who have to take medications, such as asthma inhalers.  These are not medical tasks; they just require good organizational skills and, most importantly, good relationships with children, parents, and teachers. Medical professionals would, of course, be needed to assess students’ vision, hearing, and medical needs and to prescribe treatment. For problems with vision, hearing, or asthma, for example, the medical solutions are inexpensive and straightforward, but ensuring that the solutions actually solve the problems takes 180 days a year of monitoring and coordinating. Who better to do this than someone like Ms. Wilson, who tutors many students, knows them and their parents well, and has the dedicated time to make sure that students are using their glasses or taking their medication, if that is what they need?

Tutor/health mentors like Ms. Wilson could take responsibility for ensuring that students’ routine medical needs are being met as part of their work in the school, especially during times (such as the beginning and end of the school day) when tutoring is impractical.

THMs could not and should not replace either teachers or school nurses. Instead, their job would be to make sure that students receive, and then actually use, the educational and medical services tailored to their needs that are most critical for reading success, so that teachers’ educational efforts are not undermined by a failure to meet the specific needs of individual children.

A THM providing computer-assisted tutoring to groups of 4 to 6 for 40 minutes a day should be able to teach 7 groups a day, or 28 to 42 children. A school of 500 students could, therefore, tutor 20% of its students (100 students) on any given day with three THMs. These staff members would still have time to check on students who need health mentoring. Given the educational impact of tutoring, that is very important work on its own terms, but adding simple health mentoring tasks to ensure the effectiveness of medical services adds a crucial dimension to the tutoring role.
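The staffing arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes every one of the seven daily sessions is filled with a group of four to six students; the function name daily_tutoring_capacity is purely illustrative.

```python
def daily_tutoring_capacity(sessions_per_day: int, min_group: int, max_group: int,
                            num_tutors: int) -> tuple[int, int]:
    """Range of students tutored per day across all tutors, assuming full sessions."""
    per_tutor_min = sessions_per_day * min_group
    per_tutor_max = sessions_per_day * max_group
    return per_tutor_min * num_tutors, per_tutor_max * num_tutors


# Figures from the text: 7 forty-minute sessions, groups of 4 to 6, three THMs.
low, high = daily_tutoring_capacity(7, 4, 6, 3)
enrollment = 500
print(low, high)
print(f"{low / enrollment:.0%} to {high / enrollment:.0%} of a {enrollment}-student school")
```

Three THMs can reach roughly 84 to 126 students a day, or about 17% to 25% of a 500-student school, so the 100 students (20%) mentioned above fall comfortably inside that range.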

I’m sure a lot of details and legalities would have to be worked out, but it seems possible to make effective use of inexpensive resources to meet both the educational needs and the visual, auditory, and other health needs of disadvantaged students. It certainly seems worth trying!


Succeeding Faster in Education

“If you want to increase your success rate, double your failure rate.” So said Thomas Watson, the founder of IBM. What he meant, of course, is that people and organizations thrive when they try many experiments, even though most experiments fail. Failing twice as often means trying twice as many experiments, which leads to twice as many failures but also, he was saying, many more successes.


In education research and innovation circles, many people know this quote, and use it to console colleagues who have done an experiment that did not produce significant positive outcomes. A lot of consolation is necessary, because most high-quality experiments in education do not produce significant positive outcomes. In studies funded by the Institute of Education Sciences (IES), Investing in Innovation (i3), and England’s Education Endowment Foundation (EEF), all of which require very high standards of evidence, fewer than 20% of experiments show significant positive outcomes.

The high rate of failure in educational experiments is often shocking to non-researchers, especially the government agencies, foundations, publishers, and software developers who commission the studies. I was at a conference recently in which a Peruvian researcher presented the devastating results of an experiment in which high-poverty, mostly rural schools in Peru were randomly assigned to receive computers for all of their students, or to continue with usual instruction. The Peruvian Ministry of Education was so confident that the computers would be effective that they had built a huge model of the specific computers used in the experiment and attached it to the Ministry headquarters. When the results showed no positive outcomes (except for the ability to operate computers), the Ministry quietly removed the computer statue from the top of their building.

Improving Success Rates

Much as I believe in Watson’s admonition (“fail more”), I expect he was also implying another principle: We have to learn from failure, so we can increase the rate of success. It is not realistic to expect government to continue to invest substantial funding in high-quality educational experiments if the success rate remains below 20%. We have to get smarter, so we can succeed more often. Fortunately, qualitative measures, such as observations, interviews, and questionnaires, are becoming required elements of funded research, helping researchers learn what happened and what went wrong. Was the experimental program faithfully implemented? Were there unexpected responses toward the program by teachers or students?

In the course of my work reviewing positive and disappointing outcomes of educational innovations, I’ve noticed some patterns that often predict that a given program is likely or unlikely to be effective in a well-designed evaluation. Some of these are as follows.

  1. Small changes lead to small (or zero) impacts. In every subject and grade level, researchers have evaluated new textbooks, in comparison to existing texts. These almost never show positive effects. The reason is that textbooks are just not that different from each other. Approaches that do show positive effects are usually markedly different from ordinary practices or texts.
  2. Successful programs almost always provide a lot of professional development. The programs that have significant positive effects on learning are ones that markedly improve pedagogy. Changing teachers’ daily instructional practices usually requires initial training followed by on-site coaching by well-trained and capable coaches. Lots of PD does not guarantee success, but minimal PD virtually guarantees failure. Sufficient professional development can be expensive, but education itself is expensive, and adding a modest amount to per-pupil cost for professional development and other requirements of effective implementation is often the best way to substantially enhance outcomes.
  3. Effective programs are usually well-specified, with clear procedures and materials. Programs rarely work if they are unclear about what teachers are expected to do, or if they do not help teachers do it. In the Peruvian study of one-to-one computers, for example, students were given tablet computers at a per-pupil cost of $438. Teachers were expected to figure out how best to use them. In fact, a qualitative study found that the computers were considered so valuable that many teachers locked them up except for specific times when they were to be used. Teachers were given neither specific instructional software nor professional development to create the needed software. No wonder “it” didn’t work. Other than the physical computers, there was no “it.”
  4. Technology is not magic. Technology can create opportunities for improvement, but there is little understanding of how to use technology to greatest effect. My colleagues and I have done reviews of research on effects of modern technology on learning. We found near-zero effects of a variety of elementary and secondary reading software (Inns et al., 2018; Baye et al., in press), with a mean effect size of +0.05 in elementary reading and +0.00 in secondary. In math, effects were slightly more positive (ES=+0.09), but still quite small, on average (Pellegrini et al., 2018). Some technology approaches had more promise than others, but it is time that we learned from disappointing as well as promising applications. The widespread belief that technology is the future must eventually be right, but at present we have little reason to believe that technology is transformative, and we don’t know which form of technology is most likely to be transformative.
  5. Tutoring is the most solid approach we have. Reviews of elementary reading for struggling readers (Inns et al., 2018) and secondary struggling readers (Baye et al., in press), as well as elementary math (Pellegrini et al., 2018), find outcomes for various forms of tutoring that are far beyond effects seen for any other type of treatment. Everyone knows this, but thinking about tutoring falls into two camps. One, typified by advocates of Reading Recovery, takes the view that tutoring is so effective for struggling first graders that it should be used no matter what the cost. The other, also perhaps thinking about Reading Recovery, rejects this approach because of its cost. Yet recent research on tutoring methods is finding strategies that are cost-effective and feasible. First, studies in both reading (Inns et al., 2018) and math (Pellegrini et al., 2018) find no difference in outcomes between certified teachers and paraprofessionals using structured one-to-one or one-to-small group tutoring models. Second, although one-to-one tutoring is more effective than one-to-small group, one-to-small group is far more cost-effective, as one trained tutor can work with 4 to 6 students at a time. Also, recent studies have found that tutoring can be just as effective in the upper elementary and middle grades as in first grade, so this strategy may have broader applicability than it has in the past. The real challenge for research on tutoring is to develop and evaluate models that increase cost-effectiveness of this clearly effective family of approaches.

The extraordinary advances in the quality and quantity of research in education, led by investments from IES, i3, and the EEF, have raised expectations for research-based reform. However, the modest percentage of recent rigorous studies showing positive outcomes has caused disappointment in some quarters. Rather than being a cause for disappointment, all findings, whether immediately successful or not, should be seen as crucial information. Some studies identify programs ready for prime time right now, but the whole body of work can and must inform us about areas worthy of expanded investment, as well as areas in need of serious rethinking and redevelopment. The evidence movement, in the form it exists today, is completing its first decade. It’s still early days. There is much more we can learn and do to develop, evaluate, and disseminate effective strategies, especially for students in great need of proven approaches.


Beyond the Spaghetti Bridge: Why Response to Intervention is Not Enough

I know an engineer at Johns Hopkins University who invented the Spaghetti Bridge Challenge. Teams of students are given dry, uncooked spaghetti and glue, and are challenged to build a bridge over a 500-millimeter gap. The bridge that can support the most weight wins.


Spaghetti Bridge tournaments are now held all over the world, and they are wonderful for building interest in engineering. But I don’t think any engineer would actually build a real bridge based on a winning spaghetti bridge prototype. Much as spaghetti bridges do resemble the designs of real bridges, there are many more factors a real engineer has to take into account: Weight of materials, tensile strength, flexibility (in case of high winds or earthquakes), durability, and so on.

In educational innovation and reform, we have lots of great ideas that resemble spaghetti bridges. That’s because they would probably work great if only their components were ideal. One example is Response to Intervention (RTI), or its latest version, Multi-Tiered Systems of Supports (MTSS). Both RTI and MTSS start with a terrific idea: Instead of just testing struggling students to decide whether or not to assign them to special education, provide them with high-quality instruction (Tier 1), supplemented by modest assistance if that is not sufficient (Tier 2), supplemented by intensive instruction if Tier 2 is not sufficient (Tier 3). In law, or at least in theory, struggling readers must have had a chance to succeed in high-quality Tier 1, Tier 2, and Tier 3 instruction before they can be assigned to special education.

The problem is that there is no way to ensure that struggling students truly received high-quality instruction at each tier level. Teachers do their best, but it is difficult to make up effective approaches from scratch. MTSS or RTI is a great idea, but its success depends on the effectiveness of whatever struggling students receive as Tier 1, 2, and 3 instruction.

This is where spaghetti bridges come in. Many bridge designs can work in theory (or in spaghetti), but whether or not a bridge really works in the real world depends on how it is made, and with what materials, given the demands that will be placed on it.

The best way to ensure that all components of RTI or MTSS policy are likely to be effective is to select approaches for each tier that have themselves been proven to work. Fortunately, there is now a great deal of research establishing the effectiveness, for struggling students, of programs that use whole-school or whole-class methods (Tier 1), one-to-small group tutoring (Tier 2), or one-to-one tutoring (Tier 3). Many of these tutoring models are particularly cost-effective because they successfully provide struggling readers with tutoring from well-qualified paraprofessionals, usually ones with bachelor’s degrees but not teaching certificates. Research on both reading and math tutoring has clearly established that students tutored by such paraprofessionals, using structured models, gain at least as much as do students tutored by certified teachers. This is important not only because paraprofessionals cost about half as much as teachers, but also because there are chronic teacher shortages in high-poverty areas, such as inner-city and rural locations, so certified teacher tutors may not be available at any cost.
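A rough Python sketch shows why paraprofessional small-group tutoring is so much cheaper per student. It assumes only the text’s approximate figure that a paraprofessional costs about half as much as a certified teacher, and it ignores training, materials, and the extra effectiveness of one-to-one over small-group tutoring; the names TUTOR_COST and relative_cost_per_student are hypothetical.

```python
from fractions import Fraction

# Relative cost of one tutor's time, taking a certified teacher as 1
# (assumption from the text: paraprofessionals cost about half as much).
TUTOR_COST = {"certified teacher": Fraction(1), "paraprofessional": Fraction(1, 2)}


def relative_cost_per_student(tutor: str, group_size: int) -> Fraction:
    """Cost per student of one tutoring slot, relative to one-to-one teacher tutoring."""
    return TUTOR_COST[tutor] / group_size


for tutor, size in [("certified teacher", 1), ("paraprofessional", 1),
                    ("paraprofessional", 4), ("paraprofessional", 6)]:
    cost = relative_cost_per_student(tutor, size)
    print(f"{tutor}, group of {size}: {float(cost):.3f}")
```

Under these assumptions, a paraprofessional working with groups of four to six costs roughly one-eighth to one-twelfth as much per student as a certified teacher tutoring one-to-one, which is why structured small-group models matter so much for cost-effectiveness.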

If schools choose proven components for their MTSS/RTI models, and implement them with thought and care, they are sure to see enhanced outcomes for their struggling students. The concept of MTSS/RTI is sound, and the components are proven. How could the outcomes be less than stellar? And in addition to improved achievement for vulnerable learners, hiring many paraprofessionals to serve as tutors in disadvantaged schools could enable schools to attract and identify capable, caring young people with bachelor’s degrees, who could then be offered accelerated certification, enriching the local teaching force.

With a spaghetti bridge, a good design is necessary but not sufficient. The components of that design, its ingredients, and its implementation determine whether the bridge stands or falls in practice. So it is with MTSS and RTI. An approach based on strong evidence of effectiveness is essential to enable these good designs to achieve their goals.

First There Must be Love. Then There Must be Technique.

I recently went to Barcelona. This was my third time in this wonderful city, and for the third time I visited La Sagrada Familia, Antoni Gaudi’s breathtaking church. It was begun in the 1880s, and Gaudi worked on it from the time he was 31 until he died in 1926 at 73. It is due to be completed in 2026.

Every time I go, La Sagrada Familia has grown even more astonishing. In the nave, massive columns branching into tree shapes hold up the spectacular roof. The architecture is extremely creative, and wonders lie around every corner.


I visited a new museum under the church. At the entrance, it had a Gaudi quote:

First there must be love.

Then there must be technique.

This quote sums up La Sagrada Familia. Gaudi used complex mathematics to plan his constructions. He was a master of technique. But he knew that it all meant nothing without love.

In writing about educational research, I try to remind my readers of this from time to time. There is much technique to master in creating educational programs, evaluating them, and fairly summarizing their effects. There is even more technique in implementing proven programs in schools and classrooms, and in creating policies to support use of proven programs. But what Gaudi reminds us of is just as essential in our field as it was in his. We must care about technique because we care about children. Caring about technique just for its own sake is of little value. Too many children in our schools are failing to learn adequately. We cannot say, “That’s not my problem, I’m a statistician,” or “that’s not my problem, I’m a policymaker,” or “that’s not my problem, I’m an economist.” If we love children and we know that our research can help them, then it’s all of our problems. All of us go into education to solve real problems in real classrooms. That’s the structure we are all building together over many years. Building this structure takes technique, and the skilled efforts of many researchers, developers, statisticians, superintendents, principals, and teachers.

Each of us brings his or her own skills and efforts to this task. None of us will live to see our structure completed, because education keeps growing in techniques and capability. But as Gaudi reminds us, it’s useful to stop from time to time and remember why we do what we do, and for whom.

The Mill and The School


On a recent trip to Scotland, I visited some very interesting oat mills. I always love to visit medieval mills, because I find it endlessly fascinating how people long ago used natural forces and materials – wind, water, and fire, stone, wood, and metal – to create advanced mechanisms that had a profound impact on society.

In Scotland, it’s all about oat mills (almost everywhere else, it’s wheat). These grain mills date back to the 10th century. In their time, they were a giant leap in technology. A mill is very complicated, but at its heart are two big innovations. In the center of the mill, a heavy millstone turns on top of another. The grain is poured through a hole in the top stone for grinding. The miller’s most difficult task is to maintain an exact distance between the stones. A few millimeters too far apart and no milling happens. A few millimeters too close and the heat of friction can ruin the machinery, possibly causing a fire.

The other key technology is the water wheel (except in windmills, of course). A water mill depends on a carefully controlled flow of water from a millpond. The miller releases exactly the right amount of water to turn a giant wooden wheel, which in turn powers the top millstone.


The medieval grain mill is not a single innovation, but a closely integrated system of innovations. Millers learned to manage this complex technology through many years of apprenticeship.

Mills enabled medieval millers to obtain far more nutrition from an acre of grain than was possible before. This made it possible for land to support many more people, and the population surged. The whole feudal system was built around the economics of mills, and mills thrived through the 19th century.

What does the mill have to do with the school? Mills grind only well-behaved grain into well-behaved flour, while schools work with far more complex children, families, and all the systems that surround them. The products of schools must include joy and discovery, knowledge and skills.

Yet as different as they are, mills have something to teach us. They show the importance of integrating diverse systems that can then efficiently deliver desired outcomes. Neither a mill nor an effective school comes into existence because someone in power tells it to. Instead, complex systems, mills or schools, must be created, tested, adapted to local needs, and constantly improved. Once we know how to create, manage, and disseminate effective mills or schools, policies can be readily devised to support their expansion and improvement.

Important progress in societies and economies almost always comes about from the development of complex, multi-component innovations that, once developed, can be disseminated and continuously improved. The same is true of schools. Changes in governance or large-scale policies can enhance (or inhibit) the possibility of change, but the reality of reform depends on the creation of complex, integrated systems, from mills to ships to combines to hospitals to schools.

For education, what this means is that system transformation will come only when we have whole-school improvement approaches that are known to greatly increase student outcomes. Whole-school change is necessary because many individual improvements are needed to make big changes, and these must be carefully aligned with each other. Just as the huge water wheel and the tiny millstone adjustment mechanism and other components must work together in the mill, the key parts of a school must work together in synchrony to produce maximum impact, or the whole system fails to work as well as it should.

For example, if you look at research on proven programs, you’ll find effective strategies for school management, for teaching, and for tutoring struggling readers. These are all well and good, but they work so much better if they are linked to each other.

To understand this, first consider tutoring. Especially in the elementary grades, there is no more effective strategy. Our recent review of research on programs for struggling readers finds that well-qualified teaching assistants can be as effective as teachers in tutoring struggling readers, and that while one-to-four tutoring is less effective than one-to-one, it is still a lot more effective than no tutoring. So an evidence-oriented educator might logically choose to implement proven one-to-one and/or one-to-small group tutoring programs to improve school outcomes.

However, tutoring helps only the students who receive it, and it is expensive. A wise school administrator might reason that tutoring alone is not sufficient, and that improving the quality of classroom instruction is also essential, both to improve outcomes for students who do not need tutoring and to reduce the number of students who do need it. There is an array of proven classroom methods the principal or district might choose to improve student outcomes in all subjects and grade levels (see

But now consider students who are at risk because they are not attending regularly, or have behavior problems, or need eyeglasses but do not have them. Flexible school-level systems are necessary to ensure that students are in school, eager to learn, well-behaved, and physically prepared to succeed.

In addition, school principals and other leaders need to learn strategies for making effective use of proven programs. These would include managing professional development, coaching, monitoring implementation and outcomes of proven programs, distributed leadership, and much more. Leadership also requires jointly setting school goals with all school staff and monitoring progress toward these goals.

These are all components of the education “mill” that have to be designed, tested, and (if effective) disseminated to ever-increasing numbers of schools. Like the mill, an effective school design integrates individual parts, makes them work in synchrony, constantly assesses their functioning and output, and adjusts procedures when necessary.

Many educational theorists argue that education will only change when systems change. Ferocious battles rage about charters vs. ordinary public schools, about adopting policies of countries that do well on international tests, and so on. These policies can be important, but they are unlikely to create substantial and lasting improvement unless they lead to development and dissemination of proven whole-school approaches.

Effective school improvement is not likely to come about from let-a-thousand-flowers-bloom local innovation, nor from top-level changes in policy or governance. Sufficient change will not come about by throwing individual small innovations into schools and hoping they will collectively make a difference. Instead, effective improvement will take root when we learn how to reliably create effective programs for schools, implement them in a coordinated and planful way, confirm that they are effective, and then disseminate them. Once such schools are widespread, we can build larger policies and systems around their needs.

Coordinated, schoolwide improvement approaches offer schools proven strategies for increasing the achievement and success of their children. There should be many programs of this kind, among which schools and districts can choose. A school is not the same as a mill, but the mill provides at least one image of how creating complex, integrated, replicable systems can change whole societies and economies. We should learn from this and many other examples of how to focus our efforts to improve outcomes for all children.

