A Powerful Hunger for Evidence-Proven Technology

I recently saw a 1954 video of B. F. Skinner showing off a classroom full of eager students using teaching machines. In it, Skinner gave all the usual reasons that teaching machines were soon going to be far superior to ordinary teaching: They were scientifically made to enable students to experience constant success in small steps. They were adapted to students’ needs, so fast students did not need to wait for their slower classmates, and the slower classmates could have the time to solidify their understanding, rather than being whisked from one half-learned topic to the next, never getting a chance to master anything and therefore sinking into greater and greater failure.

Here it is 65 years later and “teaching machines,” now called computer-assisted instruction, are ubiquitous. But are they effective? Computers are certainly effective at teaching students to use technology, but can they teach the core curriculum of elementary or secondary schools? In a series of reviews in the Best Evidence Encyclopedia (BEE; www.bestevidence.org), my colleagues and I have reviewed research on the impacts of technology-infused methods on reading, mathematics, and science, in elementary and secondary schools. Here is a quick summary of my findings:

Mean Effect Sizes for Technology-Based Programs in Recent Reviews

Review                        Topic                     No. of Studies   Mean Effect Size
Inns et al., in preparation   Elementary Reading        23               +0.09
Inns et al., 2019             Struggling Readers        6                +0.06
Baye et al., 2018             Secondary Reading         23               -0.01
Pellegrini et al., 2019       Elementary Mathematics    14               +0.06

If you prefer “months of learning,” these are all about one month, except for secondary reading, which is zero. A study-weighted average across these reviews is an effect size of +0.05. That’s not nothing, but it’s not much. Nothing at all like what Skinner and countless other theorists and advocates have been promising for the past 65 years. I think that even the most enthusiastic fans of technology use in education are beginning to recognize that while technology may be useful in improving achievement on traditional learning outcomes, it has not yet had a revolutionary impact on learning of reading or mathematics.
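For readers who want to check the arithmetic, here is a minimal sketch of how a study-weighted average can be computed from the four rows in the table. It assumes that "study-weighted" means each review's mean effect size is weighted by its number of studies; the function and variable names are illustrative only.

```python
# Rows from the table: (review, topic, number of studies, mean effect size)
reviews = [
    ("Inns et al., in preparation", "Elementary Reading", 23, 0.09),
    ("Inns et al., 2019", "Struggling Readers", 6, 0.06),
    ("Baye et al., 2018", "Secondary Reading", 23, -0.01),
    ("Pellegrini et al., 2019", "Elementary Mathematics", 14, 0.06),
]


def study_weighted_mean(rows):
    """Average the effect sizes, weighting each review by its study count."""
    total_studies = sum(n for _, _, n, _ in rows)
    weighted_sum = sum(n * es for _, _, n, es in rows)
    return weighted_sum / total_studies


mean_es = study_weighted_mean(reviews)
print(f"{mean_es:+.2f}")  # prints +0.05
```

Under that assumption, the 66 studies yield 3.04 / 66, or about +0.046, which rounds to the +0.05 reported above.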

How can we boost the impact of technology in education?

Whatever you think the effects of technology-based education might be for typical school outcomes, no one could deny that it would be a good thing if that impact were larger than it is today. How could government, the educational technology industry, researchers in and out of ed tech, and practicing educators work together to make technology applications more effective than they are now?

In order to understand how to proceed, it is important to acknowledge a serious problem in the world of ed tech today. Educational technology is usually developed by commercial companies. Like all commercial companies, they must serve their market. Unfortunately, the market for ed tech products is not terribly interested in the evidence supporting technology-based programs. Instead of looking at evidence, buyers tend to pay attention to sales reps or marketing, or they seek opinions from their friends and colleagues. Technology decision makers often value attractiveness, ease of use, low cost, and current trends or fads over evidence (see Morrison, Ross, & Cheung, 2019, for documentation of these choice strategies).

Technology providers are not uncaring people, and they want their products to truly improve outcomes for children. However, they know that if they put a lot of money into developing and researching an innovative approach to education that happens to use technology, and their method requires a lot of professional development to produce substantially positive effects, their programs might be considered too expensive, and less expensive products that ask less of teachers and other educators would dominate the sector. These problems resemble those faced by textbook publishers, who similarly may have great ideas to increase the effectiveness of their textbooks or to add components that require professional development. Textbook designers are prisoners of their markets just as technology developers are.

The solution, I would propose, requires interventions by government designed to nudge education markets toward use of evidence. Government (federal, state, and local) has a real interest in improving outcomes of education. So how could government facilitate the use of technology-based approaches that are known to enhance student achievement more than those that exist today?

How government could promote use of proven technology approaches

Government could lead the revolution in educational technology that market-driven technology developers cannot lead on their own. It could do this by emphasizing two main strategies: providing encouragement and incentives to motivate schools, districts, and states to use programs proven effective in rigorous research, and providing funding to technology developers of all kinds (e.g., for-profit, non-profit, or universities) for the development, evaluation, and dissemination of proven technology-based programs.

Encouraging and incentivizing use of proven technology-based programs

The most important thing government must do to expand the use of proven technology-based approaches (as well as non-technology approaches) is to build a powerful hunger for them among educators, parents, and the public at large. Yes, I realize that this sounds backward; shouldn’t government sponsor development, research, and dissemination of proven programs first? Yes it should, and I’ll address this topic in a moment. Of course we need proven programs. No one will clamor for an empty box. But today, many proven programs already exist, and the bigger problem is getting them (and many others to come) enthusiastically adopted by schools. In fact, we must eventually get to the point where educational leaders value not only individual programs supported by research, but value research itself. That is, when they start looking for technology-based programs, their first step would be to find out what programs are proven to work, rather than selecting programs in the usual way and only then trying to find evidence to support the choice they have already made.

Government at any level could support such a process, but the most likely leader in this would be the federal government. It could provide incentives to schools that select and implement proven programs, and build on these incentives with multifaceted outreach efforts to build hype around proven approaches and around the idea that approaches should be proven.

A good example of what I have in mind was the Comprehensive School Reform (CSR) grants of the late 1990s. Schools that adopted whole-school reform models that met certain requirements could receive grants of up to $50,000 per year for three years. By the end of CSR, about 1000 schools got grants in a competitive process, but CSR programs were used in an estimated 6000 schools nationwide. In other words, the hype generated by the CSR grants process led many schools that never got a grant to find other resources to adopt these whole-school programs. I should note that only a few of the adopted programs had good evidence of effectiveness; in CSR, the core idea was whole-school reform, not evidence. But a process like CSR, with highly visible grants and active support from government, illustrates a process that built a powerful hunger for whole-school reform, which could work just as well, I think, if applied to building a powerful hunger for proven technology-based programs and other proven approaches.

“Wait a minute,” I can hear you saying. “Didn’t the ESSA evidence standards already do this?”

This was indeed the intention of ESSA, which established “strong,” “moderate,” and “promising” levels of evidence (as well as lower categories). ESSA has been a great first step in building interest in evidence. However, the only schools that could obtain additional funding for selecting proven programs were among the lowest-achieving schools in the country, so ordinary Title I schools, not to mention non-Title I schools, were not much affected. CSR gave extra points to high-poverty schools, but a much wider variety of schools could get into that game. There is a big difference between creating interest in evidence, which ESSA has definitely done, and creating a powerful hunger for proven programs. ESSA was passed four years ago, and it is only now beginning to build knowledge and enthusiasm among schools.

Building many more proven technology-based programs

Clearly, we need many more proven technology-based programs. In our Evidence for ESSA website (www.evidenceforessa.org), we list 113 reading and mathematics programs that meet any of the three top ESSA standards. Only 28 of these (18 reading, 10 math) have a major technology component. This is a good start, but we need a lot more proven technology-based programs. To get them, government needs to continue its productive Institute of Education Sciences (IES) and Education Innovation and Research (EIR) initiatives. For for-profit companies, Small Business Innovation Research (SBIR) plays an important role in early development of technology solutions. However, the pace of development and research focused on practical programs for schools needs to accelerate, and government needs to learn from its own successes and failures to increase the success rate of its investments.

Communicating “what works”

There remains an important need to provide school leaders with easy-to-interpret information on the evidence base for all existing programs schools might select. The What Works Clearinghouse and our Evidence for ESSA website do this most comprehensively, but these and other resources need help to keep up with the rapid expansion of evidence that has appeared in the past 10 years.

Technology-based education can still produce the outcomes Skinner promised in his 1954 video, the ones we have all been eagerly awaiting ever since. However, technology developers and researchers need more help from government to build an eager market not just for technology, but for proven achievement outcomes produced by technology.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2019). Effective reading programs for secondary students. Reading Research Quarterly, 54 (2), 133-166.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (in preparation). A synthesis of quantitative research on elementary reading. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Morrison, J. R., Ross, S.M., & Cheung, A.C.K. (2019). From the market to the classroom: How ed-tech products are procured by school districts interacting with vendors. Educational Technology Research and Development, 67 (2), 389-421.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. (2019). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.org. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Do School Districts Really Have Difficulty Meeting ESSA Evidence Standards?

The Center on Education Policy (CEP) recently released a report on how school districts are responding to the Every Student Succeeds Act (ESSA) requirement that schools seeking school improvement grants select programs that meet ESSA’s strong, moderate, or promising standards of evidence. Education Week ran a story on the CEP report.

The report noted that many states, districts, and schools are taking the evidence requirements seriously, and are looking at websites and consulting with researchers to help them identify programs that meet the standards. This is all to the good.

However, the report also notes continuing problems districts and schools are having finding out “what works.” Two particular problems were cited. One was that districts and schools were not equipped to review research to find out what works. The other was that rural districts and schools found few programs proven effective in rural schools.

I find these concerns astounding. The same concerns were expressed when ESSA was first passed, in 2015. But that was almost four years ago. Since 2015, the What Works Clearinghouse has added information to help schools identify programs that meet the top two ESSA evidence categories, strong and moderate. Our own Evidence for ESSA, launched in February, 2017, has up-to-date information on virtually all PK-12 reading and math programs currently in dissemination. Among hundreds of programs examined, 113 meet ESSA standards for strong, moderate, or promising evidence of effectiveness. WWC, Evidence for ESSA, and other sources are available online at no cost. The contents of the entire Evidence for ESSA website were imported into Ohio’s own website on this topic, and dozens of states, perhaps all of them, have informed their districts and schools about these sources.

The idea that districts and schools could not find information on proven programs if they wanted to do so is difficult to believe, especially among schools eligible for school improvement grants. Such schools, and the districts in which they are located, write a lot of grant proposals for federal and state funding. The application forms for school improvement grants always explain the evidence requirements, because that is the law. Someone in every state involved with federal funding knows about the WWC and Evidence for ESSA websites. More than 90,000 unique users have used Evidence for ESSA, and more than 800 more sign on each week.

As to rural schools, it is true that many studies of educational programs have taken place in urban areas. However, 47 of the 113 programs qualified by Evidence for ESSA were validated in at least one rural study, or a study including a large enough rural sample to enable researchers to separately report program impacts for rural students. Also, almost all widely disseminated programs have been used in many rural schools. So rural districts and schools that care about evidence can find programs that have been evaluated in rural locations, or at least that were evaluated in urban or suburban schools but widely disseminated in rural schools.

Also, it is important to note that if a program was successfully evaluated only in urban or suburban schools, the program still meets the ESSA evidence standards. If no studies of a given outcome were done in rural locations, a rural school in need of better outcomes would, in effect, be asked to choose between a program proven to work somewhere, and probably used in dissemination in rural schools, and a program not proven to work anywhere. Every school and district has to make the best choices for their kids, but if I were a rural superintendent or principal, I’d read up on proven programs, and then go visit some nearby rural schools using those programs. Wouldn’t you?

I have no reason to suspect that the CEP survey is incorrect. There are many indications that district and school leaders often do feel that the ESSA evidence rules are too difficult to meet. So what is really going on?

My guess is that there are many district and school leaders who do not want to know about evidence on proven programs. For example, they may have longstanding, positive relationships with representatives of publishers or software developers, or they may be comfortable and happy with the materials and services they are already using, evidence-proven or not. If they do not have evidence of effectiveness that would pass muster with WWC or Evidence for ESSA, the publishers and software developers may push hard on state and district officials, put forward dubious claims for evidence (such as studies with no control groups), and do their best to get by in a system that increasingly demands evidence that they lack. In my experience, district and state officials often complain about having inadequate staff to review evidence of effectiveness, but their concern may be less about finding out what works than about defending themselves from publishers, software developers, or current district or school users of programs, who maintain that they have been unfairly rated by WWC, Evidence for ESSA, or other reviews. State and district leaders who stand up to this pressure may have to spend a lot of time reviewing evidence or hearing arguments.

On the plus side, at the same time that publishers and software producers may be seeking recognition for their current products, many are also sponsoring evaluations of some of their products that they feel are most likely to perform well in rigorous evaluations. Some may be creating new programs that resemble programs that have met evidence standards. If the federal ESSA law continues to demand evidence for certain federal funding purposes, or even to expand this requirement to additional parts of federal grant-making, then over time the ESSA law will have its desired effect, rewarding the creation and evaluation of programs that do meet standards by making it easier to disseminate such programs. The difficulties the evidence movement is experiencing are likely to diminish over time as more proven programs appear, and as federal, state, district, and school leaders get comfortable with evidence.

Evidence-based reform was always going to be difficult, because of the amount of change it entails and the stakes involved. But sooner or later, it is the right thing to do, and leaders who insist on evidence will see increasing levels of learning among their students, at minimal cost beyond what they already spend on untested or ineffective approaches. Medicine went through a similar transition in 1962, when the U.S. Congress first required that medicines be rigorously evaluated for effectiveness and safety. At first, many leaders in the medical profession resisted the changes, but after a while, they came to insist on them. The key is political leadership willing to support the evidence requirement strongly and permanently, so that educators and vendors alike will see that the best way forward is to embrace evidence and make it work for kids.

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Evidence and Policy: If You Want to Make a Silk Purse, Why Not Start With…Silk?

Everyone knows that you can’t make a silk purse out of a sow’s ear. This proverb goes back to the 1500s. Yet in education policy, we are constantly trying to achieve stellar results using school and classroom programs of unknown effectiveness, or even those known to be ineffective, even though proven effective programs are readily available.

Note that I am not criticizing teachers. They do the best they can with the tools they have. What I am concerned about is the quality of those tools, the programs, and professional development teachers receive to help them succeed with their children.

An excellent case in point was School Improvement Grants (SIG), a major provision of No Child Left Behind (NCLB). SIG provided major grants to schools scoring in the lowest 5% of their states. For most of its existence, SIG required schools seeking funding to choose among four models. Two of these, school closure and charterization, were rarely selected. Instead, most SIG schools selected either “turnaround” (replacing the principal and at least 50% of the staff), or the most popular, “transformation” (replacing the principal, using data to inform instruction, lengthening the school day or year, and evaluating teachers based on the achievement growth of their students). However, a major, large-scale evaluation of SIG by Mathematica showed no achievement benefits for schools that received SIG grants, compared to similar schools that did not. Ultimately, SIG spent more than $7 billion, an amount that we in Baltimore, at least, consider to be a lot of money. The tragedy, however, is not just the waste of so much money, but the dashing of so many hopes for meaningful improvement.

This is where the silk purse/sow’s ear analogy comes in. Each of the options among which SIG schools had to choose was composed of components that either lacked evidence of effectiveness or actually had evidence of ineffectiveness. If the components of each option are not known to be effective, then why would anyone expect a combination of them to be effective?

Evidence on school closure has found that this strategy diminishes student achievement for a few years, after which student performance returns to where it was before. Research on charter schools by CREDO (2013) has found an average effect size of zero for charters. The exception is “no-excuses” charters, such as KIPP and Success Academies, but these charters only accept students whose parents volunteer, not whole failing schools. Turnaround and transformation schools both require a change of principal, which introduces chaos and, as far as I know, has never been found to improve achievement. The same is true of replacing at least 50% of the teachers. Lots of chaos, no evidence of effectiveness. The other required elements of the popular “transformation” model have been found to have either no impact (e.g., benchmark assessments to inform teachers about progress; Inns et al., 2019), or small effects (e.g., lengthening the school day or year; Figlio et al., 2018). Most importantly, to my knowledge, no one ever did a randomized evaluation of the entire transformation model, with all components included. We did not find out what the joint effect was until the Mathematica study. Guess what? Sewing together swatches of sows’ ears did not produce a silk purse. With a tiny proportion of $7 billion, the Department of Education could have identified and tested out numerous well-researched, replicable programs and then offered SIG schools a choice among the ones that worked best. A selection of silk purses, all made from 100% pure silk. Doesn’t that sound like a better idea?

In later blogs I’ll say more about how the federal government could ensure the success of educational initiatives by ensuring that schools have access to federal resources to adopt and implement proven programs designed to accomplish the goals of the legislation.

References

Figlio, D., Holden, K. L., & Ozek, U. (2018). Do students benefit from longer school days? Regression discontinuity evidence from Florida’s additional hour of literacy instruction. Economics of Education Review, 67, 171-183.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Superman and Statistics

In the 1978 movie “Superman,” Lois Lane, star journalist, crash-lands in a helicopter on top of a 50-story skyscraper.   The helicopter is hanging by a strut to the edge of the roof, and Lois is hanging on to a microphone cord.  Finally, the cord breaks, and Lois falls 45 floors before (of course) she is swooped up by Superman, who flies her back to the roof and sets her down gently. Then he says to her:

“I hope this doesn’t put you off of flying. Statistically speaking, it is the safest form of travel.”

She faints.

[Image of Superman] Don’t let the superhero thing fool you: The “S” is for “statistics.”

I’ve often had the very same problem whenever I do public speaking. As soon as I mention statistics, some of the audience faints dead away. Or perhaps they are falling asleep. But either way, saying the word “statistics” is not usually a good way to make friends and influence people.

The fact is, most people don’t like statistics.  Or more accurately, people don’t like statistics except when the statistical findings agree with their prejudices.  At an IES meeting several years ago, a well-respected superintendent was invited to speak to what is perhaps the nerdiest, most statistically-minded group in all of education, except for an SREE conference.  He actually said, without the slightest indication of humor or irony, that “GOOD research is that which confirms what I have always believed.  BAD research is that which disagrees with what I have always believed.”  I’d guess that the great majority of superintendents and other educational leaders would agree, even if few would say so out loud to an IES meeting.

If educational leaders only attend to statistics that confirm their prior beliefs, one might argue that, well, at least they do attend to SOME research.  But research in an applied field like education is of value only if it leads to positive changes in practice.  If influential educators only respect research that confirms their previous beliefs, then they never change their practices or policies because of research, and policies and practices stay the same forever, or change only due to politics, marketing, and fads. Which is exactly how most change does in fact happen in education.  If you wonder why educational outcomes change so slowly, if at all, you need look no further than this.

Why is it that educators pay so little attention to research, whatever its outcomes, much in contrast to the situation in many other fields?  Some people argue that, unlike medicine, where doctors are well trained in research, educators lack such training.  Yet agriculture makes far more practical use of evidence than education does, and most farmers, while outstanding in their fields, are not known for their research savvy.

Farmers are, however, very savvy business owners, and they can clearly see that their financial success depends on using seeds, stock, methods, fertilizers, and insecticides proven to be effective, cost-effective, and sustainable.  Similarly, research plays a crucial role in technology, engineering, materials science, and every applied field in which better methods, with proven outcomes, lead to increased profits.

So one major reason for limited use of research in education is that adopting proven methods in education rarely leads to enhanced profit. Even in parts of the educational enterprise where profit is involved, economic success still depends far more on politics, marketing, and fads than on evidence. Outcomes of adopting proven programs or practices may not have an obvious impact on overall school outcomes because achievement is invariably tangled up with factors such as social class of children and schools’ abilities to attract skilled teachers and principals. Ask parents whether they would rather have their child go to a school in which all students have educated, upper-middle class parents, or to a school that uses proven instructional strategies in every subject and grade level. The problem is that there are only so many educated, upper-middle class parents to go around, so schools and parents often focus on getting the best possible demographics in their school rather than on adopting proven teaching methods.

How can education begin to make the rapid, irreversible improvements characteristic of agriculture, technology, and medicine?  The answer has to take into account the fundamental fact that education is a government monopoly.  I’m not arguing whether or not this is a good thing, but it is certain to be true for many years, perhaps forever.  The parts of education that are not part of government are private schools, and these are very few in number (charter schools are funded by government, of course).

Because government funds nearly all schools, it has both the responsibility and the financial capacity to do whatever is feasible to make schools as effective as it possibly can. This is true of all levels of government, federal, state, and local. Because it is in charge of all federal research funding, the federal government is the most logical organization to lead any efforts to increase use of proven programs and practices in education, but forward-looking state and local governments could also play a major role if they chose to do so.

Government can and must take on the role that profit plays in other research-focused fields, such as agriculture, medicine, and engineering.   As I’ve argued many times, government should use national funding to incentivize schools to adopt proven programs.  For example, the federal government could provide funding to schools to enable them to pay the costs of adopting programs found to be effective in rigorous research.  Under ESSA, it is already doing this, but right now the main focus is only on Title I school improvement grants.   These go to schools that are among the lowest performers in their states.  School improvement is a good place to start, but it affects a modest number of extremely disadvantaged schools.  Such schools do need substantial funding and expertise to make the substantial gains they are asked to make, but they are so unlike the majority of Title I schools that they are not sufficient examples of what evidence-based reform could achieve.  Making all Title I schools eligible for incentive funding to implement proven programs, or at least working toward this goal over time, would arouse the interest and enthusiasm of a much greater set of schools, virtually all of which need major changes in practices to reach national standards.

To make this policy work, the federal government would need to add considerably to the funding it provides for educational research and development, and it would need to rigorously evaluate programs that show the greatest promise to make large, pragmatically important differences in schools’ outcomes in key areas, such as reading, mathematics, science, and English for English learners.  One way to do this cost-effectively would be to allow districts (or consortia of districts) to put forward pairs of matched schools for potential funding.   Districts or consortia awarded grants might then be evaluated by federal contractors, who would randomly assign one school in each pair to receive the program, while the pair members not selected would serve as a control group.  In this way, programs that had been found effective in initial research might have their evaluations replicated many times, at a very low evaluation cost.  This pair evaluation design could greatly increase the number of schools using proven programs, and could add substantially to the set of programs known to be effective.  This design could also give many more districts experience with top-quality experimental research, building support for the idea that research is of value to educators and students.
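To make the pair evaluation design concrete, here is a minimal sketch of the random assignment step, assuming districts have already submitted their matched pairs. The school names, the function name, and the seed are hypothetical, and a real evaluation would involve far more than this (documenting the matching criteria, pre-registration, and outcome analysis, for example).

```python
import random


def assign_pairs(pairs, seed=None):
    """Randomly pick one school in each matched pair to receive the program.

    `pairs` is a list of (school_a, school_b) tuples submitted by districts.
    The school not picked in each pair serves as the control.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    assignments = []
    for school_a, school_b in pairs:
        # A fair coin flip within each pair
        if rng.random() < 0.5:
            treatment, control = school_a, school_b
        else:
            treatment, control = school_b, school_a
        assignments.append({"treatment": treatment, "control": control})
    return assignments


# Hypothetical matched pairs put forward by a district or consortium
pairs = [
    ("School A1", "School A2"),
    ("School B1", "School B2"),
    ("School C1", "School C2"),
]

for row in assign_pairs(pairs, seed=2019):
    print(row)
```

Because the coin is flipped within each pair, every pair contributes exactly one program school and one control school, which is what keeps the two groups comparable on whatever characteristics were used to match them.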

Getting back to Superman and Lois Lane, it is only natural to expect that Lois might be reluctant to get on another helicopter anytime soon, no matter what the evidence says.  However, when we are making decisions on behalf of children, it’s not enough to just pay attention to our own personal experience.  Listen to Superman.  The evidence matters.

The Farmer and the Moon Rocks: What Did the Moon Landing Do For Him?

Many, many years ago, during the summer after my freshman year in college, I hitchhiked from London to Iran.  This was the summer of 1969, so Apollo 11 was also traveling.   I saw television footage of the moon landing in Heraklion, Crete, where a television store switched on all of its sets and turned them toward the sidewalk.  A large crowd watched the whole thing.  This was one of the few times I recall when it was really cool to be an American abroad.

After leaving Greece, I went on to Turkey, and then Iran.  In Teheran, I got hold of an English-language newspaper.  It told an interesting story.  In rural Iran, many people believed that the moon was a goddess.  Obviously, a spaceship cannot land on a goddess, so many people concluded that the moon landing must be a hoax.

A reporter from the newspaper interviewed a number of people about the moon landing.  Some were adamant that the landing could not have happened.  However, one farmer was more pragmatic.  He asked the reporter, “I hear the astronauts brought back moon rocks.  Is that right?”

“That’s what they say!” replied the reporter.

“I am fixing my roof, and I could sure use a few of those moon rocks.  Do you think they might give me some?”


The moon rock story illustrates a daunting problem in the dissemination of educational research. Researchers do high-quality research on topics of great importance to the practice of education. They publish this research in top journals, and get promotions and awards for it, but in most cases, their research does not arouse even the slightest bit of interest among the educators for whom it was intended.

The problem relates to the farmer repairing his roof.  He had a real problem to solve, and he needed help with it.  A reporter comes and tells him about the moon landing. The farmer does not think, “How wonderful!  What a great day for science and discovery and the future of mankind!”  Instead, he thinks, “What does this have to do with me?”  Thinking back on the event, I sometimes wonder if he really expected any moon rocks, or if he was just sarcastically saying, “I don’t care.”

Educators care deeply about their students, and they will do anything they can to help them succeed.  But if they hear about research that does not relate to their children, or at least to children like theirs, they are unlikely to care very much.  Even if the research is directly applicable to their students, they are likely to reason, perhaps from long experience, that they will never get access to this research, because it costs money or takes time or upsets established routines or is opposed by powerful groups or whatever.  The result is status quo as far as the eye can see, or implementation of small changes that are currently popular but unsupported by evidence of effectiveness.  Ultimately, the result is cynicism about all research.

Part of the problem is that education is effectively a government monopoly, so entrepreneurship and responsible innovation are difficult to start or maintain.  However, the fact that education is a government monopoly can also be made into a positive, if government leaders are willing to encourage and support evidence-based reform.

Imagine that government decided to provide incentive funding to schools to help them adopt programs that meet a high standard of evidence.  This has actually happened under the ESSA law, but only in a very narrow slice of schools, those very low achieving schools that qualify for school improvement.  Imagine that the government provided a lot more support to schools to help them learn about, adopt, and effectively implement proven programs, and then gradually expanded the categories of schools that could qualify for this funding.

Going back to the farmer and the moon rocks, such a policy would forge a link between exciting research on promising innovations and the real world of practice.  It could cause educators to pay much closer attention to research on practical programs of relevance to them, and to learn how to tell the difference between valid and biased research.  It could help educators become sophisticated and knowledgeable consumers of evidence and of programs themselves.

One of the best examples of the transformation such policies could bring about is agriculture.  Research has a long history in agriculture, and from colonial times, government has encouraged and incentivized farmers to pay attention to evidence about new practices, new seeds, new breeds of animals, and so on.  By the late 19th century, the U.S. Department of Agriculture was sponsoring research, distributing information designed to help farmers be more productive, and much more.  Today, research in agriculture is a huge enterprise, constantly making important discoveries that improve productivity and reduce costs.  As a result, world agriculture, especially American agriculture, is able to support far larger populations at far lower costs than anyone ever thought possible.  The Iranian farmer talking about the moon rocks could not see how advances in science could possibly benefit him personally.  Today, however, in every developed economy, farmers have a clear understanding of the connection between advances in science and their own success.  Everyone knows that agriculture can have bad as well as good effects, as when new practices lead to pollution, but when governments decide to solve those problems, they turn to science. Science is not inherently good or bad, but if it is powerful, then democracies can direct it to do what is best for people.

Agriculture has made dramatic advances over the past hundred years, and continues to make rapid progress by linking science to practice.  In education, we are just starting to make the link between evidence and practice.  Isn’t it time to learn from the experiences of medicine, technology, and agriculture, among many other evidence-based fields, to achieve more rapid progress in educational practice and outcomes?


Educational Policies vs. Educational Programs: Evidence from France

Ask any parent what their kids say when they ask them what they did in school today. Invariably, they respond, “Nuffin,” or some equivalent. My four-year-old granddaughter always says, “I played with my fwends.” All well and good.

However, in educational policy, policy makers often give the very same answer when asked, “What did the schools not using the (insert latest policy darling) do?”

“Nuffin’.” Or they say, “Whatever they usually do.” There’s nothing wrong with the latter answer if it’s true. But given the many programs now known to improve student achievement (see www.evidenceforessa.org), why don’t evaluators compare outcomes of new policy initiatives to those of proven educational programs known to improve the same outcomes the policy innovation is supposed to improve, perhaps at far lower cost per student? The evaluations should also compare to “business as usual,” but adding proven programs to evaluations of large policy innovations would help avoid declaring policy innovations to be successful when they are in fact just slightly more effective than “business as usual,” and much less effective or less cost-effective than alternative proven approaches. For example, when evaluating charter schools, why not routinely compare them to whole-school reform models that have similar objectives? When evaluating extending the school day or school year to help high-poverty schools, why not compare these innovations to using the same amount of additional money to hire tutors who use proven tutoring models to help struggling students? In evaluating policies in which students are held back if they do not read at grade level by third grade, why not compare these approaches to intensive phonics instruction and tutoring in grades K-3, which are known to greatly improve student reading achievement?

[Photo caption: There is nuffin like a good fwend.]

As one example of research comparing a policy intervention to a promising educational intervention, I recently saw a very interesting pair of studies from France. Ecalle, Gomes, Auphan, Cros, & Magnan (2019) compared two interventions applied in special priority areas with high poverty levels. Both interventions focused on reading in first grade.

One of the interventions involved halving class size, from approximately 24 students to 12. The other provided intensive reading instruction in small groups (4-6 children) to students who were struggling in reading, as well as less intensive interventions to larger groups (10-12 students). Low achievers got two 30-minute interventions each day for a year, while the higher-performing readers got one 30-minute intervention each day. In both cases, the focus of instruction was on phonics. In all cases, the additional interventions were provided by the students’ usual teachers.

The students in small classes were compared to students in ordinary-sized classes, while the students in the educational intervention were compared to students in same-sized classes who did not get the group interventions. Similar measures and analyses were used in both comparisons.

The results were nearly identical for the class size policy and the educational intervention. Halving class size had effect sizes of +0.14 for word reading and +0.22 for spelling. Results for the educational intervention were +0.13 for word reading, +0.12 for spelling, +0.14 for a group test of reading comprehension, +0.32 for an individual test of comprehension, and +0.19 for fluency.

These studies are less than perfect in experimental design, but they are nevertheless interesting. Most importantly, the class size policy required an additional teacher for each class of 24. Using Maryland annual teacher salaries and benefits ($84,000), that means the cost in our state would be about $3500 per student. The educational intervention required one day of training and some materials. There was virtually no difference in outcomes, but the differences in cost were staggering.
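The per-student figure above follows from simple division, which can be sketched as below. This assumes, as the paragraph does, that halving a class of 24 means adding one teacher per 24 students at $84,000 in salary and benefits. The function name `added_cost_per_student` is hypothetical.

```python
def added_cost_per_student(teacher_cost, original_class_size):
    """Cost per student of adding one teacher to each class of the given size."""
    return teacher_cost / original_class_size

# Figures from the text: $84,000 per teacher, classes of about 24 students.
cost = added_cost_per_student(teacher_cost=84_000, original_class_size=24)
print(f"${cost:,.0f} per student")  # prints "$3,500 per student"
```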

The class size policy was mandated by the Ministry of Education. The educational intervention was offered to schools and provided by a university and a non-profit. As is so often the case, the policy intervention was simplistic, easy to describe in the newspaper, and minimally effective. The class size policy reminds me of a Florida program that extended the school schedule by an hour every day in high-poverty schools, mainly to provide more time for reading instruction. The cost per child was about $800 per year. The outcomes were minimal (ES=+0.05).

After many years of watching what schools do and reviewing research on outcomes of innovations, I find it depressing that policies mandated on a substantial scale are so often found to be ineffective. They are usually far more expensive than much more effective, rigorously evaluated programs that are, however, a bit more difficult to describe, and rarely arouse great debate in the political arena. It’s not that anyone is opposed to the educational intervention, but it is a lot easier to carry a placard saying “Reduce Class Size Now!” than to carry one saying “Provide Intensive Phonics in Small Groups with More Supplemental Teaching for the Lowest Achievers Now!” The latter just does not fit on a placard, and though easy to understand if explained, it does not lend itself to easy communication. Actually, there are much more effective first grade interventions than the one evaluated in France (see www.evidenceforessa.org). At a cost much less than $3500 per student, several one-to-one tutoring programs using well-trained teaching assistants as tutors would have been able to produce an effect size of more than +0.50 for all first graders on average. This would even fit on a placard: “Tutoring Now!”

I am all in favor of trying out policy innovations. But when kids in a proven-program comparison group are asked by their parents what they did in school today, they shouldn’t say “nuffin’.” They should say, “My tooter taught me to read. And I played with my fwends.”

References

Ecalle, J., Gomes, C., Auphan, P., Cros, L., & Magnan, A. (2019). Effects of policy and educational interventions intended to reduce difficulties in literacy skills in grade 1. Studies in Educational Evaluation, 61, 12-20.


Benchmark Assessments: Weighing the Pig More Often?

There is an old saying about educational assessment: “If you want to fatten a pig, it doesn’t help to weigh it more often.”

To be fair, it may actually help to weigh pigs more often, so the farmer knows whether they are gaining weight at the expected levels. Then the farmer can do something in time if this is not the case.

It is surely correct that weighing pigs does no good in itself, but it may serve a diagnostic purpose. What matters is not the weighing, but rather what the farmer or veterinarian does based on the information provided by the weighing.


This blog is not, however, about porcine policy, but educational policy. In schools, districts, and even whole states, most American children take “benchmark assessments” roughly three to six times a year. These assessments are intended to tell teachers, principals, and other school leaders how students are doing, especially in reading and math. Ideally, benchmark assessments are closely aligned with state accountability tests, making it possible for school leaders to predict how whole grade levels are likely to do on the state tests early enough in the year to enable them to provide additional assistance in areas of need. The information might be as detailed as “fourth graders need help in fractions” or “English learners need help in vocabulary.”

Benchmark assessments are only useful if they improve scores on state accountability tests. Other types of intervention may be beneficial even if they do not make any difference in state test scores, but it is hard to see why benchmark assessments would be valuable if they do not in fact have any impact on state tests, or other standardized tests.

So here is the bad news: Research finds that benchmark assessments do not make any difference in achievement.

High-quality, large-scale randomized evaluations of benchmark assessments are relatively easy to do. Many have in fact been done. Use of benchmark assessments has been evaluated in elementary reading and math (see www.bestevidence.org). Here is a summary of the findings.

Subject               Number of Studies   Mean Effect Size
Elementary Reading    6                   -0.02
Elementary Math       4                    0.00
Study-weighted mean   10                  -0.01
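
For readers who want to check the bottom row, here is a minimal sketch of the calculation. It assumes that "study-weighted mean" means each subject's mean effect size is weighted by its number of studies. The function name `study_weighted_mean` is hypothetical.

```python
def study_weighted_mean(rows):
    """Average effect size, weighting each row by its number of studies.

    `rows` is a list of (number_of_studies, mean_effect_size) tuples.
    """
    total_studies = sum(n for n, _ in rows)
    return sum(n * es for n, es in rows) / total_studies

# Rows from the table: elementary reading, then elementary math.
rows = [(6, -0.02), (4, 0.00)]
mean_es = study_weighted_mean(rows)
print(round(mean_es, 2))  # prints -0.01
```

Under that assumption the unrounded value is about -0.012, which rounds to the -0.01 shown in the table.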

In a rational world, these findings would put an end to benchmark assessments, at least as they are used now. The average outcomes are not just small, they are zero. Meanwhile, the assessments use up a lot of student time and district money.

In our accountability-obsessed educational culture, how could use of benchmark assessments make no difference at all on the only measure they are intended to improve? I would suggest several possibilities.

The first, and perhaps the most likely, is that teachers and schools do not do much with the information from benchmark assessments. If you are trying to lose weight, you likely weigh yourself every day. But if you then make no systematic effort to change your diet or increase your exercise, then all those weighings are of little value. In education, the situation is much worse than in weight reduction, because teachers are each responsible for 20-30 students. Results of benchmark assessments are different for each student, so a school staff that learns that its fourth graders need improvement in fractions finds it difficult to act on this information. Some fourth graders in every school are excelling in fractions, some just need a little help, and some are struggling in fractions because they missed the prerequisite skills. “Teach more fractions” is not a likely solution except for some of that middle group, yet differentiating instruction for all students is difficult to do well.

Second, it takes time to score and return benchmark assessments, so by the time a team of teachers decides how to respond to benchmark information, the situation has moved on.

Third, benchmark assessments may add little because teachers and principals already know a lot more about their students than any test can tell them. Imagine a principal receiving the information that her English learners need help in vocabulary. I’m going to guess that she already knows that. But more than that, she and her teachers know which English learners need what kind of vocabulary, and they have other measures and means of finding out. Teachers already give a lot of brief, targeted curriculum-linked assessments, and they always have. Further, wise teachers stroll around and listen in on students working in cooperative groups, or look at their tests or seatwork or progress on computer curriculum, to get a sophisticated understanding of why some students are having trouble, and ideas for what to do about it. For example, it is possible that English learners are lacking school-specific vocabulary, such as that related to science or social studies, and this observation may suggest solutions (e.g., teach more science and social studies). But what if some English learners are afraid or unwilling to express themselves in class, but sit quietly and never volunteer answers? A completely different set of solutions might be appropriate in this case, such as using cooperative learning or tutoring strategies to give students safe spaces in which to use the vocabulary they have, and gain motivation and opportunities to learn and use more.

Benchmark assessments fall into the enormous category of educational solutions that are simple, compelling, and wrong. Yes, teachers need to know what students are learning and what is needed to improve it, but they have available many more tools that are far more sensitive, useful, timely, and tied to actions teachers can take.

Eliminating benchmark assessments would save schools a lot of money. Perhaps that money could be redirected to professional development to help teachers use approaches actually proven to work. I know, that’s crazy talk. But perhaps if we looked at what students are actually doing and learning in class, we could stop weighing pigs and start improving teaching for all children.
