New Sections on Social Emotional Learning and Attendance in Evidence for ESSA!

We are proud to announce the launch of two new sections of our Evidence for ESSA website (www.evidenceforessa.org): K-12 social-emotional learning and attendance. Funded by a grant from the Bill and Melinda Gates Foundation, the new sections represent our first foray beyond academic achievement.

blog_2-6-20_evidenceessa_500x333

The social-emotional learning section represents the greatest departure from our prior work. This is due to the nature of SEL, which combines many quite diverse measures. We identified 17 distinct measures, which we grouped in four overarching categories, as follows:

Academic Competence

  • Academic performance
  • Academic engagement

Problem Behaviors

  • Aggression/misconduct
  • Bullying
  • Disruptive behavior
  • Drug/alcohol abuse
  • Sexual/racial harassment or aggression
  • Early/risky sexual behavior

Social Relationships

  • Empathy
  • Interpersonal relationships
  • Pro-social behavior
  • Social skills
  • School climate

Emotional Well-Being

  • Reduction of anxiety/depression
  • Coping skills/stress management
  • Emotional regulation
  • Self-esteem/self-efficacy

Evidence for ESSA reports overall effect sizes and ratings for each of the four categories, as well as the 17 individual measures (which are themselves composed of many measures used by various qualifying studies). So in contrast to reading and math, where programs are rated based on the average of all qualifying  reading or math measures, an SEL program could be rated “strong” in one category, “promising” in another, and “no qualifying evidence” or “qualifying studies found no significant positive effects” on others.

Social-Emotional Learning

The SEL review, led by Sooyeon Byun, Amanda Inns, Cynthia Lake, and Liz Kim at Johns Hopkins University, located 24 SEL programs that both met our inclusion standards and had at least one study that met strong, moderate, or promising standards on at least one of the four categories of outcomes.

There is much more evidence at the elementary and middle school levels than at the high school level. Recognizing that some programs had qualifying outcomes at multiple levels, there were 7 programs with positive evidence for pre-K/K, 10 for 1-2, 13 for 3-6, and 9 for middle school. In contrast, there were only 4 programs with positive effects in senior high schools. Fourteen studies took place in urban locations, 5 in suburbs, and 5 in rural districts.

The outcome variables most often showing positive impacts include social skills (12), school climate (10), academic performance (10), pro-social behavior (8), aggression/misconduct (7), disruptive behavior (7), academic engagement (7), interpersonal relationships (7), anxiety/depression (6), bullying (6), and empathy (5). Fifteen of the programs targeted whole classes or schools, and 9 targeted individual students.

Several programs stood out in terms of the size of the impacts. Take the Lead found effect sizes of +0.88 for social relationships and +0.51 for problem behaviors. Check, Connect, and Expect found effect sizes of +0.51 for emotional well-being, +0.29 for problem behaviors, and +0.28 for academic competence. I Can Problem Solve found effect sizes of +0.57 on school climate. The Incredible Years Classroom and Parent Training Approach reported effect sizes of +.57 for emotional regulation, +0.35 for pro-social behavior, and +0.21 for aggression/misconduct. The related Dinosaur School classroom management model reported effect sizes of +0.31 for aggression/misbehavior. Class-Wide Function-Related Intervention Teams (CW-FIT), an intervention for elementary students with emotional and behavioral disorders, had effect sizes of +0.47 and +0.30 across two studies for academic engagement and +0.38 and +0.21 for disruptive behavior. It also reported effect sizes of +0.37 for interpersonal relationships, +0.28 for social skills, and +0.26 for empathy. Student Success Skills reported effect sizes of +0.30 for problem behaviors, +0.23 for academic competence, and +0.16 for social relationships.

In addition to the 24 highlighted programs, Evidence for ESSA lists 145 programs that were no longer available, had no qualifying studies (e.g., no control group), or had one or more qualifying studies but none that met the ESSA Strong, Moderate, or Promising criteria. These programs can be found by clicking on the “search” bar.

There are many problems inherent to interpreting research on social-emotional skills. One is that some programs may appear more effective than others because they use measures such as self-report, or behavior ratings by the teachers who taught the program. In contrast, studies that used more objective measures, such as independent observations or routinely collected data, may obtain smaller impacts. Also, SEL studies typically measure many outcomes and only a few may have positive impacts.

In the coming months, we will be doing analyses and looking for patterns in the data, and will have more to say about overall generalizations. For now, the new SEL section provides a guide to what we know now about individual programs, but there is much more to learn about this important topic.

Attendance

Our attendance review was led by Chenchen Shi, Cynthia Lake, and Amanda Inns. It located ten attendance programs that met our standards. Only three of these reported on chronic absenteeism, which refers to students missing more than 10% of days. Many more focused on average daily attendance (ADA). Among programs focused on average daily attendance, a Milwaukee elementary school program called SPARK had the largest impact (ES=+0.25). This is not an attendance program per se, but it uses AmeriCorps members to provide tutoring services across the school, as well as involving families. SPARK has been shown to have strong effects on reading, as well as its impressive effects on attendance. Positive Action is another schoolwide approach, in this case focused on SEL. It has been found in two major studies in grades K-8 to improve student reading and math achievement, as well as overall attendance, with a mean effect size of +0.20.

The one program to report data on both ADA and chronic absenteeism is called Attendance and Truancy Intervention and Universal Procedures, or ATI-UP. It reported an effect size in grades K-6 of +0.19 for ADA and +0.08 for chronic attendance. Talent Development High School (TDHS) is a ninth grade intervention program that provides interdisciplinary learning communities and “double dose” English and math classes for students who need them. TDHS reported an effect size of +0.17.

An interesting approach with a modest effect size but very modest cost is now called EveryDay Labs (formerly InClass Today). This program helps schools organize and implement a system to send postcards to parents reminding them of the importance of student attendance. If students start missing school, the postcards include this information as well. The effect size across two studies was a respectable +0.16.

As with SEL, we will be doing further work to draw broader lessons from research on attendance in the coming months. One pattern that seems clear already is that effective attendance improvement models work on building close relationships between at-risk students and concerned adults. None of the effective programs primarily uses punishment to improve attendance, but instead they focus on providing information to parents and students and on making it clear to students that they are welcome in school and missed when they are gone.

Both SEL and attendance are topics of much discussion right now, and we hope these new sections will be useful and timely in helping schools make informed choices about how to improve social-emotional and attendance outcomes for all students.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Do School Districts Really Have Difficulty Meeting ESSA Evidence Standards?

The Center for Educational Policy recently released a report on how school districts are responding to the Every Student Succeeds Act (ESSA) requirement that schools seeking school improvement grants select programs that meet ESSA’s strong, moderate, or promising standards of evidence. Education Week ran a story on the CEP report.

The report noted that many states, districts, and schools are taking the evidence requirements seriously, and are looking at websites and consulting with researchers to help them identify programs that meet the standards. This is all to the good.

However, the report also notes continuing problems districts and schools are having finding out “what works.” Two particular problems were cited. One was that districts and schools were not equipped to review research to find out what works. The other was that rural districts and schools found few programs proven effective in rural schools.

I find these concerns astounding. The same concerns were expressed when ESSA was first passed, in 2015. But that was almost four years ago. Since 2015, the What Works Clearinghouse has added information to help schools identify programs that meet the top two ESSA evidence categories, strong and moderate. Our own Evidence for ESSA, launched in February, 2017, has up-to-date information on virtually all PK-12 reading and math programs currently in dissemination. Among hundreds of programs examined, 113 meet ESSA standards for strong, moderate, or promising evidence of effectiveness. WWC, Evidence for ESSA, and other sources are available online at no cost. The contents of the entire Evidence for ESSA website were imported into Ohio’s own website on this topic, and dozens of states, perhaps all of them, have informed their districts and schools about these sources.

The idea that districts and schools could not find information on proven programs if they wanted to do so is difficult to believe, especially among schools eligible for school improvement grants. Such schools, and the districts in which they are located, write a lot of grant proposals for federal and state funding. The application forms for school improvement grants always explain the evidence requirements, because that is the law. Someone in every state involved with federal funding knows about the WWC and Evidence for ESSA websites. More than 90,000 unique users have used Evidence for ESSA, and more than 800 more sign on each week.

blog_10-10-19_generickids_500x333

As to rural schools, it is true that many studies of educational programs have taken place in urban areas. However, 47 of the 113 programs qualified by Evidence for ESSA were validated in at least one rural study, or a study including a large enough rural sample to enable researchers to separately report program impacts for rural students. Also, almost all widely disseminated programs have been used in many rural schools. So rural districts and schools that care about evidence can find programs that have been evaluated in rural locations, or at least that were evaluated in urban or suburban schools but widely disseminated in rural schools.

Also, it is important to note that if a program was successfully evaluated only in urban or suburban schools, the program still meets the ESSA evidence standards. If no studies of a given outcome were done in rural locations, a rural school in need of better outcomes could, in effect, be asked to choose between a program proven to work somewhere and probably used in dissemination in rural schools, or they could choose a program not proven to work anywhere. Every school and district has to make the best choices for their kids, but if I were a rural superintendent or principal, I’d read up on proven programs, and then go visit some rural schools using that program nearby. Wouldn’t you?

I have no reason to suspect that the CEP survey is incorrect. There are many indications that district and school leaders often do feel that the ESSA evidence rules are too difficult to meet. So what is really going on?

My guess is that there are many district and school leaders who do not want to know about evidence on proven programs. For example, they may have longstanding, positive relationships with representatives of publishers or software developers, or they may be comfortable and happy with the materials and services they are already using, evidence-proven or not. If they do not have evidence of effectiveness that would pass muster with WWC or Evidence for ESSA, the publishers and software developers may push hard on state and district officials, put forward dubious claims for evidence (such as studies with no control groups), and do their best to get by in a system that increasingly demands evidence that they lack. In my experience, district and state officials often complain about having inadequate staff to review evidence of effectiveness, but their concern may be less often finding out what works as it is defending themselves from publishers, software developers, or current district or school users of programs, who maintain that they have been unfairly rated by WWC, Evidence for ESSA, or other reviews. State and district leaders who stand up to this pressure may have to spend a lot of time reviewing evidence or hearing arguments.

On the plus side, at the same time that publishers and software producers may be seeking recognition for their current products, many are also sponsoring evaluations of some of their products that they feel are mostly likely to perform well in rigorous evaluations. Some may be creating new programs that resemble programs that have met evidence standards. If the federal ESSA law continues to demand evidence for certain federal funding purposes, or even to expand this requirement to additional parts of federal grant-making, then over time the ESSA law will have its desired effect, rewarding the creation and evaluation of programs that do meet standards by making it easier to disseminate such programs. The difficulties the evidence movement is experiencing are likely to diminish over time as more proven programs appear, and as federal, state, district, and school leaders get comfortable with evidence.

Evidence-based reform was always going to be difficult, because of the amount of change it entails and the stakes involved. But sooner or later, it is the right thing to do, and leaders who insist on evidence will see increasing levels of learning among their students, at minimal cost beyond what they already spend on untested or ineffective approaches. Medicine went through a similar transition in 1962, when the U.S. Congress first required that medicines be rigorously evaluated for effectiveness and safety. At first, many leaders in the medical profession resisted the changes, but after a while, they came to insist on them. The key is political leadership willing to support the evidence requirement strongly and permanently, so that educators and vendors alike will see that the best way forward is to embrace evidence and make it work for kids.

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Proven Programs Can’t Replicate, Just Like Bees Can’t Fly

In the 1930’s, scientists in France announced that based on principles of aerodynamics, bees could not fly. The only evidence to the contrary was observational, atheoretical, quasi-scientific reports that bees do in fact fly.

The widely known story about bees’ ability to fly came up in a discussion about the dissemination of proven programs in education. Many education researchers and policy makers maintain that the research-development-evaluation-dissemination sequence relied upon for decades to create better ways to educate children has failed. Many observers note that few practitioners seek out research when they consider selection of programs intended to improve student learning or other important outcomes. Research Practice Partnerships, in which researchers work in partnership with local educators to solve problems of importance to the educators, is largely based on the idea that educators are unlikely to use programs or practices unless they personally were involved in creating them. Opponents of evidence-based education policies invariably complain that because schools are so diverse, they are unlikely to adopt programs developed and researched elsewhere, and this is why few research-based programs are widely disseminated.

Dissemination of proven programs is in fact difficult, and there is little evidence of how proven programs might be best disseminated. Recognizing these and many other problems, however, it is important to note one small fact in all this doom and gloom: Proven programs are disseminated. Among the 113 reading and mathematics programs that have met the stringent standards of Evidence for ESSA (www.evidenceforessa.org), most have been disseminated to dozens, hundreds, or thousands of schools. In fact, we do not accept programs that are not in active dissemination (because it is not terribly useful for educators, our target audience, to find out that a proven program is no longer available, or never was). Some (generally newer) programs may only operate in a few schools, but they intend to grow. But most programs, supported by non-profit or commercial organizations, are widely disseminated.

Examples of elementary reading programs with strong, moderate, or promising evidence of effectiveness (by ESSA standards) and wide dissemination include Reading Recovery, Success for All, Sound Partners, Lindamood, Targeted Reading Intervention, QuickReads, SMART, Reading Plus, Spell Read, Acuity, Corrective Reading, Reading Rescue, SuperKids, and REACH. For middle/high, effective and disseminated reading programs include SIM, Read180, Reading Apprenticeship, Comprehension Circuit Training, BARR, ITSS, Passport Reading Journeys, Expository Reading and Writing Course, Talent Development, Collaborative Strategic Reading, Every Classroom Every Day, and Word Generation.

In elementary math, effective and disseminated programs include Math in Focus, Math Expressions, Acuity, FocusMath, Math Recovery, Time to Know, Jump Math, ST Math, and Saxon Math. Middle/high school programs include ASSISTments, Every Classroom Every Day, eMINTS, Carnegie Learning, Core-Plus, and Larson Pre-Algebra.

These are programs that I know have strong, moderate, or promising evidence and are widely disseminated. There may be others I do not know about.

I hope this list convinces any doubters that proven programs can be disseminated. In light of this list, how can it be that so many educators, researchers, and policy makers think that proven educational programs cannot be disseminated?

One answer may be that dissemination of educational programs and practices almost never happens the way many educational researchers wish it did. Researchers put enormous energy into doing research and publishing their results in top journals. Then they are disappointed to find out that publishing in a research journal usually has no impact whatever on practice. They then often try to make their findings more accessible by writing them in plain English in more practitioner-oriented journals. Still, this usually has little or no impact on dissemination.

But writing in journals is rarely how serious dissemination happens. The way it does happen is that the developer or an expert partner (such as a publisher or software company) takes the research ideas and makes them into a program, one that solves a problem that is important to educators, is attractive, professional, and complete, and is not too expensive. Effective programs almost always provide extensive professional development, materials, and software. Programs that provide excellent, appealing, effective professional development, materials, and software become likely candidates for dissemination. I’d guess that virtually every one of the programs I listed earlier took a great idea and made it into an appealing program.

A depressing part of this process is that programs that have no evidence of effectiveness, or even have evidence of ineffectiveness, follow the same dissemination process as do proven programs. Until the 2015 ESSA evidence standards appeared, evidence had a very limited role in the whole development-dissemination process. So far, ESSA has pointed more of a spotlight on evidence of effectiveness, but it is still the case that having strong evidence of effectiveness does not provide a program with a decisive advantage over programs lacking positive evidence. Regardless of their actual evidence bases, most programs today make claims that their programs are “evidence-based” or at least “evidence-informed,” so users can easily be fooled.

However, this situation is changing. First, the government itself is identifying programs with evidence of effectiveness, and may publicize them. Government initiatives such as Investing in Innovation (i3; now called EIR) actually provide funding to proven programs to enable them to begin to scale up their programs. The What Works Clearinghouse (https://ies.ed.gov/ncee/wwc/), Evidence for ESSA (www.evidenceforessa.org), and other sources provide easy access to information on proven programs. In other words, government is starting to intervene to nudge the longstanding dissemination process toward programs proven to work.

blog_10-3-19_Bee_art_500x444Back to the bees, the 1930 conclusion that bees should not be able to fly was overturned in 2005, when American researchers observed what bees actually do when they fly, and discovered that bees do not flap their wings like birds. Instead, they push air forward and back with their wings, creating a low pressure zone above them. This pressure keeps them in the air.

In the same way, educational researchers might stop theorizing about how disseminating proven programs is impossible, but instead, observe several programs that have actually done it. Then we can design government policies to further assist proven programs to build the capital and the organizational capacity to effectively disseminate, and to provide incentives and assistance to help schools in need of proven programs to learn about and adopt them.

Perhaps we could call this Plan Bee.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Why Not the Best?

In 1879, Thomas Edison invented the first practical lightbulb. The main problem he faced was in finding a filament that would glow, but not burn out too quickly. To find it, he tried more than 6000 different substances that had some promise as filaments. The one he found was carbonized cotton, which worked far better than all the others (tungsten, which we use now, came much later).

Of course, the incandescent light changed the world. It replaced far more expensive gas lighting systems, and was much more versatile. The lightbulb captured the evening and nighttime hours for every kind of human activity.

blog_9-19-19_lightbulb_500x347Yet if the lightbulb had been an educational innovation, it probably would have been proclaimed a dismal failure. Skeptics would have noted that only one out of six thousand filaments worked. Meta-analysts would have averaged the effect sizes for all 6000 experiments and concluded that the average effect size across the 6000 filaments was only +0.000000001. Hardly worthwhile. If Edison’s experiments were funded by government, politicians would have complained that 5,999 of Edison’s filaments were a total waste of taxpayers’ money. Economists would have computed benefit-cost ratios and concluded that even if Edison’s light worked, the cost of making the first one was astronomical, not to mention the untold cost of setting up electrical generation and wiring systems.

This is all ridiculous, you must be saying. But in the world of evidence-based education, comparable things happen all the time. In 2003, Borman et al. did a meta-analysis of 300 studies of 29 comprehensive (whole-school) reform designs. They identified three as having solid evidence of effectiveness. Rather than celebrating and disseminating those three (and continuing research and development to identify more of them), the U.S. Congress ended its funding for dissemination of comprehensive school reform programs. Turn out the light before you leave, Mr. Edison!

Another common practice in education is to do meta-analyses averaging outcomes across an entire category of programs or policies, and ignoring the fact that some distinctively different and far more effective programs are swallowed up in the averages. A good example is charter schools. Large-scale meta-analyses by Stanford’s CREDO (2013) found that the average effect sizes for charter schools are effectively zero. A 2015 analysis found better, but still very small effect sizes in urban districts (ES = +0.04 in reading, +0.05 in math). The What Works Clearinghouse published a 2010 review that found slight negative effects of middle school charters. These findings are useful in disabusing us of the idea that charter schools are magic, and get positive outcomes just because they are charter schools. However, they do nothing to tell us about extraordinary charter schools using methods that other schools (perhaps including non-charters) could also use. There is more positive evidence relating to “no-excuses” schools, such as KIPP and Success Academies, but among the thousands of charters that now exist, is this the only type of charter worth replicating? There must be some bright lights among all these bulbs.

As a third example, there are now many tutoring programs used in elementary reading and math with struggling learners. The average effect sizes for all forms of tutoring average about +0.30, in both reading and math. But there are reading tutoring approaches with effect sizes of +0.50 or more. If these programs are readily available, why would schools adopt programs less effective than the best? The average is useful for research purposes, and there are always considerations of costs and availability, but I would think any school would want to ignore the average for all types of programs and look into the ones that can do the most for their kids, at a reasonable cost.

I’ve often heard teachers and principals point out that “parents send us the best kids they have.” Yes they do, and for this reason it is our responsibility as educators to give those kids the best programs we can. We often describe educating students as enlightening them, or lifting the lamp of learning, or fiat lux. Perhaps the best way to fiat a little more lux is to take a page from Edison, the great luxmeister: Experiment tirelessly until we find what works. Then use the best we have.

Reference

Borman, G.D., Hewes, G. M., Overman, L.T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125-230.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 

Superman and Statistics

In the 1978 movie “Superman,” Lois Lane, star journalist, crash-lands in a helicopter on top of a 50-story skyscraper.   The helicopter is hanging by a strut to the edge of the roof, and Lois is hanging on to a microphone cord.  Finally, the cord breaks, and Lois falls 45 floors before (of course) she is swooped up by Superman, who flies her back to the roof and sets her down gently. Then he says to her:

“I hope this doesn’t put you off of flying. Statistically speaking, it is the safest form of travel.”

She faints.

blog_8-29-19_superman_333x500
Don’t let the superhero thing fool you: The “S” is for “statistics.”

I’ve often had the very same problem whenever I do public speaking.  As soon as I mention statistics, some of the audience faints dead away. Or perhaps they are falling asleep. But either way, saying the word “statistics” is not usually a good way to make friends and influence people.

 

The fact is, most people don’t like statistics.  Or more accurately, people don’t like statistics except when the statistical findings agree with their prejudices.  At an IES meeting several years ago, a well-respected superintendent was invited to speak to what is perhaps the nerdiest, most statistically-minded group in all of education, except for an SREE conference.  He actually said, without the slightest indication of humor or irony, that “GOOD research is that which confirms what I have always believed.  BAD research is that which disagrees with what I have always believed.”  I’d guess that the great majority of superintendents and other educational leaders would agree, even if few would say so out loud to an IES meeting.

If educational leaders only attend to statistics that confirm their prior beliefs, one might argue that, well, at least they do attend to SOME research.  But research in an applied field like education is of value only if it leads to positive changes in practice.  If influential educators only respect research that confirms their previous beliefs, then they never change their practices or policies because of research, and policies and practices stay the same forever, or change only due to politics, marketing, and fads. Which is exactly how most change does in fact happen in education.  If you wonder why educational outcomes change so slowly, if at all, you need look no further than this.

Why is it that educators pay so little attention to research, whatever its outcomes, much in contrast to the situation in many other fields?  Some people argue that, unlike medicine, where doctors are well trained in research, educators lack such training.  Yet agriculture makes far more practical use of evidence than education does, and most farmers, while outstanding in their fields, are not known for their research savvy.

Farmers are, however, very savvy business owners, and they can clearly see that their financial success depends on using seeds, stock, methods, fertilizers, and insecticides proven to be effective, cost-effective, and sustainable.  Similarly, research plays a crucial role in technology, engineering, materials science, and every applied field in which better methods, with proven outcomes, lead to increased profits.

So one major reason for limited use of research in education is that adopting proven methods in education rarely leads to enhanced profit.  Even in parts of the educational enterprise where profit is involved, economic success still depends far more on politics, marketing, and fads, than on evidence. Outcomes of adopting proven programs or practices may not have an obvious impact on overall school outcomes because achievement is invariably tangled up with factors such as social class of children and schools’ abilities to attract skilled teachers and principals.  Ask parents whether they would rather have their child to go to a school in which all students have educated, upper-middle class parents, or to a school that uses proven instructional strategies in every subject and grade level.  The problem is that there are only so many educated, upper-middle class parents to go around, so schools and parents often focus on getting the best possible demographics in their school rather than on adopting proven teaching methods.

How can education begin to make the rapid, irreversible improvements characteristic of agriculture, technology, and medicine?  The answer has to take into account the fundamental fact that education is a government monopoly.  I’m not arguing whether or not this is a good thing, but it is certain to be true for many years, perhaps forever.  The parts of education that are not part of government are private schools, and these are very few in number (charter schools are funded by government, of course).

Because government funds nearly all schools, it has both the responsibility and the financial capacity to do whatever is feasible to make schools as effective as it possibly can.  This is true of all levels of government, federal, state, and local.  Because it is in charge of all federal research funding, the federal government is the most logical organization to lead any efforts to increase use of proven programs and practices in education, but forward-looking state and local government could also play a major role if they chose to do so.

Government can and must take on the role that profit plays in other research-focused fields, such as agriculture, medicine, and engineering.   As I’ve argued many times, government should use national funding to incentivize schools to adopt proven programs.  For example, the federal government could provide funding to schools to enable them to pay the costs of adopting programs found to be effective in rigorous research.  Under ESSA, it is already doing this, but right now the main focus is only on Title I school improvement grants.   These go to schools that are among the lowest performers in their states.  School improvement is a good place to start, but it affects a modest number of extremely disadvantaged schools.  Such schools do need substantial funding and expertise to make the substantial gains they are asked to make, but they are so unlike the majority of Title I schools that they are not sufficient examples of what evidence-based reform could achieve.  Making all Title I schools eligible for incentive funding to implement proven programs, or at least working toward this goal over time, would arouse the interest and enthusiasm of a much greater set of schools, virtually all of which need major changes in practices to reach national standards.

To make this policy work, the federal government would need to add considerably to the funding it provides for educational research and development, and it would need to rigorously evaluate programs that show the greatest promise to make large, pragmatically important differences in schools’ outcomes in key areas, such as reading, mathematics, science, and English for English learners.  One way to do this cost-effectively would be to allow districts (or consortia of districts) to put forward pairs of matched schools for potential funding.   Districts or consortia awarded grants might then be evaluated by federal contractors, who would randomly assign one school in each pair to receive the program, while the pair members not selected would serve as a control group.  In this way, programs that had been found effective in initial research might have their evaluations replicated many times, at a very low evaluation cost.  This pair evaluation design could greatly increase the number of schools using proven programs, and could add substantially to the set of programs known to be effective.  This design could also give many more districts experience with top-quality experimental research, building support for the idea that research is of value to educators and students.

Getting back to Superman and Lois Lane, it is only natural to expect that Lois might be reluctant to get on another helicopter anytime soon, no matter what the evidence says.  However, when we are making decisions on behalf of children, it’s not enough to just pay attention to our own personal experience.  Listen to Superman.  The evidence matters.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

The Farmer and the Moon Rocks: What Did the Moon Landing Do For Him?

Many, many years ago, during the summer after my freshman year in college, I hitchhiked from London to Iran.  This was the summer of 1969, so Apollo 11 was also traveling.   I saw television footage of the moon landing in Heraklion, Crete, where a television store switched on all of its sets and turned them toward the sidewalk.  A large crowd watched the whole thing.  This was one of the few times I recall when it was really cool to be an American abroad.

After leaving Greece, I went on to Turkey, and then Iran.  In Teheran, I got hold of an English-language newspaper.  It told an interesting story.  In rural Iran, many people believed that the moon was a goddess.  Obviously, a spaceship cannot land on a goddess, so many people concluded that the moon landing must be a hoax.

A reporter from the newspaper interviewed a number of people about the moon landing.  Some were adamant that the landing could not have happened.  However, one farmer was more pragmatic.  He asked the reporter, “I hear the astronauts brought back moon rocks.  Is that right?”

“That’s what they say!” replied the reporter.

“I am fixing my roof, and I could sure use a few of those moon rocks.  Do you think they might give me some?”

blog_8-1-19_moonfarmer_500x432 (002)

The moon rock story illustrates a daunting problem in the dissemination of educational research. Researchers do high-quality research on topics of great importance to the practice of education. They publish this research in top journals, and get promotions and awards for it, but in most cases, their research does not arouse even the slightest bit of interest among the educators for whom it was intended.

The problem relates to the farmer repairing his roof.  He had a real problem to solve, and he needed help with it.  A reporter comes and tells him about the moon landing. The farmer does not think, “How wonderful!  What a great day for science and discovery and the future of mankind!”  Instead, he thinks, “What does this have to do with me?”  Thinking back on the event, I sometimes wonder if he really expected any moon rocks, or if he was just sarcastically saying, “I don’t care.”

Educators care deeply about their students, and they will do anything they can to help them succeed.  But if they hear about research that does not relate to their children, or at least to children like theirs, they are unlikely to care very much.  Even if the research is directly applicable to their students, they are likely to reason, perhaps from long experience, that they will never get access to this research, because it costs money or takes time or upsets established routines or is opposed by powerful groups or whatever.  The result is status quo as far as the eye can see, or implementation of small changes that are currently popular but unsupported by evidence of effectiveness.  Ultimately, the result is cynicism about all research.

Part of the problem is that education is effectively a government monopoly, so entrepreneurship or responsible innovation are difficult to start or maintain.  However, the fact that education is a government monopoly can also be made into a positive, if government leaders are willing to encourage and support evidence-based reform.

Imagine that government decided to provide incentive funding to schools to help them adopt programs that meet a high standard of evidence.  This has actually happened under the ESSA law, but only in a very narrow slice of schools, those very low achieving schools that qualify for school improvement.  Imagine that the government provided a lot more support to schools to help them learn about, adopt, and effectively implement proven programs, and then gradually expanded the categories of schools that could qualify for this funding.

Going back to the farmer and the moon rocks, such a policy would forge a link between exciting research on promising innovations and the real world of practice.  It could cause educators to pay much closer attention to research on practical programs of relevance to them, and to learn how to tell the difference between valid and biased research.  It could help educators become sophisticated and knowledgeable consumers of evidence and of programs themselves.

One of the best examples of the transformation such policies could bring about is agriculture.  Research has a long history in agriculture, and from colonial times, government has encouraged and incentivized farmers to pay attention to evidence about new practices, new seeds, new breeds of animals, and so on.  By the late 19th century, the U.S. Department of Agriculture was sponsoring research, distributing information designed to help farmers be more productive, and much more.  Today, research in agriculture is a huge enterprise, constantly making important discoveries that improve productivity and reduce costs.  As a result, world agriculture, especially American agriculture, is able to support far larger populations at far lower costs than anyone ever thought possible.  The Iranian farmer talking about the moon rocks could not see how advances in science could possibly benefit him personally.  Today, however, in every developed economy, farmers have a clear understanding of the connection between advances in science and their own success.  Everyone knows that agriculture can have bad as well as good effects, as when new practices lead to pollution, but when governments decide to solve those problems, they turn to science. Science is not inherently good or bad, but if it is powerful, then democracies can direct it to do what is best for people.

Agriculture has made dramatic advances over the past hundred years, and continues to make rapid progress by linking science to practice.  In education, we are just starting to make the link between evidence and practice.  Isn’t it time to learn from the experiences of medicine, technology, and agriculture, among many other evidence based fields, to achieve more rapid progress in educational practice and outcomes?

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Educational Policies vs. Educational Programs: Evidence from France

Ask any parent what their kids say when they ask them what they did in school today. Invariably, they respond, “Nuffin,” or some equivalent. My four-year-old granddaughter always says, “I played with my fwends.” All well and good.

However, in educational policy, policy makers often give the very same answer when asked, “What did the schools not using the (insert latest policy darling) do?”

“Nuffin’”. Or they say, “Whatever they usually do.” There’s nothing wrong with the latter answer if it’s true. But given the many programs now known to improve student achievement (see www.evidenceforessa.org), why don’t evaluators compare outcomes of new policy initiatives to those of proven educational programs known to improve the same outcomes the policy innovation is supposed to improve, perhaps at far lower cost per student? The evaluations should also compare to “business as usual,” but adding proven programs to evaluations of large policy innovations would help avoid declaring policy innovations to be successful when they are in fact just slightly more effective than “business as usual,” and much less effective or less cost-effective than alternative proven approaches? For example, when evaluating charter schools, why not routinely compare them to whole-school reform models that have similar objectives? When evaluating extending the school day or school year to help high-poverty schools, why not compare these innovations to using the same amount of additional money to hiring tutors to use proven tutoring models to help struggling students? In evaluating policies in which students are held back if they do not read at grade level by third grade, why not compare these approaches to intensive phonics instruction and tutoring in grades K-3, which are known to greatly improve student reading achievement?

blog_7-25-19_LeoandAdaya_375x500
There is nuffin like a good fwend.

As one example of research comparing a policy intervention to a promising educational intervention, I recently saw a very interesting pair of studies from France. Ecalle, Gomes, Auphan, Cros, & Magnan (2019) compared two interventions applied in special priority areas with high poverty levels. Both interventions focused on reading in first grade.

One of the interventions involved halving class size, from approximately 24 students to 12. The other provided intensive reading instruction in small groups (4-6 children) to students who were struggling in reading, as well as less intensive interventions to larger groups (10-12 students). Low achievers got two 30-minute interventions each day for a year, while the higher-performing readers got one 30-minute intervention each day. In both cases, the focus of instruction was on phonics. In all cases, the additional interventions were provided by the students’ usual teachers.

The students in small classes were compared to students in ordinary-sized classes, while the students in the educational intervention were compared to students in same-sized classes who did not get the group interventions. Similar measures and analyses were used in both comparisons.

The results were nearly identical for the class size policy and the educational intervention. Halving class size had effect sizes of +0.14 for word reading and +0.22 for spelling. Results for the educational intervention were +0.13 for word reading, +0.12 for spelling, +0.14 for a group test of reading comprehension, +0.32 for an individual test of comprehension, and +0.19 for fluency.

These studies are less than perfect in experimental design, but they are nevertheless interesting. Most importantly, the class size policy required an additional teacher for each class of 24. Using Maryland annual teacher salaries and benefits ($84,000), that means the cost in our state would be about $3500 per student. The educational intervention required one day of training and some materials. There was virtually no difference in outcomes, but the differences in cost were staggering.

The class size policy was mandated by the Ministry of Education. The educational intervention was offered to schools and provided by a university and a non-profit. As is so often the case, the policy intervention was simplistic, easy to describe in the newspaper, and minimally effective. The class size policy reminds me of a Florida program that extended the school schedule by an hour every day in high-poverty schools, mainly to provide more time for reading instruction. The cost per child was about $800 per year. The outcomes were minimal (ES=+0.05).

After many years of watching what schools do and reviewing research on outcomes of innovations, I find it depressing that policies mandated on a substantial scale are so often found to be ineffective. They are usually far more expensive than much more effective, rigorously evaluated programs that are, however, a bit more difficult to describe, and rarely arouse great debate in the political arena. It’s not that anyone is opposed to the educational intervention, but it is a lot easier to carry a placard saying “Reduce Class Size Now!” than to carry one saying “Provide Intensive Phonics in Small Groups with More Supplemental Teaching for the Lowest Achievers Now!” The latter just does not fit on a placard, and though easy to understand if explained, it does not lend itself to easy communication. Actually, there are much more effective first grade interventions than the one evaluated in France (see www.evidenceforessa.org). At a cost much less than $3500 per student, several one-to-one tutoring programs using well-trained teaching assistants as tutors would have been able to produce an effect size of more than +0.50 for all first graders on average. This would even fit on a placard: “Tutoring Now!”

I am all in favor of trying out policy innovations. But when parents of kids in a proven-program comparison group are asked what they did in school today, they shouldn’t say “nuffin’”. They should say, “My tooter taught me to read. And I played with my fwends.”

References

Ecalle, J., Gomes, C., Auphan, P., Cros, L., & Magnan, A. (2019). Effects of policy and educational interventions intended to reduce difficulties in literacy skills in grade 1. Studies in Educational Evaluation, 61, 12-20.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

How Evidence-Based Reform Saved Patrick

Several years ago, I heard a touching story. There was a fourth grader in a school in Southern Maryland who had not learned to read. I’ll call him Patrick. A proven reading program came to the school and replaced the school’s haphazard reading approach with a systematic, phonetic model, with extensive teacher training and coaching. By the end of the school year, Patrick was reading near grade level.

Toward the end of the year, Patrick’s mother came to the school to thank his teacher for what she’d done for him. She showed Patrick’s teacher a box in which Patrick had saved every one of his phonetic readers. “Patrick calls this his treasure box,” she said. “He says he is going to keep these books forever, so that if he ever has a child of his own, he can teach him how to read.”

blog_5-23-19_happygirl_375x500

If you follow my blogs, or other writings on evidence-based practice, they often sound a little dry, full of effect sizes and wonkiness. Yet all of those effect sizes and policy proposals mean nothing unless they are changing the lives of children.

Traditional educational practices are perhaps fine for most kids, but there are millions of kids like Patrick who are not succeeding in school but could be, if they experienced proven programs and practices. In particular, there is no problem in education we know more about than early reading failure. A recent review we just released on programs for struggling readers identified 61 very high-quality studies of 48 programs. 22 of these programs meet the “strong” or “moderate” effectiveness standards for ESSA. Eleven programs had effect sizes from +0.30 to +0.86. There are proven one-to-one and small-group tutoring programs, classroom interventions, and whole-school approaches. They differ in costs, impacts, and practicability in various settings, but it is clear that reading failure can be prevented or remediated before third grade for nearly all children. Yet most struggling young readers do not receive any of these programs.

Patrick, at age 10, had the foresight to prepare to someday help his own child avoid the pain and humiliation he had experienced. Why is it so hard for caring grownups in positions of authority to come to the same understanding?

Patrick must be about 30 by now. Perhaps he has a child of his own. Wherever he is, I’m certain he remembers how close he came to a life of illiteracy and failure. I wonder if he still has his treasure box with the books inside it.

Patrick probably does not know where those books came from, the research supporting their use, or the effect sizes from the many evaluations. He doesn’t need to be a researcher to understand what happened to him. What he does know is that someone cared enough to give him an opportunity to learn to read.

Why does what happened to Patrick have to be such a rare occurrence? If you understand what the evidence means and you see educators and policy makers continuing to ignore it, shouldn’t you be furious?

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Make No Small Plans

“Make no little plans; they have no magic to stir men’s blood, and probably themselves will not be realized. Make big plans, aim high in hope and work, remembering that a noble, logical diagram, once recorded, will never die…”

-Daniel Burnham, American architect, 1910

More than 100 years ago, architect Daniel Burnham expressed an important insight. “Make no little plans,” he said. Many people have said that, one way or another. But Burnham’s insight was that big plans matter because they “have magic to stir men’s blood.” Small plans do not, and for this reason may never even be implemented. Burnham believed that even if big plans fail, they have influence into the future, as little plans do not.

blog_6-27-19_Great Wall of China
Make no small plans.

In education, we sometimes have big plans. Examples include comprehensive school reform in the 1990s, charter schools in the 2000s, and evidence-based reform today. None of these have yet produced revolutionary positive outcomes, but all of them have captured the public imagination. Even if you are not an advocate of any of these, you cannot ignore them, as they take on a life of their own. When conditions are right, they will return many times, in many forms, and may eventually lead to substantial impacts. In medicine, it was demonstrated in the mid-1800s that germs caused disease and that medicine could advance through rigorous experimentation (think Lister and Pasteur, for example). Yet sterile procedures in operations and disciplined research on practical treatments took 100 years to prevail. The medical profession resisted sterile procedures and evidence-based medicine for many years. Sterile procedures and evidence-based medicine were big ideas. It took a long time for them to take hold, but they did prevail, and remained big ideas through all that time.

Big Plans in Education

In education, as in medicine long ago, we have thousands of important problems, and good work continues (and needs to continue) on most of them. However, at least in American education, there is one crucial problem that dwarfs all others and lends itself to truly big plans. This is the achievement gap between students from middle class backgrounds and those from disadvantaged backgrounds. As noted in my April 25 blog, the achievement gap between students who qualify for free lunch and those who do not, between African American and White students, and between Hispanic students and non-Hispanic White students, all average an effect size of about 0.50. This presents a serious challenge. However, as I pointed out in that blog, there are several programs in existence today capable of adding an effect size of +0.50 to the reading or math achievement of students at risk. All programs that can do this involve one-to-one or one-to-small group tutoring. Tutoring is expensive, but recent research has found that well-trained and well-supervised tutors with BAs, but not necessarily teaching certificates, can obtain the same outcomes as certified teachers do, at half the cost. Using our own Success for All program with six tutors per school (K-5), high-poverty African American elementary schools in Baltimore obtained effect sizes averaging +0.50 for all students and +0.75 for students in the lowest 25% of their grades (Madden et al., 1993). A follow-up to eighth grade found that achievement outcomes maintained and both retentions and special education placements were cut in half (Borman & Hewes, 2003). We have not had the opportunity to once again implement Success for All with so much tutoring included, but even with fewer tutors, Success for All has had substantial impacts. Cheung et al. (2019) found an average effect size of +0.27 across 28 randomized and matched studies, a more than respectable outcome for a whole-school intervention. For the lowest-achieving students, the average was +0.56.

Knowing that Success for All can achieve these outcomes is important in itself, but it is also an indication that substantial positive effects can be achieved for whole schools, and with sufficient tutors, can equal the entire achievement gaps according to socio-economic status and race. If one program can do this, why not many others?

Imagine that the federal government or other large funders decided to support the development and evaluation of several different ideas. Funders might establish a goal of increasing reading achievement by an effect size of +0.50, or as close as possible to this level, working with high-poverty schools. Funders would seek organizations that have already demonstrated success at an impressive level, but not yet +0.50, who could describe a compelling strategy to increase their impact to +0.50 or more. Depending on the programs’ accomplishments and needs, they might be funded to experiment with enhancements to their promising model. For example, they might add staff, add time (e.g., continue for multiple years), or add additional program components likely to strengthen the overall model. Once programs could demonstrate substantial outcomes in pilots, they might be funded to do a cluster randomized trial. If this experiment shows positive effects approaching +0.50 or more, the developers might receive funding for scale-up. If the outcomes are substantially positive but significantly less than +0.50, the funders might decide to help the developers make changes leading up to a second randomized experiment.

There are many details to be worked out, but the core idea could capture the imagination and energy of educators and public-spirited citizens alike. This time, we are not looking for marginal changes that can be implemented cheaply. This time, we will not quit until we have many proven, replicable programs, each of which is so powerful that it can, over a period of years, remedy the entire achievement gap. This time, we are not making changes in policy or governance and hoping for the best. This time, we are going directly to the schools where the disadvantaged kids are, and we are not declaring victory until we can guarantee such students gains that will give them the same outcomes as those of the middle class kids in the suburbs.

Perhaps the biggest idea of all is the idea that we need big ideas with big outcomes!

Anyway, this is my big plan. What’s yours?

————

Note: Just as I was starting on this blog, I got an email from Ulrich Boser at the Center for American Progress. CAP and the Thomas Fordham Foundation are jointly sponsoring an “Education Moonshot,” including a competition with a grand prize of $10,000 for a “moonshot idea that will revolutionize schooling and dramatically improve student outcomes.” For more on this, please visit the announcement site. Submissions are due August 1st at this online portal and involve telling them in 500 words your, well, big plan.

 

References

Borman, G., & Hewes, G. (2003).  Long-term effects and cost effectiveness of Success for All.  Educational Evaluation and Policy Analysis, 24 (2), 243-266.

Cheung, A., Xie, C., Zhuang, T., & Slavin, R. E. (2019). Success for All: A quantitative synthesis of evaluations. Manuscript submitted for publication.

Madden, N.A., Slavin, R.E., Karweit, N.L., Dolan, L.J., & Wasik, B.A. (1993).  Success for All:  Longitudinal effects of a restructuring program for inner-city elementary schools.  American Educational Research Journal, 30, 123-148.

 

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence For Revolution

In the 1973 movie classic “Sleeper,” Woody Allen plays a New York health food store owner who wakes up 200 years in the future, in a desolate environment.

“What happened to New York?” he asks the character played by Diane Keaton.  She replies, “It was destroyed.  Some guy named Al Shanker got hold of a nuclear weapon.”

I think every member of the American Federation of Teachers knows this line.  Firebrand educator Al Shanker, founder of the AFT, would never have hurt anyone.  But short of that, he would do whatever it took to fight for teachers’ rights, and most importantly, for the rights of students to receive a great education.  In fact, he saw that the only way for teachers to receive the respect, fair treatment, and adequate compensation they deserved, and still deserve, was to demonstrate that they had skills not possessed by the general public that could have powerful impacts on students’ learning.  Physicians are much respected and well paid because they have special knowledge of how to prevent and cure disease, and to do this they have available a vast armamentarium of drugs, devices, and procedures, all proven to work in rigorous research.

Shanker was a huge fan of evidence in education, first because evidence-based practice helps students succeed, but also because teachers using proven programs and practices show that they deserve respect and fair compensation because they have specialized knowledge backed by proven methods able to ensure the success of students.

The Revolutionary Potential of Evidence in Education

The reality is that in most school districts, especially large ones, most power resides in the central office, not in individual schools.  The district chooses textbooks, computer technology, benchmark assessments, and much more.  There are probably principals and teachers on the committees that make these decisions, but once the decisions are made, the building-level staff is supposed to fall in line and do as they are told.  When I speak to principals and teachers, they are astonished to learn that they can easily look up on www.evidenceforessa.org just about any program their district is using and find out what the evidence base for that program is.  Most of the time, the programs they have been required to use by their school administrations either have no valid evidence of effectiveness, or they have concrete evidence that they do not work.  Further, in almost all categories, effective programs or approaches do exist, and could have been selected as practical alternatives to the ones that were adopted.  Individual schools could have been allowed to choose proven programs, instead of being required to use programs they know not to be proven effective.

Perhaps schools should always be given the freedom to select and implement programs other than those mandated by the district, as long as the programs they want to implement have stronger evidence of effectiveness than the district’s programs.

blog_6-27-19_delacroix_500x391

How the Revolution Might Happen

Imagine that principals, teachers, parent activists, enlightened school board members, and others in a given district were all encouraged to use Evidence for ESSA or other reviews of evaluations of educational programs.  Imagine that many of these people just wrote letters to the editor, or letters to district leaders, letters to education reporters, or perhaps, if these are not sufficient, they might march on the district offices with placards reading something like “Use What Works” or “Our Children Deserve Proven Programs.”  Who could be against that?

One of three things might happen.  First, the district might allow individual schools to use proven programs in place of the standard programs, and encourage any school to come forward with evidence from a reliable source if its staff or leadership wants to use a proven program not already in use.  That would be a great outcome.  Second, the district leadership might start using proven programs districtwide, and working with school leaders and teachers to ensure successful implementation.  This retains the top-down structure, but it could greatly improve student outcomes.  Third, the district might ignore the protesters and the evidence, or relegate the issue to a very slow study committee, which may be the same thing.  That would be a distressing outcome, though no worse than what probably happens now in most places.  It could still be the start of a positive process, if principals, teachers, school board members, and parent activists keep up the pressure, helpfully informing the district leaders about proven programs they could select when they are considering a change.

If this process took place around the country, it could have a substantial positive impact beyond the individual districts involved, because it could scare the bejabbers out of publishers, who would immediately see that if they are going to succeed in the long run, they need to design programs that will likely work in rigorous evaluations, and then market them based on real evidence.  That would be revolutionary indeed.  Until the publishers get firmly on board, the evidence movement is just tapping at the foundations of a giant fortress with a few ball peen hammers.  But there will come a day when that fortress will fall, and all will be beautiful. It will not require a nuclear weapon, just a lot of committed and courageous educators and advocates, with a lot of persistence, a lot of information on what works in education, and a lot of ball peen hammers.

Picture Credit: Liberty Leading the People, Eugène Delacroix [Public domain]

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.