Evidence and Policy: If You Want to Make a Silk Purse, Why Not Start With…Silk?

Everyone knows that you can’t make a silk purse out of a sow’s ear. This proverb goes back to the 1500s. Yet in education policy, we are constantly trying to achieve stellar results using school and classroom programs of unknown effectiveness, or even those known to be ineffective, even though proven effective programs are readily available.

Note that I am not criticizing teachers. They do the best they can with the tools they have. What I am concerned about is the quality of those tools: the programs and professional development teachers receive to help them succeed with their children.

An excellent case in point was School Improvement Grants (SIG), a major provision of No Child Left Behind (NCLB). SIG provided major grants to schools scoring in the lowest 5% of their states. For most of its existence, SIG required schools seeking funding to choose among four models. Two of these, school closure and charterization, were rarely selected. Instead, most SIG schools selected either “turnaround” (replacing the principal and at least 50% of the staff), or the most popular, “transformation” (replacing the principal, using data to inform instruction, lengthening the school day or year, and evaluating teachers based on the achievement growth of their students). However, a major, large-scale evaluation of SIG by Mathematica showed no achievement benefits for schools that received SIG grants, compared to similar schools that did not. Ultimately, SIG spent more than $7 billion, an amount that we in Baltimore, at least, consider to be a lot of money. The tragedy, however, is not just the waste of so much money, but the dashing of so many hopes for meaningful improvement.

This is where the silk purse/sow’s ear analogy comes in. Each of the options among which SIG schools had to choose was composed of components that either lacked evidence of effectiveness or actually had evidence of ineffectiveness. If the components of each option are not known to be effective, then why would anyone expect a combination of them to be effective?

Evidence on school closure has found that this strategy diminishes student achievement for a few years, after which student performance returns to where it was before. Research on charter schools by CREDO (2013) has found an average effect size of zero for charters. The exception is “no-excuses” charters, such as KIPP and Success Academies, but these charters only accept students whose parents volunteer, not whole failing schools. Turnaround and transformation schools both require a change of principal, which introduces chaos and, as far as I know, has never been found to improve achievement. The same is true of replacing at least 50% of the teachers. Lots of chaos, no evidence of effectiveness. The other required elements of the popular “transformation” model have been found to have either no impact (e.g., benchmark assessments to inform teachers about progress; Inns et al., 2019), or small effects (e.g., lengthening the school day or year; Figlio et al., 2018). Most importantly, to my knowledge, no one ever did a randomized evaluation of the entire transformation model, with all components included. We did not find out what the joint effect was until the Mathematica study. Guess what? Sewing together swatches of sows’ ears did not produce a silk purse. With a tiny proportion of $7 billion, the Department of Education could have identified and tested out numerous well-researched, replicable programs and then offered SIG schools a choice among the ones that worked best. A selection of silk purses, all made from 100% pure silk. Doesn’t that sound like a better idea?

In later blogs I’ll say more about how the federal government could ensure the success of educational initiatives by ensuring that schools have access to federal resources to adopt and implement proven programs designed to accomplish the goals of the legislation.

References

Figlio, D., Holden, K. L., & Ozek, U. (2018). Do students benefit from longer school days? Regression discontinuity evidence from Florida’s additional hour of literacy instruction. Economics of Education Review, 67, 171-183.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Why Not the Best?

In 1879, Thomas Edison invented the first practical lightbulb. The main problem he faced was in finding a filament that would glow, but not burn out too quickly. To find it, he tried more than 6000 different substances that had some promise as filaments. The one he found was carbonized cotton, which worked far better than all the others (tungsten, which we use now, came much later).

Of course, the incandescent light changed the world. It replaced far more expensive gas lighting systems, and was much more versatile. The lightbulb captured the evening and nighttime hours for every kind of human activity.

Yet if the lightbulb had been an educational innovation, it probably would have been proclaimed a dismal failure. Skeptics would have noted that only one out of six thousand filaments worked. Meta-analysts would have averaged the effect sizes for all 6000 experiments and concluded that the average effect size across the 6000 filaments was only +0.000000001. Hardly worthwhile. If Edison’s experiments were funded by government, politicians would have complained that 5,999 of Edison’s filaments were a total waste of taxpayers’ money. Economists would have computed benefit-cost ratios and concluded that even if Edison’s light worked, the cost of making the first one was astronomical, not to mention the untold cost of setting up electrical generation and wiring systems.

This is all ridiculous, you must be saying. But in the world of evidence-based education, comparable things happen all the time. In 2003, Borman et al. did a meta-analysis of 300 studies of 29 comprehensive (whole-school) reform designs. They identified three as having solid evidence of effectiveness. Rather than celebrating and disseminating those three (and continuing research and development to identify more of them), the U.S. Congress ended its funding for dissemination of comprehensive school reform programs. Turn out the light before you leave, Mr. Edison!

Another common practice in education is to do meta-analyses averaging outcomes across an entire category of programs or policies, and ignoring the fact that some distinctively different and far more effective programs are swallowed up in the averages. A good example is charter schools. Large-scale meta-analyses by Stanford’s CREDO (2013) found that the average effect sizes for charter schools are effectively zero. A 2015 analysis found better, but still very small effect sizes in urban districts (ES = +0.04 in reading, +0.05 in math). The What Works Clearinghouse published a 2010 review that found slight negative effects of middle school charters. These findings are useful in disabusing us of the idea that charter schools are magic, and get positive outcomes just because they are charter schools. However, they do nothing to tell us about extraordinary charter schools using methods that other schools (perhaps including non-charters) could also use. There is more positive evidence relating to “no-excuses” schools, such as KIPP and Success Academies, but among the thousands of charters that now exist, is this the only type of charter worth replicating? There must be some bright lights among all these bulbs.
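To make the averaging problem concrete, here is a minimal sketch. The ten effect sizes below are invented for illustration and do not come from CREDO or any other study, and the program names and the +0.20 cutoff are hypothetical. The only point is that a category mean close to zero can sit alongside a few programs that are far more effective.

```python
from statistics import mean

# Hypothetical effect sizes for ten programs in one category.
# These numbers are made up for illustration, not taken from any study.
effect_sizes = {
    "program_a": -0.05,
    "program_b": 0.00,
    "program_c": -0.10,
    "program_d": 0.05,
    "program_e": -0.05,
    "program_f": 0.00,
    "program_g": -0.10,
    "program_h": 0.30,
    "program_i": -0.05,
    "program_j": 0.40,
}


def summarize(effects, threshold=0.20):
    """Return the category mean and the programs at or above the threshold."""
    category_mean = mean(list(effects.values()))
    # Keep only the programs that clear the (hypothetical) cutoff,
    # listed from largest to smallest effect size.
    standouts = sorted(
        (name for name, es in effects.items() if es >= threshold),
        key=lambda name: effects[name],
        reverse=True,
    )
    return category_mean, standouts


category_mean, standouts = summarize(effect_sizes)
print(f"Category mean: {category_mean:+.2f}")
print("Programs at or above +0.20:", standouts)
```

Reporting only the mean would describe this whole category as ineffective, while a school choosing a program would care about the two that stand out.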

As a third example, there are now many tutoring programs used in elementary reading and math with struggling learners. Effect sizes across all forms of tutoring average about +0.30, in both reading and math. But there are reading tutoring approaches with effect sizes of +0.50 or more. If these programs are readily available, why would schools adopt programs less effective than the best? The average is useful for research purposes, and there are always considerations of costs and availability, but I would think any school would want to ignore the average for all types of programs and look into the ones that can do the most for their kids, at a reasonable cost.

I’ve often heard teachers and principals point out that “parents send us the best kids they have.” Yes they do, and for this reason it is our responsibility as educators to give those kids the best programs we can. We often describe educating students as enlightening them, or lifting the lamp of learning, or fiat lux. Perhaps the best way to fiat a little more lux is to take a page from Edison, the great luxmeister: Experiment tirelessly until we find what works. Then use the best we have.

Reference

Borman, G. D., Hewes, G. M., Overman, L. T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125-230.

Charter Schools? Smarter Schools? Why Not Both?

I recently saw an editorial in the May 29 Washington Post, entitled “Denying Poor Children a Chance,” a pro-charter school opinion piece that makes dire predictions about the damage to poor and minority students that would follow if charter expansion were to be limited.  In education, it is common to see evidence-free opinions for and against charter schools, so I was glad to see actual data in the Post editorial.   In my view, if charter schools could routinely and substantially improve student outcomes, especially for disadvantaged students, I’d be a big fan.  My response to charter schools is the same as my response to everything else in education: Show me the evidence.

The Washington Post editorial cited a widely known 2015 Stanford CREDO study comparing urban charter schools to matched traditional public schools (TPS) in the same districts.  Evidence always attracts my attention, so I decided to look into this and other large, multi-district studies. Despite the Post’s enthusiasm for the data, the average effect size was only +0.055 for math and +0.04 for reading.  By anyone’s standards, these are very, very small outcomes.  Outcomes for poor, urban, African American students were somewhat higher, at +0.08 for math and +0.06 for reading, but on the other hand, average effect sizes for White students were negative, averaging -0.05 for math and -0.02 for reading.  Outcomes were also negative for Native American students: -0.10 for math, zero for reading.  With effect sizes so low, these small differences are probably just different flavors of zero.  A CREDO (2013) study of charter schools in 27 states, including non-urban as well as urban schools, found average effect sizes of +0.01 for math and -0.01 for reading. How much smaller can you get?

In fact, the CREDO studies have been widely criticized for using techniques that inflate test scores in charter schools.  They compare students in charter schools to students in traditional public schools, matching on pretests and ethnicity.  This ignores the obvious fact that students in charter schools chose to go there, or their parents chose for them to go.  There is every reason to believe that students who choose to attend charter schools are, on average, higher-achieving, more highly motivated, and better behaved than students who stay in traditional public schools.  Gleason et al. (2010) found that students who applied to charter schools started off 16 percentage points higher in reading and 13 percentage points higher in math than others in the same schools who did not apply.  Applicants were more likely to be White and less likely to be African American or Hispanic, and they were less likely to qualify for free lunch.  Self-selection is a particular problem in studies of students who choose or are sent to “no-excuses” charters, such as KIPP or Success Academies, because the students or their parents know students will be held to very high standards of behavior and accomplishment, and may be encouraged to leave the school if they do not meet those standards (this is not a criticism of KIPP or Success Academies, but when such charter systems use lotteries to select students, the students who show up for the lotteries were at least motivated to participate in a lottery to attend a very demanding school).

Well-designed studies of charter schools usually focus on schools that use lotteries to select students, and then they compare the students who were successful in the lottery to those who were not so lucky.  This eliminates the self-selection problem, as students were selected by a random process.  The CREDO studies do not do this, and this may be why their studies report higher (though still very small) effect sizes than those reported by syntheses of studies of students who all applied to charters, but may have been “lotteried in” or “lotteried out” at random.  A very rigorous WWC synthesis of such studies by Gleason et al. (2010) found that middle school students who were lotteried into charter schools in 32 states performed non-significantly worse than those lotteried out, in math (ES=-0.06) and in reading (ES=-0.08).  A 2015 update of the WWC study found very similar, slightly negative outcomes in reading and math.

It is important to note that “no-excuses” charter schools, mentioned earlier, have had more positive outcomes than other charters.  A recent review of lottery studies by Cheng et al. (2017) found effect sizes of +0.25 for math and +0.17 for reading.  However, such “no-excuses” charters are a tiny percentage of all charters nationwide.

Other meta-analyses of studies of achievement outcomes of charter schools also exist, but none found effect sizes as high as the CREDO urban study.  The means of +0.055 for math and +0.04 for reading represent upper bounds for effects of urban charter schools.

Charter Schools or Smarter Schools?

So far, every study of achievement effects of charters has focused on impacts of charters on achievement compared to those of traditional public schools.  However, this should not be the only question.  “Charters” and “non-charters” do not exhaust the range of possibilities.

What if we instead ask this question: Among the range of programs available, which are most likely to be most effective at scale?

To illustrate the importance of this question, consider a study in England, which evaluated a program called Engaging Parents Through Mobile Phones.  The program involves texting parents on cell phones to alert them to upcoming tests, inform them about whether students are completing their homework, and tell them what students are being taught in school.  A randomized evaluation (Miller et al., 2016) found effect sizes of +0.06 for math and +0.03 for reading, remarkably similar to the urban charter school effects reported by CREDO (2015).  The cost of the mobile phone program was £6 per student per year, or $7.80.  If you like the outcomes of charter schools, might you prefer to get the same outcomes for $7.80 per child per year, without all the political, legal, and financial stresses of charter schools?
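The cost figures are easy to scale up. The sketch below uses only the two per-student figures quoted above (£6 and $7.80); the enrollment of 500 students is a hypothetical number chosen for illustration, and the function names are my own.

```python
# Per-student annual costs quoted in the text.
GBP_PER_STUDENT = 6.00
USD_PER_STUDENT = 7.80


def implied_exchange_rate(gbp, usd):
    """Dollars per pound implied by the two quoted figures."""
    return usd / gbp


def annual_cost_usd(n_students, usd_per_student=USD_PER_STUDENT):
    """Total yearly cost in dollars for a school with n_students."""
    return n_students * usd_per_student


rate = implied_exchange_rate(GBP_PER_STUDENT, USD_PER_STUDENT)
school_cost = annual_cost_usd(500)  # 500 students is a hypothetical enrollment

print(f"Implied exchange rate: {rate:.2f} dollars per pound")
print(f"Annual cost for 500 students: ${school_cost:,.2f}")
```

At that rate, a school of 500 students would spend about $3,900 per year on the texting program.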

The point here is that rather than arguing about the size of small charter effects, one could consider charters a “treatment” and compare them to other proven approaches.  On our Evidence for ESSA website, we list 112 reading and math programs that meet ESSA standards for “Strong,” “Moderate,” or “Promising” evidence of effectiveness.  Of these, 107 had effect sizes larger than those CREDO (2015) reports for urban charter schools.  In both math and reading, there are many programs with average effect sizes of +0.20, +0.30, up to more than +0.60.  If applied as they were in the research, the best of these programs could, for example, entirely overcome Black-White and Hispanic-White achievement gaps in one or two years.

A few charter school networks have their own proven educational approaches, but the many charters that do not have proven programs should be looking for them.  Most proven programs work just as well in charter schools as they do in traditional public schools, so there is no reason existing charter schools should not proactively seek proven programs to increase their outcomes.  For new charters, wouldn’t it make sense for chartering agencies to encourage charter applicants to systematically search for and propose to adopt programs that have strong evidence of effectiveness?  Many charter schools already use proven programs.  In fact, there are several that specifically became charters to enable them to adopt or maintain our Success for All whole-school reform program.

There is no reason for any conflict between charter schools and smarter schools.  The goal of every school, regardless of its governance, should be to help students achieve their full potential, and every leader of a charter or non-charter school would agree with this. Whatever we think about governance, all schools, traditional or charter, should get smarter, using proven programs of all sorts to improve student outcomes.

References

Cheng, A., Hitt, C., Kisida, B., & Mills, J. N. (2017). “No excuses” charter schools: A meta-analysis of the experimental evidence on student achievement. Journal of School Choice, 11(2), 209-238.

Clark, M. A., Gleason, P. M., Tuttle, C. C., & Silverberg, M. K. (2015). Do charter schools improve student achievement? Educational Evaluation and Policy Analysis, 37(4), 419-436.

Gleason, P. M., Clark, M. A., Tuttle, C. C., & Dwoyer, E. (2010). The evaluation of charter school impacts. Washington, DC: What Works Clearinghouse.

Miller, S., Davison, J., Yohanis, J., Sloan, S., Gildea, A., & Thurston, A. (2016). Texting parents: Evaluation report and executive summary. London: Education Endowment Foundation.

Washington Post: Denying poor children a chance. [Editorial]. (May 29, 2019). The Washington Post, A16.
