What if a Sears Catalogue Married Consumer Reports?

When I was in high school, I had a summer job delivering Sears catalogues. I borrowed my mother’s old Chevy station wagon and headed out fully laden into the wilds of the Maryland suburbs of Washington.

I immediately learned something surprising. I thought of a Sears catalogue as a big book of advertisements. But the people to whom I was delivering them often saw it as a book of dreams. They were excited to get their catalogues. When a neighborhood saw me coming, I became a minor celebrity.

Thinking back on those days, I was reminded of our Evidence for ESSA website (www.evidenceforessa.org). I realized that what I wanted it to be was a way to communicate to educators the wonderful array of programs they could use to improve outcomes for their children. Sort of like a Sears catalogue for education. However, it provides something that a Sears catalogue does not: evidence about the effectiveness of each catalogue entry. Imagine a Sears catalogue that was married to Consumer Reports. Where a traditional Sears catalogue describes a kitchen gadget, “It slices and dices, with no muss, no fuss!”, the marriage with Consumer Reports would instead say, “Effective at slicing and dicing, but lots of muss. Also fuss.”

If this marriage took place, it might take some of the fun out of the Sears catalogue (making it a book of realities rather than a book of dreams), but it would give confidence to buyers and help them make wise choices. And with proper wordsmithing, it could still communicate both enthusiasm, when warranted, and truth. But even more, it could have a huge impact on the producers of consumer goods, because they would know that their products would have to be rigorously tested and shown to back up their claims.

In enhancing the impact of research on the practice of education, we have two problems to solve. Just like the “Book of Dreams,” we have to help educators know about the wonderful array of programs available to them, programs they may never have heard of. And beyond the particular programs, we need to build excitement about the opportunity to select among proven programs.

In education, we make choices not for ourselves, but on behalf of our children. Responsible educators want to choose programs and practices that improve the achievement of their students. Something like a marriage of the Sears catalogue and Consumer Reports is necessary to address educators’ dreams and their need for information on program outcomes. Users should be both excited and informed. Information usually does not excite. Excitement usually does not inform. We need a way to do both.

In Evidence for ESSA, we have tried to give educators a sense that there are many solutions to enduring instructional problems (excitement), along with descriptions of each program’s outcomes, costs, staffing requirements, professional development, and effects for particular subgroups (information).

In contrast to Sears catalogues, Evidence for ESSA is light (Sears catalogues were huge, and ultimately broke the springs on my mother’s station wagon). In contrast to Consumer Reports, Evidence for ESSA is free.  Every marriage has its problems, but our hope is that we can capture the excitement and the information from the marriage of these two approaches.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Picture source: Nationaal Archief, the Netherlands

 


Getting the Best Mileage from Proven Programs

Wouldn’t you love to have a car that gets 200 miles to the gallon? Or one that can go hundreds of miles on a battery charge? Or one that can accelerate from zero to sixty twice as fast as any on the road?

Such cars exist, but you can’t have them. They are experimental vehicles or race cars that can only be used on a track or in a lab. They may be made of exotic materials, or may not carry passengers or groceries, or may be dangerous on real roads.

In working on our Evidence for ESSA website (www.evidenceforessa.org), we see a lot of studies that are like these experimental cars. For example, there are studies of programs in which the researcher or her graduate students actually did the teaching, or in which students used innovative technology with one adult helper for every machine or every few machines. Such studies are fine for theory building or as pilots, but we do not accept them for Evidence for ESSA, because they could never be replicated in real schools.

However, there is a much more common situation to which we pay very close attention: studies in which, for example, teachers receive a great deal of training and coaching, but an amount that seems replicable in principle. Thus, we would reject a study in which the experimenters themselves taught the program, but not one in which they taught ordinary teachers how to use the program.

In such studies, the problem comes in dissemination. If the studies validating a program provided a lot of professional development, we accept the program only if the disseminator provides a similar level of professional development, and if their estimates of cost and personnel take that professional development into account. We state clear expectations on our website that these services be provided at a level similar to what was provided in the research, if the positive outcomes seen in the research are to be obtained.

The problem is that disseminators often offer schools a form of the program that was never evaluated, in order to keep costs low. They know that schools don’t like to spend a lot on professional development, and they are concerned that if they require the needed levels of PD or other services or materials, schools won’t buy their program. At the extreme end, there are programs that were successfully evaluated using extensive professional development, whose developers then simply put the teacher’s manual on the web for schools to use for free.

A recent study of a program called Mathalicious illustrated the situation. Mathalicious is an on-line math course for middle school. An evaluation found that teachers randomly assigned to just get a license, with minimal training, did not obtain significant positive impacts, compared to a control group. Those who received extensive on-line training, however, did see a significant improvement in math scores, compared to controls.

When we write our program descriptions, we compare the implementation details reported in the research to what is said or required on the program’s website. If these do not match, within reason, we try to make clear which elements were necessary for success.

Going back to the car analogy, our procedures eliminate those amazing cars that can only operate on special tracks, but we accept cars that can run on streets, carry children and groceries, and generally do what cars are expected to do. But if outstanding cars require frequent recharging, or premium gasoline, or have other important requirements, we’ll say so, in consultation with the disseminator.

In our view, evidence in education is not for academics, it’s for kids. If there is no evidence that a program as disseminated benefits kids, we don’t want to mislead educators who are trying to use evidence to benefit their children.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence-Based Does Not Equal Evidence-Proven


As I speak to educational leaders about using evidence to help them improve outcomes for students, there are two words I hear all the time that give me the fantods (as Mark Twain would say):

Evidence-based

I like the first word, “evidence,” just fine, but the second word, “based,” sort of negates the first one. The ESSA evidence standards require programs that are evidence-proven, not just evidence-based, for various purposes.

“Evidence-proven” means that a given program, practice, or policy has been put to the test. Ideally, students, teachers, or schools have been assigned at random to use the experimental program or to remain in a control group. The program is provided to the experimental group for a significant period of time, at least a semester, and then final performance on tests that are fair to both groups is compared, using appropriate statistics.
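To make that comparison concrete, here is a minimal sketch, in Python, of the kind of analysis such an evaluation ends with: a significance test of the difference between the two groups and an effect size in standard deviation units. The scores below are simulated for illustration, not drawn from any real study, and a real school-level evaluation would typically use more elaborate methods (such as HLM).

```python
# Illustrative only: simulated posttest scores for randomly assigned groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
experimental = rng.normal(loc=655, scale=50, size=300)  # hypothetical posttest scores
control = rng.normal(loc=645, scale=50, size=300)

# Significance test comparing the two group means (Welch's t-test).
t_stat, p_value = stats.ttest_ind(experimental, control, equal_var=False)

# Effect size (Cohen's d): difference in means divided by the pooled standard deviation.
pooled_sd = np.sqrt((experimental.var(ddof=1) + control.var(ddof=1)) / 2)
effect_size = (experimental.mean() - control.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, effect size d = {effect_size:+.2f}")
```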

If your doctor gives you medicine, it is evidence-proven. It isn’t just the same color or flavor as something proven, and it isn’t just generally in line with what research suggests might be a good idea. Instead, it has been found to be effective, compared to current standards of care, in rigorous studies.

“Evidence-based,” on the other hand, is one of those wiggle words that educators love to use to indicate that they are up-to-date and know what’s expected, but don’t actually intend to do anything different from what they are doing now.

Evidence-based is today’s equivalent of “based on scientifically-based research” in No Child Left Behind. It sure sounded good, but what educational program or practice can’t be said to be “based on” some scientific principle?

In a recent Brookings article, Mark Dynarski wrote about state ESSA plans and conversations he’s heard among educators. He says that the plans are loaded with the words “evidence-based,” but with little indication of which specific proven programs states plan to implement, or how they plan to identify, disseminate, implement, and evaluate them.

I hope the ESSA evidence standards give leaders in even a few states the knowledge and the courage to insist on evidence-proven programs, especially in very low-achieving “school improvement” schools that desperately need the very best approaches. I remain optimistic that ESSA can be used to expand evidence-proven practices. But will it in fact have this impact? That remains to be proven.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Why the What Works Clearinghouse Matters

In 1962, the most important breakthrough in modern medicine took place. It was not a drug, not a device, not a procedure. It did not immediately save a single life, or cure a single person of disease. But it profoundly changed medicine worldwide, and led to the rapid progress in all of medicine that we have come to take for granted.

This medical miracle was a law, passed in the U.S. Congress, called the Kefauver-Harris Drug Act. It required that drugs sold in the U.S. be proven safe and effective, in high-quality randomized experiments. This law was introduced by Senator Estes Kefauver of Tennessee, largely in response to the thalidomide disaster, in which a widely used drug was found to cause severe birth defects.

From the moment the Act was passed, medical research changed utterly. The number of randomized experiments shot up. There are still errors and debates and misplaced enthusiasm, but the progress that has been made in every area of medicine is undeniable. Today, it is unthinkable in medicine that any drug would be widely sold if it has not been proven to work. Even though Kefauver-Harris itself only applies to the U.S., all advanced countries now have similar laws requiring rigorous evidence of the safety and effectiveness of medicines.

One of the ways the Kefauver-Harris Act made its impact was through reviews and publications of research on the evidence supporting the safety and efficacy of medicines. It’s no good having a law requiring strong evidence if only experts know what the evidence is. Many federal programs have sprung up over the years to review the evidence of what works and communicate it to front-line practitioners.

In education, we are belatedly going through our own evidence revolution. Since 2002, the function of communicating the findings of rigorous research in education has mostly been fulfilled by the What Works Clearinghouse (WWC), a website maintained by the U.S. Department of Education’s Institute of Education Sciences (IES). The existence of the WWC has been enormously beneficial. In addition to reviewing the evidence base for educational programs, the WWC’s standards set norms for research. No funder and no researcher wants to invest resources in a study they know the WWC will not accept.

In 2015, education finally had what may be its own Kefauver-Harris moment. This was the passage by the U.S. Congress of the Every Student Succeeds Act (ESSA), which contains specific definitions of strong, moderate, and promising levels of evidence. For certain purposes, especially for school improvement funding for very low-achieving schools, schools must use programs that meet ESSA evidence standards. For others, schools or districts can receive bonus points on grant applications if they use proven programs.

ESSA raises the stakes for evidence in education, and therefore should have raised the stakes for the WWC. If the government itself now requires or incentivizes the use of proven programs, then shouldn’t the government provide information on what individual programs meet those standards?

Yet several months after ESSA was passed, IES announced that the WWC would not be revised to align itself with ESSA evidence standards. This puts educators, and the government itself, in a bind. What if ESSA and WWC conflict? The ESSA standards are in law, so they must prevail over the WWC. Yet the WWC has a website, and ESSA does not. If WWC standards and ESSA standards were identical, or nearly so, this would not be a problem. But in fact they are very far apart.

Anticipating this situation, my colleagues and I at Johns Hopkins University created a new website, www.evidenceforessa.org. It launched in February, 2017, including elementary and secondary reading and math. We are now adding other subjects and grade levels.

In creating our website, we draw from the WWC every day, and in particular use a new Individual Study Database (ISD) that contains information on all of the evaluations the WWC has ever accepted.

The ISD is a useful tool for us, but it has also made it relatively easy to ask and answer questions about the WWC itself, and the answers are troubling. We’ve found that almost half of the WWC outcomes rated “positive” or “potentially positive” are not even statistically significant. We have also found that measures made by researchers or developers produce effect sizes more than three times as large as those from independent measures, yet they are fully accepted by the WWC.

As reported in a recent blog, we’ve discovered that the WWC is very, very slow to add new studies to its main “Find What Works” site. The WWC science topic is not seeking or accepting new studies (“This area is currently inactive and not conducting reviews”). Character education, dropout prevention, and English Language Learners are also inactive. How does this make any sense?

Over the next couple of months, starting in January, I will be releasing a series of blogs sharing what we have been finding out about the WWC. My hope in this is that we can help create a dialogue that will lead the WWC to reconsider many of its core policies and practices. I’m doing this not to compete or conflict with the WWC, but to improve it. If evidence is to have a major role in education policy, government has to help educators and policy makers make good choices. That is what the WWC should be doing, and I still believe it is possible.

The WWC matters, or should matter, because it expresses government’s commitment to evidence, and evidence-based reform. But it can only be a force for good if it is right, timely, accessible, comprehensible, and aligned with other government initiatives. I hope my upcoming blogs will be read in the spirit in which they were written, with hopes of helping the WWC do a better job of communicating evidence to educators eager to help young people succeed in our schools.

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Title I: A 20% Solution

Here’s an idea that would cost nothing and profoundly shift education funding and the interest of educators and policy makers toward evidence-proven programs. Simply put, the idea is to require that schools receiving Title I funds use 20% of the total on programs that meet at least a moderate standard of evidence. Two thin dimes on the dollar could make a huge difference in all of education.

In terms of federal education policy, Title I is the big kahuna. At $15 billion per year, it is the largest federal investment in elementary and secondary education, and it has been very politically popular on both sides of the aisle since the Johnson administration in 1965, when the Elementary and Secondary Education Act (ESEA) was first passed. Title I has been so popular because it goes to every congressional district, and provides much-needed funding by formula to help schools serving children living in poverty. Since the reauthorization of ESEA as the Every Student Succeeds Act in 2015, Title I remains the largest expenditure.

In ESSA and other federal legislation, there are two kinds of funding. One is formula funding, like Title I, where money usually goes to states and is then distributed to districts and schools. The formula may adjust for levels of poverty and other factors, but every eligible school gets its share. The other kind of funding is called competitive, or discretionary funding. Schools, districts, and other entities have to apply for competitive funding, and no one is guaranteed a share. In many cases, federal funds are first given to states, and then schools or districts apply to their state departments of education to get a portion of it, but the state has to follow federal rules in awarding the funds.

Getting proven programs into widespread use can be relatively easy in competitive grants. Competitive grants are usually evaluated on a 100-point scale, with various “competitive preference points” for certain categories of applicants, such as rural locations or the inclusion of English language learners or children of military families. These preferences add perhaps one to five points to a proposal’s score, giving such applicants a leg up but not a sure thing. In the same way, I and others have proposed adding competitive preference points for applicants who commit to adopting programs that meet established standards of evidence. For example, Title II SEED grants for professional development now require that applicants propose to use programs found to be effective in at least one rigorous study, and give five additional points if the programs have been proven effective in at least two rigorous studies. Schools qualifying for school improvement funding under ESSA are now required to select programs that meet ESSA evidence standards.

Adding competitive preference points for using proven programs in competitive grants is entirely sensible and pain-free. It costs nothing, and it does not require applicants to use any particular program. In fact, applicants can forego the preference points entirely and hope to win without them. Preference points for proven programs are an excellent way to nudge the field toward evidence-based reform without top-down mandates or micromanagement. The federal government states a preference for proven programs, which will at least raise their profile among grant writers, but no school or district has to do anything different.

The much more difficult problem is how to get proven programs into formula funding (such as Title I). The great majority of federal funds are awarded by formula, so restricting evidence-based reform to competitive grants is only nibbling at the edges of practice. One solution to this would be to allocate incentive grants to districts if they agree to use formula funds to adopt and implement proven programs.

However, incentives cost money. Instead, imagine that districts and schools get their Title I formula funds, as they have since 1965, but that Congress requires districts to use at least 20% of their Title I, Part A funding to adopt and implement programs that meet a modest standard of evidence, similar to the “moderate” level in ESSA (which requires at least one quasi-experimental study with positive effects). The adopted program could be anything that meets other Title I requirements—reading, math, tutoring, technology—except that it has to have evidence of effectiveness. The funds could pay for necessary staffing, professional development, materials, software, hardware, and so on. Obviously, schools could devote more than 20% if they chose to do so.

There are several key advantages to this 20% solution. First, of course, children would immediately benefit from receiving programs with at least moderate evidence of effectiveness. Second, the process would instantly make leaders of the roughly 55,000 Title I schools intensely interested in evidence. Third, the process could gradually shift discussion about Title I away from its historical focus on “how much?” to an additional focus on “for what purpose?” Publishers, software developers, academics, philanthropy, and government itself would perceive the importance of evidence, and would commission or carry out far more high-quality studies to meet the new standards. Over time, the standards of evidence might increase.

All of this would happen at no additional cost, and with a minimum of micromanagement. There are now many programs that would meet the “moderate” standards of evidence in reading, math, tutoring, whole-school reform, and other approaches, so schools would have a wide choice. No Child Left Behind required that low-performing schools devote 20% of their Title I funding to after-school tutoring programs and student transfer policies that research later showed to make little or no difference in outcomes. Why not spend the same on programs that are proven to work in advance, instead of once again rolling the dice with the educational futures of at-risk children?

Twenty percent of Title I is a lot of money, but if it can make 100% of Title I more impactful, it is more than worthwhile.

Half a Worm: Why Education Policy Needs High Evidence Standards

There is a very old joke that goes like this:

What’s the second-worst thing to find in your apple?  A worm.

What’s the worst?  Half a worm.

The ESSA evidence standards provide clearer definitions of “strong,” “moderate,” and “promising” levels of evidence than have ever existed in law or regulation. Yet they still leave room for interpretation.  The problem is that if you define evidence-based too narrowly, too few programs will qualify.  But if you define evidence-based too broadly, it loses its meaning.

We’ve already experienced what happens with a too-permissive definition of evidence.  In No Child Left Behind, “scientifically-based research” was famously mentioned 110 times.  The impact of this, however, was minimal, as everyone soon realized that the term “scientifically-based” could be applied to just about anything.

Today, we are in a much better position than we were in 2002 to insist on relatively strict evidence of effectiveness, both because we have better agreement about what constitutes evidence of effectiveness and because we have a far greater number of programs that would meet a high standard.  The ESSA definitions are a good consensus example.  Essentially, they define programs with “strong evidence of effectiveness” as those with at least one randomized study showing positive impacts using rigorous methods, and “moderate evidence of effectiveness” as those with at least one quasi-experimental study.  “Promising” is less well-defined, but requires at least one correlational study with a positive outcome.
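As a rough illustration (my own shorthand sketch, not the statutory text), the tiers amount to a simple mapping from the strongest qualifying study design to an ESSA evidence level. The law adds further conditions, such as requiring positive, statistically significant findings and no overriding negative evidence.

```python
def essa_evidence_level(strongest_design: str) -> str:
    """Simplified sketch of the ESSA tiers described above; the statute
    adds further requirements (positive, significant effects, adequate
    samples, no overriding negative findings)."""
    tiers = {
        "randomized": "strong",            # at least one well-implemented randomized study
        "quasi-experimental": "moderate",  # at least one matched/quasi-experimental study
        "correlational": "promising",      # correlational study with statistical controls
    }
    return tiers.get(strongest_design, "does not meet the top ESSA evidence tiers")

print(essa_evidence_level("quasi-experimental"))  # -> "moderate"
```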

Where the half-a-worm concept comes in, however, is that we should not use a broader definition of “evidence-based”.  For example, ESSA has a definition of “strong theory.”  To me, that is going too far, and begins to water down the concept.  What program in all of education cannot justify a “strong theory of action”?

Further, even in the top categories, there are important questions about what qualifies. In school-level studies, should we insist on school-level analyses (i.e., HLM)? Every methodologist would say yes, as I do, but this is not specified. Should we accept researcher-made measures? I say no, based on a great deal of evidence indicating that such measures inflate effects.

Fortunately, due to investments made by IES, i3, and other funders, the number of programs that meet strict standards has grown rapidly. Our Evidence for ESSA website (www.evidenceforessa.org) has so far identified 101 PK-12 reading and math programs, using strict standards consistent with ESSA definitions. Among these, more than 60% meet the “strong” standard. There are enough proven programs in every subject and grade level to give educators choices among proven programs. And we add more each week.

This large number of programs meeting strict evidence standards means that insisting on rigorous evaluations, within reason, does not mean that we end up with too few programs to choose among. We can have our apple pie and eat it, too.

I’d love to see federal programs of all kinds encouraging use of programs with rigorous evidence of effectiveness.  But I’d rather see a few programs that meet a strict definition of “proven” than to see a lot of programs that only meet a loose definition.  20 good apples are much better than applesauce of dubious origins!

This blog is sponsored by the Laura and John Arnold Foundation

Proven Tutoring Approaches: The Path to Universal Proficiency

There are lots of problems in education that are fundamentally difficult. Ensuring success in early reading, however, is an exception. We know what skills children need in order to succeed in reading. No area of teaching has a better basis in high-quality research. Yet the reading performance of America’s children is not improving at an adequate pace. Reading scores have hardly changed in the past decade, and gaps between white, African-American, and Hispanic students have been resistant to change.

In light of the rapid growth in the evidence base, and of the policy focus on early reading at the federal and state levels, this is shameful. We already know a great deal about how to improve early reading, and we know how to learn more. Yet our knowledge is not translating into improved practice and improved outcomes on a large enough scale.

There are lots of complex problems in education, and complex solutions. But here’s a really simple solution:

 

Over the past 30 years researchers have experimented with all sorts of approaches to improve students’ reading achievement. There are many proven and promising classroom approaches, and such programs should be used with all students in initial teaching as broadly as possible. Effective classroom instruction, universal access to eyeglasses, and other proven approaches could surely reduce the number of students who need tutors. But at the end of the day, every child must read well. And the only tool we have that can reliably make a substantial difference at scale with struggling readers is tutors, using proven one-to-one or small-group methods.

I realized again why tutors are so important while working on a proposal I’m making to the State of Maryland, which wants to bring all or nearly all students to “proficient” on its state test, the PARCC. “Proficient” on the PARCC is a score of 750, with a standard deviation of about 50. The state mean is currently around 740. I made a colorful chart (below) with “bands” of scores below 750, to show how far students have to go to reach 750.

 

Each band covers an effect size of 0.20. There are several classroom reading programs with effect sizes this large, so if schools adopted them, they could move children scoring at 740 to 750. These programs can be found at www.evidenceforessa.org. But implementing these programs alone still leaves half of the state’s children not reaching “proficient.”

What about students at 720? They need 30 points, or +0.60. The best one-to-one tutoring programs can achieve outcomes like this, and they are the only solutions that can.
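To make the arithmetic explicit, here is a small sketch using the numbers above (a proficiency cut of 750 and a standard deviation of about 50) to convert a student’s current score into the effect size a program would need to produce:

```python
PROFICIENT = 750   # PARCC "proficient" cut score
SD = 50            # approximate standard deviation of PARCC scores

def effect_size_needed(current_score: float) -> float:
    """Effect size (in standard deviation units) needed to move a student
    from current_score up to the proficiency cut."""
    return (PROFICIENT - current_score) / SD

for score in (740, 730, 720, 700):
    print(f"score {score}: needs an effect size of {effect_size_needed(score):+.2f}")
# score 740 needs +0.20 (within reach of the best classroom programs);
# score 720 needs +0.60 (the range of the strongest one-to-one tutoring programs).
```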

Here are mean effect sizes for various reading tutoring programs with strong evidence:

 

 

As this chart shows, one-to-one tutoring, by well-trained teachers or paraprofessionals using proven programs, can potentially have the impacts needed to bring most students scoring 720 (needing 30 points or an effect size of +0.60) to proficiency (750). Three programs have reported effect sizes of at least +0.60, and several others have approached this level. But what about students scoring below 720?

So far I’ve been sticking to established facts: studies of tutoring programs that are, in most cases, already being disseminated. Now I’m entering the region of well-justified supposition. Almost all studies of tutoring cover just one year or less. But what if the lowest achievers could receive multiple years of tutoring, if necessary?

One study, over 2½ years, did find an effect size of +0.68 for one-to-one tutoring. Could we do better than that? Most likely. In addition to providing multiple years of tutoring, it should be possible to design programs to achieve one-year effect sizes of +1.00 or more. These may incorporate technology or personalized approaches specific to the needs of individual children. Using the best programs for multiple years, if necessary, could increase outcomes further. Also, as noted earlier, using proven programs other than tutoring for all students may increase outcomes for students who also receive tutoring.

But isn’t tutoring expensive? Yes it is. But it is not as expensive as the costs of reading failure: remediation, special education, disappointment, and delinquency. If we could greatly improve the reading performance of low achievers, this would of course reduce inequities across the board. Reducing inequities in educational outcomes could reduce inequities in our entire society, an outcome of enormous importance.

Even providing a substantial amount of teacher tutoring could, by my calculations, increase total state education expenditures (in Maryland) by only about 12%. These costs could be greatly reduced, or even eliminated, by cutting expenditures on ineffective programs, reducing special education placements, and capturing other savings. Having some tutoring done by part-time teachers may reduce costs. Using small-group tutoring (fewer than six students at a time) for students with milder problems may save a great deal of money. Even at full cost, the necessary funding could be phased in over a period of six years at 2% a year.

The bottom line is that today’s low levels of achievement, and the large gaps associated with economic and racial differences, could be improved a great deal using methods already proven to be effective and already widely available. Educators and policy makers are always promising policies that bring every child to proficiency: “No Child Left Behind” and “Every Student Succeeds” come to mind. Yet if these outcomes are truly possible, why shouldn’t we be pursuing them, with every resource at our disposal?