Tutoring Works. But Let’s Learn How It Can Work Better and Cheaper

I was once at a meeting of the British Education Research Association, where I had been invited to participate in a debate about evidence-based reform. We were having what journalists often call “a frank exchange of views” in a room packed to the rafters.

At one point in the proceedings, a woman stood up and, in a furious tone of voice, informed all and sundry that (I’m paraphrasing here) “we don’t need to talk about all this (very bad word). Every child should just get Reading Recovery.” She then stomped out.

I don’t know how widely her view was supported in the room or anywhere else in Britain or elsewhere, but what struck me at the time, and what strikes me even more today, is the degree to which Reading Recovery has long defined, and in many ways limited, discussions about tutoring. Personally, I have nothing against Reading Recovery, and I have always admired the commitment Reading Recovery advocates have had to professional development and to research. I’ve also long known that the evidence for Reading Recovery is very impressive, though you’d be amazed if one-to-one tutoring by well-trained teachers did not produce positive outcomes. On the other hand, Reading Recovery insists on one-to-one instruction by certified teachers and carries substantial costs for all that admirable professional development, so it is very expensive. A British study estimated the cost per child at $5,400 (in 2018 dollars). There are roughly one million Year 1 students in the U.K., so if the angry woman had her way, the U.K. would have to come up with the equivalent of $5.4 billion a year. In the U.S., it would be more like $27 billion a year. I’m not one to shy away from very expensive proposals if they also provide extremely effective services and there are no equally effective alternatives. But shouldn’t we be exploring alternatives?
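The totals above are simple multiplication. This short Python sketch reproduces the U.K. figure from the two numbers quoted (about $5,400 per child and roughly one million Year 1 students). It then works backward to the U.S. cohort size that the $27 billion estimate implies at the same per-child cost. That cohort size of about five million is an inference from the figures, not a number given in the post.

```python
# Back-of-the-envelope cost of universal one-to-one tutoring, using the
# figures quoted above: about $5,400 per child (2018 dollars, from a British
# study) and roughly one million Year 1 students in the U.K.
COST_PER_CHILD_USD = 5_400
UK_YEAR1_STUDENTS = 1_000_000
US_ESTIMATE_USD = 27_000_000_000  # the post's U.S. figure


def annual_cost(cost_per_child: int, students: int) -> int:
    """Total yearly cost if every student in the cohort is tutored."""
    return cost_per_child * students


uk_total = annual_cost(COST_PER_CHILD_USD, UK_YEAR1_STUDENTS)
print(f"U.K.: ${uk_total / 1e9:.1f} billion per year")

# Working backward (an inference, not a stated figure): how many first
# graders does the U.S. estimate assume at the same per-child cost?
implied_us_students = US_ESTIMATE_USD / COST_PER_CHILD_USD
print(f"Implied U.S. cohort: {implied_us_students:,.0f} students")
```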

If you’ve been following my blogs on tutoring, you’ll be aware that, at least at the level of research, the Reading Recovery monopoly on tutoring has been broken in many ways. Reading Recovery has always insisted on certified teachers, but many studies have now shown that well-trained teaching assistants can do just as well, in mathematics as well as reading. Reading Recovery has insisted that tutoring should just be for first graders, but numerous studies have now shown positive outcomes of tutoring through seventh grade, in both reading and mathematics. Reading Recovery has argued that its cost was justified by the long-lasting impacts of first-grade tutoring, but its own research has not documented long-lasting outcomes. Reading Recovery is always one-to-one, of course, but now there are numerous one-to-small group programs, including a one-to-three adaptation of Reading Recovery itself, that produce very good effects. Reading Recovery has always just been for reading, but there are now more than a dozen studies showing positive effects of tutoring in math, too.


All of this newer evidence opens up new possibilities for tutoring that were unthinkable when Reading Recovery ruled the tutoring roost alone. If tutoring can be effective using teaching assistants and small groups, then it is becoming a practicable solution to a much broader range of learning problems. It also opens up a need for further research and development specific to the affordances and problems of tutoring. For example, tutoring can be done a lot less expensively than $5,400 per child, but it is still expensive. We created and evaluated a one-to-six, computer-assisted tutoring model that produced effect sizes of around +0.40 for $500 per child. Meanwhile, I just received a study from the Education Endowment Foundation (EEF) in England evaluating one-to-three math tutoring by college students and recent graduates. Tutoring was provided for only one hour per week for 12 weeks, to sixth graders. The effect size was much smaller (ES=+0.19), but the cost was only about $150 per child.
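One crude way to set these two models side by side is effect size per dollar. This Python sketch divides each reported effect size by its approximate per-child cost and scales the result to $1,000. The ratio is an illustrative assumption, not a metric used in either study: it treats effects as if they scaled linearly with spending, which they almost certainly do not, and it ignores that a larger effect may matter more for students in serious difficulty.

```python
from dataclasses import dataclass


@dataclass
class TutoringModel:
    name: str
    effect_size: float     # reported effect size (ES)
    cost_per_child: float  # approximate cost per child in U.S. dollars

    def es_per_1000_usd(self) -> float:
        """Effect size per $1,000 spent per child (a crude, linear ratio)."""
        return self.effect_size / self.cost_per_child * 1_000


# Figures as quoted in the paragraph above.
models = [
    TutoringModel("One-to-six, computer-assisted", 0.40, 500),
    TutoringModel("EEF one-to-three math, 12 weeks", 0.19, 150),
]

for m in models:
    print(f"{m.name}: ES per $1,000 = {m.es_per_1000_usd():.2f}")
```

On this crude measure the cheaper model looks better per dollar, yet its total effect is less than half as large, which is the kind of trade-off that further research would need to sort out.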

I am not advocating this particular solution, but isn’t it interesting? The EEF also evaluated another means of making tutoring inexpensive, using online tutors from India and Sri Lanka, and another, using cross-age peer tutors, both in math. Both failed miserably, but isn’t that interesting?

I can imagine a broad range of approaches to tutoring, designed to enhance outcomes, minimize costs, or both. Out of that research might come a diversity of approaches suited to different purposes. For example, students in deep trouble, headed for special education, surely need something different from what is needed by students with less serious problems. But what exactly is it that is needed in each situation?

In educational research, reliable positive effects of any intervention are rare enough that we’re usually happy to celebrate anything that works. We might say, “Great, tutoring works! But we knew that.”  However, if tutoring is to become a key part of every school’s strategies to prevent or remediate learning problems, then knowing that “tutoring works” is not enough. What kind of tutoring works for what purposes?  Can we use technology to make tutors more effective? How effective could tutoring be if it is given all year or for multiple years? Alternatively, how effective could we make small amounts of tutoring? What is the optimal group size for small group tutoring?

We’ll never satisfy the angry woman who stormed out of my long-ago symposium at BERA. But for those who keep an open mind about the possibilities, building on the most reliable intervention we have for struggling learners and creating and evaluating effective and cost-effective tutoring approaches seems like a worthwhile endeavor.

Photo Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


How do Textbooks Fit Into Evidence-Based Reform?

In a blog I wrote recently, “Evidence, Standards, and Chicken Feathers,” I discussed my perception that states, districts, and schools, in choosing textbooks and other educational materials, put a lot of emphasis on alignment with standards, and very little on evidence of effectiveness.  My colleague Steve Ross objected, at least in the case of textbooks.  He noted that it was very difficult for a textbook to prove its effectiveness, because textbooks so closely resemble other textbooks that showing a difference between them is somewhere between difficult and impossible.  Since the great majority of classrooms use textbooks (paper or digital) or sets of reading materials that collectively resemble textbooks, the control group in any educational experiment is almost certainly also using a textbook (or equivalents).  So as evidence becomes more and more important, is it fair to hold textbooks to such a difficult standard of evidence? Steve and I had an interesting conversation about this point, so I thought I would share it with other readers of my blog.


First, let me define a couple of key words.  Most of what schools purchase could be called commodities.  These include desks, lighting, carpets, non-electronic whiteboards, playground equipment, and so on. Schools need these resources to provide students with safe, pleasant, attractive places in which to learn. I’m happy to pay taxes to ensure that every child has all of the facilities and materials they need. However, no one should expect such expenditures to make a measurable difference in achievement beyond ordinary levels.

In contrast, other expenditures are interventions.  These include teacher preparation, professional development, innovative technology, tutoring, and other services clearly intended to improve achievement beyond ordinary levels.   Educators would generally agree that such investments should be asked to justify themselves by showing their effectiveness in raising achievement scores, since that is their goal.

By analogy, hospitals invest a great deal in their physical plants, furniture, lighting, carpets, and so on. These are all necessary commodities.   No one should have to go to a hospital that is not attractive, bright, airy, comfortable, and convenient, with plenty of parking.  These things may contribute to patients’ wellness in subtle ways, but no one would expect them to make major differences in patient health.  What does make a measurable difference is the preparation and training provided to the staff, medicines, equipment, and procedures, all of which can be (and are) constantly improved through ongoing research, development, and dissemination.

So is a textbook a commodity or an intervention?  If we accept that every classroom must have a textbook or its equivalent (such as a digital text), then a textbook is a commodity, just an ordinary, basic requirement for every classroom.  We would expect textbooks-as-commodities to be well written, up-to-date, attractive, and pedagogically sensible, and, if possible, aligned with state and national standards.  But it might be unfair and perhaps futile to expect textbooks-as-commodities to significantly increase student achievement in comparison to business as usual, because they are, in effect, business as usual.

If, somehow, a print or digital textbook, with associated professional development, digital add-ons, and so forth, turns out to be significantly more effective than alternative, state-of-the-art textbooks, then a textbook could also be considered an intervention, and marketed as such.  It would then be considered in comparison to other interventions that exist only, or primarily, to increase achievement beyond ordinary levels.

The distinction between commodities and interventions would be academic but for the appearance of the ESSA evidence standards.  The ESSA law requires that schools seeking school improvement funding select and implement programs that meet one of the top three standards (strong, moderate, or promising). It gives preference points on other federal grants, especially Title II (professional development), to applicants who promise to implement proven programs. Some states have applied more stringent criteria, and some have extended use of the standards to additional funding initiatives, including state initiatives.  These are all very positive developments. However, they are making textbook publishers anxious. How are they going to meet the new standards, given that their products are not so different from others now in use?

My answer is that I do not think it was the intent of the ESSA standards to forbid schools from using textbooks that lack evidence of effectiveness. To do so would be unrealistic, as it would wipe out at least 90% of textbooks.  Instead, the purpose of the ESSA evidence standards was to encourage and incentivize the use of interventions proven to be effective.  The concept, I think, was to assume that other funding (especially state and local funds) would support the purchase of commodities, including ordinary textbooks.  In contrast, the federal role was intended to focus on interventions to boost achievement in high-poverty and low-achieving schools.  Ordinary textbooks that are no more effective than any others are clearly not appropriate for those purposes, where there is an urgent need for approaches proven to have significantly greater impacts than methods in use today.

It would be a great step forward if federal, state, and local funding intended to support major improvements in student outcomes were held to tough standards of evidence. Programs that meet those standards should be eligible for generous and strategic funding from federal, state, and local sources dedicated to the enhancement of student outcomes. But no one should limit schools in spending their funds on attractive desks, safe and fun playground equipment, and well-written textbooks, even though these necessary commodities are unlikely to accelerate student achievement beyond current expectations.

Photo credit: Laurentius de Voltolina [Public domain]

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Nevada Places Its Bets on Evidence

Nevada is known as the land of big bets, and taking risks is what Nevadans do. The Nevada State Department of Education (NDE) is showing this in its approach to the ESSA evidence standards. Of course, many states are planning policies to encourage use of programs that meet the ESSA evidence standards, but to my knowledge, no state department of education has taken as proactive a stance in this direction as Nevada.

 

Under the leadership of State Superintendent Steve Canavero, Deputy Superintendent Brett Barley, and Seng-Dao Keo, Director of the Office of Student and School Supports, Nevada has taken a strong stand: evidence is essential for our schools, they maintain, because our kids deserve the best programs we can give them.

All states are asked by ESSA to require strong, moderate, or promising programs (defined in the law) for low-achieving schools seeking school improvement funding. Nevada has made it clear to its local districts that it will enforce the federal definitions rigorously, and only approve school improvement funding for schools proposing to implement proven programs appropriate to their needs. The federal ESSA law also provides bonus points on various other applications for federal funding, and Nevada will support these provisions as well.

However, Nevada will go beyond these policies, reasoning that if evidence from rigorous evaluations is good for federal funding, why shouldn’t it be good for state funding too? For example, Nevada will require ESSA-type evidence for its own funding program for very high-poverty schools, and for schools serving many English learners. The state has a reading-by-third-grade initiative that will also require use of programs proven to be effective under the ESSA regulations. For all of the discretionary programs offered by the state, NDE will create lists of ESSA-proven supplementary programs in each area in which evidence exists.

Nevada has even taken on the holy grail: Textbook adoption. It is not politically possible for the state to require that textbooks have rigorous evidence of effectiveness to be considered state approved. As in the past, texts will be state adopted if they align with state standards. However, on the state list of aligned programs, two key pieces of information will be added: the ESSA evidence level and the average effect size. Districts will not be required to take this information into account, but by listing it on the state adoption lists the state leaders hope to alert district leaders to pay attention to the evidence in making their selections of textbooks.

The Nevada focus on evidence takes courage. NDE has been deluged with concerns from districts, from vendors, and from providers of professional development services. To each, NDE has made the same response: we need to move our state toward use of programs known to work. The difficult changes involved in forming new partnerships and adopting new materials are worthwhile if they provide Nevada’s children with better programs, which will translate into better achievement and a chance at a better life. Seng-Dao Keo describes the evidence movement in Nevada as a moral imperative, delivering proven programs to Nevada’s children and then working to see that they are well implemented and actually produce the outcomes Nevada expects.

Perhaps other states are making similar plans. I certainly hope so, but it is heartening to see at least one state willing to use the ESSA standards as they were intended to be used: as a rationale for state and local educators not just to meet federal mandates, but to move toward use of proven programs. If other states also do this, it could drive publishers, software producers, and providers of professional development to invest in innovation and rigorous evaluation of promising approaches, while increasing the use of approaches known to be effective now.

NDE is not just rolling the dice and hoping for the best. It is actively educating its district and school leaders on the benefits of evidence-based reform, and helping them make wise choices. With a proper focus on needs assessment, facilitating access to information, and assistance with ensuring high-quality implementation, promoting the use of proven programs should be more like Nevada’s Hoover Dam: a sure thing.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by: Michael Karavanov [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Money and Evidence

Many years ago, I spent a few days testifying in a funding equity case in Alabama. At the end of my testimony, the main lawyer for the plaintiffs drove me to the airport. “I think we’re going to win this case,” he said, “But will it help my clients?”

The lawyer’s question has haunted me ever since. In Alabama, then and now, there are enormous inequities in education funding between rich and poor districts, due to differences in property tax receipts. There are corresponding differences in student outcomes. The same is true in most states. To a greater or lesser degree, most states and the federal government provide some funding to reduce inequalities, but in most places it is still the case that poor districts have to tax themselves at a higher rate to produce education funding that is significantly lower than that of their wealthier neighbors.

Funding inequities are worse than wrong; they are repugnant. When I travel in other countries and try to describe our system, it usually takes me a while to get people outside the U.S. even to understand what I am saying. “So schools in poor areas get less than those in wealthy ones? Surely that cannot be true.” In fact, it is true in the U.S., but in all of our peer countries, national or at least regional funding policies ensure basic equality in school funding, and in most cases I know of, they then add funding on top of equalized funding for schools serving many children in poverty. For example, England has long had equal funding, and the Conservative government added “Pupil Premium” funding in which each disadvantaged child brings additional funds to his or her school. Pupil Premium is sort of like Title I in the U.S., if you can imagine Title I adding resources on top of equal funding, which it does in only a few U.S. states.

So let’s accept the idea that funding inequity is a BAD THING. Now consider this: Would eliminating funding inequities eliminate achievement gaps in U.S. schools? This gets back to the lawyer’s question. If we somehow won a national “case” that required equalizing school funding, would the “clients” benefit?

More money for disadvantaged schools would certainly be welcome, and it would create the possibility of major advances. But the impact of significant additional funding depends on what schools do with the added dollars. Of course you’d have to increase teachers’ salaries and reduce class sizes to draw highly qualified teachers into disadvantaged schools. But you’d also have to spend a significant portion of new funds to help schools implement proven programs with fidelity and verve.

Again, England offers an interesting model. Twenty years ago, achievement in England was very unequal, despite equal funding. Children of immigrants from Pakistan and Bangladesh, Africans, Afro-Caribbeans, and other minorities performed well below White British children. The Labour government implemented a massive effort to change this, starting with the London Challenge and continuing with a Manchester Challenge and a Black Country Challenge in the post-industrial Midlands. Each “challenge” provided substantial professional development to school staffs, as well as organizing achievement data to show school leaders that other schools with exactly the same demographic challenges were achieving far better results.

Today, children of Pakistani and Bangladeshi immigrants are scoring at the English mean. Children of African and Afro-Caribbean immigrants are just below the English mean. Policy makers in England are now turning their attention to White working-class boys. But the persistent and substantial gaps we see as so resistant to change in the U.S. are essentially gone in England.

Today, we are getting even smarter about how to turn dollars into enhanced achievement, due to investments by the Institute of Education Sciences (IES) and the Investing in Innovation (i3) program in the U.S. and the Education Endowment Foundation (EEF) in England. In both countries, however, we lack the funding to put what we know into practice on a scale large enough to matter. This need not always be the case.

Funding matters. No one can make chicken soup out of chicken feathers, as we say in Baltimore. But funding in itself will not solve our achievement gap. Funding needs to be spent on specific, high-impact investments to make a big difference.

Accelerating the Pace of Innovation

The biggest problem in evidence-based reform in education is that there are too few replicable programs that have strong evidence of effectiveness available to educators. The evidence provisions of the Every Student Succeeds Act (ESSA) encourage the use of programs that have strong, moderate, or promising evidence of effectiveness, and they require School Improvement efforts (formerly SIG) to include approaches with evidence that meets these definitions. There are significant numbers of programs that do meet these definitions, but not enough to give educators multiple choices of proven programs for each subject and grade level. The Institute of Education Sciences (IES), the Investing in Innovation (i3) program, the National Science Foundation (NSF), and England’s Education Endowment Foundation (EEF) have all been supporting rigorous evaluations of replicable programs at all levels, and this work (and work funded by others) is progressively enriching offerings of programs that are both proven to be effective and ready for widespread dissemination. However, progress is slow. Large-scale randomized experiments demanded by these funders are expensive and may take many years to be completed. As in any scientific field (such as medicine), most experiments do not show positive outcomes for innovative treatments. At a time when demand is starting to pick up, the supply needs to keep pace.

Given that money is not being thrown at education research by Congress or other funders, how can promising innovations be evaluated, made ready for dissemination, and taken to scale? First, existing funders need to be supported adequately to continue the good work they are doing. Grants for Education Innovation and Research (EIR) will pick up where i3 ends, and IES needs to maintain its leadership in supporting development and evaluation of promising programs in all subjects and grade levels. The National Science Foundation should invest far more in creating, evaluating, and disseminating proven STEM approaches. All of this work, in fact, is in need of increased funding and publicity to build political and public support for the entire enterprise.

However, there are several additional avenues that might be pursued to increase the number of proven, ready-to-disseminate approaches. One promising model is low-cost randomized evaluations of interventions supported by government or other funding. Both IES and the Laura and John Arnold Foundation are offering support for such studies. For example, imagine that a school district is introducing a digital textbook to its schools but can afford to provide it to only 30 schools each year. If the district finds 60 schools willing to receive the program and randomly assigns half of them to start in the first year (with the other half starting the following year), then it is spending no more on digital textbooks than it planned to spend. If state test scores can be obtained and used as pre- and post-tests, then the measurement costs nothing. The only costs of studying the effects of the digital textbooks might be the costs of data analysis, perhaps some questionnaires or observations to find out what schools did with the digital textbooks, and a report. Such a study would be very inexpensive, might produce results within a year or two, and would be evaluating something that is appealing to schools and ready to go.
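To make the design concrete, here is a minimal Python sketch of the random assignment step, assuming 60 volunteer schools with hypothetical names and a fixed seed chosen only for illustration. The early cohort gets the digital textbook in the first year; the delayed cohort serves as the comparison group until it starts the following year. The `difference_in_mean_gains` helper is likewise hypothetical and deliberately crude (mean pre-to-post gain in early schools minus that in delayed schools); an actual evaluation would typically analyze results at the school level with the pretest as a covariate.

```python
import random
from statistics import mean


def assign_cohorts(
    schools: list[str], per_year: int, seed: int
) -> tuple[list[str], list[str]]:
    """Randomly split volunteer schools into an early cohort (gets the
    program this year) and a delayed cohort (comparison group this year,
    program next year)."""
    if len(schools) != 2 * per_year:
        raise ValueError("expected exactly two years' worth of schools")
    rng = random.Random(seed)  # fixed seed so the draw can be re-run and audited
    shuffled = schools[:]      # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    return sorted(shuffled[:per_year]), sorted(shuffled[per_year:])


def difference_in_mean_gains(
    pre: dict[str, float],
    post: dict[str, float],
    treatment: list[str],
    control: list[str],
) -> float:
    """Crude effect estimate: mean gain (post minus pre) in treatment
    schools minus mean gain in control schools."""
    def gain(school: str) -> float:
        return post[school] - pre[school]

    return mean(gain(s) for s in treatment) - mean(gain(s) for s in control)


schools = [f"School {i:02d}" for i in range(1, 61)]  # hypothetical names
treatment, control = assign_cohorts(schools, per_year=30, seed=2018)
```

Fixing the seed lets an outside reviewer rerun the draw and confirm that assignment was random rather than hand-picked.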

Beyond these existing strategies, others might be considered to speed up the proven programs process. One example might be to build on Small Business Innovation Research (SBIR) grants. At $1 million over two years, these grants, limited to for-profit companies, are often too small to develop and evaluate promising approaches (usually, technology applications). IES or other funders might proactively look for promising SBIR projects and encourage them to apply for larger funding to complete development and do rigorous evaluations. One advantage of SBIRs is that they are usually created by small, ambitious, undercapitalized companies, which are motivated to take their programs to scale.

Another strategy that might work could be to fund “aggregators” whose job would be to identify promising approaches from any source, help assemble partnerships if necessary, and then help prepare applications for funding. This could help young innovators with great ideas combine their efforts, create more complete and powerful innovations, and subject them to rigorous evaluations. In addition to SBIR-funded projects, promising program elements might be found in projects funded by private foundations or agencies outside of education. They might be components of IES or i3 projects that produced promising but not conclusive outcomes in their evaluations, perhaps due to insufficient sample size. Aggregators might link up programs with broad reach but limited technology with brash technology start-ups in need of access to markets. If the goal is finding promising but incomplete efforts and helping them reach effectiveness and scale, every source should be fair game.

Government has made extraordinary progress in promoting the development, rigorous evaluation, and scale-up of proven programs. However, its success has led to a demand for proven programs that it cannot fulfill at the usual pace. Current grant programs at IES and i3/EIR should continue, but in addition we need innovative strategies capable of greatly accelerating the pace of development, evaluation, and scale up.

Educationists and Economists

I used to work part time in England, and I’ve traveled around the world a good bit speaking about evidence-based reform in education and related topics. One of the things I find striking in country after country is that at the higher levels, education is not run by educators. It is run by economists.

In the U.S., this is also true, though it’s somewhat less obvious. The main committees in Congress that deal with education are the House Education and the Workforce Committee and the Senate Health, Education, Labor, and Pensions (HELP) Committee. Did you notice the words “workforce” and “labor”? That’s economists. Further, politicians listen to economists, because they consider them tough-minded, data-driven, and fact-friendly. Economists see education as contributing to the quality of the workforce, now and in the future, and this makes them influential with politicians.

A lot of the policy prescriptions that get widely discussed and implemented broadly are the sorts of things economists love to dream up. For example, they are partial to market incentives, new forms of governance, rewards and punishments, and social impact bonds. Individual economists, and the politicians who listen to them, take diverse positions on these policies, but the point is that economists rather than educators often set the terms of the debates on both sides. As one example, educators have been talking about long-term impacts of quality preschool for 30 years, but when Nobel Prize-winning economist James Heckman took up the call, preschool became a top priority of the Obama Administration.

I have nothing against economists. Some of my best friends are economists. But here is why I am bringing them up.

Evidence-based reform is creating a link between educationists and economists, and thereby to the politicians who listen to them, that did not exist before. Evidence-based reform speaks the language that economists insist on: randomized evaluations of replicable programs and practices. When an educator develops a program, successfully evaluates it at scale, and shows it can be replicated, this gives economists a tangible tool they can show will make a difference in policy. Other research designs are simply not as respected or accepted. But an economist with a proven program in hand has a measurable, powerful means to affect policy and help politicians make wise use of resources.

If we want educational innovation and research to matter to public policy, we have to speak truth to power, in the language of power. And that language is increasingly the language of rigorous evidence. If we keep speaking it, our friends the economists will finally take evidence from educational research seriously, and that is how policy will change to improve outcomes for children on a grand scale.

R&D That Makes a Difference

Over the course of my career, I’ve written a lot of proposals. I’ve also reviewed a lot, and I’ve seen many funded projects crash and burn, or produce a scholarly article or two that are never heard of again.

As evidence becomes more important in educational policy and practice, I think it’s time to rethink the whole process of funding for development, evaluation, and dissemination.

Here’s how the process works now at the federal level. The feds put out a Request for Proposals (RFP) in the Federal Register. It specifies the purpose of the grant, who is eligible, funding available, deadlines, and most importantly, the criteria on which the proposals will be judged. Proposal writers know that they must follow those criteria very carefully to make it easy for readers to know that each criterion has been satisfied.

The problem with the whole proposal system lies in the perception that each proposal starts with a perfect score (usually 100), and is then marked down for any deficiencies. To oversimplify, reviewers nitpick, and if there is much left after the nits have been picked, the proposal wins.

What this system rewards is enormous care and OCD-level attention to detail. It does not reward creativity, risk, insight, or actual utility for schools. Yet grants that do not move practice forward at any significant scale do little good in an applied field like education (in related fields such as psychology, purely basic research might justify such approaches, but in education this is a hard argument to make). Maybe our collective inability to do research that affects practice on a broad scale explains some of the lack of enthusiasm our political leadership has for research.

So what would I propose as an alternative? I’m so glad you asked. I’d propose that RFPs be explicitly structured to ask not, “Why shouldn’t we fund this proposal,” but, “Why should we?” That is, proposal writers should be asked to make a case for the potential importance of their work. Here’s a model set of evaluation standards to illustrate what I mean.

A. Significance
1. What are you planning to create?
2. What national problem does your proposed program potentially solve?
3. What outcomes do you expect to achieve, and why are these important?
4. Based on prior research by yourself and others, what is the likelihood that your program will produce the outcomes you expect?
5. What is the likelihood that, if your program is successful, it will work on a significant scale? What is your experience with working at scale or scaling up proven programs in educational settings?
6. In what way is your program creative or distinctive? How might it spark new thinking or development to solve longstanding problems in education?

B. Capabilities
1. Describe the organizational capabilities of the partners to this proposal, as well as the capabilities of the project leadership. Consider capabilities in the following areas:
a. development
b. roll-out, piloting
c. evaluation
d. reporting
e. scale-up
f. communications, marketing
2. Timelines, milestones

C. Evaluation
1. Research questions
2. Design, analysis

D. Impact
Given all you’ve written so far, summarize in one page why this project will make a substantial difference in educational practice and policy.

If we want research and development to produce useful solutions to educational problems, we have to ask the field for just that, and reward those able to produce, evaluate, and disseminate such solutions. Ironically, the federal funding stream closest to the ideal I’ve described is the Investing in Innovation (i3) program, which Congress may be about to shut down. i3 is at least focused on pragmatic solutions rather than theory-building and it has high standards of evidence. But if i3 survives or if it is replaced by another initiative to support innovation, development, evaluation, and scale-up of proven programs, I’d argue that it needs to focus even more on pragmatic issues of effectiveness and scale. Reviewers should be exclaiming, “I get it!” rather than “I gotcha!”