More Chinese Dragons: How the WWC Could Accelerate Its Pace

blog_4-26-18_chinesedragon_500x375

A few months ago, I wrote a blog entitled “The Mystery of the Chinese Dragon: Why Isn’t the WWC Up to Date?” It really had nothing to do with dragons, but compared the timeliness of the What Works Clearinghouse review of research on secondary reading programs and a Baye et al. (2017) review on the same topic. The graph depicting the difference looked a bit like a Chinese dragon with a long tail near the ground and huge jaws. The horizontal axis was the dates accepted studies had appeared, and the vertical axis was the number of studies. Here is the secondary reading graph.

blog_4-26-18_graph1_500x292

What the graph showed is that the WWC and the U.S. studies from the Baye et al. (2017) review were similar in coverage of studies appearing from 1987 to 2009, but after that diverged sharply, because the WWC is very slow to add new studies, in comparison to reviews using similar methods.

In the time since the Chinese Dragon for secondary reading studies appeared on my blog, my colleagues and I have completed two more reviews, one on programs for struggling readers by Inns et al. (2018) and one on programs for elementary math by Pellegrini et al. (2018). We made new Chinese Dragon graphs for each, which appear below.*

blog_4-26-18_graph3_500x300

blog_4-26-18_graph2_500x316

*Note: In the reading graph, the line for “Inns et al.” added numbers of studies from the Inns et al. (2018) review of programs for struggling readers to additional studies of programs for all elementary students in an unfinished report.

The new dragons look remarkably like the first. Again, what matters is the similar pattern of accepted studies before 2009, (the “tail”), and the sharply diverging rates in more recent years (the “jaws”).

There are two phenomena that cause the dragons’ “jaws” to be so wide open. The upper jaw, especially in secondary reading and elementary math, indicate that many high-quality rigorous evaluations are appearing in recent years. Both the WWC inclusion standards and those of the Best Evidence Encyclopedia (BEE; www.bestevidence.org) require control groups, clustered analysis for clustered designs, samples that are well-matched at pretest and have similar attrition by posttest, and other features indicating methodological rigor, of the kind expected by the ESSA evidence standards, for example.

The upper jaw of each dragon is increasing so rapidly because rigorous research is increasing rapidly in the U.S. (it is also increasing rapidly in the U.K., but the WWC does not include non-U.S. studies, and non-U.S. studies are removed from the graph for comparability). This increase is due to U. S. Department of Education funding of many rigorous studies in each topic area, through its Institute for Education Sciences (IES) and Investing in Innovation (i3) programs, and special purpose funding such as Striving Readers and Preschool Curriculum Education Research. These recent studies are not only uniformly rigorous, they are also of great importance to educators, as they evaluate current programs being actively disseminated today. Many of the older programs whose evaluations appear on the dragons’ tails no longer exist, as a practical matter. If educators wanted to adopt them, the programs would have to be revised or reinvented. For example, Daisy Quest, still in the WWC, was evaluated on TRS-80 computers not manufactured since the 1980s. Yet exciting new programs with rigorous evaluations, highlighted in the BEE reviews, do not appear at all in the WWC.

I do not understand why the WWC is so slow to add new evaluations, but I suspect that the answer lies in the painstaking procedures any government has to follow to do . . ., well, anything. Perhaps there are very good reasons for this stately pace of progress. However, the result is clear. The graph below shows the publication dates of every study in every subject and grade level accepted by the WWC and entered on its database. This “half-dragon” graph shows that only 26 studies published or made available after 2013 appear on the entire WWC database. Of these, only two have appeared after 2015.

blog_4-26-18_graph4_500x316

The slow pace of the WWC is of particular concern in light of the appearance of the ESSA evidence standards. More educators than ever before must be consulting the WWC, and many must be wondering why programs they know to exist are not listed there, or why recent studies do not appear.

Assuming that there are good reasons for the slow pace of the WWC, or that for whatever reason the pace cannot be greatly accelerated, what can be done to bring the WWC up to date? I have a suggestion.

Imagine that the WWC commissioned someone to do rapid updating of all topics reviewed on the WWC website. The reviews would follow WWC guidelines, but would appear very soon after studies were published or issued. It’s clear that this is possible, because we do it for Evidence for ESSA (www.evidenceforessa.org). Also, the WWC has a number of “quick reviews,” “single study reports,” and so on, scattered around on its site, but not integrated with its main “Find What Works” reviews of various programs. These could be readily integrated with “Find What Works.”

The recent studies identified in this accelerated process might be identified as “provisionally reviewed,” much as the U. S. Patent Office has “patent pending” before inventions are fully patented. Users would have an option to look only at program reports containing fully reviewed studies, or could decide to look at reviews containing both fully and provisionally reviewed studies. If a more time consuming full review of a study found results different from those of the provisional review, the study report and the program report in which it was contained would be revised, of course.

A process of this kind could bring the WWC up to date and keep it up to date, providing useful, actionable evidence in a timely fashion, while maintaining the current slower process, if there is a rationale for it.

The Chinese dragons we are finding in every subject we have examined indicate the rapid growth and improving quality of evidence on programs for schools and students. The U. S. Department of Education and our whole field should be proud of this, and should make it a beacon on a hill, not hide our light under a bushel. The WWC has the capacity and the responsibility to highlight current, high-quality studies as soon as they appear. When this happens, the Chinese dragons will retire to their caves, and all of us, government, researchers, educators, and students, will benefit.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2017). Effective reading programs for secondary students. Manuscript submitted for publication. Also see Baye, A., Lake, C., Inns, A. & Slavin, R. E. (2017, August). Effective reading programs for secondary students. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Photo credit: J Bar [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/), GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Advertisements

Great Tutors Could Be a Source of Great Teachers

blog_4-19-18_tutoring_500x329In a recent blog, I wrote about findings of three recent reviews of research on tutoring, contained within broader reviews of research on effective programs for struggling elementary readers, struggling secondary readers, and math. The blog reported the astonishing finding that in each of the reviews, outcomes for students tutored by paraprofessionals (teaching assistants) were as good, and usually somewhat better, than outcomes for students tutored by teachers.

It is important to note that the paraprofessionals tutoring students usually had BAs, one indicator of high quality. But since paras are generally paid about half as much as teachers, using them enables schools to serve twice as many struggling students at about the same cost as hiring teacher tutors. And because there are teacher shortages in many areas, such as inner cities and rural locations, sufficient teachers may not be available in some places at any cost.

In my earlier blog, I explained all this, but now I’d like to expand on one aspect of the earlier blog I only briefly mentioned.

If any district or state decided to invest substantially in high-quality paraprofessional tutors and train them in proven one-to-one and one-to-small group tutoring strategies, it would almost certainly increase the achievement of struggling learners and reduce retentions and special education placements. But it could also provide a means of attracting capable recent university graduates into teaching.

Imagine that districts or states recruited students graduating from local universities to serve in a “tutor corps.” Those accepted would be trained and mentored to become outstanding tutors. From within that group, tutors who show the greatest promise would be invited to participate in a fast-track teacher certification program. This would add coursework to the paraprofessionals’ schedules, while they continue tutoring during other times. In time, the paraprofessionals would be given opportunities to do brief classroom internships, and then student teaching. Finally, they would receive their certification, and would be assigned to a school in the district or state.

There are several features worth noting about this proposal. First, the paraprofessionals would be paid throughout their teacher training, because at all points they would be providing valuable services to children. This would make it easier for recent university graduates to take courses leading to certification, which could expand the number of promising recent graduates who might entertain the possibility of becoming teachers. Paying teacher education candidates (as tutors) throughout their time in training could open the profession to a broader range of talented candidates, including diverse candidates who could not afford traditional teacher education.

Second, the whole process of recruiting well-qualified paraprofessionals, training and mentoring them as tutors, selecting the best of them to become certified, and providing coursework and student teaching experiences for them, would be managed by school districts or states, not by universities. School districts and states have a strong motivation to select the best teachers, see that they get excellent training and mentoring, and proceed to certification only when they are ready. Coursework might be provided by university professors contracted by the district or qualified individuals within the district or state. Again, because the district or state has a strong interest in having these experiences be optimal for their future teachers, they would be likely to take an active role in ensuring that coursework and coaching are first rate.

One important advantage of this system would be that it would give school, district, and state leaders opportunities to see future teachers operate in real schools over extended periods of time, first as tutors, then as interns, then as student teachers. At the end of the process, the school district or state should be willing to guarantee that all who succeed in this demanding sequence will be offered a job. They should be able to do this with confidence, because school and district staff would have seen the candidate work with real children in real schools.

The costs of this system might be minimal. During tutoring, internships, and student teaching, teacher candidates are providing invaluable services to struggling students. The only additional cost would entail providing coursework to meet state or district requirements. But this cost could be modest, and in exchange for paying for or providing the courses, the district or state would gain the right to select instructors of very high quality and insist on their effectiveness in practice. These are the schools’ own future teachers, and they should not be satisfied with less than stellar teacher education.

The system I’m proposing could operate alongside of traditional programs provided by universities. School districts or states might in fact create partnerships in which all teacher education candidates would serve as tutors as part of their teacher education, in which case university-led and district-led teacher education may essentially merge into one.

This system is more obviously attuned to the needs of elementary schools than secondary schools, because historically tutors have been rarely used in the secondary grades. Yet recent evidence from studies in England (http://www.bestevidence.org/reading/mhs/mhs_read.htm) has shown positive effects of tutoring in reading in the middle grades, and it seems likely that one-to-one or one-to-small group tutoring would be beneficial in all major subjects and, as in elementary school, may keep students who are far behind grade level in a given subject out of special education and able to keep up with their classmates. If paraprofessional tutors can work in the secondary grades, this would form the basis for a teacher certification plan like the one I have described.

Designing teacher certification programs around the needs of recent BAs sounds like Teach for America, and in many ways it is. But this system would, I’d argue, be more likely to attract large numbers of talented young people who would be more likely than TFA grads to stay in teaching for many years.

The main reason schools, districts, and states should invest in tutoring by paraprofessionals is to serve the large number of struggling learners who exist in every district. But in the course of doing this, districts could also take control of their own destinies and select and train the teachers they need. The result would be better teachers for all students, and a teaching profession that knows how to use proven programs to ensure the success of all.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Effect Sizes: How Big is Big?

blog_4-12-18_elephantandmouseAn effect size is a measure of how much an experimental group exceeds a control group, controlling for pretests. As every quantitative researcher knows, the formula is (XT – XC)/SD, or adjusted treatment mean minus adjusted control mean divided by the unadjusted standard deviation. If this is all gobbledygook to you, I apologize, but sometimes us research types just have to let our inner nerd run free.

Effect sizes have come to be accepted as a standard indicator of the impact an experimental treatment had on a posttest. As research becomes more important in policy and practice, understanding them is becoming increasingly important.

One constant question is how important a given effect size is. How big is big? Many researchers still use a rule of thumb from Cohen to the effect that +0.20 is “small,” +0.50 is “moderate,” and +0.80 or more is “large.”  Yet Cohen himself disavowed these standards long ago.

High-quality experimental-control comparison research in schools rarely gets effect sizes as large as +0.20, and only one-to-one tutoring studies routinely get to +0.50. So Cohen’s rule of thumb was demanding effect sizes for rigorous school research far larger than those typically reported in practice.

An article by Hill, Bloom, Black, and Lipsey (2008) considered several ways to determine the importance of effect sizes. They noted that students learn more each year (in effect sizes) in the early elementary grades than do high school students. They suggested that therefore a given effect size for an experimental treatment may be more important in secondary school than the same effect size would be in elementary school. However, in four additional tables in the same article, they show that actual effect sizes from randomized studies are relatively consistent across the grades. They also found that effect sizes vary greatly depending on methodology and the nature of measures. They end up concluding that it is most reasonable to determine the importance of an effect size by comparing it to effect sizes in other studies with similar measures and designs.

A study done by Alan Cheung and myself (2016) reinforces the importance of methodology in determining what is an important effect size. We analyzed all findings from 645 high-quality studies included in all reviews in our Best Evidence Encyclopedia (www.bestevidence.org). We found that the most important factors in effect sizes were sample size and design (randomized vs. matched). Here is the key table.

Effects of Sample Size and Designs on Effect Sizes

  Sample Size
Design Small Large
Matched +0.33 +0.17
Randomized +0.23 +0.12

What this chart shows is that matched studies with small sample sizes (less than 250 students) have much higher effect sizes, on average, than, say, large randomized studies (+0.33 vs. +0.12). These differences say nothing about the impact on children, but are completely due to differences in study design.

If effect sizes are so different due to study design, then we cannot have a single standard to tell us when an effect size is large or small. All we can do is note when an effect size is large compared to similar studies. For example imagine that a study finds an effect size of +0.20. Is that big or small? If it was a matched study with a small sample size, +0.20 would be a rather small impact. If it were a randomized study with a large sample size, it might be considered quite a large impact.

Beyond study methods, a good general principle is to compare like with like. For example, some treatments may have very small effect sizes, but they may be so inexpensive or may affect so many students that a small effect may be important. For example, principal or superintendent training may affect very many students, or benchmark assessments may be so inexpensive that a small effect size may be worthwhile, and may compare favorably with equally inexpensive means of solving the same problem.

My colleagues and I will be developing a formula to enable researchers and readers to easily put in features of a study to produce an “expected effect size” to determine more accurately whether an effect size should be considered large or small.

Not long ago, it would not have mattered much how large effect sizes were considered, but now it does. That’s an indication of the progress we have made in recent years. Big indeed!

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

New Findings on Tutoring: Four Shockers

blog_04 05 18_SURPRISE_500x353One-to-one and one-to-small group tutoring have long existed as remedial approaches for students who are performing far below expectations. Everyone knows that tutoring works, and nothing in this blog contradicts this. Although different approaches have their champions, the general consensus is that tutoring is very effective, and the problem with widespread use is primarily cost (and for tutoring by teachers, availability of sufficient teachers). If resources were unlimited, one-to-one tutoring would be the first thing most educators would recommend, and they would not be wrong. But resources are never unlimited, and the numbers of students performing far below grade level are overwhelming, so cost-effectiveness is a serious concern. Further, tutoring seems so obviously effective that we may not really understand what makes it work.

In recent reviews, my colleagues and I examined what is known about tutoring. Beyond the simple conclusion that “tutoring works,” we found some big surprises, four “shockers.” Prepare to be amazed! Further, I propose an explanation to account for these unexpected findings.

We have recently released three reviews that include thorough, up-to-date reviews of research on tutoring. One is a review of research on programs for struggling readers in elementary schools by Amanda Inns and colleagues (2018). Another is a review on programs for secondary readers by Ariane Baye and her colleagues (2017). Finally, there is a review on elementary math programs by Marta Pellegrini et al. (2018). All three use essentially identical methods, from the Best Evidence Encyclopedia (www.bestevidence.org). In addition to sections on tutoring strategies, all three also include other, non-tutoring methods directed at the same populations and outcomes.

What we found challenges much of what everyone thought they knew about tutoring.

Shocker #1: In all three reviews, tutoring by paraprofessionals (teaching assistants) was at least as effective as tutoring by teachers. This was found for reading and math, and for one-to-one and one-to-small group tutoring.  For struggling elementary readers, para tutors actually had higher effect sizes than teacher tutors. Effect sizes were +0.53 for paras and +0.36 for teachers in one-to-one tutoring. For one-to-small group, effect sizes were +0.27 for paras, +0.09 for teachers.

Shocker #2: Volunteer tutoring was far less effective than tutoring by either paras or teachers. Some programs using volunteer tutors provided them with structured materials and extensive training and supervision. These found positive impacts, but far less than those for paraprofessional tutors. Volunteers tutoring one-to-one had an effect size of +0.18, paras had an effect size of +0.53. Because of the need for recruiting, training, supervision, and management, and also because the more effective tutoring models provide stipends or other pay, volunteers were not much less expensive than paraprofessionals as tutors.

Shocker #3:  Inexpensive substitutes for tutoring have not worked. Everyone knows that one-to-one tutoring works, so there has long been a quest for approaches that simulate what makes tutoring work. Yet so far, no one, as far as I know, has found a way to turn lead into tutoring gold. Although tutoring in math was about as effective as tutoring in reading, a program that used online math tutors communicating over the Internet from India and Sri Lanka to tutor students in England, for example, had no effect. Technology has long been touted as a means of simulating tutoring, yet even when computer-assisted instruction programs have been effective, their effect sizes have been far below those of the least expensive tutoring models, one-to-small group tutoring by paraprofessionals. In fact, in the Inns et al. (2018) review, no digital reading program was found to be effective with struggling readers in elementary schools.

 Shocker #4: Certain whole-class and whole-school approaches work as well or better for struggling readers than tutoring, on average. In the Inns et al. (2018) review, the average effect size for one-to-one tutoring approaches was +0.31, and for one-to-small group approaches it was +0.14. Yet the mean for whole-class approaches, such as Ladders to Literacy (ES = +0.48), PALS (ES = +0.65), and Cooperative Integrated Reading and Composition (ES = +0.19) averaged +0.33, similar to one-to-one tutoring by teachers (ES = +0.36). The mean effect sizes for comprehensive tiered school approaches, such as Success for All (ES = +0.41) and Enhanced Core Reading Instruction (ES = +0.22) was +0.43, higher than any category of tutoring (note that these models include tutoring as part of an integrated response to implementation approach). Whole-class and whole-school approaches work with many more students than do tutoring models, so these impacts are obtained at a much lower cost per pupil.

Why does tutoring work?

Most researchers and others would say that well-structured tutoring models work primarily because they allow tutors to fully individualize instruction to the needs of students. Yet if this were the only explanation, then other individualized approaches, such as computer-assisted instruction, would have outcomes similar to those of tutoring. Why is this not the case? And why do paraprofessionals produce at least equal outcomes to those produced by teachers as tutors? None of this squares with the idea that the impact of tutoring is entirely due to the tutor’s ability to recognize and respond to students’ unique needs. If that were so, other forms of individualization would be a lot more effective, and teachers would presumably be a lot more effective at diagnosing and responding to students’ problems than would less highly trained paraprofessionals. Further, whole-class and whole-school reading approaches, which are not completely individualized, would have much lower effect sizes than tutoring.

My theory to account for the positive effects of tutoring in light of the four “shockers” is this:

  • Tutoring does not work due to individualization alone. It works due to individualization plus nurturing and attention.

This theory begins with the fundamental and obvious assumption that children, perhaps especially low achievers, are highly motivated by nurturing and attention, perhaps far more than by academic success. They are eager to please adults who relate to them personally.  The tutoring setting, whether one-to-one or one-to-very small group, gives students the undivided attention of a valued adult who can give them personal nurturing and attention to a degree that a teacher with 20-30 students cannot. Struggling readers may be particularly eager to please a valued adult, because they crave recognition for success in a skill that has previously eluded them.

Nurturing and attention may explain the otherwise puzzling equality of outcomes obtained by teachers and paraprofessionals as tutors. Both types of tutors, using structured materials, may be equally able to individualize instruction, and there is no reason to believe that paras will be any less nurturing or attentive. The assumption that teachers would be more effective as tutors depends on the belief that tutoring is complicated and requires the extensive education a teacher receives. This may be true for very unusual learners, but for most struggling students, a paraprofessional may be as capable as a teacher in providing individualization, nurturing, and attention. This is not to suggest that paraprofessionals are as capable as teachers in every way. Teachers have to be good at many things: preparing and delivering lessons, managing and motivating classes, and much more. However, in their roles as tutors, teachers and paraprofessionals may be more similar.

Volunteers certainly can be nurturing and attentive, and can be readily trained in structured programs to individualize instruction. The problem, however, is that studies of volunteer programs report difficulties in getting volunteers to attend every day and to avoid dropping out when they get a paying job. This is may be less of a problem when volunteers receive a stipend; paid volunteers are much more effective than unpaid ones.

The failure of tutoring substitutes, such as individualized technology, is easy to predict if the importance of nurturing and attention is taken into account. Technology may be fun, and may be individualized, but it usually separates students from the personal attention of caring adults.

Whole-Class and Whole-School Approaches.

Perhaps the biggest shocker of all is the finding that for struggling readers, certain non-technology approaches to instruction for whole classes and schools can be as effective as tutoring. Whole-class and whole-school approaches can serve many more students at much lower cost, of course. These classroom approaches mostly use cooperative learning and phonics-focused teaching, or both, and the whole-school models especially Success for All,  combine these approaches with tutoring for students who need it.

The success of certain whole-class programs, of certain tutoring approaches, and of whole-school approaches that combine proven teaching strategies with tutoring for students who need more, argues for response to intervention (RTI), the policy that has been promoted by the federal government since the 1990s. So what’s new? What’s new is that the approach I’m advocating is not just RTI. It’s RTI done right, where each component of  the strategy has strong evidence of effectiveness.

The good news is that we have powerful and cost-effective tools at our disposal that we could be putting to use on a much more systematic scale. Yet we rarely do this, and as a result far too many students continue to struggle with reading, even ending up in special education due to problems schools could have prevented. That is the real shocker. It’s up to our whole profession to use what works, until reading failure becomes a distant memory. There are many problems in education that we don’t know how to solve, but reading failure in elementary school isn’t one of them.

Practical Implications.

Perhaps the most important practical implication of this discussion is a realization that benefits similar or greater than those of one-to-one tutoring by teachers can be obtained in other ways that can be cost-effectively extended to many more students: Using paraprofessional tutors, using one-to-small group tutoring, or using whole-class and whole-school tiered strategies. It is no longer possible to say with a shrug, “of course tutoring works, but we can’t afford it.” The “four shockers” tell us we can do better, without breaking the bank.

 

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2017). Effective reading programs for secondary students. Manuscript submitted for publication. Also see Baye, A., Lake, C., Inns, A. & Slavin, R. E. (2017, August). Effective Reading Programs for Secondary Students. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by Westsara (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

 

Nevada Places Its Bets on Evidence

blog_3-29-18_HooverDam_500x375In Nevada, known as the land of big bets, taking risks is what they do. The Nevada State Department of Education (NDE) is showing this in its approach to ESSA evidence standards .  Of course, many states are planning policies to encourage use of programs that meet the ESSA evidence standards, but to my knowledge, no state department of education has taken as proactive a stance in this direction as Nevada.

 

Under the leadership of their state superintendent, Steve Canavero, Deputy Superintendent Brett Barley, and Director of the Office of Student and School Supports Seng-Dao Keo, Nevada has taken a strong stand: Evidence is essential for our schools, they maintain, because our kids deserve the best programs we can give them.

All states are asked by ESSA to require strong, moderate, or promising programs (defined in the law) for low-achieving schools seeking school improvement funding. Nevada has made it clear to its local districts that it will enforce the federal definitions rigorously, and only approve school improvement funding for schools proposing to implement proven programs appropriate to their needs. The federal ESSA law also provides bonus points on various other applications for federal funding, and Nevada will support these provisions as well.

However, Nevada will go beyond these policies, reasoning that if evidence from rigorous evaluations is good for federal funding, why shouldn’t it be good for state funding too? For example, Nevada will require ESSA-type evidence for its own funding program for very high-poverty schools, and for schools serving many English learners. The state has a reading-by-third-grade initiative that will also require use of programs proven to be effective under the ESSA regulations. For all of the discretionary programs offered by the state, NDE will create lists of ESSA-proven supplementary programs in each area in which evidence exists.

Nevada has even taken on the holy grail: Textbook adoption. It is not politically possible for the state to require that textbooks have rigorous evidence of effectiveness to be considered state approved. As in the past, texts will be state adopted if they align with state standards. However, on the state list of aligned programs, two key pieces of information will be added: the ESSA evidence level and the average effect size. Districts will not be required to take this information into account, but by listing it on the state adoption lists the state leaders hope to alert district leaders to pay attention to the evidence in making their selections of textbooks.

The Nevada focus on evidence takes courage. NDE has been deluged with concern from districts, from vendors, and from providers of professional development services. To each, NDE has made the same response: we need to move our state toward use of programs known to work. This is worth undergoing the difficult changes to new partnerships and new materials, if it provides Nevada’s children better programs, which will translate into better achievement and a chance at a better life. Seng-Dao Keo describes the evidence movement in Nevada as a moral imperative, delivering proven programs to Nevada’s children and then working to see that they are well implemented and actually produce the outcomes Nevada expects.

Perhaps other states are making similar plans. I certainly hope so, but it is heartening to see one state, at least, willing to use the ESSA standards as they were intended to be used, as a rationale for state and local educators not just to meet federal mandates, but to move toward use of proven programs. If other states also do this, it could drive publishers, software producers, and providers of professional development to invest in innovation and rigorous evaluation of promising approaches, as it increases use of approaches known to be effective now.

NDE is not just rolling the dice and hoping for the best. It is actively educating its district and school leaders on the benefits of evidence-based reform, and helping them make wise choices. With a proper focus on assessments of needs, facilitating access to information, and assistance with ensuring high quality implementation, really promoting use of proven programs should be more like Nevada’s Hoover Dam: A sure thing.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by: Michael Karavanov [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Lessons from China

blog_3-22-18_Confucius_344x500Recently I gave a series of speeches in China, organized by the Chinese University of Hong Kong and Nanjing Normal University. I had many wonderful and informative experiences, but one evening stood out.

I was in Nanjing, the ancient capital, and it was celebrating the weeks after the Chinese New Year. The center of the celebration was the Temple of Confucius. In and around it were lighted displays exhorting Chinese youth to excel on their exams. Children stood in front of these displays to have their pictures taken next to characters saying “first in class,” never second. A woman with a microphone recited blessings and hopes that students would do well on exams. After each one, students hit a huge drum with a long stick, as an indication of accepting the blessing. Inside the temple were thousands of small silk messages, bright red, expressing the wishes of parents and students that students will do well on their exams. Chinese friends explained what was going on, and told me how pervasive this spirit was. Children all know a saying to the effect that the path to riches and a beautiful wife was through books. I heard that perhaps 70% of urban Chinese students go to after-school cram schools to ensure their performance on exams.

The reason Chinese parents and students take test scores so seriously is obvious in every aspect of Chines culture. On an earlier trip to China I toured a beautiful house, from hundreds of years ago, in a big city. The only purpose of the house was to provide a place for young men of a large clan to stay while they prepared for their exams, which determined their place in the Confucian hierarchy.

As everyone knows, Chinese students do, in fact, do very well on their exams. I would note that these data come in particular from urban Eastern China, such as Shanghai. I’d heard about but did not fully understand policies that contribute to these outcomes. In all big cities in China, students can only attend schools in their city neighborhoods, where the best schools in the country are, if they were born there or own apartments. In a country where a small apartment in a big city can easily cost a half million dollars (U.S.), this is no small selection factor. If parents work in the city but do not own an apartment, their children may have to remain in the village or small city they came from, living with grandparents and attending non-elite schools. Chinese cities are growing so fast that the majority of their inhabitants come from the rest of China. This matters because admirers of Chinese education often cite the amazing statistics from the rich and growing Eastern Chinese cities, not the whole country. It’s as though the U.S. only reported test scores on international comparisons from suburbs in the Northeastern states from Maryland to New England, the wealthiest and highest-achieving part of our country.

I do not want to detract in any way from the educational achievements of the Chinese, but just to put it in context. First, the Chinese themselves have doubts about test scores as the only important indicators, and admire Western education for its broader focus. But just sticking to test scores, China and other Confucian cultures such as Japan, South Korea, and Singapore have been creating a culture valuing test scores since Confucius, about 2500 years ago. It would be a central focus of Chinese culture even if PISA and TIMSS did not exist to show it off to the world.

My only point is that when American or European observers hold up East Asian achievements as a goal to aspire to, these achievements do not exist in a cultural vacuum. Other countries can potentially achieve what China has achieved, in terms of test scores and other indicators, but they cannot achieve it in the same way. Western culture is just not going to spend the next 2500 years raising its children the way the Chinese do. What we can do, however, is to use our own strengths, in research, development, and dissemination, to progressively enhance educational outcomes. The Chinese can and will do this, too; that’s what I was doing traveling around China speaking about evidence-based reform. We need not be in competition with any nation or society, as expanding educational opportunity and success throughout the world is in the interests of everyone on Earth. But engaging in fantasies about how we can move ahead by emulating parts of Chinese culture that they have been refining since Confucius is not sensible.

Precisely because of their deep respect for scholarship and learning and their eagerness to continue to improve their educational achievements, the Chinese are ideal collaborators in the worldwide movement toward evidence-based reform in education. Colleagues at the Chinese University of Hong Kong and the Nanjing Normal University are launching Chinese-language and Asian-focused versions of our newsletter on evidence in education, Best Evidence in Brief (BEiB). We and our U.K. colleagues have been distributing BEIB for several years. We welcome the opportunity to share ideas and resources with our Chinese colleagues to enrich the evidence base for education for children everywhere.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

What if a Sears Catalogue Married Consumer Reports?

blog_3-15-18_familyreading_500x454When I was in high school, I had a summer job delivering Sears catalogues. I borrowed my mother’s old Chevy station wagon and headed out fully laden into the wilds of the Maryland suburbs of Washington.

I immediately learned something surprising. I thought of a Sears catalogue as a big book of advertisements. But the people to whom I was delivering them often saw it as a book of dreams. They were excited to get their catalogues. When a neighborhood saw me coming, I became a minor celebrity.

Thinking back on those days, I was thinking about our Evidence for ESSA website (www.evidenceforessa.org). I realized that what I wanted it to be was a way to communicate to educators the wonderful array of programs they could use to improve outcomes for their children. Sort of like a Sears catalogue for education. However, it provides something that a Sears catalogue does not: Evidence about the effectiveness of each catalogue entry. Imagine a Sears catalogue that was married to Consumer Reports. Where a traditional Sears catalogue describes a kitchen gadget, “It slices and dices, with no muss, no fuss!”, the marriage with Consumer Reports would instead say, “Effective at slicing and dicing, but lots of muss. Also fuss.”

If this marriage took place, it might take some of the fun out of the Sears catalogue (making it a book of realities rather than a book of dreams), but it would give confidence to buyers, and help them make wise choices. And with proper wordsmithing, it could still communicate both enthusiasm, when warranted, and truth. But even more, it could have a huge impact on the producers of consumer goods, because they would know that their products would need to be rigorously tested and found to be able to back up their claims.

In enhancing the impact of research on the practice of education, we have two problems that have to be solved. Just like the “Book of Dreams,” we have to help educators know the wonderful array of programs available to them, programs they may never had heard of. And beyond the particular programs, we need to build excitement about the opportunity to select among proven programs.

In education, we make choices not for ourselves, but on behalf of our children. Responsible educators want to choose programs and practices that improve the achievement of their students. Something like a marriage of the Sears catalogue and Consumer Reports is necessary to address educators’ dreams and their need for information on program outcomes. Users should be both excited and informed. Information usually does not excite. Excitement usually does not inform. We need a way to do both.

In Evidence for ESSA, we have tried to give educators a sense that there are many solutions to enduring instructional problems (excitement), and descriptions of programs, outcomes, costs, staffing requirements, professional development, and effects for particular subgroups, for example (information).

In contrast to Sears catalogues, Evidence for ESSA is light (Sears catalogues were huge, and ultimately broke the springs on my mother’s station wagon). In contrast to Consumer Reports, Evidence for ESSA is free.  Every marriage has its problems, but our hope is that we can capture the excitement and the information from the marriage of these two approaches.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Picture source: Nationaal Archief, the Netherlands