How Biased Measures Lead to False Conclusions


One hopeful development in evidence-based reform in education is the improvement in the quality of evaluations of educational programs. Because of policies and funding provided by the Institute of Education Sciences (IES) and Investing in Innovation (i3) in the U.S. and by the Education Endowment Foundation (EEF) in the U.K., most evaluations of educational programs today use far better procedures than was true as recently as five years ago. Experiments are likely to be large, to use random assignment or careful matching, and to be carried out by third-party evaluators, all of which give (or should give) educators and policy makers greater confidence that evaluations are unbiased and that their findings are meaningful.

Despite these positive developments, there remain serious problems in some evaluations. One of these relates to measures that give the experimental group an unfair advantage.

There are several ways in which measures can unfairly favor the experimental group. The most common occurs when measures are made by the creator of the program and are precisely aligned with the curriculum taught in the experimental group but not the control group. For example, a developer might reason that a new curriculum represents what students should be taught in, say, science or math, so it’s all right to use a measure aligned with the experimental program. However, using such measures gives a huge advantage to the experimental group. In an article published in the Journal of Research on Educational Effectiveness, Nancy Madden and I looked at effect sizes for such over-aligned measures among studies accepted by the What Works Clearinghouse (WWC). In reading, we found an average effect size of +0.51 for over-aligned measures, compared to an average of +0.06 for measures that were fair to the content taught in experimental and control groups. In math, the difference was +0.45 for over-aligned measures versus -0.03 for fair ones. These are huge differences.
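To make the quoted numbers concrete: the effect sizes above are standardized mean differences, which can be computed as Cohen's d (the difference between group means divided by the pooled standard deviation). The sketch below, with entirely hypothetical scores, shows how the same treatment group can look far stronger on a test aligned to its own curriculum than on a curriculum-neutral test:

```python
# Illustrative sketch with made-up data; the function names and all scores
# here are hypothetical, not drawn from the studies discussed above.
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical posttest scores for the same students on two tests:
# one aligned to the experimental curriculum, one curriculum-neutral.
aligned_treatment = [78, 82, 85, 80, 84]
aligned_control   = [70, 72, 68, 71, 69]
neutral_treatment = [71, 73, 70, 72, 74]
neutral_control   = [70, 72, 69, 71, 73]

d_aligned = cohens_d(aligned_treatment, aligned_control)
d_neutral = cohens_d(neutral_treatment, neutral_control)
```

With these made-up numbers, the aligned test yields a far larger d than the neutral test, even though the groups' real skills differ little; that is the over-alignment problem in miniature.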

A special case of over-aligned measures occurs when the experimental group, but not the control group, is taught content earlier than usual in students’ progression through school. For example, if students are taught first-grade math skills in kindergarten, they will of course do better on a first-grade test (in kindergarten) than will students not taught these skills in kindergarten. But will the students still be better off by the end of first grade, when all have been taught first-grade skills? It’s unlikely.

One more special case of over-alignment takes place in relatively brief studies when students are pre-tested, taught a given topic, and then post-tested, say, eight weeks later. The control group, however, might have been taught that topic earlier or later than that eight-week period, or might have spent much less than eight weeks on it. In a recent review of elementary science programs, we found many examples of this, including situations in which experimental groups were taught a topic such as electricity during an experiment, while the control group was not taught about electricity at all during that period. Not surprisingly, these studies produce very large but meaningless effect sizes.

As evidence becomes more important in educational policy and practice, we researchers need to get our own house in order. Insisting on the use of measures that are not biased in favor of experimental groups is a major necessity in building a body of evidence that educators can rely on.


Making Effective Use of Paraprofessionals


The Education Endowment Foundation (EEF) in England has just released its first six reports of studies evaluating various interventions. In each case, rigorous, randomized evaluations were done by third parties. As is typical in such studies, most found that treatments did not have significant positive outcomes, but two of them did. Both evaluated different uses of paraprofessionals. In England, as in the U.S., paraprofessionals usually assist teachers in classrooms, helping individual students with problems, helping the teacher with classroom management, and “other duties as assigned.” As in the U.S., teachers, parents, and politicians like paraprofessionals, because they are usually nice, helpful people from the community who free teachers from mundane tasks so the teachers can do what they do best. Unfortunately, research in both countries finds that paraprofessionals make no difference in student learning. The famous Tennessee Class Size study, for example, compared larger and smaller classes, but also had a large-class-with-paraprofessional condition, in which student achievement was precisely the same as it was in the large classes without paraprofessionals.

In one of the recent EEF-funded evaluations, teaching assistants taught struggling secondary readers one-to-one 20 minutes a day for 10 weeks. The study involved 308 middle schoolers randomly assigned to tutoring or ordinary teaching in 19 schools. The tutored students gained significantly more in reading than did controls. Similarly, a study in which 324 elementary students in 54 schools were randomly assigned to one-to-one tutoring in math or to regular teaching found that the tutored students gained significantly more.

The EEF reports add to a considerable body of research in the U.S. showing that well-trained paraprofessionals can obtain substantial gains with struggling readers in one-to-one and small-group tutoring.

What these findings tell us is crystal clear. Already in our schools we have a powerful but underutilized resource: paraprofessionals who, with training and assistance, could be making a substantial difference in the lives of struggling students. This resource is costing us a lot. Most of the $15 billion we spend on Title I every year goes to paraprofessionals, as does a lot of state and local funding. In my experience, paraprofessionals are caring and capable people who want to make a difference. Why not use the evidence to help them do just that?