Could evidence provide a solution to the continuing controversy about teacher evaluation? In a recent blog, I discussed low-cost and free ways to use proven programs to substantially improve outcomes in America’s schools. One of the most promising of these is based on providing alternatives to federal and state policies mandating new forms of teacher evaluation that combine extensive principal observations with value-added scores from students’ state reading and math tests.
Teacher evaluation schemes are among the most contentious of the current administration’s policies. While states have long held schools accountable for their students’ achievement, teachers are now increasingly being held individually accountable, based on some combination of frequent, structured principal observations and value-added scores from state achievement tests. States that received giant Race to the Top grants had to include teacher evaluation plans in their applications, as have states seeking waivers from onerous requirements of No Child Left Behind.
In concept, evaluating teachers makes perfect sense. In what private company are employees not evaluated and held accountable for their contribution to their company’s bottom line? Why should teachers be exempt from assessments of their job performance? In fact, teachers have been evaluated by their principals since long before Willa Cather was a first-year teacher, and these observations have long identified inadequate teachers.
In practice, evaluating teachers is not so easy. For a long time, principals have evaluated teachers based on formal observations. The problem is that principals give the great majority of their teachers the highest possible ratings, so they really only differentiate for teachers they perceive to be very poor. This is not unique to education, but is common in any business where metrics for success are subjective.
The new evaluation systems involve much more frequent and structured observations, and districts are paying a great deal of money to train their principals in detailed observation strategies. But guess what? Despite putting in many long hours learning and using the new methods, principals still end up giving all but their very least effective teachers very high scores. Further, even when trained researchers use these forms, they cannot make reliable differentiations between teachers from below average to outstanding (though, like the principals, they can reliably identify very poor teachers).
If teacher ratings are difficult to do reliably and tend to produce overwhelmingly high ratings, then overall evaluations of teachers will largely depend on value-added measures based on the reading and math scores of children in the grades tested, 3-8, plus one grade in high school (usually 11). Right off the bat, there’s an obvious problem: what about teachers of grades below 3, and of subjects other than reading and math? Middle and high schools do not usually even teach reading as a separate course. So how fair or accurate is it to judge preschool, kindergarten, grade 1-2, art, music, PE, and secondary English, science, and social studies teachers based on students’ reading and math gains?
There are many other technical problems of value added, mostly having to do with the difficulties of separating the effects teachers have from the effects of poverty, home environments, other teachers in the school, and so on.
Further, let’s be realistic about what teacher evaluations can do. They may help identify teachers who are doing a very poor job, and this information might be used to direct them toward assistance or toward other professions. However, it is not possible to fire a large proportion of teachers. There is not a great army of terrific teachers waiting for opportunities to teach, especially in high-poverty urban and rural schools. The small proportion of teachers who do need to leave the profession was, in general, already being identified by principals long before the current enthusiasm for teacher evaluation.
So if firing more teachers is not the main goal of current teacher evaluation systems, what is? The hope seems to be that evaluations will improve outcomes for whole schools by providing feedback and incentives for teachers to do their best.
Here at last we come to a testable hypothesis. If teacher evaluations help all teachers in a school get to the top of their game, then schools should show improvements in student test scores, right?
This might be true, but I have not yet seen a convincing study demonstrating such an effect. You might imagine that a school improvement approach that costs a lot in principal time and training, not to mention teacher angst and confrontations, would have been tested in large-scale, randomized experiments before it was required in schools across our nation. By contrast, all programs receiving i3 funding have to be subjected to third-party evaluations far more stringent than any that have evaluated student outcomes of recent teacher evaluation policies, yet the successfully evaluated programs are rare in practice while the unevaluated teacher evaluation schemes are nationally mandated. There are many programs for improving reading and math performance in grades K-12 that have already been found to be effective in rigorous evaluations, and many more proven programs are emerging from i3 and other sources. If the goal of teacher evaluation systems is to improve student outcomes, why not encourage use of all programs that are known to improve outcomes?
So here is my modest proposal for improving America’s elementary and secondary schools, at minimal cost.
- In all states required to use the new teacher evaluation schemes (extensive principal observation plus value-added scores) under Race to the Top, NCLB waivers, or other policy initiatives, allow schools to apply to implement proven programs instead of the new teacher evaluation schemes. These programs could be chosen from among those that meet current EDGAR standards for strong or moderate evidence of effectiveness. Principals would be expected to continue to use teacher evaluations to identify incompetent teachers.
- In order for schools to participate, 80% of their staffs would have to agree by secret ballot to implement the proven program with integrity and fidelity, using resources currently devoted to teacher evaluation.
- Schools selecting this option would then have three years to implement their chosen program or programs. Their students’ state test scores over the three-year period would be compared to those of a group of schools using the state’s teacher evaluation systems (extensive principal evaluation plus value added) and serving similar students.
- After three years, schools scoring no better than their comparison group would have to return to using the state’s teacher evaluation plan.
- While this is going on, the federal government and other funders would support the development and evaluation of whole-school reforms and reading and math programs that could be added over time to the set of proven options available to schools.
If teacher evaluation schemes are intended to improve the performance of whole schools, then it is certainly fair to compare them to alternative strategies. Teachers and principals might be powerfully motivated to implement proven models well because their success keeps them out of the new teacher evaluation systems that are, let’s face it, not terrifically popular among educators. Kids would benefit today from proven programs, and knowledge would grow about how to unite schools around an enthusiastic embrace of proven strategies. If the proven strategies cost no more than the teacher evaluation plans, which seems likely, this could all be done at little or no cost.
Higher-achieving kids, happier teachers, happier principals, more knowledge about schoolwide reform, all at little or no cost to anyone. Does this sound good to anyone?