Reviewing Social and Emotional Learning for ESSA: MOOSES, not Parrots

This blog was co-authored by Elizabeth Kim

I’m delighted to see all the interest lately in social and emotional skills. These range widely, from kindness and empathy to ability to delay gratification to grit to belief that effort is more important than intelligence to avoidance of bullying, violence, and absenteeism. Social and emotional learning (SEL) has taken on even more importance as the Every Student Succeeds Act (ESSA) allows states to add to their usual reading and math accountability measures, and some are adding measures of SEL. This makes it particularly important to have rigorous research on this topic.

I’ve long been interested in social-emotional development, but I have just started working with a student, Liz Kim, on a systematic review of SEL research. Actually, Liz is doing all the work. Part of the purpose of the SEL review is to add a section on this topic to Evidence for ESSA. In conceptualizing our review, we immediately ran into a problem. While researchers studying achievement mostly use tests, essays, products, and other fairly objective indicators, those studying social-emotional skills and behaviors use a wide variety of measures, many of which are far less objective. For example, studies of social-emotional skills make much use of student self-report, or ratings of students’ behaviors by the teachers who administered the treatment. Researchers in this field are well aware of the importance of objectivity, but they report more and less objective measures within the same studies depending on their research purposes. For academic purposes this is perfectly fine. SEL researchers and the readers of their reports are of course free to emphasize whichever measures they find most meaningful.

The problem arises when SEL measures are used in reviews of research to determine which programs and practices meet the ESSA standards for strong, moderate, or promising levels of evidence. Under ESSA, selecting programs meeting strong, moderate, or promising criteria can have consequences for schools in terms of grant funding, so it could be argued that more objective measures should be required.

In our reviews of K-12 reading and math programs for Evidence for ESSA, we took a hard line on objectivity. For example, we do not accept outcome measures made by the researchers or developers, or those that assess skills taught in the experimental group but not the control group. The reason for this is that effect sizes for such studies are substantially inflated in comparison to independent measures. We also do not accept achievement measures administered individually to students by the students’ own teachers, who implemented the experimental treatment, for the same reason. In the case of achievement studies that use independent measures, at least as one of several measures, we can usually exclude non-independent measures without excluding whole studies.

Now consider measures in studies of social-emotional skills. They are often dependent on behavior ratings by teachers or self-reports by students. For example, in some studies students are taught to recognize emotions in drawings or photos of people. Recognizing emotions accurately may correlate with valuable social-emotional skills, but an experiment whose only outcome is the ability to recognize emotions could just be teaching students to parrot back answers on a task of unknown practical value in life. Many SEL measures used in studies with children are behavior ratings by the very teachers who delivered the treatment. Teacher ratings are sure to be biased (on average) by the normal human desire to look good (called social desirability bias). This is particularly problematic when teachers are trained to use a strategy to improve a particular outcome. For example, some programs are designed to improve students’ empathy. That’s a worthy goal, but empathy is hard to identify in practice. So teachers taught to identify behaviors thought to represent empathy are sure to see those behaviors in their children a lot more than teachers in the control group do, not necessarily because those children are in fact more empathetic, but because teachers and the children themselves may have learned a new vocabulary to recognize, describe, and exhibit empathy. This could be seen as another example of “parroting,” which means that subjects or involved raters (such as teachers or parents) have learned what to say or how to act under observation at the time of rating, instead of truly changing behaviors or attitudes.

For consequential purposes, such as reviews for ESSA evidence standards, it makes sense to ask for independently verified indicators demonstrating that students in an experimental group can and do engage in behaviors that are likely to help them in life. Having independent observers blind to treatments observe students in class or carry out structured tasks indicating empathetic or prosocial or cooperative behavior, for example, is very different from asking them on a questionnaire whether they engage in those behaviors or have beliefs in line with those skills. The problem is not only that attitudes and behaviors are not the same thing, but worse, that participants in the experimental group are likely to respond on a questionnaire in a way influenced by what they have just been taught. Students taught that bullying is bad will probably respond as the experimenters hope on a questionnaire. But will they actually behave differently with regard to bullying? Perhaps, but it is also quite possible that they are only parroting what they were just taught.

To determine ESSA ratings, we’d emphasize indicators we call MOOSES: Measurable, Observable, Objective Social Emotional Skills. MOOSES are quantifiable measures that can be observed in the wild (i.e., the school) objectively, ideally on routinely collected data unlikely to change just because staff or students know there is an experiment going on. For example, reports of disciplinary referrals, suspensions, and expulsions would be indicators of one type of social-emotional learning. Reports of fighting or bullying incidents could be MOOSES indicators.

Another category of MOOSES indicators would include behavioral observations by observers who are blind to experimental/control conditions, or observations of students in structured situations. Intergroup relations could be measured by watching who students play with during recess, for example. Or, if a SEL program focuses on building cooperative behavior, students could be placed in a cooperative activity and observed as they interact and solve problems together.

Self-report measures might serve as MOOSES indicators if they ask about behaviors or attitudes independent of the treatment students received. For instance, if students received a mindfulness intervention in which they were taught to focus on and regulate their own thoughts and feelings, then measures of self-reported or peer-reported prosocial behaviors or attitudes may not be an instance of parroting, because prosocial behavior was not the content of the intervention.

Social-emotional learning is clearly taking on an increasingly important role in school practice, and it is becoming more important in evidence-based reform as well. But reviewers will have to use conservative and rigorous approaches to evaluating SEL outcomes, as we do in evaluating achievement outcomes, if we want to ensure that SEL can be meaningfully incorporated in the ESSA evidence framework. We admit that this will be difficult and that we don’t have all the answers, but we also maintain that there should be some effort to focus on objective measures in reviewing SEL outcomes for ESSA.

This blog is sponsored by the Laura and John Arnold Foundation


Making Teaching and Teacher Education Respected Professions

I had an interesting conversation at the recent AERA meetings with the editor of my Pearson educational psychology text, Kevin Davis. He posed a question to me: “How can we convince school leaders, politicians, and the public that schools of education provide something of value to future teachers?”

I’ve thought a lot about this question, and about an even broader question: How can we increase respect for the teaching profession itself? These two questions are closely linked, of course, because if teachers were respected, the schools that produce them would be respected, and vice versa.

My answer, you may not be surprised to hear, drew from the history of medicine. Long ago, doctors were little respected because few of their treatments actually worked. My own grandfather, an immigrant from Argentina, believed that doctors had nothing to offer, and he refused to go to a doctor or hospital unless absolutely necessary. “Hospitals are where you go to die,” he always said (Note: he was healthy into his nineties and died at 96. In a hospital.).

However, physicians gained in status as their profession gained in proven treatments. In the 19th century, doctors could set bones, help in childbirth, administer smallpox vaccines, and prescribe various treatments that were mostly useless. However, in the 20th century, there was progress in what doctors could do. In mid-century, discovery of sulfa drugs, penicillin, a polio vaccine, and many other advances truly made medicine, physicians, and schools of medicine respected. Since 1962, when federal laws began to require randomized experiments for medications, the pace of discovery and application of effective treatments has exploded, and as physicians can reliably treat more and more diseases, respect for them and the schools that produce them has grown apace.

In education, this is how our profession and our schools of education will grow in status. As in medicine, this change will not happen all at once or overall, but it will happen as schools and teachers increasingly embrace and apply proven approaches.

Imagine, for example, that primary teachers were universally trained to use programs capable of ensuring reading success for their children. That secondary math teachers could ensure an understanding of algebra for every student. That science teachers could make American schools competitive with those in East Asia. Each of these accomplishments would be hugely beneficial for students, of course. But think what it would do for our profession. Picture this. A first grade teacher walks into a party. The room falls quiet. Parents meekly approach her to ask how they can help, or supplement her efforts with their children. Others are impressed by the school of education she attended. She gets this respect because everyone knows that she can teach every child who enters her class to read, no matter what. She has proven skills and knowledge that the world at large does not possess.

That’s how our profession must earn its respect. When every teacher has knowledge and skills that are proven effective and learned in schools of education, we’ll be respected. And we’ll deserve it.


Make No Small Plans

In recent years, an interest has developed in very low-cost interventions that produce small but statistically significant effects on achievement. The argument for their importance is that their costs are so low that their impacts are obtained very cost-effectively. For example, there is evidence that a brief self-affirmation exercise can produce a small but significant effect on achievement, and that a brief intervention to reduce “social identity threat” can do the same. A study in England found that a system to send 50 text messages over the course of a school year, announcing upcoming tests and homework assignments, feedback on grades, test results, and attendance, and updates on topics being studied in school, improved math achievement slightly but significantly, at a cost of about $5 a year.

There is nothing wrong with these mini-interventions, and perhaps all schools should use them. Why not? Yet I find myself a bit disturbed by this type of approach.

Step back from the small-cost/small-but-significant outcome and consider the larger picture, the task in which all who read this blog are jointly engaged. We face an educational system that is deeply dysfunctional. Disadvantaged students remain far, far behind middle-class students in educational outcomes, and the gap has not narrowed very much over decades. The U.S. remains well behind peer nations in achievement and is not catching up. Dropout rates in the U.S. are diminishing, but skill levels of American high school graduates from disadvantaged schools are appalling.

For schools with limited budgets to spend on reform, it may be all they can do to adopt a low-cost/low-but-significant outcome intervention on the basis that it’s better than nothing. But again, step back to look at the larger situation. The average American student is educated at a cost of more than $11,000 per year. There are whole-school reform approaches, such as our own Success for All in elementary and middle schools and BARR in secondary schools, that cost around $100 per student per year, and have been found to make substantial differences in student achievement. Contrast this to a low-cost program that costs, say, $5 per student per year.

$100 is less than 1% of the ordinary cost of educating a student, on average. $5 is less than .05%, of course. But in the larger scheme of things, who cares? Using a proven whole-school reform model might perhaps increase the per-student cost from $11,000 to $11,100. Adding the $5 low-cost intervention could increase per-student costs from $11,000 to $11,005. From the perspective of a principal who has a fixed budget, and simply does not have $100 per student to spend, the whole-school approach may be infeasible. But from the system perspective, the difference between $11,000 and $11,100 (or $11,005) is meaningless if it truly increases student achievement. Our goal must be to make meaningful progress in reducing gaps and increasing national achievement, not make a small difference that happens to be very inexpensive.
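As a quick check on the arithmetic above, here is a minimal sketch that computes each added cost as a share of the roughly $11,000 per-student baseline. The dollar figures come from this post; the function name `share_of_baseline` is only an illustrative label, not part of any real tool.

```python
# Dollar figures are taken from the post; names are illustrative assumptions.
BASELINE_COST = 11_000  # approximate annual cost of educating one student, in dollars


def share_of_baseline(added_cost, baseline=BASELINE_COST):
    """Return the added per-student cost as a percentage of the baseline cost."""
    return 100 * added_cost / baseline


for label, added in [("whole-school reform", 100), ("low-cost intervention", 5)]:
    pct = share_of_baseline(added)
    print(f"{label}: ${added} is {pct:.3f}% of ${BASELINE_COST:,}")
```

Both shares come out under 1% and under .05% respectively, which is the point: at the system level, neither amount is a meaningful addition to what is already spent.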

I once saw a film in England on the vital role of carrier pigeons in the English army in World War II. I’m sure those pigeons played their part in the victory, and they were very cost-effective. But ultimately, it was expensive tanks and planes and ships and other weapons, and courageous men and women, who won the war, not pigeons, and piling up small (even if effective) interventions was just not going to do it.

We should be in a war against inequality, disadvantage, and mediocre outcomes in education. Winning it will require identification and deployment of whole-school, whole-district, and whole-state approaches that can be reliably replicated and intelligently applied to ensure positive, widespread improvements. If we just throw pigeon-sized solutions at huge and tenacious problems, our difficulties are sure to come home to roost.


For Drug and Alcohol Prevention, Good Intentions Are Not Enough

Every year, I learn something at the AERA meetings, but it never has anything to do with what’s on the program. Last year it was about recycling. This year, it was about ineffective but heart-tugging programs.

One morning in San Antonio, I came out of a restaurant after breakfast and there were two very sweet-looking middle school girls who were collecting money for their school’s DARE program. DARE (Drug Abuse Resistance Education) is a very widespread program that is designed to reduce drug and alcohol use. Police officers speak to students and get them to sign a pledge not to use drugs or alcohol. The girls told me DARE had now added a focus on preventing suicide. I was impressed by their presentation, and gave them twenty bucks.

Why do I consider this mundane transaction blog-worthy? The answer is that it just so happens that DARE is the very anti-poster child among advocates for evidence-based reform. It’s seen as an appealing-sounding yet ineffective program. Blueprints, which rigorously reviews mostly drug, alcohol, and delinquency prevention programs, does not rate DARE as effective, and numerous reports of large-scale evaluations found no benefits. In 2001, the U.S. Surgeon General put DARE on a list of ineffective and sometimes counterproductive programs.

Further, Blueprints certifies alternatives to DARE that have been rigorously evaluated and found to be effective in reducing drug and alcohol use among teens. For example, Blueprints lists the following programs as meeting its “model” criterion or better for middle school students: Lifeskills Training (LST), Multisystemic Therapy (MST), Functional Family Therapy (FFT), and Positive Action. Several other programs met the Blueprints “Promising” standard.

Knowing all this, why did I contribute? Clearly, I contributed from my heart, not my mind. The girls were very sincere, and believed fervently in what they were doing. From their perspective they were not advocating for a specific program; they were taking a personal stand against drug and alcohol abuse. I think that was admirable, so I admired it, to the extent of $20.

At the same time, I recognized the irony, and also thought about how government and philanthropists must see DARE, and many other programs intended to improve social and educational outcomes for youth. They must equally see programs that are sincere, appealing, and clearly offered by good people to do good things. They are probably not aware that there are proven alternatives offered by equally good people to accomplish equally valuable goals, which happen to actually make a difference. Evidence just does not play much of a role, if any, in these decisions. Supporting good causes is inherently good, isn’t it?

The problem is that government and philanthropic resources and attention are limited, and if these resources are tied up in ineffective or untested programs, they are not going to support proven alternatives that could actually move the needle.

Worse, funding DARE instead of proven alternatives may eventually put the alternative programs out of operation, and convince good-hearted people who want to improve outcomes for youth that doing rigorous evaluations of their programs is foolish.

Neither those middle school girls nor their teachers nor probably their principals could change the situation in which they find themselves. Even if they knew full well that DARE has not been shown to be effective, it is morally irresponsible to do nothing about drug and alcohol abuse, and DARE may be the only approach they have on offer.

Yet at higher levels in the system, there is a responsibility to find out which drug and alcohol prevention programs are truly effective and to invest in those. Such programs are easily found on Blueprints, for example, if only our leaders were in the habit of consulting it. Those middle school girls could just as well have been collecting for a program that works. Had they been doing so, I would have been a lot happier about the fate of my 20 bucks, not only because it might actually reduce drug and alcohol abuse, but because it would also indicate a changed mindset, one that values actual impact rather than just good intentions.
