Reviewing Social and Emotional Learning for ESSA: MOOSES, not Parrots

This blog was co-authored by Elizabeth Kim

I’m delighted to see all the interest lately in social and emotional skills. These range widely: kindness and empathy, the ability to delay gratification, grit, the belief that effort matters more than innate intelligence, and the avoidance of bullying, violence, and absenteeism. Social and emotional learning (SEL) has taken on even more importance as the Every Student Succeeds Act (ESSA) allows states to add to their usual reading and math accountability measures, and some are adding measures of SEL. This makes it particularly important to have rigorous research on this topic.

I’ve long been interested in social-emotional development, but I have just started working with a student, Liz Kim, on a systematic review of SEL research. Actually, Liz is doing all the work. Part of the purpose of the SEL review is to add a section on this topic to Evidence for ESSA. In conceptualizing our review, we immediately ran into a problem. While researchers studying achievement mostly use tests, essays, products, and other fairly objective indicators, those studying social-emotional skills and behaviors use a wide variety of measures, many of them far less objective. For example, studies of social-emotional skills rely heavily on student self-reports, or on ratings of students’ behaviors by the very teachers who administered the treatment. Researchers in this field are well aware of the importance of objectivity, but they often report both more and less objective measures within the same study, depending on their research purposes. For academic purposes this is perfectly fine. SEL researchers and the readers of their reports are of course free to emphasize whichever measures they find most meaningful.

The problem arises when SEL measures are used in reviews of research to determine which programs and practices meet the ESSA standards for strong, moderate, or promising levels of evidence. Under ESSA, selecting programs that meet these criteria can have consequences for schools’ grant funding, so it could be argued that more objective measures should be required.

In our reviews of K-12 reading and math programs for Evidence for ESSA, we took a hard line on objectivity. For example, we do not accept outcome measures made by the researchers or developers, or measures that assess skills taught in the experimental group but not the control group, because effect sizes for such measures are substantially inflated in comparison to independent measures. For the same reason, we also do not accept achievement measures administered individually to students by their own teachers, who implemented the experimental treatment. When a study uses independent measures as at least one of several outcomes, we can usually exclude the non-independent measures without excluding the whole study.
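To make these screening rules concrete, here is a minimal sketch in Python of how such a filter could be applied to coded outcome measures. The coding fields and example entries are hypothetical illustrations of the logic, not the actual Evidence for ESSA coding system.

```python
from dataclasses import dataclass

@dataclass
class OutcomeMeasure:
    """One coded outcome measure from a study (hypothetical coding scheme)."""
    name: str
    developer_made: bool            # created by the program's researchers or developers
    treatment_aligned: bool         # assesses skills taught in the experimental group only
    own_teacher_administered: bool  # given individually by the students' own treatment teachers

def is_independent(m: OutcomeMeasure) -> bool:
    """Apply the hard line on objectivity described above: a measure is kept
    only if it is free of all three sources of inflation."""
    return not (m.developer_made or m.treatment_aligned or m.own_teacher_administered)

measures = [
    OutcomeMeasure("State reading assessment", False, False, False),
    OutcomeMeasure("Developer-made unit test", True, True, False),
    OutcomeMeasure("Oral fluency check by own teacher", False, False, True),
]

# Exclude non-independent measures without excluding the whole study.
for m in measures:
    print(f"{m.name}: {'keep' if is_independent(m) else 'exclude'}")
```

In practice, of course, this screening is done by human reviewers; the point of the sketch is only that each exclusion criterion is a property of the measure, not of the study as a whole.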

Now consider measures in studies of social-emotional skills. They are often dependent on behavior ratings by teachers or self-reports by students. For example, in some studies students are taught to recognize emotions in drawings or photos of people. Recognizing emotions accurately may correlate with valuable social-emotional skills, but an experiment whose only outcome is the ability to recognize emotions could just be teaching students to parrot back answers on a task of unknown practical value in life. Many SEL measures used in studies with children are behavior ratings by the very teachers who delivered the treatment. Teacher ratings are sure to be biased (on average) by the normal human desire to look good (called social desirability bias). This is particularly problematic when teachers are trained to use a strategy to improve a particular outcome. For example, some programs are designed to improve students’ empathy. That’s a worthy goal, but empathy is hard to identify in practice. So teachers taught to identify behaviors thought to represent empathy are sure to see those behaviors in their children a lot more than teachers in the control group do, not necessarily because those children are in fact more empathetic, but because teachers and the children themselves may have learned a new vocabulary to recognize, describe, and exhibit empathy. This could be seen as another example of “parroting,” which means that subjects or involved raters (such as teachers or parents) have learned what to say or how to act under observation at the time of rating, instead of truly changing behaviors or attitudes.

For consequential purposes, such as reviews for ESSA evidence standards, it makes sense to ask for independently verified indicators demonstrating that students in an experimental group can and do engage in behaviors that are likely to help them in life. Having independent observers blind to treatments observe students in class or carry out structured tasks indicating empathetic or prosocial or cooperative behavior, for example, is very different from asking them on a questionnaire whether they engage in those behaviors or have beliefs in line with those skills. The problem is not only that attitudes and behaviors are not the same thing, but worse, that participants in the experimental group are likely to respond on a questionnaire in a way influenced by what they have just been taught. Students taught that bullying is bad will probably respond as the experimenters hope on a questionnaire. But will they actually behave differently with regard to bullying? Perhaps, but it is also quite possible that they are only parroting what they were just taught.

To determine ESSA ratings, we’d emphasize indicators we call MOOSES: Measurable, Observable, Objective Social Emotional Skills. MOOSES are quantifiable measures that can be observed in the wild (i.e., the school) objectively, ideally using routinely collected data unlikely to change just because staff or students know there is an experiment going on. For example, reports of disciplinary referrals, suspensions, and expulsions would be indicators of one type of social-emotional learning. Reports of fighting or bullying incidents could also be MOOSES indicators.

Another category of MOOSES indicators would include behavioral observations by observers who are blind to experimental/control conditions, or observations of students in structured situations. Intergroup relations could be measured by watching who students play with during recess, for example. Or, if an SEL program focuses on building cooperative behavior, students could be placed in a cooperative activity and observed as they interact and solve problems together.

Self-report measures might serve as MOOSES indicators if they ask about behaviors or attitudes independent of the treatment students received. For instance, if students received a mindfulness intervention in which they were taught to focus on and regulate their own thoughts and feelings, then measures of self-reported or peer-reported prosocial behaviors or attitudes may not be an instance of parroting, because prosocial behavior was not the content of the intervention.
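Putting the three categories together, the MOOSES logic could be sketched as a simple classification rule. Again, this is a hypothetical illustration, with field names of our own invention rather than an official coding scheme.

```python
from enum import Enum, auto

class MoosesCategory(Enum):
    ROUTINE_RECORDS = auto()          # routinely collected school data: referrals, suspensions, incidents
    BLIND_OBSERVATION = auto()        # observers blind to condition, or structured cooperative tasks
    INDEPENDENT_SELF_REPORT = auto()  # self/peer reports on content not taught by the treatment
    NOT_MOOSES = auto()               # e.g., ratings by the treating teacher; parroting-prone items

def classify(source: str, rater_blind: bool, routinely_collected: bool,
             content_taught_in_treatment: bool) -> MoosesCategory:
    """Classify a candidate SEL measure under the MOOSES criteria sketched above."""
    if routinely_collected:
        return MoosesCategory.ROUTINE_RECORDS
    if source == "observer" and rater_blind:
        return MoosesCategory.BLIND_OBSERVATION
    if source in ("self", "peer") and not content_taught_in_treatment:
        return MoosesCategory.INDEPENDENT_SELF_REPORT
    return MoosesCategory.NOT_MOOSES

# A treating teacher's empathy ratings after empathy training would not qualify:
print(classify("teacher", rater_blind=False, routinely_collected=False,
               content_taught_in_treatment=True).name)  # NOT_MOOSES
```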

Social-emotional learning is clearly taking on an increasingly important role in school practice, and it is becoming more important in evidence-based reform as well. But reviewers will have to use conservative and rigorous approaches to evaluating SEL outcomes, as we do in evaluating achievement outcomes, if we want to ensure that SEL can be meaningfully incorporated in the ESSA evidence framework. We admit that this will be difficult and that we don’t have all the answers, but we also maintain that there should be some effort to focus on objective measures in reviewing SEL outcomes for ESSA.

This blog is sponsored by the Laura and John Arnold Foundation

Keep Up the Good Work (To Keep Up the Good Outcomes)

I just read an outstanding study that contains a hard but crucially important lesson. The study, by Woodbridge et al. (2014), evaluated a behavior management program for students with behavior problems. The program, First Step to Success, has been successfully evaluated many times. In the Woodbridge et al. study, 200 children in grades 1 to 3 with serious behavior problems were randomly assigned to experimental or control groups. On behavior and achievement measures, students in the experimental group scored much higher, with effect sizes of +0.44 to +0.87. Very impressive.
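For readers less familiar with effect sizes: figures such as +0.44 to +0.87 are standardized mean differences, presumably computed along the lines of the standard formulation below (the study’s exact estimator may differ):

```latex
d \;=\; \frac{\bar{X}_{E} - \bar{X}_{C}}{SD_{\text{pooled}}},
\qquad
SD_{\text{pooled}} \;=\; \sqrt{\frac{(n_E - 1)\,SD_E^2 + (n_C - 1)\,SD_C^2}{n_E + n_C - 2}}
```

Here the experimental-group mean minus the control-group mean is divided by the pooled standard deviation, so +0.44 to +0.87 means the experimental group outscored the control group by roughly half to nearly a full standard deviation.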

The researchers came back a year later to see whether the outcomes had lasted. Despite the substantial impacts at posttest, none of the three prosocial/adaptive behavior measures, only one of the three problem/maladaptive behavior measures, and none of the four academic achievement measures still showed positive outcomes.

These findings were distressing to the researchers, but they contain a message. In this study, students passed from teachers who had been trained in the First Step method to teachers who had not. The treatment is well established and inexpensive. Why should it ever be seen as a one-year intervention with a follow-up? Instead, imagine that all teachers in the school learned the program and all continued to implement it for many years. In that circumstance, the first-year positive impacts would most likely be sustained, and probably improved, over time.

Follow-up assessments are always interesting, and for interventions that are very expensive it may be crucial to demonstrate lasting impacts. But so often in education effective treatments can be maintained for many years, creating more effective school-wide environments and lasting impacts over time. Much as we might like to have one-shot treatments with long-lasting impacts, this does not correspond to the nature of children. The personal, family, or community problems that led children to have problems at a given point in time are likely to lead to problems in the future, too. But the solution is clear. Keep up the good work to keep up the good outcomes!

Good Failure/Bad Failure

Evidence junkies (like me) are reacting to the disappointing news from the evaluation of the Adolescent Behavioral Learning Experience (ABLE), a program implemented at Rikers Island to reduce recidivism among adolescent prisoners. Bottom line: The rigorous independent evaluation of the program failed to find any benefits. What makes this experiment especially interesting is that it is the first U.S. application of social impact bonds. Goldman Sachs put up a $7.2 million loan, and Bloomberg Philanthropies committed to a $6 million loan guarantee. Since the program did not produce the expected outcomes, the guarantee covered $6 million of the loan, and Goldman Sachs lost the remaining $1.2 million.

Ironically, New York City administrators are delighted about the outcome because they do not have to pay for the program. They think they learned a great deal from the experience, for free.

It’s unclear what this will do to the social impact bond movement, currently in its infancy. However, I wanted to extend from this fascinating case to a broader issue in evidence-based reform.

The developers and advocates for the ABLE program who expected positive outcomes turned out to be wrong, at least in this implementation. The investors were wrong in expecting to make a profit. But I’d argue that they are all better off because of this experience, just as the N.Y.C. administrators said.

The distinction I want to make is between wrong and wrong-headed. Wrong, as I’m defining it in this context, means that a given outcome was not achieved, but it was entirely reasonable to expect that it might have been achieved. In contrast, wrong-headed means that not only was the desired outcome not achieved, but it was extremely unlikely that it could have been achieved. In many cases, a key component of wrong-headed actions is that the actor does not even know whether the action was effective or ineffective, right or wrong, and therefore continues with the same or similar actions indefinitely.

Wrong, I’d argue, is an honorable and useful outcome. In a recent interview, former White House advisor Gene Sperling noted that when a few cancer drugs fail to cure cancer, you don’t close down NIH. Instead, you take that information and use it to continue the research and development process. “Wrong,” in this view, can be defined as “good failure,” because it is a step on the path to progress.

“Wrong-headed,” on the other hand, is “bad failure.” When you do something wrong-headed, you learn nothing, or you learn the wrong lessons. Wrong-headed decisions tend to lead to more wrong-headed decisions, as you have no systematic guide to what is working and what is not.

The issue of wrong vs. wrong-headed comes up in the current discussions in Congress about continuing the Investing in Innovation (i3) program. By now, committees in both the House and the Senate have recommended ending i3. But this would be the very essence of wrong-headed policy. Sure, it is probable that many i3 programs funded so far will fail to make a difference in achievement, or will fail to go to scale. This just means that these programs have not yet found success; some may still show evidence of promise, and some will not. However, all i3 programs are rigorously evaluated, so we will know a lot about which worked, which did not, and which still seem promising even if they did not work this time. That’s huge progress. The programs that are already showing success can have immediate impact in hundreds or thousands of schools, while the others greatly enrich our understanding of what needs to be done.

Abandoning i3, in contrast, would be wrong-headed, a sure path to bad failure. Though only a tiny slice of education funding, i3 tells us what works and what does not, so we can continually move toward effective strategies and policies. Without i3 and other research and development investments, education policy is just guesswork, and it gets no smarter over time.

No one can honestly argue that American education is as successful as it should be. Our kids, our economy, and our society deserve much better. Policies that seek a mixture of proven success and “good failure” will get us to solid advances in educational practice and policy. Abandoning or cutting programs like i3 is not just wrong. It’s wrong-headed.

School Reform Is Empty Without Love and Support

In a recent article in The New York Times, David Kirp wrote in opposition to “business models” being applied to education: assessments, accountability, charters, vouchers, technology, and competition. In their place, he notes that truly effective programs that have stood the test of time focus not on charters or vouchers or technology alone, but on building strong personal bonds between teachers and students. He gives as examples our Success for All model, Diplomas Now, Big Brothers/Big Sisters, and YouthBuild. All of these help teachers, administrators, and community members build positive, caring environments for children in high-poverty elementary and secondary schools. At the same time, all have been rigorously evaluated and found to improve such key outcomes as achievement and high school graduation.

Kirp is not arguing that approaches emphasizing charters or technology cannot work. However, his point is that unless they also intentionally improve teacher-student bonds, they are unlikely to make a lasting difference. We have enough evidence today to indicate that charters and technology are not magic, and that it matters a great deal what programs and practices they incorporate. I believe that Kirp is right in identifying teacher-student relationships as a key component of effective and sustainable reform. Further, building positive teacher-student relationships is valuable in itself.

Children are social beings who want and need to be loved and supported, and who want to please teachers who give them love and support. Love and support are not enough to ensure that students can read or succeed in math, science, or other subjects; teaching skill and effective practices and programs are also essential. One expression of love and support is the use of exciting, engaging teaching methods and the communication of high expectations for all students. But even the best of programs and practices are empty and futile if they do not flow from love and support for children. No new form of governance, no new technology, no program of any kind can make a lasting difference without love and support at its core.

Universal Preschool: Use Innovation and Evidence to Make it Effective

In his recent State of the Union Address, President Obama proposed to “make high-quality preschool available to every child in America.” He referred to research that has demonstrated long-term positive effects of attending high-quality preschool programs. President Obama’s support has excited the early childhood community. Who could be opposed to expanding high-quality preschool opportunities? Yet this raises the question: What does “high-quality” mean in practice?

“High-quality” preschools are often defined by educators and economists alike as ones in which teachers are adequately paid, facilities are adequate, and the ratio of staff to children is low. These are indeed important elements of quality, and they are serious problems today: preschool educators are often very poorly paid and poorly educated themselves, and they often lack decent facilities. The low salaries received by preschool teachers lead to high turnover, which further reduces quality. So ensuring universal access to high-quality preschool, when many existing preschools are already struggling with quality and funding, will be a heavy lift.

Leaving money aside, however, there is an important question about how preschool programs should be structured. There is lots of research showing the benefits of high-quality preschool in comparison to no preschool (as in the famous Perry Preschool and Abecedarian programs). However, there is far less research comparing the benefits of different preschool approaches to one another.

The Preschool Curriculum Effectiveness Research initiative compared a number of promising approaches to each other and to groups using standard preschool teaching methods. The results are summarized in a review on the Best Evidence Encyclopedia. By the end of kindergarten, only a few of the programs showed child outcomes superior to those achieved by other programs. Structured programs that had a very strong focus on language and emergent literacy, giving children many opportunities to use language to work together, solve challenges, and develop positive relationships with each other, had the best outcomes for children.

Technology has so far played a modest role in early childhood education, but this may change as multimedia devices (such as interactive whiteboards) become more commonly used. Technology offers opportunities for teachers to enhance language development by engaging children with brief content that helps them understand how the world works. For example, children learning about health can see videos on how the body works and can be provided with video models of how to stay safe and healthy. Children can make choices and manipulate pictures and videos representing objects and processes. Further, classroom technology allows for linkages with the home, as parents increasingly have computers, DVDs, and other media available. Children can be shown exciting content in school and then take home DVDs or link electronically to specific materials that closely align with the content they learned that day. These electronic activities can be designed to be done with parents and children together, and can then inform parents about what children are learning in school. Also, in high-poverty homes children often have few if any books. Existing DVD or internet technologies can provide children with access to appropriate literature, which can be read to them by narrators or by their parents or older siblings.

Of course, technology will not replace the majority of early childhood teaching. Young children still need to manipulate real objects and learn to work with each other, sing songs, develop coordination and creativity, and practice appropriate behaviors. However, technology may add the capacity for teachers to show anything they want to their children and to link to the home in ways that have not been possible in the past, and this may result in enhanced learning at this critical age.

Expanding preschool access is a terrific idea, but it will take a lot of money and a long time to put into place. The possibility that it may take place should motivate immediate investments in innovation and evaluation, to develop new ways of ensuring that early education leads to enhanced preparation for success, especially for disadvantaged children.

Preschool quality should not be seen just as a question of per-pupil cost. Preschool educators and children need innovative, proven models that use modern teaching strategies and technologies appropriate to the developmental needs of four-year-olds. Innovation and research are needed to show the way as we head toward universal preschool.

Classrooms Need More Pizzazz

On a recent trip to London, I visited Cayley Primary School, a high-poverty elementary school that has been using our Success for All* whole-school reform approach for several years. The principal, Lissa Samuel, has been at the school for many years, both before and after it adopted Success for All. She is proud of the achievement gains, which include a jump from 30% to 80% of students passing sixth-grade reading assessments. During our conversation, though, she talked more about how disciplinary problems, fights, and stealing had completely disappeared. Success for All has very good approaches to classroom management and social-emotional learning, and Ms. Samuel thought these had helped. But even more powerful, she thought, was the effect of success itself. Kids who feel confident, engaged, and motivated to learn do not act out.

The importance of this observation, which I’ve heard in many, many schools, is profound. Especially at the policy level, I often encounter a belief that the path to improving outcomes on a broad scale is test-based accountability that forces teachers to align their instruction with desired outcomes. If students are bored or resistant, the thinking goes, teachers should use effective classroom management methods that keep them in control.

Teachers do need a deep understanding of classroom management methods designed to prevent behavior problems, and then they need to be ready with effective responses if students misbehave despite good preventive efforts. Yet using classroom management methods to get students to attend to boring lessons is shoveling against the tide. The key ingredient in effective lessons isn’t alignment, it’s pizzazz: excitement, engagement, challenge.

How do you create pizzazz? Well-structured cooperative learning helps engage students with each other in learning content together. Stimulating video content can add to excitement and understanding. Hands-on experimentation helps a lot when appropriate, as does competition between teams or against the clock.

Cayley Primary was full of pizzazz. Its mostly Bangladeshi students worked eagerly in four-member teams. They took turns reading to each other and helping each other with difficult words. Their teachers called on “random reporters” to represent their teams, and teammates prepared each other, not knowing which of them might be randomly chosen to play this role. Brief, humorous videos introduced letter sounds and sound-blending strategies to first graders. Throughout the school, students were invariably kind and helpful to each other. An observer who did not know the history might think that classroom management was not necessary in such a school, but it was proactive use of pizzazz that got it to where it is, and makes it all look easy.

Classroom management strategies matter, of course, but pizzazz matters more. Motivated, engaged, challenged, and successful students are well behaved, not because they’ve been threatened but because they are too busily engaged in learning to misbehave. The goal of classroom management is not quiet classrooms; it is productive students. Using pizzazz to motivate and engage kids in learning valued content is the way to manage classrooms toward accomplishing the real goals of education.

*Robert Slavin is Chairman of the Board of the Success for All Foundation

For the latest on evidence-based education, follow me on Twitter: @RobertSlavin