Word Attack: “Value-Added” (Updated)
UPDATE: Aaron Pallas followed up today (8/5/10) on his earlier post about IMPACT, after some people at DCPS and Mathematica claimed he was misrepresenting the procedures used to calculate teachers’ value-added scores–by directly referencing DCPS’s official explanation of them. Apparently, the truth about it is all too complicated for us regular folks to understand, but we are to trust that they know what they’re doing because they’re experts. Also, DCPS is “considering releasing” a technical report explaining the regression analyses their Mathematica consultants use to calculate the scores “once it’s finalized.” How they’re comfortable firing people using a system that isn’t quite finished, and that they don’t seem to completely understand, is completely beyond me. I’m absolutely floored. –Sabrina
News that 241 teachers had been fired in DC under the new IMPACT system, combined with several inbox hits about new information and concerns about measuring student growth using standardized test scores (thanks, friends & fam!), has convinced me that it’s time to dig a little deeper into the concept of value-added assessment in schools.
Supporters of value-added assessment present it as an improvement over the more traditional use of Data in judging teachers and schools. Instead of using the straightforward scores students generate on standardized tests, a value-added model–theoretically, at least–measures the amount of growth students have made over the course of a given year. This is usually calculated by comparing last year’s performance to this year’s performance, and then subtracting students’ predicted* growth from their actual growth. (*This piece requires a complex algorithm that attempts to control for the effects of things like race, poverty, and the like.) Recognizing that students enter each classroom with different amounts of knowledge, and that low-income students and schools will almost always appear lacking in a straightforward comparison, the value-added framework uses demonstrated growth to judge teacher and school effectiveness. That way, credit can be given for moving students along, even if their overall achievement level remains comparatively low.
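In code, the core of the model is a single subtraction. This is only a sketch of the idea, with made-up numbers; the hard (and contested) part is the regression that produces the prediction, which is glossed over entirely here:

```python
def value_added(actual_growth: float, predicted_growth: float) -> float:
    """Value-added is actual growth minus the growth a model
    predicted after controlling for student characteristics."""
    return actual_growth - predicted_growth

# Hypothetical: a student predicted to gain 20 scale points who
# actually gains 25 contributes +5 "added value" ...
print(value_added(25.0, 20.0))   # 5.0
# ... while one predicted to gain 20 who gains only 12 contributes -8.
print(value_added(12.0, 20.0))   # -8.0
```

Everything interesting (and everything error-prone) lives inside `predicted_growth`, which this sketch simply takes as given.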
This is all well and good in theory. The problem occurs in practice. Current methods of calculating “value-added” are riddled with issues.
In their July 2010 report, “Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains,” researchers from Mathematica Policy Research (working on behalf of the US Department of Education) concluded that when growth is computed using commonly used procedures, error rates for teachers and schools being judged on three years’ worth of student data will be around 25%; expected error rates rise to 35% when using just one year of student data (as is the case in Washington, D.C., and other jurisdictions that are beginning to use Data in their teacher evaluation systems). In other words, even when using three years of data, 1 out of 4 “average” teachers (or schools) will be falsely identified as either so exceptional as to merit a reward, or so terrible as to merit dismissal or other sanctions. (Jay Mathews of the Washington Post recently touted a D.C. teacher’s blog, whose post explaining the implications of the study’s findings does a far better job than I will, so please read it!)
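To build intuition for how statistical noise alone can generate error rates of this magnitude, here is a rough Monte Carlo sketch. Everything in it is invented for illustration–the simulated score spread, class size, and flagging threshold are not from the Mathematica report–and it only shows the qualitative pattern: more years of data shrink, but do not eliminate, false identifications.

```python
# Rough simulation: every teacher below is exactly average, so ANY
# teacher flagged as exceptional or failing is flagged in error.
# All parameters are invented for illustration.
import random

random.seed(0)
N_TEACHERS = 10_000
STUDENTS_PER_YEAR = 25  # hypothetical class size

def misclassification_rate(years: int, threshold: float = 0.1) -> float:
    """Fraction of truly average teachers whose estimated effect
    crosses a (hypothetical) reward/sanction threshold."""
    flagged = 0
    for _ in range(N_TEACHERS):
        # Student gain scores: the true teacher effect is 0;
        # everything else is student-level noise.
        gains = [random.gauss(0.0, 0.5)
                 for _ in range(years * STUDENTS_PER_YEAR)]
        estimated_effect = sum(gains) / len(gains)
        if abs(estimated_effect) > threshold:
            flagged += 1
    return flagged / N_TEACHERS

print(f"1 year of data:  {misclassification_rate(1):.0%} flagged in error")
print(f"3 years of data: {misclassification_rate(3):.0%} flagged in error")
```

With these made-up settings the one-year rate comes out around 30% and the three-year rate under 10%; the exact figures are artifacts of the invented parameters, but the direction matches the report’s finding that more data reduce error without eliminating it.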
Speaking of D.C. in particular, Professor Aaron Pallas wrote a guest post on The Answer Sheet last week to explain his concerns about D.C.’s IMPACT system, one of the best-known examples of this form of assessment. He says it’s possible that some teachers were unfairly fired and others mischaracterized as “minimally effective” because the scoring procedures used to produce that data are, in his words, “idiotic.” “These procedures warrant this harsh characterization,” he writes, “because they make a preposterous assumption based on a misunderstanding of the properties of the DC Comprehensive Assessment System (DC CAS).” He goes on to explain just how odd the DC CAS’s scoring system is. Unlike the SAT or the NAEP, where scores are converted to the same scale and are thus comparable to each other across years and/or grade levels, the DC CAS has separate scales for each grade level, which cannot be directly compared to one another.
Subtracting one score from another only makes sense if the two scores are on the same scale. We wouldn’t, for example, subtract 448 apples from 535 oranges and expect an interpretable result. But that’s exactly what the DC value-added approach does: subtracting values from scales that aren’t comparable.
The result is that actual performance from year to year can be so badly distorted that “[a] fifth-grade student who got every question wrong on the reading test at the end of fourth grade and every question wrong at the end of fifth grade would show an actual gain of 500–400=100 points.”
How weird is that? If we assume teachers and schools can be fairly judged on the basis of score gains or declines, then in this case the teacher and school would be given credit for a 100-point boost in a child’s learning, when in fact there is no evidence that this child learned at all. Likewise, as Pallas points out, if a fifth grader were retained in grade, earning 510 points the first time and 530 the second time, the teacher and school would get credit for only a 20-point boost. The teacher who did nothing would look great, while the teacher who helped a child learn 20 points’ worth (whatever that means…) would look like a failure in comparison. Of course, we need to keep in mind another issue raised in the IES study mentioned earlier: “…more than 90 percent of the variation in student gain scores is due to the variation in student-level factors that are not under the control of the teacher.”
Now there’s the rub. What, exactly, is “value-added” all about, if 90+ percent of the changes in student performance can’t be attributed to the teacher?
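Pallas’s two scenarios are easy to reproduce. Here’s a minimal sketch, assuming only the one structural fact he describes–that each grade’s scale occupies its own band, so grade 4 reading scores bottom out at 400 and grade 5 scores at 500:

```python
# Per Pallas: each DC CAS grade level has its own scale band, so the
# lowest possible grade-4 score is 400 and the lowest grade-5 is 500.
GRADE_FLOOR = {4: 400, 5: 500}

def naive_gain(last_year_score: int, this_year_score: int) -> int:
    """The subtraction Pallas criticizes: it treats scores from two
    different scales as if they were directly comparable."""
    return this_year_score - last_year_score

# A student who gets every question wrong in both grade 4 and grade 5:
print(naive_gain(GRADE_FLOOR[4], GRADE_FLOOR[5]))  # 100 -- a phantom "gain"

# A retained fifth grader who genuinely improves from 510 to 530:
print(naive_gain(510, 530))  # 20 -- real learning, smaller "gain"
```

The arithmetic is doing exactly what it was told to do; the problem is that the inputs were never comparable in the first place.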
From the IMPACT guide (Note: This is from the guide for schools, though the guide for teachers with individual value-added scores is essentially identical, substituting the word “teacher” for “school”):
To explain value-added, it might be helpful to consider the following hypothetical scenario. Consider School A. Suppose that 50% of School A’s students were proficient on the math portion of the DC CAS at the end of last school year. Now suppose that 60% are proficient at the end of this school year. The change in proficiency for School A’s students would be 10 percentage points (50% to 60%).
Now let us consider School B. Just like School A’s students, suppose that 50% of School B’s students were proficient in math at the end of last school year. But let us suppose that 65% are proficient at the end of this school year. The change in proficiency for School B’s students would be 15 percentage points (50% to 65%).
Which school was more successful? School B’s students grew more so we might be inclined to say that it was the more successful school. But what if none of School B’s students qualified for free or reduced price lunch, received special education services, or were English Language Learners? And what if all of School A’s students qualified for free or reduced price lunch, received special education services, and were English Language Learners?
Value-added attempts to address this complicated scenario by “controlling for,” or taking into account, certain data about the students in a school (e.g., the percentage who qualify for free or reduced price lunch). It does this by creating a “predicted growth” for each school. Each prediction is different as it depends on the characteristics of the students in a school.
To determine the value-added measurement, the predicted growth is compared against the students’ “actual growth.” The actual growth is simply how much the students actually gained from one year to the next on the DC CAS. The difference between the predicted growth and the actual growth is the value-added measurement. High value-added schools are those whose students’ actual growth exceeds their predicted growth. Those schools are “beating the odds” for their students.
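Putting the guide’s hypothetical into code, with one loud caveat: the predicted-growth numbers below are entirely made up (DCPS hasn’t published the regression that generates them), chosen only to show how a school with smaller raw growth can still come out ahead on value-added:

```python
# (prior % proficient, current % proficient, hypothetical predicted growth)
schools = {
    "School A": (50, 60, 4),   # high-need students -> low predicted growth
    "School B": (50, 65, 18),  # advantaged students -> high predicted growth
}

for name, (prior, current, predicted) in schools.items():
    actual = current - prior          # percentage-point change on DC CAS
    value_added = actual - predicted  # actual growth minus predicted growth
    verdict = "beating the odds" if value_added > 0 else "below prediction"
    print(f"{name}: actual +{actual}, predicted +{predicted}, "
          f"value-added {value_added:+d} ({verdict})")
```

Under these invented predictions, School A (actual +10 vs. predicted +4) posts a value-added of +6 and is “beating the odds,” while School B (actual +15 vs. predicted +18) posts −3 despite its larger raw gain.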
Compare that to DCPS’s Core Beliefs (emphasis added):
We believe that:
- All children, regardless of background or circumstance, can achieve at the highest levels.
- Achievement is a function of effort, not innate ability.
- We have the power and the responsibility to close the achievement gap.
- Our schools must be caring and supportive environments.
- It is critical to engage our students’ families and communities as valued partners.
- Our decisions at all levels must be guided by robust data.
See the disconnect? Chris does. On the one hand, DCPS and others who take student characteristics into account when they calculate value-added recognize that those non-school characteristics–poverty, learning differences, language status, etc.–affect how much students learn in school. On the other hand, they’re saying that all children can “achieve at the highest levels,” and they are holding school personnel responsible for ensuring that they do (as scores of teachers in D.C.–like their counterparts in Central Falls, South Central Los Angeles, and elsewhere–have recently found out). Or, as he puts it, “All students can learn at the highest levels regardless of circumstance, but we’re predicting some of you are not going to be able to overcome these difficult circumstances and because of all that we’re going to hold your teachers responsible for your failure.”
“Value-added assessment” is meant to correct some of the problems associated with using test scores to evaluate teachers and schools. In promoting it, proponents are acknowledging the unfairness of evaluating all students, teachers, and schools on the basis of an absolute score that doesn’t account for students’ initial performance, or for the advantages and disadvantages they bring with them to the testing situation. But the methods commonly used to determine what “value” teachers and schools have added are still deeply flawed, and they still fail to completely correct for a much bigger problem– the erroneous assumption that teachers and schools are responsible for the majority of student performance as measured by standardized tests. Though researchers and even test-makers understand this, district and government officials are attempting to use this Data to label and manage teachers and schools, a use for which these tests are simply inappropriate.
Yes, teachers and schools can and should be held accountable for helping students learn. But standardized tests are still a problematic way to measure their progress in doing so. Perhaps there will come a day when someone develops a stellar 21st Century test with an equally fair scoring system to allow for truly objective measures of “effectiveness.” But that day has not yet arrived, and any claim otherwise is unsupported by the facts.
ETA: Here is a great video by Professor Daniel Willingham that quickly and clearly describes six big problems with “value-added” measures.