The study, “Student Test Scores: How the Sausage Is Made and Why You Should Care,” was written by Brian A. Jacob, a University of Michigan professor and senior fellow at the Brookings Institution. What he found is definitely worth noting in this age of school reform, data collection, and accountability, when teacher effectiveness is based on students' standardized test scores:
“Contrary to popular belief, modern cognitive assessments, including the new Common Core tests, produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. While there are good reasons for this, it means that reported test scores depend on many decisions made by test designers, some of which have important implications for education policy.
For example, all else equal, the shorter the test, the greater the fraction of students placed in the top and bottom proficiency categories, an important metric for state accountability. On the other hand, some tests report ‘shrunken’ measures of student ability, which pull particularly high- and low-scoring students closer to the average, leading one to understate the proportion of students in the top and bottom proficiency categories. Shrunken test scores will also understate important policy metrics such as the black-white achievement gap: if black children score lower on average than white children, then the scores of black students will be adjusted up, while the opposite is true for white students.
The scaling of test scores is equally important. Despite common perceptions, a 5-point gain at the bottom of the test score distribution may not mean the same thing in terms of additional knowledge as a 5-point gain at the top of the distribution. This fact has important implications for value-added-based comparisons of teacher effectiveness, as well as for accountability rankings of schools.
There are no easy solutions to these issues. Instead, there must be greater transparency in the test creation process, along with more robust discussion of the inherent trade-offs in creating test scores and of how different types of test scores are used for policymaking and research.”
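Jacob's point about test length and shrinkage can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not the scoring method of any real state test or of the study itself. It assumes a Rasch (one-parameter IRT) model, standard-normal true abilities, evenly spaced item difficulties, hypothetical top and bottom cut scores at ±1 SD, and a hypothetical 0.8 SD gap between two groups. The hypothetical `score` function returns two estimates per student: an unshrunken maximum-likelihood (MLE) estimate found by grid search, and a "shrunken" expected a posteriori (EAP) estimate that averages over a standard-normal prior.

```python
import numpy as np

# All settings below are hypothetical, chosen only for illustration.
GRID = np.linspace(-4.0, 4.0, 161)  # candidate abilities; estimates are bounded to [-4, 4]
CUTOFF = 1.0                        # hypothetical top/bottom proficiency cut scores at +/-1 SD
LOG_PRIOR = -0.5 * GRID**2          # standard-normal prior used by the shrunken (EAP) score


def rasch_prob(theta, difficulties):
    """Rasch (1PL) model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))


def score(theta, n_items, rng):
    """Simulate one test form; return (MLE, EAP) ability estimates per student."""
    difficulties = np.linspace(-2.0, 2.0, n_items)
    p_true = rasch_prob(theta, difficulties)
    responses = (rng.random(p_true.shape) < p_true).astype(float)

    # Log-likelihood of each student's answers at every grid point: (students, grid).
    p = rasch_prob(GRID, difficulties)
    loglik = responses @ np.log(p).T + (1.0 - responses) @ np.log1p(-p).T

    mle = GRID[np.argmax(loglik, axis=1)]  # grid-search maximum likelihood: no shrinkage
    logpost = loglik + LOG_PRIOR            # add the prior to get the log-posterior
    weights = np.exp(logpost - logpost.max(axis=1, keepdims=True))
    eap = (weights @ GRID) / weights.sum(axis=1)  # posterior mean: pulled toward 0
    return mle, eap


def extreme_share(scores):
    """Fraction of students placed in the top or bottom category."""
    return float(np.mean(np.abs(scores) > CUTOFF))


rng = np.random.default_rng(42)
theta = rng.normal(size=10_000)  # true abilities
true_share = extreme_share(theta)
print(f"true share in top/bottom: {true_share:.3f}")

results = {}
for n_items in (10, 60):
    mle, eap = score(theta, n_items, rng)
    results[n_items] = {"mle": extreme_share(mle), "eap": extreme_share(eap)}
    print(f"{n_items:2d} items  MLE: {results[n_items]['mle']:.3f}  EAP: {results[n_items]['eap']:.3f}")

# A hypothetical 0.8 SD true gap between two groups, scored with the 10-item form.
in_b = rng.random(theta.size) < 0.5
theta_gap = theta - 0.8 * in_b
mle, eap = score(theta_gap, 10, rng)
true_gap = theta_gap[~in_b].mean() - theta_gap[in_b].mean()
mle_gap = mle[~in_b].mean() - mle[in_b].mean()
eap_gap = eap[~in_b].mean() - eap[in_b].mean()
print(f"gap  true: {true_gap:.2f}  MLE: {mle_gap:.2f}  EAP: {eap_gap:.2f}")
```

With these settings, the unshrunken scores from the 10-item form should place more students in the extreme categories than the 60-item form does, while the shrunken EAP scores place fewer students there than the true abilities would and narrow the gap between the two groups. The exact numbers depend on the random seed and the made-up settings.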
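Jacob's point about scaling can be made concrete with two made-up reporting scales. The sketch below assumes a linear scale (200 + 50θ) and a nonlinear "percent correct" scale (the expected percent correct under a Rasch model if every item had difficulty 0). Both scale functions and both classes are hypothetical, and each class is represented by a single typical student. Because both scales are strictly increasing in the same underlying ability θ, they always rank individual students identically, yet they can disagree about which class gained more.

```python
import math


# Two hypothetical reporting scales, both strictly increasing in ability theta.
def linear_scale(theta):
    return 200.0 + 50.0 * theta


def percent_scale(theta):
    # Expected percent correct under a Rasch model if every item had difficulty 0.
    return 100.0 / (1.0 + math.exp(-theta))


# Hypothetical typical student per class: (starting ability, ending ability).
classrooms = {
    "low-start class": (-2.5, -2.0),  # gains 0.5 in theta
    "mid-start class": (0.0, 0.4),    # gains 0.4 in theta
}


def gains(scale):
    """Score gain for each class when results are reported on the given scale."""
    return {name: scale(end) - scale(start) for name, (start, end) in classrooms.items()}


for label, scale in (("linear", linear_scale), ("percent", percent_scale)):
    g = gains(scale)
    best = max(g, key=g.get)
    print(label, {k: round(v, 2) for k, v in g.items()}, "-> larger gain:", best)
```

Here the linear scale credits the low-start class with the larger gain (25 points versus 20), while the percent scale credits the mid-start class (about 9.9 points versus 4.3). A value-added comparison built on one scale could therefore rank the two classes differently than one built on the other.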