Lies, Damn Lies, & Statistics
Posted on February 28th, 2009
I was doing some basic research this weekend to see how Arkansas ACT scores have changed as a function of rising expenditures. While looking at the numbers, I noticed that comparisons between the states are startlingly unfair because participation levels differ so much from state to state. Here is the raw data:
If you spend much time looking at that information, you will notice that for several years Colorado & Michigan had 100% of graduates tested. Research has shown that ACT scores follow an essentially “normal” statistical distribution. Because those states have 100% participation, the true statistical “mean” for each of them would be virtually equal to the “Average Composite Score.”
Now consider Massachusetts. Its participation rate was 17% in 2008 and lower in preceding years. While not precise, I think it is fair for comparative purposes to assume that these students are the top 17% of the state: about 1% just below one standard deviation above the mean, 14% between one and two standard deviations above it, and the remainder beyond two. This means that the “true statistical mean” would probably be about 1 standard deviation to the left (lower!) of the Massachusetts “Average Composite Score.” Their ACS at 100% participation would then be about 3 points lower, or about 20.6 on the ACT.
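As a rough check on that breakdown, here is a minimal sketch using Python's standard library. It assumes, as the paragraph above does, that scores are normally distributed and that the tested students are exactly the top 17%. The variable names are only illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

participation = 0.17  # Massachusetts participation in 2008, as quoted above

# z-score separating the top 17% from everyone else (assumes strict top-slice testing)
cutoff = std_normal.inv_cdf(1 - participation)

# Share of the whole population falling in each band of the top slice
just_below_1sd = std_normal.cdf(1) - std_normal.cdf(cutoff)
between_1_and_2sd = std_normal.cdf(2) - std_normal.cdf(1)
above_2sd = 1 - std_normal.cdf(2)

print(f"cutoff: {cutoff:.2f} SD above the mean")
print(f"cutoff to +1 SD: {just_below_1sd:.1%}")
print(f"+1 SD to +2 SD:  {between_1_and_2sd:.1%}")
print(f"above +2 SD:     {above_2sd:.1%}")
```

The three bands add up to the 17% participation rate, so the 1% / 14% / remainder split is consistent with a cutoff a little under one standard deviation above the mean.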
From a policy perspective, being able to compare the “true statistical mean” rather than an “average composite score of test takers” would be far more useful in making decisions. We have been led to believe that our policy decisions should be patterned after northeastern states with high composite averages, but if you consider the points above, I would bet on other states as better models. For example, Minnesota, Iowa, Wisconsin, Nebraska, South Dakota, and Kansas all have relatively high participation rates and relatively high scores. Would those states make better models than Massachusetts and Delaware? I think so. However, even those states are probably suppressing their lower statistical “tails.” That practice does not lead to good comparisons unless there is a process for converting the ACS to a “true mean.”
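One way such a conversion might look is sketched below. This is only an illustration under strong assumptions: scores are normal, the tested students are exactly the top slice of the state, and the spread in ACT points is known. The function name `full_participation_mean` and the `sd_points` parameter are hypothetical. The example call uses the Massachusetts figures implied above: a 23.6 average, 17% participation, and about 3 points per standard deviation.

```python
from statistics import NormalDist


def full_participation_mean(acs, participation, sd_points):
    """Estimate the mean at 100% participation from a partial-participation average.

    Assumes normal scores and that the tested students are exactly the top
    `participation` fraction, which is the most extreme form of selection.
    """
    if not 0 < participation <= 1:
        raise ValueError("participation must be in (0, 1]")
    if participation == 1:
        return acs  # everyone was tested, so the average already is the mean
    z = NormalDist()
    cutoff = z.inv_cdf(1 - participation)
    # Mean of a standard normal above `cutoff`, in SD units (inverse Mills ratio)
    shift_in_sd = z.pdf(cutoff) / participation
    return acs - shift_in_sd * sd_points


# Assumed inputs: 23.6 average, 17% tested, 3 ACT points per standard deviation
estimate = full_participation_mean(23.6, 0.17, 3.0)
print(f"estimated mean at full participation: {estimate:.1f}")
```

Under a strict top-slice cutoff, the shift for 17% participation works out to roughly one and a half standard deviations, which is more than the one standard deviation estimated above. Real selection is unlikely to be that clean, so this figure is better read as an upper bound on the correction than as a precise answer.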
As further evidence for what I suspect, consider the following scatter plots I created from the data. There is a cluster of states with participation rates under 20% that probably test only their top-performing students. There is another cluster with participation rates from 60% to 80% that probably test all but their lowest-performing students. In an unbiased random sample, the trendline on these charts should have zero slope, with larger scatter at the lower participation rates. Clearly, the differences between states reflect substantial bias introduced by their testing participation policies, rather than the actual average performance of their students and teachers.
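The zero-slope claim can be illustrated with a small simulation. The sketch below uses made-up states, not the data from the charts: every hypothetical state draws its students from the same normal distribution (a mean of 21 and a standard deviation of 5 are assumed purely for illustration), so any trend in the averages comes only from who gets tested.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate(selective, n_states=50, n_students=2000, seed=0):
    """Return (participation rates, average scores) for hypothetical states."""
    rng = random.Random(seed)
    rates, averages = [], []
    for _ in range(n_states):
        rate = rng.uniform(0.05, 1.0)
        # Same assumed distribution for every state: mean 21, SD 5
        scores = [rng.gauss(21.0, 5.0) for _ in range(n_students)]
        k = max(1, round(rate * n_students))
        if selective:
            tested = sorted(scores, reverse=True)[:k]  # only the top slice is tested
        else:
            tested = rng.sample(scores, k)  # unbiased random sample
        rates.append(rate)
        averages.append(mean(tested))
    return rates, averages


biased_slope = slope(*simulate(selective=True))
random_slope = slope(*simulate(selective=False))
print(f"top-slice testing: slope = {biased_slope:.2f} points per unit of participation")
print(f"random sampling:   slope = {random_slope:.2f} points per unit of participation")
```

With top-slice testing the fitted slope is strongly negative even though the states are identical by construction, while random sampling leaves it close to zero.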


Tags: education policy

