Once you've looked at that link and its graphs, there are some more interesting points to consider. First, note this sentence by Tabarrok:
Buchanan received 0.78 percent of the vote in Palm Beach County. By comparison, he received an average of 0.46 percent of the vote in the other Florida counties.
Now let me point out that Buchanan actually got 17465 votes out of a total of 5957092 votes in the state, or 0.293%. Above, Tabarrok claims 0.46%. What the heck? Both are correct; they are simply different averages. The total population mean is in fact 0.293%, but the sample mean (view each of the 67 counties as a sample, cumulatively sampling the entire population once, and average the county percentages with equal weight) is about 0.47% over all 67 counties in the data below, or 0.466% with Palm Beach left out, which is essentially Tabarrok's 0.46%. The sample means have a std dev of 0.32324, which means that the population mean estimated from this exhaustive sampling (usually much smaller samples are used) is fully 0.55 std devs from the true population mean. This is rather startling given the large sample size (67) and the fact that it exhaustively samples the entire 5M+ population.
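To make the difference between the two averages concrete, here is a minimal Python sketch. It uses four rows copied from the data in this post (two small counties and the two largest) plus the statewide totals quoted above. The function names are mine, and 0.4712 is the mean of the 67 county percentages worked out from the data below, so treat this as an illustration rather than a full analysis.

```python
# Four rows taken from the data in this post: (Buchanan votes, total votes).
# Two small counties and the two largest ones.
counties = [(39, 2410), (90, 5174), (788, 573396), (560, 625362)]


def pooled_pct(rows):
    """Percent of all votes cast: every voter counts equally."""
    return 100.0 * sum(b for b, _ in rows) / sum(t for _, t in rows)


def mean_of_pcts(rows):
    """Average of the per-county percentages: every county counts equally."""
    return sum(100.0 * b / t for b, t in rows) / len(rows)


# Statewide figures quoted in the post.
statewide_pct = 100.0 * 17465 / 5957092

# Gap between the two averages in std devs. 0.4712 is the mean of the
# 67 county percentages (computed from the post's data); 0.32324 is the
# std dev quoted in the post.
gap_in_sd = (0.4712 - statewide_pct) / 0.32324

print(f"pooled over the four rows:       {pooled_pct(counties):.3f}%")
print(f"mean of the four county figures: {mean_of_pcts(counties):.3f}%")
print(f"statewide pooled:                {statewide_pct:.3f}%")
print(f"gap between the two averages:    {gap_in_sd:.2f} std devs")
```

The small counties carry very few votes but count as much as the big ones in the equal-weight average, which is why that average sits so far above the pooled figure.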
This leads to the obvious conclusion that the samples are somehow not sampling a uniform population. On a little bit of reflection, it becomes obvious that normalization, while the correct first step, is not sufficient here. The second plot of Tabarrok's, which makes Palm Beach look OK, is in fact deceptive itself. The problem lies in the assumption of sampling normalcy and in what the x-axis of the graph conveys. In both plots shown, the x-axis merely represents sample position. It is in fact the alphabetical order of the counties. Seems harmless. In fact, it is one method of making the data look random, and random is what Tabarrok wants you to see. You could of course sort the x-axis in ascending or descending order of the %vote, and you could even single out Palm Beach and shift its position after the data was sorted, just to make it stand out. Any such ordering is equally valid, because the x-axis does not convey any information here; it is simply sample number, and one ordering of sample numbers is just as valid as any other. A random ordering, and there are many of them, makes the data look nicely random.
Instead, you need data pairs of total vote and %vote for Buchanan. If you plot %vote for Buchanan vs total county vote, the x-axis now means something. The resulting plot is well worth looking at. I provide the raw data below in comma-delimited format, so you can plot it in Excel. The first column is the actual Buchanan vote, the second is the total county vote, and the third is the Buchanan % of the vote for each county. I've already sorted it in ascending order of county vote, but since you will be plotting the third column against the second, the order is not important.
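If you would rather not use Excel, the pairing step can be sketched in a few lines of Python. This assumes the header has been dropped and that each row is a `votes,total,pct` triplet separated from the next by whitespace; `parse_rows` is a name I made up, and the sample is just four rows copied from the data. The resulting `xs` and `ys` lists can be handed to whatever plotting tool you like.

```python
def parse_rows(text):
    """Turn whitespace-separated 'votes,total,pct' triplets into (total, pct) pairs."""
    pairs = []
    for token in text.split():
        votes_s, total_s, pct_s = token.split(",")
        votes, total = int(votes_s), int(total_s)
        pct = 100.0 * votes / total
        # Recompute the percentage and check it against the third column.
        if abs(pct - float(pct_s)) > 1e-5:
            raise ValueError(f"inconsistent row: {token}")
        pairs.append((total, pct))
    return sorted(pairs)  # ascending total vote, i.e. left to right on the x-axis


# A few rows copied from the data in this post, header dropped.
sample = """
39,2410,1.618257261
1013,398469,0.254223039
3407,432286,0.788135632
560,625362,0.089548134
"""

points = parse_rows(sample)
xs = [total for total, _ in points]  # x-axis: total county vote
ys = [pct for _, pct in points]      # y-axis: Buchanan % of that vote

for x, y in points:
    print(f"{x:>8d}  {y:.3f}%")
```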
Once you do this, toss out Palm Beach (it is the obvious outlier) and compute some stats on the remaining 10 largest counties. I get a mean of 0.21347% and a std dev of 0.071175. Palm Beach clocks in at 0.788136%, which is 8.07 std devs away from the mean. Please note that a linear regression, which would show the slightly negative slope with size, would make this even worse (IIRC close to 10 std devs). Then look at where all the counties with a higher Buchanan % than Palm Beach lie: they are all clustered down near zero on the x-axis, and none of them has more than about 12,500 total votes. LOL! This is one chart that I kept on my wall for a couple of years to remind myself how screwy “statistics” can be when one does not look at the details. Enjoy!
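The arithmetic above can be checked with a short Python sketch. The eleven rows are the largest counties copied from the data in this post. The plain comparison uses the sample std dev (n - 1). The regression part is my own rough take: an ordinary least-squares line through the other ten counties, with the residual spread estimated on n - 2 degrees of freedom. The exact number it gives depends on those choices, so treat it as a check of the "even worse" remark rather than a reproduction of the figure recalled above.

```python
import statistics

# The 11 largest counties from the data in this post: (Buchanan votes, total votes).
largest = [
    (532, 168486), (496, 183256), (305, 184377), (570, 218395),
    (652, 264636), (446, 280125), (847, 360295), (1013, 398469),
    (3407, 432286),  # Palm Beach
    (788, 573396), (560, 625362),
]
PALM_BEACH = (3407, 432286)


def pct(row):
    """Buchanan share of the county vote, in percent."""
    votes, total = row
    return 100.0 * votes / total


others = [r for r in largest if r != PALM_BEACH]
xs = [float(total) for _, total in others]
ys = [pct(r) for r in others]

# Plain comparison: distance from the mean of the other ten, in sample std devs.
mean = statistics.mean(ys)
sd = statistics.stdev(ys)  # sample std dev (n - 1 in the denominator)
z_plain = (pct(PALM_BEACH) - mean) / sd

# Ordinary least-squares fit of % vs total vote over the same ten counties.
x_bar = statistics.mean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - mean) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean - slope * x_bar

# Spread of the ten counties around the fitted line (n - 2 degrees of freedom).
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
resid_sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
predicted = intercept + slope * PALM_BEACH[1]
z_regress = (pct(PALM_BEACH) - predicted) / resid_sd

print(f"mean {mean:.5f}  std dev {sd:.6f}  z {z_plain:.2f}")
print(f"slope {slope:.3e} per vote  z vs fitted line {z_regress:.2f}")
```

Because the fitted line slopes downward with county size, the value it predicts for a county as large as Palm Beach is below the plain mean, and the spread around the line is tighter, so Palm Beach lands even further out.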
The data:
Buchanan Votes, Total Votes, Buchanan %
39,2410,1.618257261
10,2505,0.399201597
9,3365,0.267459138
37,3826,0.967067433
23,3964,0.580221998
33,4644,0.710594315
29,4666,0.62151736
90,5174,1.739466564
29,5395,0.537534754
29,5642,0.514002127
71,6144,1.155598958
29,6162,0.47062642
30,6233,0.481309161
27,6808,0.396592244
76,7395,1.027721433
36,7805,0.461242793
88,8021,1.09712006
22,8138,0.270336692
73,8154,0.895266127
46,8587,0.53569349
65,8673,0.749452323
43,9853,0.436415305
108,12441,0.86809742
67,12724,0.526563974
38,14727,0.25802947
102,16300,0.625766871
120,18318,0.655093351
89,18508,0.480873136
114,22261,0.512106374
90,23581,0.381663203
148,26222,0.564411563
83,27111,0.306148796
47,33878,0.138733101
127,35149,0.361318956
105,49622,0.211599694
311,50319,0.618056798
145,55657,0.260524283
270,57200,0.472027972
186,57353,0.32430736
248,58805,0.421732846
229,60746,0.376979554
112,62013,0.180607292
242,65219,0.371057514
182,66896,0.272064099
267,70680,0.377758913
124,77989,0.158996782
263,85729,0.306780669
289,88611,0.32614461
122,92141,0.132405769
563,102956,0.546835541
282,103113,0.273486369
271,110221,0.245869662
502,116648,0.430354571
194,137634,0.140953543
570,142731,0.399352628
305,160942,0.189509264
532,168486,0.315753238
496,183256,0.270659624
305,184377,0.165421934
570,218395,0.260994986
652,264636,0.246376154
446,280125,0.159214636
847,360295,0.235085139
1013,398469,0.254223039
3407,432286,0.788135632
788,573396,0.137426839
560,625362,0.089548134