This is a paper I wrote in early 2008. At the time it was original research: the topic was hinted at and widely understood to be true, but there was no real published work on it. A few months later, Bill Bishop's excellent 'The Big Sort' was published. That book dealt with some similar topics, but it did not have the same level of detailed methodology that my paper did, and it was not as scientific in nature.
The reason I am posting this now is as a response to this article on the Atlantic (Cities sub-magazine). The article misstates the differences in voting as 'Democrat' vs. 'Republican', rather than a simple preference between presidential candidates. People do not vote exclusively along party lines; this is abundantly clear. Anyway, it's not worth my time to go through it line by line, so just read my work.
Do Urban Voters Favor Democrats?
A Case Study of the 2000 United States Presidential Election
Introduction:
The purpose of this project was to determine whether voters living in urban areas in the 2000 United States Presidential Election were more likely to vote for the Democratic candidate, Al Gore, than for the Republican candidate, George Bush. It is a popular belief that people in urban areas are significantly more liberal and therefore vote for the Democratic candidate rather than the Republican.
Methods:
Election results and levels of urbanization from 639 counties in 15 states were correlated. Data from the 2000 United States Census Summary File 3 were used to assess urbanization. Urbanization was defined as a ratio of the number of urban residents in a county divided by the total number of residents of the county. The resultant ratio could then range from 0-1. The election results were manually entered from the results posted on CNN.com. Third party candidate data were removed to create a two variable system. Counties with a third party candidate earning 10% or more of the vote were excluded from this analysis. Then a ratio of votes for a particular candidate divided by the total number of votes for the county was calculated to create another column with calculated values from 0-1. These numbers were then correlated to derive a Pearson’s product-moment correlation coefficient.
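The calculation described above can be sketched in a few lines of Python. This is only an illustration of the method, not the original spreadsheet: the four counties and the helper name `pearson_r` are made up for the example, and the real analysis was done in Excel.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical counties (made-up numbers, NOT data from the study):
# (urban residents, total residents, Gore votes, Bush votes)
counties = [
    (90_000, 100_000, 30_000, 20_000),
    (10_000, 50_000, 8_000, 12_000),
    (0, 20_000, 3_000, 7_000),
    (45_000, 60_000, 14_000, 13_000),
]

# Urbanization: urban residents divided by total residents (0 to 1).
urban_ratio = [urban / total for urban, total, _, _ in counties]

# Two-party Gore share: third-party votes are already dropped (0 to 1).
gore_share = [gore / (gore + bush) for _, _, gore, bush in counties]

r = pearson_r(urban_ratio, gore_share)
print(f"r = {r:.2f}")
```

Because the Bush share is simply one minus the Gore share in a two-candidate system, correlating urbanization with the Bush share would give the same value with the opposite sign.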
The definition of an “Urban Area” or “Urban Cluster”, which together make up the Urban Population, is a census block or block group with a density of at least 1,000 people per square mile, together with surrounding blocks or block groups of similar density. The difference between the two is population size: an “Urban Area” is larger than an “Urban Cluster”.
I felt that aggregating data at a countywide level would provide the most accurate measure of urbanization, and election results would correspond to this. Any aggregation level smaller than this would be either excessively time-consuming or limited in scope. Using state data would be problematic because states are physically large, populous, and varied in urbanization. Generalizing by state would be an ecological fallacy.
For the voting data I used CNN.com. These data were mostly reported on a county-by-county basis, although in some cases, such as Connecticut, the reporting system differed. It was important to use voting data that matched the geographic breakdown of the census data.
I decided the most effective way to measure the correlation was to create a two-variable system, one variable being urbanization level and the other candidate selection. Comparing the two principal candidates necessitated eliminating data from other candidates. In most cases the percentage of the population that voted for a third party was small, but in five to ten counties it exceeded 10%. These counties were excluded to avoid skewing the results. Counties with fewer than 4000 voters were also excluded.
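The two exclusion rules can be written as a small filter. The sketch below rests on two assumptions that the text does not settle: that the 10% threshold applies to all third-party votes combined, and that the 4000-voter cutoff counts every vote cast, third party included. The `County` class and the sample rows are hypothetical.

```python
from dataclasses import dataclass

MIN_VOTERS = 4000        # cutoff stated in the text
MAX_THIRD_PARTY = 0.10   # counties at or above this share are excluded

@dataclass
class County:
    name: str
    gore: int
    bush: int
    other: int  # assumption: all third-party votes combined

def keep(county):
    """Return True if the county passes both exclusion rules."""
    total = county.gore + county.bush + county.other
    if total < MIN_VOTERS:
        return False
    return county.other / total < MAX_THIRD_PARTY

# Hypothetical rows, not data from the study.
counties = [
    County("A", 5000, 4000, 300),    # kept
    County("B", 1500, 1800, 100),    # dropped: fewer than 4000 voters
    County("C", 4000, 4000, 1000),   # dropped: about 11% third party
    County("D", 4500, 4500, 1000),   # dropped: exactly 10% third party
]

kept = [c.name for c in counties if keep(c)]
print(kept)
```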
There were other data management difficulties to deal with in this analysis. There are 3119 counties in the United States, so performing an analysis on that scale would have been excessively time-consuming for this project. Additionally, in several states the election data reported by CNN.com were on a different geographic scale than in the rest of the United States. The data for several Midwestern states such as Illinois were reported by county but also by city, which created complications.
I felt that it would be most appropriate to select states that, taken together, had a mean level of urbanization similar to that of the entire US. I also selected states that together had a ratio of Gore to Bush votes similar to the nationwide election results. Several of the states selected, such as Idaho, Utah and Wyoming, heavily favored George Bush, while others, like California, Delaware and Maryland, strongly favored Al Gore. New Mexico and Oregon were split almost exactly between the two candidates. The final sample included 639 counties, with the number per state ranging from 3 to 109. The mean was 42.6, the median was 29 and the standard deviation was 30.98.
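The per-state summary figures can be rechecked from the county counts listed in Table 1. A short Python check, using only the standard library, suggests that the reported standard deviation is the sample (n - 1) form rather than the population form:

```python
from statistics import mean, median, stdev

# Counties per state, copied from the "Counties" column of Table 1.
counties_per_state = {
    "Alabama": 67, "Arizona": 14, "California": 52, "Colorado": 35,
    "Delaware": 3, "Georgia": 109, "Idaho": 24, "Indiana": 87,
    "Louisiana": 63, "Maryland": 24, "Mississippi": 74,
    "New Mexico": 23, "Oregon": 29, "Utah": 19, "Wyoming": 16,
}

counts = list(counties_per_state.values())

total = sum(counts)          # 639 counties in 15 states
avg = mean(counts)           # 42.6
mid = median(counts)         # 29
spread = stdev(counts)       # sample standard deviation, about 30.98

print(total, round(avg, 1), mid, round(spread, 2))
```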
Microsoft Excel was used to create a spreadsheet with columns for votes for the two candidates and calculated columns for the percent of these total votes for each of the two major candidates respectively. The urban population of each county was calculated.
The null hypothesis was that urban voters did not differ in candidate selection from rural voters. The alternate hypothesis was that urban voters did differ in candidate selection from rural voters. Pearson’s correlation test was used to test these hypotheses.
Results:
For the 639 counties there was an r-value of .22. The result was significant at the 1% level, meaning there was at most a 1% chance of making a type I error, so the null hypothesis was rejected. To compare states to each other, r-values were also calculated for these smaller data sets. The states that had statistically significant correlations were California, Colorado, Georgia, Indiana, Louisiana, Maryland, Oregon, and Wyoming. They were all positive correlations, ranging from .23 to .67.
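One common way to test a Pearson correlation is to convert r into a t statistic with n - 2 degrees of freedom. The sketch below applies that formula to the county-level result; it is an illustration, not necessarily the exact procedure used for the paper, and the p-value uses a normal approximation that is reasonable only because the sample is large.

```python
from math import sqrt
from statistics import NormalDist

def t_statistic(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r, n = 0.22, 639             # values reported in the Results
t = t_statistic(r, n)

# Assumption: with 637 degrees of freedom the t distribution is very
# close to the standard normal, so a two-sided p-value is approximated
# from the normal curve instead of an exact t distribution.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.2f}, approximate two-sided p = {p_approx:.1e}")
```

The statistic comes out well above the roughly 2.58 needed for two-sided significance at the 1% level in a sample this size, which is why a modest r of .22 is still statistically significant with 639 counties.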
Table 1
| State | Counties | % Gore | % Urban | Pearson’s r | Statistically Significant |
|---|---|---|---|---|---|
| Alabama | 67 | 42.47 | 55.44 | -0.14 | No |
| Arizona | 14 | 47.07 | 88.23 | -0.21 | No |
| California | 52 | 57.60 | 94.76 | 0.59 | Yes |
| Colorado | 35 | 45.65 | 87.07 | 0.37 | Yes |
| Delaware | 3 | 56.85 | 80.02 | 0.97 | No |
| Georgia | 109 | 43.80 | 74.33 | 0.35 | Yes |
| Idaho | 24 | 29.86 | 70.74 | 0.33 | No |
| Indiana | 87 | 42.04 | 71.24 | 0.23 | Yes |
| Louisiana | 63 | 46.05 | 72.76 | 0.26 | Yes |
| Maryland | 24 | 58.65 | 86.07 | 0.67 | Yes |
| Mississippi | 74 | 41.73 | 49.38 | 0.20 | No |
| New Mexico | 23 | 49.88 | 76.31 | -0.16 | No |
| Oregon | 29 | 50.53 | 79.23 | 0.50 | Yes |
| Utah | 19 | 28.43 | 89.54 | 0.09 | No |
| Wyoming | 16 | 33.63 | 67.66 | 0.64 | Yes |
| TOTAL | 639 | 50.16 | 83.23 | 0.35 | Yes |
NOTE: The Pearson’s r for “TOTAL” is from a correlation across the 15 states, not the 639 counties.
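Table 1 shows why a large r is not automatically significant: Delaware has r = 0.97 but only 3 counties, while Maryland reaches significance with r = 0.67 across 24. The sketch below reruns the t test for three states. It assumes a two-tailed test at the 0.05 level, which the paper does not state for the state-level results, and it takes the critical values from a standard t table.

```python
from math import sqrt

# Two-tailed critical t values at alpha = 0.05 from a standard t table,
# keyed by degrees of freedom (df = n - 2). Assumed significance level.
T_CRIT_05 = {1: 12.706, 22: 2.074}

# (Pearson r, number of counties) copied from Table 1.
states = {
    "Delaware": (0.97, 3),
    "Idaho": (0.33, 24),
    "Maryland": (0.67, 24),
}

def is_significant(r, n):
    """Compare the t statistic for r with the tabulated critical value."""
    df = n - 2
    t = abs(r) * sqrt(df) / sqrt(1 - r * r)
    return t > T_CRIT_05[df]

verdicts = {name: is_significant(r, n) for name, (r, n) in states.items()}
print(verdicts)
```

Under these assumptions the three verdicts agree with the "Statistically Significant" column of the table.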
Table 2
| | Sample | Total US |
|---|---|---|
| Bush Votes | 13886458 | 50456169 |
| Gore Votes | 13973898 | 50996116 |
| Counties | 639 | 3119 |
| % Population Urban | 83.2 | 80.6 |
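The claim that the sample resembles the nation in its Gore to Bush split can be rechecked from the vote totals in Table 2. A minimal Python sketch, with the function name `gore_share` chosen only for this example:

```python
# Vote totals copied from Table 2 (third-party votes already removed).
sample = {"bush": 13_886_458, "gore": 13_973_898}
nation = {"bush": 50_456_169, "gore": 50_996_116}

def gore_share(votes):
    """Gore's percentage of the two-party vote."""
    return 100 * votes["gore"] / (votes["gore"] + votes["bush"])

sample_pct = gore_share(sample)
nation_pct = gore_share(nation)
gap = abs(sample_pct - nation_pct)

print(f"sample {sample_pct:.2f}%, nation {nation_pct:.2f}%, gap {gap:.2f} points")
```

Both shares come out just above 50%, and they differ by roughly a tenth of a percentage point.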
Figure 1
Discussion:
Urban voters were more likely than rural voters to favor Al Gore over George Bush, and the difference was statistically significant. The same pattern was significant in 8 of the 15 states. While the difference was not large, the study data provide sufficient power to say with confidence that these findings are meaningful.
This study would have been improved by including county-level election results from all 50 states. Alaska does not report its election results by county, so this may be difficult. I decided that 4000 voters was a good cutoff point for inclusion because very small counties could have skewed the results. There may be a more appropriate cutoff that would give better results.
The states used for this study are mainly in the West, South and Southwest. This regional skew should not be a major factor, however, because the ratio of Gore to Bush voters in the sample was very close to the overall national ratio. Likewise, the sample's level of urbanization (83.2%) was close to that of the US as a whole (80.6%); a large difference there could have been a problem.
The data were entered manually, so there is a chance of human error. Automating the data transfer would have reduced this source of error.
There were a few instances of apparent errors in CNN’s posted data. The county voting totals in some instances did not add up to equal the totals for the state. Results for California and Georgia showed statewide totals for the winning candidate that were lower than the totals of the counties added together would indicate. This error puts the reliability of data from those states into question.
Third party candidates were not included in this study to simplify the data analysis. Ralph Nader has been blamed for giving the election to George Bush because of the high likelihood that many of those who voted for him would have voted for Al Gore instead if the election had only two candidates. These voters would have been enough to give the presidency to Al Gore. Excluding the data from this candidate may have skewed the results slightly. It is unclear if those who voted for Ralph Nader were concentrated in urban or in rural counties so the effect on the correlation between voters for Gore and an urban area is uncertain.
The biggest possible flaw in the data used is the urbanization measure. Some counties had very few residents yet had a significant number of “urban” residents. Most people would not count a county with fewer than 30,000 people as urban at all. The common perception of “Urban” is likely to be a little more exclusive than this. In some ways having a broader definition is good, because it allows for greater variation between the data points instead of leaving a lot of counties with no urban residents. In one case there was a county with around 13,000 residents, 15 of whom lived in an “urban cluster”.
Conclusion:
There was a statistically significant correlation between the proportion of urban residents in a county and the proportion of voters who voted for Al Gore in the 2000 Presidential election. It is not a strong correlation, but it is clear and definitive. The correlation was also statistically significant within more than half of the states, each of which is a smaller data set. The strength of these data suggests that the findings would not be changed by an analysis less limited by the possible errors identified.
http://www.cnn.com/ELECTION/2000/results/president/ Data Accessed: 04/25/08-04/30/08
Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont reported by township and city.
The percent of urbanization in the United States is 80.6%. My sample has an urbanization percent of 83.2%.
The overall percent of voters in the United States who selected George Bush in the adjusted (3rd party candidates removed) voting percent was 49.73%. The overall percent of voters who selected Al Gore was 50.27%. In my sample the percent of voters who selected George Bush was 49.84%. The percent who selected Al Gore was 50.16%.
The states selected were Alabama, Arizona, California, Colorado, Delaware, Georgia, Idaho, Indiana, Louisiana, Maryland, Mississippi, New Mexico, Oregon, Utah, and Wyoming.