Showing posts with label 2000 Presidential election. Show all posts

Monday, September 30, 2013

Urban and Rural Voting: 2000 Presidential Election Case Study


This is a paper I wrote in early 2008. At the time it was original research, a topic that was hinted at and understood to be true, but without any real published work. A few months later, Bill Bishop's excellent 'The Big Sort' was published. This book dealt with some similar topics, but did not have the same level of detailed methodology that my paper did, and it was not as scientific in nature. 

The reason I am posting this now is as a response to this article in The Atlantic (Cities sub-magazine). The article misstates the differences in voting as 'Democrat' vs. 'Republican', rather than a simple preference between presidential candidates. People do not vote exclusively along party lines; this is abundantly clear. Anyway, it's not worth my time to go through it line by line, so just read my work.


Do Urban Voters Favor Democrats? 
A Case Study of the 2000 United States Presidential Election


Introduction:
The purpose of this project was to determine if voters living in urban areas in the 2000 United States Presidential Election were more likely to vote for the Democratic candidate Al Gore, rather than the Republican candidate, George Bush. It is a popular belief that people in the urban areas are significantly more liberal and therefore vote for the Democratic candidate rather than the Republican. 

Methods:
Election results and levels of urbanization from 639 counties in 15 states were correlated. Data from the 2000 United States Census Summary File 3 were used to assess urbanization. Urbanization was defined as a ratio of the number of urban residents in a county divided by the total number of residents of the county. The resultant ratio could then range from 0-1. The election results were manually entered from the results posted on CNN.com. Third party candidate data were removed to create a two variable system. Counties with a third party candidate earning 10% or more of the vote were excluded from this analysis. Then a ratio of votes for a particular candidate divided by the total number of votes for the county was calculated to create another column with calculated values from 0-1. These numbers were then correlated to derive a Pearson’s product-moment correlation coefficient.
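The calculation described above can be sketched in a few lines of Python. This is only an illustration: the original analysis was done in Microsoft Excel, the county records below are invented, and the names (`counties`, `pearson_r`, the cutoff constants) are hypothetical. It assumes the vote ratio is taken over the two-party total (Gore plus Bush) after third-party votes are dropped, and it applies the two exclusion rules from the paper (a third-party share of 10% or more, or fewer than 4000 voters).

```python
from math import sqrt

# Hypothetical county records, invented purely for illustration:
# (urban residents, total residents, Gore votes, Bush votes, third-party votes)
counties = [
    (90_000, 100_000, 30_000, 18_000, 2_000),
    (10_000, 40_000, 6_000, 11_000, 500),
    (0, 12_000, 1_500, 3_500, 100),
    (55_000, 60_000, 12_000, 13_000, 4_000),  # third-party share is about 14%
    (20_000, 50_000, 9_000, 10_000, 600),
]

THIRD_PARTY_CUTOFF = 0.10  # exclude counties at or above this third-party share
MIN_VOTERS = 4000          # exclude counties with fewer voters than this


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


urbanization = []  # urban residents / total residents, from 0 to 1
gore_share = []    # Gore votes / (Gore + Bush votes), from 0 to 1

for urban, residents, gore, bush, third in counties:
    all_votes = gore + bush + third
    if all_votes < MIN_VOTERS or third / all_votes >= THIRD_PARTY_CUTOFF:
        continue  # county excluded from the analysis
    urbanization.append(urban / residents)
    gore_share.append(gore / (gore + bush))

r = pearson_r(urbanization, gore_share)
print(f"counties kept: {len(urbanization)}, r = {r:.2f}")
```

With real data, the list would hold one record per county, built from the Census urban population counts and the CNN.com vote totals.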

The definition of an “Urban Area” or “Urban Cluster”, which together make up Urban Population, is a census block or block group that has a density of 1000 people per square mile with surrounding blocks or block groups of similar density. The difference between them is the population size, where “Urban Area” is larger than “Urban Cluster”. 
I felt that aggregating data at a countywide level would provide the most accurate measure of urbanization, and election results would correspond to this. Any aggregation level smaller than this would be either excessively time-consuming or limited in scope. Using state data would be problematic because states are physically large, populous, and varied in urbanization. Generalizing by state would be an ecological fallacy. 
For the voting data I used CNN.com. These data were mostly reported on a county-by-county basis, although in some cases such as Connecticut, the reporting system differed. It was important to use voting data which matched the geographic breakdown of the census data.
I decided the most effective way to measure the correlation was to create a two-variable system, one variable being urbanization level and the other candidate selection. Comparing the two principal candidates necessitated eliminating data from other candidates. In most cases the percentage of the population that voted for the third party was small, but in five to ten counties it exceeded 10%. These counties were excluded to avoid skewing the results. Counties with fewer than 4000 voters were also excluded. 
There were other data management difficulties to deal with in this analysis. There are 3119 counties in the United States, so performing an analysis on that scale would have been excessively time-consuming for this project. Additionally, the election data reported by CNN.com for some states were on a different scale than for the rest of the United States. The data for several Midwestern states such as Illinois were reported by county, but also by city, which created complications. 
I felt that it would be most appropriate to select states that together had a mean level of urbanization similar to that of the entire US. I also selected states that together had a ratio of Gore to Bush votes similar to the nationwide election results. Several of the states selected, such as Idaho, Utah and Wyoming, heavily favored George Bush, while others like California, Delaware and Maryland strongly favored Al Gore. New Mexico and Oregon were split almost exactly between the two candidates. The final sample included 639 counties, with the number per state ranging from 3 to 109. The mean was 42.6, the median was 29 and the standard deviation was 30.98. 
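The summary figures for counties per state can be recomputed from the county counts listed in Table 1. The sketch below uses Python's standard `statistics` module rather than the Excel spreadsheet used for the paper. Treating the reported 30.98 as the sample (n-1) standard deviation is an assumption, though it is the version that matches.

```python
import statistics

# Counties per state in the sample, as listed in Table 1.
counties_per_state = {
    "Alabama": 67, "Arizona": 14, "California": 52, "Colorado": 35,
    "Delaware": 3, "Georgia": 109, "Idaho": 24, "Indiana": 87,
    "Louisiana": 63, "Maryland": 24, "Mississippi": 74, "New Mexico": 23,
    "Oregon": 29, "Utah": 19, "Wyoming": 16,
}

counts = list(counties_per_state.values())

total = sum(counts)
mean = statistics.mean(counts)
median = statistics.median(counts)
stdev = statistics.stdev(counts)  # sample standard deviation (divides by n - 1)

print(f"total={total} mean={mean:.1f} median={median} stdev={stdev:.2f}")
```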
Microsoft Excel was used to create a spreadsheet with columns for votes for the two candidates and calculated columns for the percent of these total votes for each of the two major candidates respectively. The urban population of each county was calculated. 
The null hypothesis was that urban voters did not differ in candidate selection from rural voters. The alternate hypothesis was that urban voters did differ in candidate selection from rural voters. Pearson’s correlation test was used to test these hypotheses.

Results:
For the 639 counties, the r-value was .22, which was significant at the 1% level (a 1% chance of making a type I error). The null hypothesis was therefore rejected with very little chance of being incorrect. To compare states to each other, r-values were also calculated for each state's smaller data set. The states that had statistically significant correlations were California, Colorado, Georgia, Indiana, Louisiana, Maryland, Oregon, and Wyoming. They were all positive correlations, ranging from .23 to .67. 
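One common way to judge whether a Pearson correlation is significant is to convert r into a t statistic with n - 2 degrees of freedom. The paper does not say exactly which procedure was used, so the sketch below is an assumption about the method and not a record of it. The function name `t_statistic` is hypothetical, and the critical values in the comments are standard two-tailed table values.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing that a Pearson correlation is zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Whole sample: r = .22 across 639 counties (df = 637).
t_all = t_statistic(0.22, 639)

# Delaware: r = .97 but only 3 counties (df = 1).
t_delaware = t_statistic(0.97, 3)

# Standard two-tailed critical values of t:
#   df = 1, 5% level: 12.706
#   large df, 1% level: roughly 2.58
print(f"all counties: t = {t_all:.2f}")
print(f"Delaware:     t = {t_delaware:.2f}")
```

Under these assumptions the whole-sample statistic is well above the 1% critical value, while Delaware's falls far short of the 5% critical value for a single degree of freedom. This is why a very high r from only three counties is still marked as not significant in Table 1.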

Table 1

State         Counties   % Gore   % Urban   Pearson r   Statistically Significant
Alabama             67    42.47     55.44       -0.14   No
Arizona             14    47.07     88.23       -0.21   No
California          52    57.60     94.76        0.59   Yes
Colorado            35    45.65     87.07        0.37   Yes
Delaware             3    56.85     80.02        0.97   No
Georgia            109    43.80     74.33        0.35   Yes
Idaho               24    29.86     70.74        0.33   No
Indiana             87    42.04     71.24        0.23   Yes
Louisiana           63    46.05     72.76        0.26   Yes
Maryland            24    58.65     86.07        0.67   Yes
Mississippi         74    41.73     49.38        0.20   No
New Mexico          23    49.88     76.31       -0.16   No
Oregon              29    50.53     79.23        0.50   Yes
Utah                19    28.43     89.54        0.09   No
Wyoming             16    33.63     67.66        0.64   Yes
TOTAL              639    50.16     83.23        0.35   Yes

NOTE: The Pearson’s r for “TOTAL” is from a correlation of the states, not the counties.


Table 2

                          Sample     Total US
Bush Votes              13886458     50456169
Gore Votes              13973898     50996116
Counties                     639         3119
% Population Urban          83.2         80.6
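As a quick check that the sample resembles the nation, the two-party vote shares can be recomputed from the vote totals in Table 2. The sketch below is illustrative only. The helper name `two_party_share` is hypothetical, and the share is taken over Bush plus Gore votes, with third-party votes already removed.

```python
# Vote totals from Table 2 (third-party votes already removed).
sample = {"bush": 13_886_458, "gore": 13_973_898}
nation = {"bush": 50_456_169, "gore": 50_996_116}


def two_party_share(votes, candidate):
    """Percent of the combined Bush + Gore vote won by the given candidate."""
    return 100 * votes[candidate] / (votes["bush"] + votes["gore"])


for label, votes in (("sample", sample), ("total US", nation)):
    bush = two_party_share(votes, "bush")
    gore = two_party_share(votes, "gore")
    print(f"{label}: Bush {bush:.2f}%, Gore {gore:.2f}%")
```

The sample and national shares differ by only about a tenth of a percentage point, which supports treating the 15-state sample as close to the national vote split.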

Figure 1

Discussion: 
Voters in more urban counties were statistically significantly more likely to favor Al Gore over George Bush. The correlation was statistically significant in 8 of the 15 states. While the difference was not large, the study data provide sufficient power to say with confidence that these findings are meaningful.
This study would have been improved by including countywide election results from all 50 states. Alaska does not report its election results by county, so this may be difficult. I decided that 4000 voters was a good cutoff point for inclusion because very small counties could have skewed the results. There may be a more appropriate number to use as a cutoff point to get better results. 
The states used for this study are mainly in the West, South and Southwest. This regionally skewed data set should not be a big factor, however, because the ratio of voters who picked Gore and Bush in this study was very consistent with the overall national ratio. Likewise, a sample whose level of urbanization differed greatly from that of the US could have been a problem, but the sample's 83.2% was close to the national 80.6%. 
The data were entered manually, so there is a chance of human error. Automating the data transfer would have reduced this source of error.
There were a few instances of apparent errors in CNN’s posted data. The county voting totals in some instances did not add up to equal the totals for the state. Results for California and Georgia showed statewide totals for the winning candidate that were lower than the totals of the counties added together would indicate. This error puts the reliability of data from those states into question.
Third party candidates were not included in this study to simplify the data analysis. Ralph Nader has been blamed for giving the election to George Bush because of the high likelihood that many of those who voted for him would have voted for Al Gore instead if the election had only two candidates. These voters would have been enough to give the presidency to Al Gore. Excluding the data from this candidate may have skewed the results slightly. It is unclear if those who voted for Ralph Nader were concentrated in urban or in rural counties so the effect on the correlation between voters for Gore and an urban area is uncertain. 
The biggest possible flaw in the data used is the urbanization data. Some counties had very few residents yet had a significant number of “urban” residents. Most people would not count a county with fewer than 30,000 people as urban at all. The common perception of “Urban” is likely to be a little more exclusive than this. In some ways having a broader definition is good because it allows for greater variation between the data points instead of having a lot of counties that have no urban residents. In one case there was a county with around 13,000 residents, of whom 15 lived in an “urban cluster”. 

Conclusion:
There was a statistically significant correlation between the proportion of urban voters in a county and the proportion of voters who voted for Al Gore in the 2000 Presidential election. It is not a strong correlation but it is clear and definitive. The correlation was clear and statistically significant as well in more than half of the states using a smaller data set.  The strength of these data suggests that the findings would not be changed by an analysis less limited by the possible errors identified.

 http://www.cnn.com/ELECTION/2000/results/president/ Data accessed: 04/25/08-04/30/08
 Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont reported by township and city
 80.6% is the percent of urbanization in the United States. My sample has an urbanization percent of 83.2
 The overall percent of voters in the United States who selected George Bush in the adjusted (third-party candidates removed) voting percent was 49.73%. The overall percent of voters who selected Al Gore was 50.27%. In my sample the percent of voters who selected George Bush was 49.84%. The percent who selected Al Gore was 50.16%.  
 The states selected were Alabama, Arizona, California, Colorado, Delaware, Georgia, Idaho, Indiana, Louisiana, Maryland, Mississippi, New Mexico, Oregon, Utah, and Wyoming


Tuesday, April 16, 2013

You are what you read

One of the school projects I am most proud of is a statistical study of presidential voting patterns versus urban population in the 2000 US Presidential Election. I made a fantastically well-designed study with great methodologies, and I ended up finding statistically significant results (if someone actually cares I can explain to them why this study was so well-considered, but it's a bit technical so I'll hold back for now). My research question was whether urban voters (or counties heavy in "urban" populations), as designated by the US Census Bureau, voted for the Democratic candidate, Al Gore, or the Republican candidate, George W. Bush. The only significant problem with my study was Ralph Nader, but overall he did not receive too many votes (with the exception of one county which I then had to exclude). Surprisingly, at the time of this project (I believe it was Fall 2007), there were ZERO serious papers that addressed the topic of urban versus rural voting patterns. It is always assumed that urban voters are more liberal because of their increased likelihood of being in a heterogeneous environment, but at the time this had not been addressed. Since then journalist Bill Bishop wrote an excellent book, "The Big Sort", about how populations are clustering based on ideological values. 

The results of my study were that urban voters were statistically significantly more likely to vote for the Democratic candidate rather than the Republican candidate. Bishop's book took this idea a step further in saying that it is more than just urban/rural people who are clustering, but that within these areas a similar smaller-scale clustering is also happening.

Recently I've become more and more active on Twitter. Though typically outspoken, and likely to tell someone when I disagree with them, I try to keep my arguments civil and merit-based, rather than personal. Yesterday someone who I almost always disagree with wrote something thoughtful and respectful (for a change) and so I wished to jokingly tell this individual that we finally agree on something, hell must have frozen over blah blah blah. When I went to this person's Twitter page it would not allow me to reply to the tweet or even send a tweet at all. Apparently this means the person has blocked me. I know people often block other people that they find annoying or who have sent nasty messages (and spammers of course), and while I am not the most innocent of Twitter users, I found it shocking that someone had blocked me for no apparent reason. 

It really made me wonder why someone who openly speaks of themselves as progressive would actively work to silence someone with a dissenting opinion, no matter what that opinion is. If we are excluding people because we disagree with their politics, we are only causing further problems. I myself follow people and organizations I do not like or agree with, but because they command some respect from a variety of other people or may be important somehow, I like to know what they are saying and thinking. Of course I am not perfect, and I really dislike MSNBC and Fox and will rarely watch either of them, but this is also because there are better quality sources with similar enough opinions that I am able to access. If you cut off a source and prevent yourself from being exposed to others, how can you be considered progressive? How can you be considered "worldly"? How can you understand the other side if you do not even know it? 

People like this individual are a big problem for our society. They consider themselves "holier than thou", think their opinion is correct no matter what, and are unwilling to seek out those who may critique (and therefore IMPROVE) their own arguments. Once again, I am not perfect in this matter. I can be quite immodest with my own beliefs and sense of self-worth, but the last thing I would ever want to be accused of is being uninterested in meeting and talking to people who are different. We learn and we grow from our experiences, and if we have the same experience every day, what will we ever learn? Reading, television and the internet are the most prevalent ways in which many of us are exposed to others, so why not read or watch something new? You might learn a thing or two.