analyze the american national election studies (anes) with r

on election days in the united states, the news media pepper their coverage with quick, dirty exit polls that allow them to make coarse statements like, "x% of demographic group y voted for candidate z."  the american national election studies are the scientific community's response to those haphazard polls, for those of us who care more about having the number right than having the number right away.  for every presidential election since dewey defeated truman and every off-year congressional election since eisenhower's first term, the anes has released a data set so that professional researchers, political junkies, and partisan hacks can seriously figure out who voted for whom.  and if any of you out there are personally running for office, consider this your best source of information to view the demographics and behavior of split-ticket voters.

although it might lag behind the published microdata, berkeley's sda (survey documentation and analysis) online query tool has a few of the anes data files hot and ready for crosstabulation and simple regression.  before diving into either sda or the r code, perhaps review the available topics - with weighted proportions over time - posted on the main electionstudies.org website.  you won't be able to access any demographic breakouts there, but it's the quickest way to view the ross perot anomaly.

choose which microdata file to work with after carefully reading your four study choices.  you could review the frequently asked questions as well, but only if you promise me you won't read anything into spss.  most american national election studies generalize to all eligible voters in the united states, but confirm the sample universe in the `weights summary` section of your selection.  and have fun.  have fun.  this new github repository contains four scripts:

download and import.R

analysis examples.R

replicate table one.R

replicate table two.R



click here to view these four scripts



for more detail about the american national election studies (anes), visit:


notes:

as you'd expect with any survey dating back to 1948, some of the weighting and confidence interval calculations have changed over time.  with five notable exceptions (see table one), the main anes data sets did not start including a sampling weight until 1992 - when it became the norm.  to further complicate your life, the more recent data sets include both a pre- and post-election weight.  if no weight variable exists, just add a column of all ones and make that your weighting variable - matching what they've done in the multi-year cumulative file.
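the all-ones trick takes one line.  here's a minimal sketch on a made-up data frame - `x`, `respondent`, `voted`, and `weight` are names invented for this illustration, not actual anes columns:

```r
# toy data frame standing in for an older anes file without a weight column
# (every value here is invented for illustration)
x <- data.frame( respondent = 1:5 , voted = c( 1 , 0 , 1 , 1 , 0 ) )

# add a column of all ones so every respondent counts equally
x$weight <- 1

# the weighted mean now matches the plain unweighted mean
weighted.mean( x$voted , x$weight )
mean( x$voted )
```

with a constant weight, any weighted estimate collapses to its unweighted counterpart, so the same analysis code can run on both the weighted and the unweighted years.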

if you only care about specific points-in-time (one of the cross-sectional time series studies), then simply find four variables to construct a taylor-series design: the strata variable, the primary sampling unit (also called the psu or cluster) variable, the pre-election weight, and the post-election weight.  as stated at the bottom of this page, if your analysis only involves questions asked during the pre-election portion, use the pre-election weight (the unweighted sample will be larger) - but if you're looking at any variables collected during the post-election interview, use the post-election weight instead.  next, look for the cluster and strata variables.  sometimes they're mushed together into a single variable and will need to be extracted with a simple recode like `stratum = substr( v040103 , 1 , 2 )` and `secu = substr( v040103 , 3 , 3 )`.  for some of the older studies, these variables are not available - and your standard errors may be misleadingly small.
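the pieces above might fit together roughly like this.  it's only a sketch: the data values, the `post_weight` and `voted` columns, and the `anes_design` object are all made up for illustration, and the design step assumes you have the `survey` package installed (it gets skipped otherwise).

```r
# toy data: every value below is invented, not real anes microdata
x <- data.frame(
	v040103 = c( "011" , "012" , "011" , "012" , "021" , "022" , "021" , "022" ) ,
	post_weight = c( 1.2 , 0.8 , 1.0 , 1.1 , 0.9 , 1.3 , 0.7 , 1.0 ) ,
	voted = c( 1 , 0 , 1 , 1 , 0 , 1 , 1 , 0 ) ,
	stringsAsFactors = FALSE
)

# split the combined variable: first two characters are the stratum,
# the third character is the cluster within that stratum
x$stratum <- substr( x$v040103 , 1 , 2 )
x$secu <- substr( x$v040103 , 3 , 3 )

# build a taylor-series design using the post-election weight,
# since the hypothetical `voted` column would come from the post-election interview
if ( requireNamespace( "survey" , quietly = TRUE ) ) {

	anes_design <-
		survey::svydesign(
			ids = ~ secu ,
			strata = ~ stratum ,
			weights = ~ post_weight ,
			data = x ,
			nest = TRUE		# cluster codes repeat across strata
		)

	# weighted share with a design-based standard error
	print( survey::svymean( ~ voted , anes_design ) )
}
```

`nest = TRUE` matters here because the cluster codes restart within each stratum.  one thing to watch: if the combined column was imported as numeric, any leading zeros are gone and the `substr` positions will shift, so check the column type before recoding.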

if you're analyzing the cumulative file, they've prepared a few multi-year columns of all weights.  e-mail anes@electionstudies.org and ask for cluster and strata variable advice.  there's also a weighting anomaly back in the 1970 file that's outlined in the main how-to guide, but in order to understand the three weight options, you actually gotta read the middle paragraph on the 1970 study design page.


confidential to sas, spss, stata, and sudaan users: saber-toothed tigers probably laughed when they saw the first humans crossing the bering strait.  don't be a saber-toothed tiger.  time to transition to r.  :D

2 comments:

  1. Hi Anthony,

    I left a previous comment which, due to an enormous brain fart, is complete gibberish. Nonetheless, I am having difficulty importing the "anes_mergedfile_1992to1997.dta" file with read.dta(). My R Session aborts. Also, when I try to import the "anes_mergedfile_1992to1997.por" file with the spss.portable.file() function, I get the message:
    "Error in as.data.set(spss.portable.file(fp)) :
    error in evaluating the argument 'x' in selecting a method for function 'as.data.set': Error in parseHeaderPorStream(ptr) :
    unknown tag "5" found in line 5215 offset 11"

    I do not know if this information is of value to you and/or whether other individuals are having similar issues with some of the files. (The first 16 in the constructed download list downloaded fine - but I started having issues after that...)

    R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    Replies
    1. the r core team recently fixed a bug in the foreign package's read.dta function..

      install.packages("foreign")

      ..should solve this problem :D
