analyze the world values survey (wvs) with r

a global barometer of public opinion, the world values survey (wvs) clocks in as your best source of cross-cultural moods and attitudes.  you might find its most famous product sweepingly general, but who among us has never ever swept a smidgen of nuance under the rug?  if you want to explore complex international patterns of belief, now's your chance.

though their scientific advisory committee (sac) sets the ground rules and dictates the core content, individual national samples should be viewed as something of a confederacy of surveys.  carefully read the technical reports for any nations you dare to compare.  the homepage struck me as more personality-driven than that of other public use data sets.  but, really, who am i to judge?  if you care about religious fervency, gender equality, democracy, or even being grossly nationally happy, then the world values survey is the best source there ever will be.  this github repository contains two scripts:

download all microdata.R
  • impersonate a thirteen-year-old ukrainian boy to convince the archive that a human's doing the downloading
  • for-loop through every wave, every study, every nation
  • save each file to your local hard disk according to an easy-to-peruse structure

analysis examples.R
  • load a country-specific data set
  • construct a fake survey design object.  statistics and coefficients will be calculated correctly, but standard errors and confidence intervals generated off of this complex sample design should be ignored.  read the user note within the script for more four one one
  • examine the bejesus out of that survey design object, calculating every descriptive statistic possible
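
here's a minimal sketch of what a fake survey design object might look like, assuming the survey package is installed.  the data frame and its column names (happy, sex, wt) are made up for illustration and are not actual wvs variable names.  the design declares weights only, with no clusters or strata, so the point estimates come out weighted but the standard errors should not be trusted.

```r
library(survey)

# hypothetical stand-in for one nation's respondents.
# none of these column names come from the wvs codebook.
wvs_df <-
    data.frame(
        happy = c( 1 , 0 , 1 , 1 , 0 , 1 ) ,
        sex = factor( c( "male" , "female" , "female" , "male" , "female" , "male" ) ) ,
        wt = c( 1.2 , 0.8 , 1.0 , 1.5 , 0.7 , 0.8 )
    )

# the "fake" design: weights only, no clustering, no stratification
wvs_design <- svydesign( ids = ~ 1 , data = wvs_df , weights = ~ wt )

# weighted share of happy respondents, overall and by sex
happy_mean <- svymean( ~ happy , wvs_design )
happy_by_sex <- svyby( ~ happy , ~ sex , wvs_design , svymean )

# keep the point estimates, ignore the standard errors
coef( happy_mean )
coef( happy_by_sex )
```

calling coef() rather than printing the whole result is a deliberate choice here: it pulls out only the numbers that are safe to use.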

click here to view these two scripts

for more detail about the world values survey (wvs), visit:
  • geocities and myspace had a baby, and named it the world values survey homepage.  i half expected a midi track to start up
  • wikipedia for much of the same content, but structured in a format you know and love


the administrators have neglected to produce microdata files that permit users to calculate confidence intervals using either of the most common survey analysis methods.  in other words, these data will give you a best guess, but you'll be in the dark about whether that guess is any good.  since there are no correct confidence intervals to match, i have not provided my usual replication script.  if you look in the "results" pdf file (not the "sample design" or "methodology" pdf files) for any nation, you'll find an "estimated error" somewhere around the second page.  this is a crude, dataset-wide measure of variance, but it's your only option to use as the standard error for any statistical testing.  this is a one-size-fits-all substitute for other more precise sampling error calculations like taylor-series linearization or replicate weighting.  you could (politely!) request that they include clustering and strata variables on both future and historical files.  because awesome data can always get more awesome.
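
a rough sketch of how that "estimated error" could stand in as a standard error, using only base r.  the numbers below are invented, not copied from any results pdf, and the helper names rough_ci and rough_z_test are my own.  this assumes the estimated error is expressed in the same units as the estimate (both proportions, or both percentages) and that the two national samples are independent.

```r
# normal-approximation confidence interval around a point estimate
rough_ci <-
    function( est , se , level = 0.95 ){
        z <- qnorm( 1 - ( 1 - level ) / 2 )
        c( lower = est - z * se , upper = est + z * se )
    }

# two-sided z-test for a difference between two independent estimates
rough_z_test <-
    function( est1 , se1 , est2 , se2 ){
        z <- ( est1 - est2 ) / sqrt( se1^2 + se2^2 )
        c( z = z , p_value = 2 * pnorm( -abs( z ) ) )
    }

# hypothetical inputs: a weighted share for each of two nations,
# plus the dataset-wide estimated error from each "results" pdf
nation_a <- c( est = 0.62 , se = 0.02 )
nation_b <- c( est = 0.55 , se = 0.03 )

ci_a <- rough_ci( nation_a[[ "est" ]] , nation_a[[ "se" ]] )

a_vs_b <-
    rough_z_test(
        nation_a[[ "est" ]] , nation_a[[ "se" ]] ,
        nation_b[[ "est" ]] , nation_b[[ "se" ]]
    )

ci_a
a_vs_b
```

treat anything that comes out of this as a ballpark, since a single dataset-wide error cannot reflect how precision varies from one question or subgroup to the next.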

confidential to sas, spss, stata, and sudaan users: would you buy an imitation rolex if the real thing were free?  well look at your wrist because it's time to transition to r.  :D