analyze the public libraries survey (pls) with r

each and every year, the institute of museum and library services coaxes librarians around the country to put down their handheld "shhhh..." signs and fill out a detailed online questionnaire about their central library, branch, even bookmobile.  the public libraries survey (pls) is actually a census: nearly every public library in the nation responds annually.  that microdata is waiting for you to check it out, no membership required.  the american library association estimates well over one hundred thousand libraries in the country, but fewer than twenty thousand outlets are within the sample universe of this survey since most libraries in the nation are enveloped by some sort of school system.  a census of only the libraries that are open to the general public, the pls typically hits response rates of 98% from the 50 states and dc.  check that out.

laughably easy files to work with, these microdata do not require the r survey package or any of the batman-like statistical tools seen in the other public use file folders.  as confirmed by one of the administrators of this survey, your analysis can simply tabulate, sum, average, whatever else using the base commands in r rather than complex sample survey design commands.  since these data sets are the universe rather than a sample, i've forgone a set of analysis examples.  if you want to do something, search stackoverflow with an [r] tag.  no survey design assembly required.  this new github repository contains two scripts:
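to give a flavor of what "base commands" means here, a minimal sketch on a toy data.frame.  the table and its column names (`state`, `type`, `sq_feet`) are made up for illustration and are not the real pls variable names.

```r
# toy stand-in for an outlet-level table; every column name here is hypothetical
outlets <- data.frame(
  state = c("NY", "NY", "CA", "CA", "TX"),
  type = c("central", "branch", "central", "bookmobile", "branch"),
  sq_feet = c(50000, 8000, 42000, NA, 12000),
  stringsAsFactors = FALSE
)

# tabulate: count outlets per state
outlet_counts <- table(outlets$state)

# sum and average, skipping missing values
total_sq_feet <- sum(outlets$sq_feet, na.rm = TRUE)
mean_sq_feet <- mean(outlets$sq_feet, na.rm = TRUE)

# average broken out by state
mean_by_state <- tapply(outlets$sq_feet, outlets$state, mean, na.rm = TRUE)

print(outlet_counts)
print(mean_by_state)
```

no weights, no design object, no variance estimation: with a census, the numbers you compute are the numbers.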

download all microdata.R
  • download each zipped year of data onto your local computer
  • load a trifecta of tables into RAM
  • save all three data.frame objects as an R data file (.rda)
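the three steps above might look something like the sketch below.  this is not the script from the repository, just a rough outline under assumptions: that each zipped year unpacks into csv files, and that you supply the zip location yourself (the function names and the `zip_url` argument are hypothetical).

```r
# download one zipped year and unpack it into a folder.
# `zip_url` is a placeholder argument: the real file locations are not listed here
download_pls_year <- function(zip_url, exdir = tempfile("pls_")) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  zip_file <- tempfile(fileext = ".zip")
  download.file(zip_url, zip_file, mode = "wb")
  unzip(zip_file, exdir = exdir)
  exdir
}

# read every csv in a folder into a named list of data.frame objects
# (assumes the tables ship as csv, which may not hold for every year)
read_pls_tables <- function(dir) {
  csv_files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE, ignore.case = TRUE)
  tables <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
  names(tables) <- tolower(tools::file_path_sans_ext(basename(csv_files)))
  tables
}

# save all of the tables together in a single .rda file,
# each one stored under the (lowercased) name of the file it came from
save_pls_tables <- function(tables, rda_file) {
  save(list = names(tables), file = rda_file, envir = list2env(tables))
  invisible(rda_file)
}

# example usage, left commented out because the url below is not a real location:
# pls_dir <- download_pls_year("https://example.org/placeholder.zip")
# pls_tables <- read_pls_tables(pls_dir)
# save_pls_tables(pls_tables, "pls.rda")
```

once saved, a later `load("pls.rda")` brings the data.frame objects straight back into RAM without touching the zipped files again.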

replicate imls publications.R

click here to view these two scripts

for more detail about the public libraries survey (pls), visit:


plainly described at the bottom of pdf page 6 of the technical documentation, each year of microdata gets released as three tables: a table of library systems (where new york city public libraries would have one entry), a table of library buildings (where new york city public libraries have one entry per branch), and a table of states (where all libraries in new york state get collapsed into one).  imls takes care not to disclose stuff like salary information of individual employees, and the more-aggregated tables require less confidentiality-related-data-squelching.  if you need microdata sans suppression, apply for the restricted use files.
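to picture how the three levels nest inside one another, here is a small sketch on invented data.  the system names, the `visits` column, and all of the numbers are made up; the real tables are released separately by imls rather than derived like this, and suppression means the real ones may not add up so neatly.

```r
# toy building-level table: one row per outlet (all values hypothetical)
outlets <- data.frame(
  state = c("NY", "NY", "NY", "CA", "CA"),
  system = c("system a", "system a", "system b", "system c", "system c"),
  outlet = c("branch 1", "branch 2", "main", "central", "branch 3"),
  visits = c(100, 150, 80, 200, 120),
  stringsAsFactors = FALSE
)

# collapse buildings into one row per library system
systems <- aggregate(visits ~ state + system, data = outlets, FUN = sum)

# collapse systems into one row per state
states <- aggregate(visits ~ state, data = systems, FUN = sum)

print(systems)
print(states)
```

five building rows become three system rows become two state rows, which is the same shape of relationship as the outlet, system, and state tables described above.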

confidential to sas, spss, stata, sudaan users: you are using the blockbuster video of statistical languages.  time to transition to r.  :D