analyze the program for international student assessment (pisa) with r and monetdb

the authoritative source for evaluating educational achievement across nations, the program(me) for international student assessment ranks the math, science, and reading skills of 15-year-olds in more than sixty countries.  coordinated by the organisation for economic co-operation and development (oecd) and released every three years, this data set gives finland reason to gloat and anti-poverty advocates in the united states reason to fight.  participating countries must sample at least 5,000 teenagers, though some governments survey many more in order to provide education researchers with enough of a sample to perform within-country comparisons.  in the world of cross-border standardized testing, this is the big momma.

to understand what's possible with pisa, either visit the international products page or - if you only care about one country - start on the participating economies page and click through to the country-specific website (so here's america's).

instead of processing the pisa microdata line-by-line, the r language stoically attempts to read everything into memory at once.  to avoid the unpleasantness of a seized-up computer, dr. lumley wrote the entire sqlsurvey package (to deal with this monster), and i tweaked, pruned, manicured that code to work on multiply-imputed big survey data.  if you're already familiar with syntax used for the survey package, be patient and read my sqlsurvey examples carefully when something doesn't behave as you expect it to - some sqlsurvey commands require a different structure (i.e. svyby gets called through svymean) and others might not exist anytime soon (like svyolr).  gimme some good news: sqlsurvey uses ultra-fast monetdb (click here for speed tests), so follow the monetdb installation instructions before running my code.  monetdb imports, writes, recodes data slowly, but reads it hyper-fast.  a magnificent trade-off: data exploration typically requires you to think, send an analysis command, think some more, send another query, repeat.  importation scripts (especially the ones i've already written for you) can be left running overnight sans hand-holding.

pisa is a pita to analyze, because it's both multiply-imputed (like the survey of consumer finances) and big data (like the american community survey).  to help researchers deal with that complexity, the twentieth-century-dwelling statisticians at oecd wrote sas macros and spss functions as part of their analysis manual.  well guess what?  those languages are prohibitively expensive, so i've done gone and translated everything over to the r language, precisely reproducing their published results, then automating the download and importation into everybody's favorite monetdb.  say buh-bye to buying proprietary statistical software.  this new github repository contains five scripts:


download import and design.R
  • create the batch (.bat) file needed to initiate the monet database in the future
  • download, unzip, and import each file for every year and size specified by the user
  • split all `plausible value` variables into five, yeah, five tables to account for the uncertainty of imputed responses
  • create a well-documented block of code to re-initiate the monetdb server in the future

analysis examples.R
  • run the well-documented block of code to re-initiate the monetdb server
  • load the r data file (.rda) containing all five, yup, five replicate weight designs for the 2012 file
  • detour and coerce a numeric variable to categorical, then match some compendium statistics in the ict file.
  • perform the standard repertoire of analysis examples, using a jolly mix of sqlsurvey and custom functions

variable recode example.R
  • run the well-documented block of code to re-initiate the monetdb server
  • copy the five, yes, five tables to maintain the pristine originals
  • add a new categorical variable by hand
  • re-create, then save the multiply-imputed sqlsurvey object from these sparkling new tables
  • close everything, then load everything back up in a fresh instance of r
  • run a simple statistic on the brand new variable

extract specific countries.R
  • run the well-documented block of code to re-initiate the monetdb server 
  • subset the 2009 student-interviews file to only the nation of brazil, read those records into working memory
  • save and then re-load a multiply-imputed brazil-only survey design object that no longer requires monetdb
  • match a brazilian statistic and standard error in the oecd's official technical documentation

replicate oecd publications.R
  • run the well-documented block of code to re-initiate the monetdb server
  • load the r data file (.rda) containing the five, yay, five designs for the 2009 file
  • match every type of statistic in the oecd's official technical documentation



click here to view these five scripts



for more detail about the program for international student assessment (pisa), visit:

if you're just looking for a couple data points, you ought to give the australian council for educational research's interactive data selection tools a spin.  it's a menu-drive table creator, so easy-to-use but inflexible.

you wouldn't be analyzing the program for international student assessment right now without the work of not one but two dr. thomas lumleys.  (or, in latin, lumlii)  if you decide to hand-write a thank-you letter for all of their hard work using jefferson's polygraph, you won't even need to switch out the paper to fill in specific names.  just another example of the unparalleled efficiencies you'll find when working in the r language with monetdb.

confidential to sas, spss, stata, and sudaan users: you are kissing the wrong frogs.  time to transition to r.  :D

2 comments:

  1. Hi Anthony,

    I've tried to run to codes, and when I run lines such as those of MICombine() functions, sometimes there is an error like the following:

    In R:
    Error in .mapiRequest(conn, paste0("s", statement, ";"), async = async) :
    error writing to socket (10054)
    Error in .mapiRequest(conn, paste0("s", statement, ";"), async = async) :
    error writing to socket (10054)
    Error in .mapiRequest(conn, paste0("s", statement, ";"), async = async) :
    error writing to socket (10054)
    >
    ========================================
    In the console:

    # MonetDB 5 server v11.17.9 "Jan2014"
    # Serving database 'pisa', using 4 threads
    # Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
    # Found 7.913 GiB available main-memory.
    # Copyright (c) 1993-July 2008 CWI.
    # Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
    # Visit http://www.monetdb.org/ for further information
    # Listening for connection requests on mapi:monetdb://127.0.0.1:50007/
    # MonetDB/JAQL module loaded
    # MonetDB/SQL module loaded
    >!FATAL: 40000!COMMIT: transation commit failed (perhaps your disk is full?) exi
    ting (kernel error: !ERROR: BATsubselect: invalid argument: b must have a dense
    head.
    )
    Press any key to continue . . .
    =======================================

    ReplyDelete
    Replies
    1. you are unfortunately hitting a monetdb bug that was *introduced* in the "Jan2014" version (http://bugs.monetdb.org/show_bug.cgi?id=3443)

      this bug will be fixed in the next version of monetdb (Jan2014-SP1) but it *also* does not occur in the prior version, so until the monetdb folks release Jan2014-SP1, you can avoid it by installing the version Feb2013-SP6 (the windows folder is located here- http://www.monetdb.org/downloads/Windows/Feb2013-SP6/)

      my apologies for the hassle :/

      Delete