researchers interested in studying healthcare patterns among our elderly, disabled, or poor can go to the centers for medicare and medicaid services for all sorts of up-to-date utilization data. but what if you want to study the behavior and spending of everyone else? you could look at the medical expenditure panel survey (meps), the consumer expenditure survey (ce) or the national health interview survey (nhis), but there's an attrition problem with those - anyone who suddenly falls expensively-ill also starts slamming the door on follow-up survey interviews. and that's understandable - who wants to respond to a government questionnaire when you're struggling with a serious health condition? american healthcare surveys are biased at the tail - they don't capture our sickest very well.
think about it some more: we have single-payer healthcare for our elderly (medicare), disabled (medicare again), and poor (medicaid), meaning there's a government agency that's got all that data in one place. every claim paid by the government is just hanging out in baltimore, waiting for you to come a knockin'. and there's no non-response bias with government healthcare claims data - the united states government knows exactly how much the united states government paid on your behalf, whether or not you agreed to respond to some such survey. doctors submit bills pretty consistently, after all. so the utilization patterns of medicare and medicaid beneficiaries are stored in a central location, standardized, and available for purchase or (with limitations) for immediate download.
but in a heavily-privatized medical industry like ours, what do you do when you want to explore the purchasing patterns of everyone else? well, you still probably look at meps or ce. but if your research question is hyper-focused on the dist-ri-bu-tion of medical claims among the privately-insured, well hey, the distribution of medical claims in the medical large claims experience study (mlces) is much more realistic than what you'll find in survey data. yes, it's old. yes, it's only composed of claims from seven insurers and not every private insurer covering every covered life in the united states. and yes, it might even have a y2k bug or two. but for publicly-available medical claims for the privately insured in the united states of america, well, take it or leave it.
this new github repository contains two scripts:
1997-1999 mlces - download.R
- download each zipped year of data onto your local computer
- load the entire table into RAM
- save the condensed file as an R data file (.rda)
replicate soa publications.R
- produce the control counts and totals at the bottom of this document
- replicate the 1997 statistics shown in table iv-a of this excel package
click here to view these two scripts
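the three steps listed under the download script (fetch a zipped year, load it into RAM, save an .rda) might look roughly like the sketch below. this is not the code in the repository: the function names `fetch_zip` and `zip_to_rda`, the assumption that each zip holds a single comma-delimited file, and the lowercasing of column names are all hypothetical choices made for illustration. the url is left as a parameter because the real download locations are not given here.

```r
# hypothetical helper: download one zipped year of data to a local file.
# `url` is whatever address the zipped year actually lives at (not shown here).
fetch_zip <- function(url, destfile = tempfile(fileext = ".zip")) {
  # mode = "wb" keeps the zip from being corrupted on windows
  download.file(url, destfile, mode = "wb", quiet = TRUE)
  destfile
}

# hypothetical helper: unzip, load the whole table into RAM, save as .rda.
# assumes (for illustration only) that the zip contains one csv file.
zip_to_rda <- function(zipfile, rda_path) {
  # unzip into a fresh temporary folder
  exdir <- tempfile()
  dir.create(exdir)
  extracted <- unzip(zipfile, exdir = exdir)

  # read the entire table into memory
  claims <- read.csv(extracted[1], stringsAsFactors = FALSE)

  # lowercase the column names, purely as a convenience
  names(claims) <- tolower(names(claims))

  # store the data.frame as a compressed R data file
  save(claims, file = rda_path)

  invisible(claims)
}

# usage, with a placeholder address rather than a real one:
# zip_to_rda(fetch_zip("https://example.com/one-year.zip"), "one year.rda")
```

once the .rda exists, `load()` brings the table back in a fraction of the time it takes to re-parse the raw text file, which is the main reason to bother with the third step.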
for more detail about the medical large claims experience study (mlces), visit:
this data set is not generalizable to any recent population of americans. its chief value is its relationship to itself - the distribution of medical spending, especially at the extreme values. in caveman speak: percentages good, totals bad.
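to make "percentages good, totals bad" concrete, here is a minimal sketch of the kind of statistic that travels well: the share of all paid dollars that belongs to the most expensive slice of claimants. the function name `top_share` and the toy numbers are made up for illustration and are not mlces values.

```r
# share of total spending attributable to the top `p` fraction of claimants
top_share <- function(paid, p) {
  stopifnot(length(paid) > 0, p > 0, p <= 1)

  # number of claimants in the top slice (at least one)
  n_top <- ceiling(length(paid) * p)

  # dollars paid for the most expensive claimants, divided by all dollars
  sum(sort(paid, decreasing = TRUE)[seq_len(n_top)]) / sum(paid)
}

# toy data, not real claims: five claimants, one of them very expensive
toy_paid <- c(100, 200, 300, 400, 9000)

# the single costliest claimant (top 20%) accounts for 9000 / 10000 of spending
top_share(toy_paid, 0.2)       # 0.9

# multiplying every dollar amount by ten changes the total but not the share
top_share(toy_paid * 10, 0.2)  # still 0.9
```

the second call is the point: a ratio like this does not depend on the overall price level, so it holds up across years far better than a raw dollar total does.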
confidential to sas, spss, stata, sudaan users: the best things in life are free. time to transition to r. :D