[Data Journal] “I don’t want to pay for other peoples’ healthcare”: Why the U.S. Pays More for Worse Healthcare

By Aaron Siter-Cohen, Brandon Hill and Charlie Ambler

Read the story: “I Don’t Want to Pay for Other Peoples’ Healthcare”: Why the U.S. Spends More For Worse Healthcare

The idea for this project was to address the quality and cost of the United States healthcare system. In order to do this, we decided to compare the U.S. to other similar developed countries in the OECD classification. The most relevant values to track for these countries were a normalized rate for GDP per capita and healthcare spending per capita. Along this line, the U.S. was a clear outlier for its much higher healthcare spending in relation to the way that healthcare spending increased as wealth increased in all other countries.

To address its outlier status, we decided to ask if the higher spending resulted in a proportionately higher quality of healthcare. To do so, we had to determine a way to measure healthcare outcomes. We went with infant mortality and life expectancy by birth year, because these measures are often used already by organizations that track healthcare around the world, and because the combination of those measures provides information about the start of life and the end of life. When we discovered that, despite its much higher spending, the U.S. performed worse than other similarly wealthy countries in both of these measurements, we began to research why. 

The two components of our research focused on the high cost of life-saving treatments and medications in the U.S., such as insulin, EpiPen and emergency department visits. The second component was to view healthcare expenditure and outcomes through the lens of the growing wealth gap in the U.S. 


By Aaron Siter-Cohen, Brandon Hill and Charlie Ambler

Download the data from the Kaiser Family Foundation National Health Tracker on healthcare expenditure per capita as a ratio of GDP per capita. 

Data Scraping

This story started off with data from the Kaiser Family Foundation National Health Tracker. We used a link in the story to download the data they used for their own visualization. 

Data Cleaning

Because this data was already included in a story, and we were using it for an identical measure, it was already sufficiently clean. However, we also included the 2022 data on average life expectancy to add context that addresses the varying results of different countries’ different rates of healthcare expenditure. 

Data Visualization

Using Tableau, we recreated a similar visualization as the one from the story we pulled from. We placed GDP per capita on the X-axis and Healthcare Expenditure per capita on the Y-axis. Then, using a calculated field (GDP per capita/healthcare expenditure per capita), we created a ratio for the number of dollars made for each dollar spent on healthcare. By setting this ratio to a size indicator in the visualization, the viewer is able to see at a glance how far off the spending was from the rest of the countries by how much larger the circle is. Finally, we added the life expectancy measurements to the chart by a color gradient of red to green. Any country in the red, would quickly be indicated to be below the median life expectancy of the measured countries. 

Defining Healthcare Expenditure and Health Outcomes 

By Aaron Siter-Cohen

  1. Data Scraping

Data on health indicators are luckily widely collected by governments and international organizations. Unfortunately, the formatting also varies widely depending on the source. While domestic statistics were relatively straightforward, spreadsheets from the OECD posed a challenge. CDC data on US domestic health indicators here. OECD data on international life expectancy here.

  1. Data cleaning

The OECD website allows you to download a lot of different data, but formats it in a way Tableau has trouble with. I copied and pasted the relevant data to a new spreadsheet and uploaded it to open refine for cleaning.

After Openrefine I re-downloaded a clean version and reformatted it so that Tableau could read the data. That format is below. That involved making three separate columns for each variable.

  1. Data visualizations

The visualization below on the left required me to sort through filters to select for 2019. Before that, Tableau added up all the age values. I briefly considered normalizing those values over time, however, I think these two images side by side show how looking at a single data point can hide the story.

The visualization on the right was a lot more complicated. Firstly, the graph started automatically at 0, making it difficult to see (right.) To fix this I had to change the Y axis to a fixed range from 72-85. Then I figured out how to separate the countries into groups and then select differentiated colors and gradients for each. The latter isn’t terribly helpful, but the interactive map allows for specific country selection.

The High Cost of Life-Saving Treatments

By Charlie Ambler

Insulin Dataset Breakdown:

  1. Scraping Data

The data set I chose for Insulin was the 2018 average price per standard unit for insulin in 33 countries. The first issue that I had was due to the fact that the document was in PDF format. Table Scraper was incompatible with this.


My next step was to convert this PDF into a word document in order to utilize Google Chrome’s Table Scraper application. Once I had a polished word document, I then took data from the chart “Table A.3. Average Price per Standard Unit, Overall and by Insulin Type, 33 Select OECD Countries, 2018.”

  1. Cleaning Data

Once I had an Excel formatted dataset, I then put the worksheet into Excel for cleaning purposes. The first issue was the far left columns. Many of the countries had incorrect spelling and spacing, thus causing issues with formatting in Tableau. I corrected these grammatical issues and deleted data fields that did apply to the overall Insulin price.

  1. Data Visualization



Once I put the Excel data worksheet into Tableau I realized that I had two major options: bar graphs or a block visualization. In order to maintain a sense of continuity throughout the project’s multiple visualizations, I decided to go with a bar chart. As seen in figure B., the chart was not increasing in height/magnitude nor was it in proper order. It was a quick and easy fix with the upper toolbar. 

EpiPen Dataset Breakdown 

  1. Data Scraping 

After a few days of research, I finally found a trustworthy source for information pertaining to EpiPen costs across the countries that pertain to the project’s interests. 

  1. Data Cleaning

My next step was to manually input the data I had uncovered from academic sources into a quick and small Excel spreadsheet. 

  1. Data Visualization 

This was a very basic step. I had two major categories of interest which replicated my previous visuals for Insulin costs. The major areas were cost and countries. 

ER Visits Dataset Breakdown 

  1. Data Scraping 

After days of researching, the dataset found on Health Care Cost Institute was the most fulfilling in terms of data needed to display ER visits and out-of-pocket costs which would smoothly transition into the third section, “Income Disparities in U.S. Healthcare.”

I downloaded the CSV file from the database. I converted it into the Excel file that was needed.

  1. Data Cleaning 

The data that was downloaded came in five sheets. The sheet that captured the project’s interests was “State_All_ER_Visits.”  This gave us overall costs for an ER visit for every single state. I deleted severity claims, claim types, network status and CPT codes due to its lack of necessity. Once I narrowed down my E and M data points within the overall state ER visits, I was ready to go into Tableau.

  1. Data Visualization 

Once the data was in Tableau, there were numerous problems that had to be resolved. For starters, the data set was on a state-by-state basis for ER prices. This had to be filtered from ‘state’ to ‘national.’ I de-selected all states and selected national as the major filter. The bar chart was still quite convoluted. I had to then make the bar graphs flow and not be split up by columns. I wanted a basic X and Y graph. I then had to make sure that Measure Names was a color-coded marker. By doing so, it allowed the chart to have a sense of mobility in terms of displaying an interface of choice. 

Income Disparities in U.S. Healthcare

By Brandon Hill

Download the data from the 2018 Medical Expenditure Panel Survey

Download the data from the U.S. Bureau of Labor Statistics

Data Scraping

Downloading the data from the 2018 Medical Expenditure Panel Survey (MEPS) was easy enough. The Agency for Healthcare Research and Quality makes the data available yearly on their website. 

The data from the U.S. Bureau of Labor Statistics was gathered using table capture from their 2018 report on healthcare expenditure. 

Data Cleaning

Cleaning the MEPS data was the bulk of the work on this section of the project. The spreadsheet contained over a thousand variables for a survey of more than 30,000 individuals. Each variable is listed by a specific code, and each code is referenced in a 200 page .pdf file that describes the methods of collection with another code. 

For example, one of the variables that I needed for the story, was the reported yearly income of the individuals surveyed. So I read the section of the guidebook on the methods of income recording and found the corresponding code: “TTLP18X” for “person’s total income.” I copied this column into a new spreadsheet. For variables like “wage income,” I also had to copy another variable from the chart (WAGIMP18) that contained a code 1-6 that referred to the method that was used to collect each figure for the wage income. 

Next, I followed the same process for collecting all of the data on categories of healthcare expenditure. I used Open Refine to combine the various categories of expenditure into the three categories I used for the visualization: government expenditure (combined spending from the veteran’s administration, Medicare, Medicaid, etc.), private expenditure (out-of-pocket spending by self and family) and healthcare expenditure. 

I followed the same process for recording the types of healthcare coverage that the individuals had as well. Except this had the extra step of needing to translate the codes 1-8 into the different categories of the coverage type. These included separate codes for insurance types of people over and under 65, since that’s the age that Medicare coverage kicks in. 

In Open Refine, I sorted the spreadsheet by income level and then eliminated the lines for people reporting a negative income, but it was unclear whether those people were reporting business losses or loans as a reflection of net income and it was impossible to tell from the data alone whether they would be an accurate portrayal of the low-income bracket of the data. For similar reasons, I separated the people who reported exactly 0 income from the data as well. This was about a third of the people in the survey. 

In order to quintile the remaining individuals by income, I took the total number remaining and divided it by five. This gave me the number of individuals for each quintile. I then sectioned out the income brackets by using this number. 

Data Visualization

Despite the complexity of the cleaning process for my data, the creation of the visuals wasn’t overly complicated. To bracket the types of healthcare coverage by income levels, I took a separate spreadsheet for each quintile and put it into Open Refine. Using the text facet function, I was able to get a quantity for each type of coverage. I plugged those quantities for each quintile into another spreadsheet and then transferred the data over to Datawrapper. 

Creating the pie charts in data rapper was pretty straightforward once I had the actual quantities of each type of coverage. I choose to highlight the categories for private coverage and for uncovered individuals, because of the way that private coverage grows massively as a percentage over the growth of income, and because despite appearing small, the percentage of people without any coverage is of significant importance to the topic of the story and I didn’t want it to go unnoticed in the other categories. 

For my bar charts on average individual expenditure by category, I first put the data into Tableau where I used the filter fiction to set the income range of each quintile. I then created a calculated field to divide the expenditures by the number of individuals in each quintile in order to normalize the data and provide the average expenditure for one person. It felt like it would be easier for a reader to comprehend the expenditure as individual averages because the numbers would be more contextualized by the readers’ knowledge of their own expenditures. 

I chose not to break my final bar chart into quintiles because I wanted to make the point that no matter what income level you are at, the average money spent on insurance premiums makes up a vast majority of all healthcare expenditure. 

Leave a Reply

Your email address will not be published.