
Launching the flu severity index application

TL;DR version:

Click here to check out my new Shiny app, which displays the U.S. seasonal influenza severity index as calculated from the Centers for Disease Control and Prevention (CDC) ILINet data for the 1997-98 through 2013-14 seasons.


 

My recent paper proposed new methods for quantifying seasonal influenza severity by looking at the relative risk of influenza-like illness between adults and children at varying points in the flu season in the United States. Don’t worry, this isn’t a repeat of my recent blog post on the paper itself.

As a proponent of open science, I had always planned to post the code I used to generate the data and figures in the main manuscript. However, because the medical claims data (the primary data source in the paper) are proprietary, it was clear that we could not post any of the data itself. In these circumstances, I asked myself: why post code that wouldn't add value beyond the findings of the paper?

As an alternative, I’m excited to announce the launch of a web application that displays the seasonal influenza severity index as calculated from the U.S. CDC’s ILINet data. These data are publicly available on CDC’s website through FluView Interactive, and I presented these results in the paper’s Supporting Material. The original analyses were conducted in Python, but I developed the web application with the Shiny package in R (a post about that experience will follow!).

Data from the 1997-98 to 2013-14 flu seasons are pre-loaded into the application. Users can select a season from the drop-down menu to view two figures: 1) adult and child ILI rates from week 40 (roughly the first week of October) through week 39 of the following year, and 2) the population-level severity index, calculated according to the methods in the paper.
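Because each season spans two calendar years (week 40 through week 39 of the next year), weekly surveillance records need to be assigned to a season before they can be plotted by season. The snippet below is a minimal base-R sketch of that bookkeeping, assuming inputs given as a calendar year and a week number; `season_label()` is a hypothetical helper for illustration, not code from the app.

```r
# Hypothetical helper, not code from the app: map a calendar year and
# surveillance week number to a season label such as "1997-98", assuming each
# season runs from week 40 of one year through week 39 of the next.
season_label <- function(year, week) {
  # Weeks 40 and later open a new season in the same calendar year;
  # weeks 1-39 belong to the season that began the previous year.
  start <- as.integer(ifelse(week >= 40, year, year - 1))
  # Two-digit, zero-padded ending year (e.g., 2000 becomes "00").
  sprintf("%d-%02d", start, (start + 1L) %% 100L)
}

season_label(year = c(1997, 1998, 1998), week = c(40, 39, 40))
# [1] "1997-98" "1997-98" "1998-99"
```

Years with a week 53 are handled naturally, since any week of 40 or later falls into the season starting that year.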

The goal of this web application is to make the results and “intuition” derived from the paper more accessible to researchers, policymakers, and the public. I hope to add features to the application in the future (e.g., ability for users to upload their own data), so suggestions are welcome!

Check out the seasonal influenza severity index application here!

sevixFluApp

 


tidyr and dplyr for data cleaning

Part of being a scientist in a quantitative field is growing your ‘toolbox.’ In a broad sense, this covers two types of tools:

  • mathematical and statistical methods – the ways in which you approach the problem
  • software and programming languages – the tools you use to implement your methods

Over the next few posts, I want to spend a little time talking about the software and tools I use in my day-to-day workflow.

To clean data:

I think the ‘dplyr’ and ‘tidyr’ packages in R have started to infuse some joy into the data cleaning process for me. Simple piping, referring to variables directly by name, easy application of functions across ‘grouping variables’, and intuitive function names make these tools a “must-try” in my opinion.
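As a quick illustration of the piping and direct-variable-name style, here is the same filter-and-sort task written in base R and in dplyr. This is a minimal sketch on a small made-up data frame; the `flu` table and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data for illustration only.
flu <- data.frame(
  region = c("A", "A", "B", "B"),
  week   = c(40, 41, 40, 41),
  cases  = c(12, 18, 7, 9)
)

# Base R: every column reference needs a data-frame prefix, and each step
# re-indexes the intermediate result.
base.result <- flu[flu$cases > 8, ]
base.result <- base.result[order(-base.result$cases), ]

# dplyr: steps read left to right, and columns are referred to by bare name.
dplyr.result <- flu %>%
  filter(cases > 8) %>%
  arrange(desc(cases))

dplyr.result
```

Both versions keep the rows with more than 8 cases and sort them in decreasing order of cases (18, 12, 9).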

Say you have monthly flu case counts from 2010 to 2012, where each row is a month and each year has its own column.

month     year_2010  year_2011  year_2012
January   34         42         65
…

What if you want to convert your data into the long format?

month     year  cases
January   2010  34
January   2011  42
January   2012  65

This can be done in a single line of code: gather the three year columns into “year” and “cases” columns, then extract the year as an integer from the former column names (characters 6 to 9 of a name like “year_2010”).

library(dplyr); library(tidyr)
data.long <- data %>% gather("year", "cases", 2:4) %>% mutate(year = as.integer(substring(year, 6, 9)))
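Since tidyr 1.0.0, `gather()` has been superseded by `pivot_longer()`, which can strip the `year_` prefix itself instead of relying on character positions. Below is a minimal sketch of the same reshape, assuming a wide table laid out like the one above; the January counts come from the example, while the February counts are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Wide example data: January counts from the post, February values invented.
data <- data.frame(
  month = c("January", "February"),
  year_2010 = c(34, 28),
  year_2011 = c(42, 50),
  year_2012 = c(65, 40),
  stringsAsFactors = FALSE
)

# Stack the year_* columns into a single "year" column; names_prefix drops
# "year_" so only the four-digit year remains before conversion to integer.
data.long <- data %>%
  pivot_longer(cols = starts_with("year_"),
               names_to = "year",
               names_prefix = "year_",
               values_to = "cases") %>%
  mutate(year = as.integer(year))

data.long
```

Matching on the prefix rather than on positions 6 to 9 keeps the code working if the column names change length.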

What if you want to sum cases by month across all three years? Again, this takes a single line of code!

case.sum <- data.long %>% group_by(month) %>% summarise(cases = sum(cases))
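The same pattern extends to other grouping variables and to several summaries at once. A minimal sketch, assuming a long-format data frame with `month`, `year`, and `cases` columns like the one above (the February counts are invented for illustration), that totals cases by year and records each year’s peak month:

```r
library(dplyr)

# Long-format example: January counts from the post, February values invented.
data.long <- data.frame(
  month = rep(c("January", "February"), each = 3),
  year  = rep(2010:2012, times = 2),
  cases = c(34, 42, 65, 28, 50, 40),
  stringsAsFactors = FALSE
)

# One row per year: total cases and the month with the most cases.
year.summary <- data.long %>%
  group_by(year) %>%
  summarise(total = sum(cases),
            peak_month = month[which.max(cases)])

year.summary
```

With these toy numbers, the totals are 62, 92, and 105, and February is the peak month only in 2011.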


These are just a few examples to start. There are a number of resources for learning these tools:

  • RStudio data wrangling “cheat sheet”: I keep a printed copy on my desk for quick reference. A caveat: the packages have been updated slightly since it was released, and it isn’t comprehensive, but it’s a wonderful visual learning aid.
  • RStudio introduction: This vignette introduces several key functions through examples.

More to come!