Plotting Comparative Covid-19 Incidence with ggplot2

Alice Tivarovsky 2022-01-30 12 minute read

Motivation

Covid-19 - the bug that won’t go away. It’s hard to believe we’ve been here for over 2 years. Perhaps like you, I went from checking the Epi curves daily back in 2019, to loneliness and pandemic fatigue, to baking sourdough and planting stuff, to kind of, sort of, almost going back to normal. And then… Omicron. So here we are, back in the familiar bubble.

Speaking of Epi curves, it’s very easy to make one of your own - certainly easier than sourdough starter. Here’s one using ggplot2 and the Covid-19 global dataset maintained by Our World in Data, OWID. OWID compiles this dataset from several sources including the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, national government reports, and the Human Mortality Database (2021).

library(tidyverse)

Data Preparation

First, let’s import and examine the .csv from the “Our World in Data” Github repo.

raw_data = read.csv("./data//owid-covid-data.csv")
str(raw_data)

## 'data.frame':    157936 obs. of  67 variables:
##  $ iso_code                                  : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ continent                                 : chr  "Asia" "Asia" "Asia" "Asia" ...
##  $ location                                  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ date                                      : chr  "2020-02-24" "2020-02-25" "2020-02-26" "2020-02-27" ...
##  $ total_cases                               : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ new_cases                                 : num  5 0 0 0 0 0 0 0 0 0 ...
##  $ new_cases_smoothed                        : num  NA NA NA NA NA 0.714 0.714 0 0 0 ...
##  $ total_deaths                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_deaths                                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_deaths_smoothed                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_cases_per_million                   : num  0.126 0.126 0.126 0.126 0.126 0.126 0.126 0.126 0.126 0.126 ...
##  $ new_cases_per_million                     : num  0.126 0 0 0 0 0 0 0 0 0 ...
##  $ new_cases_smoothed_per_million            : num  NA NA NA NA NA 0.018 0.018 0 0 0 ...
##  $ total_deaths_per_million                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_deaths_per_million                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_deaths_smoothed_per_million           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ reproduction_rate                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ icu_patients                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ icu_patients_per_million                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hosp_patients                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hosp_patients_per_million                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_icu_admissions                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_icu_admissions_per_million         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_hosp_admissions                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_hosp_admissions_per_million        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_tests                                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_tests                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_tests_per_thousand                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_tests_per_thousand                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_tests_smoothed                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_tests_smoothed_per_thousand           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ positive_rate                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ tests_per_case                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ tests_units                               : chr  "" "" "" "" ...
##  $ total_vaccinations                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_vaccinated                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_fully_vaccinated                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_boosters                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_vaccinations                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_vaccinations_smoothed                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_vaccinations_per_hundred            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_vaccinated_per_hundred             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_fully_vaccinated_per_hundred       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_boosters_per_hundred                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_vaccinations_smoothed_per_million     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_people_vaccinated_smoothed            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_people_vaccinated_smoothed_per_hundred: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stringency_index                          : num  8.33 8.33 8.33 8.33 8.33 ...
##  $ population                                : num  39835428 39835428 39835428 39835428 39835428 ...
##  $ population_density                        : num  54.4 54.4 54.4 54.4 54.4 ...
##  $ median_age                                : num  18.6 18.6 18.6 18.6 18.6 18.6 18.6 18.6 18.6 18.6 ...
##  $ aged_65_older                             : num  2.58 2.58 2.58 2.58 2.58 ...
##  $ aged_70_older                             : num  1.34 1.34 1.34 1.34 1.34 ...
##  $ gdp_per_capita                            : num  1804 1804 1804 1804 1804 ...
##  $ extreme_poverty                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cardiovasc_death_rate                     : num  597 597 597 597 597 ...
##  $ diabetes_prevalence                       : num  9.59 9.59 9.59 9.59 9.59 9.59 9.59 9.59 9.59 9.59 ...
##  $ female_smokers                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ male_smokers                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ handwashing_facilities                    : num  37.7 37.7 37.7 37.7 37.7 ...
##  $ hospital_beds_per_thousand                : num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##  $ life_expectancy                           : num  64.8 64.8 64.8 64.8 64.8 ...
##  $ human_development_index                   : num  0.511 0.511 0.511 0.511 0.511 0.511 0.511 0.511 0.511 0.511 ...
##  $ excess_mortality_cumulative_absolute      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ excess_mortality_cumulative               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ excess_mortality                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ excess_mortality_cumulative_per_million   : num  NA NA NA NA NA NA NA NA NA NA ...

head(raw_data)

##   iso_code continent    location       date total_cases new_cases
## 1      AFG      Asia Afghanistan 2020-02-24           5         5
## 2      AFG      Asia Afghanistan 2020-02-25           5         0
## 3      AFG      Asia Afghanistan 2020-02-26           5         0
## 4      AFG      Asia Afghanistan 2020-02-27           5         0
## 5      AFG      Asia Afghanistan 2020-02-28           5         0
## 6      AFG      Asia Afghanistan 2020-02-29           5         0
##   new_cases_smoothed total_deaths new_deaths new_deaths_smoothed
## 1                 NA           NA         NA                  NA
## 2                 NA           NA         NA                  NA
## 3                 NA           NA         NA                  NA
## 4                 NA           NA         NA                  NA
## 5                 NA           NA         NA                  NA
## 6              0.714           NA         NA                  NA
##   total_cases_per_million new_cases_per_million new_cases_smoothed_per_million
## 1                   0.126                 0.126                             NA
## 2                   0.126                 0.000                             NA
## 3                   0.126                 0.000                             NA
## 4                   0.126                 0.000                             NA
## 5                   0.126                 0.000                             NA
## 6                   0.126                 0.000                          0.018
##   total_deaths_per_million new_deaths_per_million
## 1                       NA                     NA
## 2                       NA                     NA
## 3                       NA                     NA
## 4                       NA                     NA
## 5                       NA                     NA
## 6                       NA                     NA
##   new_deaths_smoothed_per_million reproduction_rate icu_patients
## 1                              NA                NA           NA
## 2                              NA                NA           NA
## 3                              NA                NA           NA
## 4                              NA                NA           NA
## 5                              NA                NA           NA
## 6                              NA                NA           NA
##   icu_patients_per_million hosp_patients hosp_patients_per_million
## 1                       NA            NA                        NA
## 2                       NA            NA                        NA
## 3                       NA            NA                        NA
## 4                       NA            NA                        NA
## 5                       NA            NA                        NA
## 6                       NA            NA                        NA
##   weekly_icu_admissions weekly_icu_admissions_per_million
## 1                    NA                                NA
## 2                    NA                                NA
## 3                    NA                                NA
## 4                    NA                                NA
## 5                    NA                                NA
## 6                    NA                                NA
##   weekly_hosp_admissions weekly_hosp_admissions_per_million new_tests
## 1                     NA                                 NA        NA
## 2                     NA                                 NA        NA
## 3                     NA                                 NA        NA
## 4                     NA                                 NA        NA
## 5                     NA                                 NA        NA
## 6                     NA                                 NA        NA
##   total_tests total_tests_per_thousand new_tests_per_thousand
## 1          NA                       NA                     NA
## 2          NA                       NA                     NA
## 3          NA                       NA                     NA
## 4          NA                       NA                     NA
## 5          NA                       NA                     NA
## 6          NA                       NA                     NA
##   new_tests_smoothed new_tests_smoothed_per_thousand positive_rate
## 1                 NA                              NA            NA
## 2                 NA                              NA            NA
## 3                 NA                              NA            NA
## 4                 NA                              NA            NA
## 5                 NA                              NA            NA
## 6                 NA                              NA            NA
##   tests_per_case tests_units total_vaccinations people_vaccinated
## 1             NA                             NA                NA
## 2             NA                             NA                NA
## 3             NA                             NA                NA
## 4             NA                             NA                NA
## 5             NA                             NA                NA
## 6             NA                             NA                NA
##   people_fully_vaccinated total_boosters new_vaccinations
## 1                      NA             NA               NA
## 2                      NA             NA               NA
## 3                      NA             NA               NA
## 4                      NA             NA               NA
## 5                      NA             NA               NA
## 6                      NA             NA               NA
##   new_vaccinations_smoothed total_vaccinations_per_hundred
## 1                        NA                             NA
## 2                        NA                             NA
## 3                        NA                             NA
## 4                        NA                             NA
## 5                        NA                             NA
## 6                        NA                             NA
##   people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
## 1                            NA                                  NA
## 2                            NA                                  NA
## 3                            NA                                  NA
## 4                            NA                                  NA
## 5                            NA                                  NA
## 6                            NA                                  NA
##   total_boosters_per_hundred new_vaccinations_smoothed_per_million
## 1                         NA                                    NA
## 2                         NA                                    NA
## 3                         NA                                    NA
## 4                         NA                                    NA
## 5                         NA                                    NA
## 6                         NA                                    NA
##   new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred
## 1                             NA                                         NA
## 2                             NA                                         NA
## 3                             NA                                         NA
## 4                             NA                                         NA
## 5                             NA                                         NA
## 6                             NA                                         NA
##   stringency_index population population_density median_age aged_65_older
## 1             8.33   39835428             54.422       18.6         2.581
## 2             8.33   39835428             54.422       18.6         2.581
## 3             8.33   39835428             54.422       18.6         2.581
## 4             8.33   39835428             54.422       18.6         2.581
## 5             8.33   39835428             54.422       18.6         2.581
## 6             8.33   39835428             54.422       18.6         2.581
##   aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate
## 1         1.337       1803.987              NA               597.029
## 2         1.337       1803.987              NA               597.029
## 3         1.337       1803.987              NA               597.029
## 4         1.337       1803.987              NA               597.029
## 5         1.337       1803.987              NA               597.029
## 6         1.337       1803.987              NA               597.029
##   diabetes_prevalence female_smokers male_smokers handwashing_facilities
## 1                9.59             NA           NA                 37.746
## 2                9.59             NA           NA                 37.746
## 3                9.59             NA           NA                 37.746
## 4                9.59             NA           NA                 37.746
## 5                9.59             NA           NA                 37.746
## 6                9.59             NA           NA                 37.746
##   hospital_beds_per_thousand life_expectancy human_development_index
## 1                        0.5           64.83                   0.511
## 2                        0.5           64.83                   0.511
## 3                        0.5           64.83                   0.511
## 4                        0.5           64.83                   0.511
## 5                        0.5           64.83                   0.511
## 6                        0.5           64.83                   0.511
##   excess_mortality_cumulative_absolute excess_mortality_cumulative
## 1                                   NA                          NA
## 2                                   NA                          NA
## 3                                   NA                          NA
## 4                                   NA                          NA
## 5                                   NA                          NA
## 6                                   NA                          NA
##   excess_mortality excess_mortality_cumulative_per_million
## 1               NA                                      NA
## 2               NA                                      NA
## 3               NA                                      NA
## 4               NA                                      NA
## 5               NA                                      NA
## 6               NA                                      NA

The dataset is clean and in wide format. Countries are contained in the location variable (with corresponding values for iso_code given), and we have daily values for quite a few variables including new_cases (we’ll use this to plot disease incidence), new_cases_per_million, new_tests, total_vaccinations and booster data is now available as total_boosters.

Analysis

Let’s first plot disease incidence using the new_cases variable.

us_data = 
  raw_data %>% 
  filter(location == "United States") %>% 
  mutate(date = as.Date(date)) 

ggplot(us_data, aes(x = date, y = new_cases)) +
  geom_point()

It looks a little crazy, almost like there are several superimposed curves. What’s really happening is that we’re seeing a lot of noise from day-to-day fluctuation. Data aren’t reported uniformly, so it’s not uncommon to observe a “spike” in reports on, say, Mondays when reporting for the weekend is done.

For this reason, we used smoothed data, usually amounting to a 7-day average. This technique allows us to eliminate the noise attributable to reporting fluctuations through the week. OWID gives us a smoothed version of new_cases which achieves just that.

us_plot = 
  ggplot(us_data, aes(x = date, y = new_cases_smoothed)) +
  geom_point() + 
  labs(title = "Covid-19 Incidence in the United States")

us_plot

Much better. Let’s now compare the US to a few other (arbitrarily chosen) countries. Here’s Peru.

peru_plot = 
  raw_data %>% 
  filter(location == "Peru") %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = new_cases_smoothed)) +
  geom_point() + 
  labs(title = "Covid-19 Incidence in Peru")

peru_plot

And here’s Japan.

japan_plot = 
  raw_data %>% 
  filter(location == "Japan") %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = new_cases_smoothed)) +
  geom_point() + 
  labs(title = "Covid-19 Incidence in Japan")

japan_plot

Here’s Australia.

aus_plot = 
  raw_data %>% 
  filter(location == "Australia") %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = new_cases_smoothed)) +
  geom_point() + 
  labs(title = "Covid-19 Incidence in Australia")

aus_plot

Here’s Zimbabwe.

zimbabwe_plot = 
  raw_data %>% 
  filter(location == "Zimbabwe") %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = new_cases_smoothed)) +
  geom_point() + 
  labs(title = "Covid-19 Incidence in Zimbabwe")

zimbabwe_plot

Finally, here’s Turkey.

turkey_plot = 
  raw_data %>% 
  filter(location == "Turkey") %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = new_cases_smoothed)) +
  geom_point() + 
  labs(title = "Covid-19 Incidence in Turkey")

turkey_plot

It’s evident that Omicron has a foothold all over the world, but to really compare the penetration we’ll need to scale the case loads to the population. Again, OWID gives us a variable where cases/population is already calculated: new_cases_per_million and new_cases_smoothed_per_million. We’ll used the smoothed version and combine the incidence of all six countries onto one plot.

combined_plot = 
  raw_data %>% 
  filter(location %in% c("United States", "Peru", "Japan", "Australia", "Zimbabwe", "Turkey")) %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = new_cases_smoothed_per_million, group = location)) +
  geom_line(aes(color = location))

combined_plot

Interestingly, Australia has had the highest Omicron peak, almost double that of the US. This is particularly unfortunate, given Australia’s ultra-high vaccination rate.

Now, having learned something, let’s polish up the curves.

final_plot = 
  combined_plot +
  labs(title = "Covid-19 Incidence in Australia, Japan, Peru, Turkey, the United States, and Zimbabwe, Cases per Million, Smoothed 7-Day Average", 
       x = "Date", 
       y = "Cases per Million")

final_plot + theme_minimal()

Conclusion

While you can certainly just google Covid-19 in any country that interests you, getting to an accurate, scaled comparison between countries probably requires a bit more effort. Thankfully, Our World in Data and ggplot2 make the task entirely painless.