Mapping Air Pollution with leaflet
Motivation
These days, it seems like there is an ever-growing list of massive, complicated issues to worry about: climate change, nuclear war, mega-droughts, wildfires. Reading the news feels like an exercise in adding to that list.
But sometimes, the news gives us a bright spot. My home state of California, once home to some of the worst air pollution in the world, recently passed a resolution to end the sale of new gasoline-powered cars by 2035. I was modestly excited by this, but I was more interested in the current air pollution situation. Since I don’t think about air pollution all that much, I ended up in a rabbit hole of unstructured research. How bad is air pollution now? How do my state, city, and county compare to others in the US? Is air pollution getting better or worse?
I will attempt to answer some of those questions below. Of course, others have already done so, and probably in a more compelling manner. But I learned a lot going through the process myself, and if you want to learn a bit about geospatial data and leaflet, I think an exercise like this is highly useful. For a thorough tutorial on the capabilities of leaflet in R, please refer to the official tutorial here.
Background
Globally, over 6 million people die each year as a result of air pollution. It’s not something we think much about in the United States or other parts of the developed world (aside from, maybe, complaining about the smog in Los Angeles), but it’s a leading cause of death in parts of Africa and Asia. And unfortunately, it seems to be getting worse in other parts of the world.
Although progress has been made through better policy and environmental regulation, premature deaths on this scale should surely attract more attention than they do now. Some have offered plausible reasons for the public’s lack of interest (mainly that it’s an “over there” problem). Others have scolded countries for not doing more to protect public health. Either way, the health effects of air pollution deserve more research and attention.
Data Preparation
library(tidyverse)    # data wrangling
library(readxl)       # reading Excel files
library(leaflet)      # interactive maps
library(rgdal)        # geospatial data abstraction library (readOGR)
library(stats)        # attached by default; listed for completeness
library(sp)           # spatial data classes and merge()
library(RColorBrewer) # color palettes
In the United States, the EPA maintains tens of thousands of outdoor air quality monitors and provides data to the public through several sources, including the AirData website. First, we’ll practice mapping skills by pinning all these monitors on a map. The geo-locations for all active monitors were downloaded here. We read in the data, call leaflet, add tiles, and center on a point in the middle of the country.
sites <- read.csv("./data/aqs_sites 3.csv", header = TRUE)

# Choose coordinates in the middle of the country
naked_map <-
  leaflet() %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles()

naked_map
It never fails to amaze me how little code it takes to make a map. Next, we’ll pass the sites object (the EPA refers to monitor locations as sites) to leaflet and visualize them as markers on the map.
sites_map <-
  leaflet(sites) %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles() %>%
  addCircleMarkers(lat = ~Latitude,
                   lng = ~Longitude,
                   radius = 1)

sites_map
Most states look like blue blobs, but we can zoom in and find the ones in our neighborhood (unless, of course, we live in the middle of Nevada).
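If the blobs bother you, leaflet also ships with marker clustering, which collapses nearby monitors into numbered clusters that expand as you zoom. A minimal sketch of the idea, using a few made-up coordinates in place of the full sites data:

```r
library(leaflet)

# A few hypothetical monitor locations (stand-ins for the real sites data)
demo_sites <- data.frame(
  Latitude  = c(34.05, 34.10, 37.77, 40.71),
  Longitude = c(-118.24, -118.30, -122.42, -74.01)
)

cluster_map <-
  leaflet(demo_sites) %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles() %>%
  addCircleMarkers(lat = ~Latitude,
                   lng = ~Longitude,
                   radius = 1,
                   clusterOptions = markerClusterOptions())

cluster_map
```

With the real sites object, the handful of points above becomes thousands of monitors, and clustering keeps the national view legible.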
Analysis
Now let’s get a sense of what those monitors are measuring.
As someone not well-versed in environmental pollutants, I found it a bit complicated to figure out which dataset and which measures to use. Two common measures are PM 2.5 (particulate matter 2.5 micrometers or smaller in diameter) and ground-level ozone.
But a more general metric, one which combines these and other harmful pollutants, is the Air Quality Index (AQI). Even if you’re not terribly concerned about air pollution, you’ve probably seen AQI measures before - perhaps through your weather app warning you that the air quality is bad, so close your windows and don’t even think about leaving the house. Or perhaps you’ve seen a local alert or warning to that effect. 1
Given its ubiquity, and the fact that it’s a more general metric than PM 2.5, ground-level ozone, or any other specific pollutant, I decided to look at AQI. More specifically, we’ll be looking at annual median AQI by county: state-level measures are too coarse for air pollution data, and city-level measures are likely too granular. I’ll be comparing 2021 data to 2013 (the years were chosen arbitrarily). We’ll load the data and look at a snapshot.
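For reference, the EPA divides AQI values into six named categories: 0-50 Good, 51-100 Moderate, 101-150 Unhealthy for Sensitive Groups, 151-200 Unhealthy, 201-300 Very Unhealthy, and 301-500 Hazardous. These are the same categories behind the good_days and moderate_days columns in the data below. A quick sketch, using base R’s cut on a few hypothetical AQI values:

```r
# EPA AQI category breakpoints (upper bounds are inclusive)
aqi_breaks <- c(0, 50, 100, 150, 200, 300, 500)
aqi_labels <- c("Good", "Moderate", "Unhealthy for Sensitive Groups",
                "Unhealthy", "Very Unhealthy", "Hazardous")

# A few hypothetical median AQI values
sample_aqi <- c(36, 51, 122, 180)

aqi_categories <- cut(sample_aqi, breaks = aqi_breaks,
                      labels = aqi_labels, include.lowest = TRUE)
aqi_categories
# Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy
```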
# annual 'AQI by county' dataset for 2013
full_2013 <-
  read.csv("./data/annual_aqi_by_county_2013.csv", header = TRUE) %>%
  janitor::clean_names()

head(full_2013)
## state county year days_with_aqi good_days moderate_days
## 1 Alabama Baldwin 2013 273 235 38
## 2 Alabama Clay 2013 118 100 18
## 3 Alabama Colbert 2013 285 252 33
## 4 Alabama DeKalb 2013 360 319 41
## 5 Alabama Elmore 2013 244 229 15
## 6 Alabama Etowah 2013 281 244 37
## unhealthy_for_sensitive_groups_days unhealthy_days very_unhealthy_days
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## hazardous_days max_aqi x90th_percentile_aqi median_aqi days_co days_no2
## 1 0 87 54 36 0 0
## 2 0 65 52 32 0 0
## 3 0 80 51 38 0 0
## 4 0 93 52 38 0 0
## 5 0 87 48 36 0 0
## 6 0 90 52 37 0 0
## days_ozone days_pm2_5 days_pm10
## 1 200 73 0
## 2 0 118 0
## 3 202 83 0
## 4 305 55 0
## 5 244 0 0
## 6 197 84 0
# annual 'AQI by county' dataset for 2021
full_2021 <-
  read.csv("./data/annual_aqi_by_county_2021.csv", header = TRUE) %>%
  janitor::clean_names()

head(full_2021)
## state county year days_with_aqi good_days moderate_days
## 1 Alabama Baldwin 2021 280 264 16
## 2 Alabama Clay 2021 110 101 9
## 3 Alabama DeKalb 2021 361 331 30
## 4 Alabama Elmore 2021 241 238 3
## 5 Alabama Etowah 2021 286 253 33
## 6 Alabama Jefferson 2021 365 174 186
## unhealthy_for_sensitive_groups_days unhealthy_days very_unhealthy_days
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 3 2 0
## hazardous_days max_aqi x90th_percentile_aqi median_aqi days_co days_no2
## 1 0 61 48 35 0 0
## 2 0 67 47 27 0 0
## 3 0 84 49 36 0 0
## 4 0 87 44 32 0 0
## 5 0 77 51 36 0 0
## 6 0 179 77 51 0 1
## days_ozone days_pm2_5 days_pm10
## 1 221 59 0
## 2 0 110 0
## 3 320 41 0
## 4 241 0 0
## 5 213 73 0
## 6 83 280 1
Now we need to figure out how to present the AQI data visually. One way might be to color the marker representing each monitor according to the level of pollution it measured over some time interval, but given the high concentration of dots, that would be hard to read. Instead, let’s divide the United States along county lines. The intent of this approach is to mimic what the Washington Post used in this 2019 article, which reported roughly 10,000 additional deaths due to air pollution over a two-year period.
To do this, we’ll need county shapefiles - geospatial boundary files that trace the shape of each county - which will form a layer over our bare leaflet map. The Census Bureau maintains shapefiles in accordance with current county boundaries, and the ones used below were found here.
Reading these shapefiles into R uses the rgdal package and its readOGR function. (Note that rgdal has since been retired from CRAN; if you’re following along today, the sf package and its st_read function are the modern replacement.)
# Source for shapefiles: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
counties <-
  readOGR("cb_2018_us_county_5m", layer = "cb_2018_us_county_5m", GDAL1_integer64_policy = TRUE)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/AliceTivarovsky/DataScience/my_website/content/blog/2022-08-25-leaflet_pollution/cb_2018_us_county_5m", layer: "cb_2018_us_county_5m"
## with 3233 features
## It has 9 fields
## Integer64 fields read as doubles: ALAND AWATER
counties_map <-
  leaflet(counties) %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles() %>%
  addPolygons(weight = 0.1, smoothFactor = 0.5, opacity = 1.0, fillOpacity = 0.2)

counties_map
The shapefiles are working as expected - we can see well-defined state and county borders, overlaid nicely on top of the map. But what we want is a map like the one above, color-coded according to the value of a selected parameter - in our case, the median annual AQI. In map-speak, this is called a choropleth, and in order to make one, we need to give leaflet the following things:
- A color palette
- A numeric vector defining “bins” (i.e., cutoff points at which one color in the palette changes to the next)
Since we don’t yet know the range of values for median AQI, let’s run some descriptive statistics on it.
# 2013 data
range(full_2013$median_aqi)
## [1] 4 89
quantile(full_2013$median_aqi)
## 0% 25% 50% 75% 100%
## 4 34 38 43 89
# 2021 data
range(full_2021$median_aqi)
## [1] 3 122
quantile(full_2021$median_aqi)
## 0% 25% 50% 75% 100%
## 3 33 37 41 122
Without any calculations or data visualization, we can already see that the 2021 maximum is much higher than the 2013 maximum. But let’s press on, now that we have a general sense of the spread of AQI. Note that the color palette was chosen arbitrarily from RColorBrewer.
# 2013
bins_13 <- c(0, 20, 40, 60, 80, 100, 120, 140)
pal_13 <- colorBin("Reds", domain = full_2013$median_aqi, bins = bins_13)
# 2021
bins_21 <- c(0, 20, 40, 60, 80, 100, 120, 140)
pal_21 <- colorBin("Reds", domain = full_2021$median_aqi, bins = bins_21)
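It’s worth seeing what colorBin actually produces: it returns a function that maps numeric values to hex color codes, which is exactly what addPolygons will need for fillColor. A quick self-contained check on a few hypothetical AQI values (the variable names here are illustrative):

```r
library(leaflet)

bins <- c(0, 20, 40, 60, 80, 100, 120, 140)
pal  <- colorBin("Reds", domain = c(0, 140), bins = bins)

# Low, middling, and high AQI values map to progressively darker reds
demo_colors <- pal(c(10, 35, 85, 122))
demo_colors
```

Values outside the domain, or NA (as we’ll see for counties with no monitors), fall back to a gray na.color by default.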
And now, we get to the somewhat tricky part of the process. We need to combine the shapefile object counties with the 2013 and 2021 air pollution datasets stored in full_2013 and full_2021. This might seem impossible, since we’re combining a geospatial object with a standard data frame, but it turns out to be pretty simple with the merge function in the sp package:
# 2013 merge
full_2013_1 <-
  full_2013 %>%
  select(county, median_aqi) %>%
  rename("NAME" = county)

counties_13 <- merge(counties, full_2013_1, by = "NAME", duplicateGeoms = TRUE)

# 2021 merge
full_2021_1 <-
  full_2021 %>%
  select(county, median_aqi) %>%
  rename("NAME" = county)

counties_21 <- merge(counties, full_2021_1, by = "NAME", duplicateGeoms = TRUE)
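It’s worth understanding why duplicateGeoms = TRUE is needed: we’re matching on county name alone, and many county names (Washington, Jefferson, Lincoln) repeat across states, so one row of AQI data can attach to several polygons. A toy illustration of the same behavior with ordinary data frames and made-up values:

```r
# Two states each have a Washington County in the "shapefile"
polygons <- data.frame(NAME    = c("Washington", "Washington", "Clay"),
                       STATEFP = c("01", "48", "01"))

# But the AQI table has only one Washington row
aqi <- data.frame(NAME = c("Washington", "Clay"), median_aqi = c(40, 32))

merged <- merge(polygons, aqi, by = "NAME")
merged
# Both Washington polygons receive median_aqi = 40, whichever state they belong to
```

For a more careful analysis you’d merge on the five-digit state + county FIPS code (the GEOID field in the shapefile) instead of the name, which avoids this ambiguity entirely.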
Finally, we’re ready to map the data. Notice that we’re using smaller values in the weight argument of addPolygons than we did above. This makes the boundaries nearly seamless and minimizes visual clutter now that the counties are filled with different colors. We’re also adding a popup label, so that the user can see the name of the county corresponding to each polygon.
map_2013 <-
  leaflet(counties_13) %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles() %>%
  addPolygons(weight = 0.05,
              fillOpacity = 0.8,
              fillColor = ~pal_13(counties_13$median_aqi),
              popup = ~paste(counties_13$NAME, "<br>", "Median Annual AQI:", counties_13$median_aqi))

map_2013

map_2021 <-
  leaflet(counties_21) %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles() %>%
  addPolygons(weight = 0.01,
              fillOpacity = 0.8,
              fillColor = ~pal_21(counties_21$median_aqi),
              popup = ~paste(counties_21$NAME, "<br>", "Median Annual AQI:", counties_21$median_aqi))

map_2021
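One finishing touch worth considering: a legend, so readers can decode the reds without clicking popups. leaflet’s addLegend pipes onto an existing map; here is a minimal self-contained sketch using the same bins as above (on the real maps, you’d pipe it after addPolygons and pass values = counties_13$median_aqi, and likewise for 2021):

```r
library(leaflet)

bins <- c(0, 20, 40, 60, 80, 100, 120, 140)
pal  <- colorBin("Reds", domain = c(0, 140), bins = bins)

legend_demo <-
  leaflet() %>%
  setView(lng = -95, lat = 40, zoom = 4) %>%
  addTiles() %>%
  addLegend(pal = pal,
            values = bins,
            title = "Median Annual AQI",
            position = "bottomright")

legend_demo
```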
There’s definitely quite a bit more red on the 2021 map. How much more is hard to quantify by eye, so we’d want a proper statistical test if we were to go any further. What the map is good for, though, is in-map comparison across counties.
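If you did want a rough quantification before reaching for a formal test, comparing simple summaries between years is a reasonable first step. A sketch of the idea with dplyr, using two small made-up data frames in place of full_2013 and full_2021:

```r
library(dplyr)

# Hypothetical stand-ins for full_2013 and full_2021
demo_2013 <- data.frame(county = c("A", "B", "C", "D"), median_aqi = c(36, 52, 41, 89))
demo_2021 <- data.frame(county = c("A", "B", "C", "D"), median_aqi = c(35, 61, 55, 122))

aqi_summary <-
  bind_rows("2013" = demo_2013, "2021" = demo_2021, .id = "year") %>%
  group_by(year) %>%
  summarise(counties_over_50 = sum(median_aqi > 50),
            mean_aqi = mean(median_aqi))

aqi_summary
```

With the real data, the same pipeline counts how many counties crossed a threshold in each year; pairing counties across years and running a paired test on the matched values would be the natural next step.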
Conclusion
I hope this walk-through inspired you to play with leaflet on your own. I said it before, but I never cease to be amazed by the capabilities of interactive packages in R, and leaflet is as nice as they get. Please don’t hesitate to contact me if you have any questions or suggestions. Happy mapping!
References
- https://sesync-ci.github.io/maps-in-R-lesson/2016/09/14/#/slides/shinyleaflet
- https://aqs.epa.gov/aqsweb/airdata/download_files.html#Annual (air quality datasets); used the annual AQI by county files
Here in Los Angeles, we see them frequently during fire season.↩︎