Writing about visualization, demographics, dashboards, and spatial data science.

Interested in learning more? Hire me for a workshop or to consult on your next project. See the Services page for more details.

Compare US metropolitan area characteristics in R with tidycensus and tigris

· by Kyle Walker · Read in about 4 min · (748 Words)
r census tidycensus tigris

As I’ve discussed in a previous post, practitioners commonly analyze demographic or economic topics at the scale of the metropolitan area. Since I wrote that post, I’ve released the tidycensus package, giving R users access to linked Census geometry and attributes in a single function call. This makes metropolitan area analysis even faster, with help from the tigris and sf packages.

First, we load up some packages and set some options.

library(tidycensus)
library(tidyverse)
library(tigris)
library(sf)
options(tigris_class = "sf")
options(tigris_use_cache = TRUE)
# census_api_key("YOUR KEY HERE")

Let’s say you are an analyst who wants to compare the distribution of median gross rent by Census tract for three West Coast metropolitan areas: Seattle, Portland, and San Francisco-Oakland. We’ll get data from the 2011-2015 American Community Survey’s Data Profile, using variable DP04_0134. As this requires data for multiple states, we’ll use a new feature in tidycensus that allows us to supply a vector of states to the state parameter in get_acs() for Census tracts, improving on the approach I outlined in my previous post.

rent <- get_acs(geography = "tract", variables = "DP04_0134", 
                state = c("WA", "OR", "CA"), geometry = TRUE)

head(rent)
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -124.7625 ymin: 45.99541 xmax: -116.916 ymax: 48.35451
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
## # A tibble: 6 x 6
##         GEOID                                         NAME  variable
##         <chr>                                        <chr>     <chr>
## 1 53003960100 Census Tract 9601, Asotin County, Washington DP04_0134
## 2 53005011100  Census Tract 111, Benton County, Washington DP04_0134
## 3 53007960500 Census Tract 9605, Chelan County, Washington DP04_0134
## 4 53007961000 Census Tract 9610, Chelan County, Washington DP04_0134
## 5 53009000200   Census Tract 2, Clallam County, Washington DP04_0134
## 6 53009000600   Census Tract 6, Clallam County, Washington DP04_0134
## # ... with 3 more variables: estimate <dbl>, moe <dbl>,
## #   geometry <simple_feature>

We now have median gross rent information for all Census tracts in Washington, Oregon, and California. Subsetting this to our desired metropolitan areas only takes a couple steps. First, we use the tigris package to obtain metropolitan area boundaries with the core_based_statistical_areas() function, taking care to set cb = TRUE as this is the default geometry used by tidycensus, and subset by ID for our desired metros. Next, we use an inner spatial join with sf’s st_join() function to subset for the specific tracts we need.

metros <- core_based_statistical_areas(cb = TRUE) %>%
  filter(GEOID %in% c("38900", "41860", "42660")) %>%
  select(metro_name = NAME)

wc_rent <- st_join(rent, metros, join = st_within, 
                   left = FALSE) 

head(wc_rent)
## Simple feature collection with 6 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.693 ymin: 45.62099 xmax: -122.2963 ymax: 47.73405
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
##          GEOID                                          NAME  variable
## 8  53011040706 Census Tract 407.06, Clark County, Washington DP04_0134
## 9  53011040910 Census Tract 409.10, Clark County, Washington DP04_0134
## 10 53011041206 Census Tract 412.06, Clark County, Washington DP04_0134
## 11 53011041700    Census Tract 417, Clark County, Washington DP04_0134
## 20 53033000200       Census Tract 2, King County, Washington DP04_0134
## 21 53033002800      Census Tract 28, King County, Washington DP04_0134
##    estimate moe                          metro_name
## 8       877  48 Portland-Vancouver-Hillsboro, OR-WA
## 9      1490 206 Portland-Vancouver-Hillsboro, OR-WA
## 10     1100 123 Portland-Vancouver-Hillsboro, OR-WA
## 11      832  32 Portland-Vancouver-Hillsboro, OR-WA
## 20     1128  63         Seattle-Tacoma-Bellevue, WA
## 21     1296 261         Seattle-Tacoma-Bellevue, WA
##                          geometry
## 8  MULTIPOLYGON(((-122.552545 ...
## 9  MULTIPOLYGON(((-122.693002 ...
## 10 MULTIPOLYGON(((-122.58041 4...
## 11 MULTIPOLYGON(((-122.651537 ...
## 20 MULTIPOLYGON(((-122.323566 ...
## 21 MULTIPOLYGON(((-122.355305 ...

Tracts are identified by metropolitan area, with a new column, metro_name, that includes the metropolitan area name.

Exploratory analysis by metropolitan area is now straightforward. We can look at faceted histograms of median gross rent by Census tract by metro area with ggplot2:

ggplot(wc_rent, aes(x = estimate)) + 
  geom_histogram() + 
  facet_wrap(~metro_name)

As we have feature geometry as well, we can make faceted maps with geom_sf(), found in the development version of ggplot2. Be sure to set scales = "free" and theme(aspect.ratio = 1) if you want this to work correctly.

library(viridis)

ggplot(wc_rent, aes(fill = estimate, color = estimate)) + 
  geom_sf() + 
  coord_sf(crs = 26910) + 
  facet_wrap(~metro_name, scales = "free", nrow = 1) + 
  theme_minimal() + 
  theme(aspect.ratio = 1) + 
  scale_fill_viridis() + 
  scale_color_viridis()