The main intent of the tidycensus package is to return population characteristics of the United States in tidy format, allowing for integration with simple feature geometries. Its intent is not, and has never been, to wrap the universe of APIs and datasets available from the US Census Bureau. For datasets not included in tidycensus, I recommend Hannah Recht’s censusapi package (https://github.com/hrecht/censusapi), which allows R users to access all Census APIs, and packages such as Jamaal Green’s lehdr package (https://github.com/jamgreen/lehdr), which grants R users access to Census Bureau LODES data.

However, tidycensus will ultimately incorporate a select number of Census Bureau datasets outside the decennial Census and ACS that are aligned with the basic goals of the package. One such dataset is the Population Estimates API, which includes information on a wide variety of population characteristics that is updated annually.

Population estimates are available in tidycensus through the get_estimates() function. Estimates are organized into products, which in tidycensus include "population", "components", "housing", and "characteristics". The population and housing products contain population/density and housing unit estimates, respectively. The components of change and characteristics products, in contrast, include a wider range of possible variables.

Components of change population estimates

By default, specifying "population", "components", or "housing" as the product in get_estimates() returns all variables associated with that product. For example, we can request all components of change variables for US states in 2017:

library(tidycensus)
library(tidyverse)
options(tigris_use_cache = TRUE)

us_components <- get_estimates(geography = "state", product = "components")

us_components
## # A tibble: 624 x 4
##    NAME                 GEOID variable  value
##    <chr>                <chr> <chr>     <dbl>
##  1 Alabama              01    BIRTHS    59637
##  2 Alaska               02    BIRTHS    11335
##  3 Arizona              04    BIRTHS    86765
##  4 Arkansas             05    BIRTHS    38779
##  5 California           06    BIRTHS   500353
##  6 Colorado             08    BIRTHS    66345
##  7 Connecticut          09    BIRTHS    36319
##  8 Delaware             10    BIRTHS    11026
##  9 District of Columbia 11    BIRTHS     9652
## 10 Florida              12    BIRTHS   221755
## # … with 614 more rows

The variables included in the components of change product consist of both counts and rates. Rate variables are prefixed with an R in the variable name (for example, RBIRTH is the rate counterpart of BIRTHS) and are calculated per 1,000 residents.

unique(us_components$variable)
##  [1] "BIRTHS"            "DEATHS"            "DOMESTICMIG"      
##  [4] "INTERNATIONALMIG"  "NATURALINC"        "NETMIG"           
##  [7] "RBIRTH"            "RDEATH"            "RDOMESTICMIG"     
## [10] "RINTERNATIONALMIG" "RNATURALINC"       "RNETMIG"
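
Because the rate variables share the R prefix and none of the count variables listed above begins with R, one way to separate the two groups is a simple pattern match. The following is a minimal sketch, assuming the us_components object created earlier; the names rates and counts are chosen here purely for illustration.

```r
library(tidyverse)

# Rate variables (per 1,000 residents) start with "R"; none of the count
# variables shown above does, so a prefix match separates the two groups.
rates <- us_components %>%
  filter(str_detect(variable, "^R"))

counts <- us_components %>%
  filter(!str_detect(variable, "^R"))

# Example: states ordered by net migration rate, highest first
rates %>%
  filter(variable == "RNETMIG") %>%
  arrange(desc(value))
```

Keeping rates and counts apart avoids accidentally summing or plotting values that are measured on different scales.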

Available geographies include "us", "state", "county", "metropolitan statistical area/micropolitan statistical area", and "combined statistical area".

If desired, users can request a specific component or components by supplying a character vector to the variables parameter, as in other tidycensus functions. get_estimates() also supports simple feature geometry integration to facilitate mapping. In the example below, we’ll acquire data on the net migration rate between 2016 and 2017 for all counties in the United States, and request shifted and re-scaled feature geometry for Alaska and Hawaii to facilitate national mapping.
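
A call along the following lines would produce output of the kind shown next. This is a sketch rather than a definitive recipe: the object name net_migration is chosen here for illustration, year = 2017 assumes the installed version of tidycensus accepts a year argument, and shift_geo = TRUE is assumed to be the option that shifts and rescales Alaska and Hawaii (newer tidycensus releases deprecate it in favor of tigris::shift_geometry()).

```r
library(tidycensus)
library(tidyverse)
options(tigris_use_cache = TRUE)

# Net migration rate (RNETMIG) for every US county, with geometry attached.
# Assumptions: `year` is supported by the installed tidycensus version, and
# `shift_geo = TRUE` requests the shifted/rescaled Alaska and Hawaii geometry.
net_migration <- get_estimates(
  geography = "county",
  variables = "RNETMIG",
  year = 2017,
  geometry = TRUE,
  shift_geo = TRUE
)

net_migration
```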

## Simple feature collection with 3142 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -2100000 ymin: -2500000 xmax: 2516374 ymax: 732103.3
## epsg (SRID):    NA
## proj4string:    +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## # A tibble: 3,142 x 5
##    GEOID NAME        variable  value                               geometry
##    <chr> <chr>       <chr>     <dbl>                     <MULTIPOLYGON [m]>
##  1 01001 Autauga Co… RNETMIG  -1.86  (((1269841 -1303980, 1248372 -1300830…
##  2 01009 Blount Cou… RNETMIG  -1.29  (((1240383 -1149119, 1222632 -1143475…
##  3 01017 Chambers C… RNETMIG   1.59  (((1382944 -1225846, 1390214 -1235634…
##  4 01021 Chilton Co… RNETMIG  -1.78  (((1257515 -1230045, 1259055 -1240041…
##  5 01033 Colbert Co… RNETMIG   0.919 (((1085910 -1080751, 1085892 -1080071…
##  6 01045 Dale Count… RNETMIG  -3.48  (((1382203 -1366760, 1387076 -1400145…
##  7 01051 Elmore Cou… RNETMIG   2.51  (((1278144 -1255151, 1279961 -1256403…
##  8 01065 Hale Count… RNETMIG  -4.12  (((1176099 -1258997, 1172005 -1264523…
##  9 01079 Lawrence C… RNETMIG  -8.08  (((1178216 -1055420, 1179636 -1066254…
## 10 01083 Limestone … RNETMIG   7.69  (((1197770 -1018013, 1199180 -1017791…
## # … with 3,132 more rows

We’ll next use tidyverse tools to generate a groups column that bins the net migration rates into comprehensible categories, and plot the result with ggplot2’s geom_sf().
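
One way to do this is sketched below. It assumes the county-level RNETMIG data printed above is stored in an object called net_migration (a name chosen here for illustration). The break points at ±5 and ±15 per 1,000 residents, the palette, and the labels are illustrative choices rather than anything prescribed by the package.

```r
library(tidycensus)  # loads sf support for the geometry column
library(tidyverse)

# Assumes `net_migration` is the sf object of county RNETMIG values shown above.
# Ordered labels for the bins (illustrative break points).
rate_levels <- c("-15 and below", "-15 to -5", "-5 to +5",
                 "+5 to +15", "+15 and up")

net_migration <- net_migration %>%
  mutate(groups = case_when(
    is.na(value) ~ NA_character_,  # keep missing rates out of the lowest bin
    value > 15   ~ "+15 and up",
    value > 5    ~ "+5 to +15",
    value > -5   ~ "-5 to +5",
    value > -15  ~ "-15 to -5",
    TRUE         ~ "-15 and below"
  )) %>%
  mutate(groups = factor(groups, levels = rate_levels))

# Choropleth of the binned rates; coord_sf(datum = NA) hides the graticule
ggplot(net_migration) +
  geom_sf(aes(fill = groups, color = groups), lwd = 0.1) +
  scale_fill_brewer(palette = "PuOr", direction = -1) +
  scale_color_brewer(palette = "PuOr", direction = -1, guide = "none") +
  coord_sf(datum = NA) +
  theme_minimal() +
  labs(title = "Net migration per 1000 residents by county",
       subtitle = "US Census Bureau 2017 Population Estimates",
       fill = "Rate")
```

Converting groups to a factor with explicit levels keeps the legend in numeric order rather than alphabetical order.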

Estimates of population characteristics

The fourth population estimates product available in get_estimates(), "characteristics", is formatted differently than the other three. It returns population estimates broken down by categories of AGEGROUP, SEX, RACE, and HISP (Hispanic origin). Requested breakdowns should be specified as a character vector supplied to the breakdown parameter when the product is set to "characteristics".

By default, the returned categories are formatted as integers that map onto the Census Bureau definitions explained here: https://www.census.gov/data/developers/data-sets/popest-popproj/popest/popest-vars/2017.html. However, by specifying breakdown_labels = TRUE, the function will return the appropriate labels instead. For example:
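
A request of roughly the following shape would return labelled output like that shown next. It is a sketch under stated assumptions: the object name la_age_hisp is chosen here for illustration, the breakdown order is inferred from the column order of the printed output, and year = 2017 assumes the installed version of tidycensus accepts a year argument.

```r
library(tidycensus)
library(tidyverse)

# Population by sex, age group, and Hispanic origin for Los Angeles County,
# with integer category codes replaced by their Census Bureau labels.
# Assumption: `year` is supported by the installed tidycensus version.
la_age_hisp <- get_estimates(
  geography = "county",
  product = "characteristics",
  breakdown = c("SEX", "AGEGROUP", "HISP"),
  breakdown_labels = TRUE,
  year = 2017,
  state = "CA",
  county = "Los Angeles"
)

la_age_hisp
```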

## # A tibble: 210 x 6
##    GEOID NAME                    value SEX      AGEGROUP     HISP          
##    <chr> <chr>                   <dbl> <chr>    <fct>        <chr>         
##  1 06037 Los Angeles County,… 10105518 Both se… All ages     Both Hispanic…
##  2 06037 Los Angeles County,…  5190231 Both se… All ages     Non-Hispanic  
##  3 06037 Los Angeles County,…  4915287 Both se… All ages     Hispanic      
##  4 06037 Los Angeles County,…  4981895 Male     All ages     Both Hispanic…
##  5 06037 Los Angeles County,…  2529798 Male     All ages     Non-Hispanic  
##  6 06037 Los Angeles County,…  2452097 Male     All ages     Hispanic      
##  7 06037 Los Angeles County,…  5123623 Female   All ages     Both Hispanic…
##  8 06037 Los Angeles County,…  2660433 Female   All ages     Non-Hispanic  
##  9 06037 Los Angeles County,…  2463190 Female   All ages     Hispanic      
## 10 06037 Los Angeles County,…   603555 Both se… Age 0 to 4 … Both Hispanic…
## # … with 200 more rows

With some additional data wrangling, the returned format facilitates analysis and visualization. For example, we can compare population pyramids for Hispanic and non-Hispanic populations in Los Angeles County:
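
The wrangling and plot might look something like the sketch below. It assumes the labelled Los Angeles County data shown above is stored in an object called la_age_hisp (an illustrative name) and that the five-year age groups are the only AGEGROUP labels beginning with "Age". The colors and axis labels are illustrative choices.

```r
library(tidyverse)

# Keep only the detailed age groups and the individual sex / Hispanic-origin
# categories, dropping the aggregate rows. Male counts are negated so that
# their bars extend to the left of the axis.
compare <- la_age_hisp %>%
  filter(str_detect(as.character(AGEGROUP), "^Age"),
         SEX %in% c("Male", "Female"),
         HISP %in% c("Hispanic", "Non-Hispanic")) %>%
  mutate(value = ifelse(SEX == "Male", -value, value))

ggplot(compare, aes(x = AGEGROUP, y = value, fill = SEX)) +
  geom_col(width = 1) +
  # Show absolute values in thousands on the mirrored axis
  scale_y_continuous(labels = function(y) paste0(abs(y / 1000), "k")) +
  # Shorten the age labels, e.g. "Age 0 to 4 years" becomes "0 to 4"
  scale_x_discrete(labels = function(x) gsub("Age | years", "", x)) +
  scale_fill_manual(values = c(Female = "darkred", Male = "navy")) +
  coord_flip() +
  facet_wrap(~HISP) +
  theme_minimal() +
  labs(x = "",
       y = "2017 Census Bureau population estimate",
       title = "Population structure by Hispanic origin",
       subtitle = "Los Angeles County, California",
       fill = "")
```

Faceting on HISP places the two pyramids side by side on a shared scale, which makes differences in age structure easier to compare.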