The main intent of the tidycensus package is to return population characteristics of the United States in a tidy format that integrates with simple feature geometries. Its intent is not, and has never been, to wrap the universe of APIs and datasets available from the US Census Bureau. For datasets not included in tidycensus, I recommend Hannah Recht’s censusapi package (https://github.com/hrecht/censusapi), which allows R users to access all Census APIs, and packages such as Jamaal Green’s lehdr package (https://github.com/jamgreen/lehdr), which grants R users access to Census Bureau LODES data.

However, tidycensus will ultimately incorporate a select number of Census Bureau datasets outside the decennial Census and ACS that are aligned with the basic goals of the package. One such dataset is the Population Estimates API, which includes information on a wide variety of population characteristics that is updated annually.

Population estimates are available in tidycensus through the get_estimates() function. Estimates are organized into products, which in tidycensus include "population", "components", "housing", and "characteristics". The population and housing products contain population/density and housing unit estimates, respectively. The components of change and characteristics products, in contrast, include a wider range of possible variables.

Components of change population estimates

By default, specifying "population", "components", or "housing" as the product in get_estimates() returns all variables associated with that product. For example, we can request all components of change variables for US states in 2017:
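The request described here can be sketched as follows. This is a minimal sketch: it assumes a Census API key has already been installed, and the object name `us_components` is chosen for illustration. Which estimates year is returned by default depends on the installed tidycensus version, so treat the 2017 vintage as an assumption to verify against the function's documentation.

```r
library(tidycensus)

# Assumes a Census API key is already set up, e.g. with census_api_key().
# No variables are named, so every variable in the "components" product
# is returned for each state.
us_components <- get_estimates(geography = "state", product = "components")

us_components
```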

## # A tibble: 624 x 4
##    NAME                 GEOID variable  value
##    <chr>                <chr> <chr>     <dbl>
##  1 Alabama              01    BIRTHS    58389
##  2 Alaska               02    BIRTHS    11163
##  3 Arizona              04    BIRTHS    85634
##  4 Arkansas             05    BIRTHS    38236
##  5 California           06    BIRTHS   487916
##  6 Colorado             08    BIRTHS    67638
##  7 Connecticut          09    BIRTHS    35183
##  8 Delaware             10    BIRTHS    11010
##  9 District of Columbia 11    BIRTHS     9715
## 10 Florida              12    BIRTHS   225447
## # ... with 614 more rows

The components of change product includes both count estimates and rates. Rate variables are prefixed with an R in the variable name and are calculated per 1000 residents.
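One way to list the variable names is to pull the unique values of the `variable` column from a components request. The sketch below repeats the state-level request so that it stands alone; the object name `us_components` is illustrative, and an installed Census API key is assumed.

```r
library(tidycensus)

# Assumes a Census API key is already set up, e.g. with census_api_key().
us_components <- get_estimates(geography = "state", product = "components")

# Distinct variable names in the tidy output; rate variables start with "R".
unique(us_components$variable)
```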

##  [1] "BIRTHS"            "DEATHS"            "DOMESTICMIG"      
##  [4] "INTERNATIONALMIG"  "NATURALINC"        "NETMIG"           
##  [7] "RBIRTH"            "RDEATH"            "RDOMESTICMIG"     
## [10] "RINTERNATIONALMIG" "RNATURALINC"       "RNETMIG"

Available geographies include "us", "state", "county", "metropolitan statistical area/micropolitan statistical area", and "combined statistical area".
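The geography strings are passed exactly as written above. As a hedged illustration (the object name `metro_pop` is an assumption, and an installed Census API key is assumed), a request for the population product at the metropolitan/micropolitan level might look like this:

```r
library(tidycensus)

# Assumes a Census API key is already set up, e.g. with census_api_key().
# The geography string is copied verbatim from the list of available geographies.
metro_pop <- get_estimates(
  geography = "metropolitan statistical area/micropolitan statistical area",
  product = "population"
)

metro_pop
```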

If desired, users can request a specific component or components by supplying a character vector to the variables parameter, as in other tidycensus functions. get_estimates() also supports simple feature geometry integration to facilitate mapping. In the example below, we’ll acquire data on the net migration rate between 2016 and 2017 for all counties in the United States, and request shifted and re-scaled feature geometry for Alaska and Hawaii to facilitate national mapping.
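A sketch of that request is below. The object name `net_migration` is illustrative, and an installed Census API key is assumed. `geometry = TRUE` asks for simple feature geometry, and `shift_geo = TRUE` asks for the shifted and re-scaled Alaska and Hawaii; newer tidycensus releases may flag `shift_geo` as deprecated, so check the documentation for your installed version.

```r
library(tidycensus)

# Cache downloaded shapefiles between sessions (optional).
options(tigris_use_cache = TRUE)

# Assumes a Census API key is already set up, e.g. with census_api_key().
# RNETMIG is the net migration rate per 1000 residents.
net_migration <- get_estimates(
  geography = "county",
  variables = "RNETMIG",
  geometry = TRUE,
  shift_geo = TRUE
)

net_migration
```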

## Simple feature collection with 3142 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -2100000 ymin: -2500000 xmax: 2516374 ymax: 732103.3
## epsg (SRID):    NA
## proj4string:    +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## # A tibble: 3,142 x 5
##    GEOID NAME       variable  value                               geometry
##    <chr> <chr>      <chr>     <dbl>                     <MULTIPOLYGON [m]>
##  1 01001 Autauga C… RNETMIG    1.32 (((1269841 -1303980, 1248372 -1300830…
##  2 01009 Blount Co… RNETMIG    7.29 (((1240383 -1149119, 1222632 -1143475…
##  3 01017 Chambers … RNETMIG    2.43 (((1382944 -1225846, 1390214 -1235634…
##  4 01021 Chilton C… RNETMIG    4.35 (((1257515 -1230045, 1259055 -1240041…
##  5 01033 Colbert C… RNETMIG    4.43 (((1085910 -1080751, 1085892 -1080071…
##  6 01045 Dale Coun… RNETMIG   -1.91 (((1382203 -1366760, 1387076 -1400145…
##  7 01051 Elmore Co… RNETMIG    4.36 (((1278144 -1255151, 1279961 -1256403…
##  8 01065 Hale Coun… RNETMIG   -1.69 (((1176099 -1258997, 1172005 -1264523…
##  9 01079 Lawrence … RNETMIG   -4.44 (((1178216 -1055420, 1179636 -1066254…
## 10 01083 Limestone… RNETMIG   13.2  (((1197770 -1018013, 1199180 -1017791…
## # ... with 3,132 more rows

We’ll next use tidyverse tools to generate a groups column that bins the net migration rates into comprehensible categories, and plot the result with ggplot2’s geom_sf().
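The binning and mapping step might look like the sketch below. The helper names `bin_net_migration()` and `plot_net_migration()`, the cut points at ±5 and ±15, the category labels, and the color palette are all illustrative assumptions, not values taken from the text above.

```r
library(dplyr)
library(ggplot2)
library(sf)

# Hypothetical helper: bin rates per 1000 residents into five ordered
# categories. The break points are assumptions chosen for illustration.
bin_net_migration <- function(rate) {
  cut(
    rate,
    breaks = c(-Inf, -15, -5, 5, 15, Inf),
    labels = c("-15 and below", "-15 to -5", "-5 to +5",
               "+5 to +15", "+15 and up")
  )
}

# Hypothetical helper: map an sf object that has a numeric `value` column,
# such as the county-level RNETMIG result from get_estimates().
plot_net_migration <- function(counties) {
  counties %>%
    mutate(groups = bin_net_migration(value)) %>%
    ggplot() +
    geom_sf(aes(fill = groups, color = groups), lwd = 0.1) +
    scale_fill_brewer(palette = "PuOr", direction = -1) +
    scale_color_brewer(palette = "PuOr", direction = -1, guide = "none") +
    coord_sf(datum = NA) +
    theme_minimal() +
    labs(title = "Net migration per 1000 residents by county",
         fill = "Rate")
}
```

Calling `plot_net_migration()` on the county-level simple features object returned by get_estimates() draws the map; an ordered factor keeps the legend in rate order rather than alphabetical order.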

Estimates of population characteristics

The fourth population estimates product available in get_estimates(), "characteristics", is formatted differently than the other three. It returns population estimates broken down by categories of AGEGROUP, SEX, RACE, and HISP, for Hispanic origin. Requested breakdowns should be specified as a character vector supplied to the breakdown parameter when the product is set to "characteristics".

By default, the returned categories are formatted as integers that map onto the Census Bureau definitions explained here: https://www.census.gov/data/developers/data-sets/popest-popproj/popest/popest-vars/2017.html. Specifying breakdown_labels = TRUE, however, makes the function return the corresponding labels instead. For example:
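A request of this kind for Los Angeles County can be sketched as follows. The object name `la_age_hisp` is illustrative, and an installed Census API key is assumed.

```r
library(tidycensus)

# Assumes a Census API key is already set up, e.g. with census_api_key().
# Break the county estimate down by sex, age group, and Hispanic origin,
# returning text labels instead of integer codes.
la_age_hisp <- get_estimates(
  geography = "county",
  product = "characteristics",
  breakdown = c("SEX", "AGEGROUP", "HISP"),
  breakdown_labels = TRUE,
  state = "CA",
  county = "Los Angeles"
)

la_age_hisp
```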

## # A tibble: 210 x 6
##    GEOID NAME                              value SEX        AGEGROUP HISP 
##    <chr> <chr>                             <dbl> <chr>      <fct>    <chr>
##  1 06037 Los Angeles County, California 10150558 Both sexes All ages Both…
##  2 06037 Los Angeles County, California   626115 Both sexes Age 0 t… Both…
##  3 06037 Los Angeles County, California   621599 Both sexes Age 5 t… Both…
##  4 06037 Los Angeles County, California   613392 Both sexes Age 10 … Both…
##  5 06037 Los Angeles County, California   656343 Both sexes Age 15 … Both…
##  6 06037 Los Angeles County, California   740057 Both sexes Age 20 … Both…
##  7 06037 Los Angeles County, California   853568 Both sexes Age 25 … Both…
##  8 06037 Los Angeles County, California   769217 Both sexes Age 30 … Both…
##  9 06037 Los Angeles County, California   708672 Both sexes Age 35 … Both…
## 10 06037 Los Angeles County, California   680393 Both sexes Age 40 … Both…
## # ... with 200 more rows

With some additional data wrangling, the returned format facilitates analysis and visualization. For example, we can compare population pyramids for Hispanic and non-Hispanic populations in Los Angeles County:
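One possible version of that wrangling and plot is sketched below. The helper names `prep_pyramid()` and `plot_pyramid()` are hypothetical. The sketch also assumes that the age-band labels begin with "Age", that the combined Hispanic-origin category begins with "Both", and that the sex labels are "Male", "Female", and "Both sexes"; check these against the labels actually returned.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: keep the age-band rows, drop the combined categories,
# and make male counts negative so the bars extend to the left.
prep_pyramid <- function(estimates) {
  estimates %>%
    filter(grepl("^Age", AGEGROUP),   # assumed label prefix for age bands
           !grepl("^Both", HISP),     # drop the combined origin category
           SEX != "Both sexes") %>%
    mutate(value = ifelse(SEX == "Male", -value, value))
}

# Hypothetical helper: one pyramid per Hispanic-origin category.
plot_pyramid <- function(estimates) {
  ggplot(prep_pyramid(estimates),
         aes(x = AGEGROUP, y = value, fill = SEX)) +
    geom_col(width = 1) +
    scale_y_continuous(labels = function(y) paste0(abs(y / 1000), "k")) +
    coord_flip() +
    facet_wrap(~HISP) +
    theme_minimal() +
    labs(x = "", y = "Population estimate", fill = "")
}
```

Passing the labeled characteristics result for Los Angeles County to `plot_pyramid()` produces side-by-side pyramids; negating one sex is a common way to draw a pyramid with ordinary bar geometry.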

ACS migration flows

My next project prior to the 1.0 release of tidycensus is to incorporate access to the American Community Survey Migration Flows API. Flows data will come integrated with simple feature geometry allowing for visualization of migration flows. Stay tuned!