Writing about visualization, demographics, dashboards, and spatial data science.

Interested in learning more? Hire me for a workshop or to consult on your next project. See the Services page for more details.

Version 0.3 of the tidycensus R package is now available on CRAN. The biggest change in this release is the ability to fetch entire tables of Census or ACS data without constructing a list of variable names: pass the table prefix to the new table parameter in get_decennial() or get_acs(). Below, I illustrate this by creating faceted population pyramids with the geofacet R package, which arranges faceted ggplot2 plots to reflect the geographic positions of the data they display.
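Conceptually, the table parameter spares you the bookkeeping of listing every variable in a table yourself. The Python sketch below shows that bookkeeping step under one assumption: a table's variables are exactly those whose IDs start with the table prefix followed by an underscore (e.g. B01001_002). The short variable list is a hypothetical sample rather than a full ACS variable list, and this is not how tidycensus is implemented internally.

```python
# Hypothetical sample of ACS variable IDs; a real list would come from the
# Census API's variable metadata (in R, tidycensus::load_variables()).
VARIABLES = [
    "B01001_001", "B01001_002", "B01001_026",
    "B01002_001", "B19013_001",
]


def variables_in_table(table: str, variables: list[str]) -> list[str]:
    """Return every variable ID that belongs to the given table prefix."""
    # Appending "_" ensures we match whole table IDs, not partial ones.
    prefix = table + "_"
    return [v for v in variables if v.startswith(prefix)]


print(variables_in_table("B01001", VARIABLES))
# ['B01001_001', 'B01001_002', 'B01001_026']
```

Matching on the prefix plus an underscore, rather than the bare prefix, keeps a partial ID from accidentally pulling in variables from more than one table.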

Interested in more tips on working with Census data? Click here to join my email list! Want to implement this in your organization? Contact me at kwalkerdata@gmail.com to discuss a training or consulting partnership. As I’ve discussed in a previous post, practitioners commonly analyze demographic or economic topics at the scale of the metropolitan area. Since I wrote that post, I’ve released the tidycensus package, giving R users access to linked Census geometry and attributes in a single function call.

Last week, I published the development version of my new R package, tidycensus. You can read through the documentation and some examples at https://walkerke.github.io/tidycensus/. I’m working on getting the package CRAN-ready with better error handling; in the meantime, I’m sharing a few examples to demonstrate its functionality. If you are working on a national project that includes demographic data as a component, you might be interested in acquiring Census tract data for the entire United States.

Commonly, studies that use US Census data focus on topics at the scale of the metropolitan area. However, subsetting Census geographic data by metropolitan area is not always straightforward. For Census tracts, such a workflow might involve manually downloading tract shapefiles (often available by state), looking up the counties in a given metropolitan area along with their FIPS codes, and subsetting the data by those FIPS codes.
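The last step of that workflow, subsetting by FIPS codes, works because a tract's 11-digit GEOID begins with its 2-digit state FIPS code and 3-digit county FIPS code. Here is a minimal Python sketch, assuming the tract records are already loaded as dictionaries with a GEOID field; the records and the list of metro counties are hypothetical.

```python
# Hypothetical tract records keyed by 11-digit GEOIDs:
# 2-digit state FIPS + 3-digit county FIPS + 6-digit tract code.
TRACTS = [
    {"GEOID": "48439100100"},
    {"GEOID": "48113000100"},
    {"GEOID": "48121020100"},
]

# Hypothetical metro definition as 5-digit county GEOIDs (state + county FIPS).
METRO_COUNTIES = ["48439", "48113"]


def subset_tracts(tracts, county_geoids):
    """Keep tracts whose first five GEOID digits match one of the given counties."""
    wanted = set(county_geoids)  # set lookup keeps the filter fast for long county lists
    return [t for t in tracts if t["GEOID"][:5] in wanted]


metro_tracts = subset_tracts(TRACTS, METRO_COUNTIES)
print([t["GEOID"] for t in metro_tracts])
# ['48439100100', '48113000100']
```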

I am excited to announce that tigris 0.5 is now on CRAN. This major release has been in the works for several months; get it with install.packages("tigris"). One major new feature is support for the simple features data model via the sf R package. sf represents spatial objects in R as data frames with a list-column containing feature geometry. This has multiple advantages: objects of class sf work with tidyverse functions for data wrangling, and they are much faster to work with than their sp counterparts.

Recently, I needed to automate some GIS operations using ArcGIS Pro and the ArcPy Python site package. As of version 1.3, ArcGIS Pro ships with Anaconda as its Python installation, which makes it easier to use ArcGIS as part of a broader data science workflow. I wanted to work in my Python IDE of choice, Yhat’s Rodeo, but this didn’t work out of the box, so I’m sharing the process I used to connect Rodeo to ArcGIS Pro’s Anaconda Python 3 installation.

Every fall, I teach a course on exploratory data analysis and data visualization using Python via the Anaconda distribution and the Jupyter Notebook. This past semester, I ran the course using SageMathCloud, an online platform created by William Stein that delivers a cloud-based computational data analysis environment with access to Python, R, Julia, and several other languages. My experience with SageMathCloud was incredibly positive; I’ll go so far as to say that it is the best teaching tool I have ever used in my career.

Last week, I had the opportunity to lead a Geographic Information Systems workshop at the 2017 Society for Historical Archaeology Conference. During the day-long workshop, I introduced participants to a wide variety of key GIS concepts and taught them how to apply those concepts to a series of historical topics, using ArcGIS (ArcMap and ArcScene) as well as CARTO. By the end of the day, participants had learned how to build an interactive CARTO map of Civil War battle sites from the National Park Service that can be filtered with a time slider widget.

In November, sf, the new simple features package for R, hit CRAN. The package is like rgdal, sp, and rgeos rolled into one, is much faster, and allows for data processing with dplyr verbs! Also, because sf objects are represented much more simply than sp objects, the package supports spatial analysis in R within magrittr pipelines. This post showcases some of this functionality in a simulated spatial analysis workflow, in which an analyst wants to determine whether customers have visited a point of interest (POI) based on GPS tracking data.
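The core test in that workflow is whether any of a customer's GPS points falls within some distance of the POI. The post itself works with sf; the sketch below swaps in a dependency-free Python haversine distance check instead, assuming points are WGS84 latitude/longitude pairs and a hypothetical 100-meter visit radius. The coordinates and customer IDs are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def customers_who_visited(pings, poi, radius_m=100):
    """Return sorted IDs of customers with at least one GPS ping within radius_m of the POI."""
    poi_lat, poi_lon = poi
    return sorted({
        cid for cid, lat, lon in pings
        if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m
    })


# Hypothetical POI and GPS pings as (customer_id, lat, lon) tuples.
POI = (32.7096, -97.3628)
PINGS = [
    ("a", 32.7097, -97.3628),  # roughly 11 m north of the POI
    ("b", 32.7200, -97.3628),  # roughly 1.2 km north of the POI
    ("c", 32.7096, -97.3630),  # roughly 19 m west of the POI
]

print(customers_who_visited(PINGS, POI))
# ['a', 'c']
```

In a full GIS workflow, projecting the data and intersecting points with a buffer around the POI (sf provides st_buffer() and st_intersects() for this) is the more typical route; the haversine version trades that precision for having no dependencies, since it treats the Earth as a sphere.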

I strongly believe that interactive reports, presentations, and scholarly articles are going to become much more prominent in the years ahead. Whereas a PDF article or presentation can often show only a limited aspect of a research project, interactive documents allow a reader or presenter to explore project content much more broadly. For dynamic research documents, an excellent option is to combine Shiny with R Markdown to generate a report that can execute R code from a Shiny server.