Data visualization best practices

GEOG 30323

April 2, 2024

Data visualization

  • Thus far: we’ve learned how to use data visualization to explore our data
  • In the weeks to come:
    • Best practices in data visualization
    • Advanced chart types
    • Interactive visualization
    • Geographic visualization (maps!)
    • Putting it all together!

Source: Wikimedia Commons
Source: Nathan Yau/FlowingData

Anscombe’s Quartet

Source: Wikimedia Commons
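
A minimal sketch of why Anscombe's Quartet matters, assuming the values below match Anscombe's published 1973 data: the four datasets share nearly identical summary statistics, so only a chart reveals how different they are. The names anscombe and summary are illustrative only.

```python
import pandas as pd

# Anscombe's four datasets; datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

# Stack the four datasets into one long data frame
frames = []
for name, y in ys.items():
    x = x4 if name == "IV" else x123
    frames.append(pd.DataFrame({"dataset": name, "x": x, "y": y}))
anscombe = pd.concat(frames, ignore_index = True)

# Means and variances for each dataset
summary = anscombe.groupby("dataset").agg(
    mean_x = ("x", "mean"),
    mean_y = ("y", "mean"),
    var_x = ("x", "var"),
    var_y = ("y", "var"),
)

# Pearson correlation between x and y within each dataset
summary["corr_xy"] = pd.Series(
    {name: grp["x"].corr(grp["y"]) for name, grp in anscombe.groupby("dataset")}
)

print(summary.round(2))
```

Each row of the summary should look almost the same (mean of y close to 7.5, correlation close to 0.82), even though scatterplots of the four datasets look nothing alike.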

Considerations when visualizing data

  • What are you visualizing?
  • Who is your audience?
  • In what format will you be presenting the visualization?

Visual variables

Source: Data Points

Color

  • Hue: color, commonly understood (red, blue, green)
  • Lightness or Value: extent to which color is light or dark
  • Saturation: vividness of the color
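
The three components above can be explored with Python's built-in colorsys module. This is only a sketch: note that colorsys orders the components as hue, lightness, saturation (HLS), and the variable names are illustrative.

```python
import colorsys

# Pure red as an RGB tuple, with each channel between 0 and 1
red = (1.0, 0.0, 0.0)
hue, lightness, saturation = colorsys.rgb_to_hls(*red)
print(hue, lightness, saturation)

# Same hue and saturation, but lighter
light_red = colorsys.hls_to_rgb(hue, 0.8, saturation)

# Same hue and lightness, but less vivid
muted_red = colorsys.hls_to_rgb(hue, lightness, 0.3)

print(light_red)
print(muted_red)
```

The resulting RGB tuples can be passed anywhere matplotlib or seaborn accepts a color.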

Color schemes

Source: Data Points

Color and context

Source: FiveThirtyEight.com

Color-blindness

Source: SBNation.com
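
One possible starting point in seaborn is its built-in "colorblind" palette. This is a sketch only: choosing this palette does not by itself guarantee that a chart is readable for every viewer.

```python
import seaborn as sns

# "colorblind" is one of seaborn's named palettes
cb_palette = sns.color_palette("colorblind", 6)

# Hex codes are handy for reusing the colors elsewhere
print(cb_palette.as_hex())
```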

Good use of color

Source: Kirk Goldsberry/Grantland

Poor use of color

Source: Jonathan Cohn via Kenneth Field/Cartonerd

Color and visual variables


Data for examples

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import wb
sns.set(style = "whitegrid", rc = {"figure.figsize": (10, 8)})

# Two-letter codes for the 28 EU member states as of 2019
eu_countries = ['BE', 'BG', 'CZ', 'DK', 'DE', 'EE', 'IE', 'GR', 'ES', 'FR', 'HR',
                'IT', 'CY', 'LV', 'LT', 'LU', 'HU', 'MT', 'NL', 'AT', 'PL', 'PT',
                'RO', 'SI', 'SK', 'FI', 'SE', 'GB']

# World Bank indicator SL.UEM.TOTL.ZS: unemployment, total (% of total labor force)
ue = wb.download(indicator = "SL.UEM.TOTL.ZS",
                 country = eu_countries, start = 1991,
                 end = 2019)

# Move country and year out of the index and into regular columns
ue.reset_index(inplace = True)

ue.columns = ['country', 'year', 'unemployment']

The ‘heat map’

Source: The Wall Street Journal

Heat maps in seaborn

  • Available in seaborn’s heatmap() function; takes a wide data frame with y-values in the index (rows) and x-values as column headers
ue_wide = ue.pivot(index = 'country', columns = 'year', 
                   values = 'unemployment')

sns.heatmap(ue_wide)
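
The reshaping step can also be tried without downloading anything. The sketch below uses a tiny data frame with made-up values (the names toy and toy_wide and all numbers are hypothetical) to show how the pivoted index becomes the rows of the heat map.

```python
import pandas as pd
import seaborn as sns

# Hypothetical values, made up only to show the reshaping step
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2017, 2018, 2019, 2017, 2018, 2019],
    "rate": [5.0, 4.5, 4.0, 9.0, 8.0, 7.5],
})

# Long to wide: one row per country, one column per year
toy_wide = toy.pivot(index = "country", columns = "year", values = "rate")
print(toy_wide)

# Rows (the index) are drawn on the y-axis, columns on the x-axis
ax = sns.heatmap(toy_wide, annot = True, fmt = ".1f")
```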

The seaborn ‘heat map’

Color palettes in seaborn

  • ColorBrewer: popular color schemes for visualization
  • Support for ColorBrewer built into seaborn; available in the color_palette() function
  • See more at http://colorbrewer2.org/
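
A short sketch of requesting ColorBrewer palettes by name with color_palette(). The palette names are ColorBrewer's own; the variable names and the number of colors requested are just for illustration.

```python
import seaborn as sns

# Sequential: light to dark, for ordered data
sequential = sns.color_palette("Blues", 5)

# Diverging: two hues on either side of a midpoint
diverging = sns.color_palette("RdBu", 7)

# Qualitative: distinct hues for unordered categories
qualitative = sns.color_palette("Set2", 4)

# Adding "_r" to a name reverses the palette
reversed_greens = sns.color_palette("Greens_r", 5)

print(sequential.as_hex())
```

Any of these names can also be passed directly to the palette or cmap argument of a seaborn plotting function.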

Color in seaborn

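# year is still stored as text at this point, hence the quotes around 2019
ue19 = (ue
  .query('year == "2019"')
  .sort_values('unemployment',
               ascending = False)
)

# Note: seaborn 0.13+ warns when palette is set without hue;
# adding hue = 'country', legend = False avoids the warning
sns.barplot(x = 'unemployment', y = 'country',
            data = ue19, palette = "Greens_r")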

Highlighting and annotation

Source: Data Points

The “spaghetti” chart

sns.lineplot(data = ue, x = "year", y = "unemployment", 
             hue = "country")

Highlighting

Highlighting code

# Convert the year to integer for better labels
ue = ue.assign(year = ue.year.astype(int))
# Make a Greece-only dataset
greece = ue.query('country == "Greece"')
# Plot everything as grey, first
sns.lineplot(data = ue, x = "year", y = "unemployment",
             hue = "country", palette = ["grey"] * 28,
             legend = False)
# Then, plot Greece on top
sns.lineplot(data = greece, x = "year", y = "unemployment",
             hue = "country", palette = ["blue"], linewidth = 3)

Annotation in Python

Annotation code


plt.annotate('Global recession \nspreads to Europe', xy = (2009, 9.5), 
             xycoords = 'data', xytext = (2005, 23), textcoords = 'data', 
             arrowprops = dict(arrowstyle = 'simple', color = '#000000'))

Small multiples

Source: Data Points

Small multiples in Python

ue.sort_values("country", inplace = True)

sns.relplot(data = ue, x = "year", y = "unemployment", 
            col = "country", col_wrap = 7,
            kind = "line", height = 2.5)

Modifying chart options

  • seaborn is a wrapper around matplotlib, the main plotting engine for Python
  • In turn, all matplotlib customization methods are available for your seaborn plots - and there are many!
  • To get access: import matplotlib.pyplot as plt

Formatting axes & labels

  • Example:
plt.figure(figsize = (10, 7))

sns.heatmap(ue_wide, cmap = 'YlGnBu')

plt.ylabel("")
plt.xlabel("")
plt.title("Unemployment in Europe, 1991-2019")
plt.xticks(rotation = 45)

Modified heatmap

seaborn and matplotlib

  • seaborn functions return matplotlib-based objects (an Axes from functions like heatmap() and barplot(); a FacetGrid from relplot()) that can be modified with the options in the pyplot module
  • Often, these options are wrapped by seaborn and available as arguments - so check the documentation to see what you can do!
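
A sketch of working with the returned object directly, using made-up data (the values and labels are hypothetical). Here barplot() hands back a matplotlib Axes, and the Axes methods do the same job as the matching plt functions.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical values for three made-up countries
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "unemployment": [4.0, 7.5, 6.0],
})

# color is a seaborn argument that is passed along to matplotlib
ax = sns.barplot(data = toy, x = "unemployment", y = "country",
                 color = "steelblue")

# The returned Axes can be customized with matplotlib methods
ax.set_xlabel("Unemployment (%)")
ax.set_ylabel("")
ax.set_title("Hypothetical unemployment rates")
plt.tight_layout()
```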

Image resolution

  • Higher resolution: more pixels, and so greater detail, in an image
  • Commonly measured in dpi (dots per inch)
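
As a quick worked example, the pixel dimensions of a saved image are the figure size in inches multiplied by the dpi. The figure size and dpi below are just illustrative values.

```python
import matplotlib.pyplot as plt

# figsize is given in inches: 10 wide by 8 tall
fig = plt.figure(figsize = (10, 8))
dpi = 300

# Pixels = inches x dots per inch
width_px, height_px = (fig.get_size_inches() * dpi).astype(int)
print(width_px, height_px)

plt.close(fig)
```

A 10 x 8 inch figure at 300 dpi comes out at 3000 x 2400 pixels; the same figure at 100 dpi would be 1000 x 800.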

Exporting your visualizations

  • To save your visualizations from the Jupyter Notebook (in the same cell that draws the chart):
plt.savefig('destfile.jpg', dpi = 300)
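
A slightly fuller sketch of exporting, assuming a simple made-up chart and a hypothetical file name. PNG is used here because it is lossless, which tends to suit charts with sharp lines and text; bbox_inches = "tight" trims extra whitespace around the figure.

```python
from pathlib import Path
import matplotlib.pyplot as plt

# A small chart with made-up values, just to have something to save
fig, ax = plt.subplots(figsize = (6, 4))
ax.plot([2017, 2018, 2019], [5.0, 4.5, 4.0])
ax.set_title("Example chart")

# Hypothetical output file name; the extension sets the file format
out_path = Path("example_chart.png")
fig.savefig(out_path, dpi = 300, bbox_inches = "tight")

plt.close(fig)
```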