This is, once again, a fairly open-ended Data Challenge in which you’ll demonstrate your ability to examine a dataset and then, most importantly, thoughtfully visualize some information from it.

You will be working with data from the Public Plans Database (PPD), which was compiled by Boston College’s Center for Retirement Research. Our dataset contains plan-level data from 2001 to 2018 for 180 state and local pension plans.

The requirements for this Challenge are threefold:

  1. You should produce two data visualizations that use distinct forms. (For example, one line chart and one map.)
    • Feel free to use any data visualization tool you are familiar with, including Infogram, Flourish, Datawrapper, and ggplot2.
    • For inspiration, check out the Data Visualization Catalogue.
    • For each visualization, reflect on the following:
      • Is the visualization of production quality (i.e., could appear alongside a professional news article)?
      • Is all of the information explained clearly and presented in a logical form? (For example, is there a compelling title and, if appropriate, subtitle?)
      • Are appropriate visual cues (e.g., position, shapes, area, color selection) being used to help the viewer understand the information?
      • Is there a clear reference to the data source?
      • Might the visualization unintentionally mislead the viewer (e.g., improper scaling)?
  2. For each visualization, include any R code necessary to create either the visualization or the subset of data used in the visualization.

  3. For each visualization, describe what you want the viewer to take away from the visualization. Then, explain how your visual choices are appropriate for communicating that point/those points. This exercise will help with concretely reflecting on the elements outlined above.
    • For example, why did you choose a particular chart type? Why did you select certain colors? Why did you label the axes in a particular way? Why did you give it that specific title? Explain all of your selections.
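If you go the ggplot2 route, the reflection questions above map fairly directly onto code. The following is only a minimal sketch, not a required template: the `funded` data frame, its column names, and its values are made up for illustration and do not come from the PPD, and the title and subtitle are placeholders you would replace with your own.

```r
library(ggplot2)

# Hypothetical toy data standing in for a subset of the PPD (values are invented)
funded <- data.frame(
  year = 2001:2006,
  funded_ratio = c(0.98, 0.93, 0.89, 0.87, 0.86, 0.87)
)

p <- ggplot(funded, aes(x = year, y = funded_ratio)) +
  geom_line(color = "#1f77b4") +
  # Start the y-axis at zero so small changes are not visually exaggerated
  scale_y_continuous(
    limits = c(0, 1),
    labels = function(x) paste0(x * 100, "%")
  ) +
  labs(
    title = "Placeholder: state the main takeaway here",
    subtitle = "Placeholder: say what is measured, for whom, and when",
    x = NULL,
    y = "Funded ratio",
    # A clear reference to the data source
    caption = "Source: Public Plans Database (Center for Retirement Research at Boston College)"
  ) +
  theme_minimal()

p
```

Each argument in that sketch is a choice you would need to defend in your rationale: the chart type, the single color, the zero-based axis, and the wording of the labels.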

You can download an abridged copy of this notebook (to make getting started easier) by clicking here.

 

Dataset

The complete dataset may be loaded with the following code:

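# Load the tidyverse, which includes readr (for read_csv), dplyr, and ggplot2
library(tidyverse)
# guess_max = 3000 has readr inspect up to 3,000 rows before guessing each column's type
ppd <- read_csv("http://projects.rodrigozamith.com/datastorytelling/data/public_plans_database.csv", guess_max = 3000)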

That dataset has a large number of variables (270!), some of which can be difficult to decipher from the variable name alone. Thankfully, the PPD’s maintainers provide detailed and helpful data documentation. I highly recommend reviewing it before working with the data.
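With that many variables, a quick first pass can help you decide which parts of the documentation to read closely. The sketch below is one possible approach, not a prescribed one. So that it runs offline, it uses a tiny stand-in called `ppd_toy`; that object, its column names, and its values are hypothetical, and you would run the same calls on `ppd` instead.

```r
library(dplyr)

# Toy stand-in for `ppd`; column names and values are hypothetical
ppd_toy <- tibble(
  plan_name = c("Plan A", "Plan A", "Plan B", "Plan B"),
  fy = c(2017, 2018, 2017, 2018),
  funded_ratio = c(0.71, NA, 0.85, 0.84)
)

dim(ppd_toy)      # number of rows and columns
names(ppd_toy)    # variable names to look up in the documentation
glimpse(ppd_toy)  # type and first few values of every column

# Share of missing values per column, largest first
missing_share <- sort(colMeans(is.na(ppd_toy)), decreasing = TRUE)
missing_share

# How many rows (plans) there are for each fiscal year
count(ppd_toy, fy)
```

Checking missingness early matters here because a variable that is empty for most plans or years is unlikely to support a trustworthy chart.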

 

Useful functions

Here are some useful functions to help with this Challenge:
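As one illustration (a suggestion only, and not necessarily the functions this Challenge has in mind), a handful of dplyr verbs cover most of the subsetting and summarizing you are likely to need before charting. The sketch assumes a toy stand-in called `ppd_toy`; its column names and values are hypothetical, so check the data documentation for the real variable names.

```r
library(dplyr)

# Toy stand-in for `ppd`; column names and values are hypothetical
ppd_toy <- tibble(
  state = c("MA", "MA", "MA", "NY", "NY", "NY"),
  fy = c(2017, 2018, 2018, 2017, 2018, 2018),
  funded_ratio = c(0.60, 0.62, 0.70, 0.90, NA, 0.94)
)

state_2018 <- ppd_toy %>%
  filter(fy == 2018) %>%           # keep a single fiscal year
  select(state, funded_ratio) %>%  # keep only the columns needed
  group_by(state) %>%              # one group per state
  summarize(
    plans = n(),                                    # rows per state
    mean_funded = mean(funded_ratio, na.rm = TRUE)  # ignore missing values
  ) %>%
  arrange(desc(mean_funded))       # highest average first

state_2018
```

If you are building the chart in a tool such as Datawrapper or Flourish rather than ggplot2, a summarized table like this can be exported with readr's `write_csv()` and uploaded there.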


Data Visualization #1

Finished Visualization

Include a link to your visualization here. (Make sure you can also view it in your browser’s private browsing mode.)

Data Analysis

Include all of the code used to create that visualization:

# Your code here

Rationale

Please explain your objective and rationale here, per the Challenge instructions.


Data Visualization #2

Finished Visualization

Include a link to your visualization here. (Make sure you can also view it in your browser’s private browsing mode.)

Data Analysis

Include all of the code used to create that visualization:

# Your code here

Rationale

Please explain your objective and rationale here, per the Challenge instructions.