Exploring a Dataset

Over the course of the semester, we’ll be repeatedly referencing data from Matthew Desmond’s Eviction Lab in our in-class activities and tutorials. Before we start working with the data, however, it helps to know what they look like.

The Eviction Lab posts all of their data here. The datasets generally contain the same variables, but they aggregate information at different Census-defined levels: block groups, tracts, cities, counties, and states. A separate dataset is available for each of those levels, both for individual states and for the entire U.S.

This activity will require you to review three items:

Data Dictionary - This contains a listing of the variables available in the datasets, with a brief description of each variable.
Data Sample - This is a copy of the dataset for Massachusetts, aggregated to the city level.
Methods FAQ - This help page contains answers to questions commonly asked of the Lab.

After you review the documents, try getting into your “data state of mind” and see if you can answer the following questions:

Why are these data trustworthy or not trustworthy? Would you feel confident writing a news story using them? What limitations can you already envision?
Do you see any immediate ethical concerns involving these data?
What are five data-driven questions you could answer using just these data? In crafting those questions, try to reference measured variables from the dataset. (An example question: Which U.S. city has the highest eviction rate?) What would you need to do to answer those questions? (Don’t focus on the specific R commands you’d run, but on the sorts of things you’d need to tell R to do.)

Rodrigo Zamith