For this Data Challenge, you will be analyzing data about journalists killed while performing their jobs, which was collected by the Committee to Protect Journalists (CPJ). CPJ makes some methodological details about the data available here. These data contain only killings where the motive was confirmed, and they cover only journalists (not all media workers).
For each question below, try to describe (a) your logic for how you intend to answer the question, (b) the code itself, and (c) a final answer. By following that process, you’ll increase the likelihood that the material will “stick” because you’re expressing things both conceptually and technically.
You can download a copy of this notebook (to make getting started easier) by clicking here.
library(tidyverse)
journalists_killed <- read_csv("http://projects.rodrigozamith.com/datastorytelling/data/cpj_journalists_killed.csv")
Do you trust this data source? Why or why not? (Please offer your best evaluation of the data source.)
Put your answer here.
Do you see any ethical issues with the way these data were collected? Are there any ethical considerations you believe a journalist should be mindful of when writing a story with these data and/or publishing these data?
Put your answer here.
What was the deadliest year for journalists? How many journalists died that year?
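One possible way to approach a "which group has the most" question is to tally rows per group and sort the tallies. The sketch below uses a small made-up data frame, not the CPJ data; the column names `year` and `journalist` are assumptions for illustration, so check the real column names (for example with `glimpse(journalists_killed)`) before adapting it.

```r
library(dplyr)

# Toy data with hypothetical column names; the CPJ file's real columns may differ.
toy_deaths <- tibble(
  year = c(2001, 2001, 2002, 2003, 2003, 2003),
  journalist = c("A", "B", "C", "D", "E", "F")
)

deaths_by_year <- toy_deaths %>%
  count(year, name = "n_deaths") %>%  # one row per year with its tally
  arrange(desc(n_deaths))             # largest tally first

deaths_by_year
```

The same pattern should carry over to any grouping variable, such as a country column, by swapping the column passed to `count()`.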
Describe your rationale here.
# Insert your code here
Put your answer here.
What was the deadliest country for journalists? How many journalists died in that country?
Describe your rationale here.
# Insert your code here
Put your answer here.
Were the journalists who died in Iraq primarily freelancers, staff reporters, or something else?
Describe your rationale here.
# Insert your code here
Put your answer here.
What proportion of reporters killed in Iraq in 2006 were male?
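A proportion can be computed as the mean of a logical (TRUE/FALSE) vector, because R treats TRUE as 1 and FALSE as 0. The following is a minimal sketch on made-up data; the column names `country`, `year`, and `sex` and their values are assumptions and may not match the CPJ file.

```r
library(dplyr)

# Toy data with hypothetical column names and values.
toy_deaths <- tibble(
  country = c("X", "X", "X", "X", "Y"),
  year = c(2006, 2006, 2006, 2005, 2006),
  sex = c("Male", "Female", "Male", "Male", "Male")
)

prop_summary <- toy_deaths %>%
  filter(country == "X", year == 2006) %>%    # keep only the rows of interest
  summarize(prop_male = mean(sex == "Male"))  # share of remaining rows that are male

prop_summary
```

If the column contains missing values, `mean()` returns `NA` unless you pass `na.rm = TRUE`, so it is worth checking for missing data first.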
Describe your rationale here.
# Insert your code here
Put your answer here.
How many American journalists have been killed since 2005?
Describe your rationale here.
# Insert your code here
Put your answer here.
Please produce a data frame that contains (a) the year of death, (b) the journalist’s name, (c) the type of death, and (d) the organizations associated with the journalist, for each journalist killed in Syria between 2010 and 2016.
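A task like this usually combines two steps: `filter()` to keep the relevant rows and `select()` to keep the relevant columns. Here is a rough sketch on made-up data; every column name below (`journalist`, `type_of_death`, `organization`, and so on) is hypothetical, so substitute the names actually used in the dataset.

```r
library(dplyr)

# Toy data with hypothetical column names; not the CPJ data.
toy_deaths <- tibble(
  year = c(2009, 2010, 2013, 2016, 2017, 2012),
  country = c("X", "X", "X", "X", "X", "Y"),
  journalist = c("A", "B", "C", "D", "E", "F"),
  type_of_death = c("T1", "T2", "T1", "T3", "T2", "T1"),
  organization = c("Org1", "Org2", "Org3", "Org1", "Org2", "Org3"),
  medium = c("Print", "TV", "Radio", "Print", "TV", "Radio")
)

toy_subset <- toy_deaths %>%
  filter(country == "X", between(year, 2010, 2016)) %>%  # between() includes both endpoints
  select(year, journalist, type_of_death, organization)  # keep only the requested columns

toy_subset
```

Note that `between(year, 2010, 2016)` is equivalent to `year >= 2010 & year <= 2016`, so decide whether the question's "between" is meant to include the boundary years.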
Describe your rationale here.
# Insert your code here
See the above table.
What was the mean number of journalists killed per year in the Philippines between 2008 and 2014? What was the median? Which of those two numbers better represents the typical number of journalists killed per year during that time period?
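To see why the mean and the median can tell different stories, consider a made-up set of yearly counts in which one year is an outlier. This is only an illustration with invented numbers and hypothetical column names (`year`, `n_deaths`), not a result from the CPJ data.

```r
library(dplyr)

# Invented yearly counts; one year is far higher than the rest.
toy_counts <- tibble(
  year = 2001:2005,
  n_deaths = c(2, 3, 30, 1, 4)
)

toy_summary <- toy_counts %>%
  summarize(
    mean_deaths = mean(n_deaths),     # pulled upward by the outlier year
    median_deaths = median(n_deaths)  # middle value; less sensitive to outliers
  )

toy_summary
```

When building yearly counts from row-level data with `count()`, keep in mind that a year with no recorded deaths will not appear in the output at all, so it would need to be added back as a zero before averaging.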
Describe your rationale here.
# Insert your code here
Put your answer here.
Please perform some original data analysis using this dataset and produce a news brief (1-2 paragraphs) from that information. (Include all the necessary code below.)
# Insert your code here
Put your brief here.