For this Data Challenge, you will be analyzing data about soccer transfer fees between the 1992 and 2018 seasons, as estimated by TransferMarkt. These data were scraped from that website by the instructor. They include only players who joined or left a club that was in the Premier League at the time of the sale. For example, a transfer between Arsenal and Barcelona would be included in these data, but one between Barcelona and Real Madrid would not.

In order to permit the calculation of inflation-adjusted values, data for the Retail Prices Index were also obtained from Britain’s Office for National Statistics and added to the dataset for you.

For each question below, try to describe (a) your logic for how you intend to answer the question, (b) the code itself, and (c) a final answer. By following that process, you’ll increase the likelihood that the material will “stick” because you’re expressing things both conceptually and technically.

You can download a copy of this notebook (to make getting started easier) by clicking here.

Load Data

library(tidyverse)
football_transfers <- read_csv("http://projects.rodrigozamith.com/datastorytelling/data/football_transfers.csv") # Data from TransferMarkt
uk_price_index <- read_csv("http://projects.rodrigozamith.com/datastorytelling/data/uk_price_index.csv") # Data from the Office for National Statistics
football_transfers <- left_join(football_transfers, uk_price_index, by=c("season"="year")) # Combine the two datasets
rm(uk_price_index) # Remove that dataset to reduce clutter in our environment

Please note that the variable names map to the following:

Question 1

Which player had the highest transfer fee? Which club did that player leave and which team did he join?

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 2

Which English Premier League club spent the most money on inbound transfers between the 2014 and 2018 seasons? How much did they spend?

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 3

Create a new variable called player_position_group and fill it out so that you have four groups based on the player_position variable: Defenders (Centre-Backs, Defenders, Left-Backs, Right-Backs, and Sweepers), Midfielders (Attacking Midfielders, Central Midfielders, Defensive Midfielders, Left Midfielders, Midfielders, and Right Midfielders), Attackers (Attackers, Centre-Forwards, Forwards, Left Wingers, Right Wingers, and Second Strikers), and Goalkeeper (Goalkeepers).
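As a starting point, the grouping described above can be sketched with mutate() and case_when() on a small toy data frame. Only the names player_position and player_position_group come from this assignment; the player names and the exact spelling of the position labels are assumptions, so check the real values (for example with count()) before adapting this.

```r
library(dplyr)

# Toy data. The position labels are assumed spellings and may not match
# the values stored in the real player_position column.
toy_transfers <- data.frame(
  player_name = c("Player A", "Player B", "Player C", "Player D"),
  player_position = c("Centre-Back", "Central Midfield", "Left Winger", "Goalkeeper")
)

toy_transfers <- toy_transfers %>%
  mutate(
    player_position_group = case_when(
      player_position %in% c("Centre-Back", "Sweeper") ~ "Defenders",
      player_position %in% c("Central Midfield") ~ "Midfielders",
      player_position %in% c("Left Winger") ~ "Attackers",
      player_position == "Goalkeeper" ~ "Goalkeeper",
      TRUE ~ NA_character_ # leave unmatched positions as NA so they are easy to spot
    )
  )

# Tally the groups to confirm that no position was left unmatched
count(toy_transfers, player_position_group)
```

Only a few positions are listed here; your own version will need every position named in the question.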

Among inbound players, which position group had the highest median transfer fee in 2018? What was that fee?

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 4

Using that player_position_group variable, create a simple line graph charting the change in the median fee for inbound players in each position group between 2008 and 2018. What is your take-away from that chart?
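One possible shape for this kind of chart is sketched below on made-up numbers: summarize the median per season and group, then draw one line per group. The column name fee is a placeholder, not necessarily the name used in the dataset, and the toy data skips the inbound and 2008 to 2018 filters that the question requires.

```r
library(dplyr)
library(ggplot2)

# Made-up fees for two seasons and two groups; "fee" is a hypothetical column name
toy_fees <- data.frame(
  season = c(2008, 2008, 2008, 2009, 2009, 2009, 2008, 2008, 2009, 2009),
  player_position_group = c(rep("Attackers", 6), rep("Defenders", 4)),
  fee = c(1, 3, 8, 2, 4, 9, 1, 2, 2, 5)
)

# One median per season and position group
median_fees <- toy_fees %>%
  group_by(season, player_position_group) %>%
  summarize(median_fee = median(fee, na.rm = TRUE), .groups = "drop")

# One line per position group
fee_plot <- ggplot(median_fees, aes(x = season, y = median_fee, color = player_position_group)) +
  geom_line() +
  labs(x = "Season", y = "Median fee", color = "Position group")

print(fee_plot)
```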

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 5

Create a new variable called player_fee_adjusted that reflects the inflation-adjusted transfer fee, expressed in 2018 pounds sterling (£). Which outbound Arsenal FC player had the highest inflation-adjusted transfer fee? What was that fee?

(You can calculate this using a mutation. See “How the inflation calculator works” at this link: https://www.bankofengland.co.uk/monetary-policy/inflation/inflation-calculator. I recommend plugging some of your calculated values into that calculator to double-check your work.)
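The general idea behind that kind of adjustment is to multiply each fee by the ratio of the target year's price index to the index for the year of the fee. Here is a minimal sketch on made-up numbers. The column names fee and price_index are placeholders and the index values are invented, so look up the real names and values in the joined dataset.

```r
library(dplyr)

# Invented figures for illustration only
toy_prices <- data.frame(
  season = c(2008, 2013, 2018),
  fee = c(10, 20, 30),            # hypothetical fees
  price_index = c(200, 250, 280)  # hypothetical index values
)

# Index value for the target year (2018)
index_2018 <- toy_prices$price_index[toy_prices$season == 2018][1]

# Adjusted fee = fee * (index in 2018 / index in the season of the transfer)
toy_prices <- toy_prices %>%
  mutate(player_fee_adjusted = fee * index_2018 / price_index)

toy_prices
```

A fee from 2018 should come out unchanged, which makes for a quick sanity check.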

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 6

Repeat your analysis from Question 4 using the inflation-adjusted figures. How does the new chart alter your take-away point, if at all?

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 7

Produce a bar graph that shows the median per-player, inflation-adjusted expenditure (inbound transfers) between 2008 and 2018 for the ten biggest per-player spending clubs. Which team spent the most money on a per-player basis during that stretch of time? How much did they spend?

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 8

Produce a bar graph that shows the net spend (cost of inbound transfers minus amount received for outbound transfers, not adjusted for inflation) between 2008 and 2018 for the ten biggest spenders. Which team had the highest net spend during that stretch of time? How much did they spend?
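Net spend as defined above (inbound costs minus outbound receipts) can be sketched with a grouped summary. The column names club, direction, and fee, along with the "in" and "out" labels, are placeholders for illustration; the real dataset may encode the direction of a transfer differently.

```r
library(dplyr)

# Made-up transfers; all column names and values here are hypothetical
toy_spend <- data.frame(
  club = c("Club A", "Club A", "Club A", "Club B", "Club B"),
  direction = c("in", "in", "out", "in", "out"),
  fee = c(50, 20, 30, 10, 25)
)

net_spend <- toy_spend %>%
  group_by(club) %>%
  summarize(
    spent = sum(fee[direction == "in"], na.rm = TRUE),      # cost of inbound transfers
    received = sum(fee[direction == "out"], na.rm = TRUE),  # income from outbound transfers
    net_spend = spent - received
  ) %>%
  arrange(desc(net_spend))

net_spend
```

A negative value means the club received more than it spent.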

Rationale

Describe your rationale here.

Code

# Insert your code here

Answer

Put your answer here.

Question 9

Please perform some original data analysis using this dataset and produce a news brief (1-2 paragraphs) from that information. (Include all the necessary code below.)

Code

# Code here

Answer

Put your brief here.