For this Data Challenge, you will be analyzing data about soccer transfer fees between the 1992 and 2018 seasons, as estimated by TransferMarkt. These data were scraped from that website by the instructor. They only include data for players who either joined or left a Premier League team (at the time of sale). For example, a transfer between Arsenal and Barcelona would be included in these data but not one between Barcelona and Real Madrid.
In order to permit the calculation of inflation-adjusted values, data for the Retail Prices Index were also obtained from Britain’s Office for National Statistics and added to the dataset for you.
For each question below, try to describe (a) your logic for how you intend to answer the question, (b) the code itself, and (c) a final answer. By following that process, you’ll increase the likelihood that the material will “stick” because you’re expressing things both conceptually and technically.
You can download a copy of this notebook (to make getting started easier) by clicking here.
library(tidyverse)
football_transfers <- read_csv("http://projects.rodrigozamith.com/datastorytelling/data/football_transfers.csv") # Data from TransferMarkt
uk_price_index <- read_csv("http://projects.rodrigozamith.com/datastorytelling/data/uk_price_index.csv") # Data from the Office for National Statistics
football_transfers <- left_join(football_transfers, uk_price_index, by=c("season"="year")) # Combine the two datasets
rm(uk_price_index) # Remove that dataset to reduce clutter in our environment
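Because left_join() keeps every transfer even when no index value exists for its season, it can be worth checking for missing price_index values before doing any inflation adjustment. The sketch below uses a tiny pair of made-up tables (not the real data, and the index values are invented) to show how by = c("season" = "year") matches rows and where NA values appear.

```r
library(dplyr)

# Hypothetical toy tables that only borrow the real column names
transfers <- tibble(
  player_name = c("Player A", "Player B", "Player C"),
  season      = c(2015, 2016, 2017),
  player_fee  = c(10e6, 25e6, 5e6)
)
price_index <- tibble(
  year        = c(2016, 2017, 2018),
  price_index = c(100, 104, 108)  # invented values, not real RPI figures
)

# Match each transfer's season to the index year; unmatched seasons get NA
joined <- left_join(transfers, price_index, by = c("season" = "year"))

# Count transfers whose season has no index value
# (returns 1 here: there is no index row for 2015)
sum(is.na(joined$price_index))

# List those rows so you can decide how to handle them
filter(joined, is.na(price_index))
```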
Please note that the variable names map to the following:
season: The season in which the player was transferred (e.g., 2018 = the 2018-2019 season)
club: The club (team) the player is associated with
direction: The direction of the transfer; “In” = joining the club, “Out” = leaving the club
player_name: The name of the player
player_age: The player’s age at the time of the transfer
player_nationality: The player’s primary nationality
player_position: The player’s primary position on the pitch
player_market_value: The player’s market value at the time, as estimated by TransferMarkt (in British pounds, £)
player_other_club: The club on the other side of the transaction (i.e., the club the player left for “In” transfers, or the club he joined for “Out” transfers)
player_other_country: The country of that other club
player_fee: The player’s estimated transfer fee (in British pounds, £)
price_index: The Retail Prices Index value for that season, used to adjust fees for inflation, as published by Britain’s Office for National Statistics
1. Which player had the highest transfer fee? Which club did that player leave, and which club did he join?
Describe your rationale here.
# Insert your code here
Put your answer here.
2. Which English Premier League club spent the most money on inbound transfers between the 2014 and 2018 seasons? How much did they spend?
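One possible approach, sketched on made-up rows: keep inbound transfers in the target seasons, then total the fees by club. This assumes the range is inclusive (dplyr's between() includes both endpoints) and that transfers with a missing fee are left out of the total (na.rm = TRUE); both are choices worth stating in your rationale.

```r
library(dplyr)

# Hypothetical toy rows
transfers <- tibble(
  club       = c("Club X", "Club X", "Club Y", "Club Y", "Club Z"),
  season     = c(2013, 2015, 2016, 2018, 2017),
  direction  = c("In", "In", "In", "Out", "In"),
  player_fee = c(50e6, 20e6, 15e6, 40e6, NA)
)

club_spend <- transfers %>%
  filter(direction == "In", between(season, 2014, 2018)) %>%  # inbound, 2014-2018 inclusive
  group_by(club) %>%
  summarize(total_spent = sum(player_fee, na.rm = TRUE)) %>%  # missing fees are skipped
  arrange(desc(total_spent))

club_spend
```

Note that a club whose only qualifying fees are missing (Club Z here) ends up with a total of 0 rather than NA.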
Describe your rationale here.
# Insert your code here
Put your answer here.
3. Create a new variable called player_position_group and fill it out so that you have four groups based on the player_position variable: Defenders (Centre-Backs, Defenders, Left-Backs, Right-Backs, and Sweepers), Midfielders (Attacking Midfielders, Central Midfielders, Defensive Midfielders, Left Midfielders, Midfielders, and Right Midfielders), Attackers (Attackers, Centre-Forwards, Forward, Left Wingers, Right Wingers, and Second Strikers), and Goalkeepers (Goalkeepers).
Among inbound players, which position group had the highest median transfer fee in the 2018 season? What was that fee?
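A sketch using case_when() and %in%. The position labels in these vectors are assumptions (singular forms of the names in the question); check the real spellings with count(football_transfers, player_position) before relying on them. Anything not listed falls through to NA, which makes unmatched labels easy to spot. The toy rows are invented.

```r
library(dplyr)

# Assumed position labels; verify against the real data
defenders   <- c("Centre-Back", "Defender", "Left-Back", "Right-Back", "Sweeper")
midfielders <- c("Attacking Midfield", "Central Midfield", "Defensive Midfield",
                 "Left Midfield", "Midfielder", "Right Midfield")
attackers   <- c("Attacker", "Centre-Forward", "Forward", "Left Winger",
                 "Right Winger", "Second Striker")
goalkeepers <- c("Goalkeeper")

# Hypothetical toy rows
transfers <- tibble(
  season          = c(2018, 2018, 2018, 2018, 2017),
  direction       = "In",
  player_position = c("Centre-Back", "Left Winger", "Goalkeeper", "Right Winger", "Sweeper"),
  player_fee      = c(20e6, 30e6, 10e6, 50e6, 99e6)
)

transfers <- transfers %>%
  mutate(player_position_group = case_when(
    player_position %in% defenders   ~ "Defenders",
    player_position %in% midfielders ~ "Midfielders",
    player_position %in% attackers   ~ "Attackers",
    player_position %in% goalkeepers ~ "Goalkeepers",
    TRUE                             ~ NA_character_  # unlisted labels stay missing
  ))

# Median fee per group for inbound transfers in the 2018 season
median_2018 <- transfers %>%
  filter(direction == "In", season == 2018) %>%
  group_by(player_position_group) %>%
  summarize(median_fee = median(player_fee, na.rm = TRUE)) %>%
  arrange(desc(median_fee))

median_2018
```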
Describe your rationale here.
# Insert your code here
Put your answer here.
4. Using that player_position_group variable, create a simple line graph charting the change in the median fee for inbound players in each position group between 2008 and 2018. What is your takeaway from that chart?
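The chart needs one median per season and position group, so the summary is grouped by both before plotting. This is a sketch on invented rows, assuming the player_position_group variable from the previous question already exists; mapping the group to color draws one line per group.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows
transfers <- tibble(
  season                = rep(c(2008, 2013, 2018), each = 2),
  direction             = "In",
  player_position_group = rep(c("Defenders", "Attackers"), times = 3),
  player_fee            = c(5e6, 8e6, 7e6, 12e6, 20e6, 35e6)
)

# One median per season and position group
median_by_season <- transfers %>%
  filter(direction == "In", between(season, 2008, 2018)) %>%
  group_by(season, player_position_group) %>%
  summarize(median_fee = median(player_fee, na.rm = TRUE), .groups = "drop")

# One line per position group
fee_chart <- ggplot(median_by_season,
                    aes(x = season, y = median_fee, color = player_position_group)) +
  geom_line() +
  labs(x = "Season", y = "Median transfer fee (£)", color = "Position group")

fee_chart
```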
Describe your rationale here.
# Insert your code here
Put your answer here.
5. Create a new variable called player_fee_adjusted that reflects the inflation-adjusted transfer fee in 2018 pound sterling (£) values. Which outbound Arsenal FC player had the highest inflation-adjusted transfer fee? What was that fee?
(You can calculate this with mutate(). See “How the inflation calculator works” at this link: https://www.bankofengland.co.uk/monetary-policy/inflation/inflation-calculator. I recommend plugging some of your calculated values into that calculator to double-check your work.)
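A common way to express a past amount in 2018 pounds is to scale it by the ratio of the two price-index levels: adjusted fee = fee × (price index in 2018 / price index in the season of the transfer). The sketch below applies that ratio with mutate() to invented rows; the index values are made up, and the club label "Arsenal FC" is an assumption about how the club is spelled in the data.

```r
library(dplyr)

# Hypothetical toy rows; price_index values are invented, not real RPI data
transfers <- tibble(
  player_name = c("Player A", "Player B", "Player C"),
  club        = "Arsenal FC",
  direction   = c("Out", "Out", "In"),
  season      = c(2000, 2010, 2018),
  player_fee  = c(10e6, 20e6, 30e6),
  price_index = c(100, 150, 200)
)

# Index level for the base year (2018); assumes one index value per season
rpi_2018 <- transfers %>%
  filter(season == 2018) %>%
  pull(price_index) %>%
  first()

# Scale each fee by (2018 index / index in the transfer's season)
transfers <- transfers %>%
  mutate(player_fee_adjusted = player_fee * rpi_2018 / price_index)

# Highest adjusted fee among outbound Arsenal transfers
top_arsenal_out <- transfers %>%
  filter(club == "Arsenal FC", direction == "Out") %>%
  arrange(desc(player_fee_adjusted)) %>%
  head(1)

top_arsenal_out
```

In this toy example, Player A's £10 million fee from 2000 becomes £20 million in 2018 pounds because the index doubled; a 2018 fee is left unchanged.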
Describe your rationale here.
# Insert your code here
Put your answer here.
6. Repeat your analysis from Question 4 using the inflation-adjusted figures. How does the new chart alter your takeaway, if at all?
Describe your rationale here.
# Insert your code here
Put your answer here.
7. Produce a bar graph that shows the median per-player, inflation-adjusted expenditure on inbound transfers between 2008 and 2018 for the ten clubs with the highest per-player spending. Which team spent the most money on a per-player basis during that stretch of time? How much did they spend?
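A sketch of one approach, assuming player_fee_adjusted has already been created: filter to inbound transfers in 2008-2018, take each club's median adjusted fee, keep the top ten with slice_max(), and plot with geom_col(). reorder() sorts the bars by value, and coord_flip() keeps long club names readable. The rows are invented.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows that already include player_fee_adjusted
transfers <- tibble(
  club                = c("Club X", "Club X", "Club Y", "Club Y", "Club Z"),
  season              = c(2009, 2015, 2012, 2007, 2018),
  direction           = "In",
  player_fee_adjusted = c(10e6, 30e6, 18e6, 90e6, 25e6)
)

per_player <- transfers %>%
  filter(direction == "In", between(season, 2008, 2018)) %>%
  group_by(club) %>%
  summarize(median_fee = median(player_fee_adjusted, na.rm = TRUE)) %>%
  slice_max(median_fee, n = 10)  # top ten medians; ties can add extra rows

spend_chart <- ggplot(per_player, aes(x = reorder(club, median_fee), y = median_fee)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Median inflation-adjusted fee per inbound player (2018 £)")

spend_chart
```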
Describe your rationale here.
# Insert your code here
Put your answer here.
8. Produce a bar graph that shows the net spend (cost of inbound transfers minus amount received for outbound transfers, not adjusted for inflation) between 2008 and 2018 for the ten biggest net spenders. Which team had the highest net spend during that stretch of time? How much did they spend?
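Net spend can be computed within one summarize() call by subsetting player_fee by direction. This sketch uses invented rows and leaves out missing fees (na.rm = TRUE), so a club with no outbound fees simply has received = 0.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows
transfers <- tibble(
  club       = c("Club X", "Club X", "Club Y", "Club Y", "Club Z"),
  season     = c(2010, 2012, 2015, 2016, 2005),
  direction  = c("In", "Out", "In", "Out", "In"),
  player_fee = c(50e6, 20e6, 30e6, 35e6, 80e6)
)

net_spend <- transfers %>%
  filter(between(season, 2008, 2018)) %>%
  group_by(club) %>%
  summarize(
    spent    = sum(player_fee[direction == "In"], na.rm = TRUE),   # money paid out
    received = sum(player_fee[direction == "Out"], na.rm = TRUE),  # money taken in
    net      = spent - received
  ) %>%
  slice_max(net, n = 10)

net_chart <- ggplot(net_spend, aes(x = reorder(club, net), y = net)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Net spend, 2008-2018 (£, not inflation-adjusted)")

net_chart
```

A negative net value (Club Y here) means the club received more in fees than it paid.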
Describe your rationale here.
# Insert your code here
Put your answer here.
9. Please perform some original data analysis using this dataset and produce a news brief (1-2 paragraphs) from that information. (Include all the necessary code below.)
# Code here
Put your brief here.