Tidy Tuesday: US Tuition Data

Jonny Law
2020-03-10

library(tidyverse)
tuesdata <- tidytuesdayR::tt_load(2020, week = 11)

    Downloading file 1 of 5: `diversity_school.csv`
    Downloading file 2 of 5: `historical_tuition.csv`
    Downloading file 3 of 5: `salary_potential.csv`
    Downloading file 4 of 5: `tuition_cost.csv`
    Downloading file 5 of 5: `tuition_income.csv`

This week's data consists of tuition costs, salary potential and diversity information for US colleges. It covers two-year colleges, which offer associate degrees, certificates and diplomas, and four-year colleges, which offer bachelor's and master's degrees. Institutions are further split into public, private and for-profit. Universities in the US also charge different tuition fees for in-state and out-of-state students. The ticket price does not always reflect what a student actually pays, since fees can be wholly or partially subsidised by scholarships and financial aid.

The first question I wanted to answer is which universities have the highest tuition cost, and what type of institution they are.


tuition_cost <- tuesdata$tuition_cost

tuition_cost %>% 
  top_n(30, wt = in_state_tuition) %>%
  mutate(name = forcats::fct_reorder(name, in_state_tuition)) %>%
  ggplot(aes(x = name, y = in_state_tuition, fill = type)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Top 30 most expensive colleges") +
  # Labels follow the aesthetics, not the flipped axes: x is the college name
  xlab("College") +
  ylab("In State Tuition")
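
As an aside, top_n() has been superseded by slice_max() in dplyr 1.0.0 and later. A minimal sketch on made-up data (the college names and fees below are hypothetical, not taken from the dataset) shows the same kind of top-n selection:

```r
library(dplyr)

# Hypothetical colleges and fees, for illustration only
toy_colleges <- tibble(
  name = c("College A", "College B", "College C", "College D"),
  in_state_tuition = c(10000, 50000, 30000, 40000)
)

# Keep the two most expensive colleges, ordered from highest to lowest
toy_top <- toy_colleges %>%
  slice_max(in_state_tuition, n = 2)

toy_top
```

Like top_n(), slice_max() keeps ties by default, so it can return more than n rows.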

The tidybayes package can be used to plot the distributions of the total in-state and out-of-state costs.


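library(tidybayes)

tuition_cost %>% 
  group_by(type) %>% 
  pivot_longer(c("out_of_state_total", "in_state_total"), names_to = "tuition_type", values_to = "tuition") %>% 
  ggplot(aes(x = tuition, y = tuition_type)) +
  # stat_halfeye() infers the horizontal orientation; stat_halfeyeh() is deprecated in newer releases
  stat_halfeye() +
  scale_x_continuous(labels = scales::dollar_format()) +
  labs(title = "Distribution of in-state and out-of-state total costs")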
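
To make the reshaping step concrete, here is a small sketch of what pivot_longer() does to the two cost columns. The colleges and amounts are made up for illustration; only the column names match the dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical colleges and costs, for illustration only
toy_costs <- tibble(
  name = c("College A", "College B"),
  in_state_total = c(20000, 35000),
  out_of_state_total = c(30000, 35000)
)

# One row per college and cost type, so the cost type can be mapped to an axis
toy_long <- toy_costs %>%
  pivot_longer(
    c("out_of_state_total", "in_state_total"),
    names_to = "tuition_type",
    values_to = "tuition"
  )

toy_long
```

The two wide columns become a tuition_type column holding the old column names and a tuition column holding the values, which is the shape ggplot2 needs to draw one distribution per cost type.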

Historical Tuition Data

Another dataset contains historical tuition values in adjusted US dollars. We can see that tuition at both private and public institutions has doubled for four-year courses since 1985. It’s quite a lot more expensive to attend college in the US now than it was 35 years ago!


tuesdata$historical_tuition %>% 
  mutate(year = substr(year, 1, 4) %>% as.numeric()) %>% 
  ggplot(aes(x = year, y = tuition_cost, colour = tuition_type)) +
  geom_line() +
  facet_wrap(~type, ncol = 3) +
  theme_bw() +
  theme(legend.position = "bottom") +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Tuition cost in 2016/17 dollars")
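
The year column needs converting before it can go on a continuous axis. A minimal sketch of that step, assuming the labels are academic years written like "1985-86" (the values below are illustrative):

```r
# Hypothetical academic-year labels in "start year-end year" form
year_labels <- c("1985-86", "2000-01", "2016-17")

# Keep the first four characters (the start year) and convert to a number
start_year <- as.numeric(substr(year_labels, 1, 4))

start_year
```

Each academic year is therefore plotted at its starting calendar year.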

State tuition map

The urbnmapr package lets us plot US states and overlay information at the state level.


library(urbnmapr)
tuition_cost <- tuesdata$tuition_cost

states_sf <- get_urbn_map("states", sf = TRUE)

tuition_by_state <- states_sf %>% 
  rename(state_code = state_abbv) %>% 
  inner_join(tuition_cost, by = "state_code") %>% 
  filter(degree_length == "4 Year") %>% 
  group_by(state_code) %>% 
  summarise_if(is.numeric, list(~mean(.), ~median(.)))

tuition_by_state %>% 
  ggplot() +
  geom_sf(color = "#ffffff", aes(fill = out_of_state_tuition_median)) +
  scale_fill_gradient2(labels = scales::dollar_format()) +
  coord_sf(datum = NA)
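
The per-state summary is what produces columns such as out_of_state_tuition_median. A rough sketch of the same idea on made-up data, using across(), which supersedes summarise_if() in dplyr 1.0.0 and later (the state codes and fees here are hypothetical):

```r
library(dplyr)

# Hypothetical states and fees, for illustration only
toy_tuition <- tibble(
  state_code = c("AA", "AA", "AA", "BB", "BB"),
  out_of_state_tuition = c(10000, 20000, 60000, 15000, 25000)
)

# Mean and median of every numeric column, one row per state.
# The names in the list become suffixes: <column>_mean and <column>_median.
toy_by_state <- toy_tuition %>%
  group_by(state_code) %>%
  summarise(
    across(where(is.numeric), list(mean = mean, median = median)),
    .groups = "drop"
  )

toy_by_state
```

In state "AA" the single expensive college pulls the mean well above the median, which is one reason to map the median rather than the mean.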

Most cost effective Universities

To quantify how cost effective a university is, divide the mid-career pay by the yearly total cost (the in_state_total and out_of_state_total columns) of a four-year college, which is one offering bachelor's or master's degrees.


salary <- tuesdata$salary_potential

salary %>% 
  # Join on college name explicitly rather than relying on the common-column default
  left_join(tuesdata$tuition_cost %>% filter(degree_length == "4 Year"), by = "name") %>% 
  pivot_longer(c("out_of_state_total", "in_state_total"), names_to = "tuition_type", values_to = "tuition") %>% 
  mutate(ratio = mid_career_pay / tuition,
         name_cost = paste(name, scales::dollar(tuition))) %>%
  ggplot(aes(x = ratio, y = tuition)) +
  geom_point() +
  facet_wrap(~tuition_type, ncol = 1) +
  scale_y_continuous(labels = scales::dollar_format()) +
  xlab("Mid Career Earnings / Tuition Fee") +
  ylab("Tuition") +
  labs(title = "Yearly Tuition Costs and Mid Career Earnings")
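
The ratio itself is a simple division after joining salaries to costs. A worked sketch on made-up numbers (the colleges, salaries and costs below are hypothetical):

```r
library(dplyr)

# Hypothetical salary and cost tables, for illustration only
toy_salary <- tibble(
  name = c("College A", "College B"),
  mid_career_pay = c(90000, 120000)
)
toy_cost <- tibble(
  name = c("College A", "College B"),
  tuition = c(30000, 60000)
)

# Join on college name, then compute mid-career pay per dollar of yearly cost
toy_ratio <- toy_salary %>%
  left_join(toy_cost, by = "name") %>%
  mutate(ratio = mid_career_pay / tuition) %>%
  arrange(desc(ratio))

toy_ratio
```

Here College B has the higher mid-career pay, but College A has the better ratio (3 against 2) because it costs half as much per year.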