Releasing Harrier League Data

Jonny Law
2020-03-04

The North East Harrier League is a series of cross-country races in the North East of England, held over the winter from September to March. Results from the 2012-13 season through the current 2019-20 season are available online in HTML format. I have downloaded and cleaned the data so that it can be used for analysis or exploration. The data for senior men and women is available in tabular format in my blog package; the file containing the parsing functions gives an insight into what it takes to parse this kind of data.

I used R packages to download the results, parse the HTML and clean the resulting data.
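To give a flavour of the parsing step, here is a minimal sketch. It assumes the rvest package, which is not necessarily what was used for this post, and it runs on a made-up inline results table rather than the real Harrier League pages. The column names and the helper `to_seconds` are hypothetical.

```r
# Made-up HTML standing in for a results page; the real pages will differ.
html <- "<table>
  <tr><th>Pos</th><th>Name</th><th>Time</th></tr>
  <tr><td>1</td><td>Runner A</td><td>32:15</td></tr>
  <tr><td>2</td><td>Runner B</td><td>33:02</td></tr>
</table>"

# Parse the literal HTML and pull the first table out as a data frame.
page <- rvest::read_html(html)
results <- as.data.frame(rvest::html_table(rvest::html_element(page, "table")))

# Hypothetical helper: convert "mm:ss" (or "hh:mm:ss") strings to seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(
    parts,
    function(p) sum(as.numeric(p) * 60^rev(seq_along(p) - 1)),
    numeric(1)
  )
}

results$seconds <- to_seconds(results$Time)
print(results)
```

Real results pages usually need more cleaning than this, for example handling missing times or inconsistent headers, which is where most of the effort goes.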

The data can be accessed by installing my R package, which contains a selection of R code relating to this blog.


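# Install the package from GitHub (uncomment to run once)
# install.packages("remotes")
# remotes::install_github("jonnylaw/jonnylaw")

# Load the dataset; this assumes the installed package is named "jonnylaw", like the repository
data("harrier_league_results", package = "jonnylaw")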

Determining the most difficult course

As a quick example of what can be done with the data, I will consider running time by course. The data can be split by sex. However, men and women don't compete over the same distance: the women complete two laps and the men three. We can therefore plot the average time for a single lap of the course (this doesn't account for changes in pace throughout the race). It appears that the hardest (or longest) course is Aykley Heads, which has the highest median race time.
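The per-lap calculation can be sketched in base R as follows. This uses a toy data frame with made-up times and hypothetical column names (`course`, `sex`, `seconds`); the columns in `harrier_league_results` may well be named differently, so treat it as an illustration of the idea and not the code behind the plot.

```r
# Toy data with hypothetical column names and made-up times.
toy_results <- data.frame(
  course  = c("Course A", "Course A", "Course B", "Course B"),
  sex     = c("M", "F", "M", "F"),
  seconds = c(2400, 1800, 2100, 1500),
  stringsAsFactors = FALSE
)

# Laps per race as described in the text: women run two, men run three.
laps <- c(F = 2, M = 3)
toy_results$lap_seconds <- toy_results$seconds / unname(laps[toy_results$sex])

# Median time per lap for each course, slowest course first.
lap_medians <- aggregate(lap_seconds ~ course, data = toy_results, FUN = median)
lap_medians <- lap_medians[order(-lap_medians$lap_seconds), ]
print(lap_medians)
```

Dividing by the number of laps puts men's and women's times on a comparable scale, although it still ignores any difference in lap length between courses.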

Citation

For attribution, please cite this work as

Law (2020, March 4). Bayesian Statistics and Functional Programming: Releasing Harrier League Data. Retrieved from https://jonnylaw.rocks/posts/2020-03-04-harrier_league_data/

BibTeX citation

@misc{harrier_league_open_data,
  author = {Law, Jonny},
  title = {Bayesian Statistics and Functional Programming: Releasing Harrier League Data},
  url = {https://jonnylaw.rocks/posts/2020-03-04-harrier_league_data/},
  year = {2020}
}