The aim of this project was to build a ranking of the best Formula 1 drivers in history. The ranking is based on two scores: a base score and a refined score.

The base score is derived from plain statistics for each driver, such as the number of victories. The refined score compares how each driver performed in each season against their teammates, taking the base scores of both drivers into account.
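As a rough illustration of the two-score idea, the sketch below computes a base score from raw counts and then a refined score as a driver's share of the combined base score within a teammate pairing. The weights, the field names and the share-based comparison are all assumptions made for illustration; they are not the formulas actually used in the project.

```python
from dataclasses import dataclass

# Hypothetical weights: the real project may weight statistics differently.
WEIGHTS = {"wins": 5.0, "podiums": 2.0, "poles": 1.0}


@dataclass
class SeasonStats:
    """Plain statistics for one driver in one season (assumed fields)."""
    driver: str
    season: int
    team: str
    wins: int
    podiums: int
    poles: int


def base_score(stats: SeasonStats) -> float:
    """Weighted sum of plain statistics for one driver in one season."""
    return (
        WEIGHTS["wins"] * stats.wins
        + WEIGHTS["podiums"] * stats.podiums
        + WEIGHTS["poles"] * stats.poles
    )


def refined_score(driver: SeasonStats, teammate: SeasonStats) -> float:
    """Share of the pair's combined base score earned by `driver`.

    Returns 0.5 when neither driver scored, so an empty season is neutral.
    """
    own = base_score(driver)
    total = own + base_score(teammate)
    return 0.5 if total == 0 else own / total


# Made-up teammates, used only to exercise the functions.
driver_a = SeasonStats("Driver A", 1990, "Team X", wins=6, podiums=10, poles=8)
driver_b = SeasonStats("Driver B", 1990, "Team X", wins=2, podiums=6, poles=1)
share_a = refined_score(driver_a, driver_b)
share_b = refined_score(driver_b, driver_a)
```

Under these assumptions the two shares of a pairing always sum to 1, so a value above 0.5 means the driver outscored the teammate in the same car that season.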

The datasets used in this project were obtained from three sources: the official Formula 1 website, F1 FanSite and Stats F1. The first is the main source of information; the other two only provide additional details such as weather conditions.

The data was scraped with R, using RStudio and the rvest library. Python, with the pandas and scikit-learn libraries, was used to prepare, transform and mine the data and to generate the charts. A few scripts were written to identify missing or inconsistent data. The data was then structured into a SQLite database.
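The checking and loading steps might look roughly like the sketch below, which flags rows with missing values, exact duplicates or implausible finishing positions, and then writes the remaining rows to SQLite. The column names, the sample rows, the 1 to 40 position bound and the `results` table name are all hypothetical, and an in-memory database stands in for the project's actual file.

```python
import sqlite3

import pandas as pd

# Made-up sample of scraped race results; column names are assumptions.
results = pd.DataFrame(
    {
        "season": [1988, 1988, 1988, 1989, 1989],
        "race": ["Race 1", "Race 1", "Race 1", "Race 2", "Race 2"],
        "driver": ["Driver A", "Driver B", "Driver B", None, "Driver C"],
        "position": [1, 2, 2, 3, 99],
    }
)


def find_problems(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows with a missing value, a repeated row, or an implausible position."""
    missing = df.isna().any(axis=1)
    # keep="first" marks only the later copies of a repeated row.
    duplicated = df.duplicated(keep="first")
    # Assumed plausibility bound: positions outside 1..40 are flagged.
    bad_position = ~df["position"].between(1, 40)
    return df[missing | duplicated | bad_position]


problems = find_problems(results)
clean = results.drop(problems.index)

# Store the cleaned rows in SQLite (in memory here, a file in practice).
conn = sqlite3.connect(":memory:")
clean.to_sql("results", conn, index=False, if_exists="replace")
row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
conn.close()
```

Here the flagged rows are simply dropped before loading; in practice they would more likely be reviewed and corrected against the original sources.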

To help manage the database, I used the ORM library SQLAlchemy. This project was developed for sports enthusiasts who want to explore historical sports statistics, player achievements and the part that luck can play in sports.

Apart from Formula 1 fans, my project also targets a few sectors of the computer industry that might be interested in quantifying talent and subjective data. It could be applied to gaming and betting, and even extended to other kinds of analysis such as music and the arts.