F1 has set up a new game for this season called F1 Fantasy (https://fantasy.formula1.com), and winning comes with some decent prizes. The game has two rules: (1) select 5 drivers and 2 constructors/teams, and (2) stay under a total cost cap of $100 million. Each driver and team has a cost: for example, Max Verstappen costs $30 million to choose, and the Red Bull team costs $27.9 million. F1 already tells us what choices would make up the “Dream Team,” but this team easily breaches the $100 million cost cap and cannot be chosen. In fact, the total […]
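The two rules above amount to a small constrained search, which can be sketched in Python. This is only an illustration, not the method used in the full post: apart from the Verstappen and Red Bull prices quoted above, every name, price, and expected-points value is a made-up placeholder, and `best_team` is a hypothetical helper.

```python
from itertools import combinations

# name -> (cost in $ millions, expected points)
# Only the Verstappen and Red Bull prices come from the post.
# All other names, prices, and every points value are placeholders.
drivers = {
    "Max Verstappen": (30.0, 50),
    "Driver B": (22.0, 36),
    "Driver C": (18.0, 30),
    "Driver D": (12.0, 21),
    "Driver E": (8.0, 14),
    "Driver F": (6.0, 10),
    "Driver G": (5.0, 7),
}
constructors = {
    "Red Bull": (27.9, 60),
    "Team B": (15.0, 33),
    "Team C": (8.0, 15),
}


def best_team(drivers, constructors, cap=100.0, n_drivers=5, n_teams=2):
    """Brute-force the highest-scoring legal team under the cost cap."""
    best = None
    for d_combo in combinations(drivers, n_drivers):
        d_cost = sum(drivers[d][0] for d in d_combo)
        if d_cost > cap:
            continue  # drivers alone already break the cap
        d_pts = sum(drivers[d][1] for d in d_combo)
        for c_combo in combinations(constructors, n_teams):
            cost = d_cost + sum(constructors[c][0] for c in c_combo)
            if cost > cap + 1e-9:  # small tolerance for float rounding
                continue
            pts = d_pts + sum(constructors[c][1] for c in c_combo)
            if best is None or pts > best[0]:
                best = (pts, cost, d_combo, c_combo)
    return best


team = best_team(drivers, constructors)
print(team)
```

With 20 drivers and 10 teams there are only 15,504 × 45 combinations, so an exhaustive search like this stays fast enough without a dedicated solver.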
Driver Pay and Performance Part 2
This is a follow-up to a post from last season where we gawked at the salaries of all 20 F1 drivers: https://f1datadriver.com/driver-pay-and-performance/ It turns out that another data source for driver salaries is provided by the F1 23 video game, produced by EA Sports. You may be surprised to know that I do not own this game, but it won’t be long before I buy it, as well as a PlayStation 5 to run it on and a hardcover strategy guide. I can’t afford not to! The game comes with ratings in the range of 0-100 for all of the […]
The Hottest F1 Driver
In this post, we will train a machine learning model on a data set of face pictures that have been scored by real people for attractiveness. Then we will use the trained model to generate an attractiveness “score” for all of the F1 drivers on the grid in 2023. [Note: Full Python code is available at: https://github.com/f1datadriver/imagenet] The SCUT 5500 Data Set The SCUT 5500 data set is a series of 5500 face images of Asian and Caucasian males and females. You can see the original paper that built the data set here: SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm […]
Driver Pay and Performance
Formula 1 drivers are paid to do one thing: win races. In this post, we’ll look at the data for driver pay and compare it to driver win rates and podium rates. This will also be a demo of how to make some nice charts in R, which is considerably more fun than making them in (::shudder::) Microsoft Excel. 1. Driver Salary Data Let’s start with a look at how much these guys are making this year. First, gather historical driver compensation data from the trusty internet, put it in a .csv file and load it into R as a data […]
A Machine Learning Model for Podium Finishes
This post will demonstrate how to use the “caret” library in R to set up a simple machine learning model. We’ll use the model we build to predict which drivers will finish on the podium (1st, 2nd or 3rd) at the next race, the Japanese Grand Prix. Enjoy! 1. Gather the F1 Data Set up the same data frame as in the Regression model, including generating the “Prior Year Win Rate” and “Prior Year Podium Rate” for each driver and each race. We will use these same features in our machine learning model. With one line of code, we can get […]
Dynamic Driver Performance: Regression Model Update
This is an update to the first post: A Regression Model for Driver Win Probability. In that post, we treated the lifetime win rates and podium rates as constants for each driver. Although this made setting up the regression model easier, we got the unlikely result of Lewis Hamilton having the highest win probability in a hypothetical next race. As an improvement, we will upgrade the win rate and podium rate features to be dynamic year by year. For example, we will regress all of the 2018 wins against 2017 driver performance, 2019 wins against 2018 performance, and so on […]
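The year-by-year idea described above (2018 wins regressed against 2017 performance, and so on) comes down to building a lagged feature. Here is a minimal Python sketch of it. The post itself works in R; the function name, record layout, and sample results below are all hypothetical.

```python
from collections import defaultdict


def prior_year_win_rates(results):
    """Map (year, driver) to that driver's win rate in the previous year.

    `results` is an iterable of (year, driver, won) race records.
    """
    starts = defaultdict(int)
    wins = defaultdict(int)
    for year, driver, won in results:
        starts[(year, driver)] += 1
        wins[(year, driver)] += int(won)
    # The rate observed in year Y becomes the feature for year Y + 1.
    return {
        (year + 1, driver): wins[(year, driver)] / n
        for (year, driver), n in starts.items()
    }


# Made-up records, for illustration only.
sample_results = [
    (2017, "Driver A", True),
    (2017, "Driver A", False),
    (2017, "Driver B", False),
    (2017, "Driver B", False),
    (2018, "Driver A", False),
    (2018, "Driver B", True),
]
lagged = prior_year_win_rates(sample_results)
print(lagged[(2018, "Driver A")])
```

A driver with no races in the prior year simply has no entry, so a real model would need a rule for rookies (for example, a default rate of zero).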
Formula 1 Betting Markets
If you pull up a Formula 1 betting market online, it will look like this: Source: https://www.oddschecker.com/us/motorsport/formula-one/singapore-grand-prix/winner In this post we will look at how the F1 betting market works and relate this back to driver win probability. American Betting Spreads F1 spreads or “odds” are of the American variety (consider that horse race betting was the most popular form of gambling in the U.S. for a long time). There are just a few things to know about American betting spreads: (1) “Favorites” have a greater than 50% chance of winning, and have negative spreads. “Underdogs” have a lower than […]
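To relate American odds back to win probability, the standard conversion is simple arithmetic: negative odds of -X imply a probability of X / (X + 100), and positive odds of +X imply 100 / (X + 100). A short Python sketch, where `implied_probability` is a hypothetical helper name and the odds values are examples rather than real market quotes:

```python
def implied_probability(american_odds):
    """Convert American odds to the implied win probability (vig included)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        # Favorite: stake |odds| to win 100.
        stake = -american_odds
        return stake / (stake + 100)
    # Underdog: stake 100 to win `odds`.
    return 100 / (american_odds + 100)


print(implied_probability(-150))  # favorite
print(implied_probability(200))   # underdog
```

Because the bookmaker builds in a margin, the implied probabilities across all drivers in a real market add up to more than 100%, so they need to be rescaled before being read as true win probabilities.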
A Regression Model for Driver Win Probability
In this post, we will use the Formula 1 World Championship Dataset (available on kaggle.com) to estimate the race win probabilities for each driver in a hypothetical next race. The model we will use is a “multinomial logistic” regression. Multinomial means we have multiple possible outcomes (20 drivers who can win). Logistic means the coefficients of a linear model are passed through the logistic (softmax) function to assign win probabilities to each driver. R code can be downloaded at github.com/f1datadriver. Post questions and comments below! Step 1: Gather F1 Data into One Dataframe The only libraries needed are lubridate (for some date/time math), nnet […]
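The step from linear scores to win probabilities can be shown in a few lines. This is a Python sketch of the softmax function only, not the post's R code; the driver names and scores are invented, whereas in the real model each score would come from the fitted coefficients and that driver's features.

```python
import math


def softmax(scores):
    """Turn a dict of linear scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical linear scores for three drivers.
linear_scores = {"Driver A": 1.2, "Driver B": 0.4, "Driver C": -0.3}
win_probs = softmax(linear_scores)
print(win_probs)
```

A higher linear score always means a higher win probability, and the probabilities across all drivers add up to 1, which is what we want for a race with exactly one winner.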