Dynamic Driver Performance: Regression Model Update

This is an update to the first post: A Regression Model for Driver Win Probability. In that post, we treated the lifetime win rates and podium rates as constants for each driver. Although this made setting up the regression model easier, we got the unlikely result of Lewis Hamilton having the highest win probability in a hypothetical next race. As an improvement, we will upgrade the win rate and podium rate features to be dynamic year by year. For example, we will regress all of the 2018 wins against 2017 driver performance, 2019 wins against 2018 performance, and so on […]
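As a rough illustration of the year-by-year setup described above, the sketch below pairs each driver's wins in one season with that driver's win rate from the season before. It is written in Python rather than the R used in the posts, and the driver labels, the numbers, and the `lagged_rows` helper are all made up for illustration; none of them come from the dataset or the original code.

```python
# Toy season summaries: made-up numbers and hypothetical driver labels.
season_stats = [
    # (year, driver, races, wins)
    (2017, "driver_a", 20, 9),
    (2017, "driver_b", 20, 5),
    (2018, "driver_a", 21, 11),
    (2018, "driver_b", 21, 5),
    (2019, "driver_a", 21, 11),
]


def lagged_rows(stats):
    """Pair each driver's wins in year t with their win rate in year t - 1."""
    win_rate = {
        (year, driver): wins / races for year, driver, races, wins in stats
    }
    rows = []
    for year, driver, races, wins in stats:
        prior = win_rate.get((year - 1, driver))
        if prior is None:
            continue  # no previous season available to use as a feature
        rows.append(
            {"year": year, "driver": driver, "wins": wins, "prior_win_rate": prior}
        )
    return rows


rows = lagged_rows(season_stats)
for row in rows:
    print(row)
```

Each resulting row is one training example: the outcome comes from the current season and the feature comes from the prior one, so a driver's debut season is dropped because it has no history to draw on.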

Formula 1 Betting Markets

If you pull up a Formula 1 betting market online, it will look like this:

Source: https://www.oddschecker.com/us/motorsport/formula-one/singapore-grand-prix/winner

In this post we will look at how the F1 betting market works and relate this back to driver win probability.

American Betting Spreads

F1 spreads or “odds” are of the American variety (consider that horse race betting was the most popular form of gambling in the U.S. for a long time). There are just a few things to know about American betting spreads: (1) “Favorites” have a greater than 50% chance of winning, and have negative spreads. “Underdogs” have a lower than […]
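To connect American odds to win probability, here is a minimal Python sketch of the standard conversion from a quoted price to its implied probability. The function name `implied_probability` and the example prices of -150 and +200 are illustrative assumptions, not values taken from the market shown above, and the result still includes the bookmaker's margin.

```python
def implied_probability(american_odds: float) -> float:
    """Convert American odds to the implied win probability.

    Negative odds (e.g. -150) mean you stake 150 to win 100.
    Positive odds (e.g. +200) mean you stake 100 to win 200.
    """
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        stake = -american_odds
        return stake / (stake + 100.0)
    return 100.0 / (american_odds + 100.0)


# Illustrative prices only (not taken from a real market).
for price in (-150, 200):
    print(price, round(implied_probability(price), 3))
```

Under this conversion, any negative price implies a probability above 50% and any positive price implies one below 50%.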

A Regression Model for Driver Win Probability

In this post, we will use the Formula 1 World Championship Dataset (available on kaggle.com) to estimate the race win probabilities for each driver in a hypothetical next race. The model we will use is a “multinomial logistic” regression. Multinomial means we have multiple possible outcomes (20 drivers who can win). Logistic means we will pass a linear combination of each driver's features through the logistic (softmax) function to assign win probabilities to each driver.

R code can be downloaded at github.com/f1datadriver

Post questions and comments below!

Step 1: Gather F1 Data into One Dataframe

The only libraries needed are lubridate (for some date/time math), nnet […]
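To make the “multinomial logistic” idea from the introduction concrete, the Python sketch below turns one linear score per driver into win probabilities that sum to 1, using the softmax function. It is not the post's R code: the `softmax` helper and the three example scores are assumptions for illustration, and a fitted model would supply the real scores.

```python
import math


def softmax(scores):
    """Map a list of linear scores to probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear scores for three drivers (illustrative values only).
linear_scores = [1.2, 0.4, -0.3]
win_probs = softmax(linear_scores)
print([round(p, 3) for p in win_probs])
```

A higher score always maps to a higher probability, and because the probabilities are normalized across drivers, raising one driver's score lowers everyone else's share.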