In this post, we will train a machine learning model on a data set of face pictures that have been scored by real people for attractiveness.
Then we will use the trained model to generate an attractiveness “score” for all of the F1 drivers on the grid in 2023.
[Note: Full Python code is available at: https://github.com/f1datadriver/imagenet]
The SCUT 5500 Data Set
The SCUT 5500 data set (formally SCUT-FBP5500) is a collection of 5500 face images of Asian and Caucasian men and women.
You can see the original paper that built the data set here:
SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction: https://arxiv.org/abs/1801.06345
Each picture was rated from 1 to 5 by 60 different volunteers, and the ratings are averaged into a single “score” per image.
A “5” indicates a high degree of hotness; a “1” indicates the opposite.
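To make the scoring concrete, here is a minimal sketch of how a set of 1-5 ratings becomes one score. It assumes the per-image score is the plain mean of the volunteer ratings; the `image_score` helper and the five example ratings are made up for illustration (the real data set has 60 ratings per image).

```python
from statistics import mean


def image_score(ratings):
    """Average a list of 1-5 ratings into a single score (illustrative helper)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


# Made-up ratings for one image, not taken from the data set.
example_ratings = [5, 4, 5, 4, 5]
print(image_score(example_ratings))  # 4.6
```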
Here are a few example pictures from the data set and their corresponding scores:
We can pull up the image in the data set that has the highest score.
That would be this handsome gentleman with a score of 4.6:
We can do the same to find the most unfortunate-looking person in the entire data set.
This poor soul has a score of 1.02:
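Looking up these two extremes takes only a couple of lines once the labels are loaded. The sketch below assumes the labels are held in a dictionary mapping file name to score; the `extremes` helper and the file names are hypothetical, and only the 4.6 and 1.02 values come from the text above.

```python
def extremes(scores):
    """Return (highest-scored, lowest-scored) file names from a name -> score dict."""
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return best, worst


# Hypothetical file names; a real run would load the data set's label file instead.
scores = {"face_a.jpg": 3.1, "face_b.jpg": 4.6, "face_c.jpg": 1.02}
best, worst = extremes(scores)
print(best, scores[best])
print(worst, scores[worst])
```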
Our goal is to build a model that can generate scores for new images, in particular, pictures of F1 drivers.
The ResNet50 CNN
The type of machine learning algorithm that we will use is a Convolutional Neural Network (CNN).
CNNs happen to be useful for image recognition problems, like scoring pictures of F1 drivers for attractiveness.
Building a CNN, i.e. identifying the correct number of layers and nodes, is hard work.
Fortunately, a number of CNNs have already been designed and are available “off the shelf” for use in a wide range of applications.
The CNN we will use is called ResNet50, a state-of-the-art, award-winning CNN that is 50 layers deep.
The depth of the CNN is what gives it space to learn patterns contained in images.
The best part is, ResNet50 is ridiculously easy to deploy using the TensorFlow library in Python:
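What follows is a minimal sketch rather than the exact code from the repository. It assumes an ImageNet-pretrained ResNet50 backbone topped with a single linear output unit for the score; the `build_model` name, the 224x224 input size, and the one-unit regression head are illustrative choices.

```python
import tensorflow as tf


def build_model(weights="imagenet", input_shape=(224, 224, 3)):
    """ResNet50 backbone with a one-unit regression head (illustrative)."""
    # include_top=False drops the 1000-class ImageNet classifier;
    # pooling="avg" collapses the final feature map to one vector per image.
    backbone = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=input_shape,
        pooling="avg",
    )
    inputs = tf.keras.Input(shape=input_shape)
    features = backbone(inputs)
    # A single linear unit outputs one continuous score per image.
    score = tf.keras.layers.Dense(1)(features)
    return tf.keras.Model(inputs, score)
```

Calling `build_model()` downloads the ImageNet weights on first use; passing `weights=None` starts from random initialization instead.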
Training and Fitting the Model
[Note: See the full code for image data wrangling and train/test split on GitHub: https://github.com/f1datadriver/imagenet]
With the ResNet50 CNN handy, we now have to train it or “fit” it to the SCUT 5500 dataset.
ResNet50 is a general-purpose image model, so we need to fit the model’s weights to our specific objective: scoring face pictures for hotness.
To accomplish this, we use the compile and fit methods of the TensorFlow (Keras) model.
The compile step sets the loss function and the optimizer that adjusts the model’s weights to minimize it.
The fit step then trains the model by repeatedly passing over the images in the SCUT 5500 data set.
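A minimal sketch of these two steps, under stated assumptions: mean squared error as the loss, the Adam optimizer with a small learning rate, and images and scores already loaded as NumPy arrays. The `compile_and_fit` helper and its default values are illustrative, not the settings used in the repository.

```python
import tensorflow as tf


def compile_and_fit(model, train_images, train_scores,
                    epochs=10, batch_size=32, validation_split=0.1):
    """Compile a Keras model for score regression and train it.

    Returns the training loss recorded at the end of each epoch.
    """
    # Mean squared error penalizes the gap between predicted and human
    # scores; the optimizer and learning rate are assumed values.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="mean_squared_error",
    )
    # Each epoch is one full pass over the training images. A slice of the
    # data is held back so a validation loss is reported alongside it.
    history = model.fit(
        train_images,
        train_scores,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=validation_split,
        verbose=2,
    )
    return history.history["loss"]
```

The returned list has one loss value per epoch, which is what a loss-versus-epoch plot is drawn from.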
As shown here, the loss decreases with each epoch, or pass over the image data. This means that the model is gradually getting better at predicting the scores of the SCUT 5500 images it is trained on; the held-out test set is what tells us whether that improvement carries over to pictures it has not seen.
Now that we have a trained model, we can feed any new face pictures into it and generate scores.
This is done with the predict function:
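A minimal sketch of the prediction step, assuming the new pictures are already cropped, resized to the model's input size, and stacked into one array. The `score_faces` helper is illustrative, and it assumes the model was trained on images passed through ResNet50's standard `preprocess_input`; if training used a different preprocessing, the same one should be applied here.

```python
import numpy as np
import tensorflow as tf


def score_faces(model, images):
    """Return one predicted score per image.

    images: array of shape (n, height, width, 3) with RGB values in 0-255.
    """
    # Copy to float32 so the caller's array is not modified in place.
    batch = np.array(images, dtype="float32")
    # Apply the channel preprocessing that ResNet50 was pretrained with.
    batch = tf.keras.applications.resnet50.preprocess_input(batch)
    predictions = model.predict(batch, verbose=0)
    # The model outputs shape (n, 1); flatten to a plain vector of scores.
    return predictions.reshape(-1)
```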
Now let’s unleash the model on the 2023 F1 drivers.
F1 Drivers Ranked by Hotness
[Note: Percentiles are relative to the scores in the SCUT 5500 data set.]
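For readers wondering how a raw score becomes a percentile, here is a minimal sketch. It assumes the percentile is the share of SCUT 5500 scores at or below the driver's predicted score; the exact definition used for the ranking may differ, and the `percentile_rank` helper and the sample numbers are made up for illustration.

```python
from bisect import bisect_right


def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    # bisect_right counts how many reference values are <= score.
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Made-up reference scores standing in for the data set's 5500 labels.
reference = [1.0, 2.0, 3.0, 4.0]
print(percentile_rank(2.5, reference))  # 50.0
```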
My apologies to Oscar and Yuki, who are both very good looking in my opinion. That said:
Otherwise, congrats to the winners!