Finetuning

Finetuning is the process of taking a pre-trained Large Language Model (LLM) and customizing it with a dataset to excel at a specific task. Finetuned LLMs tend to be among the best-performing models in NLP across a wide range of tasks.

A baseline model comes pre-trained on a huge amount of text data. Finetuning builds on that by adapting the model to your own training data. The result is a custom, finetuned model that produces outputs more attuned to the task at hand.
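To make the workflow concrete, here is a minimal, generic sketch of finetuning a pre-trained model on your own labeled data using the open-source Hugging Face transformers and datasets libraries. This is not any particular platform's finetuning API; the model name, example texts, labels, and hyperparameters are placeholders.

```python
# Generic finetuning sketch: adapt a pre-trained model to a 3-class
# classification task. All names and hyperparameters are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny illustrative training set: each example is a text plus one of three classes.
train_data = {
    "text": ["great product, works perfectly",
             "arrived broken and support never replied",
             "it does the job, nothing special"],
    "label": [0, 1, 2],  # e.g. 0=positive, 1=negative, 2=neutral
}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3
)

# Tokenize the raw text so the model can consume it.
dataset = Dataset.from_dict(train_data).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # updates the pre-trained weights on the custom data
```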

See here for how to finetune a generation model and here for a representation model.

Finetuning uses training data to turn a baseline pre-trained model into a custom, finetuned model

Intuition

Let's take a representation model as an example, where we finetune a model for a classification task with training data consisting of three classes.

To get an idea of how a representation model performs, we can project the embeddings it generates onto a 2-dimensional plot, as in the image below. This image was taken from actual model outputs in the Playground.
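A quick sketch of this kind of projection is shown below, assuming `embeddings` is an array of shape (n_samples, embedding_dim) produced by a representation model. The embeddings here are random stand-ins, and PCA is just one of several possible dimensionality-reduction choices.

```python
# Project embeddings onto a 2-D plot, one color per class.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(15, 768))          # stand-in for real model outputs
labels = np.repeat(["class_a", "class_b", "class_c"], 5)

coords = PCA(n_components=2).fit_transform(embeddings)  # reduce to 2 dimensions

for name in np.unique(labels):
    mask = labels == name
    plt.scatter(coords[mask, 0], coords[mask, 1], label=name)
plt.legend()
plt.title("Embeddings projected to 2-D")
plt.show()
```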

The distance between two data points reflects how semantically similar they are: the closer the points, the more similar they are, and vice versa. A good model will show a clear separation between classes. To test the model, we use fifteen data points, five per class, whose classes are unknown to the model.
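One simple way to quantify this separation, reusing `coords` and `labels` from the sketch above, is to compare the average distance between points that share a class with the average distance between points from different classes. This is an illustrative measure, not the metric used to produce the plots.

```python
# Rough class-separation measure: intra-class vs inter-class distances.
import numpy as np
from scipy.spatial.distance import pdist, squareform

dists = squareform(pdist(coords))                # pairwise Euclidean distances
same_class = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(labels), dtype=bool)

intra = dists[same_class & off_diag].mean()      # same-class (smaller is better)
inter = dists[~same_class].mean()                # cross-class (larger is better)
print(f"intra-class: {intra:.2f}  inter-class: {inter:.2f}")
# A finetuned model should push intra down and inter up relative to the baseline.
```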

With the baseline model (left plot), we already get a good separation between classes, which shows that it can perform well at this task.

But with the finetuned model (right plot), the separation becomes even more apparent. Similar data points are pushed closer together and further away from the other classes. This indicates that the model has adapted to the additional data it received during finetuning and is therefore likely to perform even better at this task.

In real applications, this makes a huge difference. One example is a toxicity classifier that helps content moderators automatically flag toxic content on their platforms. Not all online platforms define toxicity the same way, and each has different language nuances to accommodate. For example, a gaming platform, an online community for kids, and a social media platform would each interpret the exact same data differently. This is where finetuning helps: the model can be customized to each platform's specific needs.
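The sketch below shows, in a purely hypothetical format, how two platforms might label the very same messages differently when preparing finetuning data for their own toxicity classifiers. The texts, label names, and file layout are made up for illustration.

```python
# Hypothetical finetuning data: same messages, different platform-specific labels.
import json

kids_platform = [
    {"text": "you played terribly today", "label": "toxic"},
    {"text": "that strategy was useless", "label": "toxic"},
]
gaming_platform = [
    {"text": "you played terribly today", "label": "not_toxic"},  # common banter
    {"text": "that strategy was useless", "label": "not_toxic"},
]

# Each platform finetunes on its own labels, producing a classifier that
# reflects its own definition of toxicity.
with open("kids_platform_train.jsonl", "w") as f:
    for example in kids_platform:
        f.write(json.dumps(example) + "\n")
```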

Finetuning creates a custom model that adapts to the training data