In this article, we look at finetuning a representation model, which covers both the Embed and Classify endpoints.
See here if you'd like to get an overview of finetuning and finetuning a generation model.
Text classification is one of the most common language understanding tasks. A lot of business use cases can be mapped to text classification. Examples include:
- Evaluating the tone and sentiment of an incoming customer message (e.g. classes: “positive” and “negative”)
- Routing incoming customer messages to the appropriate agent (e.g. classes: “billing”, “tech support”, “other”)
- Evaluating if a user comment needs to be flagged for moderator attention (e.g. classes: “flag for moderation”, “neutral”)
In this article, we'll finetune a representation model for sentiment classification.
Finetuning leads to the best classification results a language model can achieve. That said, non-finetuned baseline embeddings perform well on many tasks (see the text classification article for an example of how to train a sentiment classifier on top of baseline embedding models). But if we need an extra boost in performance, finetuning turns our LLM into a specialist for the task we care about.
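To make "a classifier on top of embeddings" concrete, here is a minimal sketch of a nearest-centroid classifier. The 2-D vectors and labels are made up for illustration; in practice each vector would be the embedding of a real message.

```python
# A toy nearest-centroid classifier over (hypothetical) sentence embeddings.
# Real embeddings would come from an embedding model; these hand-made 2-D
# points just keep the example self-contained.

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_predict(embedding, centroids):
    """Return the label whose class centroid is closest (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(embedding, centroids[label]))

# Hypothetical embeddings of labeled training messages
train = {
    "positive": [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]],
    "negative": [[-0.9, -0.7], [-1.0, -0.8], [-0.8, -1.1]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}

# Classify a new (hypothetical) embedding
print(nearest_centroid_predict([0.7, 0.9], centroids))  # → positive
```

The same idea scales to real embeddings: fit any lightweight classifier on the embedding vectors, with the class labels as targets.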
The finetune file is a comma-separated values (CSV) file with one column for the text and another for the corresponding class. Its contents can look like this:
The CSV file can be prepared in Excel or as plain text with a .csv extension (note: no header row, although there's an option to ignore the header):
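Such a file can also be written programmatically. In this sketch, the texts and labels are made up, and the file is written with no header row:

```python
import csv

# Hypothetical labeled examples: (text, class) pairs, no header row
rows = [
    ("The order arrived early and works great", "positive"),
    ("Still waiting on a refund, very frustrating", "negative"),
    ("Love the build quality", "positive"),
]

with open("sentiment_finetune.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # two columns: text first, class second
```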
That CSV file is then what you upload in the Representation finetuning dialog box in the Playground.
A representation LLM is excellent at generating sentence embeddings (lists of numbers that capture the meaning of the sentences). These embeddings are great at indicating how similar sentences are to each other. We can plot them to explore their similarities and differences (points that are close together have similar embeddings).
Consider a case where we have five customer messages. Visualizing their embeddings can look like this:
Such an embedding captures semantic similarity – so for example, messages about shipping are close to each other on the left.
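To make "close to each other" concrete: similarity between embeddings is commonly measured with cosine similarity. Here is a minimal sketch on hypothetical vectors, where the first two stand in for shipping-related messages and the third for an unrelated one:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: two shipping questions and a billing message
shipping_1 = [0.9, 0.1, 0.2]
shipping_2 = [0.8, 0.2, 0.1]
billing = [0.1, 0.9, 0.7]

print(cosine_similarity(shipping_1, shipping_2))  # high: similar meaning
print(cosine_similarity(shipping_1, billing))     # lower: different topic
```

Real sentence embeddings have hundreds or thousands of dimensions, but the comparison works the same way.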
If we want to build the best sentiment classifier, however, then we need our embedding model to care about sentiment more than it cares about semantic similarity.
If we colour the points depending on their sentiment, it could look like this:
Successfully finetuning a representation model on customer sentiment leads to a model which embeds sentences in this fashion:
After finetuning on customer sentiment, the embeddings of positive comments are similar to each other and distinct from those of negative comments, which leads to better sentiment classification results.
There are several things to take into account to achieve the best finetuned embeddings:
- Text cleaning: Improving the quality of the data is often the best investment in problem solving with machine learning. If the text contains, for example, symbols, URLs, or HTML code that are not needed for a specific task, make sure to remove them from the finetune file (and from the text you later send to the trained model).
- Number of examples: The minimum number of labeled examples is 250, though we advise having at least 500 to achieve good finetuning results. The more examples the better.
- Number of examples per class: In addition to the overall number of examples, it's important to have many examples of each class in the dataset.
- Mix of examples in dataset: We recommend that you have a balanced (roughly equal) number of examples per class in your dataset.
- Length of texts: The context size for text is currently 512 tokens; tokens beyond that are truncated.
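The checklist above can be sketched as a small validation pass over the rows of the finetune file. The 250-example minimum comes from this article; the cleaning patterns, the balance threshold, and the toy rows are illustrative assumptions:

```python
import re
from collections import Counter

MIN_EXAMPLES = 250  # minimum stated above; aim for 500+

def clean_text(text):
    """Strip URLs and HTML tags (illustrative cleaning; adapt to your task)."""
    text = re.sub(r"https?://\S+", "", text)   # remove URLs
    text = re.sub(r"<[^>]+>", "", text)        # remove HTML tags
    return " ".join(text.split())              # collapse extra whitespace

def check_dataset(rows):
    """rows: list of (text, label). Returns a dict of simple health checks."""
    labels = Counter(label for _, label in rows)
    smallest, largest = min(labels.values()), max(labels.values())
    return {
        "num_examples": len(rows),
        "enough_examples": len(rows) >= MIN_EXAMPLES,
        "per_class_counts": dict(labels),
        # "Roughly balanced" here means the rarest class has at least half
        # as many examples as the most common one (an illustrative threshold)
        "roughly_balanced": smallest >= largest / 2,
    }

# Hypothetical dataset: 150 examples per class
rows = [("good stuff", "positive"), ("bad stuff", "negative")] * 150
report = check_dataset(rows)
print(report["num_examples"], report["roughly_balanced"])  # → 300 True
```

Remember to apply the same cleaning to the text you later send to the trained model.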
Finetuning a representation model consists of a few simple steps. Let's go through them.
On the Cohere platform, go to the Dashboard and click on ‘Create Finetune’.
Choose ‘Representation (Embed,Classify)’ as the model type and select the size of your choice. There is a tradeoff: in general, bigger models perform better, while smaller models finetune faster.
Upload your training dataset by going to ‘Training data’ and clicking on ‘choose a .csv’. Your data should be in CSV format with exactly two columns: the first containing the examples and the second containing the labels.
Optionally, you can upload a validation dataset. This dataset is not used during finetuning; instead, it is used to evaluate the model’s performance after finetuning. To do so, go to ‘Validation data (optional)’ and repeat the same steps you just did with the training dataset. If you don’t upload a validation dataset, the platform will automatically set aside a validation dataset from the training dataset.
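If you'd rather control the hold-out yourself, the split can be sketched as a shuffled partition of the labeled rows. The 80/20 ratio below is an illustrative choice, not a platform requirement:

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle labeled rows and set aside a fraction for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_val = int(len(rows) * validation_fraction)
    return rows[n_val:], rows[:n_val]  # (train, validation)

# Hypothetical labeled examples
examples = [(f"message {i}", "positive" if i % 2 else "negative")
            for i in range(10)]
train_rows, val_rows = train_validation_split(examples)
print(len(train_rows), len(val_rows))  # → 8 2
```

Shuffling before splitting matters: if the file is sorted by label, an unshuffled tail would give a validation set containing only one class.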
Once done, click on ‘Preview data’.
The preview window will show a few samples of your training dataset, and if you uploaded it, your validation dataset.
If you are happy with how the samples look, click on ‘Review data’.
Now, everything is set for finetuning to begin. Click on ‘Start finetuning’ to proceed.
You can view the status of the finetuning by going to the Dashboard.
You can also monitor a more detailed log by hovering over your finetuning task and clicking on ‘View metrics’.
Here you can track the progress of the finetuning task. In 'Performance Graphs', you can monitor four metrics commonly used to evaluate classifier performance: Accuracy, F1, Precision, and Recall. Note that for classification tasks with more than two labels, we calculate the F1, Precision, and Recall scores by taking the macro average. You can read more about macro averaging in our blog post about classification evaluation metrics.
Once finetuning is completed, the status message will show as 'Finetune completed. Now you can test your model in the Playground'. Your model is now ready!