Finetuning Representation Models
In this article, we look at finetuning a representation model, which covers both the Embed and Classify endpoints.
See here if you'd like to get an overview of finetuning and finetuning a generation model.
# A Text Classification Example
Text classification is one of the most common language understanding tasks. A lot of business use cases can be mapped to text classification. Examples include:
- Evaluating the tone and sentiment of an incoming customer message (e.g. classes: "positive" and "negative")
- Routing incoming customer messages to the appropriate agent (e.g. classes: "billing", "tech support", "other")
- Evaluating if a user comment needs to be flagged for moderator attention (e.g. classes: "flag for moderation", "neutral")
In this article, we'll finetune a representation model for sentiment classification.

# Why Finetune a Representation Model
Finetuning leads to the best classification results a language model can achieve. That said, non-finetuned, baseline embeddings can perform well on a lot of tasks (see the text classification article for an example of how to train a sentiment classifier on top of baseline embedding models). But if we need that extra boost in performance, finetuning turns our LLM into a specialist for the task we care about.
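As a rough sketch of that baseline approach (not the finetuning flow described below), one way is to embed the labeled texts and fit a lightweight classifier on top of the embeddings. The SDK calls and model name here are assumptions based on the Cohere Python SDK and scikit-learn:

```python
import cohere
from sklearn.linear_model import LogisticRegression

co = cohere.Client("YOUR_API_KEY")  # assumed Cohere Python SDK client

train_texts = [
    "The package arrived on time and works great",
    "Still no reply from support about my broken item",
    "Love the quality, will definitely order again",
    "This is the worst purchase I have ever made",
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Embed the texts with a baseline (non-finetuned) representation model.
# The model name is illustrative; use whichever embedding model you have access to.
embeddings = co.embed(texts=train_texts, model="embed-english-v2.0").embeddings

# Fit a simple classifier on top of the baseline embeddings.
classifier = LogisticRegression().fit(embeddings, train_labels)

# Classify a new message.
new_embedding = co.embed(texts=["Arrived broken and late"], model="embed-english-v2.0").embeddings
print(classifier.predict(new_embedding))  # e.g. [0] for negative
```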
# How to Finetune a Representation Model
The finetune file is a Comma-Separated Values (CSV) file with a column for the text and another for the number of the class. The contents of that file can look like this:

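(The rows below are made up for illustration, with the text in the first column and the class number in the second; here 1 stands for positive and 0 for negative.)

```csv
The delivery was fast and the product works perfectly,1
I never received my order and support has not replied,0
Great quality for the price,1
The item arrived damaged and I want a refund,0
```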
The CSV file can be prepared in Excel or as plain text with a .csv extension (note: no header row, though there is an option to ignore a header if one is present).
That CSV file is then what you upload in the Representation finetuning dialog box in the Playground.
# What Finetuning a Representation Model Does
A representation LLM is excellent at generating sentence embeddings (lists of numbers that capture the meaning of the sentences). These embeddings are great at indicating how similar sentences are to each other. We can plot them to explore their similarities and differences (points that are close together have similar embeddings).
Consider a case where we have five customer messages. Visualizing their embeddings can look like this:

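As a rough sketch of how such a plot could be produced (the messages, model name, and SDK calls below are illustrative assumptions; any dimensionality-reduction method would do for the 2D projection):

```python
import cohere
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

co = cohere.Client("YOUR_API_KEY")  # assumed Cohere Python SDK client

messages = [
    "Where is my package? It was supposed to arrive yesterday",
    "My order shipped to the wrong address",
    "The product works exactly as described, thank you",
    "I want a refund, this is a terrible product",
    "Great customer service, my issue was resolved quickly",
]

# Embed the messages, then project the high-dimensional embeddings down to 2D.
embeddings = co.embed(texts=messages, model="embed-english-v2.0").embeddings
points = PCA(n_components=2).fit_transform(embeddings)

# Plot each message as a point; nearby points have similar embeddings.
plt.scatter(points[:, 0], points[:, 1])
for (x, y), message in zip(points, messages):
    plt.annotate(message[:30], (x, y))
plt.show()
```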
Such an embedding captures semantic similarity; for example, messages about shipping are close to each other on the left.
If we want to build the best sentiment classifier, however, then we need our embedding model to care about sentiment more than it cares about semantic similarity.
If we colour the points depending on their sentiment, it could look like this:

Successfully finetuning a representation model on customer sentiment leads to a model which embeds sentences in this fashion:

After finetuning on customer sentiment, the embeddings of positive comments are similar to each other and distinct from those of negative comments. This leads to better sentiment classification results.
# Tips to improve embedding/finetune quality
There are several things to take into account to achieve the best finetuned embeddings:
- Text cleaning: Improving the quality of the data is often the best investment in problem solving with machine learning. If the text contains, for example, symbols, URLs, or HTML code that are not needed for the task, make sure to remove them from the finetune file (and from the text you later send to the trained model); see the sketch after this list.
- Number of examples: The minimum number of labeled examples is 250, though we advise having at least 500 to achieve good finetuning results. The more examples the better.
- Number of examples per class: In addition to the overall number of examples, it's important to have many examples of each class in the dataset.
- Mix of examples in dataset: We recommend that you have a balanced (roughly equal) number of examples per class in your dataset.
- Length of texts: The context size for text is currently 512 tokens; tokens beyond that are truncated.
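As a minimal sketch of the text-cleaning step mentioned in the first tip above (the regular expressions are illustrative and should be adapted to your own data):

```python
import re

def clean_text(text: str) -> str:
    """Remove URLs and HTML tags that are not needed for the task."""
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    return re.sub(r"\s+", " ", text).strip()   # collapse extra whitespace

print(clean_text("Loved it! <b>5 stars</b> https://example.com"))
# -> "Loved it! 5 stars"
```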
# Finetuning a Representation Model: Step-by-step
Finetuning a representation model consists of a few simple steps. Let's go through them.
On the Cohere platform, go to the Dashboard and click on "Create Finetune".
Creating a finetune
# Choose the Baseline Model
Choose "Representation (Embed, Classify)" as the model type and select the size of your choice. There is a tradeoff: in general, bigger models exhibit better performance, while smaller models are faster to finetune.
# Upload Your Data
Upload your training dataset by going to "Training data" and clicking on "choose a .csv". Your data should be in CSV format with exactly two columns: the first containing the examples and the second containing the labels.
Choosing model and uploading data
Optionally, you can upload a validation dataset. This will not be used during finetuning; instead, it will be used to evaluate the model's performance post-finetuning. To do so, go to "Validation data (optional)" and repeat the same steps you just did with the training dataset. If you don't upload a validation dataset, the platform will automatically set aside a validation dataset from the training dataset.
Once done, click on "Preview data".
Previewing data
# Preview Your Data
The preview window will show a few samples of your training dataset and, if you uploaded one, your validation dataset.
If you are happy with how the samples look, click on "Review data".
Reviewing data
# Start Finetuning
Now, everything is set for finetuning to begin. Click on "Start finetuning" to proceed.
Starting finetuning
# Monitor the Status
You can view the status of the finetuning by going to the Dashboard.
Monitoring status
You can also monitor a more detailed log by hovering over your finetuning task and clicking on "View metrics".
Viewing metrics
Here you can track the progress of the finetuning task. In "Performance Graphs", you can monitor four metrics commonly used to evaluate classifier performance: Accuracy, F1, Precision, and Recall. Note that for classification tasks with more than 2 labels, we calculate the F1, Precision, and Recall scores by taking the macro average. You can read more about macro averaging in our blog post about classification evaluation metrics.
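As an aside, macro averaging computes each metric per class and then takes the unweighted mean across the classes. If you want to reproduce these scores on your own predictions, a quick sketch using scikit-learn (the labels below are made up) looks like this:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical true and predicted labels for a 3-class task (classes 0, 1, 2).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

# Each score is computed per class and then averaged with equal weight per class.
print(f1_score(y_true, y_pred, average="macro"))
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
```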
Once finetuning is completed, the status message will show as 'Finetune completed. Now you can test your model in the Playground'. Your model is now ready!
Message showing finetune completion and graphs showing performance metrics
We can't wait to see what you start building! Share your projects or find support on our Discord or Forum.