Finetuning Representation Models

In this article, we look at finetuning a representation model, which covers both the Embed and Classify endpoints.

See here if you'd like to get an overview of finetuning and finetuning a generation model.

A Text Classification Example#

Text classification is one of the most common language understanding tasks, and many business use cases map to it. Examples include:

  • Evaluating the tone and sentiment of an incoming customer message (e.g. classes: “positive” and “negative”)
  • Routing incoming customer messages to the appropriate agent (e.g. classes: “billing”, “tech support”, “other”)
  • Evaluating if a user comment needs to be flagged for moderator attention (e.g. classes: “flag for moderation”, “neutral”)

In this article, we'll finetune a representation model for sentiment classification.

A sentiment classifier assigns a piece of text as either 'positive' or 'negative'

Why Finetune a Representation Model#

Finetuning leads to the best classification results a language model can achieve. That said, non-finetuned baseline embeddings perform well on many tasks (see the text classification article for an example of training a sentiment classifier on top of baseline embedding models). But when we need that extra boost in performance, finetuning turns our LLM into a specialist for the task we care about.

How to Finetune a Representation Model#

The finetune file is a Comma Separated Values (CSV) file with one column for the text and another for the numeric class label. The contents of the file can look like this:

A table with example texts and a numeric label for each text (0 for negative texts, 1 for positive texts)

The CSV file can be prepared in Excel or as plain text with a .csv extension, like this (note: no header row, although there's an option to ignore a header during upload):

My order was late,0
Shipping was fast!,1
Order arrived on time,1
Items are always sold out,0

That CSV file is then what you upload in the Representation finetuning dialog box in the Playground.
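For illustration, the same file can be generated with Python's standard csv module (the file name here is arbitrary):

```python
import csv

# Labeled examples: (text, class label), no header row.
rows = [
    ("My order was late", 0),
    ("Shipping was fast!", 1),
    ("Order arrived on time", 1),
    ("Items are always sold out", 0),
]

# Write a finetune-ready CSV with exactly two columns.
with open("sentiment_finetune.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```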


What Finetuning a Representation Model Does#

A representation LLM is excellent at generating sentence embeddings (lists of numbers that capture the meaning of the sentences). These embeddings are great at indicating how similar sentences are to each other. We can plot them to explore their similarities and differences (points that are close together have similar embeddings).

Consider a case where we have five customer messages. Visualizing their embeddings can look like this:

Scatter plot of five message example. Three of them are about shipping and are clustered close together.

Such an embedding captures semantic similarity – so for example, messages about shipping are close to each other on the left.
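"Close to each other" here means high cosine similarity between the embedding vectors. A minimal sketch with toy three-dimensional vectors standing in for real sentence embeddings (actual embeddings have hundreds to thousands of dimensions; the numbers below are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins: two shipping-related messages and one unrelated one.
ship_1 = [0.9, 0.1, 0.0]   # "Shipping was fast!"
ship_2 = [0.8, 0.2, 0.1]   # "Order arrived on time"
stock  = [0.1, 0.2, 0.9]   # "Items are always sold out"

print(cosine_similarity(ship_1, ship_2))  # high: same topic
print(cosine_similarity(ship_1, stock))   # low: different topic
```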

If we want to build the best sentiment classifier, however, then we need our embedding model to care about sentiment more than it cares about semantic similarity.

If we colour the points depending on their sentiment, it could look like this:

The same scatter plot of the five messages, except the colours of the points indicate which messages are positive and which are negative.

Successfully finetuning a representation model on customer sentiment leads to a model which embeds sentences in this fashion:

A different scatterplot. Now positive messages are grouped together on the right, and negative messages are clustered together on the left

Finetuning an embedding model on customer sentiment yields embeddings where positive comments are similar to each other and distinct from those of negative comments, which leads to better sentiment classification results.
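Once the classes form separate clusters like this, even a very simple classifier on top of the embeddings works well. A sketch using nearest-centroid classification on toy 2-D vectors (the vectors and labels are invented to mimic the picture above; this is not Cohere's classifier):

```python
# Toy 2-D "embeddings" from a sentiment-finetuned model:
# positive messages cluster on the right, negative on the left.
positive = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.3]]
negative = [[-2.1, 0.2], [-1.9, -0.1]]

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

centroids = {"positive": centroid(positive), "negative": centroid(negative)}

def classify(vec):
    """Assign vec to the class with the nearest centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda c: dist(vec, centroids[c]))

print(classify([1.5, 0.0]))   # lands near the positive cluster
```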

Tips to improve embedding/finetune quality#

There are several things to take into account to achieve the best finetuned embeddings:

  • Text cleaning: Improving data quality is often the best investment when solving a problem with machine learning. If the text contains, for example, symbols, URLs, or HTML code that are not needed for the task, make sure to remove them from the finetune file (and from the text you later send to the trained model).
  • Number of examples: The minimum number of labeled examples is 250, though we advise having at least 500 to achieve good finetuning results. The more examples the better.
  • Number of examples per class: In addition to the overall number of examples, it's important to have many examples of each class in the dataset.
  • Mix of examples in dataset: We recommend that you have a balanced (roughly equal) number of examples per class in your dataset.
  • Length of texts: The context size for text is currently 512 tokens; anything longer is truncated.
  • Deduplication: Ensure that each labelled example in your dataset is unique.
  • High quality test set: In the data upload step, upload a separate test set of examples that you want to see the model benchmarked on. These can be examples that were manually written or verified.
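The text-cleaning and deduplication tips above can be sketched as a small preprocessing step (the regexes and helper names here are illustrative, not part of Cohere's tooling):

```python
import re

def clean_text(text: str) -> str:
    """Strip HTML tags and URLs, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

def dedupe(examples):
    """Keep only the first occurrence of each text."""
    seen, unique = set(), []
    for text, label in examples:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique

examples = [
    ("Loved it! <br> See https://example.com for pics", 1),
    ("Loved it! <br> See https://example.com for pics", 1),
    ("My order was late", 0),
]
cleaned = [(clean_text(t), y) for t, y in dedupe(examples)]
print(cleaned)
```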

Finetuning a Representation Model: Step-by-step#

Finetuning a representation model consists of a few simple steps. Let's go through them.

On the Cohere platform, go to the Dashboard and click on ‘Create Finetune’.

Creating a finetune

Choose the Baseline Model#

Choose ‘Representation (Embed,Classify)’ as the model type and select the size of your choice. There is a tradeoff—in general, bigger models exhibit better performance while smaller models are faster to finetune.

Upload Your Data#

Upload your training dataset data by going to ‘Training data’ and clicking on ‘choose a .csv’. Your data should be in CSV format with exactly two columns—the first and second columns consisting of the examples and labels respectively.
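Before uploading, it can be worth sanity-checking the rows: exactly two columns each, and a roughly balanced class distribution (the helper below is an illustrative sketch, not a Cohere utility):

```python
from collections import Counter

def check_examples(rows, min_examples=250):
    """Verify two columns per row and report the class balance."""
    bad = [r for r in rows if len(r) != 2]
    assert not bad, f"{len(bad)} rows do not have exactly two columns"
    if len(rows) < min_examples:
        print(f"warning: {len(rows)} examples; at least {min_examples} are required")
    return Counter(label for _, label in rows)

rows = [
    ("My order was late", 0),
    ("Shipping was fast!", 1),
    ("Order arrived on time", 1),
    ("Items are always sold out", 0),
]
print(check_examples(rows))  # warns (too few examples) and shows class counts
```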

Choosing model and uploading data

Optionally, you can upload a validation dataset. This will not be used during finetuning but instead, will be used for evaluating the model’s performance post-finetuning. To do so, go to ‘Validation data (optional)’ and repeat the same steps you just did with the training dataset. If you don’t upload a validation dataset, the platform will automatically set aside a validation dataset from the training dataset.

Once done, click on ‘Preview data’.

Previewing data

Preview Your Data#

The preview window will show a few samples of your training dataset, and if you uploaded it, your validation dataset.

If you are happy with how the samples look, click on ‘Review data’.

Reviewing data

Start Finetuning#

Now, everything is set for finetuning to begin. Click on ‘Start finetuning’ to proceed.

Starting finetuning

Monitor the Status#

You can view the status of the finetuning by going to the Dashboard.

Monitoring status

You can also monitor a more detailed log by hovering over your finetuning task and clicking on ‘View metrics'.

Viewing metrics

Here you can track the progress of the finetuning task. In 'Performance Graphs', you can monitor four metrics commonly used to evaluate a classifier's performance: Accuracy, F1, Precision, and Recall. Note that for classification tasks with more than two labels, we calculate F1, Precision, and Recall by taking the macro average. You can read more about macro averaging in our blog post about classification evaluation metrics.
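As a concrete illustration of macro averaging: each metric is computed per class and then averaged with equal weight across classes. A pure-Python sketch (the labels below are toy data, not output from the platform):

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```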

Once finetuning is completed, the status message will show as 'Finetune completed. Now you can test your model in the Playground'. Your model is now ready!

Message showing finetune completion and graphs showing performance metrics

We can’t wait to see what you start building! Share your projects or find support on our Discord or Forum.