In this post, we answer frequently asked questions about finetuning.
While our Classify endpoint enables a user to build a classifier with just 5 examples per label, that classifier runs on our baseline model, which has not been trained for specific use cases. To start training a custom model, your dataset must contain at least 250 labelled examples.
If you are unable to locate a relevant labelled dataset from online sources, we suggest trying to generate labelled examples using our Generate endpoint. Check out this sample preset of a user generating product feedback to finetune a product feedback classifier:
Ensure your data is in a two-column CSV. One column should be the sample text you'd like to classify or search, and the second column should be a label for the text. We recommend using a comma (`,`) as your delimiter.
Here are a few example lines from a dataset that could be used to train a model that classifies headlines as `neutral` or another label with our Classify endpoint:
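For illustration, a hypothetical two-column dataset in this shape might look like the following (the headlines below are invented examples, not real training data):

```csv
text,label
"Local council votes to repaint the town library",neutral
"Chipmaker shares surge after record quarterly earnings",economy
"New open-source framework simplifies on-device inference",technology
```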
To pass data validation, ensure that:
- There are at least 5 examples for each label in your dataset
- Your dataset contains at least 250 unique examples in total (not 250 examples per label)
- Your data is encoded in UTF-8
- There are no duplicate examples (We will automatically deduplicate your dataset)
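You can check these rules locally before uploading. Below is a minimal sketch using only the Python standard library; the function name and the assumption that the CSV has the text in the first column and the label in the second are ours, not part of the API:

```python
import csv
from collections import Counter

def validate_dataset(path):
    """Check a two-column (text, label) CSV against the validation rules."""
    # Opening with an explicit encoding raises UnicodeDecodeError if the
    # file is not valid UTF-8.
    with open(path, encoding="utf-8") as f:
        rows = [tuple(r) for r in csv.reader(f) if r]

    unique_rows = set(rows)
    if len(unique_rows) != len(rows):
        print("Note: duplicate examples found; they would be deduplicated automatically.")

    if len(unique_rows) < 250:
        raise ValueError(f"Need at least 250 unique examples, found {len(unique_rows)}")

    label_counts = Counter(label for _, label in unique_rows)
    for label, count in label_counts.items():
        if count < 5:
            raise ValueError(f"Label {label!r} has only {count} examples; need at least 5")

    return label_counts
```

Running this before upload surfaces the same problems data validation would catch, without waiting on a failed upload.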
Cohere's Classify endpoint returns prediction confidences for classes that sum to 1. We currently do not support outputting multiple labels for one example (known as multi-label classification); each example text should be mapped to exactly one label.
Take this example below:
Currently, we will process this data and train with two labels (`technology` and `economy`) instead of the desired three labels. In this case, you will need to select one label for each headline.
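One way to repair rows that carry multiple comma-separated labels is to keep a single label per row. The sketch below keeps the first listed label; that tie-breaking rule is our assumption, and you may prefer to pick the most relevant label by hand:

```python
def to_single_label(rows):
    """Given (text, label) pairs, keep exactly one label per example.

    Rows like ("headline", "technology,economy") become
    ("headline", "technology") by keeping the first listed label.
    """
    fixed = []
    for text, label in rows:
        first = label.split(",")[0].strip()
        fixed.append((text, first))
    return fixed
```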
At this time, if you intend to finetune a representation model for use with Cohere's Embed endpoint on a search task (rather than label prediction), you will still need to assign a label to each text for representation finetuning.
For example, if you are building a search engine for Hacker News posts and you want to either cluster similar posts or associate posts with a certain keyword, you would create a labelled dataset with the post titles mapped to the keyword. See a few sample lines labelled below:
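As an illustration, such a dataset might look like this (the post titles and keywords below are invented, not drawn from Hacker News):

```csv
text,label
"Show HN: A lock-free queue library in Rust",rust
"Ask HN: How do you structure large Go services?",go
"Why our startup moved back to Postgres",databases
```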
If you are topic modelling and trying to find clusters, we recommend trying the baseline model. Check out our blog post on topic modelling Hacker News posts.
When you are viewing auto evaluation metrics during or after your finetune, you may find that the F1, Recall, and Precision metrics are missing. This can occur if your dataset is extremely imbalanced (e.g. a binary dataset with 95% positive labels and 5% negative labels) and the finetuned model fails to predict one of the labels at all. This does not prevent you from using the finetuned model; it is simply a warning.
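To see why the metrics go missing, consider precision for a label the model never predicts: the denominator (true positives plus false positives) is zero, so the value is undefined. A toy illustration (the predictions below are made up, not real model output):

```python
def precision(y_true, y_pred, label):
    """Precision for one label; None when the label is never predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    if tp + fp == 0:
        return None  # label never predicted: precision is undefined
    return tp / (tp + fp)

# A heavily imbalanced set where the model always predicts "positive":
y_true = ["positive"] * 19 + ["negative"]
y_pred = ["positive"] * 20
# precision(y_true, y_pred, "negative") is undefined (None), which is
# why the aggregate metrics cannot be reported.
```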
To resolve this warning, try adding more examples for the underrepresented labels.
Finetunes are completed sequentially; when you launch a finetune, it is added to the end of a queue. Depending on the length of the queue, a finetune may take anywhere from one hour to a day to complete.
To use your finetuned model in our API or SDKs, you must call it by its model UUID, not by its name. To get the model UUID, select the model in the playground and click Export Code. Select the library you are using and copy the code to call the model.
All finetuned models are paused after 14 days of inactivity. To restart your model, select it in the finetuned models panel and click the Wake button, pictured below:
Our engineers review every failed finetune and will attempt to rectify it without any action on your part. If we cannot resolve a failed finetune manually, we will reach out to you.