Comparing Baseline and Custom Models
Token likelihoods are a useful tool for model evaluation. For instance, suppose you've trained a custom model and would like to know how much it improves on the default model. You could compare the likelihoods the two models assign to some held-out text. Here is a quick demonstration of how to use the return_likelihoods parameter of the Generate endpoint for model evaluation.
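To make the metric concrete, here is a minimal sketch of an average log-likelihood, assuming it is a plain arithmetic mean of per-token log-likelihoods. The function name and the token values are hypothetical and chosen only for illustration; they are not output from the API.

```python
from typing import Sequence


def average_log_likelihood(token_log_likelihoods: Sequence[float]) -> float:
    """Return the arithmetic mean of per-token log-likelihoods."""
    if not token_log_likelihoods:
        raise ValueError("need at least one token log-likelihood")
    return sum(token_log_likelihoods) / len(token_log_likelihoods)


# Hypothetical per-token values, for illustration only (not real API output).
example_tokens = [-0.5, -1.5, -4.0, -2.0]
example_average = average_log_likelihood(example_tokens)
print(example_average)
```

Values closer to zero mean the model found the text less surprising, so a higher (less negative) average indicates a better fit to the held-out text.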
Example Setup
Let's say we've custom trained a medium model on Shakespeare data. We'd like to confirm that this custom model assigns a higher likelihood to Shakespeare text than the default model does. To do this, we could hold out the following snippet from the training data:
"To be, or not to be: that is the question:"
"Whether ’tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune,"
"Or to take arms against a sea of troubles,"
"And by opposing end them. To die: to sleep..."
Then we could use the following example code to retrieve the average log-likelihood of the above snippet:
curl --location --request POST 'https://api.cohere.ai/generate' \
--header 'Authorization: BEARER {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "medium",
"prompt": "To be, or not to be: that is the question: Whether ’tis nobler in the mind to suffer The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles, And by opposing end them. To die: to sleep...",
"max_tokens": 1,
"temperature": 1,
"k": 0,
"p": 0.75,
"return_likelihoods": "ALL"
}'
import cohere
co = cohere.Client('{api_key}')
response = co.generate(
model='medium',
prompt='To be, or not to be: that is the question: Whether ’tis nobler in the mind to suffer The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles, And by opposing end them. To die: to sleep...',
max_tokens=1,
temperature=1,
k=0,
p=0.75,
return_likelihoods='ALL')
print('Likelihood: {}'.format(response.generations[0].likelihood))
const cohere = require('cohere-ai');
cohere.init('{api_key}');
(async () => {
const response = await cohere.generate({
model: 'medium',
prompt: 'To be, or not to be: that is the question: Whether ’tis nobler in the mind to suffer The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles, And by opposing end them. To die: to sleep...',
max_tokens: 1,
temperature: 1,
k: 0,
p: 0.75,
return_likelihoods: 'ALL'
});
console.log(`Likelihood: ${response.body.generations[0].likelihood}`);
})();
package main
import (
"fmt"
cohere "github.com/cohere-ai/cohere-go"
)
func main() {
co, err := cohere.CreateClient("{api_key}")
if err != nil {
fmt.Println(err)
return
}
response, err := co.Generate(cohere.GenerateOptions{
Model: "medium",
Prompt: `To be, or not to be: that is the question: Whether ’tis nobler in the mind to suffer The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles, And by opposing end them. To die: to sleep...`,
MaxTokens: 1,
Temperature: 1,
K: 0,
P: 0.75,
ReturnLikelihoods: "ALL",
})
if err != nil {
fmt.Println(err)
return
}
fmt.Println("Likelihood:", *response.Generations[0].Likelihood)
}
co model generate medium 'To be, or not to be: that is the question: Whether ’tis nobler in the mind to suffer The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles, And by opposing end them. To die: to sleep...' --max-tokens=1 --temperature=1 --k=0 --p=0.75 --return_likelihoods={likelihoods}
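To compare several models on the same held-out text, the request can be wrapped in a small helper. The sketch below is one possible arrangement, not part of the SDK: `compare_models`, `best_model`, and `stub_score` are hypothetical names. The scoring function is passed in so the example runs offline; in practice it could wrap a Generate call with return_likelihoods set to ALL and return the reported likelihood. The model name `custom-medium` is a placeholder for your custom model's identifier.

```python
from typing import Callable, Dict, Iterable


def compare_models(
    models: Iterable[str],
    text: str,
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score the same held-out text with each model and collect the results."""
    return {model: score(model, text) for model in models}


def best_model(results: Dict[str, float]) -> str:
    """Return the model with the highest (least negative) average log-likelihood."""
    return max(results, key=lambda name: results[name])


def stub_score(model: str, text: str) -> float:
    """Offline stand-in for an API call; a real scorer would query the model."""
    canned = {"medium": -2.99, "custom-medium": -1.12}
    return canned[model]


held_out = "To be, or not to be: that is the question:"
results = compare_models(["medium", "custom-medium"], held_out, stub_score)
for name, value in results.items():
    print(f"{name}: {value}")
print("Best:", best_model(results))
```

Passing the scorer as an argument keeps the comparison logic separate from the network call, which also makes it easy to test.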
Results
The following are the average log-likelihoods of the snippet using the baseline and custom medium models:
Model | Average Log-Likelihood |
---|---|
medium | -2.99 |
custom-medium | -1.12 |
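This demonstrates that customizing the model increased the likelihood it assigns to Shakespeare data: the average log-likelihood rose from -2.99 to -1.12, and values closer to zero indicate a better fit.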
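The gap between the two averages can be easier to interpret on a linear scale. The sketch below converts each average to a perplexity and computes the per-token likelihood ratio. It assumes the log-likelihoods are natural logarithms, which this page does not state; with a different base the numbers would change.

```python
import math

# Average log-likelihoods from the results table.
baseline_avg = -2.99  # medium
custom_avg = -1.12    # custom-medium


def perplexity(avg_log_likelihood: float) -> float:
    """Convert an average log-likelihood to perplexity (assumes natural log)."""
    return math.exp(-avg_log_likelihood)


# How many times more likely, per token on average, the custom model
# finds the snippet compared with the baseline (assumes natural log).
per_token_ratio = math.exp(custom_avg - baseline_avg)

print(f"Baseline perplexity: {perplexity(baseline_avg):.2f}")
print(f"Custom perplexity:   {perplexity(custom_avg):.2f}")
print(f"Per-token likelihood ratio: {per_token_ratio:.2f}")
```

Under that assumption, the custom model assigns roughly 6.5 times more probability to each token of the snippet on average, and its perplexity is about 3.1 versus about 19.9 for the baseline.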