Training Custom Models

An Overview of Model Training

Cohere's platform lets you train a Large Language Model (LLM) and customize it with a dataset so that it excels at a specific task. Custom models can be among the best-performing NLP models for a wide range of tasks.

In this article, we look at training a generation model. See here for training a representation model.

Custom models use training data to turn a baseline model into a fine-tuned model.

When to Train a Model

Training large language models is only required when you need to teach the model something extremely niche, like the different gaits of a horse or your company's unique knowledge base. Common knowledge, like the colour of the sky, does not require training. Training is also helpful for generating or understanding data in a specific writing style or format.

Intuition

Let's take a representation model as an example, where we fine-tune a model for a classification task using training data that consists of three classes.

To get an idea of how a representation model performs, we can project the embeddings it generates on a 2-dimensional plot, as per the image below. This image was taken from actual model outputs in the Playground.

The distance between two data points represents how semantically similar they are: the closer they are, the more similar they are, and vice versa. A good model will show a clear separation between classes. To test the model, we use fifteen data points, five for each class, whose classes are unknown to the model.

With a baseline model (left plot), we get a good separation between classes, which shows that it can perform well in this task.

But with a trained model (right plot), the separation becomes even more apparent. Similar data points are now pushed even closer together and further apart from the rest. This indicates that the model has adapted to the additional data it received during training and is therefore likely to perform even better in this task.
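One way to put a number on the separation described above is to compare distances between points of different classes with distances between points of the same class. The following is a minimal sketch, not part of Cohere's API: `separation_ratio` and `make_points` are hypothetical helpers, and the fifteen 2D points are made up for illustration rather than taken from the Playground plots. A smaller `spread` simply stands in for the tighter clusters a trained model might produce.

```python
import math
from itertools import combinations


def separation_ratio(points, labels):
    """Mean between-class distance divided by mean within-class distance.

    Hypothetical helper: a higher value means the classes are better separated.
    """
    within, between = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        distance = math.dist(p, q)
        (within if label_p == label_q else between).append(distance)
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Made-up cluster centers for three classes (assumption, not real model output).
centers = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (2.0, 3.0)}
# Five points per class, placed around each center.
offsets = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def make_points(spread):
    """Build fifteen 2D points; a smaller spread gives tighter clusters."""
    points, labels = [], []
    for label, (cx, cy) in centers.items():
        for dx, dy in offsets:
            points.append((cx + spread * dx, cy + spread * dy))
            labels.append(label)
    return points, labels


# "Baseline" uses loose clusters, "trained" uses tight ones (illustrative only).
baseline_points, labels = make_points(spread=1.0)
trained_points, _ = make_points(spread=0.3)

baseline_score = separation_ratio(baseline_points, labels)
trained_score = separation_ratio(trained_points, labels)
print(f"baseline: {baseline_score:.2f}, trained: {trained_score:.2f}")
```

With real embeddings, the points would come from a 2D projection of the model's outputs, and the same ratio could be compared before and after training.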

In real applications, this makes a huge difference. One example is a toxicity classifier that helps content moderators automatically flag toxic content on their platforms. Not all online platforms define toxicity the same way, and each has its own language nuances to accommodate. For example, a gaming platform, an online community for kids, and a social media platform would each interpret the exact same data differently. This is where model training can help: a model can be customized to your specific needs.

Creates a custom model that adapts to the training data.