Text Classification with co.embed()
This notebook shows how to build a classifier using Cohere's embeddings. You can find the code in the notebook and Colab.

The example classification task here will be sentiment analysis of film reviews. We'll train a simple classifier to detect whether a film review is negative (class 0) or positive (class 1).
We'll go through the following steps:
- Get the dataset
- Get the embeddings of the reviews (for both the training set and the test set)
- Train a classifier using the training set
- Evaluate the performance of the classifier on the testing set
1. Get the dataset
We'll use only a small subset of the training and testing datasets: 100 examples, since this is a toy example. You'll want to increase that number to get better performance and a more reliable evaluation.
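As a rough sketch of the subsampling step, the snippet below draws 100 examples from a larger labelled dataset and splits them into training and testing sets. The placeholder `reviews` list, the `sample_split` helper, and the 80/20 split are assumptions made for illustration; they are not taken from the notebook, which loads a real film-review dataset.

```python
import random

# Hypothetical stand-in for the real review dataset: (text, label) pairs,
# where label 0 = negative and label 1 = positive.
reviews = [(f"placeholder review {i}", i % 2) for i in range(1000)]


def sample_split(data, num_examples=100, test_fraction=0.2, seed=0):
    """Draw `num_examples` rows at random, then split them into train/test."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = rng.sample(data, num_examples)
    n_test = int(num_examples * test_fraction)
    return subset[n_test:], subset[:n_test]


train, test = sample_split(reviews)
sentences_train = [text for text, _ in train]
labels_train = [label for _, label in train]
sentences_test = [text for text, _ in test]
labels_test = [label for _, label in test]

print(len(sentences_train), len(sentences_test))
```

Fixing the random seed keeps the subset stable between runs, which makes later accuracy numbers easier to compare.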
2. Get the embeddings of the reviews
We're now ready to retrieve the embeddings from the API.
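A minimal sketch of that call might look like the following. The `embed_texts` helper is hypothetical, and the model name `embed-english-v3.0` and `input_type="classification"` are assumptions that depend on your SDK version and account; the notebook may use different settings.

```python
def embed_texts(co, texts, model="embed-english-v3.0", input_type="classification"):
    """Return one embedding (a list of floats) per input text.

    `co` is assumed to be a Cohere client, e.g. cohere.Client("YOUR_API_KEY").
    The model name and input_type defaults are illustrative assumptions.
    """
    response = co.embed(texts=texts, model=model, input_type=input_type)
    return response.embeddings


# Usage (requires the `cohere` package and a valid API key):
# import cohere
# co = cohere.Client("YOUR_API_KEY")
# embeddings_train = embed_texts(co, sentences_train)
# embeddings_test = embed_texts(co, sentences_test)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no API key is available.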
We now have two sets of embeddings: embeddings_train contains the embeddings of the training sentences, while embeddings_test contains the embeddings of the testing sentences.
Curious what an embedding looks like? We can print it:
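For instance, something along these lines would show the size of one embedding and its first few values. The numbers below are made-up stand-ins so the snippet runs on its own; real embeddings are much longer vectors returned by the API.

```python
# Stand-in values so the snippet is self-contained; in the notebook,
# embeddings_train holds the vectors returned by the embed call.
embeddings_train = [[0.1, -0.2, 0.3, 0.4], [0.0, 0.5, -0.1, 0.2]]

first = embeddings_train[0]
print(f"Review embedded as a vector of {len(first)} numbers")
print(f"First values: {first[:3]}")
```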
3. Train a classifier using the training set
Now that we have the embeddings, we can train our classifier. We'll use an SVM from sklearn.
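One way to set this up is sketched below. The synthetic Gaussian clusters stand in for real embeddings so the example runs without an API key, and the `StandardScaler` step and `class_weight="balanced"` setting are assumptions rather than details confirmed by the text above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for real embeddings: one Gaussian cluster per class,
# 40 examples each, in 8 dimensions.
rng = np.random.default_rng(0)
embeddings_train = np.vstack([
    rng.normal(-1.0, 1.0, (40, 8)),  # class 0 (negative)
    rng.normal(1.0, 1.0, (40, 8)),   # class 1 (positive)
])
labels_train = np.array([0] * 40 + [1] * 40)

# Scale each feature, then fit a support vector classifier.
svm_classifier = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
svm_classifier.fit(embeddings_train, labels_train)
```

With real data, `embeddings_train` and `labels_train` would be the embeddings and labels of the training reviews.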
4. Evaluate the performance of the classifier on the testing set
This was a small-scale example, meant as a proof of concept. It illustrates how you can build a custom classifier quickly using a small amount of labelled data and Cohere's embeddings. Increase the number of training examples to achieve better performance on this task.
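The evaluation in step 4 can be sketched as follows: fit on the training embeddings, then measure accuracy on held-out test embeddings. As before, the data here is synthetic and the `make_split` helper is hypothetical, so the printed accuracy says nothing about real film reviews.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def make_split(n_per_class):
    """Synthetic stand-in for embeddings: one Gaussian cluster per class."""
    X = np.vstack([
        rng.normal(-1.0, 1.0, (n_per_class, 8)),
        rng.normal(1.0, 1.0, (n_per_class, 8)),
    ])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


embeddings_train, labels_train = make_split(40)
embeddings_test, labels_test = make_split(10)

svm_classifier = SVC(class_weight="balanced").fit(embeddings_train, labels_train)

# score() returns mean accuracy: the fraction of test examples classified correctly.
accuracy = svm_classifier.score(embeddings_test, labels_test)
print(f"Accuracy on the test set: {100 * accuracy:.1f}%")
```

With only a handful of test examples the accuracy estimate is coarse, which is one more reason to increase the dataset size.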