The example classification task here will be sentiment analysis of film reviews. We'll train a simple classifier to detect whether a film review is negative (class 0) or positive (class 1).
We'll go through the following steps:
- Get the dataset.
- Get the embeddings of the reviews (for both the training set and the test set).
- Train a classifier using the training set.
- Evaluate the performance of the classifier on the test set.
Since this is a toy example, we'll use only a subset of the training and test datasets: 100 examples. Increase that number for better performance and a more reliable evaluation.
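As a rough sketch of the subsetting step, the snippet below draws a reproducible random sample from each split. The data here is a hypothetical stand-in (placeholder strings with alternating labels), the helper name `take_subset` is ours, and reading "100 examples" as 100 per split is an assumption.

```python
import random


def take_subset(examples, num_examples, seed=0):
    """Return a reproducible random subset of (text, label) pairs."""
    rng = random.Random(seed)
    k = min(num_examples, len(examples))
    return rng.sample(examples, k)


# Hypothetical stand-in data: in practice these would be the real
# (review text, label) pairs from the training and test splits.
train_full = [(f"placeholder training review {i}", i % 2) for i in range(1000)]
test_full = [(f"placeholder test review {i}", i % 2) for i in range(500)]

num_examples = 100  # assumption: 100 examples per split

train = take_subset(train_full, num_examples, seed=0)
test = take_subset(test_full, num_examples, seed=1)

# Separate the texts from the labels (0 = negative, 1 = positive).
sentences_train = [text for text, _ in train]
labels_train = [label for _, label in train]
sentences_test = [text for text, _ in test]
labels_test = [label for _, label in test]

print(len(sentences_train), len(sentences_test))
```

Fixing the seed keeps the subset stable between runs, which makes it easier to compare results when you later raise `num_examples`.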
We're now ready to retrieve the embeddings from the API.
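One way this call might look is sketched below. The helper `embed_texts` is hypothetical; it accepts any client object exposing an `embed(texts=..., model=..., input_type=...)` method that returns a response with an `.embeddings` list, which is how the Cohere Python SDK's client is used here. The model name, the `input_type` value, and the batch size are assumptions for illustration, so check them against the current API documentation.

```python
def embed_texts(client, texts, model="embed-english-v3.0", batch_size=96):
    """Embed texts in batches and return one vector per input text.

    Assumptions: `client.embed` accepts `texts`, `model` and `input_type`
    and returns an object with an `.embeddings` attribute; the model name
    and batch size are illustrative defaults, not values from this guide.
    """
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embed(
            texts=batch,
            model=model,
            input_type="classification",  # assumed setting for classifier inputs
        )
        embeddings.extend(response.embeddings)
    return embeddings


# Usage sketch (requires the `cohere` package and a valid API key):
#
#   import cohere
#   co = cohere.Client("YOUR_API_KEY")
#   embeddings_train = embed_texts(co, sentences_train)
#   embeddings_test = embed_texts(co, sentences_test)
```

Batching keeps each request small, which matters once you move beyond a toy-sized dataset.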
We now have two sets of embeddings: embeddings_train contains the embeddings of the training sentences, while embeddings_test contains the embeddings of the test sentences.
Curious what an embedding looks like? We can print it:
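A minimal sketch of inspecting one embedding follows. The vectors below are made-up placeholders so that the snippet runs on its own; real embeddings returned by the API are much longer lists of floats.

```python
# Placeholder vectors standing in for real API output (values are made up).
embeddings_train = [
    [0.12, -0.48, 0.33, 0.07],
    [-0.25, 0.61, -0.09, 0.40],
]

first = embeddings_train[0]

# An embedding is just a list of floats; show its length and first few values.
print(f"Number of dimensions: {len(first)}")
print(f"First values: {first[:3]}")
```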
Now that we have the embeddings, we can train our classifier. We'll use an SVM from sklearn.
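The training and evaluation steps could be sketched as follows. To keep the snippet self-contained, it uses synthetic vectors in place of real embeddings (two well-separated clusters, one per class), so the accuracy it prints says nothing about the real task. Scaling the features with `StandardScaler` and passing `class_weight="balanced"` are our assumptions, not requirements.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
dim = 16  # toy dimensionality; real embeddings are much larger


def synthetic_embeddings(labels):
    """Stand-in for real embeddings: one noisy cluster per class."""
    centers = np.where(labels[:, None] == 1, 1.0, -1.0)
    return centers + rng.normal(scale=0.5, size=(len(labels), dim))


# 100 examples per split, alternating negative (0) and positive (1).
labels_train = np.array([0, 1] * 50)
labels_test = np.array([0, 1] * 50)
embeddings_train = synthetic_embeddings(labels_train)
embeddings_test = synthetic_embeddings(labels_test)

# Standardize the features, then fit a support vector classifier.
svm_classifier = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
svm_classifier.fit(embeddings_train, labels_train)

# Evaluate on the held-out test set (mean accuracy).
score = svm_classifier.score(embeddings_test, labels_test)
print(f"Test accuracy: {100 * score:.1f}%")
```

With real data, you would replace the synthetic arrays with the embeddings and labels prepared in the earlier steps; the classifier code itself stays the same.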
This was a small-scale example, meant as a proof of concept and designed to illustrate how you can quickly build a custom classifier using a small amount of labelled data and Cohere's embeddings. Increase the number of training examples to achieve better performance on this task.