The Classify endpoint streamlines the task of running text classification. Via a single endpoint, you can deploy a range of content moderation use cases according to your needs.
As online communities continue to grow, content moderators need a way to moderate user-generated content at scale. To appreciate the wide-ranging need for content moderation, we can refer to the paper A Unified Typology of Harmful Content by Banko et al. [Source], which provides a typology of harmful content generated within online communities, along with a comprehensive list of examples, grouped into four types:
- Hate and Harassment
- Self-Inflicted Harm
- Ideological Harm
- Exploitation
There are publicly available datasets within the content moderation space which you can experiment with, for example:
- Social Media Toxicity dataset from Surge AI
- Wikipedia Comments dataset by Jigsaw/Conversation AI
- Civil Comments dataset by Jigsaw/Conversation AI
- Hate Speech Dataset by Derczynski et al.
Here we take a quick look at performing toxicity detection using the Classify endpoint of the Cohere API. In this example, our task is to classify a list of social media comments as either toxic or benign.
LLMs work by being conditioned with examples of what we want their outputs to look like. In our case, we'll provide a few labeled examples, where each data point contains the text of an online comment and its associated toxicity label. We then feed the model the inputs we want to classify, and it returns the predicted class for each one.
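Before touching any API, it helps to see the shape of the data this setup needs: a list of labeled examples to condition the model, and a list of unlabeled inputs to classify. The sketch below illustrates that structure; the comment texts are hypothetical placeholders, not the actual dataset entries.

```python
# Few-shot classification data layout: (text, label) pairs condition the
# model; the inputs list holds the comments we want classified.
# All comment texts here are made-up placeholders for illustration.

labeled_examples = [
    ("yo how are you doing today", "Benign"),
    ("this game is so much fun", "Benign"),
    ("you are a complete waste of space", "Toxic"),
    ("nobody here likes you, just leave", "Toxic"),
]

inputs_to_classify = [
    "thanks for the helpful answer",
    "go away and never come back",
]

# Sanity-check that every class is represented by some examples.
labels = {label for _, label in labeled_examples}
counts = {
    label: sum(1 for _, lab in labeled_examples if lab == label)
    for label in labels
}
```

In practice you would want several examples per class (the walkthrough below uses five each), but the structure stays the same regardless of size.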
We’ll use the Cohere Playground, which is an interface that helps you quickly prototype and experiment with LLMs.
First, we choose the model we want to use and enter the labeled examples. The model will work fine with as few as 5 examples per class, but in general, the more data, the better. In this example, we’ll provide 5 examples for each class: toxic and benign.
Here’s a better look at all ten examples:
Next, we enter the list of inputs we want to classify and run the classification. Here we have three inputs per class, six in total.
Here’s a better look at all six inputs and outcomes:
In this small example, the model got all classifications correct. We can then generate the equivalent code to access the Classify endpoint by exporting the code from the Playground.
The following is the corresponding Python code snippet for the API call. From here, we can build out the content moderation solution according to our scale and integration needs.
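As a rough sketch of what the exported code looks like, here is a minimal call to the Classify endpoint via the Cohere Python SDK. Note the assumptions: the exact client class and example type can vary by SDK version (`cohere.ClassifyExample` in recent versions), the API key is a placeholder, and the example comments are illustrative stand-ins for the ten Playground examples.

```python
def classify_comments(api_key: str, inputs: list[str]) -> list[str]:
    """Classify comments as Toxic or Benign using Cohere's Classify endpoint.

    A sketch assuming a recent Cohere Python SDK; check your installed
    version's docs for the exact Example type and client constructor.
    """
    import cohere  # pip install cohere

    co = cohere.Client(api_key)  # api_key is a placeholder you supply

    # A few labeled examples per class, as entered in the Playground.
    # These texts are hypothetical; substitute your own labeled data.
    examples = [
        cohere.ClassifyExample(text="yo how are you doing", label="Benign"),
        cohere.ClassifyExample(text="this community is great", label="Benign"),
        cohere.ClassifyExample(text="you are a waste of space", label="Toxic"),
        cohere.ClassifyExample(text="nobody wants you here, quit", label="Toxic"),
        # ... remaining examples ...
    ]

    response = co.classify(inputs=inputs, examples=examples)
    # Each classification carries the predicted label for one input.
    return [c.prediction for c in response.classifications]
```

The function returns one predicted label per input, which you could then route into whatever moderation workflow fits your integration (flagging, queuing for human review, and so on).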
To get the best classification performance, you will likely need to perform finetuning, a method for customizing an LLM with your own dataset. This is especially true for a content moderation task, where no two communities are the same and the nature of the content is always evolving. The model needs to capture the nuances of the content within a given community at a given time, and finetuning is a way to do that.
The Cohere platform lets you finetune a model using a dataset you provide. Refer to this article for a step-by-step guide for finetuning the Classify endpoint.
In summary, Cohere’s LLM API empowers developers to build content moderation systems at scale without having to worry about building and deploying machine learning models in-house. In particular, teams can perform text classification tasks via the Classify endpoint. Try it now!