
Content Moderation with Classify

The Classify endpoint streamlines text classification tasks. With a single endpoint, you can deploy a range of content moderation use cases according to your needs.

As online communities continue to grow, content moderators need a way to moderate user-generated content at scale. To appreciate the wide-ranging need for content moderation, we can refer to the paper A Unified Typology of Harmful Content by Banko et al. [Source]. It provides a unified typology of harmful content generated within online communities and a comprehensive list of examples, which can be grouped into four types:

  • Hate and Harassment
  • Self-Inflicted Harm
  • Ideological Harm
  • Exploitation

A Typology of Harmful Content by Banko et al.

There are also publicly available datasets within the content moderation space that you can experiment with.

A Quick Walkthrough

Here we take a quick look at performing toxicity detection using the Classify endpoint of the Cohere API. In this example, our task is to classify a list of example social media comments as either toxic or benign.

Detecting and flagging toxic comments on social media

LLMs work by conditioning on a few examples of what we want their outputs to look like. In our case, we'll provide a few examples of labeled data, where each data point contains the text of an online comment and its associated toxicity label. We then feed the model the inputs we want to classify, and it returns the predicted class for each one.

We’ll use the Cohere Playground, which is an interface that helps you quickly prototype and experiment with LLMs.

First, we choose the model we want to use and enter the labeled examples. The model will work fine with as few as 5 examples per class, but in general, the more data, the better. In this example, we’ll provide 5 examples for each class: toxic and benign.

Adding examples in the Playground

Here’s a better look at all ten examples:

The list of examples
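For reference, these are the same ten labeled examples that appear in the exported code later in this walkthrough, written here as plain (text, label) pairs:

# The ten labeled examples entered in the Playground: five benign, five toxic
examples = [
    ("yo how are you", "benign"),
    ("PUDGE MID!", "benign"),
    ("I WILL REMEMBER THIS FOREVER", "benign"),
    ("I think I saw it first", "benign"),
    ("bring me a potion", "benign"),
    ("I will honestly kill you", "toxic"),
    ("get rekt moron", "toxic"),
    ("go to hell", "toxic"),
    ("f*a*g*o*t", "toxic"),
    ("you are hot trash", "toxic"),
]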

Next, we enter the list of inputs we want to classify and run the classification. Here we have three inputs per class, six in total.

Adding inputs in the Playground

Here’s a better look at all six inputs and outcomes:

The list of inputs and outcomes
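For reference, here are the six inputs (again, the same ones used in the exported code below). The expected labels shown alongside them follow the three-per-class split described above:

# The six inputs to classify, paired with the labels we expect the model to return
inputs_with_expected_labels = [
    ("this game sucks, you suck", "toxic"),
    ("you f*g*t", "toxic"),
    ("put your neck in a noose", "toxic"),
    ("buy the black potion", "benign"),
    ("top mia", "benign"),
    ("gg well played", "benign"),
]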

In this small example, the model got all classifications correct. We can then generate the equivalent code to access the Classify endpoint by exporting the code from the Playground.

Exporting the classification code from the Playground

The following is the corresponding code snippet (here a Python example) for the API call. From here, we can further build the content moderation solution according to our scale and integration needs.

import cohere
from cohere.classify import Example

# Initialize the Cohere client with your API key
co = cohere.Client('{apiKey}')

# Classify the six inputs using the ten labeled examples
classifications = co.classify(
    model='medium',
    taskDescription='',
    outputIndicator='',
    inputs=["this game sucks,\n you suck", "you f*g*t", "put your neck in a\n noose",
            "buy the black\n potion", "top mia", "gg well played"],
    examples=[Example("yo how are you", "benign"), Example("PUDGE MID!", "benign"),
              Example("I WILL REMEMBER THIS FOREVER", "benign"),
              Example("I think I saw it first", "benign"),
              Example("bring me a potion", "benign"),
              Example("I will honestly kill you", "toxic"),
              Example("get rekt moron", "toxic"), Example("go to hell", "toxic"),
              Example("f*a*g*o*t", "toxic"), Example("you are hot trash", "toxic")])

print('The confidence levels of the labels are: {}'.format(
    classifications.classifications))
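From here, a simple next step might be to flag any input the model labels as toxic with sufficiently high confidence. The sketch below is illustrative only: it assumes each item in classifications.classifications exposes the original input, a predicted label, and a confidence score, and those attribute names may differ between SDK versions.

# Illustrative sketch: flag toxic comments above a confidence threshold.
# The attribute names (input, prediction, confidence) are assumptions and
# may differ depending on the SDK version you are using.
TOXICITY_THRESHOLD = 0.5

for item in classifications.classifications:
    if item.prediction == "toxic" and item.confidence >= TOXICITY_THRESHOLD:
        print("FLAGGED for review: {}".format(item.input))
    else:
        print("OK: {}".format(item.input))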

Next Steps

To get the best classification performance, you will likely need to perform finetuning, a method for customizing an LLM with your own dataset. This is especially true for a content moderation task, where no two communities are the same and the nature of the content is always evolving. The model needs to capture the nuances of a given community's content at a given time, and finetuning is a way to do that.

The Cohere platform lets you finetune a model using a dataset you provide. Refer to this article for a step-by-step guide to finetuning a model for use with the Classify endpoint.

In summary, Cohere’s LLM API empowers developers to build content moderation systems at scale without having to worry about building and deploying machine learning models in-house. In particular, teams can perform text classification tasks via the Classify endpoint. Try it now!