Intro to Large Language Models with Cohere

Language is important. It’s how we learn about the world (e.g. news, searching the web, Wikipedia), and also how we shape it (e.g. agreements, contracts, laws). Language is also how we connect and communicate -- as people, and as groups and companies.

Despite the rapid evolution of software, computers remain limited in their ability to deal with language. Software is great at searching for exact matches in text, but often fails at more advanced uses of language -- ones that humans employ on a daily basis.

There’s a clear need for more intelligent tools that better understand language.

Enter the large language model

A recent breakthrough in artificial intelligence (AI) is the introduction of language processing technologies that enable us to build more intelligent systems with a richer understanding of language than ever before. Large pretrained Transformer language models, or simply large language models, vastly extend the capabilities of what systems are able to do with text.

Language models take in text and can output text or numeric representations of text; each is useful in its own way.

Large language models are computer programs that open new possibilities of text understanding and generation in software systems.

Consider this: adding language models to empower Google Search was noted as “representing the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search”. Microsoft also uses such models for every query in the Bing search engine.

Despite the utility of these models, training and deploying them effectively is resource intensive, requiring large amounts of data, compute, and engineering effort.

Large language models via the Cohere API

Cohere offers an API to add cutting-edge language processing to any system. Cohere trains massive language models and puts them behind a simple API. Moreover, through finetuning, users can create massive models customized to their use case and trained on their own data. This way, Cohere handles the complexities of collecting massive amounts of text data, keeping up with ever-evolving neural network architectures, distributed training, and serving models around the clock.

Generation language models generate text; representation language models produce numeric representations that other systems can use.

Cohere offers access to both generation models (through the generate endpoint) and representation models (through the embed endpoint, which returns an embedding vector for the input text).

Two major categories of large language models are generative language models (like GPT-2 and GPT-3) and representation language models (like BERT). Cohere offers variants of both types.
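As a rough sketch, here is what calling both kinds of models might look like with the Cohere Python SDK. The client setup, placeholder API key, and example texts below are assumptions for illustration, and the exact SDK calls may differ between API versions:

# A minimal sketch, assuming the Cohere Python SDK (pip install cohere).
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key, an assumption

# Generation model: takes a text prompt and returns generated text.
generation = co.generate(prompt="Write a one-line description of a language model:", max_tokens=30)
print(generation.generations[0].text)

# Representation model: takes text and returns an embedding vector.
embedding = co.embed(texts=["A large language model is a neural network trained on lots of text."])
print(len(embedding.embeddings[0]))  # dimensionality of the returned vector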

Three example applications

Here are a few examples of language understanding systems that can be built on top of large language models.

1. Semantic similarity: have we answered this question before?

Think of how many repeated questions have to be answered by a customer service agent every day. Language models are capable of judging text similarity and determining if an incoming question is similar to questions already answered in the FAQ section.

Semantic similarity can help us build systems that determine if a customer question is already addressed in the FAQ.

Cohere’s similarity endpoint compares a sentence (anchor) and multiple other sentences (targets).

The following example from the Cohere playground shows the similarity scores between an example question and two FAQs.

Example of question similarity

The similarity endpoint compares an anchor sentence against a number of target sentences. Note how the tablet FAQ question is scored higher in similarity (0.51) than the other FAQ question (0.27).

There are multiple things your system can do once it receives the similarity scores. One option is to simply show the answer to the most similar question (if its score is above a certain threshold); another is to surface that question as a suggestion to a customer service agent.
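The following sketch shows the same idea in code, approximating the similarity comparison with the embed endpoint plus cosine similarity rather than the similarity endpoint itself. The FAQ entries, incoming question, and threshold value are all hypothetical examples, not values from this guide:

# A minimal FAQ-matching sketch, assuming the Cohere Python SDK and NumPy.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")  # placeholder key, an assumption

question = "Can I return my tablet if I change my mind?"  # hypothetical incoming question
faq = [
    "What is your return policy for tablets?",   # hypothetical FAQ entries
    "How do I reset my account password?",
]

vectors = co.embed(texts=[question] + faq).embeddings
anchor, targets = np.array(vectors[0]), np.array(vectors[1:])

# Cosine similarity between the incoming question and each FAQ entry.
scores = targets @ anchor / (np.linalg.norm(targets, axis=1) * np.linalg.norm(anchor))

best = int(np.argmax(scores))
if scores[best] > 0.5:  # arbitrary threshold, an assumption
    print("Suggest the answer to:", faq[best])
else:
    print("Route the message to a customer service agent.")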

2. Summarization & paraphrasing: what’s a better way of saying this?#

Large language models present a breakthrough in text generation. For the first time in history, we have software programs that can write text that sounds like it was written by humans. These capabilities open doors to use cases like summarization and paraphrasing.

Generative language models take in a text prompt and output text.

Language models can be instructed to generate useful summaries or paraphrases of input text by guiding them using a task description in the prompt.

A summarization prompt in the Cohere playground shows this output (in bold):

Example of summarizing a sentence about the planet Jupiter

Example summarization prompt and generation. The stop sequence is set to a period to limit the output to one sentence.
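Here is a rough sketch of the same pattern via the API, assuming the Cohere Python SDK. The prompt wording and parameters below are illustrative assumptions, not the exact playground example shown above:

# A minimal summarization sketch, assuming the Cohere Python SDK.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key, an assumption

prompt = (
    "Summarize the following passage in one sentence.\n\n"
    "Passage: Jupiter is the fifth planet from the Sun and the largest planet "
    "in the Solar System. It is a gas giant with a mass more than two and a "
    "half times that of all the other planets combined.\n\n"
    "Summary:"
)

response = co.generate(
    prompt=prompt,
    max_tokens=50,
    stop_sequences=["."],  # stop at the period to keep the output to one sentence
)
print(response.generations[0].text)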

Large language models can be adapted to new tasks with impressive speed. For tasks which appear in the training data (i.e. documents on the web), language models can successfully summarize text without being shown any examples at all.

The prompt can be tuned by trying multiple task descriptions and adding examples. Finetuning allows us to show a lot more examples to the model.

Two strategies you can experiment with when using generative language models are prompt engineering and finetuning (which creates a custom model trained on your dataset).
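On the prompt engineering side, a common move is to prepend a few worked examples to the task description. The sketch below shows this for a paraphrasing task; the example sentences are hypothetical and the SDK usage is an assumption:

# A prompt-engineering sketch: task description plus two worked examples.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key, an assumption

few_shot_prompt = (
    "Rewrite the sentence in simpler words.\n\n"
    "Sentence: The meeting was postponed due to inclement weather.\n"
    "Simpler: The meeting was delayed because of bad weather.\n\n"
    "Sentence: The device exhibited intermittent connectivity failures.\n"
    "Simpler: The device kept losing its connection.\n\n"
    "Sentence: Remuneration will be disbursed upon completion of the project.\n"
    "Simpler:"
)

response = co.generate(prompt=few_shot_prompt, max_tokens=30, stop_sequences=["\n"])
print(response.generations[0].text)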

Summarization and paraphrasing both use the generate endpoint.

3. Classification: is this a question or request?

Classification is one of the most common use cases in language processing. Building systems on top of language models can automate language-based tasks and save time and energy.

Classifying incoming customer messages can help automate customer service workflows.

Developers can build classifiers on top of Cohere’s language models. These classifiers can automate language tasks and workflows.

There's more than one way to build a classifier on top of Cohere's language models. It's worth experimenting to see which method works best for your use case. The simpler methods get you results quickly, while the more advanced methods require more data but lead to better results.

On the simpler side are methods like using the Similarity endpoint or the Choose Best endpoint for classification. More industrial-grade classifiers can be built by fitting a classifier on top of the embed endpoint (guide coming soon).
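As a rough sketch of the embedding-based approach, you can embed a few labeled messages and fit a standard classifier on those vectors. The training texts, labels, and the choice of scikit-learn's logistic regression below are assumptions for illustration, not a prescribed recipe:

# A minimal embedding-based classifier sketch, assuming the Cohere Python SDK and scikit-learn.
import cohere
from sklearn.linear_model import LogisticRegression

co = cohere.Client("YOUR_API_KEY")  # placeholder key, an assumption

texts = [
    "How do I change my shipping address?",   # hypothetical training examples
    "Please cancel my order.",
    "What are your store hours?",
    "I want a refund for my last purchase.",
]
labels = ["question", "request", "question", "request"]

# Fit a simple classifier on top of the embeddings.
train_embeddings = co.embed(texts=texts).embeddings
classifier = LogisticRegression().fit(train_embeddings, labels)

# Classify a new incoming message.
new_message = "Could you update my email address?"
new_embedding = co.embed(texts=[new_message]).embeddings
print(classifier.predict(new_embedding)[0])  # e.g. "request"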