Language is important. It’s how we learn about the world (e.g. news, searching the web, Wikipedia), and also how we shape it (e.g. agreements, contracts, laws). Language is also how we connect and communicate -- as people, and as groups and companies.
Despite the rapid evolution of software, computers remain limited in their ability to deal with language. Software is great at searching for exact matches in text, but often fails at more advanced uses of language -- ones that humans employ on a daily basis.
There’s a clear need for more intelligent tools that better understand language.
A recent breakthrough in artificial intelligence (AI) is the introduction of language processing technologies that enable us to build more intelligent systems with a richer understanding of language than ever before. Large pretrained Transformer language models, or simply large language models, vastly extend the capabilities of what systems are able to do with text.
Consider this: adding language models to Google Search was described as "representing the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search". Microsoft also uses such models for every query in the Bing search engine.
Despite the utility of these models, training and deploying them effectively is resource intensive, demanding large amounts of data, compute, and engineering effort.
Cohere offers an API to add cutting-edge language processing to any system. Cohere trains massive language models and puts them behind a simple API. Moreover, through finetuning, users can create massive models customized to their use case and trained on their data. This way, Cohere handles the complexities of collecting massive amounts of text data, the ever-evolving neural network architectures, distributed training, and serving models around the clock.
Two major categories of large language models are generative language models (like GPT-2 and GPT-3) and representation language models (like BERT). Cohere offers variants of both types.
Here are a few examples of language understanding systems that can be built on top of large language models.
Think of how many repeated questions have to be answered by a customer service agent every day. Language models are capable of judging text similarity and determining if an incoming question is similar to questions already answered in the FAQ section.
The following example from the Cohere playground shows the similarity scores between an example question and two FAQs.
There are multiple things your system can do once it receives the similarity scores. One option is to simply show the answer to the most similar question (if it clears a certain similarity threshold); another is to surface that answer as a suggestion to a customer service agent.
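The routing logic above can be sketched in a few lines. This is a minimal illustration, not Cohere's implementation: the FAQ entries, scores, and threshold are made up, and in practice the scores would come from a similarity endpoint comparing the incoming question against each FAQ.

```python
# Hypothetical FAQ answers keyed by their canonical question.
FAQ_ANSWERS = {
    "How do I reset my password?": "Click 'Forgot password' on the login page.",
    "How do I cancel my subscription?": "Go to Settings > Billing > Cancel.",
}

def route_question(scores, threshold=0.8):
    """Return the answer to the most similar FAQ if its score clears
    the threshold; otherwise return None so a human agent can respond."""
    best_question, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score >= threshold:
        return FAQ_ANSWERS[best_question]
    return None  # fall back to a customer service agent

# Illustrative scores for an incoming question like "I forgot my login password".
scores = {
    "How do I reset my password?": 0.92,
    "How do I cancel my subscription?": 0.31,
}
print(route_question(scores))
```

Returning `None` rather than a low-confidence answer keeps the human agent in the loop for questions the FAQ doesn't cover.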
Large language models present a breakthrough in text generation. For the first time in history, we have software programs that can write text that sounds like it’s written by humans. These capabilities open doors to use cases like summarization or paraphrasing.
A summarization prompt in the Cohere playground shows this output (in bold):
Large language models can be adapted to new tasks with impressive speed. For tasks that appear in the training data (i.e., documents on the web), language models can successfully summarize text without being shown any examples at all.
Summarization and paraphrasing both use the generate endpoint.
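A generation endpoint typically takes a text prompt and continues it. One common pattern is a few-shot prompt: a couple of worked passage/summary pairs followed by the new passage, ending right where the model should write the summary. The helper and prompt format below are illustrative assumptions, not Cohere's actual prompt template:

```python
def build_summarization_prompt(passage, examples):
    """Assemble a few-shot summarization prompt: worked examples first,
    then the new passage, ending where the model should continue."""
    parts = []
    for text, summary in examples:
        parts.append(f"Passage: {text}\nSummary: {summary}")
    # The trailing "Summary:" cues the model to generate the summary next.
    parts.append(f"Passage: {passage}\nSummary:")
    return "\n--\n".join(parts)

# Made-up example pair for illustration.
examples = [
    ("The quarterly report showed revenue up 10% year over year, "
     "driven mainly by growth in the cloud division.",
     "Revenue grew 10%, led by cloud."),
]
prompt = build_summarization_prompt(
    "The new update fixes several bugs and improves battery life on older devices.",
    examples,
)
print(prompt)
```

The resulting string would then be sent to the generate endpoint, whose completion is read back as the summary.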
Classification is one of the most common use cases in language processing. Building systems on top of language models can automate language-based tasks and save time and energy.
There's more than one way to build a classifier on top of Cohere's language models. It's worth experimenting to see which method works best for your use case. The simpler methods can get you quick results, while the more advanced methods require more data but tend to produce better results.
On the simpler side are methods like using the Similarity endpoint or the Choose Best endpoint for classification. More industrial-grade classifiers can be built by fitting a classifier on top of the embed endpoint (guide coming soon).
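To make the embed-endpoint approach concrete, here is one simple classifier that can sit on top of embeddings: a nearest-centroid classifier, which averages the embeddings of each class's labeled examples and assigns new text to the class whose centroid is most similar. The tiny 3-dimensional vectors below stand in for real embeddings, which would come from an embed endpoint; the labels and numbers are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

def train_centroids(labeled_embeddings):
    """Map each class label to the centroid of its example embeddings."""
    return {label: centroid(vecs) for label, vecs in labeled_embeddings.items()}

def classify(embedding, centroids):
    """Assign the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))

# Toy 3-d "embeddings"; real ones would come from an embed endpoint.
training = {
    "positive": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "negative": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = train_centroids(training)
print(classify([0.85, 0.15, 0.05], centroids))  # -> positive
```

Swapping the centroid step for a fitted model (e.g., logistic regression over the embeddings) is the usual path from this sketch to a production-grade classifier.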