Chat API
This guide uses the Chat endpoint.
You can find the API reference for the endpoint here.
In this guide, we show how to use the Chat endpoint to build a simple chatbot that responds to an input query while taking the previous conversation into account.
Getting Set Up
First, let's install the SDK (the examples below are in Python, TypeScript, and Go):
pip install cohere
npm i -s cohere-ai
go get github.com/cohere-ai/cohere-go/v2
Import dependencies and set up the Cohere client.
import cohere
co = cohere.Client('Your API key')
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({
  token: "YOUR_API_KEY",
});

(async () => {
  const response = await cohere.chat({
    message: "hello",
  });
  console.log("Received response", response);
})();
import cohereclient "github.com/cohere-ai/cohere-go/v2/client"
client := cohereclient.NewClient(cohereclient.WithToken("<YOUR_AUTH_TOKEN>"))
(All the rest of the examples on this page will be in Python, but you can find more detailed instructions for getting set up by checking out the GitHub repositories for Python, TypeScript, and Go.)
Create Prompt
Store the message you want to send in a variable:
message = "Hello World!"
Define the Model Settings
The endpoint has a number of settings you can use to control the kind of output it generates. The full list is available in the API reference, but let’s look at a few:
- `model`: The currently available models are `command`, `command-light`, `command-nightly`, and `command-light-nightly` (`command` is the default). Generally, light models are faster but may produce lower-quality generated text, while the others perform better.
- `temperature`: Controls the randomness of the output; higher values tend to generate more creative outputs and less grounded replies when using retrieval-augmented generation.
Generate the Response
Call the endpoint via the `co.chat()` method, specifying the message and the model settings.
response = co.chat(
    message=message,
    model="command",
    temperature=0.9
)
answer = response.text
Various Ways of Using the Chat Endpoint
Now that we've covered the basics of getting set up, let's discuss some of the different ways you can use `co.chat()`.
Interacting with Chat Directly
In the "Generate the Response" section directly above, we included this code snippet:
response = co.chat(
    message=message,
    model="command",
    temperature=0.9
)
answer = response.text
Here, we are simply pinging the underlying chat model and getting back whatever it generates. This is the simplest way of leveraging `co.chat()`, and it's distinct from storing messages as part of an ongoing conversation (covered in the next section) and from "grounding" model outputs in user-provided information (covered near the end).
The advantage of this approach is that the model will attempt to do what you ask it to do, without being constrained by any external data sources (more on this below). For the same reason, the model can also produce more creative replies when you're working on brainstorming or writing tasks.
The disadvantage is that the model's output could contain factually incorrect information and, without the kinds of citations produced by the model in document mode, it can be very hard to double-check.
Multi-Message Conversations
So far, we have generated a single reply to a message without using any previous messages.
If you want to use the `chat_history` or `conversation_id` parameters for multi-turn functionality, check out our dedicated documentation on multi-message conversations.
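As a minimal sketch of what that multi-turn state looks like on the client side: each turn is a role/message pair, matching the `chat_history` format the Chat endpoint accepts. The `build_history` helper is hypothetical, and the actual `co.chat()` call is omitted here:

```python
# A minimal sketch of tracking a multi-turn conversation client-side.
# Each turn becomes a role/message dict in the chat_history format.

def build_history(turns):
    """Convert (user, chatbot) message pairs into a chat_history list."""
    history = []
    for user_msg, bot_msg in turns:
        history.append({"role": "USER", "message": user_msg})
        history.append({"role": "CHATBOT", "message": bot_msg})
    return history

turns = [("Hello!", "Hi, how can I help?")]
chat_history = build_history(turns)

# chat_history is then passed alongside the new message, e.g.:
# response = co.chat(message="Tell me about penguins", chat_history=chat_history)
```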
Documents Mode
With the release of retrieval augmented generation (RAG), it's possible to feed the model context to ground its replies. Large language models are often quite good at generating sensible output on their own, but they're well-known to hallucinate factually incorrect, nonsensical, or incomplete information in their replies, which can be problematic for certain use cases.
RAG substantially reduces this problem by giving the model source material to work with. Rather than simply generating an output based on the input prompt, the model can pull information out of this material and incorporate it into its reply.
You can read more about how this works in "Documents and Citations."
Connectors Mode
Finally, if you want to point the model at the sources it should use rather than specifying your own, you can do that through connector mode.
Here’s an example:
{
  "message": "What are the tallest living penguins?",
  "connectors": [{"id": "web-search"}],
  "prompt_truncation": "AUTO"
}
And here’s what the output looks like:
{
  "response_id": "a29d7080-11e5-43f6-bbb6-9bc3c187eed7",
  "text": "The tallest living penguin species is the emperor penguin, which can reach a height of 100 cm (39 in) and weigh between 22 and 45 kg (49 to 99 lb).",
  "generation_id": "1c60cb38-f92f-4054-b37d-566601de7e2e",
  "token_count": {
    "prompt_tokens": 1257,
    "response_tokens": 38,
    "total_tokens": 1295,
    "billed_tokens": 44
  },
  "meta": {
    "api_version": {
      "version": "2022-12-06"
    }
  },
  "citations": [
    {
      "start": 42,
      "end": 57,
      "text": "emperor penguin",
      "document_ids": [
        "web-search_1",
        "web-search_8"
      ]
    },
    {
      "start": 87,
      "end": 101,
      "text": "100 cm (39 in)",
      "document_ids": [
        "web-search_1"
      ]
    },
    {
      "start": 120,
      "end": 146,
      "text": "22 and 45 kg (49 to 99 lb)",
      "document_ids": [
        "web-search_1",
        "web-search_8"
      ]
    }
  ],
  "documents": [
    {
      "id": "web-search_1",
      "title": "Emperor penguin - Wikipedia",
      "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",
      "url": "https://en.wikipedia.org/wiki/Emperor_penguin"
    },
    {
      "id": "web-search_8",
      "title": "The largest penguin that ever lived",
      "snippet": "They concluded that the largest flipper bones belong to a penguin that tipped the scales at an astounding 154 kg. In comparison, emperor penguins, the tallest and heaviest of all living penguins, typically weigh between 22 and 45 kg.",
      "url": "https://www.cam.ac.uk/stories/giant-penguin"
    }
  ],
  "search_results": [
    {
      "search_query": {
        "text": "tallest living penguins",
        "generation_id": "12eda337-f096-404f-9ba9-905076304934"
      },
      "document_ids": [
        "web-search_0",
        "web-search_1",
        "web-search_2",
        "web-search_3",
        "web-search_4",
        "web-search_5",
        "web-search_6",
        "web-search_7",
        "web-search_8",
        "web-search_9"
      ],
      "connector": {
        "id": "web-search"
      }
    }
  ],
  "search_queries": [
    {
      "text": "tallest living penguins",
      "generation_id": "12eda337-f096-404f-9ba9-905076304934"
    }
  ]
}
(NOTE: In this example, we’ve modified the query slightly to say “living” penguins, because “What are the tallest penguins?” returns a great deal of information about a long-extinct penguin species that was nearly seven feet tall.)
As you can see, we've told the model to use the `web-search` connector to find out which penguin species is tallest, rather than passing in source material through the `documents` parameter. If you're wondering how this works under the hood, we have more information in the next section.
You can experiment with this feature in the chat playground.
As with document mode, when the Chat endpoint generates a response using a connector, it will include a `citations` object in its output. If the model is unable to find anything suitable, however, no such object will appear in the output.
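These citation objects are plain character offsets into the reply's `text` field, which makes them easy to verify programmatically. A short sketch using the sample connector-mode response shown above:

```python
# Citations are character offsets into the reply's `text` field.
# The text and citations below come from the sample response above.
text = ("The tallest living penguin species is the emperor penguin, "
        "which can reach a height of 100 cm (39 in) and weigh between "
        "22 and 45 kg (49 to 99 lb).")

citations = [
    {"start": 42, "end": 57, "text": "emperor penguin",
     "document_ids": ["web-search_1", "web-search_8"]},
    {"start": 87, "end": 101, "text": "100 cm (39 in)",
     "document_ids": ["web-search_1"]},
    {"start": 120, "end": 146, "text": "22 and 45 kg (49 to 99 lb)",
     "document_ids": ["web-search_1", "web-search_8"]},
]

# Each cited span matches a slice of the reply text exactly.
for c in citations:
    assert text[c["start"]:c["end"]] == c["text"]
```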
A Note on Connectors
Connectors allow Coral users to initiate a search of a third-party application containing textual data, such as the internet or a document database. The application sends relevant information back to Coral, which uses it to generate a grounded response. Cohere supports the "web-search" connector, which runs searches against a browser in safe mode, and you can create and deploy your own custom connectors for services such as Google Drive, Confluence, etc.
Tool Use
Single-step and multi-step tool use are extensions of this idea. Both allow you to create dynamic, powerful workflows by giving underlying models access to databases, internet search, and much more. Check out the linked documents for additional information.
Streaming Mode
All the methods of interacting with Chat discussed above (talking to it directly, using document mode, and using connector mode) can also be used to stream responses. This is useful for user interfaces that render the contents of the response piece by piece as it gets generated.
All that's required to do this is to set `stream` to `True` (it's `False` by default). In the next few sections, we'll include code snippets and examples of what it looks like to interact with `co.chat()` in streaming mode.
Streaming Responses from the Chat Endpoint
Here, we're simply asking the model about penguins and streaming the reply. Note that we're importing `StreamEvent`, that `stream=True`, and that we're using `if` and `elif` clauses to respond to the different event types `StreamEvent` can represent:
import cohere
from cohere.responses.chat import StreamEvent

co = cohere.Client("<YOUR API KEY>")

for event in co.chat("What are the tallest living penguins?", stream=True):
    if event.event_type == StreamEvent.TEXT_GENERATION:
        print(event.text)
    elif event.event_type == StreamEvent.STREAM_END:
        print(event.finish_reason)
Here's what the response looks like:
The
tallest
living
penguins
are
emperor
penguins
(
A
pt
en
ody
tes
for
ster
i
).
On
average
,
adult
emperor
penguins
stand
at
about
115
cm
(
45
inches
)
...
Would
you
like
to
know
more
about
any
of
these
penguin
species
?
COMPLETE
The output has been truncated for readability, but you can see that the model streams one token after another until it hits `COMPLETE`.
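Since each text-generation event carries only a fragment of the reply, assembling the full text is just string concatenation. A minimal sketch, with a hypothetical hard-coded list of fragments standing in for a live stream:

```python
# Each text-generation event carries a fragment of the reply; the full
# reply is the concatenation of those fragments. Simulated here with
# hard-coded chunks rather than a live stream:
chunks = ["The ", "tallest ", "living ", "penguins ", "are ", "emperor ", "penguins."]

reply = ""
for chunk in chunks:
    reply += chunk  # in real code: event.text for each TEXT_GENERATION event

print(reply)
```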
Streaming Responses in Document Mode
In document mode, we pass in sources for the model to use in formulating its reply. Here's what that looks like:
import cohere

co = cohere.Client("<YOUR API KEY>")

documents = [
    {
        "title": "Tall penguins",
        "snippet": "Emperor penguins are the tallest."
    },
    {
        "title": "Penguin habitats",
        "snippet": "Emperor penguins only live in Antarctica."
    },
    {
        "title": "What are animals?",
        "snippet": "Animals are different from plants."
    }
]

for event in co.chat_stream(
    message="What are the tallest living penguins?",
    documents=documents,
    prompt_truncation="AUTO"
):
    if event.event_type == "text-generation":
        print(event.text)
    elif event.event_type == "citation-generation":
        print(event.citations)
    elif event.event_type == "stream-end":
        print(event.finish_reason)
Here's what the response looks like:
The
tallest
living
penguins
in
the
world
are
Emperor
penguins
,
which
can
reach
heights
of
approximately
115
cm
(
45
.
3
inches
)
tall
.
Interestingly
,
they
are
only
found
in
Antarctica
.
[{'start': 45, 'end': 61, 'text': 'Emperor penguins', 'document_ids': ['doc_0']}]
[{'start': 104, 'end': 130, 'text': '115 cm (45.3 inches) tall.', 'document_ids': ['doc_0']}]
[{'start': 169, 'end': 180, 'text': 'Antarctica.', 'document_ids': ['doc_1']}]
COMPLETE
Note that the citation objects appear at the end, just before the stream completes. If you're not sure what these mean or how to read them, check out the "Document Mode" section above, which furnishes additional context.
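One common use of these citation objects is rendering source markers inline for end users. A sketch using the text and two of the citations from the sample output above (the rendering approach itself is just one possible design, not part of the API):

```python
# Insert bracketed document markers after each cited span. The text and
# citation offsets come from the sample document-mode response above.
text = ("The tallest living penguins in the world are Emperor penguins, "
        "which can reach heights of approximately 115 cm (45.3 inches) tall. "
        "Interestingly, they are only found in Antarctica.")

citations = [
    {"start": 45, "end": 61, "text": "Emperor penguins", "document_ids": ["doc_0"]},
    {"start": 169, "end": 180, "text": "Antarctica.", "document_ids": ["doc_1"]},
]

# Insert markers from the end of the string so earlier offsets stay valid.
annotated = text
for c in sorted(citations, key=lambda c: c["end"], reverse=True):
    marker = "[" + ",".join(c["document_ids"]) + "]"
    annotated = annotated[:c["end"]] + marker + annotated[c["end"]:]

print(annotated)
```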
Streaming Responses in Connector Mode
Finally, we can also stream responses from the model when it's operating in connector mode. Here's what the code looks like:
import cohere
from cohere.responses.chat import StreamEvent

co = cohere.Client("<YOUR API KEY>")

for event in co.chat(
    "What are the tallest living penguins?",
    stream=True,
    connectors=[{"id": "web-search"}],
    prompt_truncation="AUTO"
):
    if event.event_type == StreamEvent.TEXT_GENERATION:
        print(event.text)
    elif event.event_type == StreamEvent.CITATION_GENERATION:
        print(event.citations)
    elif event.event_type == StreamEvent.STREAM_END:
        print(event.finish_reason)
And here's what the response looks like:
The
tallest
living
penguins
are
the
males
of
the
Emperor
penguin
species
,
who
can
stand
up
to
1
.
3
meters
(
4
feet
3
inches
)
tall
and
weigh
as
much
as
45
kilograms
(
99
pounds
).
This
species
of
penguin
is
native
to
Antarctica
.
While
they
are
the
tallest
living
penguins
,
they
would
be
dwar
fed
...
[{'start': 36, 'end': 72, 'text': 'males of the Emperor penguin species', 'document_ids': ['web-search_6:0']}]
[{'start': 94, 'end': 127, 'text': '1.3 meters (4 feet 3 inches) tall', 'document_ids': ['web-search_6:0']}]
[{'start': 149, 'end': 173, 'text': '45 kilograms (99 pounds)', 'document_ids': ['web-search_0:0', 'web-search_6:0']}]
[{'start': 212, 'end': 223, 'text': 'Antarctica.', 'document_ids': ['web-search_0:0', 'web-search_6:0']}]
[{'start': 282, 'end': 310, 'text': 'dwarfed by the Mega Penguins', 'document_ids': ['web-search_9:0']}]
[{'start': 364, 'end': 388, 'text': 'Palaeeudyptes klekowskii', 'document_ids': ['web-search_9:4']}]
[{'start': 408, 'end': 424, 'text': 'Colossus Penguin', 'document_ids': ['web-search_9:4']}]
[{'start': 448, 'end': 461, 'text': '115 kilograms', 'document_ids': ['web-search_9:4']}]
[{'start': 475, 'end': 489, 'text': '2 meters tall.', 'document_ids': ['web-search_9:4']}]
COMPLETE
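Note that the connector-mode citations above reference ids like `web-search_6:0`, i.e. a document id plus a suffix that appears to be a chunk index (this interpretation is inferred from the sample output, so treat it as an assumption). A small sketch of splitting them:

```python
# Connector-mode citation ids in the sample above look like "web-search_6:0".
# Splitting on the final colon separates the document id from what appears
# to be a chunk index (an assumption based on the sample output).
def split_document_id(doc_id):
    base, _, chunk = doc_id.rpartition(":")
    return (base, int(chunk)) if base else (doc_id, None)

print(split_document_id("web-search_6:0"))   # ("web-search_6", 0)
```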
Next Steps
Check the Chat API reference and start building your own products! You can also read the retrieval augmented generation (RAG) documentation for more context.
Speaking of RAG, we've also released "Toolkit," a collection of pre-built front-end and back-end components enabling users to quickly build and deploy RAG applications. Check out the documentation for more details.