Streaming Responses
The Chat API can stream events (such as text generation) as they are produced. This means that partial results from the model can be displayed within moments, even if the full generation takes longer.
You're likely already familiar with streaming. When you ask Coral a question, it doesn't output a single block of text; it streams the text out a few words at a time. In many user interfaces, enabling streaming improves the user experience by lowering the perceived latency.
Example
import cohere
co = cohere.Client('<YOUR API KEY>')
for event in co.chat_stream(message="What is an LLM?"):
    if event.event_type == "text-generation":
        # Print each chunk of generated text as it arrives.
        print(event.text)
    elif event.event_type == "stream-end":
        # Print the reason the stream ended.
        print(event.finish_reason)
Stream Events
When streaming is enabled, the API sends events down one by one. Each event has an event_type. Different event types carry different fields, so check the event_type before reading anything else from an event.
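One way to keep per-type handling tidy is a dispatch table keyed by event_type. The sketch below is only an illustration, not a mechanism provided by the SDK: the handler names and the SimpleNamespace stand-in events are hypothetical, chosen so the snippet runs without an API key. Only the event_type, text, and finish_reason attributes are taken from the example above.

```python
from types import SimpleNamespace


def on_text(event, out):
    # A text-generation event carries the next chunk of text.
    out.append(event.text)


def on_end(event, out):
    # A stream-end event carries the reason the stream stopped.
    out.append(f"[{event.finish_reason}]")


# Hypothetical dispatch table: event_type -> handler function.
HANDLERS = {
    "text-generation": on_text,
    "stream-end": on_end,
}


def handle_stream(events):
    """Route each event to its handler; skip types we do not handle."""
    out = []
    for event in events:
        handler = HANDLERS.get(event.event_type)
        if handler is not None:
            handler(event, out)
    return out


# Stand-in events that mimic the shape of real stream events.
fake_events = [
    SimpleNamespace(event_type="stream-start", generation_id="abc"),
    SimpleNamespace(event_type="text-generation", text="Hello"),
    SimpleNamespace(event_type="text-generation", text=" world"),
    SimpleNamespace(event_type="stream-end", finish_reason="COMPLETE"),
]
result = handle_stream(fake_events)
```

Unknown event types are skipped rather than raising, which keeps the loop working if a stream contains types the table does not list.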
Basic Stream Events
stream-start
The first event in the stream contains metadata for the request, such as the generation_id. Only one stream-start event will be emitted.
stream-end
A stream-end event is the final event of the stream, and is returned only when streaming is finished. This event contains aggregated data from all the other events, such as the complete text, as well as a finish_reason for why the stream ended (for example, because generation finished or because there was an error).
Only one stream-end event will be returned.
text-generation
A text-generation event is emitted whenever the next chunk of text comes back from the model. As the model continues generating text, multiple events of this type will be emitted.
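Because the reply arrives as many small chunks, a client usually concatenates them to rebuild the full text. The following is a minimal sketch of that accumulation; the collect_text helper and the SimpleNamespace stand-in events are hypothetical and exist only so the snippet runs offline.

```python
from types import SimpleNamespace


def collect_text(events):
    """Concatenate the text of every text-generation event, in order."""
    chunks = []
    for event in events:
        if event.event_type == "text-generation":
            chunks.append(event.text)
    # Chunks already include their own leading spaces, so join with "".
    return "".join(chunks)


# Stand-in events mimicking the first few chunks of a streamed reply.
fake_events = [
    SimpleNamespace(event_type="stream-start", generation_id="abc"),
    SimpleNamespace(event_type="text-generation", text="The"),
    SimpleNamespace(event_type="text-generation", text=" tallest"),
    SimpleNamespace(event_type="text-generation", text=" penguin"),
    SimpleNamespace(event_type="stream-end", finish_reason="COMPLETE"),
]
full_text = collect_text(fake_events)
```

With a real client, the iterable returned by co.chat_stream(message=...) would take the place of fake_events.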
Retrieval Augmented Generation Stream Events
These events are generated when using the API with various RAG parameters.
search-queries-generation
Emitted when search queries are generated by the model. Only happens when the Chat API is used with the search_queries_only or connectors parameters.
search-results
Emitted when the specified connectors respond with search results. Only one event of this type will be returned for a given stream.
citation-generation
This event contains streamed citations and references to the documents being cited (if citations have been generated by the model). Multiple citation-generation events will be returned.
For an illustration of a generated citation with document-specific indices, look at the "Example Responses" section below. As you can see, each document has an id, and when that document is used as part of the response, it's cited by that id.
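To make the citation fields concrete, the sketch below applies the citation from the "Example Responses" section to its reply text. It assumes that start and end are character offsets into the full text with end exclusive; that reading is inferred from the example data rather than stated on this page. The annotate helper is hypothetical.

```python
def annotate(text, citations):
    """Insert a " [id, id]" marker after each cited span of text."""
    # Work from the last citation to the first so that inserting a
    # marker never shifts the offsets of citations still to be applied.
    for c in sorted(citations, key=lambda c: c["start"], reverse=True):
        marker = " [" + ", ".join(c["document_ids"]) + "]"
        text = text[:c["end"]] + marker + text[c["end"]:]
    return text


# Reply text and citation copied from the example response on this page.
text = (
    "The tallest penguin in the world is the Emperor Penguin. "
    "They have an average height of 45 inches and weigh up to 100 pounds."
)
citations = [
    {
        "start": 40,
        "end": 56,
        "text": "Emperor Penguin.",
        "document_ids": ["web-search_0", "web-search_1"],
    }
]

# Slicing with the offsets recovers the cited text.
cited_span = text[citations[0]["start"]:citations[0]["end"]]
annotated = annotate(text, citations)
```

Each id in the marker can then be looked up in the documents list to show the title or URL of the source.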
Getting Started
Install the SDK
If you haven't already, you'll need to install Cohere's SDK:
pip install cohere
Using Streaming
You can use the stream parameter to toggle token streaming in any generative model. The following example uses the co.chat endpoint.
Example Responses
While Streaming
Below is the full sequence of JSON objects you might see during a streaming session, one object per event:
{
"is_finished": false,
"event_type": "stream-start",
"generation_id": "6789661c-731c-4d83-b0fe-8926f6194811"
}
{
"is_finished": false,
"event_type": "search-queries-generation",
"search_queries": [
{
"text": "What is the tallest penguin in the world?",
"generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
}
]
}
{
"is_finished": false,
"event_type": "search-results",
"search_results": [
{
"search_query": {
"text": "What is the tallest penguin in the world?",
"generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
},
"document_ids": [
"web-search_0",
"web-search_1"
],
"connector": {
"id": "web-search"
}
}
],
"documents": [
{
"id": "web-search_0",
"snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb). Feathers of the head and back are black and sharply delineated from the white belly, pale-yellow breast and bright-yellow ear patches.\n\nLike all penguins, it is flightless, with a streamlined body, and wings stiffened and flattened into flippers for a marine habitat. Its diet consists primarily of fish, but also includes crustaceans, such as krill, and cephalopods, such as squid.",
"title": "Emperor penguin - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Emperor_penguin"
},
{
"id": "web-search_1",
"snippet": "King penguins can weigh up to 40 pounds, growing to be 33 to 37 inches tall.\n\nThey are also able swimmers, diving to depths of over 200 feet in search of squid and small fish, which are their main food sources.\n\nKing penguins lay unique pear-shaped eggs, which they incubate in a pooch and carry around with their legs.\n\n3. Gentoo Penguin\n\nThe Gentoo Penguin is the world’s third-largest penguin. While its average height is 31 inches, it can grow to a maximum height of 35 inches.\n\nView this post on Instagram\n\nA post shared by Ricardo Peralta Ayala (@ricardo_peralta_ayala)\n\nGentoo Penguins have a white stripe across their black head, making them easily distinguishable from other penguin species.",
"title": "A Ranking of the 10 Biggest Penguin Species - American Oceans",
"url": "https://www.americanoceans.org/facts/the-largest-penguins-ranked-by-size/"
}
]
}
{
"is_finished": false,
"event_type": "text-generation",
"text": "The"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " tallest"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " penguin"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " in"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " the"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " world"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " is"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " the"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " Emperor"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " Penguin"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": "."
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " They"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " have"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " an"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " average"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " height"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " of"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " 45"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " inches"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " and"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " weigh"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " up"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " to"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " 100"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": " pounds"
}
{
"is_finished": false,
"event_type": "text-generation",
"text": "."
}
{
"is_finished": false,
"event_type": "citation-generation",
"citations": [
{
"start": 40,
"end": 56,
"text": "Emperor Penguin.",
"document_ids": [
"web-search_0",
"web-search_1"
]
}
]
}
{
"is_finished": true,
"event_type": "stream-end",
"response": {
"response_id": "75f0d364-5086-4d38-8afd-b67d19e06bf1",
"text": "The tallest penguin in the world is the Emperor Penguin. They have an average height of 45 inches and weigh up to 100 pounds.",
"generation_id": "6789661c-731c-4d83-b0fe-8926f6194811",
"token_count": {
"prompt_tokens": 2821,
"response_tokens": 29,
"total_tokens": 2850,
"billed_tokens": 37
},
"citations": [
{
"start": 40,
"end": 56,
"text": "Emperor Penguin.",
"document_ids": [
"web-search_0",
"web-search_1"
]
}
],
"documents": [
{
"id": "web-search_0",
"snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb). Feathers of the head and back are black and sharply delineated from the white belly, pale-yellow breast and bright-yellow ear patches.\n\nLike all penguins, it is flightless, with a streamlined body, and wings stiffened and flattened into flippers for a marine habitat. Its diet consists primarily of fish, but also includes crustaceans, such as krill, and cephalopods, such as squid.",
"title": "Emperor penguin - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Emperor_penguin"
},
{
"id": "web-search_1",
"snippet": "King penguins can weigh up to 40 pounds, growing to be 33 to 37 inches tall.\n\nThey are also able swimmers, diving to depths of over 200 feet in search of squid and small fish, which are their main food sources.\n\nKing penguins lay unique pear-shaped eggs, which they incubate in a pooch and carry around with their legs.\n\n3. Gentoo Penguin\n\nThe Gentoo Penguin is the world’s third-largest penguin. While its average height is 31 inches, it can grow to a maximum height of 35 inches.\n\nView this post on Instagram\n\nA post shared by Ricardo Peralta Ayala (@ricardo_peralta_ayala)\n\nGentoo Penguins have a white stripe across their black head, making them easily distinguishable from other penguin species.",
"title": "A Ranking of the 10 Biggest Penguin Species - American Oceans",
"url": "https://www.americanoceans.org/facts/the-largest-penguins-ranked-by-size/"
}
],
"search_results": [
{
"search_query": {
"text": "What is the tallest penguin in the world?",
"generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
},
"document_ids": [
"web-search_0",
"web-search_1"
],
"connector": {
"id": "web-search"
}
}
],
"search_queries": [
{
"text": "What is the tallest penguin in the world?",
"generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
}
]
},
"finish_reason": "COMPLETE"
}
Each event contains information about whether the streaming session is finished and what type of event is being fired; text-generation events also carry the text that was generated by the model.
Of course, the print(event.text) and print(event.finish_reason) lines in the code snippet above peel a lot of the extra information away, so your output would look more like this:
The
tallest
living
penguins
in
the
world
are
Emperor
penguins
,
which
can
reach
heights
of
approximately
115
cm
(
45
.
3
inches
)
tall
.
Interestingly
,
they
are
only
found
in
Antarctica
.
[{'start': 45, 'end': 61, 'text': 'Emperor penguins', 'document_ids': ['doc_0']}]
[{'start': 104, 'end': 130, 'text': '115 cm (45.3 inches) tall.', 'document_ids': ['doc_0']}]
[{'start': 169, 'end': 180, 'text': 'Antarctica.', 'document_ids': ['doc_1']}]
COMPLETE
It should be (more or less) the same text, but that text is on its own rather than being accompanied by search queries, event types, etc.
Note that the citation objects appear at the end, just before the stream completes. If you're not sure what these mean or how to read them, check out the "Document Mode" section in the linked doc, which furnishes additional context.
When the model has finished generating, it returns the full text, some metadata, citations, and the documents that were used to ground the reply.
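As a rough illustration of reading that final payload, the sketch below parses an abridged copy of the stream-end event shown earlier and pulls out the reply text, the finish reason, and the titles of the cited documents. The summarize helper is hypothetical. COMPLETE is the only finish_reason value shown on this page, so treating every other value as an abnormal ending is an assumption.

```python
import json

# Abridged copy of the stream-end event from the example above.
stream_end_json = """
{
  "is_finished": true,
  "event_type": "stream-end",
  "response": {
    "text": "The tallest penguin in the world is the Emperor Penguin.",
    "citations": [
      {"start": 40, "end": 56, "text": "Emperor Penguin.",
       "document_ids": ["web-search_0", "web-search_1"]}
    ],
    "documents": [
      {"id": "web-search_0", "title": "Emperor penguin - Wikipedia"},
      {"id": "web-search_1",
       "title": "A Ranking of the 10 Biggest Penguin Species - American Oceans"}
    ]
  },
  "finish_reason": "COMPLETE"
}
"""


def summarize(event):
    """Extract the reply text, completion status, and cited titles."""
    response = event["response"]
    titles = {doc["id"]: doc["title"] for doc in response["documents"]}
    cited_ids = []
    for citation in response["citations"]:
        for doc_id in citation["document_ids"]:
            if doc_id not in cited_ids:  # keep first-seen order, no repeats
                cited_ids.append(doc_id)
    return {
        # Assumption: any value other than "COMPLETE" is an abnormal end.
        "finished_normally": event["finish_reason"] == "COMPLETE",
        "text": response["text"],
        "cited_titles": [titles[doc_id] for doc_id in cited_ids],
    }


summary = summarize(json.loads(stream_end_json))
```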