Streaming Responses

The Chat API can stream events (such as text generation) as they are produced. This means that partial results from the model can be displayed within moments, even if the full generation takes longer.

You're likely already familiar with streaming. When you ask Coral a question, it doesn't output a single block of text; it streams the text out a few words at a time. In many user interfaces, enabling streaming improves the user experience by lowering the perceived latency.

Example

import cohere

co = cohere.Client('<YOUR API KEY>')

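# Iterate over events as they arrive from the API.
for event in co.chat_stream(message="What is an LLM?"):
    if event.event_type == "text-generation":
        # Print each chunk of generated text as soon as it arrives.
        print(event.text)
    elif event.event_type == "stream-end":
        # The final event reports why the stream ended.
        print(event.finish_reason)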

Stream Events

When streaming is enabled, the API sends events one by one. Each event has an event_type, and each type carries different fields, so your code should check the event_type before reading an event's contents.
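One way to organize this is a small dispatch function keyed on event_type. The sketch below works on plain dictionaries shaped like the JSON events in the example response on this page rather than on SDK objects. The name render_event is hypothetical, and silently ignoring other event types is a choice made for this illustration, not documented API behavior.

```python
from typing import Optional


def render_event(event: dict) -> Optional[str]:
    """Return the text to display for one stream event, or None if there is nothing to show."""
    event_type = event.get("event_type")
    if event_type == "text-generation":
        # Each text-generation event carries the next chunk of generated text.
        return event["text"]
    if event_type == "stream-end":
        # The final event reports why the stream ended.
        return f"\n[finished: {event['finish_reason']}]"
    # Other event types (stream-start, search events, citations) produce no display text here.
    return None


events = [
    {"is_finished": False, "event_type": "stream-start", "generation_id": "example-id"},
    {"is_finished": False, "event_type": "text-generation", "text": "The"},
    {"is_finished": False, "event_type": "text-generation", "text": " tallest"},
    {"is_finished": True, "event_type": "stream-end", "finish_reason": "COMPLETE"},
]

rendered = "".join(part for part in map(render_event, events) if part is not None)
print(rendered)
```

Returning None for unhandled types keeps the loop that consumes the stream simple: it can filter out events it has no display for without raising.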

Basic Stream Events

stream-start

The first event in the stream contains metadata for the request such as the generation_id. Only one stream-start event will be emitted.

stream-end

A stream-end event is the final event of the stream, and is returned only when streaming is finished. This event contains aggregated data from all the other events, such as the complete text, as well as a finish_reason explaining why the stream ended (for example, because generation completed or because there was an error).

Only one stream-end event will be returned.

text-generation

A text-generation event is emitted whenever the next chunk of text comes back from the model. As the model continues generating text, multiple events of this type will be emitted.
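Because the text arrives in chunks, a client typically concatenates them in arrival order. A minimal sketch, using dictionaries shaped like the events in the example response on this page; collect_text is a hypothetical helper name, not part of the SDK.

```python
from typing import Iterable


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate the text of every text-generation event, in arrival order."""
    chunks = []
    for event in events:
        if event.get("event_type") == "text-generation":
            chunks.append(event["text"])
    # Chunks already include their own leading spaces, so join without a separator.
    return "".join(chunks)


events = [
    {"event_type": "stream-start", "generation_id": "example-id"},
    {"event_type": "text-generation", "text": "The"},
    {"event_type": "text-generation", "text": " tallest"},
    {"event_type": "text-generation", "text": " penguin"},
    {"event_type": "stream-end", "finish_reason": "COMPLETE"},
]

full_text = collect_text(events)
print(full_text)
```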

Retrieval Augmented Generation Stream Events

These events are generated when using the API with various RAG parameters.

search-queries-generation

Emitted when search queries are generated by the model. This only happens when the Chat API is used with the search_queries_only or connectors parameters.

search-results

Emitted when the specified connectors respond with search results. Only one event of this type will be returned for a given stream.
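In the example response on this page, the search-results event carries a documents list whose entries each have an id. Assuming that shape, the sketch below indexes the documents by id so that later citations can be resolved to a title or URL. The helper name index_documents is hypothetical.

```python
def index_documents(search_results_event: dict) -> dict:
    """Map each document id to its document, for a search-results event."""
    return {doc["id"]: doc for doc in search_results_event.get("documents", [])}


event = {
    "is_finished": False,
    "event_type": "search-results",
    "documents": [
        {
            "id": "web-search_0",
            "title": "Emperor penguin - Wikipedia",
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin",
        },
        {
            "id": "web-search_1",
            "title": "A Ranking of the 10 Biggest Penguin Species - American Oceans",
            "url": "https://www.americanoceans.org/facts/the-largest-penguins-ranked-by-size/",
        },
    ],
}

docs_by_id = index_documents(event)
print(docs_by_id["web-search_0"]["title"])
```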

citation-generation

This event contains streamed citations and references to the documents being cited (if citations have been generated by the model). Multiple citation-generation events will be returned.

For an illustration of a generated citation with document-specific indices, look at the "Example Responses" section below. As you can see, each document has an id, and when that document is used as part of the response, it's cited by that id.
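A citation's start and end values can be read as character offsets into the generated text. That reading matches the example on this page, where offsets 40 to 56 select "Emperor Penguin." The sketch below relies on that interpretation (start inclusive, end exclusive), which is inferred from the example rather than taken from the API reference. cited_span is a hypothetical helper.

```python
def cited_span(text: str, citation: dict) -> str:
    """Return the slice of the response text that a citation covers."""
    # Assumes start is inclusive and end is exclusive, as in Python slicing.
    return text[citation["start"]:citation["end"]]


text = (
    "The tallest penguin in the world is the Emperor Penguin. "
    "They have an average height of 45 inches and weigh up to 100 pounds."
)
citation = {
    "start": 40,
    "end": 56,
    "text": "Emperor Penguin.",
    "document_ids": ["web-search_0", "web-search_1"],
}

span = cited_span(text, citation)
print(f"{span!r} is supported by {', '.join(citation['document_ids'])}")
```

Comparing the extracted span with the citation's own text field is a cheap sanity check that the offsets are being applied to the right string.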

Getting Started

Install the SDK

If you haven't already, you'll need to install Cohere's SDK:

pip install cohere

Using Streaming

You can use the stream parameter to toggle token streaming in any generative model. The example at the top of this page does this through the SDK's co.chat_stream method.

Example Responses

While Streaming

Below is the series of JSON objects, one per event, that you might see during a streaming session:


{
    "is_finished": false,
    "event_type": "stream-start",
    "generation_id": "6789661c-731c-4d83-b0fe-8926f6194811"
}
{
    "is_finished": false,
    "event_type": "search-queries-generation",
    "search_queries": [
        {
            "text": "What is the tallest penguin in the world?",
            "generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
        }
    ]
}
{
    "is_finished": false,
    "event_type": "search-results",
    "search_results": [
        {
            "search_query": {
                "text": "What is the tallest penguin in the world?",
                "generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
            },
            "document_ids": [
                "web-search_0",
                "web-search_1"
            ],
            "connector": {
                "id": "web-search"
            }
        }
    ],
    "documents": [
        {
            "id": "web-search_0",
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb). Feathers of the head and back are black and sharply delineated from the white belly, pale-yellow breast and bright-yellow ear patches.\n\nLike all penguins, it is flightless, with a streamlined body, and wings stiffened and flattened into flippers for a marine habitat. Its diet consists primarily of fish, but also includes crustaceans, such as krill, and cephalopods, such as squid.",
            "title": "Emperor penguin - Wikipedia",
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"
        },
        {
            "id": "web-search_1",
            "snippet": "King penguins can weigh up to 40 pounds, growing to be 33 to 37 inches tall.\n\nThey are also able swimmers, diving to depths of over 200 feet in search of squid and small fish, which are their main food sources.\n\nKing penguins lay unique pear-shaped eggs, which they incubate in a pooch and carry around with their legs.\n\n3. Gentoo Penguin\n\nThe Gentoo Penguin is the world’s third-largest penguin. While its average height is 31 inches, it can grow to a maximum height of 35 inches.\n\nView this post on Instagram\n\nA post shared by Ricardo Peralta Ayala (@ricardo_peralta_ayala)\n\nGentoo Penguins have a white stripe across their black head, making them easily distinguishable from other penguin species.",
            "title": "A Ranking of the 10 Biggest Penguin Species - American Oceans",
            "url": "https://www.americanoceans.org/facts/the-largest-penguins-ranked-by-size/"
        }
    ]
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": "The"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " tallest"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " penguin"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " in"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " the"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " world"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " is"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " the"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " Emperor"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " Penguin"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": "."
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " They"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " have"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " an"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " average"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " height"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " of"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " 45"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " inches"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " and"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " weigh"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " up"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " to"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " 100"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": " pounds"
}
{
    "is_finished": false,
    "event_type": "text-generation",
    "text": "."
}
{
    "is_finished": false,
    "event_type": "citation-generation",
    "citations": [
        {
            "start": 40,
            "end": 56,
            "text": "Emperor Penguin.",
            "document_ids": [
                "web-search_0",
                "web-search_1"
            ]
        }
    ]
}
{
    "is_finished": true,
    "event_type": "stream-end",
    "response": {
        "response_id": "75f0d364-5086-4d38-8afd-b67d19e06bf1",
        "text": "The tallest penguin in the world is the Emperor Penguin. They have an average height of 45 inches and weigh up to 100 pounds.",
        "generation_id": "6789661c-731c-4d83-b0fe-8926f6194811",
        "token_count": {
            "prompt_tokens": 2821,
            "response_tokens": 29,
            "total_tokens": 2850,
            "billed_tokens": 37
        },
        "citations": [
            {
                "start": 40,
                "end": 56,
                "text": "Emperor Penguin.",
                "document_ids": [
                    "web-search_0",
                    "web-search_1"
                ]
            }
        ],
        "documents": [
            {
                "id": "web-search_0",
                "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb). Feathers of the head and back are black and sharply delineated from the white belly, pale-yellow breast and bright-yellow ear patches.\n\nLike all penguins, it is flightless, with a streamlined body, and wings stiffened and flattened into flippers for a marine habitat. Its diet consists primarily of fish, but also includes crustaceans, such as krill, and cephalopods, such as squid.",
                "title": "Emperor penguin - Wikipedia",
                "url": "https://en.wikipedia.org/wiki/Emperor_penguin"
            },
            {
                "id": "web-search_1",
                "snippet": "King penguins can weigh up to 40 pounds, growing to be 33 to 37 inches tall.\n\nThey are also able swimmers, diving to depths of over 200 feet in search of squid and small fish, which are their main food sources.\n\nKing penguins lay unique pear-shaped eggs, which they incubate in a pooch and carry around with their legs.\n\n3. Gentoo Penguin\n\nThe Gentoo Penguin is the world’s third-largest penguin. While its average height is 31 inches, it can grow to a maximum height of 35 inches.\n\nView this post on Instagram\n\nA post shared by Ricardo Peralta Ayala (@ricardo_peralta_ayala)\n\nGentoo Penguins have a white stripe across their black head, making them easily distinguishable from other penguin species.",
                "title": "A Ranking of the 10 Biggest Penguin Species - American Oceans",
                "url": "https://www.americanoceans.org/facts/the-largest-penguins-ranked-by-size/"
            }
        ],
        "search_results": [
            {
                "search_query": {
                    "text": "What is the tallest penguin in the world?",
                    "generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
                },
                "document_ids": [
                    "web-search_0",
                    "web-search_1"
                ],
                "connector": {
                    "id": "web-search"
                }
            }
        ],
        "search_queries": [
            {
                "text": "What is the tallest penguin in the world?",
                "generation_id": "2c45db14-85f7-4714-b6c5-19cc9f026165"
            }
        ]
    },
    "finish_reason": "COMPLETE"
}

Each event indicates whether the streaming session is finished (is_finished) and what type of event is being fired (event_type); text-generation events also carry the text that was generated by the model.
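If you capture events as raw text in a form like the listing above, you have a sequence of JSON objects rather than a single JSON document, so calling json.loads on the whole string fails. The sketch below splits such a string of well-formed objects with the standard library's json.JSONDecoder.raw_decode. The helper iter_json_objects is hypothetical, and this says nothing about how events are framed on the wire; it only handles text you have already collected.

```python
import json
from typing import Iterator


def iter_json_objects(payload: str) -> Iterator[dict]:
    """Yield each JSON object from a string of back-to-back JSON objects."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(payload):
        # Skip whitespace between objects; raw_decode does not accept leading whitespace.
        if payload[position].isspace():
            position += 1
            continue
        obj, position = decoder.raw_decode(payload, position)
        yield obj


payload = """
{"is_finished": false, "event_type": "text-generation", "text": "The"}
{"is_finished": false, "event_type": "text-generation", "text": " tallest"}
{"is_finished": true, "event_type": "stream-end", "finish_reason": "COMPLETE"}
"""

event_types = [event["event_type"] for event in iter_json_objects(payload)]
print(event_types)
```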

Of course, the print(event.text) and print(event.finish_reason) lines in the code snippet above peel a lot of the extra information away, so your output would look more like this:

The
 tallest
 living
 penguins
 in
 the
 world
 are
 Emperor
 penguins
,
 which
 can
 reach
 heights
 of
 approximately
 115
 cm
 (
45
.
3
 inches
)
 tall
.
 Interestingly
,
 they
 are
 only
 found
 in
 Antarctica
.
[{'start': 45, 'end': 61, 'text': 'Emperor penguins', 'document_ids': ['doc_0']}]
[{'start': 104, 'end': 130, 'text': '115 cm (45.3 inches) tall.', 'document_ids': ['doc_0']}]
[{'start': 169, 'end': 180, 'text': 'Antarctica.', 'document_ids': ['doc_1']}]
COMPLETE

It should be (more or less) the same text, but that text is on its own rather than being accompanied by search queries, event types, etc.

Note that the citation objects appear at the end, just before the stream completes. If you're not sure what these mean or how to read them, check out the "Document Mode" section in the linked doc, which furnishes additional context.

When the model has finished generating, it returns the full text, some metadata, citations, and the documents that were used to ground the reply.
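As a rough sketch of reading those fields, the snippet below pulls a short summary out of a dictionary shaped like the stream-end event in the example above, with the long fields trimmed. summarize_final is a hypothetical helper, and the field names are taken from that example only.

```python
def summarize_final(event: dict) -> dict:
    """Pull the headline fields out of a stream-end event."""
    response = event["response"]
    return {
        "finish_reason": event["finish_reason"],
        "text": response["text"],
        "citation_count": len(response.get("citations", [])),
        "document_ids": [doc["id"] for doc in response.get("documents", [])],
    }


final_event = {
    "is_finished": True,
    "event_type": "stream-end",
    "response": {
        "text": "The tallest penguin in the world is the Emperor Penguin.",
        "citations": [
            {
                "start": 40,
                "end": 56,
                "text": "Emperor Penguin.",
                "document_ids": ["web-search_0", "web-search_1"],
            },
        ],
        "documents": [{"id": "web-search_0"}, {"id": "web-search_1"}],
    },
    "finish_reason": "COMPLETE",
}

summary = summarize_final(final_event)
print(summary["finish_reason"], summary["citation_count"], summary["document_ids"])
```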