Documents and Citations

Suggest Edits

With the release of retrieval augmented generation (RAG), it's possible to feed the model context to ground its replies. Large language models are often quite good at generating sensible output on their own, but they're well-known to hallucinate factually incorrect, nonsensical, or incomplete information in their replies, which can be problematic for certain use cases.

RAG substantially reduces this problem by giving the model source material to work with. Rather than simply generating an output based on the input prompt, the model can pull information out of this material and incorporate it into its reply.

There are a few ways to do this (which you can read more about in the RAG-specific documentation linked above), but here, we'll confine our discussion to documents mode.

Document mode involves users providing the model with their own documents directly in the message, which it can use to ground its replies.

Here's an example of interacting with document mode via the Postman API service. We're asking the co.chat() about penguins, and uploading documents for it to use:

{
  "message": "Where do the tallest penguins live?",
  "documents": [
    {
      "title": "Tall penguins",
      "snippet": "Emperor penguins are the tallest."
    },
    {
      "title": "Penguin habitats",
      "snippet": "Emperor penguins only live in Antarctica."
    },
    {
      "title": "What are animals?",
      "snippet": "Animals are different from plants."
    }
  ],
  "prompt_truncation": "AUTO"
}

Here's an example reply:

{  
    "response_id": "ea9eaeb0-073c-42f4-9251-9ecef5b189ef",  
    "text": "The tallest penguins, Emperor penguins, live in Antarctica.",  
    "generation_id": "1b5565da-733e-4c14-9ff5-88d18a26da96",  
    "token_count": {  
        "prompt_tokens": 445,  
        "response_tokens": 13,  
        "total_tokens": 458,  
        "billed_tokens": 20  
    },  
    "meta": {  
        "api_version": {  
            "version": "2022-12-06"  
        }  
    },  
    "citations": [  
        {  
            "start": 22,  
            "end": 38,  
            "text": "Emperor penguins",  
            "document_ids": [  
                "doc_0"  
            ]  
        },  
        {  
            "start": 48,  
            "end": 59,  
            "text": "Antarctica.",  
            "document_ids": [  
                "doc_1"  
            ]  
        }  
    ],  
    "documents": [  
        {  
            "id": "doc_0",  
            "title": "Tall penguins",  
            "snippet": "Emperor penguins are the tallest.",  
            "url": ""  
        },  
        {  
            "id": "doc_1",  
            "title": "Penguin habitats",  
            "snippet": "Emperor penguins only live in Antarctica.",  
            "url": ""  
        }  
    ],  
    "search_queries": []  
}

Observe that the payload includes a list of documents with a “snippet” field containing the information we want the model to use. The recommended length for the snippet of each document is relatively short, 300 words or less. We recommend using field names similar to the ones we’ve included in this example (i.e. “title” and “snippet” ), but RAG is quite flexible with respect to how you structure the documents. You can give the fields any names you want, and can pass in other fields as well, such as a “date” field. All field names and field values are passed to the model.

Also, we can clearly see that it has utilized the document. Our first document says that Emperor penguins are the tallest penguin species, and our second says that Emperor penguins can only be found in Antarctica. The model’s reply successfully synthesizes both of these facts: "The tallest penguins, Emperor penguins, live in Antarctica."

Finally, note that the output contains a citations object that tells us not only which documents the model relied upon (with the "text" and “document_ids" fields), but also the particular part of the claim supported by a particular document (with the “start” and “end” fields, which are spans that tell us the location of the supported claim inside the reply). This citation object is included because the model was able to use the documents provided, but if it hadn’t been able to do so, no citation object would be present.

You can experiment with RAG in the chat playground.

Updated about 1 month ago