Cohere on Azure

In an effort to make our language-model capabilities more widely available, we've partnered with a few major platforms to create hosted versions of our offerings.

In this article, you learn how to use Azure AI Studio to deploy both the Cohere Command models and the Cohere Embed models on Microsoft's Azure cloud computing platform.

The following four models are available through Azure AI Studio with pay-as-you-go, token-based billing:

  • Command R
  • Command R+
  • Embed v3 - English
  • Embed v3 - Multilingual

Prerequisites

Whether you're using Command or Embed, the initial setup is the same. You'll need:

  • An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
  • An Azure AI hub resource. Note: for Cohere models, the pay-as-you-go deployment offering is only available with AI hubs created in the EastUS, EastUS2 or Sweden Central regions.
  • An Azure AI project in Azure AI Studio.
  • Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the required steps, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Role-based access control in Azure AI Studio.

For Command- or Embed-based workflows, you'll also need to create a deployment and consume the model; the sections below walk through each.

Text Generation

We expose two routes for Command R and Command R+ inference:

  • v1/chat/completions adheres to the Azure AI Generative Messages API schema;
  • v1/chat supports Cohere's native API schema.

You can find more information about Azure's API here.

Here's a code snippet demonstrating how to programmatically interact with a Cohere model on Azure:

import urllib.request
import urllib.error
import json

# Configure the payload sent to the API endpoint
data = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is good about Wuhan?"},
    ],
    "max_tokens": 500,
    "temperature": 0.3,
    "stream": "True",
}

body = str.encode(json.dumps(data))

# Replace the url with your API endpoint
url = "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"

# Replace this with the key for the endpoint
api_key = "your-auth-key"
if not api_key:
    raise Exception("API Key is missing")

headers = {"Content-Type": "application/json", "Authorization": api_key}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", "ignore"))

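The v1/chat route accepts Cohere's native request schema instead of the OpenAI-style messages array. Here's a minimal sketch of the same request against that route, assuming it mirrors Cohere's native Chat API (a single message string plus an optional chat_history); the exact set of supported fields is documented on the linked pages.

import urllib.request
import urllib.error
import json

# Cohere's native chat schema: one "message" plus an optional "chat_history".
# Treat the optional fields as a sketch and verify them against your deployment.
data = {
    "message": "What is good about Wuhan?",
    "chat_history": [
        {"role": "USER", "message": "Hello"},
        {"role": "CHATBOT", "message": "Hi! How can I help you?"},
    ],
    "max_tokens": 500,
    "temperature": 0.3,
}

# Note the native route: v1/chat rather than v1/chat/completions
url = "https://your-endpoint.inference.ai.azure.com/v1/chat"
api_key = "your-auth-key"
headers = {"Content-Type": "application/json", "Authorization": api_key}

req = urllib.request.Request(url, str.encode(json.dumps(data)), headers)
print(urllib.request.urlopen(req).read())
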
You can find more code snippets, including examples of how to stream responses, in this notebook.
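
As a quick illustration before you dive into the notebook, here's a minimal streaming sketch. It assumes that setting "stream": True makes the v1/chat/completions route return server-sent events (lines prefixed with "data:"), which is the usual convention for OpenAI-style chat APIs; the notebook has the canonical version.

import urllib.request
import urllib.error
import json

# Same chat/completions request as above, with streaming enabled
data = {
    "messages": [{"role": "user", "content": "What is good about Wuhan?"}],
    "max_tokens": 500,
    "temperature": 0.3,
    "stream": True,
}

url = "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"
api_key = "your-auth-key"
headers = {"Content-Type": "application/json", "Authorization": api_key}

req = urllib.request.Request(url, str.encode(json.dumps(data)), headers)
with urllib.request.urlopen(req) as response:
    # Assumes server-sent events: each chunk arrives as a "data: {...}" line
    for raw_line in response:
        line = raw_line.decode("utf-8").strip()
        if line.startswith("data:"):
            chunk = line[len("data:"):].strip()
            if chunk == "[DONE]":
                break
            print(json.loads(chunk))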

Though this section is called "Text Generation", it's worth pointing out that these models are capable of much more. Specifically, you can use Azure-hosted Cohere models for both retrieval-augmented generation and multi-step tool use. Check the linked pages for much more information.
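
To give a concrete flavor of tool use, here's a hypothetical payload for the native v1/chat route following Cohere's Chat API tool schema (a tools list with name, description, and parameter_definitions). The get_weather tool is made up for illustration, and you should verify which tool-use fields your Azure deployment accepts against the linked pages.

# Hypothetical tool-use payload in Cohere's native Chat API schema.
# "get_weather" is an illustrative tool name, not a real API.
data = {
    "message": "What's the weather in Toronto?",
    "tools": [
        {
            "name": "get_weather",
            "description": "Gets the current weather for a given city",
            "parameter_definitions": {
                "city": {
                    "description": "The city to look up",
                    "type": "str",
                    "required": True,
                }
            },
        }
    ],
}
# POST this to v1/chat as in the earlier snippets; the model's response
# indicates which tool to call and with which arguments.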

Embeddings

We expose two routes for Embed v3 - English and Embed v3 - Multilingual inference:

  • v1/embeddings adheres to the Azure AI API schema;
  • v1/embed supports Cohere's native API schema.

You can find more information about Azure's API here.

import urllib.request
import urllib.error
import json

# Configure the payload sent to the API endpoint
data = {
    "input": ["hi"]
}

body = str.encode(json.dumps(data))

# Replace the url with your API endpoint
url = "https://your-endpoint.inference.ai.azure.com/v1/embedding"

# Replace this with the key for the endpoint
api_key = "your-auth-key"
if not api_key:
    raise Exception("API Key is missing")

headers = {"Content-Type": "application/json", "Authorization": api_key}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", "ignore"))
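
The native v1/embed route uses Cohere's own Embed API schema, where the payload carries a "texts" list rather than "input". Here's a rough sketch under that assumption; "input_type" follows Cohere's Embed API, and you should confirm which optional fields your deployment supports.

import urllib.request
import urllib.error
import json

# Cohere's native embed schema: "texts" instead of "input"
data = {
    "texts": ["hi", "hello world"],
    "input_type": "search_document",  # from Cohere's Embed API; verify support
}

# Note the native route: v1/embed rather than v1/embeddings
url = "https://your-endpoint.inference.ai.azure.com/v1/embed"
api_key = "your-auth-key"
headers = {"Content-Type": "application/json", "Authorization": api_key}

req = urllib.request.Request(url, str.encode(json.dumps(data)), headers)
result = json.loads(urllib.request.urlopen(req).read())
print(len(result["embeddings"]))  # one embedding vector per input text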

A Note on SDKs

You can also use the Cohere SDK client to consume Azure AI deployments. Here are example notebooks for Command and Embed.

The key point is that new and existing customers can call the models on Azure while still leveraging their existing integration with the Cohere SDK.
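
As a minimal sketch of what that looks like, assuming a recent cohere Python SDK that accepts a base_url parameter and an Azure endpoint exposing the /v1 base path used in the snippets above (the linked notebooks have the canonical setup):

import cohere

# Point the Cohere client at the Azure endpoint instead of Cohere's API.
# base_url and the /v1 path are assumptions; see the linked notebooks.
co = cohere.Client(
    api_key="your-auth-key",
    base_url="https://your-endpoint.inference.ai.azure.com/v1",
)

response = co.chat(message="What is good about Wuhan?")
print(response.text)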