Single Container on Private Clouds
This document walks through how to pull Cohere's container images using a license, and provides steps for testing both Docker and Kubernetes images.
Before starting, ensure you have a license and image tag provided by Cohere.
Pull Container Images with a License
Cohere provides access to container images through a registry authenticated with a license. Users can pull these images and replicate them in their environment, as needed, to avoid runtime network access from inside the cluster.
Images will come through the proxy.replicated.com registry. Pulling the images requires firewall access to proxy.replicated.com and proxy-auth.replicated.com. More information on these endpoints may be found here.
To test pulling images with a license, modify your docker CLI configuration to include authentication details for the registry. Note: docker login will not work.
The docker CLI is only an example; any tool that can pull images with credentials will work, with the license ID configured as both username and password. Skopeo, another popular tool for copying images between registries, also works with this flow.
The following commands will overwrite your existing docker CLI configuration with authentication details for Cohere’s registry. If preferred, you can manually add the authentication details to preserve your existing configuration.
LICENSE_ID="<YOUR LICENSE ID>"
cat <<EOF > ~/.docker/config.json
{
  "auths": {
    "proxy.replicated.com": {
      "auth": "$(echo -n "${LICENSE_ID}:${LICENSE_ID}" | base64 | tr -d '\n')"
    }
  }
}
EOF
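If you would rather not overwrite ~/.docker/config.json, one possible approach is to write the same authentication entry into a separate directory and point the docker CLI at it through the DOCKER_CONFIG environment variable. This is a minimal sketch, not a Cohere-provided procedure: the temporary directory, the COHERE_DOCKER_CONFIG variable name, and the fallback license ID are assumptions made for illustration.

```shell
# Hypothetical placeholder; substitute the license ID provided by Cohere.
LICENSE_ID="${LICENSE_ID:-example-license-id}"

# Use a dedicated config directory so ~/.docker/config.json is left untouched.
COHERE_DOCKER_CONFIG="$(mktemp -d)"

# The license ID serves as both username and password.
AUTH="$(printf '%s' "${LICENSE_ID}:${LICENSE_ID}" | base64 | tr -d '\n')"

cat <<EOF > "${COHERE_DOCKER_CONFIG}/config.json"
{
  "auths": {
    "proxy.replicated.com": {
      "auth": "${AUTH}"
    }
  }
}
EOF

# The docker CLI reads its client configuration from this directory when set.
export DOCKER_CONFIG="${COHERE_DOCKER_CONFIG}"
```

Keep in mind that, with DOCKER_CONFIG set this way, other settings in your usual configuration directory (such as credential helpers) are not picked up in that shell session.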
Validate that the authenticated image pull works correctly using the docker CLI:
CUSTOMER_TAG=image_tag_from_cohere # provided by Cohere
docker pull $CUSTOMER_TAG
You can now re-tag and replicate this image anywhere you want, using workflows appropriate to your air-gapped environment.
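As a rough sketch of the re-tagging step, the snippet below derives a target reference by swapping the registry host, then prints the docker tag and docker push commands it would run. The registry.internal.example.com host, the example image path, and the DRY_RUN switch are hypothetical; substitute the image reference from Cohere and your own registry, and adapt the transfer step to whatever your air-gapped workflow requires.

```shell
# Hypothetical values; replace with the image reference from Cohere and your registry.
CUSTOMER_TAG="${CUSTOMER_TAG:-proxy.replicated.com/example/path/model:1.0}"
INTERNAL_REGISTRY="${INTERNAL_REGISTRY:-registry.internal.example.com}"

# Drop everything up to the first "/" (the source registry host)
# and prepend the target registry host.
TARGET_TAG="${INTERNAL_REGISTRY}/${CUSTOMER_TAG#*/}"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

run docker tag "${CUSTOMER_TAG}" "${TARGET_TAG}"
run docker push "${TARGET_TAG}"
```

By default the snippet only prints the commands, so you can review the target reference before setting DRY_RUN=0.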
Validate Workload Infrastructure
Once you can pull the image from the registry, run a test workload to validate the container's functionality.
Docker/Containerd
To test the container image with Docker, you should have a machine with the following installed:
- NVIDIA drivers installed on the host (the latest tested version is 545).
- nvidia-container-toolkit and corresponding configuration for docker/containerd.
Example Usage
Different models have different inputs.
- Embed models expect an array of texts and return the embeddings as output.
- Rerank models expect a list of documents and a query, returning relevance scores for the top n results (the n parameter is configurable).
- Command models expect a prompt and return the model response.
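To make the rerank output concrete: each result carries an index, which we assume here to be the zero-based position of the document in the request. The sketch below uses the sample rerank request and response from the examples later in this document and maps the returned indices back to document text. The pattern-based extraction is only for illustration; a proper JSON tool would be more robust.

```shell
# Documents in the same order as in the sample rerank request, one per line.
documents='orange
Ottawa
Toronto
Ontario'

# Sample response copied from the rerank example in this document.
response='{"id":"a547bcc5-a243-42dd-8617-d12a7944c164","results":[{"index":1,"relevance_score":0.9734939},{"index":2,"relevance_score":0.73772544}]}'

# Extract each "index" value and look up the matching document.
# Assumption: indices are zero-based positions in the request's documents array.
ranked="$(
  printf '%s\n' "$response" |
    grep -o '"index":[0-9]*' |
    cut -d: -f2 |
    while read -r idx; do
      printf '%s\n' "$documents" | sed -n "$((idx + 1))p"
    done
)"

printf '%s\n' "$ranked"
```

With the sample data this prints Ottawa followed by Toronto, the two highest-scoring documents for the query.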
This section provides simple examples of using each primary Cohere model in a Docker container. Note that if you try these out and get an error like curl: (7) Failed to connect to localhost port 8080: Connection refused, the container has not yet fully started up. Wait a few more seconds and then try again.
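Rather than retrying by hand, you could wrap the readiness check in a small retry loop. The following is a minimal sketch: wait_until is a hypothetical helper name, and the attempt count and delay in the commented usage line are arbitrary values, not recommendations from Cohere.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command...>
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Succeed as soon as the probe command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: poll the readiness endpoint every 2 seconds, up to 30 times.
# wait_until 30 2 curl --silent --fail http://localhost:8080/ping
```

The helper returns a non-zero status if the probe never succeeds, so it can be chained with && before the first inference request.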
Bash Commands for Running Cohere Models Through Docker
Here are the bash commands you can run to use the Embed English, Embed Multilingual, Rerank English, Rerank Multilingual, and Command models through Docker.
docker run -d --rm --name embed-english --gpus=1 --net=host $IMAGE_TAG
# wait 5-10 seconds for the container to start
# can check `curl http://localhost:8080/ping` for readiness
curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing embeddings in english"], "input_type": "classification"}'
{"id":"2ffe4bca-8664-4456-b858-1b3b15411f2c","embeddings":[[-0.5019531,-2.0917969,-1.6220703,-1.2919922,-0.80029297,1.3173828,1.4677734,-1.7763672,0.03869629,1.9033203...}
docker stop embed-english
docker run -d --rm --name multilingual --gpus=1 --net=host $IMAGE_TAG
curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing multilingual embeddings"], "input_type": "classification"}'
{"id":"2eab88e7-5906-44e1-9644-01893a70f1e7","texts":["testing multilingual embeddings"],"embeddings":[[-0.022094727,-0.0121154785,0.037628174,-0.0026988983,-0.0129776,0.013305664,0.005458832,-0.03161621,-0.019744873,-0.026290894,0.017333984,-0.02444458,0.01953125...
docker stop multilingual
docker run -d --rm --name rerank-english --gpus=1 --net=host $IMAGE_TAG
curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'
{"id":"a547bcc5-a243-42dd-8617-d12a7944c164","results":[{"index":1,"relevance_score":0.9734939},{"index":2,"relevance_score":0.73772544}]}
docker stop rerank-english
docker run -d --rm --name rerank-multilingual --gpus=1 --net=host $IMAGE_TAG
curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'
{"id":"8abeacf2-e657-415c-bab3-ac593e67e8e5","results":[{"index":1,"relevance_score":0.6124835},{"index":2,"relevance_score":0.5305253}],"meta":{"api_version":{"version":"2022-12-06"},"billed_units":{"search_units":1}}}
docker stop rerank-multilingual
docker run -d --rm --name command --gpus=4 --net=host $IMAGE_TAG
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{"query":"Docker is good because"}'
{
"response_id": "dc182f8d-2db1-4b13-806c-e1bcea17f864",
"text": "Docker is a powerful tool for developing,..."
...
}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
"chat_history": [
{"role": "USER", "message": "Who discovered gravity?"},
{"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
],
"message": "What year was he born?"
}'
{
"response_id": "7938d788-f800-4f9b-a12c-72a96b76a6d6",
"text": "Sir Isaac Newton was born in Woolsthorpe, England, on January 4, 1643. He was an English physicist, mathematician, astronomer, and natural philosopher who is widely recognized as one of the most...",
...
}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
"message": "tell me about penguins",
"return_chatlog": true,
"documents": [
{
"title": "Tall penguins",
"snippet": "Emperor penguins are the tallest",
"url": "http://example.com/foo"
},
{
"title": "Tall penguins",
"snippet": "Baby penguins are the tallest",
"url": "https://example.com/foo"
}
],
"mode": "augmented_generation"
}'
{
"response_id": "8a9f55f6-26aa-455e-bc4c-3e93d4b0d9e6",
"text": "Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins?",
"generation_id": "65ef2270-46bb-427d-b54c-2e5f4d7daa90",
"chatlog": "User: tell me about penguins\nChatbot: Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins? ",
"token_count": {
"prompt_tokens": 435,
"response_tokens": 68,
"total_tokens": 503
},
"meta": {
"api_version": {
"version": "2022-12-06"
},
"billed_units": {
"input_tokens": 4,
"output_tokens": 68
}
},
"citations": [
{
"start": 15,
"end": 40,
"text": "group of flightless birds",
"document_ids": [
"doc_1"
]
},
{
"start": 58,
"end": 78,
"text": "Southern Hemisphere.",
"document_ids": [
"doc_1"
]
},
{
"start": 137,
"end": 152,
"text": "Emperor penguin",
"document_ids": [
"doc_0"
]
},
{
"start": 167,
"end": 174,
"text": "tallest",
"document_ids": [
"doc_0"
]
},
{
"start": 238,
"end": 265,
"text": "tallest species of penguin.",
"document_ids": [
"doc_1"
]
}
],
"documents": [
{
"id": "doc_1",
"snippet": "Baby penguins are the tallest",
"title": "Tall penguins",
"url": "https://example.com/foo"
},
{
"id": "doc_0",
"snippet": "Emperor penguins are the tallest",
"title": "Tall penguins",
"url": "http://example.com/foo"
}
]
}
docker stop command
Note that the final example includes documents that the Command model can use to ground its replies. This functionality falls under retrieval-augmented generation.
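In the sample response above, each citation's start and end values appear to be zero-based character offsets into the text field, with end exclusive. The sketch below checks that reading against the first two citations; cite_span is a hypothetical helper name, and the text is truncated to the first sentence of the sample response.

```shell
# Print the substring of a single-line text between two offsets.
# $1 = text, $2 = start (zero-based, inclusive), $3 = end (exclusive)
cite_span() {
  # cut counts characters from 1, so shift the start offset by one.
  printf '%s\n' "$1" | cut -c "$(( $2 + 1 ))-$3"
}

# First sentence of the "text" field from the sample response.
text='Penguins are a group of flightless birds that live in the Southern Hemisphere.'

cite_span "$text" 15 40   # offsets from the first citation
cite_span "$text" 58 78   # offsets from the second citation
```

The two calls print "group of flightless birds" and "Southern Hemisphere.", matching the text values of the corresponding citations. Because cut works line by line, this helper only handles spans that sit on a single line.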
Kubernetes
Deploying to Kubernetes requires nodes with the following installed:
- NVIDIA drivers (the latest tested version is currently 545).
- nvidia-container-toolkit and corresponding configuration for docker/containerd.
- nvidia-device-plugin to make GPUs available to Kubernetes.
To deploy the same image on Kubernetes, we must first convert the docker configuration into an image pull secret (see the Kubernetes documentation for more detail).
kubectl create secret generic cohere-pull-secret \
  --from-file=.dockerconfigjson="$HOME/.docker/config.json" \
--type=kubernetes.io/dockerconfigjson
With that done, fill in the environment variables and generate the application manifest:
APP=cohere # or any other name you want to use
IMAGE="<IMAGE_TAG_FROM_COHERE>" # replace with the image Cohere provided
GPUS=4 # use 4 GPUs for command, 1 is enough for embed / rerank
cat <<EOF > cohere.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ${APP}
  name: ${APP}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${APP}
  strategy: {}
  template:
    metadata:
      labels:
        app: ${APP}
    spec:
      imagePullSecrets:
        - name: cohere-pull-secret
      containers:
        - image: ${IMAGE}
          name: ${APP}
          resources:
            limits:
              nvidia.com/gpu: ${GPUS}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ${APP}
  name: ${APP}
spec:
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: ${APP}
  type: ClusterIP
---
EOF
The manifest above does not account for air-gapped environments. For an air-gapped deployment, change the image in the manifest to point to the registry where you replicated the previously pulled image. Alternatively, to test in an internet-connected environment, create an image pull secret using the license ID as both username and password, as in the earlier docker CLI step. Keep in mind that the firewall rules mentioned in the image pull steps must be open.
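For the internet-connected test path, one declarative way to express that pull secret is a Secret manifest of type kubernetes.io/dockerconfigjson, built directly from the license ID. This is a sketch under stated assumptions: the fallback license ID and the cohere-pull-secret.yaml file name are hypothetical, and the secret name matches the one referenced by the deployment manifest.

```shell
# Hypothetical placeholder; substitute the license ID provided by Cohere.
LICENSE_ID="${LICENSE_ID:-example-license-id}"

# The license ID serves as both username and password.
AUTH="$(printf '%s' "${LICENSE_ID}:${LICENSE_ID}" | base64 | tr -d '\n')"

# Same structure as the docker CLI configuration used earlier.
DOCKER_CONFIG_JSON="{\"auths\":{\"proxy.replicated.com\":{\"auth\":\"${AUTH}\"}}}"

# Secret data values must be base64-encoded.
ENCODED="$(printf '%s' "${DOCKER_CONFIG_JSON}" | base64 | tr -d '\n')"

cat <<EOF > cohere-pull-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cohere-pull-secret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ${ENCODED}
EOF

# Apply it to the cluster when ready:
# kubectl apply -f cohere-pull-secret.yaml
```

The generated file contains credentials, so treat it like any other secret and avoid committing it to version control.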
Use the following to deploy the containers and run inference requests:
kubectl apply -f cohere.yaml
Be aware that this is a multi-gigabyte image, so it may take some time to download.
Once the pod is up and running, forward the service port to your local machine. You should expect to see something like the following:
# once the pod is running
kubectl port-forward svc/${APP} 8080:8080
# Forwarding from 127.0.0.1:8080 -> 8080
# Forwarding from [::1]:8080 -> 8080
# Handling connection for 8080
Leave that running in the background, and open a new terminal session to execute a test request. In the next few sections, we'll include examples of appropriate requests for the major Cohere models.
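If you prefer to stay in a single terminal, the port-forward can be started as a background job and stopped once testing is done. The helpers below are a minimal sketch; start_background, stop_background, and BG_PID are hypothetical names, and the kubectl line is left commented out so the snippet has no side effects on its own.

```shell
# Start a command in the background and remember its process ID.
start_background() {
  "$@" >/dev/null 2>&1 &
  BG_PID=$!
}

# Stop the background command and wait for it to exit.
stop_background() {
  kill "$BG_PID" 2>/dev/null || true
  wait "$BG_PID" 2>/dev/null || true
}

# Example:
# start_background kubectl port-forward "svc/${APP}" 8080:8080
# ...run test requests against http://localhost:8080...
# stop_background
```

Note that the helper discards the port-forward output, so run the command in the foreground first if you need to troubleshoot the connection.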
Example Usage
Here are the bash commands you can run to use the Embed English, Embed Multilingual, Rerank English, Rerank Multilingual, and Command models through Kubernetes.
curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing embeddings in english"], "input_type": "classification"}'
# {"id":"2ffe4bca-8664-4456-b858-1b3b15411f2c","embeddings":[[-0.5019531,-2.0917969,-1.6220703,-1.2919922,-0.80029297,1.3173828,1.4677734,-1.7763672,0.03869629,1.9033203...}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing multilingual embeddings"], "input_type": "classification"}'
# {"id":"2eab88e7-5906-44e1-9644-01893a70f1e7","texts":["testing multilingual embeddings"],"embeddings":[[-0.022094727,-0.0121154785,0.037628174,-0.0026988983,-0.0129776,0.013305664,0.005458832,-0.03161621,-0.019744873,-0.026290894,0.017333984,-0.02444458,0.01953125...
curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'
# {"id":"a547bcc5-a243-42dd-8617-d12a7944c164","results":[{"index":1,"relevance_score":0.9734939},{"index":2,"relevance_score":0.73772544}]}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'
# {"id":"8abeacf2-e657-415c-bab3-ac593e67e8e5","results":[{"index":1,"relevance_score":0.6124835},{"index":2,"relevance_score":0.5305253}],"meta":{"api_version":{"version":"2022-12-06"},"billed_units":{"search_units":1}}}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{"query":"Docker is good because"}'
{
"response_id": "dc182f8d-2db1-4b13-806c-e1bcea17f864",
"text": "Docker is a powerful tool for developing,..."
...
}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
"chat_history": [
{"role": "USER", "message": "Who discovered gravity?"},
{"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
],
"message": "What year was he born?"
}'
{
"response_id": "7938d788-f800-4f9b-a12c-72a96b76a6d6",
"text": "Sir Isaac Newton was born in Woolsthorpe, England, on January 4, 1643. He was an English physicist, mathematician, astronomer, and natural philosopher who is widely recognized as one of the most...",
...
}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
"message": "tell me about penguins",
"return_chatlog": true,
"documents": [
{
"title": "Tall penguins",
"snippet": "Emperor penguins are the tallest",
"url": "http://example.com/foo"
},
{
"title": "Tall penguins",
"snippet": "Baby penguins are the tallest",
"url": "https://example.com/foo"
}
],
"mode": "augmented_generation"
}'
{
"response_id": "8a9f55f6-26aa-455e-bc4c-3e93d4b0d9e6",
"text": "Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins?",
"generation_id": "65ef2270-46bb-427d-b54c-2e5f4d7daa90",
"chatlog": "User: tell me about penguins\nChatbot: Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins? ",
"token_count": {
"prompt_tokens": 435,
"response_tokens": 68,
"total_tokens": 503
},
"meta": {
"api_version": {
"version": "2022-12-06"
},
"billed_units": {
"input_tokens": 4,
"output_tokens": 68
}
},
"citations": [
{
"start": 15,
"end": 40,
"text": "group of flightless birds",
"document_ids": [
"doc_1"
]
},
{
"start": 58,
"end": 78,
"text": "Southern Hemisphere.",
"document_ids": [
"doc_1"
]
},
{
"start": 137,
"end": 152,
"text": "Emperor penguin",
"document_ids": [
"doc_0"
]
},
{
"start": 167,
"end": 174,
"text": "tallest",
"document_ids": [
"doc_0"
]
},
{
"start": 238,
"end": 265,
"text": "tallest species of penguin.",
"document_ids": [
"doc_1"
]
}
],
"documents": [
{
"id": "doc_1",
"snippet": "Baby penguins are the tallest",
"title": "Tall penguins",
"url": "https://example.com/foo"
},
{
"id": "doc_0",
"snippet": "Emperor penguins are the tallest",
"title": "Tall penguins",
"url": "http://example.com/foo"
}
]
}
Remember that this is only an illustrative deployment. Feel free to modify it as needed to accommodate your environment.
A Note on Air-gapped Environments
All images in the proxy.replicated.com registry are available to pull and copy into an air-gapped environment. These can be pulled using the license ID and steps previously provided by Cohere.