Limited Access

New accounts on the Cohere platform have limitations in place by default to promote the responsible use of Cohere’s technology. Users can request full access to the platform by filling out a form inside the Playground. The form helps us better understand your intended use case, potential risks, and commitment to responsibility. Upon approval, you will have full access to the features listed below. Both levels of access are subject to Cohere’s Usage Guidelines, and violation of these guidelines may lead to suspension of service.

Limited access features:

  • Access to Cohere’s large language models. This includes generation models and representation models of different sizes.
  • Access to Cohere’s Generate, Embed, and Classify endpoints.
  • Ability to interact with the models via the web playground, SDKs, and CLI.

Limited access limitations:

  • A total usage quota of 500,000 characters for the Generation endpoint.
  • Build custom, finetuned models.
  • The API is subject to rate limits across all endpoints that restrict the number of calls your application can make to the API per minute/day. Full access raises these limits significantly, thus allowing you to use the platform in your production environment.
    • Generate: 60 calls / minute. 50,000 calls / day.
    • Each of the other endpoints: 500 calls / minute. 100,000 calls / day.
  • Limited to testing and experimentation. Not for production usage.
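
To see which of the limits above binds for a steadily running client, the per-minute and per-day figures can be converted into a minimum average spacing between calls. The following is a rough sketch of that arithmetic; the function and dictionary names are illustrative and are not part of any Cohere SDK.

```python
# Limited access rate limits quoted above, as (calls per minute, calls per day).
LIMITED_ACCESS_LIMITS = {
    "generate": (60, 50_000),
    "other": (500, 100_000),  # each of the other endpoints
}

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60


def min_average_spacing(calls_per_minute: int, calls_per_day: int) -> float:
    """Smallest average gap (seconds) between calls that respects both limits
    for a client sending requests at a constant rate all day."""
    per_minute_gap = SECONDS_PER_MINUTE / calls_per_minute
    per_day_gap = SECONDS_PER_DAY / calls_per_day
    # The larger gap is the stricter constraint.
    return max(per_minute_gap, per_day_gap)


for endpoint, (per_minute, per_day) in LIMITED_ACCESS_LIMITS.items():
    gap = min_average_spacing(per_minute, per_day)
    print(f"{endpoint}: at least {gap:.3f} s between calls on average")
```

Under these numbers the daily limit is the stricter one for sustained traffic on every endpoint: short bursts can run up to the per-minute limit, but a client that runs all day has to average out below the daily limit.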

Full Access

You can apply for full access through a link in the Cohere Playground. The Cohere team will review applications and approve submissions that pass a more comprehensive safety check.

Full access features:

  • Significantly increased API rate limits (i.e., the ability to serve in production scenarios).
  • Access to all three models.
  • No character usage quota on the Generation endpoint.


Usage Quota

How is the usage quota calculated?
Character usage is the number of characters in the input prompt plus the number of characters in the output, summed across API calls.

What happens after the usage quota is reached?
Access to the generation endpoint is suspended once the usage quota is reached. Applying for full access will reinstate access to the generation endpoint.

My organization’s account has multiple users. Do we have separate usage quotas?
No. The usage quota is calculated per organization, by summing the usage of all the users that belong to the organization.
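
The tally described above can be sketched as follows. This is only an illustration of the arithmetic, assuming that a "character" corresponds to one unit of Python's `len()`; how the platform counts characters in detail is not specified here, and `GenerationCall` and the helper functions are hypothetical names, not part of any Cohere SDK.

```python
from dataclasses import dataclass

# Limited access quota quoted in this document.
USAGE_QUOTA_CHARACTERS = 500_000


@dataclass
class GenerationCall:
    """One call to the generation endpoint (hypothetical record type)."""
    user: str
    prompt: str
    output: str


def characters_used(call: GenerationCall) -> int:
    # Usage for a call = characters in the prompt + characters in the output.
    return len(call.prompt) + len(call.output)


def organization_usage(calls: list[GenerationCall]) -> int:
    # The quota is per organization, so usage is summed over every user's calls.
    return sum(characters_used(call) for call in calls)


calls = [
    GenerationCall(user="alice", prompt="Write a haiku", output="Autumn moonlight"),
    GenerationCall(user="bob", prompt="Hello", output="Hi there"),
]
used = organization_usage(calls)
remaining = max(USAGE_QUOTA_CHARACTERS - used, 0)
print(f"used={used}, remaining={remaining}")  # used=42, remaining=499958
```

Because both users' calls draw from the same pool, the remaining quota shrinks with every call made by anyone in the organization.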

Rate Limits

What happens after the API rate limit is reached?
The endpoint is temporarily inaccessible once the API rate limit is reached. Wait a short while and try again.

My organization’s account has multiple users. Do we have separate rate limits?
No. Rate limits are also calculated per organization, by summing the usage of all the users that belong to the organization.

How are the durations of the API rate limit calculated?
We evaluate the sliding window immediately preceding the current API call to determine whether the call is still within the rate limit.
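
For readers unfamiliar with sliding-window limits, a generic client-side sketch is shown below. It is not Cohere's server implementation: the class name, its methods, and the exact boundary handling (a call that is exactly one window old no longer counts) are assumptions made for illustration. Timestamps are passed in explicitly so the demonstration is deterministic.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any trailing `window_seconds` span."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted calls, oldest first

    def _evict(self, now: float) -> None:
        # Drop calls that have slid out of the window ending at `now`.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_acquire(self, now: float) -> bool:
        """Record a call at time `now` if it fits in the window."""
        self._evict(now)
        if len(self._timestamps) < self.max_calls:
            self._timestamps.append(now)
            return True
        return False

    def retry_after(self, now: float) -> float:
        """Seconds until the next call would be accepted (0.0 if allowed now)."""
        self._evict(now)
        if len(self._timestamps) < self.max_calls:
            return 0.0
        return self._timestamps[0] + self.window_seconds - now


# Small demo limit: 3 calls per 60 seconds.
limiter = SlidingWindowLimiter(max_calls=3, window_seconds=60.0)
results = [limiter.try_acquire(t) for t in (0.0, 10.0, 20.0, 30.0)]
print(results)                      # [True, True, True, False]
print(limiter.retry_after(30.0))    # 30.0
print(limiter.try_acquire(60.0))    # True
```

In a real client, `time.monotonic()` would supply `now`, and `retry_after` gives a reasonable wait before trying again after a rejected call. Since limits apply per organization, a limiter like this only helps if all of the organization's clients share it or leave each other headroom.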

Limited Access and Full Access

Why have two modes of access?
Having two modes aims to strike a careful balance: it gives developers effortless access to test and experiment with language models while mitigating the possible risks of misusing this technology in the field. We believe organizations must develop and release their technologies responsibly. This means proactively working to build safer products and accepting a duty of care to users, the environment, and society.

Can I serve Cohere outputs in production when my account is in Limited Access mode?
No. Users need to be approved for full access to serve the API in production. Think of Limited Access as a laboratory phase in which you test the platform against a specific use case.

What happens after my application for full access is approved?
After the Cohere team evaluates your form and approves the application, your account will be switched to full access mode.