Our language models understand "tokens" rather than characters or bytes. One token can be a part of a word, an entire word, or punctuation. Very common words like "water" will have their own unique tokens. A longer, less frequent word might be encoded into 2-3 tokens, e.g. "waterfall" gets encoded into two tokens, one for "water" and one for "fall". Note that tokenization is sensitive to whitespace and capitalization.

Here are some references to calibrate how many tokens are in a text:

  • one word tends to be about 2-3 tokens
  • a verse of a song is about 128 tokens
  • this short article has about 300 tokens

The number of tokens per word depends on the complexity of the text. Simple text may approach 1 token per word on average, while complex texts may use less common words that require 3-4 tokens per word on average. Our representation models are currently limited to processing sequences with a maximum length of 4096 tokens. As for the generation models, the maximum token length is 2048, inclusive of the prompt and the generation, for all model sizes.
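The limits above can be turned into a quick budgeting check. The sketch below is an assumption-based heuristic, not an exact count: it uses the midpoint of the 2-3 tokens-per-word range from this article and a whitespace word count, which only approximates real tokenization.

```python
TOKENS_PER_WORD = 2.5   # midpoint of the 2-3 range above; an assumption
MAX_GENERATION = 2048   # generation-model limit, prompt + output combined

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)

def fits_generation_budget(prompt: str, max_tokens: int) -> bool:
    """Check that the prompt estimate plus the requested generation
    stays under the combined 2048-token limit."""
    return estimate_tokens(prompt) + max_tokens <= MAX_GENERATION

prompt = "Write a short poem about waterfalls."
print(estimate_tokens(prompt))              # 6 words * 2.5 -> 15
print(fits_generation_budget(prompt, 300))  # 15 + 300 <= 2048 -> True
```

For a precise count, tokenize the text with the model's actual tokenizer rather than estimating.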

Our vocabulary of tokens is created using Byte Pair Encoding.
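To make the idea concrete, here is a toy Byte Pair Encoding sketch: it repeatedly merges the most frequent adjacent symbol pair in a tiny corpus, then tokenizes a word by replaying those merges. The corpus, merge count, and tie-breaking are illustrative assumptions, not the production vocabulary-building pipeline.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn merge rules from a tiny corpus (toy illustration)."""
    # Start from characters: each word is a tuple of single-char symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace the best pair with a single merged symbol everywhere.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# "water" and "fall" are frequent, so they become whole tokens;
# the rarer "waterfall" splits into those two pieces.
corpus = ["water"] * 10 + ["fall"] * 8 + ["waterfall"] * 2
merges = learn_bpe_merges(corpus, num_merges=7)
print(apply_bpe("waterfall", merges))  # -> ['water', 'fall']
```

With more merges, "waterfall" itself would eventually become a single token, which is how very common words end up with their own unique tokens.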

Turning text into tokens

How to pick max_tokens when sampling

The easiest way to determine a good number of tokens is to guess and check using our playground. It is common to request more tokens than required and then run additional processing to retrieve the desired output.
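The "request more, then post-process" step above can be sketched as a simple trimming pass. The stop sequences here are hypothetical examples; in practice you would pick markers suited to your prompt format.

```python
def trim_generation(text: str, stop_sequences=("\n\n", "---")) -> str:
    """Keep only the part of a generation before the earliest stop
    sequence, a common cleanup step after deliberately requesting
    more tokens than the desired output needs."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

raw = "A haiku about rain.\n\n-- the model kept going past the answer --"
print(trim_generation(raw))  # -> "A haiku about rain."
```

Once the trimmed output looks stable, you can lower max_tokens toward the observed length to save cost and latency.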