This endpoint generates realistic text conditioned on a given input.


curl --location --request POST '' \
--header 'Authorization: BEARER {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
  "prompt": "Once upon a time in a magical land called",
  "max_tokens": 50,
  "temperature": 1,
  "k": 0,
  "p": 0.75
}'

Sample Response#

{
  "text": " Brumana, lived the country folk. Among them was Zhulan Noyan, an adventurer and immortal, one who possesses an innate power of changing forms and abilities, just like a person. His goal was to hunt down"
}



prompt#

The prompt: the input text that the model completes.



max_tokens#

The number of tokens to predict per generation. See BPE Tokens for more details.



temperature#

A non-negative float that tunes the degree of randomness in generation. Lower temperatures produce less random generations. See Temperature for more details.
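
To make the effect of temperature concrete, the sketch below divides a set of made-up logits by the temperature before applying softmax. This is the common textbook formulation and is only an assumption about how the endpoint behaves; the function name `apply_temperature` and the logit values are hypothetical.

```python
import math

def apply_temperature(logits, temperature):
    """Return softmax probabilities of logits scaled by 1/temperature.

    Hypothetical helper for illustration; not part of the API.
    """
    if temperature <= 0:
        # A temperature of 0 is usually treated as greedy decoding;
        # this sketch only covers the strictly positive case.
        raise ValueError("this sketch requires temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(prob, 3) for prob in apply_temperature(logits, t)])
```

Under this formulation, lower temperatures concentrate probability on the highest-scoring token, and higher temperatures flatten the distribution.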

k (optional)#


Defaults to 0 (disabled). If set to a positive integer, it ensures only the top k most likely tokens are considered for generation at each step.
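
A minimal sketch of top-k filtering over a toy next-token distribution follows. The function name `top_k_filter`, the example probabilities, and the renormalization step are assumptions made for illustration, not the endpoint's actual implementation.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize; k == 0 disables the filter.

    Hypothetical helper for illustration; not part of the API.
    """
    if k <= 0 or k >= len(probs):
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Made-up probabilities for four candidate tokens.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
print(top_k_filter(probs, 2))  # only "the" and "a" remain candidates
print(top_k_filter(probs, 0))  # disabled: all four tokens remain
```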

p (optional)#


Defaults to 0.75. Set to 1.0 to disable. If set to a probability 0.0 < p < 1.0, it ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. If both k and p are enabled, p acts after k.
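
The ordering described above (top-k first, then top-p on what survives) can be sketched as follows. The function name `filter_candidates` and the toy probabilities are hypothetical, and renormalizing the top-k survivors before measuring probability mass is an assumption of this sketch rather than documented behavior.

```python
def filter_candidates(probs, k=0, p=0.75):
    """Apply top-k, then top-p, to a token -> probability mapping.

    Hypothetical helper for illustration; not part of the API.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if k > 0:
        ranked = ranked[:k]  # the top-k step runs first
    if 0.0 < p < 1.0:
        total = sum(prob for _, prob in ranked)
        kept, mass = [], 0.0
        for token, prob in ranked:
            kept.append((token, prob))
            mass += prob / total  # assumption: mass is measured after renormalizing
            if mass >= p:
                break  # smallest set of top tokens whose mass reaches p
        ranked = kept
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}

# Made-up probabilities for four candidate tokens.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
print(filter_candidates(probs, k=0, p=0.75))  # top-p only
print(filter_candidates(probs, k=3, p=0.9))   # top-k, then top-p
print(filter_candidates(probs, k=0, p=1.0))   # both filters disabled
```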

frequency_penalty (optional)#


Defaults to 0.0, with a maximum value of 1.0. Reduces the repetitiveness of generated tokens. The higher the value, the stronger the penalty applied to previously present tokens, in proportion to how many times they have already appeared in the prompt or prior generation.

presence_penalty (optional)#


Defaults to 0.0, max value of 1.0. Can be used to reduce repetitiveness of generated tokens. Similar to frequency_penalty, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies.
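
The difference between the two penalties can be illustrated with a small sketch. The exact formula the endpoint uses is not given here, so the subtraction from logits below is an assumption borrowed from a common formulation; `penalize` and the example values are hypothetical.

```python
from collections import Counter

def penalize(logits, previous_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appeared.

    Hypothetical helper; the subtraction formula is an assumption,
    not the endpoint's documented behavior.
    """
    counts = Counter(previous_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1 if count > 0 else 0
        adjusted[token] = (
            logit
            - frequency_penalty * count  # grows with every repetition
            - presence_penalty * seen    # flat amount once a token has appeared
        )
    return adjusted

logits = {"cat": 1.0, "dog": 1.0, "fish": 1.0}  # made-up scores
history = ["cat", "cat", "dog"]                 # tokens seen so far
print(penalize(logits, history, frequency_penalty=0.5))
print(penalize(logits, history, presence_penalty=0.5))
```

With only `frequency_penalty` set, "cat" (seen twice) is penalized twice as much as "dog" (seen once). With only `presence_penalty` set, both receive the same penalty.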

stop_sequences (optional)#

array of strings

A stop sequence cuts off the generation at the end of that sequence. If the array contains multiple stop sequences, the generation is cut at whichever one occurs first in the generated text, if any.
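
As a client-side illustration of this rule, the sketch below cuts a string at the end of whichever stop sequence occurs first. The helper `truncate_at_stop` is hypothetical, and it mirrors the description above rather than the server's actual code.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the end of the earliest-occurring stop sequence, if any.

    Hypothetical helper for illustration; not part of the API.
    """
    best_start, best_end = None, None
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings
        start = text.find(stop)
        if start != -1 and (best_start is None or start < best_start):
            best_start, best_end = start, start + len(stop)
    # The stop sequence itself is kept, since the cut is at its end.
    return text if best_end is None else text[:best_end]

print(repr(truncate_at_stop("Line one.\nLine two.\nLine three.", ["\n"])))
print(repr(truncate_at_stop("a b END c STOP d", ["STOP", "END"])))
```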

return_likelihoods (optional)#

One of GENERATION|ALL|NONE, specifying whether and how token likelihoods are returned with the response. Defaults to NONE.

If GENERATION is selected, the token likelihoods will be provided only for the generated text.

If ALL is selected, the token likelihoods will be provided both for the prompt and the generated text.
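
To tie the request parameters together, here is a rough sketch of a helper that assembles the JSON body and rejects values outside the ranges described above. The function `build_payload` is hypothetical, the range checks reflect this sketch's reading of the parameter descriptions, and sending the request is left out because the endpoint URL is not shown here.

```python
import json

ALLOWED_LIKELIHOODS = {"GENERATION", "ALL", "NONE"}

def build_payload(prompt, max_tokens, temperature, k=0, p=0.75,
                  frequency_penalty=0.0, presence_penalty=0.0,
                  stop_sequences=None, return_likelihoods="NONE"):
    """Assemble a request body; hypothetical helper, not part of the API."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if k < 0:
        raise ValueError("k must be 0 (disabled) or a positive integer")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0.0 < p <= 1.0 (1.0 disables it)")
    for name, value in (("frequency_penalty", frequency_penalty),
                        ("presence_penalty", presence_penalty)):
        if value > 1.0:
            raise ValueError(f"{name} has a maximum value of 1.0")
    if return_likelihoods not in ALLOWED_LIKELIHOODS:
        raise ValueError("return_likelihoods must be GENERATION, ALL, or NONE")
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "k": k,
        "p": p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "return_likelihoods": return_likelihoods,
    }
    if stop_sequences:
        payload["stop_sequences"] = list(stop_sequences)
    return payload

body = build_payload("Once upon a time in a magical land called", 50, 1)
print(json.dumps(body, indent=2))
```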



text#

array of strings

Contains the generated text.


array of objects

Only returned if return_likelihoods is not set to NONE.

An array of objects with the following shape:

{
  "token": string,
  "likelihood": float
}

The likelihood refers to the log-likelihood of the token. The first token of a context will not have a likelihood.
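
Because the values are log-likelihoods, they can be summed to score a whole sequence. The sketch below assumes natural logarithms and assumes that a token without a likelihood either omits the key or carries a null value; both are assumptions, and `sequence_log_likelihood` and the sample data are hypothetical.

```python
import math

def sequence_log_likelihood(token_likelihoods):
    """Sum per-token log-likelihoods, skipping tokens that carry none.

    Hypothetical helper; returns (total, number of tokens counted).
    """
    values = [
        entry["likelihood"]
        for entry in token_likelihoods
        if entry.get("likelihood") is not None  # e.g. the first token of a context
    ]
    return sum(values), len(values)

# Made-up objects in the shape described above.
sample = [
    {"token": "Once"},  # first token: no likelihood
    {"token": " upon", "likelihood": -1.2},
    {"token": " a", "likelihood": -0.3},
]
total, counted = sequence_log_likelihood(sample)
perplexity = math.exp(-total / counted)  # assumes natural-log likelihoods
print(total, counted, perplexity)
```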