Here, we discuss a few principles and techniques for writing prompts (inputs for our models) that will help you get the best generations for your task. Choosing the right temperature can also have a big influence on generation quality. We discuss temperature separately here.
We call the input into
generate a prompt and the output a generation or completion.
We find that there are two main ideas to keep in mind while designing prompts for our models.
When using generate, it is useful to try a range of different prompts for the problem you are trying to solve. Different formulations of the same prompt, which might sound similar to humans, can lead to generations that are quite different from each other. This might happen, for instance, because our models have learned that the different formulations are actually used in very different contexts and for different purposes. Below we give a number of examples that we've found to work particularly well for different tasks.
Additionally, you can use the
likelihood feature in the playground to see if there are particular words, phrases, or structures that the model has trouble understanding. Keep in mind, however, that the average likelihood of tokens will always be high at the beginning of the sequence. The model might assign low likelihood the first time you introduce a novel concept or name, but once the concept has appeared, the model can readily use it in the generation. You can also use the
likelihood capability to see if there is any spelling or punctuation that is creating issues for tokenization.
Provide the model with enough context. Let's consider a few aspects of this by looking at an example. Say that you would like to use our models to assist your customer satisfaction department by automatically generating plausible responses to customer requests.
Say a customer contacts your company with the following question:
How do we design a prompt around this to get useful generations for the agent interacting with the customer?
Let's begin with telling our model what the general setting is and what the remainder of the prompt is going to contain:
Great, we've told our model what to expect and have made it clear that our query is a question from a customer. Next, let's show the model the beginning of the response we would like to give the customer.
Note how we've stated clearly that the next sentence is a response to the question, that it comes from a customer service agent, and that we want to give a positive answer. Putting this all together, we obtain the following prompt:
Feeding this into our shark model multiple times, we get the following completions:
- Yes, we are able to accept returns if the product is unused and unopened.
- Yes, we are happy to refund your purchase. However, we do require you to return the item to our store for a full refund.
- Yes, we can do that. Please send us a message with your name, phone number, and the reason for the refund. We will get back to you as soon as possible.
Note that even though this is a simplified example, we get plausible completions from the model without finetuning it on customer service interactions! The results could be further improved by finetuning on examples of how you would like the model to handle specific questions and requests.
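The steps above can be sketched as plain string assembly. This is a minimal sketch; the exact wording of the context, the question label, and the response lead-in are illustrative assumptions, not the literal prompt used in this example:

```python
# Assemble a customer-service prompt: general context first, then the
# customer's question, then a lead-in signaling a positive agent response.
def build_support_prompt(question: str) -> str:
    # State the general setting so the model knows what to expect.
    context = (
        "This is a conversation between a customer and a polite, "
        "helpful customer service agent.\n"
    )
    # Make it clear the next line is a question from a customer.
    question_part = f"Question from the customer: {question}\n"
    # Signal that what follows is a positive answer from the agent;
    # ending with "Yes," nudges the completion toward a positive reply.
    response_lead = "Positive answer from the customer service agent: Yes,"
    return context + question_part + response_lead

prompt = build_support_prompt("Can I get a refund for my purchase?")
print(prompt)
```

The prompt ends mid-sentence on purpose: the model continues from "Yes," and produces the rest of the agent's answer.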
Give a few examples of the type of generation you want. This is called few-shot learning. Let's look at an example. Say you'd like to use our models to classify whether a movie review is positive, negative, or neutral. Imagine that you feed the following prompt into our model:
An actual generation based on this prompt by our shark model reads:
Clearly, there are generations that the model sees as likely but that are not the type of generation we'd like to get.
One technique for narrowing down the type of generation we would like is called few-shot learning: we extend the prompt with a few examples of what a "successful" generation would look like. Additionally, we also tell the model at the beginning of the prompt what's going on: "This is a movie review sentiment classifier."
Putting this all together and feeding this new prompt into
shark, we reliably get the generation positive.
Few-shot generations will generally work better with our larger models. You can use the
likelihood endpoint to see how uncertain the model is about the correct answers given in the examples.
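A few-shot prompt of this kind can be assembled programmatically. The header line comes from the text above; the example reviews and the exact "Review:" / "This review is" template are illustrative assumptions:

```python
# Build a few-shot sentiment-classification prompt: a header describing
# the task, labeled examples, then the new review left unlabeled so the
# model completes it with a sentiment word.
def build_sentiment_prompt(examples, review):
    lines = ["This is a movie review sentiment classifier."]
    for text, label in examples:
        lines.append(f'Review: "{text}"\nThis review is {label}.')
    # The prompt ends right before the label we want the model to supply.
    lines.append(f'Review: "{review}"\nThis review is')
    return "\n\n".join(lines)

examples = [
    ("I loved every minute of this film!", "positive"),
    ("A dull, lifeless mess.", "negative"),
    ("It was fine, nothing special.", "neutral"),
]
print(build_sentiment_prompt(examples, "A wonderful, moving story."))
```

Including at least one example per class helps the model pick from exactly the label set you want.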
Try using prose instead of commands. An intuitive way to interact with the
generate models is to give the model commands about the type of generation that you want, e.g.
Give a list of artistic professions:. However, since much of the text that our models have seen is internet articles, sometimes this way of writing will be misunderstood. Try rephrasing the command into prose in a way such that the model will give the desired output:
Similarly, when doing summarization, if appending
Summary: to the end of the article is not working, try writing
Summarize the following text:,
To summarize:, or
TL;DR:, which we find to work particularly well.
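Trying the lead-ins above on the same article is easy to script. The lead-in strings come from the text; the function and the idea of generating all variants at once are our own sketch:

```python
# Produce one candidate prompt per summarization lead-in. "Summarize the
# following text:" reads naturally before the article; the other three
# are appended after it, as described above.
def summarization_prompts(article):
    prompts = ["Summarize the following text:\n" + article]
    for lead in ("Summary:", "To summarize:", "TL;DR:"):
        prompts.append(article + "\n" + lead)
    return prompts

for p in summarization_prompts("The quick brown fox jumps over the lazy dog."):
    print(p, end="\n---\n")
```

You can then feed each variant to the model and keep whichever lead-in gives the best summaries for your kind of text.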
In general, you may want to experiment with different styles of writing until you get something that works. Examples include writing in the style of a news article, a blog post, or a dialogue.
Here we showcase how to apply the principles above by looking at two specific tasks: generating keywords based on a given passage and generating additional examples given a few existing examples.
Keyword generation: Let's imagine that we have text passages that we'd like to automatically tag with the most relevant concepts appearing in the text.
By combining a number of the techniques discussed above, we can
generate just that! First, we state what the setting for this prompt is at the beginning of the prompt. Then we show the model two examples of what we want it to do: label a passage from John von Neumann's Wikipedia page with the label "John von Neumann", and label a paragraph from the Wikipedia page on Feminism with the label "Feminism". Lastly, we give the model a passage from the Wikipedia page on Python.
This prompt reliably generates "Python" as an answer – while sometimes also returning "Guido van Rossum", another plausible option.
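The keyword prompt follows the same few-shot recipe. This is a minimal sketch: the header wording and the short stand-in passages are illustrative assumptions in place of the actual Wikipedia paragraphs:

```python
# Build a few-shot keyword-labeling prompt: a header stating the task,
# labeled passages as examples, then the new passage with an empty label
# for the model to fill in.
def build_keyword_prompt(labeled_passages, new_passage):
    lines = ["This labels each passage with its most relevant concept."]
    for passage, label in labeled_passages:
        lines.append(f"Passage: {passage}\nLabel: {label}")
    # End at "Label:" so the generation is the keyword itself.
    lines.append(f"Passage: {new_passage}\nLabel:")
    return "\n\n".join(lines)

labeled = [
    ("John von Neumann was a Hungarian-American mathematician.", "John von Neumann"),
    ("Feminism is a range of movements aimed at gender equality.", "Feminism"),
]
print(build_keyword_prompt(labeled, "Python is a high-level programming language."))
```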
Example generation. A common task is to try to get the model to generate examples according to some description. Formulating the prompt as a list in the following style tends to work well.
which then gives us generations like:
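As a rough sketch, such a list-style prompt can be assembled like this. The topic and seed items are illustrative assumptions; the key idea from the text is ending the prompt with the next item number so the model continues the list:

```python
# Build a numbered-list prompt for example generation: a description,
# a few seed items, and a dangling next number for the model to complete.
def build_list_prompt(description, seed_items):
    lines = [f"{description}:"]
    for i, item in enumerate(seed_items, start=1):
        lines.append(f"{i}. {item}")
    # Ending on the next number invites the model to extend the list.
    lines.append(f"{len(seed_items) + 1}.")
    return "\n".join(lines)

print(build_list_prompt("A list of artistic professions", ["painter", "sculptor"]))
```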