Sampling from generation models incorporates randomness, so the same prompt may yield different outputs each time you hit "generate". Temperature is a parameter used to tune the degree of that randomness.
A lower temperature means less randomness; a temperature of 0 will always yield the same output. Lower temperatures (less than 1) are appropriate for tasks that have a "correct" answer, like question answering or summarization. If the model starts repeating itself, that is a sign the temperature is too low.
A higher temperature means more randomness, which can help the model produce more creative outputs. If the model starts going off topic or producing nonsensical outputs, that is a sign the temperature is too high.
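The mechanics can be sketched in a few lines. Temperature divides the model's logits before the softmax: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. The function names below are illustrative, not from any particular library; a minimal sketch, assuming plain logits as a list of floats:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Sample a token index; temperature 0 falls back to greedy argmax."""
    if temperature == 0:
        # Greedy decoding: always pick the highest-logit token,
        # which is why temperature 0 always yields the same output.
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)
# The top token dominates more at low temperature than at high temperature.
print(cold[0] > hot[0])  # True
```

Running this shows why a temperature of 0 is fully deterministic (the sampling step collapses to an argmax), while higher temperatures give lower-probability tokens a better chance of being picked.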
Temperature can be tuned for different problems, but most people will find that a temperature of 1 is a good starting point.
Longer prompts give the model more context to condition on, so it tends to be more confident in its predictions and you can raise the temperature higher without the output going off topic. Short prompts, by contrast, provide little context, so high temperatures can make their outputs very unstable.