AIC. - Artificial Intelligence Command







MiniRobotLanguage (MRL)

 

AIC.Set Top_K.

Set the Top_K Value in the LLM

 

 

Intention

 

The top_k parameter in OpenAI's GPT-3 API is used to limit the number of tokens considered for each step of the generation process.

It's a form of stochastic truncation, helping to focus the model's predictive capabilities on a smaller subset of its vocabulary.

 

The minimum value for top_k is 0.

A value of 0 effectively disables the filter: the model will consider all possible tokens at each step, leading to a highly diverse but potentially less focused output.

 

There is no strict maximum value for top_k, but setting it to a very high value (greater than the size of the model's vocabulary) is effectively the same as not setting it at all: the model will consider all possible tokens at each step.

 

If the top_k parameter is not provided in the call, the model will use its default behavior, which is to consider all tokens in its vocabulary.

This is equivalent to setting top_k to a value larger than the size of the model's vocabulary (larger than roughly 30,000 tokens).

 

While there's no strict rule for what values of top_k are "regular" or most commonly used, many developers find that values in the range of 20 to 50 provide a good balance between diversity and focus in the model's responses.

 

For instance, a top_k value of 40 means that at each step, the model will consider the top 40 most likely next words based on its internal calculations.

 

Here is a more detailed explanation and comparison of the `top_k`, `top_p`, and `temperature` parameters in the context of OpenAI's GPT-3 API.

 

When GPT-3 generates text, it does so word by word. For each word it generates, it calculates a probability for every word in its vocabulary, and then selects the next word based on these probabilities. The `top_k`, `top_p`, and `temperature` parameters are all ways to influence this selection process.

 

1. `top_k`: This parameter limits the number of words that the model considers as the next possible word. If `top_k` is set to 50, for example, the model will only consider the 50 words it thinks are most likely. This can make the output more focused and less random, because it's only choosing from a subset of words. However, it can also make the output less diverse, because it's ignoring a lot of potential words. The `top_k` value can be any non-negative integer, with larger values leading to more randomness and smaller values leading to less randomness. If `top_k` is not set, the model considers all possible words.
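The effect of `top_k` can be sketched in a few lines of Python. This is an illustrative model of the filtering step only, not the actual API or its internals; the token probabilities and the helper name `top_k_filter` are invented for the example.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize.
    probs maps token -> probability. A k of 0 (or any k covering the
    whole vocabulary) leaves the distribution untouched."""
    if k <= 0 or k >= len(probs):
        return dict(probs)
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Toy distribution over four candidate next tokens:
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
filtered = top_k_filter(probs, 2)  # only "cat" and "dog" survive, renormalized
```

With k=2, the two most likely tokens are kept and their probabilities rescaled to sum to 1, so the model samples only from that reduced set.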

 

2. `top_p`: Also known as nucleus sampling, this parameter is a bit more dynamic. Instead of always considering a fixed number of words like `top_k`, `top_p` considers however many words are needed to reach a certain cumulative probability. For example, if `top_p` is set to 0.9, the model will consider the smallest set of words that have a combined probability of 90%. This set of words can be larger or smaller depending on the specific probabilities for each word. Like `top_k`, `top_p` can make the output more focused and less random, but it can also reduce diversity. The `top_p` value is a float between 0 and 1, with larger values leading to more randomness and smaller values leading to less randomness.
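Nucleus sampling can be sketched the same way. Again this is a conceptual illustration with invented token probabilities, not the API's implementation: the set of kept tokens grows until the cumulative probability reaches the `top_p` threshold.

```python
def top_p_filter(probs, p):
    """Keep the smallest most-probable set of tokens whose cumulative
    probability reaches p, then renormalize. probs maps token -> probability."""
    kept, cum = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cum += prob
        if cum >= p:
            break
    total = sum(kept.values())
    return {tok: pr / total for tok, pr in kept.items()}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
nucleus = top_p_filter(probs, 0.9)  # "cat", "dog", "fish" reach 0.95 cumulative
```

Note how the set size adapts: with a flatter distribution the same `p` would admit more tokens, whereas `top_k` always keeps a fixed count.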

 

3. `temperature`: This parameter controls the "sharpness" of the probability distribution. If `temperature` is set to a high value (1 or above), the model's word selection will be more random and less deterministic, even if some words have much higher probabilities than others. If `temperature` is set to a low value (close to 0), the model's word selection will be more deterministic and less random, with the model strongly favoring words that have higher probabilities. In other words, a high `temperature` makes the model more "adventurous" in its word choices, while a low `temperature` makes the model more "conservative".
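Temperature scaling is commonly applied to the model's raw scores (logits) before they are turned into probabilities. The following is a minimal sketch of that standard technique with made-up logit values, not the API's internal code:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax.
    Low temperature sharpens the distribution (more deterministic);
    high temperature flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cold = apply_temperature(logits, 0.5)  # top token dominates: "conservative"
warm = apply_temperature(logits, 2.0)  # probabilities even out: "adventurous"
```

The same logits yield a much more peaked distribution at temperature 0.5 than at 2.0, which is why low temperatures make the model strongly favor its highest-probability words.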

 

In summary, `top_k` and `top_p` are ways to limit the number of words that the model considers for each step of the generation process, while `temperature` is a way to control the randomness of the model's word selection within those limits. All three parameters can be used together to finely tune the behavior of the model. The optimal values for these parameters can depend on your specific use case and the desired behavior of the model. It's a good idea to experiment with different values to see what works best for your needs.

 

 

 

Syntax

 

 

AIC.Set Top_K[|P1]

AIC.STK[|P1]

 

 

Parameter Explanation

 

P1 - (optional) numeric value, 0 to 30000. If omitted or set to -1, the parameter is not used and the system will fall back to its internal default values.

 

 

 

 

Example

 

'***********************************
' AIC.Set Top_K - sample usage.
' Limit sampling to the 40 most likely
' tokens (an illustrative value).
'***********************************
AIC.Set Top_K|40
' Short form, same effect:
' AIC.STK|40
' Omit the value or use -1 to fall back
' to the internal default:
' AIC.Set Top_K|-1

 

 

 

 

 

Remarks

 

-

 

 

Limitations:

 

-

 

 

See also: