LLM Sampling Visualizer

Watch how the three knobs that control LLM text generation (temperature, top-k and top-p, also called nucleus sampling) reshape the next-token distribution. Enter a handful of logits, one token: logit pair per line, then move the sliders: temperature sharpens or flattens the softmax, top-k keeps only the highest-probability tokens, and top-p keeps the smallest set whose probability mass reaches a threshold. The bars show the final renormalised distribution you'd actually sample from next to the temperature-only baseline, so you can see exactly which tokens survive. Everything is computed live in your browser.

How to use the LLM Sampling Visualizer

Type one candidate per line as token: logit — the raw scores a model produces before the softmax — or paste bare numbers and they'll be labelled automatically. Then experiment with the controls. Temperature divides the logits before the softmax: below 1 it sharpens the distribution toward the top token (more deterministic), above 1 it flattens it (more random), and near 0 it becomes greedy. Top-k keeps only the k most probable tokens and zeroes the rest. Top-p, or nucleus sampling, keeps the smallest group of top tokens whose probabilities add up to at least p, so the cut-off adapts to how peaked the distribution is.
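
As a rough illustration of the input format described above, the sketch below parses token: logit lines and labels bare numbers automatically. The function name parse_logits and the token1, token2 labelling scheme are assumptions made for this example, not the tool's actual code.

```python
def parse_logits(text: str) -> list[tuple[str, float]]:
    """Parse 'token: logit' lines; bare numbers get placeholder labels."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last colon so a token containing ':' still parses.
        label, sep, value = line.rpartition(":")
        if sep:
            entries.append((label.strip(), float(value)))
        else:
            # Bare number: invent a label (hypothetical naming scheme).
            entries.append((f"token{len(entries) + 1}", float(line)))
    return entries


print(parse_logits("the: 2.0\ncat: 1.0\n0.5"))
```

A line that is neither a number nor a token: number pair raises ValueError here; a real input box would more likely skip or flag it.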

The table sorts tokens by probability and shows the final distribution (after temperature, then top-k, then top-p, then renormalising what remains) beside the temperature-only base column, with dimmed rows for tokens that got filtered out. This order is a common one in inference libraries, although some implementations order or combine the steps differently, so the visualiser works best as a mental model of what happens at each generation step. Use it to build intuition: see why a low temperature makes output repetitive, why top-p adapts where a fixed top-k can't, and how the two truncation methods interact when used together.

How temperature, top-k and top-p shape generation

At every step a language model outputs a vector of logits — one raw score per token in its vocabulary. A softmax turns those into a probability distribution, and the generator draws the next token from it. Left untouched, that sampling can pick low-probability tokens often enough to produce incoherent text, so practical decoding reshapes the distribution first. The three standard controls do this in different ways, and they are usually applied together in a fixed order.

Temperature rescales the logits before the softmax by dividing them by a constant T. A high temperature pulls the probabilities toward uniform, increasing diversity and surprise; a low temperature exaggerates the gaps so the top token dominates, increasing determinism; at T→0 it collapses to greedy decoding, always taking the single most likely token. Temperature changes how peaked the distribution is but never removes any token entirely. Top-k truncation does the removing: it keeps only the k highest-probability tokens and discards the long tail, which caps the worst-case randomness but uses the same cut-off whether the model is confident or not. Top-p, or nucleus sampling, fixes that rigidity by keeping the smallest set of top tokens whose cumulative probability reaches p — so when the model is sure, the nucleus is tiny, and when it's uncertain, the nucleus widens to include more options. After truncation the surviving probabilities are renormalised to sum to one, and the token is sampled from that.
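
The temperature step described above can be sketched in a few lines. This is a minimal illustration, assuming a plain list of logits; the function name softmax_with_temperature and the example values are made up for the demonstration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by T, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; handle T = 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

With these logits the top token's share is largest at T = 0.5 and smallest at T = 2.0, yet every token keeps a non-zero probability at each setting, which is why truncation is a separate step.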

Reading the interaction is the key skill. A common recipe is a moderate temperature with top-p around 0.9, which often gives fluent text without much repetition; lowering temperature or p tightens toward safe, predictable output, while raising them invites creativity at the risk of incoherence. The reason greedy and very-low-temperature decoding can sound robotic is visible here: collapse the distribution and the model loops on its highest-probability continuations. The reason top-p adapts better than a fixed top-k is also visible: drag the temperature and watch the nucleus expand and contract as the distribution flattens and sharpens, while a fixed k keeps the same number of tokens throughout. Building that intuition, which is what this tool is for, lets you tune a model's "creativity" deliberately instead of by trial and error.

Common use cases

  • Tuning generation. Build intuition for what temperature, top-k and top-p actually do before setting them on a real model.
  • Debugging output. Understand why a setting makes text repetitive, incoherent or off-topic.
  • Teaching. Show students or teammates how softmax, truncation and renormalisation chain together.
  • Comparing strategies. See directly why nucleus sampling adapts where a fixed top-k cannot.

Frequently asked questions

What are logits?

Logits are the raw, unnormalised scores a model assigns to every token in its vocabulary before the softmax converts them to probabilities. Higher logit means the model favours that token. This tool takes a small set of logits so you can see how the sampling controls transform them into the distribution you sample from.

In what order are temperature, top-k and top-p applied?

A common pipeline, which this tool follows, is: divide the logits by the temperature and apply the softmax, then keep the top-k tokens, then keep the top-p nucleus of what remains, then renormalise the survivors so they sum to one. Sampling draws from that final distribution. Some libraries order or combine these steps differently, so check the documentation of the one you use.
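
The order described above can be sketched as one function. This is an illustration under stated assumptions, not the tool's source: the name sampling_distribution is hypothetical, top_k=0 is taken to mean "no top-k limit", and the top-p threshold is measured against the top-k survivors after renormalising them.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Temperature -> top-k -> top-p -> renormalise. Requires temperature > 0."""
    # 1. Temperature-scaled softmax (max subtracted for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 2. Top-k: keep the k most probable tokens (assumption: 0 disables it).
    if top_k > 0:
        order = order[:top_k]

    # 3. Top-p on what remains: smallest prefix whose renormalised mass reaches p.
    kept_mass = sum(probs[i] for i in order)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # 4. Renormalise the survivors; filtered tokens get probability 0.
    final_mass = sum(probs[i] for i in nucleus)
    return [probs[i] / final_mass if i in nucleus else 0.0 for i in range(len(probs))]


final = sampling_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print([round(p, 3) for p in final])
```

In this example top-k drops the fourth token, top-p then drops the third because the first two already cover more than 0.9 of the remaining mass, and the two survivors are rescaled to sum to one.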

What is the difference between top-k and top-p?

Top-k always keeps a fixed number of tokens regardless of how confident the model is. Top-p (nucleus) keeps a variable number, the smallest set whose probabilities sum to at least p, so the cut-off adapts: a tiny nucleus when the model is certain, a wider one when it is unsure. Top-p generally handles varying confidence better.
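
To make the contrast concrete, the sketch below counts how many tokens a nucleus of p = 0.9 keeps for a confident and an uncertain distribution. The helper name nucleus_size and both probability lists are invented for illustration.

```python
def nucleus_size(probs, p):
    """Number of top tokens needed for the cumulative probability to reach p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)  # p was never reached (e.g. rounding), so keep everything


confident = [0.92, 0.04, 0.02, 0.01, 0.01]  # one token dominates
uncertain = [0.30, 0.25, 0.20, 0.13, 0.12]  # mass spread across tokens

print(nucleus_size(confident, 0.9))  # 1
print(nucleus_size(uncertain, 0.9))  # 5
```

A fixed top-k of 3 would keep three tokens in both cases: two more than the confident distribution needs and two fewer than the uncertain one needs to reach the same mass.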

What does temperature 0 do?

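As temperature approaches 0 the softmax concentrates essentially all probability on the highest logit, so sampling always picks the single most probable token. This is greedy decoding. Dividing by exactly 0 is undefined, so implementations typically treat temperature 0 as a special case and take the top token directly. Greedy decoding is deterministic for a given input but tends to produce repetitive text. Higher temperatures flatten the distribution and increase diversity.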
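
One way to handle this limit in code is to special-case T = 0 as an argmax rather than divide by zero. The sketch below is an assumption about how that might look, with a hypothetical function name; it does not describe any particular library's behaviour.

```python
import math


def next_token_distribution(logits, temperature):
    """Softmax with temperature; T == 0 is treated as greedy decoding."""
    if temperature == 0:
        # Greedy: all probability on the highest logit (first one wins ties).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values
print(next_token_distribution(logits, 0))     # all mass on the first token
print(next_token_distribution(logits, 0.05))  # almost all mass on the first token
```

At T = 0.05 the top token already holds more than 99.9% of the probability, which shows how a very low temperature behaves like greedy decoding in practice.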

Is my input sent anywhere?

No. The softmax, truncation and renormalisation all run in your browser as you type. Nothing you enter is uploaded or stored.