LLM Sampling Visualizer
Watch how the three knobs that control LLM text generation (temperature, top-k and top-p, also called nucleus sampling) reshape the next-token distribution. Enter a handful of logits, one token: value pair per line, then move the sliders: temperature sharpens or flattens the softmax, top-k keeps only the k highest-probability tokens, and top-p keeps the smallest set of top tokens whose probability mass reaches a threshold p. The bars show the final renormalised distribution you would actually sample from, side by side with the temperature-only baseline, so you can see exactly which tokens survive. Everything is computed live in your browser.
How to use the LLM Sampling Visualizer
Type one candidate per line as token: logit — the raw scores a model produces before the softmax — or paste bare numbers and they'll be labelled automatically. Then experiment with the controls. Temperature divides the logits before the softmax: below 1 it sharpens the distribution toward the top token (more deterministic), above 1 it flattens it (more random), and near 0 it becomes greedy. Top-k keeps only the k most probable tokens and zeroes the rest. Top-p, or nucleus sampling, keeps the smallest group of top tokens whose probabilities add up to at least p, so the cut-off adapts to how peaked the distribution is.
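A minimal Python sketch of the temperature step described above, assuming the logits arrive as a plain list of floats. The function name softmax_with_temperature is hypothetical, and treating T = 0 as greedy (argmax) is a convention this sketch adopts because dividing by zero is undefined; it is not necessarily how the visualiser handles that limit.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before the softmax; T <= 0 is treated as greedy."""
    if temperature <= 0:
        # Greedy limit (a convention of this sketch): all probability mass
        # goes to the highest logit; ties go to the earliest index.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 2.0, 1.0, -1.0]  # illustrative values
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The top token's share grows as T shrinks and the distribution spreads out as T grows, but the ranking of the tokens never changes for any T > 0.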
The table sorts tokens by probability and shows the final distribution (after temperature, then top-k, then top-p, then renormalising what remains) beside the temperature-only base column, with dimmed rows for tokens that were filtered out. This temperature, top-k, top-p order is a common one, but it is not universal: some inference libraries apply temperature after truncation, which can change which tokens the top-p nucleus keeps. Top-k is unaffected either way, because a positive temperature never changes the ranking. Within that caveat, the visualiser is a reasonable mental model of what happens at each generation step. Use it to build intuition: see why a low temperature makes output repetitive, why top-p adapts where a fixed top-k can't, and how the two truncation methods interact when used together.
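The temperature, top-k, top-p, renormalise sequence can be sketched as a single Python function. This is an illustration under stated assumptions, not the visualiser's source: sampling_pipeline is a hypothetical name, top_k=0 and top_p=1.0 are assumed to mean "disabled", and top-p measures cumulative mass over the tokens top-k kept, renormalised, which is what happens when each filter masks logits before the next one re-applies the softmax.

```python
import math

def sampling_pipeline(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return (base, final): base is the temperature-only softmax; final is
    the renormalised distribution after temperature, top-k, then top-p.
    top_k=0 and top_p=1.0 disable those filters (a convention of this sketch)."""
    scaled = [x / temperature for x in logits]  # assumes temperature > 0
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    base = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(base)), key=lambda i: base[i], reverse=True)
    keep = set(order)

    # Top-k: keep only the k most probable tokens.
    if top_k > 0:
        keep = set(order[:top_k])

    # Top-p: walk down the ranking of the tokens top-k kept, using their
    # renormalised mass, until the running total reaches p.
    if top_p < 1.0:
        survivors = [i for i in order if i in keep]
        mass = sum(base[i] for i in survivors)
        nucleus, cumulative = set(), 0.0
        for i in survivors:
            nucleus.add(i)
            cumulative += base[i] / mass
            if cumulative >= top_p:
                break
        keep = nucleus

    # Renormalise the survivors so the final distribution sums to 1.
    kept_mass = sum(base[i] for i in keep)
    final = [base[i] / kept_mass if i in keep else 0.0 for i in range(len(base))]
    return base, final

tokens = ["the", "a", "cat", "zebra"]
base, final = sampling_pipeline([3.0, 2.0, 1.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
for tok, b, f in sorted(zip(tokens, base, final), key=lambda r: r[1], reverse=True):
    status = "" if f > 0 else "  (filtered)"
    print(f"{tok:6} base={b:.3f} final={f:.3f}{status}")
```

With these example logits, top-k=3 drops zebra, and the nucleus at p = 0.9 then stops after the and a, so cat is filtered too; the two survivors are renormalised to sum to 1, mirroring the dimmed rows in the table.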
How temperature, top-k and top-p shape generation
At every step a language model outputs a vector of logits — one raw score per token in its vocabulary. A softmax turns those into a probability distribution, and the generator draws the next token from it. Left untouched, that sampling can pick low-probability tokens often enough to produce incoherent text, so practical decoding reshapes the distribution first. The three standard controls do this in different ways, and they are usually applied together in a fixed order.
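The basic step, logits to softmax to a random draw, fits in a few lines of Python. This sketch uses only the standard library; softmax and sample_token are illustrative names, and subtracting the maximum logit before exponentiating is a standard numerical-stability trick that does not change the resulting probabilities.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, rng=random):
    """Draw one token with probability equal to its softmax weight."""
    probs = softmax(logits)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["the", "a", "cat", "zebra"]
logits = [3.0, 2.0, 1.0, -1.0]  # illustrative values
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(sample_token(tokens, logits, random.Random(0)))
```

Even the unlikely "zebra" keeps about a 1% chance here, which is exactly the tail that the controls below are designed to reshape or cut.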
Temperature rescales the logits before the softmax by dividing them by a constant T. A high temperature pulls the probabilities toward uniform, increasing diversity and surprise; a low temperature exaggerates the gaps so the top token dominates, increasing determinism; as T approaches 0 it collapses to greedy decoding, always taking the single most likely token. For any T > 0, temperature changes how peaked the distribution is but never removes a token: every token keeps a nonzero probability.
Top-k truncation does the removing: it keeps only the k highest-probability tokens and discards the long tail, which caps the worst-case randomness but applies the same cut-off whether the model is confident or not. Top-p, or nucleus sampling, fixes that rigidity by keeping the smallest set of top tokens whose cumulative probability reaches p, so when the model is sure the nucleus is tiny, and when it is uncertain the nucleus widens to include more options. After truncation the surviving probabilities are renormalised to sum to one, and the next token is sampled from that distribution.
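To contrast the fixed cut-off of top-k with the adaptive one of top-p, this sketch counts how many tokens each method keeps on two five-token distributions, one confident and one uncertain. The probabilities are invented for illustration, and top_k_size and nucleus_size are hypothetical helper names.

```python
def top_k_size(probs, k):
    """Top-k always keeps min(k, vocabulary size) tokens."""
    return min(k, len(probs))

def nucleus_size(probs, p):
    """Size of the smallest set of top tokens whose mass reaches p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

confident = [0.92, 0.04, 0.02, 0.01, 0.01]  # made-up: model is sure
uncertain = [0.22, 0.21, 0.20, 0.19, 0.18]  # made-up: model is unsure

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name, "top-k(3):", top_k_size(probs, 3),
          "top-p(0.9):", nucleus_size(probs, 0.9))
```

Top-k keeps 3 tokens in both cases, while the nucleus at p = 0.9 keeps a single token for the confident distribution and all five for the uncertain one.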
Reading the interaction is the key skill. A common recipe is a moderate temperature with top-p around 0.9, which tends to give fluent but non-repetitive text; lowering temperature or p tightens output toward safe, predictable choices, while raising them invites creativity at the risk of incoherence. You can see here why greedy and very-low-temperature decoding can sound robotic: collapse the distribution and the model keeps picking its highest-probability continuations, which often means looping on them. You can also see why top-p adapts better than top-k: drag the temperature and the nucleus shrinks as the distribution sharpens and grows as it flattens, while top-k always keeps exactly k tokens. That intuition, which this tool exists to build, is what lets you tune a model's "creativity" deliberately instead of by trial and error.
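The temperature experiment can also be reproduced outside the browser. This Python sketch applies several arbitrarily chosen temperatures to one fixed, made-up set of logits and reports the nucleus size at p = 0.9 next to a fixed top-k of 3; softmax_t and nucleus_size are hypothetical helper names.

```python
import math

def softmax_t(logits, temperature):
    """Temperature-scaled softmax (assumes temperature > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """Size of the smallest set of top tokens whose mass reaches p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

logits = [4.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0]  # illustrative values
sizes = {}
for t in (0.3, 0.7, 1.0, 1.5, 3.0):
    sizes[t] = nucleus_size(softmax_t(logits, t), 0.9)
    print(f"T={t}: nucleus at p=0.9 keeps {sizes[t]} tokens; top-k=3 keeps 3")
```

Because lowering T can only increase the mass held by the top few tokens, the nucleus never shrinks as T rises: with these logits it keeps one token at T = 0.3 and six at T = 3.0, while top-k stays at 3 throughout.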
Common use cases
- Tuning generation. Build intuition for what temperature, top-k and top-p actually do before setting them on a real model.
- Debugging output. Understand why a setting makes text repetitive, incoherent or off-topic.
- Teaching. Show students or teammates how softmax, truncation and renormalisation chain together.
- Comparing strategies. See directly why nucleus sampling adapts where a fixed top-k cannot.