RoPE / YaRN Context Extension Calculator

Extend a model's context beyond what it was trained on and get the exact config to do it. Enter the model's native context and the target context, pick a method (linear, dynamic-NTK, or YaRN), and this returns the scaling factor and the ready-to-paste rope_scaling block for Hugging Face config.json. The math runs in your browser.

How to use the RoPE / YaRN Context Extension Calculator

Enter the native context the model was trained with (from its config or the GGUF metadata) and the target context you want. The scaling factor is simply target ÷ native. Pick a method and copy the generated rope_scaling block into the model's config.json (for Hugging Face transformers / vLLM). For llama.cpp, the method goes to --rope-scaling and the same factor goes to --rope-scale.
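
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function name rope_scaling_block is hypothetical, and the field names (rope_type, factor, original_max_position_embeddings) are assumed to match recent Hugging Face transformers releases. Older releases read the key "type" instead of "rope_type", so check the version you deploy against.

```python
import json

def rope_scaling_block(native_ctx: int, target_ctx: int, method: str) -> dict:
    """Build a rope_scaling dict for config.json (hypothetical helper).

    Field names are an assumption based on recent transformers releases.
    """
    if native_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    if method not in ("linear", "dynamic", "yarn"):
        raise ValueError(f"unknown method: {method}")

    # The factor is the same ratio for every method.
    factor = target_ctx / native_ctx

    block = {"rope_type": method, "factor": factor}
    if method == "yarn":
        # YaRN also needs to know the context the model was trained with.
        block["original_max_position_embeddings"] = native_ctx
    return block

if __name__ == "__main__":
    # 8K-trained model extended to 32K with YaRN: factor 4.0
    print(json.dumps({"rope_scaling": rope_scaling_block(8192, 32768, "yarn")}, indent=2))
```

The printed object is meant to be merged into an existing config.json, not to replace it.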

Bigger factors stretch the model further but cost quality. As a rule of thumb, linear interpolation is fine up to ~2×, dynamic-NTK holds a little further, and YaRN is the method to reach 4× and beyond with the least degradation. None of them is free — always test on long-context tasks after extending.

How context extension works

Most current LLMs encode token positions with Rotary Position Embeddings (RoPE). During training a model only sees positions up to its trained context, so feeding it longer sequences puts tokens at positions it never learned, and quality falls off a cliff. RoPE scaling rescales the positions or the rotation frequencies so that the longer sequence maps back into the range the model understands.

Linear (position interpolation) divides all positions by the factor — simple, but it compresses fine-grained local positions, hurting short-range precision. Dynamic NTK scales the RoPE base instead, spreading the cost across frequencies and degrading more gracefully. YaRN goes further: it scales different frequency bands differently and adds an attention-temperature correction, which is why it reaches large factors (4×–8×) with markedly less loss and has become the default for long-context finetunes. The factor itself is the same target ÷ native ratio for all three; the methods differ in how they apply it.
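
To make the difference between the three methods concrete, here is a rough numeric sketch of what each one does to the RoPE inverse frequencies. It rests on several assumptions: the function names are illustrative, the NTK formula is the static NTK-aware variant (the dynamic variant recomputes the factor from the current sequence length), and the YaRN ramp follows the paper's description with its suggested thresholds of 1 and 32 rotations. Real implementations differ in details.

```python
import math

def inv_freqs(dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies, one per pair of head dimensions."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def linear_freqs(dim: int, factor: float, base: float = 10000.0) -> list[float]:
    """Position interpolation: dividing positions by the factor is the same
    as dividing every frequency by it."""
    return [f / factor for f in inv_freqs(dim, base)]

def ntk_freqs(dim: int, factor: float, base: float = 10000.0) -> list[float]:
    """Static NTK-aware scaling: raise the base so the highest frequency is
    untouched and the lowest is divided by the factor."""
    new_base = base * factor ** (dim / (dim - 2))
    return inv_freqs(dim, new_base)

def yarn_freqs(dim: int, factor: float, native_ctx: int, base: float = 10000.0,
               beta_fast: float = 32.0, beta_slow: float = 1.0) -> list[float]:
    """YaRN-style per-band blend between original and interpolated frequencies."""
    out = []
    for f in inv_freqs(dim, base):
        # Number of full rotations this band completes within the native context.
        rotations = native_ctx * f / (2 * math.pi)
        # keep = 1: leave the band alone; keep = 0: interpolate it fully.
        keep = min(1.0, max(0.0, (rotations - beta_slow) / (beta_fast - beta_slow)))
        out.append(keep * f + (1.0 - keep) * f / factor)
    return out

def yarn_attention_scale(factor: float) -> float:
    """Attention-temperature correction suggested in the YaRN paper."""
    return 0.1 * math.log(factor) + 1.0
```

With a head dimension of 128, a factor of 4 and a native context of 8192, the fastest band is left unchanged by both the NTK and YaRN variants, while the slowest band ends up divided by 4 under all three.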

Common use cases

  • Run a model past its limit — serve a 32K context on an 8K-trained model with the right config.
  • Pick a method — see when linear is enough and when you need YaRN.
  • Generate the config — get the exact rope_scaling JSON instead of guessing field names.
  • Sanity-check a finetune — confirm the factor implied by a model's advertised context.
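
For the last use case, a quick check might look like the sketch below. It assumes the config stores the advertised (extended) length in max_position_embeddings and the training length in rope_scaling.original_max_position_embeddings; not every model follows that layout, so treat a mismatch as a prompt to read the model card rather than as proof of an error. The helper name implied_factor is hypothetical.

```python
import math

def implied_factor(config: dict) -> dict:
    """Compare the declared rope_scaling factor with advertised / native context."""
    scaling = config.get("rope_scaling") or {}
    declared = scaling.get("factor")
    native = scaling.get("original_max_position_embeddings")
    advertised = config.get("max_position_embeddings")

    # Only computable when both lengths are present in the config.
    implied = advertised / native if advertised and native else None
    consistent = (
        declared is not None
        and implied is not None
        and math.isclose(declared, implied)
    )
    return {"declared": declared, "implied": implied, "consistent": consistent}

if __name__ == "__main__":
    example = {
        "max_position_embeddings": 32768,
        "rope_scaling": {
            "rope_type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 8192,
        },
    }
    print(implied_factor(example))
```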

Frequently asked questions

How is the scaling factor calculated?

Factor = target context ÷ native context. A model trained at 8K extended to 32K uses a factor of 4. All three methods use this same ratio; they differ in how the positions are rescaled.

Which method should I use?

Linear is fine up to about 2×. For larger extensions use YaRN, which keeps quality best at 4× and beyond. Dynamic NTK is a middle option that needs no finetuning.

Will extending context hurt quality?

Yes, to some degree — more so at higher factors and with linear scaling. YaRN minimises it but does not eliminate it. Always evaluate on real long-context tasks after extending.

Does this also work for llama.cpp?

Yes. The factor is the same; in llama.cpp pass --rope-scaling (linear/yarn) and --rope-scale with this factor, plus --ctx-size for the target length.
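
As a sketch, the flags could be assembled like this. The helper name llama_cpp_args is hypothetical. --rope-scaling, --rope-scale and --ctx-size are the flags named above; --yarn-orig-ctx is an additional flag assumed from llama.cpp's help output, so confirm it against the build you run.

```python
def llama_cpp_args(native_ctx: int, target_ctx: int, method: str) -> list[str]:
    """Build llama.cpp command-line arguments for an extended context."""
    if method not in ("linear", "yarn"):
        raise ValueError("expected 'linear' or 'yarn'")

    # Same ratio as for the Hugging Face config.
    factor = target_ctx / native_ctx

    args = [
        "--ctx-size", str(target_ctx),
        "--rope-scaling", method,
        "--rope-scale", f"{factor:g}",
    ]
    if method == "yarn":
        # Assumption: tells YaRN the original training context.
        args += ["--yarn-orig-ctx", str(native_ctx)]
    return args

if __name__ == "__main__":
    print(" ".join(llama_cpp_args(8192, 32768, "yarn")))
```

Append the resulting arguments to whichever llama.cpp binary you use, after the model path.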