RoPE / YaRN Context Extension Calculator
Extend a model's context beyond what it was trained on and get the exact config to do it. Enter the model's native context and the target context, pick a method (linear, dynamic-NTK, or YaRN), and this returns the scaling factor and a ready-to-paste rope_scaling block for a Hugging Face config.json. The math runs in your browser.
How to use the RoPE / YaRN Context Extension Calculator
Enter the native context the model was trained with (from its config or the GGUF metadata) and the target context you want. The scaling factor is simply target ÷ native; for example, 32,768 ÷ 8,192 gives a factor of 4. Pick a method and copy the generated rope_scaling block into the model's config.json (for Hugging Face transformers / vLLM). For llama.cpp, pass the method to --rope-scaling and the same factor to --rope-scale.
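As a rough illustration of the steps above, the sketch below computes the factor and builds both outputs. The function names are hypothetical and are not the calculator's own code. The key name rope_type is an assumption about recent transformers releases (older releases used type), and the llama.cpp flags are the ones its command-line help lists at the time of writing, so check --help on your build before relying on them.

```python
import json
from typing import List, Optional


def scaling_factor(native_ctx: int, target_ctx: int) -> float:
    """Scaling factor as described in the text: target / native."""
    if native_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    return target_ctx / native_ctx


def rope_scaling_block(native_ctx: int, target_ctx: int, method: str) -> Optional[dict]:
    """Build a rope_scaling dict for config.json (hypothetical helper)."""
    if method not in ("linear", "dynamic", "yarn"):
        raise ValueError(f"unknown method: {method}")
    factor = scaling_factor(native_ctx, target_ctx)
    if factor <= 1.0:
        return None  # target already fits in the native window
    # Assumption: "rope_type" is the key name; older configs use "type".
    block = {"rope_type": method, "factor": factor}
    if method == "yarn":
        # YaRN needs the native length to place its frequency bands.
        block["original_max_position_embeddings"] = native_ctx
    return block


def llama_cpp_args(native_ctx: int, target_ctx: int, method: str) -> List[str]:
    """Build llama.cpp command-line arguments (hypothetical helper)."""
    # Assumption: --rope-scaling accepts linear and yarn, not dynamic NTK.
    if method not in ("linear", "yarn"):
        raise ValueError(f"method not mapped to llama.cpp here: {method}")
    factor = scaling_factor(native_ctx, target_ctx)
    args = ["--ctx-size", str(target_ctx),
            "--rope-scaling", method,
            "--rope-scale", f"{factor:g}"]
    if method == "yarn":
        args += ["--yarn-orig-ctx", str(native_ctx)]
    return args


if __name__ == "__main__":
    block = rope_scaling_block(8192, 32768, "yarn")
    print(json.dumps({"rope_scaling": block}, indent=2))
    print(" ".join(llama_cpp_args(8192, 32768, "yarn")))
```

Returning None when the target does not exceed the native context keeps the helper from emitting a block that would compress positions for no benefit.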
Bigger factors stretch the model further but cost quality. As a rule of thumb, linear interpolation is fine up to ~2×, dynamic-NTK holds a little further, and YaRN is the method to reach 4× and beyond with the least degradation. None of them is free, so always test on long-context tasks after extending.
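The rule of thumb can be written as a small lookup. This is only a sketch: the function name is hypothetical, and the hard cut-offs at 2× and 4× are an assumed reading of "~2×" and "4× and beyond", not thresholds the calculator is known to use.

```python
def suggest_method(factor: float) -> str:
    """Map a scaling factor to a method using the rule of thumb above.

    The cut-offs are illustrative assumptions; real limits depend on
    the model and should be confirmed by long-context evaluation.
    """
    if factor <= 1.0:
        return "none"      # target fits in the native context
    if factor <= 2.0:
        return "linear"    # position interpolation is usually fine here
    if factor < 4.0:
        return "dynamic"   # dynamic NTK holds a little further
    return "yarn"          # least degradation at 4x and beyond


if __name__ == "__main__":
    for f in (1.0, 2.0, 3.0, 4.0, 8.0):
        print(f, suggest_method(f))
```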
How context extension works
Many transformer language models encode token positions with Rotary Position Embeddings (RoPE), which rotate each pair of query/key dimensions by an angle proportional to the token's position. A model only sees positions up to its trained context, so feeding it longer sequences puts tokens at positions it never learned, and quality degrades sharply. RoPE scaling adjusts the positions or the rotation frequencies so the longer sequence maps back into the range the model understands.
Linear (position interpolation) divides all positions by the factor. It is simple, but it compresses every frequency equally, including the high-frequency dimensions that distinguish neighboring tokens, which hurts short-range precision. Dynamic NTK scales the RoPE base instead, adjusting it as the sequence grows past the native context, so the high frequencies stay almost untouched and the low frequencies absorb most of the stretch; it degrades more gracefully. YaRN goes further: it scales different frequency bands differently and adds an attention-temperature correction, which is why it reaches large factors (4×–8×) with markedly less loss and is a common choice for long-context finetunes. The factor itself is the same target ÷ native ratio for all three; the methods differ in how they apply it.
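To make the difference concrete, here is a minimal sketch of how each method changes the per-dimension rotation frequencies. It is not the calculator's code, and all function names are hypothetical. It assumes a head dimension of 128 and a RoPE base of 10000 for the example, uses the dynamic-NTK base formula as I understand it from the Hugging Face transformers implementation, and uses the commonly cited YaRN defaults (beta_fast = 32, beta_slow = 1, attention factor 0.1 · ln(factor) + 1).

```python
import math


def base_inv_freq(dim, base=10000.0):
    """Unscaled RoPE inverse frequencies, one per pair of dimensions."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def linear_inv_freq(dim, factor, base=10000.0):
    """Position interpolation: every frequency is divided by the factor."""
    return [f / factor for f in base_inv_freq(dim, base)]


def dynamic_ntk_inv_freq(dim, factor, seq_len, native_ctx, base=10000.0):
    """Dynamic NTK: enlarge the base once seq_len exceeds the native context."""
    if seq_len <= native_ctx:
        return base_inv_freq(dim, base)
    scale = (factor * seq_len / native_ctx) - (factor - 1)
    return base_inv_freq(dim, base * scale ** (dim / (dim - 2)))


def yarn_inv_freq(dim, factor, native_ctx, base=10000.0,
                  beta_fast=32.0, beta_slow=1.0):
    """YaRN: keep high frequencies, interpolate low ones, blend in between."""
    def correction_dim(num_rotations):
        # Dimension index whose wavelength completes num_rotations
        # rotations over the native context.
        return (dim * math.log(native_ctx / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp
    scaled = []
    for i, f in enumerate(base_inv_freq(dim, base)):
        # ramp = 0 keeps the original frequency, ramp = 1 fully interpolates
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        scaled.append(f * (1 - ramp) + (f / factor) * ramp)
    attention_factor = 0.1 * math.log(factor) + 1.0  # temperature correction
    return scaled, attention_factor


if __name__ == "__main__":
    yarn, attn = yarn_inv_freq(dim=128, factor=4.0, native_ctx=8192)
    print("first/last YaRN frequency:", yarn[0], yarn[-1])
    print("attention factor:", attn)
```

Linear scaling touches every entry of the list, dynamic NTK leaves the first entry unchanged and shrinks the last one the most, and YaRN leaves the leading band unchanged while dividing the trailing band by the full factor.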
Common use cases
- Run a model past its limit — serve a 32K context on an 8K-trained model with the right config.
- Pick a method — see when linear is enough and when you need YaRN.
- Generate the config — get the exact rope_scaling JSON instead of guessing field names.
- Sanity-check a finetune — confirm the factor implied by a model's advertised context.