JSON Schema to GBNF Grammar
Convert a JSON Schema to a GBNF grammar that constrains llama.cpp output to exactly the schema's structure. Paste your schema and get a grammar ready for --grammar or --grammar-file, with named rules for reuse across nested schemas.
How to use the JSON Schema to GBNF Grammar
Paste a JSON Schema object into the textarea, then click Generate GBNF. The tool outputs a GBNF grammar file with:
- A root rule that matches the top-level schema.
- Named rules for each nested object, array, or type, enabling reuse when the same schema appears multiple times.
- Primitive rules: ws (optional whitespace), string, number, integer, boolean, and null.
Use the grammar with llama.cpp:
- CLI: llama-cli --grammar-file output.gbnf ...
- llama-server: include {"grammar": "..."} in the API request body.
- Python (llama-cpp-python): llm(prompt, grammar=LlamaGrammar.from_string(gbnf))
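For llama-server, the grammar travels inside a JSON request body, so its quotes and newlines must be escaped; letting a JSON encoder do that avoids hand-escaping mistakes. The sketch below assumes a llama-server instance on its default address (http://localhost:8080) and posts to its /completion endpoint. The helper names build_completion_request and complete are hypothetical, and the one-line grammar is only an illustration, not output from this tool.

```python
import json
import urllib.request

# Illustrative grammar: the completion must be the bare JSON literal true or false.
GBNF = 'root ::= ("true" | "false")\n'


def build_completion_request(prompt: str, gbnf: str, n_predict: int = 64) -> bytes:
    """Encode a /completion request body; json.dumps escapes the grammar's quotes and newlines."""
    body = {"prompt": prompt, "n_predict": n_predict, "grammar": gbnf}
    return json.dumps(body).encode("utf-8")


def complete(prompt: str, gbnf: str, url: str = "http://localhost:8080/completion") -> str:
    """POST to a running llama-server (assumed to be at url) and return the generated text."""
    req = urllib.request.Request(
        url,
        data=build_completion_request(prompt, gbnf),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

With a server running, complete("Is water wet? Answer: ", GBNF) should be constrained to either true or false. The same grammar string can be handed to llama-cpp-python through LlamaGrammar.from_string instead.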
Supported: object, array, string, integer, number, boolean, null, enum (string literals only), and local references to definitions in the same schema. Not supported: anyOf/oneOf with more than two alternatives, remote references, pattern, and format.
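Since these keywords are listed as unsupported, it can help to find them in a schema before generating a grammar. The following is a hypothetical pre-flight helper (find_unsupported is not part of the tool) that reports only the limits listed above: pattern, format, and anyOf/oneOf with more than two alternatives. It walks the common subschema keywords and reports locations as JSON-Pointer-like paths.

```python
# Keywords whose values hold subschemas, grouped by shape.
MAP_KEYS = ("properties", "$defs", "definitions")  # name -> subschema
LIST_KEYS = ("anyOf", "oneOf", "allOf")            # list of subschemas
ONE_KEYS = ("items", "additionalProperties")       # a single subschema


def find_unsupported(schema, path: str = "#") -> list[tuple[str, str]]:
    """Return (path, problem) pairs for features the converter lists as unsupported."""
    if not isinstance(schema, dict):  # e.g. boolean schemas: nothing to inspect
        return []
    problems = []
    for key in ("pattern", "format"):
        if key in schema:
            problems.append((path, key))
    for key in ("anyOf", "oneOf"):
        if len(schema.get(key, [])) > 2:
            problems.append((path, f"{key} with {len(schema[key])} alternatives"))
    # Recurse only through subschema positions, so a property *named* "format"
    # is not mistaken for the format keyword.
    for key in MAP_KEYS:
        for name, sub in schema.get(key, {}).items():
            problems += find_unsupported(sub, f"{path}/{key}/{name}")
    for key in LIST_KEYS:
        for i, sub in enumerate(schema.get(key, [])):
            problems += find_unsupported(sub, f"{path}/{key}/{i}")
    for key in ONE_KEYS:
        if isinstance(schema.get(key), dict):
            problems += find_unsupported(schema[key], f"{path}/{key}")
    return problems
```

A field flagged this way (for example one with format set to email) could be relaxed to a plain string before conversion, with the format check moved to application code.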
GBNF grammars and constrained decoding
GBNF (GGML BNF) is a context-free grammar format supported by llama.cpp for constrained decoding. During text generation, the grammar restricts which tokens the model is allowed to emit at each step, so the output is always a member of the grammar's language (or a prefix of one, if generation is cut off early, for example by a token limit). For JSON output, a complete generation therefore cannot be invalid JSON, omit required fields, or contain values of the wrong type, which eliminates JSON parse errors in production pipelines that call local LLMs.
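The masking idea can be seen in a toy setting. This sketch is not how llama.cpp implements it (llama.cpp, roughly, advances parse stacks over the grammar for each candidate token); here the "grammar" is just a finite list of allowed strings, like an enum, and a token is allowed only if the text so far plus that token is still a prefix of some allowed string. The vocabulary and scores are made up to show that the model's favorite token is never emitted when it would break the grammar.

```python
def viable(prefix: str, language: list[str]) -> bool:
    # Can prefix still be completed into some string of the language?
    return any(s.startswith(prefix) for s in language)


def allowed_tokens(generated: str, vocab: list[str], language: list[str]) -> list[str]:
    # The mask: keep only tokens whose addition leaves the output viable.
    return [t for t in vocab if viable(generated + t, language)]


def constrained_greedy(scores: dict[str, float], vocab: list[str],
                       language: list[str], max_steps: int = 10) -> str:
    # Greedy decoding where the argmax is taken over allowed tokens only.
    generated = ""
    for _ in range(max_steps):
        if generated in language:  # complete; no label extends another, so stop
            return generated
        candidates = allowed_tokens(generated, vocab, language)
        if not candidates:
            raise RuntimeError(f"dead end after {generated!r}")
        generated += max(candidates, key=lambda t: scores.get(t, 0.0))
    raise RuntimeError("step limit reached")


LABELS = ['"positive"', '"negative"', '"neutral"']
VOCAB = ['"', "pos", "neg", "neu", "itive", "ative", "tral", "Sure", "maybe"]
# Made-up model preferences: the unconstrained favorite is "Sure".
SCORES = {"Sure": 0.9, "maybe": 0.8, "neg": 0.5, "pos": 0.3, "neu": 0.1,
          '"': 0.2, "itive": 0.2, "ative": 0.2, "tral": 0.2}
```

Unconstrained greedy decoding would start with "Sure"; constrained_greedy(SCORES, VOCAB, LABELS) instead builds the quoted JSON string for negative, one allowed token at a time.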
A GBNF grammar is a sequence of named rules of the form name ::= expression, where each expression is an alternation of sequences built from terminals (quoted strings and character classes) and references to other rules, with grouping and the repetition operators *, +, and ?. The special root rule is the entry point. Primitives like string, number, and ws (whitespace) are typically defined once and referenced everywhere. Object rules enumerate keys in a specific order (or with optional-key alternation) and reference the appropriate value rules. This structure maps directly from a JSON Schema's properties, required, type, and enum fields.
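The mapping from properties, required, type, and enum to rules can be sketched in a few dozen lines. This is a minimal, hypothetical converter, not the tool's actual implementation: it handles object, array, the primitive types, and enum, reuses one named rule when an identical object or array schema appears again, and skips references and anyOf/oneOf entirely. Two simplifications are worth flagging. Required properties come first in declared order and optional ones follow as optional groups, so at least one property must be required when optional ones exist. The primitive rules are simplified versions written in the spirit of the JSON grammar bundled with llama.cpp, not copies of it.

```python
import json
import re

# Simplified primitive rules (assumption: modeled loosely on llama.cpp's JSON grammar).
PRIMITIVES = {
    "ws": r'[ \t\n]*',
    "string": r'"\"" ( [^"\\\x7F\x00-\x1F] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) )* "\"" ws',
    "number": r'"-"? ([0-9] | [1-9] [0-9]*) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws',
    "integer": r'"-"? ([0-9] | [1-9] [0-9]*) ws',
    "boolean": r'("true" | "false") ws',
    "null": r'"null" ws',
}
SCALARS = ("string", "number", "integer", "boolean", "null")


def literal(value) -> str:
    # GBNF terminal matching the JSON text of value: "a" becomes "\"a\"".
    text = json.dumps(value)
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def schema_to_gbnf(schema: dict) -> str:
    rules: dict[str, str] = {}  # rule name -> body, in registration order
    seen: dict[str, str] = {}   # canonical schema JSON -> rule name, for reuse

    def register(name: str, body: str) -> str:
        base, n = name, 2
        while name in rules:  # never overwrite an existing rule
            name, n = f"{base}{n}", n + 1
        rules[name] = body
        return name

    def visit(node: dict, name: str) -> str:
        # Returns a GBNF expression (often a rule name) matching node.
        if "enum" in node:
            return "(" + " | ".join(literal(v) for v in node["enum"]) + ") ws"
        kind = node.get("type")
        if kind in SCALARS:
            return kind
        canon = json.dumps(node, sort_keys=True)
        if canon in seen:  # identical subschema already has a rule
            return seen[canon]
        if kind == "array" and "items" in node:
            item = visit(node["items"], name + "-item")
            body = f'"[" ws ( {item} ( "," ws {item} )* )? "]" ws'
        elif kind == "object":
            required = set(node.get("required", []))
            req, opt = [], []
            for key, sub in node.get("properties", {}).items():
                # Keep rule names to letters, digits and "-".
                safe = re.sub(r"[^A-Za-z0-9-]", "-", key)
                pair = f'{literal(key)} ws ":" ws {visit(sub, name + "-" + safe)}'
                (req if key in required else opt).append(pair)
            if opt and not req:
                raise ValueError("this sketch needs a required property when optional ones exist")
            inner = ' "," ws '.join(req) + "".join(f' ( "," ws {p} )?' for p in opt)
            body = f'"{{" ws {inner} "}}" ws' if inner else '"{" ws "}" ws'
        else:
            raise ValueError(f"unsupported schema: {node}")
        seen[canon] = register(name, body)
        return seen[canon]

    expr = visit(schema, "root")
    if expr != "root":  # scalar or enum at the top level: alias it
        rules["root"] = expr
    lines = [f"root ::= {rules.pop('root')}"]
    lines += [f"{k} ::= {v}" for k, v in {**rules, **PRIMITIVES}.items()]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file gives a grammar in the same shape described above (a root rule, named nested rules, shared primitives) that could be passed to llama-cli with --grammar-file.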
Constrained decoding is more reliable than prompt engineering for structured output from small local models (7B–13B), where instruction following is weaker. Unlike post-hoc validation, which parses the output after generation and retries on failure, GBNF constraints are enforced at the token level: tokens that would violate the grammar are masked out before sampling, so the model cannot produce output outside the grammar. The trade-off is speed: the grammar is evaluated at every step, so very complex grammars can slow generation slightly.
Common use cases
- Local LLM structured output — guarantee JSON-valid responses from llama.cpp-served models without post-hoc parsing or retry logic.
- Function calling on local models — constrain tool call argument output to exact parameter schemas for local function-calling pipelines.
- Data extraction grammars — extract specific fields from documents using a schema that matches exactly the fields you need.
- Classification with enum — use enum constraints to guarantee the model picks from a fixed list of category labels.
- RAG structured answers — constrain answers to include required fields like source, confidence, and answer in a defined schema.
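For the classification case, the grammar can be small enough to write by hand. A minimal sketch, assuming the desired output shape is a one-key object such as {"label": "spam"}; label_grammar and gbnf_literal are hypothetical helpers and the labels are illustrative.

```python
import json


def gbnf_literal(text: str) -> str:
    # GBNF terminals are double-quoted; escape backslashes first, then quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def label_grammar(labels: list[str], key: str = "label") -> str:
    """GBNF accepting only {"<key>": <one of labels>} as JSON."""
    # json.dumps yields each label's JSON text, i.e. the label wrapped in quotes.
    choices = " | ".join(gbnf_literal(json.dumps(label)) for label in labels)
    return (
        f'root ::= "{{" ws {gbnf_literal(json.dumps(key))} ws ":" ws label ws "}}"\n'
        f"label ::= {choices}\n"
        "ws ::= [ \\t\\n]*\n"
    )
```

The returned text can be saved for --grammar-file or passed to llama-cpp-python via LlamaGrammar.from_string; because the label rule is a plain alternation, the model can only pick one of the listed categories.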