JSON Schema to GBNF Grammar

Convert a JSON Schema to a GBNF grammar that constrains llama.cpp output to exactly the schema's structure. Paste your schema and get a grammar ready for --grammar or --grammar-file, with named rules for reuse across nested schemas.

How to use the JSON Schema to GBNF Grammar

Paste a JSON Schema object into the textarea, then click Generate GBNF. The tool outputs a GBNF grammar file with:

  • A root rule that matches the top-level schema.
  • Named rules for each nested object/array/type, enabling reuse when the same schema appears multiple times.
  • Primitive rules: ws (optional whitespace), string, number, integer, boolean, null.
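Here is a concrete picture of that layout. The block holds a hypothetical grammar, in a Python string, for a schema with a string name and a nested address object. It also includes a small helper that checks every referenced rule is defined. The rule names (top, top-address) and the primitive bodies are assumptions for illustration, not this tool's verbatim output. The checker is handy if you hand-edit a generated grammar.

```python
import re

# Hypothetical output for the schema
#   {"type": "object",
#    "properties": {"name": {"type": "string"},
#                   "address": {"type": "object",
#                               "properties": {"city": {"type": "string"}}}}}
# Rule names (top, top-address) are assumptions, not this tool's exact naming.
SAMPLE = r'''
root ::= ws top ws
top ::= "{" ws "\"name\"" ws ":" ws string ws "," ws "\"address\"" ws ":" ws top-address ws "}"
top-address ::= "{" ws "\"city\"" ws ":" ws string ws "}"
ws ::= [ \t\n]*
string ::= "\"" ( [^"\\\x7F\x00-\x1F] | "\\" ( ["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] ) )* "\""
'''

# One token per match: string literal, character class, comment, or rule name.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\[(?:\\.|[^\]\\])*\]|#[^\n]*|[A-Za-z0-9-]+')


def check_rules(grammar):
    """Return (undefined rule references, defined rule names) for a GBNF text."""
    defined = set(re.findall(r"^([A-Za-z0-9-]+)\s*::=", grammar, re.MULTILINE))
    used = set()
    for line in grammar.splitlines():
        body = line.split("::=", 1)[-1]  # right-hand side (or a continuation line)
        for match in TOKEN.finditer(body):
            tok = match.group()
            # Skip literals, classes, comments and bare repetition counts like {0,20}.
            if tok[0] not in '"[#' and not tok.isdigit():
                used.add(tok)
    return sorted(used - defined), defined


missing, defined = check_rules(SAMPLE)
print(missing)  # every referenced rule is defined, so this prints []
print(sorted(defined))
```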

Use the grammar with llama.cpp:

  • CLI: llama-cli --grammar-file output.gbnf ...
  • llama-server: {"grammar": "..."} in the API request body.
  • Python llama-cpp-python: llm(prompt, grammar=LlamaGrammar.from_string(gbnf))
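For llama-server, here is a minimal Python sketch of the request that uses only the standard library. It assumes the server runs at its default address (http://localhost:8080). It also assumes you call the /completion endpoint with the prompt, n_predict and grammar fields. Adjust both for your deployment.

```python
import json
import urllib.request

# Assumption: llama-server on its default address, using the /completion endpoint.
SERVER_URL = "http://localhost:8080/completion"


def build_request(prompt, grammar, n_predict=256, url=SERVER_URL):
    """Build a POST request whose JSON body carries the GBNF text in "grammar"."""
    payload = {"prompt": prompt, "n_predict": n_predict, "grammar": grammar}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, grammar, **kwargs):
    """Send the request and return the generated text from the "content" field."""
    with urllib.request.urlopen(build_request(prompt, grammar, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]


# Example (needs a running server):
#   complete("Is the sky blue? Answer: ", 'root ::= "yes" | "no"')
```

The grammar travels as plain text inside the JSON body, so a generated .gbnf file can be read with open(...).read() and passed straight through.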

Supported: object, array, string, integer, number, boolean, null, enum (string literals), and local $ref (definitions within the same schema). Not supported: anyOf/oneOf with more than 2 alternatives, remote $ref, pattern, and format.
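It is worth catching unsupported features before conversion. The hypothetical helper below walks a schema and reports the unsupported keywords: pattern, format, anyOf/oneOf with more than two alternatives, and $ref values that do not start with # (treated here as remote references). The traversal rules are assumptions of this sketch. For example, it skips enum and default values, and it treats keys under properties as names rather than keywords.

```python
# Hypothetical pre-flight check; keyword list from the "Not supported" list,
# traversal rules are assumptions.
def find_unsupported(schema, path="#"):
    """Return path-like strings pointing at keywords the generator cannot handle."""
    found = []
    if isinstance(schema, list):
        for i, item in enumerate(schema):
            found += find_unsupported(item, f"{path}/{i}")
        return found
    if not isinstance(schema, dict):
        return found
    for key, value in schema.items():
        here = f"{path}/{key}"
        if key in ("enum", "const", "default", "examples"):
            continue  # these hold values, not sub-schemas
        if key in ("properties", "$defs", "definitions") and isinstance(value, dict):
            # Keys here are property/definition names, so a property that is
            # literally called "format" must not be flagged.
            for name, sub in value.items():
                found += find_unsupported(sub, f"{here}/{name}")
            continue
        if key in ("pattern", "format"):
            found.append(here)
        elif key == "$ref" and not str(value).startswith("#"):
            found.append(here)  # remote reference
        elif key in ("anyOf", "oneOf") and isinstance(value, list) and len(value) > 2:
            found.append(here)
        found += find_unsupported(value, here)
    return found
```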

GBNF grammars and constrained decoding

GBNF (GGML BNF) is a context-free grammar format supported by llama.cpp for constrained decoding. During text generation, the grammar restricts which tokens the model may emit at each step, so any output that runs to completion is a string in the grammar's language. For JSON output, this means the model cannot produce invalid JSON, omit required fields, or emit values of the wrong type. That eliminates JSON parse errors in production pipelines that call local LLMs. The one gap is truncation. If generation hits the token limit before the grammar is satisfied, the result is an incomplete prefix, so allow enough tokens for the largest expected object.
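A toy decoder can illustrate the token-level restriction. This sketch rests on simplifying assumptions: the "grammar" is just a finite set of allowed outputs (as for a single enum field), the vocabulary is a handful of made-up tokens, and the "model" is a dict of scores. At each step, the decoder masks any token that cannot extend the current text toward an allowed output before the greedy pick. So even a strongly preferred token like hello is never emitted. Real engines track grammar parser state rather than comparing whole strings.

```python
# Toy illustration of token-level constraints. The "grammar" is a finite set of
# allowed outputs; the vocabulary and scores are made up for the example.
ALLOWED = ['{"label": "spam"}', '{"label": "ham"}']
VOCAB = ['{"label": ', '"', 'sp', 'am', 'ha', 'm', '"}', '}', 'hello', ' ']


def allowed_tokens(prefix):
    """Vocabulary entries that keep prefix + token a prefix of some allowed output."""
    return [tok for tok in VOCAB if any(a.startswith(prefix + tok) for a in ALLOWED)]


def constrained_greedy(logits, max_steps=20):
    """Greedy decoding where disallowed tokens are masked out before the argmax."""
    out = ""
    for _ in range(max_steps):
        if out in ALLOWED:
            return out
        candidates = allowed_tokens(out)
        if not candidates:
            raise RuntimeError(f"no token can extend {out!r}")
        # Masking: tokens outside `candidates` are ignored, however high their score.
        out += max(candidates, key=lambda tok: logits.get(tok, 0.0))
    raise RuntimeError("step limit reached before a complete output")


# A "model" that strongly prefers 'hello' still cannot emit it.
print(constrained_greedy({"hello": 9.0, "ha": 2.0, "sp": 1.0}))
```

The step limit also models the truncation case: if the budget runs out first, there is no complete output to return.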

A GBNF grammar is a sequence of named rules. Each rule is defined as one or more alternatives built from terminal strings, character classes, and references to other rules. The special root rule is the entry point. Primitives like string, number, and ws (whitespace) are typically defined once and referenced everywhere. Object rules enumerate keys in a specific order (or with optional-key alternation) and reference the appropriate value rules. This structure maps directly from a JSON Schema's properties, required, type, and enum fields.
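The mapping from JSON Schema to rules can be sketched in Python. This is a simplified illustration, not this tool's implementation. It emits every listed property in declaration order, ignoring required, and names nested rules after their property path (top, top-address, ...). It reuses one rule when an identical sub-schema appears twice. It raises ValueError for anything outside object, array, the scalar types, and enum. The primitive rule bodies are assumptions modelled on common JSON grammars.

```python
import json
import re

# Primitive rules shared by every grammar. GBNF rule names may contain only
# letters, digits and '-', so generated names never use underscores.
PRIMITIVES = {
    "ws": r'[ \t\n]*',
    "string": r'"\"" ( [^"\\\x7F\x00-\x1F] | "\\" ( ["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] ) )* "\""',
    "integer": r'"-"? ( "0" | [1-9] [0-9]* )',
    "number": r'"-"? ( "0" | [1-9] [0-9]* ) ( "." [0-9]+ )? ( [eE] [-+]? [0-9]+ )?',
    "boolean": r'"true" | "false"',
    "null": r'"null"',
}
SCALAR_TYPES = {"string", "integer", "number", "boolean", "null"}


def quoted_json(value):
    """GBNF literal that matches the JSON encoding of value, quotes included."""
    return json.dumps(json.dumps(value))


class SchemaToGbnf:
    def __init__(self):
        self.rules = {}                     # rule name -> body, children first
        self.cache = {}                     # canonical sub-schema -> rule name
        self.taken = {"root", *PRIMITIVES}  # names that must not be reused

    def visit(self, schema, name):
        """Return a GBNF expression (usually a rule name) matching `schema`."""
        if "enum" in schema:
            return "( " + " | ".join(quoted_json(v) for v in schema["enum"]) + " )"
        kind = schema.get("type")
        if kind in SCALAR_TYPES:
            return kind
        if kind not in ("object", "array"):
            raise ValueError(f"unsupported schema: {schema!r}")
        key = json.dumps(schema, sort_keys=True)
        if key in self.cache:               # identical sub-schema: reuse its rule
            return self.cache[key]
        name = base = re.sub(r"[^A-Za-z0-9-]", "-", name)
        n = 2
        while name in self.taken:           # avoid clashes after sanitising
            name, n = f"{base}-{n}", n + 1
        self.taken.add(name)
        self.cache[key] = name
        if kind == "object":
            members = []
            for prop, sub in schema.get("properties", {}).items():
                value = self.visit(sub, f"{name}-{prop}")
                members.append(f'{quoted_json(prop)} ws ":" ws {value}')
            inner = ' ws "," ws '.join(members)
            body = f'"{{" ws {inner} ws "}}"' if members else '"{" ws "}"'
        else:
            item = self.visit(schema.get("items", {}), f"{name}-item")
            body = f'"[" ws ( {item} ( ws "," ws {item} )* )? ws "]"'
        self.rules[name] = body
        return name

    def convert(self, schema):
        top = self.visit(schema, "top")
        lines = [f"root ::= ws {top} ws"]
        lines += [f"{n} ::= {b}" for n, b in self.rules.items()]
        lines += [f"{n} ::= {b}" for n, b in PRIMITIVES.items()]
        return "\n".join(lines) + "\n"
```

For example, converting {"type": "object", "properties": {"name": {"type": "string"}}} yields a root rule, one object rule named top, and the primitive rules. Saved as output.gbnf, the text can be passed to --grammar-file.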

Constrained decoding is more reliable than prompt engineering for structured output from small local models (7B–13B), whose instruction-following is weaker. Post-hoc validation parses the output after generation and rejects or retries it if malformed. GBNF constraints are instead enforced at the token level, so the sampler never selects a token that would take the output outside the grammar. The trade-off is overhead, since candidate tokens must be checked against the grammar at every step, and very complex grammars can slow generation somewhat.

Common use cases

  • Local LLM structured output — guarantee JSON-valid responses from llama.cpp-served models without post-hoc parsing or retry logic.
  • Function calling on local models — constrain tool call argument output to exact parameter schemas for local function-calling pipelines.
  • Data extraction grammars — extract specific fields from documents using a schema that matches exactly the fields you need.
  • Classification with enum — use enum constraints to guarantee the model picks from a fixed list of category labels.
  • RAG structured answers — constrain answers to include required fields like source, confidence, and answer in a defined schema.
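For classification, the grammar can be even smaller than a full JSON object. If the model only needs to output the label itself, a single root rule with one alternative per label is enough. The helper names below are hypothetical. The escaping covers backslashes, double quotes, and newlines, which is enough for plain label text.

```python
def gbnf_literal(text):
    """Quote text as a GBNF string literal (escapes backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def label_grammar(labels):
    """Grammar whose only valid outputs are the given category labels."""
    if not labels:
        raise ValueError("need at least one label")
    return "root ::= " + " | ".join(gbnf_literal(label) for label in labels) + "\n"


print(label_grammar(["billing", "technical", "other"]), end="")
```

One caveat of this design concerns labels that are prefixes of each other, such as tech and technical. Both are complete matches, so the model may stop after the shorter one. Choosing labels that are not prefixes of each other avoids the ambiguity.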

Frequently asked questions

What version of llama.cpp does GBNF support apply to?

GBNF has been supported in llama.cpp since mid-2023 (builds after commit ~b1217). The --grammar-file flag is available in llama-cli and the llama-server API exposes a "grammar" parameter. llama-cpp-python exposes LlamaGrammar.from_string().

Why are optional properties hard in GBNF?

JSON objects have unordered keys, but a GBNF rule matches its elements in a fixed sequence. Accepting every subset of the optional keys, in every order, would require a number of alternatives that grows combinatorially with the number of optional properties. This generator uses a simplified approach: required keys are emitted in declaration order; optional keys may be omitted but appear in a fixed order when present.
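The fixed-order approach can be sketched as a Python function that builds one object rule body. It makes two assumptions beyond the description above: optional keys are placed after all required keys, and value rules are passed in by name. With no required key, the first emitted key must not be preceded by a comma. So the sketch adds one alternative per possible first key. This grows quadratically rather than exponentially, at the cost of never accepting a reordered object.

```python
import json


def member(key, value_rule):
    """`"key": value` as a GBNF sequence; the key is matched as a JSON string."""
    return f'{json.dumps(json.dumps(key))} ws ":" ws {value_rule}'


def object_body(required, optional):
    """Object rule body: required keys in order, then optional keys in fixed order.

    `required` and `optional` are lists of (key, value_rule_name) pairs.
    """
    if required:
        inner = ' ws "," ws '.join(member(k, v) for k, v in required)
        # Each optional key may be present or absent, but never reordered.
        inner += "".join(f' ( ws "," ws {member(k, v)} )?' for k, v in optional)
    elif optional:
        # No required key to anchor the commas: one alternative per possible
        # first key, each followed by the later optional keys. Size is O(n^2).
        alternatives = []
        for i, (k, v) in enumerate(optional):
            rest = "".join(f' ( ws "," ws {member(k2, v2)} )?' for k2, v2 in optional[i + 1:])
            alternatives.append(member(k, v) + rest)
        inner = "( " + " | ".join(alternatives) + " )?"
    else:
        return '"{" ws "}"'
    return f'"{{" ws {inner} ws "}}"'


print(object_body([("id", "integer")], [("note", "string")]))
```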

Does constrained decoding affect output quality?

For well-defined schemas it improves reliability (no invalid JSON) with negligible quality loss. Very tight constraints (e.g., a specific enum with one option) effectively force the output and bypass reasoning. For creative tasks, use constraints only for the structured parts of the response.

Can I use GBNF with OpenAI-compatible servers like Ollama?

Ollama does not directly expose GBNF, but it supports a "format": "json" mode and, in recent versions, structured output by passing a JSON Schema as the "format" value. llama-server (the official HTTP server from llama.cpp) accepts GBNF directly. LM Studio supports grammar files as of recent releases.

What is the ws rule?

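ws stands for optional whitespace. It matches zero or more spaces, tabs, and newlines. JSON allows whitespace between any two tokens, so a grammar that should accept JSON in any formatting style needs a ws reference between keys, values, commas, and braces. Some grammars cap the amount of whitespace so the model cannot pad its output with an endless run of spaces or newlines.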
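There are two common ways to write the rule. The block models both as Python regular expressions so their languages can be compared directly, with the GBNF for each in the comments. The unbounded version accepts any whitespace run. The bounded version is similar in spirit to the JSON grammar bundled with llama.cpp: it allows nothing, a single space, or a newline followed by up to 20 spaces or tabs. The {0,20} repetition syntax in GBNF assumes a llama.cpp build that supports bounded repetition; on older builds it would have to be spelled out.

```python
import re

# Python-regex models of two candidate ws rules (the GBNF is in the comments).
# ws ::= [ \t\n]*                    any run of spaces, tabs and newlines
UNBOUNDED_WS = re.compile(r"[ \t\n]*")
# ws ::= | " " | "\n" [ \t]{0,20}    nothing, one space, or newline plus indent
BOUNDED_WS = re.compile(r"(?: |\n[ \t]{0,20})?")


def accepts(rule, text):
    """True if the whole of `text` is in the language of the ws rule."""
    return rule.fullmatch(text) is not None


for sample in ["", " ", "\n    ", "   ", "\n" + " " * 25]:
    print(repr(sample), accepts(UNBOUNDED_WS, sample), accepts(BOUNDED_WS, sample))
```

The bounded form still covers pretty-printed JSON with modest indentation, but it rejects long runs of blanks that a model could otherwise emit indefinitely.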