GGUF Metadata Reader

Inspect what is actually inside a GGUF model file without loading it into llama.cpp. Drop a .gguf and this reads the file header and metadata key-values — architecture, context length, embedding size, block (layer) count, attention heads, quantization type and tokenizer — and shows them in a table. Only the first part of the file is read; the multi-gigabyte tensor data is never touched, and nothing is uploaded (parsing runs in your browser).

How to use the GGUF Metadata Reader

Click the box and pick a .gguf file (or drag one in). The tool reads just the start of the file — the magic number, GGUF version, tensor and metadata counts, and every metadata key-value — and lists them, with the most useful fields (architecture, context length, embedding length, block count, head counts, quantization, tokenizer model) pulled to the top.

Large arrays such as the token vocabulary are summarised by type and length rather than printed in full. Because only the header region is read (not the weights), it works on a 40 GB model as fast as on a 2 GB one, and the file never leaves your machine.
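As a rough illustration of why file size does not matter, the sketch below reads only the fixed 24-byte header of a GGUF v2/v3 file. It is written in Python for brevity and is not this tool's own code (which runs in the browser). The function name read_gguf_header is hypothetical, and the sketch assumes a little-endian file.

```python
import struct

GGUF_MAGIC = b"GGUF"
# magic (4 bytes), uint32 version, uint64 tensor count, uint64 metadata KV count.
# Assumption: little-endian layout, as used by GGUF v2 and typical v3 files.
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(path):
    """Read only the fixed-size header; the rest of the file is never touched."""
    with open(path, "rb") as f:
        raw = f.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE:
        raise ValueError("file is too short to be a GGUF")
    magic, version, tensor_count, kv_count = struct.unpack(HEADER_FORMAT, raw)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    if version not in (2, 3):
        raise ValueError(f"unsupported GGUF version {version}")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

The cost of this call depends on the 24 bytes it reads, not on how many gigabytes of weights follow them.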

What GGUF metadata contains

GGUF is the single-file format llama.cpp and Ollama use. After a short header (the magic bytes GGUF, a version number, and counts of tensors and metadata entries) comes a block of typed key-value metadata, then the tensor descriptions and finally the tensor data. The metadata is self-describing: it records the model's architecture (general.architecture), its trained context length (*.context_length), embedding size, number of layers and attention heads, the RoPE settings, the quantization (general.file_type), and the full tokenizer (vocabulary, merges and special token IDs).
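The layout described above can be walked with a small cursor over the bytes already read. The following is a minimal sketch, assuming a little-endian GGUF v2/v3 buffer; the names Cursor, Truncated, read_value and read_metadata are hypothetical and are not part of llama.cpp or of this tool. Arrays are skipped and replaced by a type-and-length summary, and parsing stops cleanly if the buffer ends before the metadata does.

```python
import struct

# Scalar value types from the GGUF spec: type id -> struct format (little-endian).
SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
TYPE_NAMES = {0: "uint8", 1: "int8", 2: "uint16", 3: "int16", 4: "uint32",
              5: "int32", 6: "float32", 7: "bool", 8: "string", 9: "array",
              10: "uint64", 11: "int64", 12: "float64"}
STRING, ARRAY = 8, 9


class Truncated(Exception):
    """Raised when a read would run past the end of the buffer."""


class Cursor:
    def __init__(self, data):
        self.data, self.pos = data, 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise Truncated
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def scalar(self, fmt):
        return struct.unpack(fmt, self.take(struct.calcsize(fmt)))[0]

    def string(self):
        # GGUF strings are a uint64 byte length followed by UTF-8 bytes.
        return self.take(self.scalar("<Q")).decode("utf-8", errors="replace")


def read_value(cur, vtype):
    if vtype in SCALARS:
        return cur.scalar(SCALARS[vtype])
    if vtype == STRING:
        return cur.string()
    if vtype == ARRAY:
        elem_type, count = cur.scalar("<I"), cur.scalar("<Q")
        if elem_type in SCALARS:
            # Fixed-size elements: skip the whole array in one step.
            cur.take(struct.calcsize(SCALARS[elem_type]) * count)
        else:
            for _ in range(count):  # strings or nested arrays: walk past each one
                read_value(cur, elem_type)
        return f"[{TYPE_NAMES.get(elem_type, elem_type)} x {count}]"
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_metadata(buf):
    """Return (metadata dict, truncated flag) for a buffer holding the file start."""
    cur = Cursor(buf)
    if cur.take(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = cur.scalar("<I")
    tensor_count, kv_count = cur.scalar("<Q"), cur.scalar("<Q")
    metadata, truncated = {}, False
    try:
        for _ in range(kv_count):
            key = cur.string()
            metadata[key] = read_value(cur, cur.scalar("<I"))
    except Truncated:
        truncated = True  # keep whatever parsed before the buffer ran out
    return metadata, truncated
```

Returning the partial dictionary together with a flag, rather than raising, means the fields stored ahead of a very large tokenizer array remain available even when the read window is too small.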

Reading it answers practical questions before you commit to a download or a run: is this really a 32K-context model, what architecture is it, which quant is this file, and does it carry a chat template. Those are exactly the fields people screenshot when a model misbehaves, and they live in the first few megabytes of the file.

Common use cases

  • Verify a download — confirm the quant, architecture and context length match what the model card claims.
  • Debug loading errors — check the architecture string and GGUF version a runtime expects.
  • Find the chat template — see whether the file embeds tokenizer.chat_template.
  • Audit a folder of models — quickly read what each .gguf actually is.
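The checks in the list above can be sketched as a small helper that pulls the headline fields out of an already-parsed metadata dictionary and compares them with what a model card claims. This is an illustrative sketch: summarise and mismatches are hypothetical names, and FILE_TYPES is a deliberately partial mapping of general.file_type values taken from llama.cpp's file-type enum, so verify it against the version you use.

```python
# Per-architecture keys are prefixed with the architecture name,
# for example "llama.context_length".
SUMMARY_SUFFIXES = [
    "context_length",
    "embedding_length",
    "block_count",
    "attention.head_count",
    "attention.head_count_kv",
]

# Partial mapping (assumption: values follow llama.cpp's file-type enum).
FILE_TYPES = {0: "F32", 1: "F16", 2: "Q4_0", 7: "Q8_0",
              15: "Q4_K_M", 17: "Q5_K_M", 18: "Q6_K"}


def summarise(metadata):
    """Pull the most useful fields out of a parsed metadata dict."""
    arch = metadata.get("general.architecture")
    summary = {"architecture": arch}
    for suffix in SUMMARY_SUFFIXES:
        summary[suffix] = metadata.get(f"{arch}.{suffix}")
    file_type = metadata.get("general.file_type")
    # Fall back to the raw number when the value is not in the partial table.
    summary["quantization"] = FILE_TYPES.get(file_type, file_type)
    summary["tokenizer_model"] = metadata.get("tokenizer.ggml.model")
    summary["has_chat_template"] = "tokenizer.chat_template" in metadata
    return summary


def mismatches(summary, expected):
    """Return {field: (found, expected)} for every field that differs."""
    return {key: (summary.get(key), want)
            for key, want in expected.items()
            if summary.get(key) != want}
```

Calling mismatches(summarise(metadata), {"context_length": 32768}) returns an empty dictionary when the file agrees with the claim, which makes it easy to loop over a folder of models and report only the files that differ.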

Frequently asked questions

Is my model file uploaded?

No. The file is read locally in your browser using the File API; only the header region is parsed and nothing is sent to any server.

Why does it only read part of the file?

All metadata sits at the start of a GGUF, before the tensor weights. Reading the first ~48 MB captures it without loading tens of gigabytes of weights, so the read takes about the same time on a model of any size.

It says metadata is truncated — why?

A few models carry an unusually large tokenizer vocabulary that pushes the metadata past the read window. The header and most fields still parse, so there is no need to re-run the tool to see the common fields.

Which GGUF versions are supported?

GGUF v2 and v3, which together cover essentially every model produced by current llama.cpp and conversion tools.