GGUF Metadata Reader
Inspect what is actually inside a GGUF model file without loading it into llama.cpp. Drop a .gguf and this reads the file header and metadata key-values — architecture, context length, embedding size, block (layer) count, attention heads, quantization type and tokenizer — and shows them in a table. Only the first part of the file is read; the multi-gigabyte tensor data is never touched, and nothing is uploaded (parsing runs in your browser).
How to use the GGUF Metadata Reader
Click the box and pick a .gguf file (or drag one in). The tool reads just the start of the file — the magic number, GGUF version, tensor and metadata counts, and every metadata key-value — and lists them, with the most useful fields (architecture, context length, embedding length, block count, head counts, quantization, tokenizer model) pulled to the top.
Large arrays such as the token vocabulary are summarised by type and length rather than printed in full. Because only the header region is read (not the weights), a 40 GB model takes about as long to inspect as a 2 GB one, and the file never leaves your machine.
What GGUF metadata contains
GGUF is the single-file format llama.cpp and Ollama use. After a short header (the magic bytes GGUF, a version number, and counts of tensors and metadata entries) comes a block of typed key-value metadata, then the tensor data. The metadata is self-describing: it records the model's architecture (general.architecture), its trained context length (<architecture>.context_length, for example llama.context_length), embedding size, number of layers and attention heads, the RoPE settings, the quantization (general.file_type), and the full tokenizer: vocabulary, merges and special token IDs.
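The layout described above can be sketched as a small Python reader. This is an illustration, not the tool's own code: it assumes a little-endian GGUF file of version 2 or later (where counts and string lengths are 64-bit), and the function names read_gguf_metadata, _read, _read_string and _read_value are made up for this example. Arrays are consumed but reported only as a type-and-length summary.

```python
import struct

GGUF_MAGIC = b"GGUF"

# struct codes for the fixed-size GGUF value types (little-endian assumed).
SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
TYPE_STRING, TYPE_ARRAY = 8, 9
TYPE_NAMES = {
    0: "uint8", 1: "int8", 2: "uint16", 3: "int16", 4: "uint32",
    5: "int32", 6: "float32", 7: "bool", 8: "string", 9: "array",
    10: "uint64", 11: "int64", 12: "float64",
}


def _read(f, fmt):
    """Read and unpack one fixed-size value from the stream."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("file ended inside the metadata block")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise EOFError("file ended inside a string")
    return data.decode("utf-8", errors="replace")


def _read_value(f, value_type):
    if value_type in SCALAR_FORMATS:
        return _read(f, SCALAR_FORMATS[value_type])
    if value_type == TYPE_STRING:
        return _read_string(f)
    if value_type == TYPE_ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        # Elements must be consumed to reach the next key, but only a
        # summary is kept so a large vocabulary does not flood the output.
        for _ in range(count):
            _read_value(f, elem_type)
        return f"array<{TYPE_NAMES.get(elem_type, elem_type)}>[{count}]"
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_metadata(f):
    """Parse the header and metadata block from a binary file object."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch assumes GGUF version 2 or later")
    tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        metadata[key] = _read_value(f, value_type)
    return {"version": version, "tensor_count": tensor_count,
            "metadata": metadata}
```

Calling read_gguf_metadata on a file opened with open(path, "rb") stops as soon as the last key-value pair has been parsed, so the tensor data that follows is never read.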
Reading it answers practical questions before you commit to a download or a run: is this really a 32K-context model, what architecture is it, which quant is this file, and does it carry a chat template. Those are exactly the fields people screenshot when a model misbehaves, and they live in the first few megabytes of the file.
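Once the key-values are in a dictionary, those practical questions reduce to a handful of lookups. The sketch below assumes the metadata has already been parsed into a plain dict; the function name summarize is hypothetical, and FILE_TYPE_NAMES is a deliberately partial mapping of general.file_type values to llama.cpp quantization names, so unknown values fall through as the raw number.

```python
# Partial mapping of general.file_type values to quant names (illustrative,
# not exhaustive); values outside it are returned unchanged.
FILE_TYPE_NAMES = {0: "F32", 1: "F16", 2: "Q4_0", 7: "Q8_0", 15: "Q4_K_M"}


def summarize(metadata):
    """Pull the most commonly checked fields out of a GGUF metadata dict."""
    arch = metadata.get("general.architecture")

    def arch_key(suffix):
        # Model hyperparameters are stored under the architecture name,
        # e.g. "llama.context_length" when the architecture is "llama".
        return metadata.get(f"{arch}.{suffix}") if arch else None

    file_type = metadata.get("general.file_type")
    return {
        "architecture": arch,
        "context_length": arch_key("context_length"),
        "embedding_length": arch_key("embedding_length"),
        "block_count": arch_key("block_count"),
        "head_count": arch_key("attention.head_count"),
        "head_count_kv": arch_key("attention.head_count_kv"),
        "quantization": FILE_TYPE_NAMES.get(file_type, file_type),
        "tokenizer_model": metadata.get("tokenizer.ggml.model"),
        "has_chat_template": "tokenizer.chat_template" in metadata,
    }
```

Fields that a file does not carry come back as None, which makes a missing context length or chat template easy to spot.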
Common use cases
- Verify a download — confirm the quant, architecture and context length match what the model card claims.
- Debug loading errors — compare the file's architecture string and GGUF version with what your runtime supports.
- Find the chat template — see whether the file embeds tokenizer.chat_template.
- Audit a folder of models — quickly read what each .gguf actually is.
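For the folder-audit case, a quick first pass can be scripted by reading only the fixed 24-byte header of each file. This is a minimal sketch under the same assumptions as the format description (little-endian, GGUF version 2 or later); scan_folder is a hypothetical helper name, and files that are too short or lack the GGUF magic are reported with None fields.

```python
import struct
from pathlib import Path


def scan_folder(folder):
    """Return (name, version, tensor_count, kv_count) for each .gguf file."""
    rows = []
    for path in sorted(Path(folder).glob("*.gguf")):
        with path.open("rb") as f:
            head = f.read(24)  # magic (4) + version (4) + two uint64 counts
        if len(head) < 24 or head[:4] != b"GGUF":
            rows.append((path.name, None, None, None))
            continue
        version, tensors, kvs = struct.unpack("<IQQ", head[4:24])
        rows.append((path.name, version, tensors, kvs))
    return rows
```

A row of None values flags a truncated download or a file that only carries the .gguf extension.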