LLM Download Size Calculator
Work out how big a model is to download before you start pulling tens of gigabytes. Enter the parameter count, pick a precision or GGUF quant (F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K…), and get the file size in GB and GiB, computed from each quant's measured bits-per-weight. Pair it with the VRAM calculator to check that the model also fits your GPU.
How to use the LLM Download Size Calculator
Enter the model's parameter count in billions (e.g. 8 for an 8B model, 70 for 70B) and choose the precision or GGUF quant you intend to download. The size in bytes is computed as parameters × bits-per-weight ÷ 8, then shown in GB (decimal, 10^9 bytes, the unit ISPs and most download pages use) and GiB (binary, 2^30 bytes, the unit Windows and many Linux tools use for disk space, often labelled "GB").
The bits-per-weight figures are measured averages for llama.cpp GGUF quants rather than nominal round numbers, so a Q4_K_M of an 8B model lands near the 4.9GB you actually see on Hugging Face, not a naive 4GB.
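The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function name `download_size` is hypothetical, and the 4.83 bits-per-weight value for Q4_K_M is the figure quoted on this page.

```python
GB = 10**9    # decimal gigabyte, in bytes
GIB = 2**30   # binary gibibyte, in bytes


def download_size(params_billions: float, bits_per_weight: float) -> tuple[float, float]:
    """Estimate file size as (GB, GiB) from parameters x bits-per-weight / 8."""
    size_bytes = params_billions * 1e9 * bits_per_weight / 8
    return size_bytes / GB, size_bytes / GIB


# Example: exactly 8 billion parameters at Q4_K_M (~4.83 bits per weight).
gb, gib = download_size(8, 4.83)
print(f"{gb:.2f} GB / {gib:.2f} GiB")  # prints "4.83 GB / 4.50 GiB"
```

The same file is about 7% smaller when expressed in GiB than in GB, which is why the two figures never match.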
How download size relates to quantization
A model's file size is set almost entirely by how many bits each weight uses. Unquantized half precision (F16) is 16 bits per weight, so an 8B model is about 16GB. Quantization shrinks that: Q8_0 is ~8.5 bits, Q4_K_M ~4.83 bits, Q2_K ~3.35 bits. Lower bits mean a smaller download and less VRAM, at some cost to quality. Q4_K_M is the usual sweet spot, Q8_0 is near-lossless, and Q2_K is for when you are desperate to fit a big model on small hardware.
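The 'K' quants (Q4_K_M, Q5_K_M, Q6_K) are k-quants. The mixed variants keep some of the more sensitive tensors at a higher precision, and every quantized block also stores its own scale data, which is why the bits-per-weight figures are not whole numbers (the same block overhead is why Q8_0 comes to ~8.5 bits rather than 8). Multiplying bits-per-weight by the parameter count and dividing by 8 gives the download size in bytes; the tokenizer and metadata add a little on top, which is negligible at these scales.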
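To see how the choice of quant changes the download for one model, the figures quoted above can be put in a small lookup table. This is a sketch under stated assumptions: the table holds only the four bits-per-weight values mentioned on this page, and the names `BITS_PER_WEIGHT`, `size_gb`, and `compare_quants` are hypothetical.

```python
# Approximate bits per weight, taken from the figures quoted in the text.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.83,
    "Q2_K": 3.35,
}


def size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Size in decimal GB: (params * 1e9 * bits / 8) bytes divided by 1e9."""
    return params_billions * bits_per_weight / 8


def compare_quants(params_billions: float) -> dict[str, float]:
    """Map each quant name to its estimated download size in GB."""
    return {q: size_gb(params_billions, bpw) for q, bpw in BITS_PER_WEIGHT.items()}


# Example: an 8B model across the four quants.
for quant, gb in compare_quants(8).items():
    print(f"{quant:<7} {gb:6.2f} GB")
```

For an 8B model this gives 16GB at F16 and 8.5GB at Q8_0, with Q4_K_M and Q2_K below 5GB; tokenizer and metadata overhead is ignored.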
Common use cases
- Before you pull a model — check whether the Q4 or Q5 fits your disk and your patience.
- Picking a quant — compare F16 vs Q8 vs Q4 sizes for the same model at a glance.
- Capacity planning — size a disk or a cache for a set of models.
- Bandwidth budgeting — estimate the download on a metered or slow connection.
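The capacity-planning and bandwidth cases reduce to summing sizes and dividing by a link speed. The sketch below assumes a hypothetical set of two models at Q4_K_M (4.83 bits per weight, as quoted above) and a nominal 100 Mbit/s connection; the helper names are illustrative, and real transfers will be slower than the nominal link rate.

```python
def size_bytes(params_billions: float, bits_per_weight: float) -> float:
    """Estimated file size in bytes: parameters x bits-per-weight / 8."""
    return params_billions * 1e9 * bits_per_weight / 8


def download_seconds(num_bytes: float, link_mbps: float) -> float:
    """Transfer time at a nominal link speed given in megabits per second."""
    return num_bytes * 8 / (link_mbps * 1e6)


# Hypothetical set of models: (parameters in billions, bits per weight).
models = [(8, 4.83), (70, 4.83)]

total = sum(size_bytes(p, bpw) for p, bpw in models)
print(f"disk needed: {total / 1e9:.1f} GB")                             # 47.1 GB
print(f"at 100 Mbit/s: {download_seconds(total, 100) / 60:.0f} min")    # 63 min
```

Leaving some headroom on top of the total is sensible, since partially downloaded files and the small metadata overhead are not counted here.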