pgvector Index Sizing Calculator

Calculate how much disk and RAM your pgvector deployment will need before you provision hardware. Set the number of vectors, dimensions, element type (float32 / halfvec / int8), and index type (flat / IVFFlat / HNSW) to get human-readable size estimates and a recommended RAM target for keeping the index hot.

How to use the pgvector Index Sizing Calculator

Set the vector count, dimensionality, element type, and index type, then click Calculate:

  • N — total number of vectors you plan to store. 1 million is a typical starting point; RAG pipelines often scale to 10-100M.
  • D — dimensionality of each vector. Common values: 384 (MiniLM), 768 (BERT), 1536 (text-embedding-3-small), 3072 (text-embedding-3-large).
  • Element type — float32 is the default and gives full precision. halfvec (pgvector 0.7+) halves storage at minor recall cost. int8 (1 byte per element) is for scalar-quantised vectors, and bit vectors (1 bit per dimension) are for binary quantisation.
  • Index type — flat uses a sequential scan (no index overhead, good for <100K vectors); IVFFlat clusters vectors into lists and probes a subset at query time; HNSW builds a navigable small-world graph and typically gives the best recall/speed trade-off.
  • HNSW m — number of graph links per node. Higher m → better recall, larger index. Default 16; 32-64 for high-recall workloads; 8 for memory-constrained deployments.

The calculator outputs raw vector storage, table overhead, index size, total disk, and recommended RAM, computed as (table + index) × 1.2 so the working set fits in memory with some headroom.
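The raw vector storage figure follows directly from N, D, and the element size. A minimal Python sketch, assuming 4, 2, and 1 bytes per element for float32, halfvec, and int8. The function names are illustrative and not part of the calculator, and table overhead is left out because its formula is not given here:

```python
# Assumed bytes per element for each element type named in the text.
BYTES_PER_ELEMENT = {"float32": 4, "halfvec": 2, "int8": 1}


def raw_vector_bytes(n: int, d: int, element_type: str = "float32") -> int:
    """Raw vector payload: N vectors x D elements x bytes per element."""
    return n * d * BYTES_PER_ELEMENT[element_type]


def human_readable(num_bytes: float) -> str:
    """Format a byte count in decimal units (1 MB = 1,000,000 bytes)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1000 or unit == "TB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000


print(human_readable(raw_vector_bytes(1_000_000, 1536)))             # 6.1 GB
print(human_readable(raw_vector_bytes(1_000_000, 1536, "halfvec")))  # 3.1 GB
```

Decimal units are used here so that 100K vectors at 384 dimensions come out at about 154 MB; a tool that reports binary units (MiB, GiB) would show slightly smaller numbers for the same byte count.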

pgvector index types: flat, IVFFlat, HNSW

pgvector is a PostgreSQL extension that stores and indexes floating-point vectors alongside relational data. It supports three query strategies, each with different storage and recall characteristics.

The flat (sequential scan) strategy computes exact nearest-neighbour distances by scanning every vector: 100% recall, zero index overhead, but O(N) per query. At a million vectors with 1536 dimensions this is typically too slow for interactive use.

The IVFFlat (Inverted File with Flat lists) index partitions vectors into k lists using k-means at build time, then probes only a subset of lists at query time. The index adds roughly 5% overhead to the raw vector storage. Build time is fast (minutes), but recall drops noticeably if the number of probes is too low.

The HNSW (Hierarchical Navigable Small World) index builds a layered proximity graph. It delivers the best recall/QPS trade-off in most benchmarks, at the cost of significantly more RAM: the graph links add approximately N × m × 8.5 bytes on top of the raw vectors. Build time is also slower than IVFFlat.
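The three overhead rules above can be written down as one small function. This is a sketch of the approximations stated in the text (0 for flat, about 5% of raw storage for IVFFlat, about N × m × 8.5 bytes for HNSW), not a measurement of real index sizes; the function name is illustrative:

```python
def index_overhead_bytes(n: int, raw_bytes: int, index_type: str, m: int = 16) -> int:
    """Estimated index overhead in bytes, using the approximations in the text."""
    if index_type == "flat":
        return 0                      # sequential scan: no index structure
    if index_type == "ivfflat":
        return raw_bytes * 5 // 100   # roughly 5% of raw vector storage
    if index_type == "hnsw":
        return int(n * m * 8.5)       # graph links: about N x m x 8.5 bytes
    raise ValueError(f"unknown index type: {index_type}")


n, d = 1_000_000, 1536
raw = n * d * 4  # float32, 4 bytes per element
for kind in ("flat", "ivfflat", "hnsw"):
    print(kind, index_overhead_bytes(n, raw, kind))
# flat 0
# ivfflat 307200000
# hnsw 136000000
```

Actual on-disk index sizes depend on the pgvector version, page fill, and build parameters, so it is worth checking an estimate like this against a real index built on a sample of the data.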

The recommended RAM figure, (table + index) × 1.2, is sized so that the table and index fit in memory (PostgreSQL shared buffers plus the OS page cache) with 20% headroom. If the index exceeds available RAM, queries will hit disk and latency will spike dramatically, usually by 10-100x. For HNSW on large datasets, provisioning enough RAM is the single most important deployment decision. If RAM is constrained, use halfvec (2 bytes/element) to cut vector storage in half, or reduce m from 16 to 8, which roughly halves the graph link overhead.
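To see how much each of those two levers buys, the RAM rule can be applied to a few scenarios. The sketch below reads the rule as (table + index) × 1.2 and, as a simplifying assumption, uses raw vector storage as a stand-in for table size, since the table overhead formula is not spelled out here. All names are illustrative:

```python
def recommended_ram_bytes(table_bytes: int, index_bytes: int, headroom_pct: int = 120) -> int:
    """(table + index) x 1.2, done in integer arithmetic to avoid float rounding."""
    return (table_bytes + index_bytes) * headroom_pct // 100


def hnsw_scenario(n: int, d: int, bytes_per_element: int, m: int) -> int:
    table = n * d * bytes_per_element  # assumption: raw vectors only, no row overhead
    links = int(n * m * 8.5)           # HNSW link estimate from the text
    return recommended_ram_bytes(table, links)


baseline = hnsw_scenario(1_000_000, 1536, 4, 16)  # float32, m=16
halfvec = hnsw_scenario(1_000_000, 1536, 2, 16)   # halfvec, m=16
low_m = hnsw_scenario(1_000_000, 1536, 4, 8)      # float32, m=8

print(baseline, halfvec, low_m)
# 7536000000 3849600000 7454400000
```

Under these assumptions, at 1536 dimensions switching to halfvec cuts the estimate by almost half, while lowering m from 16 to 8 saves only about 1%, because the links are small next to the vectors themselves. The balance shifts toward m at low dimensionality.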

Common use cases

  • RAG pipeline sizing — estimate Postgres RAM before loading your document corpus for a retrieval-augmented generation system.
  • Instance right-sizing — compare float32 vs halfvec storage costs to decide whether to upgrade to a larger AWS RDS or Supabase plan.
  • HNSW m tuning — see the RAM impact of changing m from 16 to 32 or 64 before modifying a production index.
  • Flat vs HNSW decision — for small datasets (<200K vectors) the flat scan may fit comfortably in RAM and adds zero maintenance overhead.
  • Capacity planning — project disk usage for 6 months of vector ingestion at a known rate (e.g. 50K new documents/day).
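For the capacity-planning case, a projection is just the current count plus the ingestion rate times the horizon. A rough sketch, assuming one vector per document (chunked documents would produce several), a 180-day horizon for "6 months", and 1536-dimensional float32 vectors; the function name is hypothetical:

```python
def projected_vectors(current: int, new_per_day: int, days: int) -> int:
    """Vector count after `days` of steady ingestion."""
    return current + new_per_day * days


# Assumptions: empty table today, 50K documents/day, one vector per document.
n = projected_vectors(current=0, new_per_day=50_000, days=180)
raw_bytes = n * 1536 * 4  # 1536 dimensions, float32 (4 bytes per element)

print(n)          # 9000000
print(raw_bytes)  # 55296000000 (about 55.3 GB of raw vectors)
```

The projected N can then be fed into the calculator in place of the current vector count.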

Frequently asked questions

What is the HNSW m parameter and how do I choose it?

The m parameter controls how many graph links each node has. Higher m gives better recall at query time (more paths to explore) but increases both index build time and RAM usage. The default of 16 works well for most workloads. Use m=8 to roughly halve the graph link overhead when memory is tight (the vectors themselves are unaffected); use m=32-64 if you need >99% recall at high QPS.
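How much m matters for memory depends on how large the links are relative to each vector. The sketch below compares the per-node link estimate used elsewhere on this page (about m × 8.5 bytes) with the raw size of one vector; the function name is illustrative and the result is only as good as that approximation:

```python
def link_overhead_ratio(d: int, bytes_per_element: int, m: int) -> float:
    """Estimated HNSW link bytes per node divided by raw bytes per vector."""
    return (m * 8.5) / (d * bytes_per_element)


for m in (8, 16, 32, 64):
    small = link_overhead_ratio(384, 4, m)   # 384-dim float32
    large = link_overhead_ratio(1536, 4, m)  # 1536-dim float32
    print(f"m={m}: {small:.1%} of raw at D=384, {large:.1%} at D=1536")
```

At 384 dimensions the links grow from roughly 4% of raw vector size at m=8 to roughly 35% at m=64, whereas at 1536 dimensions they stay under 10% even at m=64. Tuning m therefore has the most memory impact on low-dimensional or halfvec data.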

Does pgvector support binary vectors?

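Yes, pgvector 0.7+ supports the bit type for binary vectors (1 bit per dimension, packed 8 to a byte). The effective size is 1/8 byte per element, so actual storage is approximately D/8 bytes per vector. Selecting "1 byte" in this calculator gives an upper bound that overestimates vector storage by about a factor of 8; divide the raw vector figure by 8 for a closer estimate.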
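The D/8 rule is easy to compute directly. A small sketch that rounds up to whole bytes per vector and ignores per-value headers (an assumption made for simplicity); the function name is illustrative:

```python
def bit_vector_bytes(n: int, d: int) -> int:
    """Approximate payload for N binary vectors of D bits, rounded up to whole bytes."""
    return n * ((d + 7) // 8)


n, d = 1_000_000, 1536
print(bit_vector_bytes(n, d))  # 192000000
print(n * d * 4)               # 6144000000 for the same vectors as float32
```

For this example the binary representation is 32 times smaller than float32, which is the ratio of 32 bits per element to 1 bit per element.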

How much RAM does the IVFFlat index use?

IVFFlat stores an inverted-file structure: the cluster centroids (k × D × 4 bytes) plus the vector IDs in each list. This calculator approximates it as 5% overhead over raw vector storage, which is accurate for typical k values (sqrt(N)). The centroid table itself is tiny.
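The size of the centroid table can be checked with the k × D × 4 formula. A minimal sketch, assuming k = sqrt(N) rounded down and float32 centroids; the function name is illustrative:

```python
import math


def ivfflat_centroid_bytes(n: int, d: int) -> tuple[int, int]:
    """Return (k, centroid table bytes) for k = floor(sqrt(N)) float32 centroids."""
    k = math.isqrt(n)
    return k, k * d * 4


k, centroid_bytes = ivfflat_centroid_bytes(1_000_000, 1536)
raw_bytes = 1_000_000 * 1536 * 4

print(k, centroid_bytes)           # 1000 6144000
print(centroid_bytes / raw_bytes)  # 0.001
```

For a million 1536-dimensional vectors the centroids come to about 6 MB, or 0.1% of raw vector storage, so nearly all of the 5% allowance covers the list structure rather than the centroids.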

Can pgvector run on a small Postgres instance (e.g., 4 GB RAM)?

Yes, for small corpora. 100K vectors at 384 dimensions with float32 is about 154 MB raw; the HNSW index adds ~14 MB. That comfortably fits in 4 GB RAM. Above ~500K vectors at 1536 dimensions, you typically need 16+ GB RAM to keep the HNSW index warm.

Should I use pgvector or a dedicated vector database?

pgvector is the right choice when your vectors live alongside relational data (user records, document metadata) and you want a single database. Dedicated databases (Pinecone, Qdrant, Weaviate) offer higher QPS at billion-scale and managed indexing, but add operational complexity. For most RAG deployments under 10M vectors, pgvector on a properly sized Postgres instance performs well.