CSV Schema Inference Tool

Point it at a CSV file and get an inferred schema. For each column it detects the type (integer, number, boolean, date, datetime, UUID, email or string), along with whether it's nullable, how many distinct values it holds, its maximum length, and whether it looks like a candidate unique key. You can then export the result as a PostgreSQL CREATE TABLE, a TypeScript interface, or a JSON Schema. It parses real CSV (quoted fields, embedded commas) and runs entirely in your browser.

How to use the CSV Schema Inference Tool

Paste CSV with a header row, or open a file. The first row is treated as column names and every following row as data. The delimiter is auto-detected from the first line by default, but you can force comma, tab, semicolon or pipe if detection guesses wrong (which can happen when a free-text column is full of commas). The parser is RFC 4180-aware, so quoted fields containing commas, quotes or newlines are handled correctly rather than split mid-value.
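One way to picture delimiter auto-detection is a count of each candidate character on the first line, skipping anything inside double quotes. The sketch below is an assumption about how such a detector could work, not the tool's actual code; detect_delimiter and CANDIDATES are hypothetical names.

```python
# Candidate delimiters in priority order: comma, tab, semicolon, pipe.
CANDIDATES = [",", "\t", ";", "|"]


def detect_delimiter(first_line: str) -> str:
    """Return the candidate that occurs most often outside double quotes."""
    counts = {d: 0 for d in CANDIDATES}
    in_quotes = False
    for ch in first_line:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state is preserved.
            in_quotes = not in_quotes
        elif not in_quotes and ch in counts:
            counts[ch] += 1
    # max() keeps the first maximum, so ties (including no hits) fall back to comma.
    return max(CANDIDATES, key=lambda d: counts[d])
```

With this approach a header such as id;name;"notes, free text" is read as semicolon-delimited because the quoted comma is ignored. Unquoted commas in a free-text column would still skew the count, which is the case where forcing the delimiter helps.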

For each column the tool reports an inferred type, decided by testing the non-empty values against an ordered list of patterns: boolean (true/false/yes/no), integer, decimal number, UUID, datetime, date and email, with string as the fallback. A column is nullable if any row is empty in that position, the distinct count is the number of unique values, and a column whose non-empty values are all distinct (and there is more than one row) is starred as a candidate unique key. Switch the Output selector from "report only" to generate a starting CREATE TABLE (PostgreSQL types, with NOT NULL where appropriate and a sized VARCHAR for short strings), a TypeScript interface (optional members for nullable columns), or a draft JSON Schema with formats for dates, emails and UUIDs. Inference is a starting point: widen a type or add constraints if your sample doesn't represent the full data.
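The per-column statistics can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool's implementation: profile_column is a hypothetical name, a cell counts as empty only when it is exactly the empty string, and distinct values are counted over non-empty cells.

```python
def profile_column(values: list[str]) -> dict:
    """Summarise one column: nullability, distinct count, max length, key candidacy."""
    non_empty = [v for v in values if v != ""]  # assumption: "" is the only empty form
    distinct = set(non_empty)
    return {
        # Nullable as soon as one cell is blank.
        "nullable": len(non_empty) < len(values),
        "distinct": len(distinct),
        "max_length": max((len(v) for v in non_empty), default=0),
        # Candidate key: more than one value and no repeats among them.
        "candidate_key": len(non_empty) > 1 and len(distinct) == len(non_empty),
    }
```

For example, profile_column(["a", "", "a"]) reports a nullable column with one distinct value that is not a key candidate.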

How column types are inferred from CSV

CSV carries no type information — every value is just text — so loading a file into a typed system always begins with the same question: what is each column, really? Is credits an integer or could it hold decimals? Is active a boolean? Is id unique enough to be a primary key? Answering by eye on a large file is slow and unreliable, and getting it wrong means a failed import or a column typed too narrowly to hold the next batch of data.

Schema inference automates that judgement by scanning the actual values. The approach is to test each column's non-empty cells against a sequence of patterns from most specific to least: if every value matches the boolean set it's a boolean; otherwise if all are integers it's an integer; then decimal, UUID, datetime, date, and email are tried in turn; and anything that fails them all is a string. Emptiness drives nullability — a column with any blank cell is marked nullable — and counting distinct values reveals likely keys: a column whose every value is unique is a candidate identifier. These are the same heuristics that database import wizards and data-profiling tools apply, made transparent so you can see why each decision was made.
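The ordered test can be sketched as a list of named patterns tried in sequence, where a type wins only if every non-empty value matches it. The regular expressions here are assumptions chosen for illustration (ISO 8601 style dates, a loose email shape); the tool's own patterns may differ, and infer_type is a hypothetical name.

```python
import re

# Ordered tests, mirroring the sequence described above.
# The exact patterns are assumptions, not the tool's own.
PATTERNS = [
    ("boolean", re.compile(r"true|false|yes|no", re.IGNORECASE)),
    ("integer", re.compile(r"[+-]?\d+")),
    ("number", re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)")),
    ("uuid", re.compile(
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
        re.IGNORECASE)),
    ("datetime", re.compile(
        r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("email", re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
]


def infer_type(values: list[str]) -> str:
    """Return the first type whose pattern matches every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # assumption: an all-blank column falls back to string
    for name, pattern in PATTERNS:
        if all(pattern.fullmatch(v) for v in non_empty):
            return name
    return "string"
```

Because the integer pattern is tried before the decimal one, a column of "1" and "2" is an integer, while a single "2.5" in the same column makes it a number. One value that fits no pattern turns the whole column into a string.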

The inferred profile then maps cleanly onto whatever target you need. A PostgreSQL table wants INTEGER, DOUBLE PRECISION, BOOLEAN, DATE, TIMESTAMP, UUID and sized VARCHAR/TEXT columns, with NOT NULL where no blanks appeared; a TypeScript interface wants number, boolean and string members, optional where the column is nullable; a JSON Schema wants typed properties with format hints for dates, emails and UUIDs and a required list. The crucial caveat is that inference describes the sample, not the universe: a column that happens to hold only integers in your file might legitimately contain decimals elsewhere, and a column that looks unique in 100 rows may collide in a million. Treat the generated schema as a fast first draft that is accurate for the sample, then widen types and relax or tighten constraints based on what you know about the real data.
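As an illustration of that mapping, the sketch below turns a small inferred profile into a TypeScript interface and a draft JSON Schema. The column list, function names and lookup tables are hypothetical, and column names are assumed to be valid TypeScript identifiers.

```python
# Hypothetical inferred profile: (column name, inferred type, nullable)
COLUMNS = [
    ("id", "uuid", False),
    ("email", "email", False),
    ("credits", "integer", True),
    ("joined", "date", True),
]

# TypeScript only distinguishes number and boolean; everything else is a string.
TS_TYPES = {"integer": "number", "number": "number", "boolean": "boolean"}

JSON_TYPES = {
    "integer": {"type": "integer"},
    "number": {"type": "number"},
    "boolean": {"type": "boolean"},
    "date": {"type": "string", "format": "date"},
    "datetime": {"type": "string", "format": "date-time"},
    "email": {"type": "string", "format": "email"},
    "uuid": {"type": "string", "format": "uuid"},
    "string": {"type": "string"},
}


def to_typescript(name, columns):
    """Emit an interface with optional members for nullable columns."""
    lines = [f"interface {name} {{"]
    for col, kind, nullable in columns:
        optional = "?" if nullable else ""
        lines.append(f"  {col}{optional}: {TS_TYPES.get(kind, 'string')};")
    lines.append("}")
    return "\n".join(lines)


def to_json_schema(columns):
    """Emit typed properties plus a required list of non-nullable columns."""
    return {
        "type": "object",
        "properties": {col: dict(JSON_TYPES[kind]) for col, kind, _ in columns},
        "required": [col for col, _, nullable in columns if not nullable],
    }
```

Calling to_typescript("Row", COLUMNS) produces members such as credits?: number; for nullable columns, and to_json_schema(COLUMNS) lists only id and email as required.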

Common use cases

  • Bootstrap a table. Generate a PostgreSQL CREATE TABLE from a CSV export before importing it.
  • Type a data feed. Produce a TypeScript interface for rows parsed from a CSV file.
  • Profile unknown data. See types, nullability, distinct counts and candidate keys for a file you were just handed.
  • Draft a JSON Schema. Get a validation schema with date, email and UUID formats from sample data.

Frequently asked questions

Which types can it detect?

Per column: boolean (true/false/yes/no), integer, number (decimal), uuid, datetime, date, email, and string as the fallback. It tests the non-empty values from most specific to least specific, so a column is only called a narrower type when every value matches it.

How are nullability and uniqueness determined?

A column is nullable if at least one row has an empty value in that position. It is flagged as a candidate unique key (marked with a star) when every non-empty value is distinct and there is more than one row, which is the signal you would use to pick a primary key.

Does it parse quoted CSV correctly?

Yes. The parser follows RFC 4180: fields wrapped in double quotes may contain commas, line breaks and escaped quotes ("" for a literal quote), and are not split on the delimiter. This avoids the classic mistake of a naive split mangling a quoted free-text column.
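The difference is easy to see with a small sample. The sketch below uses Python's standard csv module as a stand-in for the tool's browser-side parser, so it illustrates the RFC 4180 behaviour rather than the tool's own code; the sample data is made up.

```python
import csv
import io

# Two data rows: one with an embedded comma, one with escaped quotes and a line break.
raw = 'id,comment\r\n1,"Hello, world"\r\n2,"She said ""hi""\nand left"\r\n'

# Naive approach: split the first data line on commas, ignoring quoting.
naive = raw.split("\r\n")[1].split(",")

# Quote-aware approach: newline="" leaves line endings untouched so the
# reader can keep the line break inside the quoted field.
rows = list(csv.reader(io.StringIO(raw, newline="")))
```

Here naive breaks the first comment into two pieces with stray quote characters, while rows keeps each record at two fields, with the doubled quotes unescaped and the line break preserved inside the second comment.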

What does the generated SQL look like?

A PostgreSQL CREATE TABLE with one column per field, using INTEGER, DOUBLE PRECISION, BOOLEAN, DATE, TIMESTAMP, UUID, or a sized VARCHAR/TEXT for strings, and NOT NULL on columns with no empty values. It is a starting point — review the types, add primary keys, indexes and length limits to match your real constraints.
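A rough sketch of how such a statement could be assembled from an inferred profile follows. The function names, the 255-character cut-off between VARCHAR and TEXT, and the choice to treat email columns as strings are all assumptions made for illustration; the tool's real thresholds are not documented here.

```python
PG_TYPES = {
    "integer": "INTEGER",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "uuid": "UUID",
}
VARCHAR_LIMIT = 255  # assumption: longer strings become TEXT


def quote_ident(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pg_type(kind, max_length):
    if kind in PG_TYPES:
        return PG_TYPES[kind]
    # Strings (and, by assumption, emails): sized VARCHAR when short, else TEXT.
    if 0 < max_length <= VARCHAR_LIMIT:
        return f"VARCHAR({max_length})"
    return "TEXT"


def create_table(table, columns):
    """columns: list of (name, inferred type, nullable, max length)."""
    defs = []
    for name, kind, nullable, max_length in columns:
        null_sql = "" if nullable else " NOT NULL"
        defs.append(f"  {quote_ident(name)} {pg_type(kind, max_length)}{null_sql}")
    return f"CREATE TABLE {quote_ident(table)} (\n" + ",\n".join(defs) + "\n);"
```

Given a non-nullable uuid column, a nullable string of at most 40 characters and a nullable string of 900 characters, this yields UUID NOT NULL, VARCHAR(40) and TEXT respectively. Primary keys and indexes are left for you to add.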

Will the inferred schema always be correct?

It describes your sample, not all possible data. A column that holds only integers in this file might carry decimals elsewhere, and values unique across 100 rows may collide across millions. Use the result as a fast first draft that is accurate for the sample, and widen types or adjust constraints based on what you know about the full dataset.