CSV Schema Inference Tool
Point it at a CSV file and get an inferred schema. For each column it detects the type (integer, number, boolean, date, datetime, UUID, email or string), whether the column is nullable, how many distinct values it holds, its maximum length, and whether it looks like a candidate unique key. You can then export the result as a PostgreSQL CREATE TABLE statement, a TypeScript interface, or a JSON Schema. It parses real-world CSV (quoted fields, embedded commas) and runs entirely in your browser.
How to use the CSV Schema Inference Tool
Paste CSV with a header row, or open a file. The first row is treated as column names and every following row as data. The delimiter is auto-detected from the first line by default, but you can force comma, tab, semicolon or pipe if detection guesses wrong (which can happen when a free-text column is full of commas). The parser is RFC 4180-aware, so quoted fields containing commas, quotes or newlines are handled correctly rather than split mid-value.
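To make the parsing step concrete, here is a minimal Python sketch of the same idea. It is not the tool's own code (which runs in the browser): the names detect_delimiter and parse_csv are hypothetical, and the "most frequent candidate on the first line" rule is an assumed heuristic. Quoted fields are left to Python's standard csv module.

```python
import csv
import io
from typing import List, Optional, Tuple

# Candidate delimiters, matching the four the tool lets you force.
CANDIDATES = [",", "\t", ";", "|"]


def detect_delimiter(first_line: str) -> str:
    """Assumed heuristic: pick the candidate that occurs most often in the first line."""
    counts = {d: first_line.count(d) for d in CANDIDATES}
    best = max(CANDIDATES, key=lambda d: counts[d])
    # Fall back to a comma when no candidate appears at all.
    return best if counts[best] > 0 else ","


def parse_csv(text: str, delimiter: Optional[str] = None) -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows). Passing delimiter=None means auto-detect."""
    if delimiter is None:
        first_line = text.splitlines()[0] if text else ""
        delimiter = detect_delimiter(first_line)
    # newline="" keeps line breaks inside quoted fields intact for csv.reader.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = list(reader)
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Usage: the quoted comma stays inside one field instead of splitting it.
header, rows = parse_csv('id,name\n1,"Smith, Jane"\n')
# header == ['id', 'name']; rows == [['1', 'Smith, Jane']]
```

Because detection here looks only at the first line, a header that contains more semicolons than commas would be misread, which is the situation the manual delimiter override is for.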
For each column the tool reports an inferred type, decided by testing the non-empty values against patterns in a fixed order: boolean (true/false/yes/no), integer, decimal number, UUID, datetime, date, then email, falling back to string when none fits. A column is marked nullable if any row has an empty cell in that position; the distinct count is the number of unique values; and a column with more than one value, all of them different, is starred as a candidate unique key. Switch the Output selector from "report only" to generate a starting CREATE TABLE (PostgreSQL types, with NOT NULL where appropriate and a sized VARCHAR for short strings), a TypeScript interface (optional members for nullable columns), or a draft JSON Schema with formats for dates, emails and UUIDs. Inference is a starting point: widen a type or add constraints if your sample doesn't represent the full data.
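The nullable, distinct, maximum-length and candidate-key rules can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: ColumnProfile and profile_columns are hypothetical names, only exactly empty cells count as blank, short rows are padded with blanks, and blanks are ignored when checking uniqueness.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnProfile:
    name: str
    nullable: bool       # at least one empty cell in this column
    distinct: int        # number of unique non-empty values
    max_length: int      # longest non-empty value, in characters
    candidate_key: bool  # more than one value and no repeats


def profile_columns(header: List[str], rows: List[List[str]]) -> List[ColumnProfile]:
    profiles = []
    for i, name in enumerate(header):
        # Assumption: a row that is too short has an empty cell in the missing position.
        cells = [row[i] if i < len(row) else "" for row in rows]
        values = [c for c in cells if c != ""]
        distinct = len(set(values))
        profiles.append(ColumnProfile(
            name=name,
            nullable=len(values) < len(cells),
            distinct=distinct,
            max_length=max((len(v) for v in values), default=0),
            # Assumption: blanks are ignored here; a stricter rule could also require no blanks.
            candidate_key=distinct > 1 and distinct == len(values),
        ))
    return profiles


# Usage: "id" has three different values, "tag" repeats "a".
for p in profile_columns(["id", "tag"], [["1", "a"], ["2", "a"], ["3", ""]]):
    print(p)
```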
How column types are inferred from CSV
CSV carries no type information — every value is just text — so loading a file into a typed system always begins with the same question: what is each column, really? Is credits an integer or could it hold decimals? Is active a boolean? Is id unique enough to be a primary key? Answering by eye on a large file is slow and unreliable, and getting it wrong means a failed import or a column typed too narrowly to hold the next batch of data.
Schema inference automates that judgement by scanning the actual values. The approach is to test each column's non-empty cells against a sequence of patterns from most specific to least: if every value matches the boolean set, the column is a boolean; otherwise, if all are integers, it is an integer; then decimal, UUID, datetime, date and email are tried in turn; and anything that fails them all is a string. Emptiness drives nullability (a column with any blank cell is marked nullable), and counting distinct values reveals likely keys: a column whose every value is unique is a candidate identifier. Database import wizards and data-profiling tools apply similar heuristics; here they are made transparent so you can see why each decision was made.
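The ordered-pattern test described above might look like the following Python sketch. The order of the checks follows the text; the regular expressions themselves are assumptions (ISO 8601-style dates and a deliberately loose email pattern), since the tool's exact patterns are not given here, and infer_type is a hypothetical name.

```python
import re
from typing import Callable, List, Tuple

BOOLEANS = {"true", "false", "yes", "no"}


def _full(pattern: str) -> Callable[[str], bool]:
    """Build a predicate that is true only when the whole value matches the pattern."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


# Checks run in this order; the first one that every value passes wins.
CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("boolean", lambda v: v.lower() in BOOLEANS),
    ("integer", _full(r"[+-]?\d+")),
    ("number", _full(r"[+-]?(\d+(\.\d*)?|\.\d+)")),
    ("uuid", _full(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")),
    # Assumed formats: YYYY-MM-DD, optionally followed by a time and a zone offset.
    ("datetime", _full(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?")),
    ("date", _full(r"\d{4}-\d{2}-\d{2}")),
    ("email", _full(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
]


def infer_type(values: List[str]) -> str:
    """Return the first type that every non-empty value matches, else 'string'."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # nothing to test, so fall back to the widest type
    for type_name, matches in CHECKS:
        if all(matches(v) for v in non_empty):
            return type_name
    return "string"


# Usage: one decimal is enough to widen an otherwise integer column.
print(infer_type(["1", "2", ""]))   # integer
print(infer_type(["1", "2.5"]))     # number
```

A single non-conforming value demotes the whole column to string, which is why a sample that happens to be clean can produce a type that is too narrow for the full data.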
The inferred profile then maps cleanly onto whatever target you need. A PostgreSQL table wants INTEGER, DOUBLE PRECISION, BOOLEAN, DATE, TIMESTAMP, UUID and sized VARCHAR/TEXT columns, with NOT NULL where no blanks appeared; a TypeScript interface wants number, boolean and string members, optional where the column is nullable; a JSON Schema wants typed properties with format hints for dates, emails and UUIDs, plus a required list naming the columns that had no blanks. The crucial caveat is that inference describes the sample, not the universe: a column that happens to hold only integers in your file might legitimately contain decimals elsewhere, and a column that looks unique in 100 rows may collide in a million. Treat the generated schema as a fast first draft that is accurate for the sample, then widen types and relax or tighten constraints based on what you know about the real data.
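As a rough illustration of that mapping, the Python sketch below turns a list of profiled columns into the three outputs. Everything named here is hypothetical (Column, VARCHAR_LIMIT, to_create_table, to_ts_interface, to_json_schema), and several details are assumptions rather than the tool's behaviour: the 255-character cut-off between VARCHAR and TEXT, sizing VARCHAR to the longest observed value, storing emails as strings in PostgreSQL, and column names that are already valid identifiers.

```python
from typing import Dict, List, NamedTuple


class Column(NamedTuple):
    name: str
    type: str        # an inferred type name such as "integer" or "email"
    nullable: bool
    max_length: int


PG_TYPES = {"integer": "INTEGER", "number": "DOUBLE PRECISION", "boolean": "BOOLEAN",
            "date": "DATE", "datetime": "TIMESTAMP", "uuid": "UUID"}
TS_TYPES = {"integer": "number", "number": "number", "boolean": "boolean"}
JSON_TYPES = {"integer": "integer", "number": "number", "boolean": "boolean"}
JSON_FORMATS = {"date": "date", "datetime": "date-time", "email": "email", "uuid": "uuid"}
VARCHAR_LIMIT = 255  # assumed threshold for "short" strings


def pg_type(col: Column) -> str:
    if col.type in PG_TYPES:
        return PG_TYPES[col.type]
    # Strings and emails: sized VARCHAR when short, TEXT otherwise.
    if 0 < col.max_length <= VARCHAR_LIMIT:
        return f"VARCHAR({col.max_length})"
    return "TEXT"


def to_create_table(table: str, cols: List[Column]) -> str:
    lines = []
    for c in cols:
        not_null = "" if c.nullable else " NOT NULL"
        lines.append(f'  "{c.name}" {pg_type(c)}{not_null}')
    return f'CREATE TABLE "{table}" (\n' + ",\n".join(lines) + "\n);"


def to_ts_interface(name: str, cols: List[Column]) -> str:
    # Nullable columns become optional members; everything non-numeric is a string.
    members = [f"  {c.name}{'?' if c.nullable else ''}: {TS_TYPES.get(c.type, 'string')};"
               for c in cols]
    return f"interface {name} {{\n" + "\n".join(members) + "\n}"


def to_json_schema(cols: List[Column]) -> Dict[str, object]:
    props: Dict[str, Dict[str, str]] = {}
    for c in cols:
        prop = {"type": JSON_TYPES.get(c.type, "string")}
        if c.type in JSON_FORMATS:
            prop["format"] = JSON_FORMATS[c.type]
        props[c.name] = prop
    required = [c.name for c in cols if not c.nullable]
    return {"type": "object", "properties": props, "required": required}


# Usage with three hand-written column profiles.
columns = [Column("id", "integer", False, 3),
           Column("email", "email", True, 20),
           Column("joined", "date", False, 10)]
print(to_create_table("users", columns))
print(to_ts_interface("User", columns))
```

Sizing VARCHAR to the longest value seen is the kind of choice the caveat above warns about: it fits the sample exactly, so it is usually worth widening by hand before a real import.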
Common use cases
- Bootstrap a table. Generate a PostgreSQL CREATE TABLE from a CSV export before importing it.
- Type a data feed. Produce a TypeScript interface for rows parsed from a CSV file.
- Profile unknown data. See types, nullability, distinct counts and candidate keys for a file you were just handed.
- Draft a JSON Schema. Get a validation schema with date, email and UUID formats from sample data.