Avro Schema from JSON Generator
Turn a sample JSON object — or an array of objects — into an Apache Avro schema. It infers each field's type, makes a field nullable (a ["null", T] union with default: null) when any sample is null or the key is missing from some records, widens a field that mixes integers and floats to double, and recurses into nested objects and arrays as Avro records and array types. Set the record name and namespace and copy the resulting .avsc. It runs entirely in your browser.
How to use the Avro Schema from JSON Generator
Paste a representative JSON sample. A single object produces a record with one field per key; an array of objects is the better input, because the more records you provide, the more accurately the generator can tell which fields are optional and which types vary. Give the top-level record a name (Avro requires every record to be named) and an optional namespace to qualify it — both are emitted into the schema.
Two inference rules do most of the useful work. A field becomes nullable, encoded as the union ["null", T] with "default": null, whenever a sample value is null or the key is absent from at least one record. Those are the cases where a reader may find no value for the field, and the null default gives it one to fall back on; null is listed first in the union, the conventional order that keeps a null default valid. When a numeric field holds both whole numbers and decimals across your samples, it is widened to double rather than guessed as long, so it won't reject a fractional value later. Nested objects become nested named records, and arrays become Avro array types with an inferred items schema. Review the result: Avro distinguishes int from long and float from double, and the generator chooses the wider, safer type of each pair, which you may want to narrow by hand if you know the true range.
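The two rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the generator's actual code: the function names infer_record and infer_type, the "_item" suffix for records inside arrays, and the fallback to string items for arrays that are empty in every sample are all assumptions made for the example.

```python
import json


def infer_type(values, name):
    """Infer one Avro type from the non-null sample values seen for a field."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "long"  # whole numbers only: choose the wider integer type
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "double"  # whole numbers mixed with decimals widen to double
    if all(isinstance(v, str) for v in values):
        return "string"
    if all(isinstance(v, dict) for v in values):
        return infer_record(values, name)  # nested object -> nested named record
    if all(isinstance(v, list) for v in values):
        # Null items are skipped here for brevity; a fuller version would
        # make the items schema nullable instead.
        items = [item for v in values for item in v if item is not None]
        # Assumption: arrays that are empty in every sample default to string items.
        item_type = infer_type(items, name + "_item") if items else "string"
        return {"type": "array", "items": item_type}
    raise ValueError(f"mixed types for field {name!r}; resolve by hand")


def infer_record(records, name, namespace=None):
    """Build an Avro record schema from a list of sample JSON objects."""
    keys = []
    for record in records:
        for key in record:
            if key not in keys:
                keys.append(key)  # keep first-seen key order

    fields = []
    for key in keys:
        present = [r[key] for r in records if key in r]
        non_null = [v for v in present if v is not None]
        # Nullable when the key is missing somewhere or any sample is null.
        optional = len(present) < len(records) or len(non_null) < len(present)
        if not non_null:
            # Only nulls were observed, so no better type can be inferred.
            fields.append({"name": key, "type": "null", "default": None})
        elif optional:
            inferred = infer_type(non_null, key)
            fields.append({"name": key, "type": ["null", inferred], "default": None})
        else:
            fields.append({"name": key, "type": infer_type(non_null, key)})

    schema = {"type": "record", "name": name, "fields": fields}
    if namespace:
        schema["namespace"] = namespace
    return schema


samples = [
    {"id": 1, "price": 10, "tags": ["a"], "owner": {"name": "Ann"}},
    {"id": 2, "price": 9.5, "note": None, "tags": [], "owner": {"name": "Bob", "age": 41}},
]
schema = infer_record(samples, "Event", "com.example")
print(json.dumps(schema, indent=2))
```

In this sample, price is widened to double because one record holds 10 and the other 9.5, and owner.age becomes a ["null", "long"] union because only the second record carries it. The sketch names each nested record after its key, which can collide when two different objects share a key name, so a real schema may need those names adjusted.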
Avro schemas and why JSON inference helps
Apache Avro is a compact binary serialization format used heavily in the data and streaming world: Kafka topics, Hadoop pipelines, and schema registries all speak Avro. Its defining feature is that data is always written against a schema, a JSON document (conventionally a .avsc file) that names the record and declares each field's type. Avro container files embed that schema alongside the records, while Kafka setups typically keep it in a schema registry and tag each message with a schema ID. Because a reader can always obtain the schema the data was written with, Avro can evolve fields over time and readers can reconcile an old schema with a new one, which is why it underpins so many long-lived event streams.
Writing that schema by hand is tedious and error-prone, especially the parts that matter most for evolution. Avro has no native "optional" — a field that might be absent or null must be modelled as a union with null, and to be safely added later it needs a default. The numeric types are also stricter than JSON's single number: Avro separates int from long and float from double, so a field that ever carries a decimal must be declared as floating point up front. Getting these wrong produces schemas that validate your first sample but reject the second.
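To make the "validates the first sample, rejects the second" trap concrete, here is a small sketch of a type check for primitive Avro types and unions. It is not a real Avro validator: the helpers conforms and record_ok are hypothetical, a missing key is treated as null, and whole numbers are assumed to be acceptable for float and double fields.

```python
def conforms(value, avro_type):
    """Return True if a parsed JSON value fits a primitive or union Avro type."""
    if isinstance(avro_type, list):  # union: any one branch may match
        return any(conforms(value, branch) for branch in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    is_whole = isinstance(value, int) and not isinstance(value, bool)
    if avro_type == "int":
        return is_whole and -2**31 <= value < 2**31
    if avro_type == "long":
        return is_whole and -2**63 <= value < 2**63
    if avro_type in ("float", "double"):
        # Assumption: whole numbers are accepted for floating-point fields.
        return is_whole or isinstance(value, float)
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"unsupported type in this sketch: {avro_type!r}")


def record_ok(record, field_types):
    """Check every declared field; a missing key is treated as null."""
    return all(conforms(record.get(name), t) for name, t in field_types.items())


first = {"price": 10, "coupon": "SAVE5"}
second = {"price": 9.5, "coupon": None}

# Guessed from the first sample alone.
naive = {"price": "long", "coupon": "string"}
# What the widening and nullable rules would produce from both samples.
safe = {"price": "double", "coupon": ["null", "string"]}

print(record_ok(first, naive), record_ok(second, naive))
print(record_ok(first, safe), record_ok(second, safe))
```

The naive field types accept the first record but fail on the second, because 9.5 is not a long and null is not a string. The widened, nullable versions accept both.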
Inferring the schema from real data sidesteps those traps. By scanning a set of sample records, the generator can see which keys are sometimes missing (making them nullable unions with defaults), which numeric fields mix integers and decimals (widening them to double), and how nested objects and arrays are shaped (turning them into nested records and typed arrays). The output is a conventional, registry-ready starting point you can refine — tightening types you know are bounded, adding documentation strings, or adjusting names. It is a draft generator, not a substitute for understanding your data's contract: review the unions and numeric widths before publishing the schema to a registry that downstream consumers depend on.
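The refinement step can also be scripted. The sketch below assumes a hypothetical helper, refine, that adds Avro doc strings to top-level fields and narrows long to int for fields you know are bounded. It only touches top-level fields and leaves the original draft untouched.

```python
import copy


def _narrow(avro_type):
    """Replace long with int, including inside a union."""
    if avro_type == "long":
        return "int"
    if isinstance(avro_type, list):
        return [_narrow(branch) for branch in avro_type]
    return avro_type


def refine(schema, docs=None, narrow_to_int=()):
    """Return a copy of a record schema with doc strings and narrowed types."""
    refined = copy.deepcopy(schema)
    docs = docs or {}
    for field in refined["fields"]:
        if field["name"] in docs:
            field["doc"] = docs[field["name"]]  # "doc" is a standard field attribute
        if field["name"] in narrow_to_int:
            field["type"] = _narrow(field["type"])
    return refined


draft = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "retries", "type": ["null", "long"], "default": None},
    ],
}
final = refine(
    draft,
    docs={"id": "Unique event identifier"},
    narrow_to_int={"retries"},
)
print(final["fields"])
```

Narrowing is only safe when every value the field will ever carry fits in 32 bits, which sample data alone cannot prove.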
Common use cases
- Kafka / schema registry. Bootstrap an Avro schema for a topic from a few sample event payloads.
- Optional-field modelling. Let missing and null samples drive correct ["null", T] unions with defaults.
- Pipeline onboarding. Generate a first .avsc for an existing JSON feed you need to ingest as Avro.
- Learning Avro. Compare JSON you understand against the Avro types it maps to, including nested records.