Avro Schema from JSON Generator

Turn a sample JSON object — or an array of objects — into an Apache Avro schema. It infers each field's type, makes a field nullable (a ["null", T] union with default: null) when any sample is null or the key is missing from some records, widens a field that mixes integers and floats to double, and recurses into nested objects and arrays as Avro records and array types. Set the record name and namespace and copy the resulting .avsc. It runs entirely in your browser.

How to use the Avro Schema from JSON Generator

Paste a representative JSON sample. A single object produces a record with one field per key; an array of objects is the better input, because the more records you provide, the more accurately the generator can tell which fields are optional and which types vary. Give the top-level record a name (Avro requires every record to be named) and an optional namespace to qualify it — both are emitted into the schema.
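To make the role of the name and namespace concrete, here is a minimal Python sketch of the top-level record envelope. The helper `make_record` and the example name and namespace are hypothetical, not the generator's own code; the name check follows Avro's rule that names start with a letter or underscore and continue with letters, digits, or underscores.

```python
import json
import re

# Avro names: a letter or underscore, then letters, digits, or underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def make_record(name, namespace=None, fields=()):
    """Build the top-level record envelope (hypothetical helper)."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid Avro record name: {name!r}")
    schema = {"type": "record", "name": name}
    if namespace:
        # A namespace is a dot-separated sequence of valid names.
        if not all(NAME_RE.fullmatch(part) for part in namespace.split(".")):
            raise ValueError(f"invalid Avro namespace: {namespace!r}")
        schema["namespace"] = namespace
    schema["fields"] = list(fields)
    return schema


schema = make_record(
    "Order",
    "com.example.events",
    [{"name": "id", "type": "long"}],
)
avsc_text = json.dumps(schema, indent=2)  # the text you would save as a .avsc file
```

The namespace is optional in this sketch, mirroring the tool: when it is omitted, the record is simply emitted without a "namespace" key.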

Two inference rules do most of the useful work. A field becomes nullable, encoded as the union ["null", T] with "default": null, whenever a sample value is null or the key is absent from at least one record; that is the case in which a reader needs a default to stand in for the missing value. And when a numeric field holds both whole numbers and decimals across your samples, it is widened to double rather than guessed as long, so it won't reject a fractional value later. Nested objects become nested named records, and arrays become Avro array types with an inferred items schema. Review the result: Avro distinguishes int from long and float from double, and the generator chooses the wider, safer type (long or double), which you may want to narrow by hand if you know the true range.
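The two rules can be sketched in a few lines of Python for flat records. This is an illustration under stated assumptions, not the generator's actual implementation: the function names `infer_type` and `infer_fields` are hypothetical, nested values are out of scope, and raising an error on other type mixes (for example string and number) is a choice made here for simplicity.

```python
MISSING = object()  # sentinel: the key was absent from a record


def infer_type(values):
    """Infer an Avro primitive type from one field's non-null sample values."""
    kinds = set()
    for v in values:
        if isinstance(v, bool):  # check bool first: bool is a subclass of int
            kinds.add("boolean")
        elif isinstance(v, int):
            kinds.add("long")
        elif isinstance(v, float):
            kinds.add("double")
        elif isinstance(v, str):
            kinds.add("string")
        else:
            raise TypeError(f"unsupported sample value: {v!r}")
    if not kinds:
        return "string"  # nothing to infer from: fall back to string
    if kinds == {"long", "double"}:
        return "double"  # integers mixed with decimals widen to double
    if len(kinds) == 1:
        return kinds.pop()
    raise ValueError(f"mixed types not handled in this sketch: {sorted(kinds)}")


def infer_fields(records):
    """Infer Avro field definitions from a list of flat JSON objects."""
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)  # keep first-seen key order
    fields = []
    for name in names:
        samples = [record.get(name, MISSING) for record in records]
        nullable = any(s is MISSING or s is None for s in samples)
        present = [s for s in samples if s is not MISSING and s is not None]
        avro_type = infer_type(present)
        if nullable:
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return fields


records = [
    {"id": 1, "price": 10, "note": "gift"},
    {"id": 2, "price": 9.5},               # "note" is missing here
    {"id": 3, "price": 12, "note": None},  # "note" is null here
]
fields = infer_fields(records)
```

With these three samples, `id` stays a plain long, `price` is widened to double because of the single 9.5, and `note` becomes a ["null", "string"] union with a null default.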

Avro schemas and why JSON inference helps

Apache Avro is a compact binary serialization format used heavily in the data and streaming world: Kafka topics, Hadoop pipelines, and schema registries all speak Avro. Its defining feature is that data is always written against a schema, a JSON document (conventionally a .avsc file) that names the record and declares each field's type. The writer's schema stays available to readers, either embedded in the data file or looked up from a schema registry, so Avro can evolve fields over time and readers can reconcile an old schema with a new one, which is why it underpins so many long-lived event streams.

Writing that schema by hand is tedious and error-prone, especially the parts that matter most for evolution. Avro has no native "optional" — a field that might be absent or null must be modelled as a union with null, and to be safely added later it needs a default. The numeric types are also stricter than JSON's single number: Avro separates int from long and float from double, so a field that ever carries a decimal must be declared as floating point up front. Getting these wrong produces schemas that validate your first sample but reject the second.

Inferring the schema from real data sidesteps those traps. By scanning a set of sample records, the generator can see which keys are sometimes missing (making them nullable unions with defaults), which numeric fields mix integers and decimals (widening them to double), and how nested objects and arrays are shaped (turning them into nested records and typed arrays). The output is a conventional, registry-ready starting point you can refine — tightening types you know are bounded, adding documentation strings, or adjusting names. It is a draft generator, not a substitute for understanding your data's contract: review the unions and numeric widths before publishing the schema to a registry that downstream consumers depend on.

Common use cases

  • Kafka / schema registry. Bootstrap an Avro schema for a topic from a few sample event payloads.
  • Optional-field modelling. Let missing and null samples drive correct ["null", T] unions with defaults.
  • Pipeline onboarding. Generate a first .avsc for an existing JSON feed you need to ingest as Avro.
  • Learning Avro. Compare JSON you understand against the Avro types it maps to, including nested records.

Frequently asked questions

How does it decide a field is nullable?

A field is made nullable when any sample value for it is null, or when the key is missing from at least one record in an array of samples. Nullable fields are emitted as an Avro union ["null", T] with "default": null. Listing "null" first keeps the null default valid, since Avro has traditionally required a union field's default to match the union's first branch, and the default is what lets a reader on a newer schema fill in the field when older data lacks it.
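Why the default matters for evolution can be shown with a toy version of the reader-side rule: when the reader's schema has a field that the written data lacks, the reader uses that field's default, and signals an error if there is none. The sketch below works on already-decoded Python dicts rather than real Avro binary data, and `apply_reader_defaults` and the `User` schema are hypothetical names used only for illustration.

```python
def apply_reader_defaults(record, reader_schema):
    """Fill fields missing from `record` with the reader schema's defaults."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]  # field added after the data was written
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# Version 2 of a schema adds a nullable "email" field with a null default.
reader_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

old_record = {"id": 7}  # written before "email" existed
resolved = apply_reader_defaults(old_record, reader_v2)
```

Here the old record resolves to an `id` of 7 and an `email` of null. Without the default on `email`, the same read would fail, which is the situation the generated union form avoids.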

Why did my integer field become a double?

Because at least one sample held a decimal value. JSON has a single number type, but Avro separates integers (int/long) from floating point (float/double). If a field mixes whole and fractional numbers across your samples, it is widened to double so it will not reject a decimal later. Narrow it to long by hand only if you are certain the field is always integral.

Does it choose int or long, float or double?

It uses the wider, safer types — long for integers and double for decimals — because inference from samples cannot prove a value will always fit a narrower type. If you know the true range, edit the generated schema to use int or float to save space.
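Before narrowing a long to int by hand, it can help to check the values you have against the 32-bit signed range that Avro's int covers. The helper below is a hypothetical sketch, and passing it only shows that the samples fit, not that future values will.

```python
# Avro "int" is a 32-bit signed integer; "long" is 64-bit signed.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def fits_avro_int(values):
    """Return True if every sample is an integer within the 32-bit signed range."""
    return all(
        isinstance(v, int) and not isinstance(v, bool) and INT_MIN <= v <= INT_MAX
        for v in values
    )


small_ids = [1, 42, 2_147_483_647]
epoch_millis = [1_700_000_000_000]  # millisecond timestamps need long

can_narrow_ids = fits_avro_int(small_ids)
can_narrow_timestamps = fits_avro_int(epoch_millis)
```

In this example the IDs fit in an int while the millisecond timestamp does not, so only the first field would be a candidate for narrowing.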

How are nested objects and arrays handled?

A nested object becomes a nested Avro record (with a name derived from the field), and a JSON array becomes an Avro array whose items schema is inferred from the elements. When there is nothing to infer from, the generator falls back to string: as the items type for an empty array, and as the value type for a field that is null in every sample. Adjust these once you have representative data.
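The recursion can be sketched as follows for a single sample. Several details are assumptions made for illustration: the function name `infer_schema` is hypothetical, the record name is derived by capitalizing the field name (the generator's exact naming rule is not specified here), array items are inferred from the first element only, and null handling is left out to keep the sketch short.

```python
def infer_schema(value, name_hint):
    """Recursively map one JSON sample value to an Avro schema (sketch)."""
    if isinstance(value, dict):
        return {
            "type": "record",
            # Assumed naming rule: capitalize the field name.
            "name": name_hint[:1].upper() + name_hint[1:],
            "fields": [
                {"name": key, "type": infer_schema(child, key)}
                for key, child in value.items()
            ],
        }
    if isinstance(value, list):
        # Empty arrays give nothing to infer from, so fall back to string.
        items = infer_schema(value[0], name_hint) if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported sample value: {value!r}")


sample = {
    "id": 1,
    "customer": {"name": "Ada", "vip": True},
    "tags": ["a", "b"],
    "notes": [],
}
schema = infer_schema(sample, "order")
```

In this sketch `customer` becomes a nested record named `Customer`, `tags` becomes an array of string, and the empty `notes` array gets the string fallback. Avro does not allow the same record name to be defined twice in one schema, so a real generator also has to keep derived names unique, which this sketch does not attempt.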

Is the output ready to register as-is?

It is a solid draft, not a final contract. Review the unions, numeric widths, and any fallback string types, add doc strings or aliases your consumers expect, and confirm the names and namespace. Treat it as a fast starting point that removes the boilerplate, then refine before publishing to a registry others depend on.