JSON to Parquet Schema

Convert JSON objects to Apache Parquet schema definitions. Generate Python pyarrow code ready to write Parquet files from your JSON data.

Input JSON
Parquet Schema & Python Code

JSON to Parquet Guide

Understanding JSON to Parquet Conversion

Apache Parquet is a columnar storage format widely used in data engineering. Unlike JSON, which is row-oriented and schema-less, Parquet stores data column-by-column with an explicit schema and efficient compression. Converting JSON to a Parquet schema is the first step before writing Parquet files in your data pipeline.

This tool generates two outputs: a Parquet message schema (written in a syntax similar to Protocol Buffers) and ready-to-run Python code using the pyarrow library.

JSON to Parquet Type Mapping

JSON Type         → Parquet / pyarrow Type
─────────────────────────────────────────
string            → pa.string() or pa.large_string()
integer number    → pa.int64()
float number      → pa.float64()
boolean           → pa.bool_()
null              → pa.null() (or a nullable field)
ISO date string   → pa.timestamp('us')    (auto-detected)
array             → pa.list_(inner_type)
nested object     → pa.struct([...fields])

Using the Generated Python Code

  1. Paste your JSON object or array into the input panel.
  2. Configure options: enable timestamp detection for ISO date strings, nullable for optional fields.
  3. Click "Generate Parquet Schema" to get both the schema definition and Python code.
  4. Copy the Python code and run it in an environment where pyarrow is installed (pip install pyarrow).
  5. The generated code reads data.json and writes to output.parquet — edit the filenames as needed.

Features

🗜️ Parquet Type Mapping

  • int64 / float64 / bool_ / string
  • Nested struct fields
  • list_() for JSON arrays
  • ISO timestamp detection

🐍 pyarrow Python Code

  • Full pa.schema() definition
  • Table.from_pylist() usage
  • pq.write_table() example
  • Copy-paste ready code