Python Pandas: Reading JSON Files with read

JSON (JavaScript Object Notation) is a popular data format that’s commonly used for transmitting data between servers and web applications. JSON is also widely used for data storage.

Pandas provides a flexible and powerful way to work with JSON files through its read_json() function, which reads JSON data into a DataFrame.

In this tutorial, we will cover:

Basic JSON Reading
Reading JSON from Different File Structures
Specifying JSON Orientations
Reading JSON from URLs
Handling Nested JSON Data
Working with Large JSON Files

Let’s explore each topic with examples.

1. Basic JSON Reading

The simplest way to read a JSON file in Pandas is to pass the file path to read_json().

Example 1: Basic JSON Reading

import pandas as pd

# Load JSON data from a file into a DataFrame
df = pd.read_json("sample.json")
print(df.head())  # Display the first few rows

Explanation: This reads the JSON file sample.json into a DataFrame df.

JSON Example Structure:

If sample.json is structured as follows, it will load smoothly into a DataFrame:

[
  {"Name": "Alice", "Age": 24, "City": "New York"},
  {"Name": "Bob", "Age": 27, "City": "Los Angeles"},
  {"Name": "Charlie", "Age": 22, "City": "Chicago"}
]

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

2. Reading JSON from Different File Structures

JSON files can have various structures, such as objects, arrays, or nested data. By default, Pandas reads JSON in the records orientation, but you can adjust the orientation to match the structure of your JSON file.

Example 2: JSON Data as an Object of Arrays

{
  "Name": ["Alice", "Bob", "Charlie"],
  "Age": [24, 27, 22],
  "City": ["New York", "Los Angeles", "Chicago"]
}

To read this JSON structure, use orient=’split’:

df = pd.read_json("sample_object.json", orient="split")
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

3. Specifying JSON Orientations

JSON can be structured in various orientations, and Pandas can read these different structures by specifying the orient parameter. The most common JSON orientations are records, split, index, columns, and values.

Example 3: JSON Orientations

Here’s a summary of JSON structures and how to read them with read_json():

Orientation: Records (Default)

[
  {"Name": "Alice", "Age": 24, "City": "New York"},
  {"Name": "Bob", "Age": 27, "City": "Los Angeles"},
  {"Name": "Charlie", "Age": 22, "City": "Chicago"}
]

df = pd.read_json("sample_records.json", orient="records")

Orientation: Index

{
  "Name": {"0": "Alice", "1": "Bob", "2": "Charlie"},
  "Age": {"0": 24, "1": 27, "2": 22},
  "City": {"0": "New York", "1": "Los Angeles", "2": "Chicago"}
}

df = pd.read_json("sample_index.json", orient="index")

Orientation: Columns

{
  "Name": ["Alice", "Bob", "Charlie"],
  "Age": [24, 27, 22],
  "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.read_json("sample_columns.json", orient="columns")

Explanation:

orient=”records” is the default and treats each JSON object as a row.
orient=”index” expects keys to represent index labels.
orient=”columns” treats keys as column headers.

4. Reading JSON from URLs

You can also read JSON data directly from a URL, which is useful for loading JSON data from APIs.

Example 4: Reading JSON from a URL

url = "https://jsonplaceholder.typicode.com/users"
df = pd.read_json(url)
print(df.head())

Explanation: This reads JSON data from the URL and loads it into a DataFrame. Ensure you have an active internet connection and use an accessible JSON URL.

5. Handling Nested JSON Data

JSON files often contain nested structures, such as dictionaries or lists within lists. To handle nested JSON, we can use the json_normalize() function, which flattens nested structures.

Example 5: Flattening Nested JSON

Given the following JSON structure:

[
  {
    "Name": "Alice",
    "Age": 24,
    "Address": {
      "City": "New York",
      "Zip": "10001"
    }
  },
  {
    "Name": "Bob",
    "Age": 27,
    "Address": {
      "City": "Los Angeles",
      "Zip": "90001"
    }
  }
]

We can flatten the nested structure using json_normalize:

import json
from pandas import json_normalize

# Load JSON and flatten it
with open("nested.json", "r") as file:
    data = json.load(file)

df = json_normalize(data, sep="_")
print(df)

Output:

     Name  Age     Address_City  Address_Zip
0   Alice   24       New York      10001
1     Bob   27    Los Angeles      90001

Explanation: json_normalize(data, sep=”_”) flattens the nested dictionary under the “Address” key, creating separate columns for each nested field.

Example 6: Flattening Multiple Levels of Nesting

For deeply nested JSON data, you can specify multiple paths to normalize specific structures.

{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "contacts": {
        "email": "alice@example.com",
        "phone": "123-456-7890"
      }
    },
    {
      "id": 2,
      "name": "Bob",
      "contacts": {
        "email": "bob@example.com",
        "phone": "098-765-4321"
      }
    }
  ]
}

# Normalize multiple levels of nesting
df = json_normalize(data["users"], sep="_", record_prefix="contacts_")
print(df)

Output:

    id   name    contacts_email       contacts_phone
0    1  Alice   alice@example.com      123-456-7890
1    2    Bob      bob@example.com      098-765-4321

Explanation:

json_normalize(data[“users”], sep=”_”) flattens multiple nested fields within the JSON object.

6. Working with Large JSON Files

For large JSON files, it is sometimes better to load data in chunks or only specific fields. This is especially helpful when working with massive datasets.

Example 7: Loading Large JSON Files in Chunks

# Load a JSON file in chunks
chunk_size = 1000
for chunk in pd.read_json("large_data.json", lines=True, chunksize=chunk_size):
    print(chunk.head())

Explanation: By setting chunksize=1000, the JSON file is read in chunks of 1000 records at a time, reducing memory usage.

Example 8: Loading Specific Fields from a JSON File

If you only need a subset of fields, you can select them by filtering the DataFrame after reading.

df = pd.read_json("large_data.json", lines=True)
df = df[["id", "name", "contacts_email"]]
print(df.head())

Explanation: This reads the JSON file and then selects only the specified columns, which can help reduce memory usage and focus on relevant data.

Summary of Key read_json() Parameters

Parameter	Description
path_or_buf	Path to the JSON file, URL, or JSON string.
orient	Specifies the JSON orientation (default is records).
lines	Set to True if each line in the JSON file represents a separate JSON object.
typ	Specifies the type of JSON to parse (frame or series).
chunksize	Reads the JSON file in chunks (useful for large files).

Example Summary: Full Workflow with read_json()

To demonstrate various options together, here’s a comprehensive example:

# Comprehensive JSON reading

 example
df = pd.read_json(
    "sample.json",
    orient="records",
    lines=False,
    dtype={"Age": "int32"},
)

# Display the first few rows and column data types
print(df.head())
print(df.dtypes)

Explanation:

This example reads sample.json with the default records orientation, specifying Age as int32 for memory efficiency.

Conclusion

In this tutorial, we covered the read_json() function in Pandas, including:

Basic JSON reading and handling different JSON structures.
Specifying JSON orientations to match data structures.
Reading JSON from URLs.
Flattening nested JSON data with json_normalize.
Handling large JSON files efficiently with chunk loading.

Pandas Reading JSON

Python Pandas: Reading JSON Files with read_json()

1. Basic JSON Reading

Example 1: Basic JSON Reading

JSON Example Structure:

2. Reading JSON from Different File Structures

Example 2: JSON Data as an Object of Arrays

3. Specifying JSON Orientations

Example 3: JSON Orientations

Orientation: Records (Default)

Orientation: Index

Orientation: Columns

4. Reading JSON from URLs

Example 4: Reading JSON from a URL

5. Handling Nested JSON Data

Example 5: Flattening Nested JSON

Example 6: Flattening Multiple Levels of Nesting

6. Working with Large JSON Files

Example 7: Loading Large JSON Files in Chunks

Example 8: Loading Specific Fields from a JSON File

Summary of Key read_json() Parameters

Example Summary: Full Workflow with read_json()

Conclusion

Python Pandas: Reading CSV Files with read_csv()

Getting Started with Pandas in Python: A Beginner’s Tutorial

You may also like