Home » Python Pandas: Reading JSON Files with read_json()

Python Pandas: Reading JSON Files with read_json()

Oracle Java Certification
1 Year Subscription
Spring Framework Basics Video Course
Java SE 11 Developer (Upgrade) [1Z0-817]
Java SE 11 Programmer II [1Z0-816] Practice Tests
Java SE 11 Programmer I [1Z0-815] Practice Tests

JSON (JavaScript Object Notation) is a popular data format that’s commonly used for transmitting data between servers and web applications. JSON is also widely used for data storage.

Pandas provides a flexible and powerful way to work with JSON files through its read_json() function, which reads JSON data into a DataFrame.

In this tutorial, we will cover:

  1. Basic JSON Reading
  2. Reading JSON from Different File Structures
  3. Specifying JSON Orientations
  4. Reading JSON from URLs
  5. Handling Nested JSON Data
  6. Working with Large JSON Files

Let’s explore each topic with examples.

1. Basic JSON Reading

The simplest way to read a JSON file in Pandas is to pass the file path to read_json().

Example 1: Basic JSON Reading

import pandas as pd

# Load JSON data from a file into a DataFrame
df = pd.read_json("sample.json")
print(df.head())  # Display the first few rows
  • Explanation: This reads the JSON file sample.json into a DataFrame df.

JSON Example Structure:

If sample.json is structured as follows, it will load smoothly into a DataFrame:

[
  {"Name": "Alice", "Age": 24, "City": "New York"},
  {"Name": "Bob", "Age": 27, "City": "Los Angeles"},
  {"Name": "Charlie", "Age": 22, "City": "Chicago"}
]

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

2. Reading JSON from Different File Structures

JSON files can have various structures, such as objects, arrays, or nested data. By default, Pandas reads JSON in the records orientation, but you can adjust the orientation to match the structure of your JSON file.

Example 2: JSON Data as an Object of Arrays

{
  "Name": ["Alice", "Bob", "Charlie"],
  "Age": [24, 27, 22],
  "City": ["New York", "Los Angeles", "Chicago"]
}

To read this JSON structure, use orient='split':

df = pd.read_json("sample_object.json", orient="split")
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

3. Specifying JSON Orientations

JSON can be structured in various orientations, and Pandas can read these different structures by specifying the orient parameter. The most common JSON orientations are records, split, index, columns, and values.

Example 3: JSON Orientations

Here’s a summary of JSON structures and how to read them with read_json():

Orientation: Records (Default)

[
  {"Name": "Alice", "Age": 24, "City": "New York"},
  {"Name": "Bob", "Age": 27, "City": "Los Angeles"},
  {"Name": "Charlie", "Age": 22, "City": "Chicago"}
]
df = pd.read_json("sample_records.json", orient="records")

Orientation: Index

{
  "Name": {"0": "Alice", "1": "Bob", "2": "Charlie"},
  "Age": {"0": 24, "1": 27, "2": 22},
  "City": {"0": "New York", "1": "Los Angeles", "2": "Chicago"}
}
df = pd.read_json("sample_index.json", orient="index")

Orientation: Columns

{
  "Name": ["Alice", "Bob", "Charlie"],
  "Age": [24, 27, 22],
  "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.read_json("sample_columns.json", orient="columns")

Explanation:

  • orient=”records” is the default and treats each JSON object as a row.
  • orient=”index” expects keys to represent index labels.
  • orient=”columns” treats keys as column headers.

4. Reading JSON from URLs

You can also read JSON data directly from a URL, which is useful for loading JSON data from APIs.

Example 4: Reading JSON from a URL

url = "https://jsonplaceholder.typicode.com/users"
df = pd.read_json(url)
print(df.head())

Explanation: This reads JSON data from the URL and loads it into a DataFrame. Ensure you have an active internet connection and use an accessible JSON URL.

5. Handling Nested JSON Data

JSON files often contain nested structures, such as dictionaries or lists within lists. To handle nested JSON, we can use the json_normalize() function, which flattens nested structures.

Example 5: Flattening Nested JSON

Given the following JSON structure:

[
  {
    "Name": "Alice",
    "Age": 24,
    "Address": {
      "City": "New York",
      "Zip": "10001"
    }
  },
  {
    "Name": "Bob",
    "Age": 27,
    "Address": {
      "City": "Los Angeles",
      "Zip": "90001"
    }
  }
]

We can flatten the nested structure using json_normalize:

import json
from pandas import json_normalize

# Load JSON and flatten it
with open("nested.json", "r") as file:
    data = json.load(file)

df = json_normalize(data, sep="_")
print(df)

Output:

     Name  Age     Address_City  Address_Zip
0   Alice   24       New York      10001
1     Bob   27    Los Angeles      90001
  • Explanation: json_normalize(data, sep=”_”) flattens the nested dictionary under the “Address” key, creating separate columns for each nested field.

Example 6: Flattening Multiple Levels of Nesting

For deeply nested JSON data, you can specify multiple paths to normalize specific structures.

{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "contacts": {
        "email": "alice@example.com",
        "phone": "123-456-7890"
      }
    },
    {
      "id": 2,
      "name": "Bob",
      "contacts": {
        "email": "bob@example.com",
        "phone": "098-765-4321"
      }
    }
  ]
}
# Normalize multiple levels of nesting
df = json_normalize(data["users"], sep="_", record_prefix="contacts_")
print(df)

Output:

    id   name    contacts_email       contacts_phone
0    1  Alice   alice@example.com      123-456-7890
1    2    Bob      bob@example.com      098-765-4321

Explanation:

  • json_normalize(data[“users”], sep=”_”) flattens multiple nested fields within the JSON object.

6. Working with Large JSON Files

For large JSON files, it is sometimes better to load data in chunks or only specific fields. This is especially helpful when working with massive datasets.

Example 7: Loading Large JSON Files in Chunks

# Load a JSON file in chunks
chunk_size = 1000
for chunk in pd.read_json("large_data.json", lines=True, chunksize=chunk_size):
    print(chunk.head())
  • Explanation: By setting chunksize=1000, the JSON file is read in chunks of 1000 records at a time, reducing memory usage.

Example 8: Loading Specific Fields from a JSON File

If you only need a subset of fields, you can select them by filtering the DataFrame after reading.

df = pd.read_json("large_data.json", lines=True)
df = df[["id", "name", "contacts_email"]]
print(df.head())

Explanation: This reads the JSON file and then selects only the specified columns, which can help reduce memory usage and focus on relevant data.

Summary of Key read_json() Parameters

Parameter Description
path_or_buf Path to the JSON file, URL, or JSON string.
orient Specifies the JSON orientation (default is records).
lines Set to True if each line in the JSON file represents a separate JSON object.
typ Specifies the type of JSON to parse (frame or series).
chunksize Reads the JSON file in chunks (useful for large files).

Example Summary: Full Workflow with read_json()

To demonstrate various options together, here’s a comprehensive example:

# Comprehensive JSON reading

 example
df = pd.read_json(
    "sample.json",
    orient="records",
    lines=False,
    dtype={"Age": "int32"},
)

# Display the first few rows and column data types
print(df.head())
print(df.dtypes)

Explanation:

  • This example reads sample.json with the default records orientation, specifying Age as int32 for memory efficiency.

Conclusion

In this tutorial, we covered the read_json() function in Pandas, including:

  • Basic JSON reading and handling different JSON structures.
  • Specifying JSON orientations to match data structures.
  • Reading JSON from URLs.
  • Flattening nested JSON data with json_normalize.
  • Handling large JSON files efficiently with chunk loading.

You may also like

Leave a Comment

This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Accept Read More