JSON (JavaScript Object Notation) is a popular data format that’s commonly used for transmitting data between servers and web applications. JSON is also widely used for data storage.
Pandas provides a flexible and powerful way to work with JSON files through its read_json() function, which reads JSON data into a DataFrame.
In this tutorial, we will cover:
- Basic JSON Reading
- Reading JSON from Different File Structures
- Specifying JSON Orientations
- Reading JSON from URLs
- Handling Nested JSON Data
- Working with Large JSON Files
Let’s explore each topic with examples.
1. Basic JSON Reading
The simplest way to read a JSON file in Pandas is to pass the file path to read_json().
Example 1: Basic JSON Reading
import pandas as pd # Load JSON data from a file into a DataFrame df = pd.read_json("sample.json") print(df.head()) # Display the first few rows
- Explanation: This reads the JSON file sample.json into a DataFrame df.
JSON Example Structure:
If sample.json is structured as follows, it will load smoothly into a DataFrame:
[ {"Name": "Alice", "Age": 24, "City": "New York"}, {"Name": "Bob", "Age": 27, "City": "Los Angeles"}, {"Name": "Charlie", "Age": 22, "City": "Chicago"} ]
Output:
Name Age City 0 Alice 24 New York 1 Bob 27 Los Angeles 2 Charlie 22 Chicago
2. Reading JSON from Different File Structures
JSON files can have various structures, such as objects, arrays, or nested data. By default, Pandas reads JSON in the records orientation, but you can adjust the orientation to match the structure of your JSON file.
Example 2: JSON Data as an Object of Arrays
{ "Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22], "City": ["New York", "Los Angeles", "Chicago"] }
To read this JSON structure, use orient='split':
df = pd.read_json("sample_object.json", orient="split") print(df)
Output:
Name Age City 0 Alice 24 New York 1 Bob 27 Los Angeles 2 Charlie 22 Chicago
3. Specifying JSON Orientations
JSON can be structured in various orientations, and Pandas can read these different structures by specifying the orient parameter. The most common JSON orientations are records, split, index, columns, and values.
Example 3: JSON Orientations
Here’s a summary of JSON structures and how to read them with read_json():
Orientation: Records (Default)
[ {"Name": "Alice", "Age": 24, "City": "New York"}, {"Name": "Bob", "Age": 27, "City": "Los Angeles"}, {"Name": "Charlie", "Age": 22, "City": "Chicago"} ]
df = pd.read_json("sample_records.json", orient="records")
Orientation: Index
{ "Name": {"0": "Alice", "1": "Bob", "2": "Charlie"}, "Age": {"0": 24, "1": 27, "2": 22}, "City": {"0": "New York", "1": "Los Angeles", "2": "Chicago"} }
df = pd.read_json("sample_index.json", orient="index")
Orientation: Columns
{ "Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22], "City": ["New York", "Los Angeles", "Chicago"] }
df = pd.read_json("sample_columns.json", orient="columns")
Explanation:
- orient=”records” is the default and treats each JSON object as a row.
- orient=”index” expects keys to represent index labels.
- orient=”columns” treats keys as column headers.
4. Reading JSON from URLs
You can also read JSON data directly from a URL, which is useful for loading JSON data from APIs.
Example 4: Reading JSON from a URL
url = "https://jsonplaceholder.typicode.com/users" df = pd.read_json(url) print(df.head())
Explanation: This reads JSON data from the URL and loads it into a DataFrame. Ensure you have an active internet connection and use an accessible JSON URL.
5. Handling Nested JSON Data
JSON files often contain nested structures, such as dictionaries or lists within lists. To handle nested JSON, we can use the json_normalize() function, which flattens nested structures.
Example 5: Flattening Nested JSON
Given the following JSON structure:
[ { "Name": "Alice", "Age": 24, "Address": { "City": "New York", "Zip": "10001" } }, { "Name": "Bob", "Age": 27, "Address": { "City": "Los Angeles", "Zip": "90001" } } ]
We can flatten the nested structure using json_normalize:
import json from pandas import json_normalize # Load JSON and flatten it with open("nested.json", "r") as file: data = json.load(file) df = json_normalize(data, sep="_") print(df)
Output:
Name Age Address_City Address_Zip 0 Alice 24 New York 10001 1 Bob 27 Los Angeles 90001
- Explanation: json_normalize(data, sep=”_”) flattens the nested dictionary under the “Address” key, creating separate columns for each nested field.
Example 6: Flattening Multiple Levels of Nesting
For deeply nested JSON data, you can specify multiple paths to normalize specific structures.
{ "users": [ { "id": 1, "name": "Alice", "contacts": { "email": "alice@example.com", "phone": "123-456-7890" } }, { "id": 2, "name": "Bob", "contacts": { "email": "bob@example.com", "phone": "098-765-4321" } } ] }
# Normalize multiple levels of nesting df = json_normalize(data["users"], sep="_", record_prefix="contacts_") print(df)
Output:
id name contacts_email contacts_phone 0 1 Alice alice@example.com 123-456-7890 1 2 Bob bob@example.com 098-765-4321
Explanation:
- json_normalize(data[“users”], sep=”_”) flattens multiple nested fields within the JSON object.
6. Working with Large JSON Files
For large JSON files, it is sometimes better to load data in chunks or only specific fields. This is especially helpful when working with massive datasets.
Example 7: Loading Large JSON Files in Chunks
# Load a JSON file in chunks chunk_size = 1000 for chunk in pd.read_json("large_data.json", lines=True, chunksize=chunk_size): print(chunk.head())
- Explanation: By setting chunksize=1000, the JSON file is read in chunks of 1000 records at a time, reducing memory usage.
Example 8: Loading Specific Fields from a JSON File
If you only need a subset of fields, you can select them by filtering the DataFrame after reading.
df = pd.read_json("large_data.json", lines=True) df = df[["id", "name", "contacts_email"]] print(df.head())
Explanation: This reads the JSON file and then selects only the specified columns, which can help reduce memory usage and focus on relevant data.
Summary of Key read_json() Parameters
Parameter | Description |
---|---|
path_or_buf | Path to the JSON file, URL, or JSON string. |
orient | Specifies the JSON orientation (default is records). |
lines | Set to True if each line in the JSON file represents a separate JSON object. |
typ | Specifies the type of JSON to parse (frame or series). |
chunksize | Reads the JSON file in chunks (useful for large files). |
Example Summary: Full Workflow with read_json()
To demonstrate various options together, here’s a comprehensive example:
# Comprehensive JSON reading example df = pd.read_json( "sample.json", orient="records", lines=False, dtype={"Age": "int32"}, ) # Display the first few rows and column data types print(df.head()) print(df.dtypes)
Explanation:
- This example reads sample.json with the default records orientation, specifying Age as int32 for memory efficiency.
Conclusion
In this tutorial, we covered the read_json() function in Pandas, including:
- Basic JSON reading and handling different JSON structures.
- Specifying JSON orientations to match data structures.
- Reading JSON from URLs.
- Flattening nested JSON data with json_normalize.
- Handling large JSON files efficiently with chunk loading.