Python Pandas DataFrame Tutorial with Examples

A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure.

It is the primary data structure in the Pandas library and is well suited to organizing and analyzing structured data.

The DataFrame is similar to a spreadsheet, a SQL table, or a dictionary of Series objects.

In this tutorial, we will cover:

  1. Creating a DataFrame
  2. Accessing Data in a DataFrame
  3. Modifying Data in a DataFrame
  4. Performing Operations on DataFrames
  5. DataFrame Methods for Analysis
  6. Handling Missing Data in a DataFrame

Let’s explore each section with code examples.

1. Creating a DataFrame

You can create a DataFrame from various data structures, including lists, dictionaries, and NumPy arrays.

Example 1: Creating a DataFrame from a Dictionary

import pandas as pd

# Create a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

  • Explanation: Each key in the dictionary becomes a column label, and the list stored under that key supplies the column's values, one per row. All lists must have the same length.

Example 2: Creating a DataFrame from a List of Lists

# Create a DataFrame from a list of lists
data = [
    ["Alice", 24, "New York"],
    ["Bob", 27, "Los Angeles"],
    ["Charlie", 22, "Chicago"]
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
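
The introduction to this section also mentions NumPy arrays, which work much like a list of lists. The following is a minimal sketch; the scores, the column names Test1 and Test2, and the row labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D array of made-up scores: one row per student, one column per test
scores = np.array([[85, 90], [78, 82], [92, 88]])

# The column names and row labels here are illustrative assumptions
df_scores = pd.DataFrame(
    scores,
    columns=["Test1", "Test2"],
    index=["Alice", "Bob", "Charlie"],
)
print(df_scores)
```

Because a NumPy array has a single dtype, every column starts out with that dtype.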

Example 3: Creating a DataFrame from a Dictionary of Series

# Create a DataFrame from a dictionary of Series
data = {
    "Name": pd.Series(["Alice", "Bob", "Charlie"]),
    "Age": pd.Series([24, 27, 22]),
    "City": pd.Series(["New York", "Los Angeles", "Chicago"])
}
df = pd.DataFrame(data)
print(df)
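
  • Explanation: You can also use Pandas Series as the values for each column. The output is the same as in Example 1. Series are aligned on their index labels, so this form is most useful when the columns come from Series whose indexes differ; labels missing from a Series become NaN in that column.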
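
To illustrate how Series are matched up by index label, here is a small sketch with two Series whose indexes only partly overlap. The data and the variable names are hypothetical.

```python
import pandas as pd

# Two Series with partially overlapping index labels (hypothetical data)
ages = pd.Series([24, 27], index=["Alice", "Bob"])
cities = pd.Series(["Los Angeles", "Chicago"], index=["Bob", "Charlie"])

# The DataFrame index is the union of both indexes; gaps are filled with NaN
df_aligned = pd.DataFrame({"Age": ages, "City": cities})
print(df_aligned)
```

Only Bob appears in both Series, so his is the only row with both values filled in.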

2. Accessing Data in a DataFrame

Pandas provides multiple ways to access and retrieve data in a DataFrame, including indexing, slicing, and selecting rows and columns.

Example 4: Selecting a Column by Label

# Select a column by label
print(df["Name"])  # Outputs the "Name" column

Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

Example 5: Selecting Multiple Columns

# Select multiple columns
print(df[["Name", "Age"]])

Output:

      Name  Age
0    Alice   24
1      Bob   27
2  Charlie   22

Example 6: Selecting Rows by Index Position with iloc

# Select rows by index position
print(df.iloc[1])  # Output the second row

Output:

Name            Bob
Age              27
City    Los Angeles
Name: 1, dtype: object
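
iloc also accepts slices and a second position for the column, which covers the slicing mentioned at the start of this section. A minimal sketch, rebuilding the same sample data under the hypothetical name df_demo:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Rows at positions 0 and 1 (the end position 2 is excluded, as in Python slicing)
first_two = df_demo.iloc[0:2]

# A single cell: row position 2, column position 1 ("Age")
charlie_age = df_demo.iloc[2, 1]  # 22

# All rows, first two columns
name_age = df_demo.iloc[:, 0:2]
```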

Example 7: Selecting Rows by Label with loc

# Set "Name" as the index (the examples in section 3 onward keep this index)
df.set_index("Name", inplace=True)
print(df)

# Select a row by label
print(df.loc["Alice"])  # Outputs the row for Alice

Output:

         Age         City
Name                     
Alice     24     New York
Bob       27  Los Angeles
Charlie   22      Chicago

Age           24
City    New York
Name: Alice, dtype: object
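
loc can also take a column label, and it accepts label slices. The sketch below rebuilds the sample data as df_demo (a hypothetical name) and shows one difference from iloc that is easy to miss: a label slice includes its end label.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Los Angeles", "Chicago"],
}).set_index("Name")

# One cell, addressed by row label and column label
bob_city = df_demo.loc["Bob", "City"]  # "Los Angeles"

# A label slice includes BOTH endpoints, unlike an iloc slice
alice_to_bob = df_demo.loc["Alice":"Bob", ["Age"]]
```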

3. Modifying Data in a DataFrame

You can modify data in a DataFrame by adding, updating, or deleting columns and rows.

Example 8: Adding a New Column

# Add a new column
df["Salary"] = [50000, 55000, 60000]
print(df)

Output:

         Age         City  Salary
Name                             
Alice     24     New York   50000
Bob       27  Los Angeles   55000
Charlie   22      Chicago   60000
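
A new column is often computed from existing ones rather than typed in. The following sketch assumes a made-up 10% bonus rate and uses the hypothetical names df_demo, Bonus, and Total.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Age": [24, 27, 22], "Salary": [50000, 55000, 60000]},
    index=["Alice", "Bob", "Charlie"],
)

# Derive a column from an existing one (the 10% rate is made up)
df_demo["Bonus"] = df_demo["Salary"] * 0.10

# assign() returns a new DataFrame and leaves df_demo unchanged
df_with_total = df_demo.assign(Total=df_demo["Salary"] + df_demo["Bonus"])
```

Use assign() when you want to keep the original DataFrame intact, for example in a chain of method calls.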

Example 9: Updating Column Values

# Update a column value for a specific row
df.at["Alice", "Salary"] = 52000
print(df)

Output:

         Age         City  Salary
Name                             
Alice     24     New York   52000
Bob       27  Los Angeles   55000
Charlie   22      Chicago   60000
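
at updates a single cell. To update several rows at once, one option is loc with a boolean mask. This is a sketch; the age threshold and the raise amount are made up.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Age": [24, 27, 22], "Salary": [50000, 55000, 60000]},
    index=["Alice", "Bob", "Charlie"],
)

# True for the rows to update (everyone under 25)
under_25 = df_demo["Age"] < 25

# Add 1000 to "Salary" only in the selected rows
df_demo.loc[under_25, "Salary"] += 1000
```

Alice and Charlie get the raise; Bob's salary is unchanged.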

Example 10: Deleting a Column

# Delete a column
df.drop(columns=["Salary"], inplace=True)
print(df)

Output:

         Age         City
Name                     
Alice     24     New York
Bob       27  Los Angeles
Charlie   22      Chicago
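
drop() can also remove rows, and without inplace=True it returns a new DataFrame instead of changing the original. A minimal sketch on a copy of the sample data (df_demo is a hypothetical name):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Age": [24, 27, 22], "City": ["New York", "Los Angeles", "Chicago"]},
    index=["Alice", "Bob", "Charlie"],
)

# Without inplace=True, drop() returns a new DataFrame
without_city = df_demo.drop(columns=["City"])

# Rows are dropped by index label
without_bob = df_demo.drop(index=["Bob"])
```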

Example 11: Adding a New Row

# Add a new row
df.loc["David"] = [29, "San Francisco"]
print(df)

Output:

         Age           City
Name                       
Alice     24       New York
Bob       27    Los Angeles
Charlie   22        Chicago
David     29  San Francisco
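
Assigning through loc adds one row at a time. To add several rows, one common approach is to build a second DataFrame and join the two with pd.concat. The extra rows below, including the "Eve" entry, are made up for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Age": [24, 27], "City": ["New York", "Los Angeles"]},
    index=pd.Index(["Alice", "Bob"], name="Name"),
)

# Hypothetical new rows with the same columns and index name
new_rows = pd.DataFrame(
    {"Age": [29, 31], "City": ["San Francisco", "Boston"]},
    index=pd.Index(["David", "Eve"], name="Name"),
)

# concat() stacks the rows and returns a new DataFrame
combined = pd.concat([df_demo, new_rows])
```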

4. Performing Operations on DataFrames

DataFrames support element-wise and scalar operations.

Example 12: Arithmetic Operations

# Add a constant to the "Age" column
df["Age"] = df["Age"] + 1
print(df)

Output:

         Age           City
Name                       
Alice     25       New York
Bob       28    Los Angeles
Charlie   23        Chicago
David     30  San Francisco
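
The example above applies a scalar to a column. Element-wise operations between two columns work the same way: values are combined row by row. A small sketch with made-up prices and quantities (all names here are hypothetical):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Price": [10.0, 4.0, 2.5], "Quantity": [3, 5, 8]},
    index=["Widget", "Gadget", "Gizmo"],
)

# Multiply the two columns row by row
df_demo["Total"] = df_demo["Price"] * df_demo["Quantity"]
print(df_demo)
```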

Example 13: Using Conditional Selection

# Select rows based on a condition
young_people = df[df["Age"] < 28]
print(young_people)

Output:

         Age      City
Name                  
Alice     25  New York
Charlie   23   Chicago
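
Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on a copy of the data at this point in the tutorial (df_demo is a hypothetical name):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "Age": [25, 28, 23, 30],
        "City": ["New York", "Los Angeles", "Chicago", "San Francisco"],
    },
    index=["Alice", "Bob", "Charlie", "David"],
)

# AND: both conditions must hold
mid_twenties = df_demo[(df_demo["Age"] >= 25) & (df_demo["Age"] < 30)]

# OR, with isin() for a membership test
selected = df_demo[(df_demo["Age"] > 29) | df_demo["City"].isin(["Chicago"])]
```

Here mid_twenties holds Alice and Bob, and selected holds Charlie and David.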

5. DataFrame Methods for Analysis

DataFrames have built-in methods for quick data analysis and summarization.

Example 14: Summary Statistics

# Summary statistics
print(df.describe())

Output:

             Age
count   4.000000
mean   26.500000
std     3.109126
min    23.000000
25%    24.500000
50%    26.500000
75%    28.500000
max    30.000000
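
The individual statistics are also available as separate methods, such as sum() and mean(). A short sketch using the same ages (df_demo is a hypothetical name):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Age": [25, 28, 23, 30]},
    index=["Alice", "Bob", "Charlie", "David"],
)

total_age = df_demo["Age"].sum()   # 106
mean_age = df_demo["Age"].mean()   # 26.5
oldest = df_demo["Age"].idxmax()   # "David", the index label of the largest value
```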

Example 15: Using value_counts() for Categorical Data

# Count occurrences of each city
city_counts = df["City"].value_counts()
print(city_counts)

Output:

New York         1
Los Angeles      1
Chicago          1
San Francisco    1
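Name: City, dtype: int64

Note: in pandas 2.0 and later, the resulting Series is named count instead of City, and City is shown as the index name on a header line above the values.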
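
Every city in the tutorial data occurs once, so the counts are not very informative. The sketch below uses a made-up Series with a repeated value and also shows normalize=True, which returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical data with a repeated city
cities = pd.Series(["Chicago", "New York", "Chicago", "Boston"], name="City")

counts = cities.value_counts()                # most frequent value first
shares = cities.value_counts(normalize=True)  # fractions that sum to 1

print(counts["Chicago"])  # 2
print(shares["Chicago"])  # 0.5
```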

6. Handling Missing Data in a DataFrame

Handling missing values is essential when working with real-world datasets.

Example 16: Detecting Missing Values

# Create a DataFrame with NaN values
data = {
    "Name": ["Alice", "Bob", None],
    "Age": [24, None, 22],
    "City": ["New York", "Los Angeles", None]
}
df = pd.DataFrame(data)
print(df.isnull())

Output:

    Name    Age   City
0  False  False  False
1  False   True  False
2   True  False   True
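
On larger datasets the full True/False table is hard to read. A common follow-up is to count the missing values per column or to flag the rows that contain any. A minimal sketch with the same data (the variable names are hypothetical):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [24, None, 22],
    "City": ["New York", "Los Angeles", None],
})

# True counts as 1, so sum() gives the number of missing values per column
missing_per_column = df_demo.isnull().sum()

# True for each row that has at least one missing value
rows_with_missing = df_demo.isnull().any(axis=1)
```

Each column has one missing value here, and rows 1 and 2 are flagged.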

Example 17: Filling Missing Values

# Fill missing values with a placeholder
df.fillna("Unknown", inplace=True)
print(df)

Output:

      Name      Age         City
0    Alice     24.0     New York
1      Bob  Unknown  Los Angeles
2  Unknown     22.0      Unknown
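
Filling a numeric column with a string, as above, leaves "Age" holding a mix of numbers and text, so it is no longer a numeric column. One alternative is to pass a dictionary to fillna() with a separate fill value per column. Using the column mean for "Age" is an illustrative choice, not a recommendation for every dataset.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [24, None, 22],
    "City": ["New York", "Los Angeles", None],
})

# One fill value per column; "Age" stays numeric
filled = df_demo.fillna({
    "Name": "Unknown",
    "Age": df_demo["Age"].mean(),  # mean of 24 and 22 is 23.0
    "City": "Unknown",
})
print(filled)
```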

Example 18: Dropping Rows with Missing Values

# Drop rows with NaN values
df = pd.DataFrame(data)  # Recreate the original DataFrame
df.dropna(inplace=True)
print(df)

Output:

    Name   Age      City
0  Alice  24.0  New York
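
By default dropna() removes every row with at least one missing value, which discarded two of the three rows here. The subset, how, and thresh parameters give finer control. A minimal sketch with the same data (the variable names are hypothetical):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [24, None, 22],
    "City": ["New York", "Los Angeles", None],
})

# Only look at "Age" when deciding which rows to drop
has_age = df_demo.dropna(subset=["Age"])

# Drop a row only if every value in it is missing
not_all_missing = df_demo.dropna(how="all")

# Keep rows with at least 2 non-missing values
at_least_two = df_demo.dropna(thresh=2)
```

With this data, has_age keeps rows 0 and 2, not_all_missing keeps all three rows, and at_least_two keeps rows 0 and 1.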

Summary of Key Pandas DataFrame Concepts

Concept                  Description
Creating a DataFrame     DataFrames can be created from dictionaries, lists, or Series.
Accessing Data           Use indexing, slicing, and loc/iloc for accessing rows and columns.
Modifying Data           DataFrames allow adding, updating, and deleting rows and columns.
Data Analysis Methods    Methods like describe(), value_counts(), and sum() are useful for analysis.
Handling Missing Data    Use fillna(), dropna(), and isnull() to manage NaN values.

Conclusion

In this tutorial, we explored the Pandas DataFrame object, covering:

  • Creating DataFrames from various data structures.
  • Accessing and modifying data within a DataFrame.
  • Applying conditional operations and performing data analysis.
  • Handling missing data effectively.
