Getting Started with Pandas in Python: A Beginner's Tutorial

Pandas is a powerful Python library for data analysis and data manipulation.

It provides data structures and tools that allow you to quickly explore, clean, and analyze data.

The two primary data structures in Pandas are Series (1D) and DataFrame (2D), both of which are highly versatile for handling and manipulating data.

In this tutorial, we will cover:

Installing Pandas
Basic Pandas Objects (Series and DataFrame)
Creating DataFrames and Series
Basic Data Exploration
Data Selection and Filtering
Modifying Data in DataFrames
Descriptive Statistics

Let's dive in and explore Pandas step-by-step!

1. Installing Pandas

To install Pandas, use the following command:

pip install pandas

Once installed, you can import it in your script or Jupyter Notebook as follows:

import pandas as pd
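Some methods have changed between pandas releases. For example, DataFrame.append() was deprecated in 1.4 and removed in 2.0. Because of this, it can help to check which version you have installed:

```python
import pandas as pd

# Print the installed pandas version string (for example "2.2.2")
print(pd.__version__)
```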

2. Basic Pandas Objects: Series and DataFrame

Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, etc.). It is similar to a single column in a table or Excel sheet.

import pandas as pd

# Creating a Series
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

Output:

0    10
1    20
2    30
3    40
dtype: int64

By default, pandas labels each value with an integer index starting at 0, shown in the left column.
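You can replace the default labels with your own by passing the index parameter to pd.Series. The variable name scores and the labels "a", "b", "c" are only illustrative:

```python
import pandas as pd

# Pass index= to use custom labels instead of 0, 1, 2, ...
scores = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(scores)

# A label can then be used to look up its value
print(scores["b"])  # 20
```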

DataFrame

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table.

# Creating a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
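The "labeled axes" mentioned above are available as attributes. This sketch rebuilds the same DataFrame and inspects its size and its row and column labels:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# shape is (number of rows, number of columns)
print(df.shape)          # (3, 3)
# columns holds the column labels, index holds the row labels
print(list(df.columns))  # ['Name', 'Age', 'City']
print(list(df.index))    # [0, 1, 2]
```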

3. Creating DataFrames and Series

You can create Pandas DataFrames and Series from lists, dictionaries, and even external data sources like CSV files.

Creating a Series from a Dictionary

# Creating a Series from a dictionary
data = {"A": 1, "B": 2, "C": 3}
series = pd.Series(data)
print(series)

Output:

A    1
B    2
C    3
dtype: int64
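The dictionary keys become the index labels. If you also pass index, pandas takes the values for those keys in the order you list them. Any label that is not a key in the dictionary gets NaN, and that turns the integer values into floats:

```python
import pandas as pd

data = {"A": 1, "B": 2, "C": 3}

# "C" and "A" are looked up in the dictionary; "D" is not a key,
# so its value is NaN and the Series dtype becomes float64
series = pd.Series(data, index=["C", "A", "D"])
print(series)
```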

Creating a DataFrame from a List of Dictionaries

# Creating a DataFrame from a list of dictionaries
data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
    {"Name": "Charlie", "Age": 35}
]
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
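The dictionaries do not all need the same keys. A key that is missing from one record produces NaN in that row. In this illustrative data, Bob's record has no "Age":

```python
import pandas as pd

# The second record is missing the "Age" key
data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob"},
]
df = pd.DataFrame(data)

# Bob's Age is NaN, so the Age column is stored as float64
print(df)
print(df.dtypes)
```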

Loading Data from a CSV File

You can also load data directly from a CSV file:

df = pd.read_csv("sample.csv")
print(df.head())  # Display the first few rows

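Explanation: read_csv() reads the CSV file (here a placeholder named sample.csv) into a DataFrame. head() returns the first five rows by default; pass a number, such as head(10), to see a different count.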
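If you don't have a CSV file available, you can try read_csv() on in-memory text. read_csv() accepts file-like objects, so io.StringIO can stand in for a file. The CSV content below is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content held in a string instead of a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# StringIO wraps the string so read_csv can read it like a file
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.dtypes)  # Age is parsed as an integer column
```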

4. Basic Data Exploration

Once you load data into a DataFrame, it’s useful to explore it before diving into analysis.

Example 1: Displaying the Top Rows

# Display the first few rows of the DataFrame
print(df.head())

Example 2: Viewing Data Types

# Get the data types of each column
print(df.dtypes)

Example 3: Displaying Basic Information

# Display basic info about the DataFrame
# (info() prints its report directly and returns None, so print() is not needed)
df.info()

Example 4: Getting Summary Statistics

# Display summary statistics for numerical columns
print(df.describe())

Explanation: For each numeric column, describe() reports the count, mean, standard deviation, minimum, 25th/50th/75th percentiles, and maximum. The 50% row is the median.
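Here is a small self-contained example on the Name/Age/City data from Section 2. Only "Age" is numeric, so it is the only column that describe() summarizes by default. Passing include="all" would also summarize the text columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Numeric columns only by default, so the result has a single "Age" column
summary = df.describe()
print(summary)

# The "50%" row holds the median
print(summary.loc["50%", "Age"])  # 30.0
```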

5. Data Selection and Filtering

Pandas offers several ways to select and filter data in DataFrames. The examples in this section use the Name/Age/City DataFrame created in Section 2.

Selecting Columns

You can select a single column by specifying its name in square brackets:

# Select a single column
print(df["Name"])

To select multiple columns, provide a list of column names:

# Select multiple columns
print(df[["Name", "Age"]])
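The number of brackets determines the return type. A single column name returns a Series. A list of names returns a DataFrame, even when the list holds only one name:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

name_col = df["Name"]      # single brackets: a Series
name_frame = df[["Name"]]  # double brackets: a one-column DataFrame

print(type(name_col).__name__)    # Series
print(type(name_frame).__name__)  # DataFrame
```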

Selecting Rows by Position with iloc

Use iloc to select rows by integer position, where 0 is the first row. A single position returns that row as a Series.

# Select the first row
print(df.iloc[0])
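iloc also accepts slices, negative positions, and a second position for the column. Slices exclude the stop position, just as they do with Python lists:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

first_two = df.iloc[0:2]  # rows at positions 0 and 1
last_row = df.iloc[-1]    # last row, as a Series
cell = df.iloc[1, 0]      # row position 1, column position 0 ("Name")

print(first_two)
print(last_row)
print(cell)  # Bob
```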

Selecting Rows by Condition

You can filter rows based on conditions:

# Filter rows where Age is greater than 30
filtered_df = df[df["Age"] > 30]
print(filtered_df)

Output:

      Name  Age         City
2  Charlie   35      Chicago
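You can combine conditions with & (and), | (or), and ~ (not). Python's and/or keywords do not work for this. Put each condition in parentheses, because & and | bind more tightly than comparison operators. For membership tests, isin() is often shorter than chaining several | conditions:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Older than 30 OR living in New York
older_or_ny = df[(df["Age"] > 30) | (df["City"] == "New York")]

# Age between 25 and 30, inclusive
between = df[(df["Age"] >= 25) & (df["Age"] <= 30)]

# City is one of the listed values
in_list = df[df["City"].isin(["Chicago", "Los Angeles"])]

print(older_or_ny)
print(between)
print(in_list)
```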

6. Modifying Data in DataFrames

DataFrames are mutable, which means you can modify values within them. This includes updating, adding, and removing columns and rows.

Adding a New Column

# Add a new column
df["Salary"] = [50000, 60000, 70000]
print(df)

Output:

      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   35      Chicago   70000
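A new column does not have to come from a list. A single value is repeated for every row, and an expression on existing columns is computed row by row. If you do pass a list, it must contain exactly one value per row. The column names Country and Age_in_10_years are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# A scalar is broadcast to every row
df["Country"] = "USA"

# A column computed from another column, element by element
df["Age_in_10_years"] = df["Age"] + 10

print(df)
```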

Updating Values in a Column

# Update values in the "Salary" column
df["Salary"] = df["Salary"] + 5000
print(df)

Output:

      Name  Age         City  Salary
0    Alice   25     New York   55000
1      Bob   30  Los Angeles   65000
2  Charlie   35      Chicago   75000
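To update only some rows, you can use loc. It selects by label, taking a row selector (such as a boolean mask) and a column name. The raise amount and the replacement city "Boston" are arbitrary examples:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
    "Salary": [55000, 65000, 75000],
})

# Add 2000 to Salary only for rows where Age is greater than 28
mask = df["Age"] > 28
df.loc[mask, "Salary"] = df.loc[mask, "Salary"] + 2000

# Change a single cell: row label 0, column "City"
df.loc[0, "City"] = "Boston"

print(df)
```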

Deleting a Column

# Delete a column
df.drop(columns=["Salary"], inplace=True)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
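drop() removes rows too, using the index parameter. Without inplace=True it returns a new DataFrame, so you reassign the result. The remaining rows keep their original labels, and reset_index(drop=True) renumbers them from 0:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Remove the row labeled 1 (Bob); the remaining labels are 0 and 2
df = df.drop(index=1)
print(df)

# Renumber the rows 0, 1, ... and discard the old labels
df = df.reset_index(drop=True)
print(df)
```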

Adding a New Row

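DataFrame.append() was removed in pandas 2.0. Instead, wrap the new row in a one-row DataFrame and join it to the existing one with pd.concat(). Setting ignore_index=True renumbers the rows.

# Add a new row
new_row = {"Name": "David", "Age": 28, "City": "San Francisco"}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)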
print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   28  San Francisco
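When the DataFrame has the default 0, 1, 2, ... index, you can also add a single row by assigning to the next unused label with loc. The values are listed in column order. This sketch assumes that default index, because len(df) is only guaranteed to be an unused label in that case:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# len(df) is 3, which is the next label after 0, 1, 2
df.loc[len(df)] = ["David", 28, "San Francisco"]
print(df)
```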

7. Descriptive Statistics

Pandas provides many functions for calculating descriptive statistics on DataFrames.

Example 1: Mean, Median, and Sum

# Calculate the mean, median, and sum of the "Age" column
mean_age = df["Age"].mean()
median_age = df["Age"].median()
sum_age = df["Age"].sum()

print("Mean Age:", mean_age)
print("Median Age:", median_age)
print("Sum of Ages:", sum_age)
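agg() can compute several statistics in one call. This sketch uses the four rows present at the end of Section 6 (Alice, Bob, Charlie, and David). For those ages (25, 30, 35, 28), the mean is 29.5, the median is 29.0, and the sum is 118:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "Los Angeles", "Chicago", "San Francisco"],
})

# Returns a Series indexed by the statistic names
stats = df["Age"].agg(["mean", "median", "sum"])
print(stats)
```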

Example 2: Grouping Data with groupby()

You can use groupby() to group data based on a specific column and perform calculations.

# Calculate the mean age by City
grouped_df = df.groupby("City")["Age"].mean()
print(grouped_df)

Output:

City
Chicago          35.0
Los Angeles      30.0
New York         25.0
San Francisco    28.0
Name: Age, dtype: float64
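Since every city above appears only once, each group has a single row. Grouping is more useful when values repeat. The data below is invented so that cities repeat. It also shows agg() computing several statistics per group:

```python
import pandas as pd

# Hypothetical data in which cities repeat
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 28, 40],
    "City": ["New York", "Chicago", "Chicago", "New York", "Chicago"],
})

# One row per city, one column per statistic
summary = df.groupby("City")["Age"].agg(["count", "mean", "max"])
print(summary)
```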

Example 3: Value Counts

To count occurrences of unique values in a column, use value_counts().

# Count occurrences of each city
city_counts = df["City"].value_counts()
print(city_counts)

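Output (pandas 2.0 and later; earlier versions omit the City header line and print Name: City instead of Name: count):

City
New York         1
Los Angeles      1
Chicago          1
San Francisco    1
Name: count, dtype: int64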
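With normalize=True, value_counts() returns proportions instead of counts. The city list here is invented so that the shares are easy to check:

```python
import pandas as pd

# Hypothetical column with a repeated value
cities = pd.Series(["Chicago", "New York", "Chicago", "Chicago"])

counts = cities.value_counts()                # Chicago 3, New York 1
shares = cities.value_counts(normalize=True)  # Chicago 0.75, New York 0.25

print(counts)
print(shares)
```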

Summary of Key Concepts in Pandas

Concept                  Description
Series                   A 1D labeled array, similar to a single column in Excel.
DataFrame                A 2D labeled data structure, like a table or spreadsheet.
Basic Exploration        Use head(), info(), and describe() to understand data at a glance.
Selection and Filtering  Select data with column names, iloc, and boolean conditions.
Modifying Data           Add, update, or delete rows and columns in a DataFrame.
Descriptive Statistics   Use mean(), median(), sum(), groupby(), and value_counts() to summarize data.

Conclusion

In this tutorial, we explored the basics of Pandas in Python, covering:

Installing Pandas and creating basic data structures (Series and DataFrames).
Loading and exploring data, modifying values, and filtering data.
Performing statistical operations and grouping data for analysis.
