
Create a Pandas DataFrame With Examples


A DataFrame in pandas is a two-dimensional labeled data structure, like a table with rows and columns. Both its size and its values are mutable, i.e., they can be modified after the DataFrame is created.

A pandas DataFrame can be created in several ways. This article shows the most common ones.

Syntax

The most direct way to create a pandas DataFrame is with its constructor. All of the constructor's parameters are optional: data supplies the values, index and columns supply the row and column labels, dtype forces a data type, and copy controls whether the input data is copied.

# DataFrame constructor syntax
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
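To see how these parameters fit together, here is a minimal sketch that passes each one explicitly. The NumPy array and the labels are illustrative sample data: dtype="float64" casts every value to float, and copy=True means later changes to the source array do not reach the DataFrame.

```python
import numpy as np
import pandas as pd

# Illustrative 2-D sample data
data = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(
    data=data,
    index=["r1", "r2"],    # row labels
    columns=["x", "y"],    # column labels
    dtype="float64",       # cast all values to float64
    copy=True,             # copy the input instead of sharing its memory
)

# Changing the source array afterwards does not affect the DataFrame
data[0, 0] = 100
print(df)
```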

 

Create a DataFrame from Lists

# Create pandas DataFrame from a list of lists (one inner list per row)
import pandas as pd
capitals = [["Tokyo", "Japan", "37732000"],
            ["Seoul", "South Korea", "23016000"]]
df = pd.DataFrame(capitals)
print(df)

Running this prints the following:

       0            1         2
0  Tokyo        Japan  37732000
1  Seoul  South Korea  23016000
Because we did not supply an index or column labels, the DataFrame labels both rows and columns with integer sequences starting at 0.
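To see these default labels directly, you can inspect the index and columns attributes. A small sketch using the same data:

```python
import pandas as pd

capitals = [["Tokyo", "Japan", "37732000"],
            ["Seoul", "South Korea", "23016000"]]
df = pd.DataFrame(capitals)

# With no labels supplied, both axes get a RangeIndex starting at 0
print(df.index)    # row labels
print(df.columns)  # column labels
```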
Next, we use the columns and index parameters to supply column names and custom row labels:
# Create pandas DataFrame from a list, with column names and row labels
import pandas as pd
capitals = [["Tokyo", "Japan", "37732000"],
            ["Seoul", "South Korea", "23016000"]]
column_names = ["Capital", "Country", "Population"]
row_label = ["a", "b"]
df = pd.DataFrame(capitals, columns=column_names, index=row_label)
print(df)

Running this prints the following:

   Capital      Country  Population
a    Tokyo        Japan    37732000
b    Seoul  South Korea    23016000
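The same labels can also be attached after the DataFrame exists, by assigning to its columns and index attributes. A short sketch; each list must match the length of the axis it labels, otherwise pandas raises a ValueError:

```python
import pandas as pd

capitals = [["Tokyo", "Japan", "37732000"],
            ["Seoul", "South Korea", "23016000"]]
df = pd.DataFrame(capitals)

# Replace the default integer labels after construction
df.columns = ["Capital", "Country", "Population"]
df.index = ["a", "b"]
print(df)
```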

By default, pandas infers the data type of each column from the data. Because the population figures were written as strings (in quotes), all three columns here are stored as object.

Use df.dtypes to see the data type of each column:

Capital       object
Country       object
Population    object
dtype: object
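The Population column is reported as object because its values were written as strings. If the same numbers are given as Python integers instead, pandas infers an integer dtype, as this sketch shows:

```python
import pandas as pd

# Same data, but the population figures are ints instead of strings
capitals = [["Tokyo", "Japan", 37732000],
            ["Seoul", "South Korea", 23016000]]
df = pd.DataFrame(capitals, columns=["Capital", "Country", "Population"])
print(df.dtypes)
```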

You can change the data type of a column with DataFrame.astype(). In this example, we convert Population to float:

# Change the data type of the Population column with astype()
import pandas as pd
capitals = [["Tokyo", "Japan", "37732000"],
            ["Seoul", "South Korea", "23016000"]]
column_names = ["Capital", "Country", "Population"]
row_label = ["a", "b"]
df = pd.DataFrame(capitals, columns=column_names, index=row_label)
# Map each column name to its target type
types = {'Capital': str, 'Country': str, 'Population': float}
df = df.astype(types)
print(df.dtypes)
print(df)
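astype() is one option. When a column holds numbers stored as text, pd.to_numeric() is another: it converts a single column and lets pandas pick the numeric type. A minimal sketch on the same data:

```python
import pandas as pd

capitals = [["Tokyo", "Japan", "37732000"],
            ["Seoul", "South Korea", "23016000"]]
df = pd.DataFrame(capitals, columns=["Capital", "Country", "Population"],
                  index=["a", "b"])

# Convert only the Population column from strings to numbers
df["Population"] = pd.to_numeric(df["Population"])
print(df.dtypes)
```

Passing errors="coerce" to pd.to_numeric() turns values that cannot be parsed into NaN instead of raising an error.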

 

Create a DataFrame from a Dict (dictionary)

Another way to create a pandas DataFrame is from a Python dict. Each key becomes a column name, and each value (here, a list) supplies that column's data.

import pandas as pd
# Create DataFrame from Dict
capitals = {
    'Capital': ["Tokyo", "Seoul"],
    'Country': ["Japan", "South Korea"],
    'Population': ['37732000', '23016000']
}
df = pd.DataFrame(capitals)
print(df)
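The constructor's other parameters still apply when the data is a dict. In this sketch, columns= selects and orders which keys to use, and index= supplies row labels:

```python
import pandas as pd

capitals = {
    'Capital': ["Tokyo", "Seoul"],
    'Country': ["Japan", "South Korea"],
    'Population': ['37732000', '23016000'],
}

# Keep only Country and Capital, in that order, with custom row labels
df = pd.DataFrame(capitals, columns=['Country', 'Capital'], index=['a', 'b'])
print(df)
```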

 

Create an empty DataFrame in pandas

Sometimes you need to create an empty pandas DataFrame, with or without column names.

# Create an empty pandas DataFrame
import pandas as pd
df = pd.DataFrame()
print(df)

You can also create an empty DataFrame with just column names but no data.

import pandas as pd
df = pd.DataFrame(columns = ["Capital","Country","Population"])
print(df)
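An empty DataFrame with column names can be filled in afterwards. A minimal sketch that adds rows with .loc, using the current row count as each new row label:

```python
import pandas as pd

df = pd.DataFrame(columns=["Capital", "Country", "Population"])
print(df.empty)   # True: no rows yet

# Setting a label that does not exist yet appends a new row
df.loc[len(df)] = ["Tokyo", "Japan", "37732000"]
df.loc[len(df)] = ["Seoul", "South Korea", "23016000"]
print(df)
```

Adding rows one at a time is fine for a handful of rows; for many rows it is usually faster to collect them in a list and build the DataFrame once.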

 

Create DataFrame From CSV File

A common task is reading the contents of a CSV file into a DataFrame.

In pandas, this is done with the pandas.read_csv() function, which returns a DataFrame holding the contents of the CSV file.

import pandas as pd
# Create DataFrame from CSV file
df = pd.read_csv('testdata.csv')
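read_csv() accepts a file path or any file-like object. So that this sketch runs without a testdata.csv on disk, it reads hypothetical CSV contents from an in-memory io.StringIO buffer, and uses index_col to make the Capital column the row labels:

```python
import io
import pandas as pd

# Hypothetical CSV contents, held in memory instead of a file
csv_text = """Capital,Country,Population
Tokyo,Japan,37732000
Seoul,South Korea,23016000
"""

# index_col="Capital" uses that column as the row labels
df = pd.read_csv(io.StringIO(csv_text), index_col="Capital")
print(df)
print(df.dtypes)
```

Unlike the quoted strings in the earlier examples, read_csv() parses the unquoted population figures as integers.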

 

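Create a DataFrame from a Series

To create a DataFrame from a Series, pass the Series as the data argument to the DataFrame() constructor. The result has a single column; because this Series has no name, that column is labeled 0.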

import pandas as pd

# Create a Series of capital cities
d = pd.Series(['Tokyo', 'Berlin', 'Paris'])

# Create a DataFrame with the Series as its only column
df = pd.DataFrame(d)

# Print the DataFrame
print(df)
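If the Series has a name, that name becomes the column label. Several Series can also be combined by passing them in a dict, where pandas aligns them on their index. A short sketch:

```python
import pandas as pd

# Named Series: the name becomes the column label
capitals = pd.Series(['Tokyo', 'Berlin', 'Paris'], name='Capital')
countries = pd.Series(['Japan', 'Germany', 'France'], name='Country')

df_one = pd.DataFrame(capitals)   # one column, labeled 'Capital'

# Combine several Series as columns via a dict
df_two = pd.DataFrame({'Capital': capitals, 'Country': countries})
print(df_one)
print(df_two)
```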

 

Create a DataFrame using the zip() function

zip() pairs up the elements of two lists, and list(zip(...)) turns those pairs into a list of tuples, one tuple per row. Pass that list to pd.DataFrame() together with the column names.

import pandas as pd

# List 1: capital cities
Capital = ['Tokyo', 'Berlin', 'Paris']

# List 2: the matching countries
Country = ['Japan', 'Germany', 'France']

# zip() pairs the lists element by element; list() turns the
# pairs into a list of tuples, one tuple per row
list_of_tuples = list(zip(Capital, Country))

# Convert the list of tuples into a pandas DataFrame
df = pd.DataFrame(list_of_tuples,
                  columns=['Capital', 'Country'])

# Print the DataFrame
print(df)
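For comparison, the same DataFrame can be built without zip() by passing the lists as columns in a dict. This sketch builds it both ways and checks that the results match:

```python
import pandas as pd

Capital = ['Tokyo', 'Berlin', 'Paris']
Country = ['Japan', 'Germany', 'France']

# Approach 1: zip the lists into rows (a list of tuples)
df_zip = pd.DataFrame(list(zip(Capital, Country)), columns=['Capital', 'Country'])

# Approach 2: pass the lists as columns in a dict
df_dict = pd.DataFrame({'Capital': Capital, 'Country': Country})

# Raises AssertionError if the two DataFrames differ
pd.testing.assert_frame_equal(df_zip, df_dict)
print(df_zip)
```

One difference: zip() stops at the shortest list, so extra items in a longer list are silently dropped, whereas the dict approach raises a ValueError when the lists have different lengths.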

 
