A pandas DataFrame is a two-dimensional labeled data structure, like a table with rows and columns. Its size and values are mutable, i.e., they can be modified after creation.
A pandas DataFrame can be created in multiple ways. This article shows several of them.
Syntax
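The easiest way to create a pandas DataFrame is with its constructor. The DataFrame constructor takes several optional parameters that specify the characteristics of the DataFrame: data holds the values, index and columns set the row and column labels, dtype forces a data type, and copy controls whether the input data is copied.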
# DataFrame constructor syntax
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
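As a small illustration of the main constructor parameters, the sketch below passes data, index, columns and dtype together. The numbers and labels are made up for the example.

```python
import pandas as pd

# Hypothetical data: two rows, two columns
data = [[1, 2], [3, 4]]

# index labels the rows, columns labels the columns,
# and dtype=float converts every value to float
df = pd.DataFrame(data, index=["r1", "r2"], columns=["a", "b"], dtype=float)

print(df)
print(df.dtypes)
```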
Create a DataFrame from Lists
# Create pandas DataFrame from List
import pandas as pd

capitals = [
    ["Tokyo", "Japan", "37732000"],
    ["Seoul", "South Korea", "23016000"],
]
df = pd.DataFrame(capitals)
print(df)
When you run this, you will see the following:
       0            1         2
0  Tokyo        Japan  37732000
1  Seoul  South Korea  23016000
To give the columns names and the rows labels, pass the columns and index parameters:
# Create pandas DataFrame from List with column names and row labels
import pandas as pd

capitals = [
    ["Tokyo", "Japan", "37732000"],
    ["Seoul", "South Korea", "23016000"],
]
column_names = ["Capital", "Country", "Population"]
row_label = ["a", "b"]
df = pd.DataFrame(capitals, columns=column_names, index=row_label)
print(df)
When you run this, you will see the following:
   Capital      Country  Population
a    Tokyo        Japan    37732000
b    Seoul  South Korea    23016000
By default, pandas infers the data type of each column from the data.
You can use df.dtypes to return the data type of each column:
Capital       object
Country       object
Population    object
dtype: object
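Population shows up as object here because its values were given as strings. If the values are numbers instead, pandas infers a numeric type. A minimal sketch, using illustrative numeric values:

```python
import pandas as pd

# Same cities, but Population given as Python ints instead of strings
df = pd.DataFrame({
    "Capital": ["Tokyo", "Seoul"],
    "Population": [37732000, 23016000],
    "Area_km2": [2194.1, 605.2],  # illustrative float values
})

# Population is inferred as an integer type, Area_km2 as a float type
print(df.dtypes)
```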
You can change the data type of a column with astype(). In our example, let's make Population a float:
# Create pandas DataFrame from List and change the column types
import pandas as pd

capitals = [
    ["Tokyo", "Japan", "37732000"],
    ["Seoul", "South Korea", "23016000"],
]
column_names = ["Capital", "Country", "Population"]
row_label = ["a", "b"]
df = pd.DataFrame(capitals, columns=column_names, index=row_label)

# Map each column name to its target type
types = {'Capital': str, 'Country': str, 'Population': float}
df = df.astype(types)
print(df.dtypes)
print(df)
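astype() also accepts a single type, which is handy for converting just one column and leaving the others as they are. A sketch of that variant, built from the same data:

```python
import pandas as pd

df = pd.DataFrame(
    [["Tokyo", "Japan", "37732000"], ["Seoul", "South Korea", "23016000"]],
    columns=["Capital", "Country", "Population"],
    index=["a", "b"],
)

# Convert only the Population column from strings to float
df["Population"] = df["Population"].astype(float)

print(df.dtypes)
print(df)
```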
Create a DataFrame from a Dict (Dictionary)
Another way to create a pandas DataFrame is from a Python dict (dictionary). Each key becomes a column name, and the list stored under that key becomes the column's values.
import pandas as pd

# Create DataFrame from Dict
capitals = {
    'Capital': ["Tokyo", "Seoul"],
    'Country': ["Japan", "South Korea"],
    'Population': ['37732000', '23016000'],
}
df = pd.DataFrame(capitals)
print(df)
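The dict above is column-oriented: each key holds a whole column. If your data is row-oriented instead, the constructor also accepts a list of dicts, one per row, and uses the keys as column names. A minimal sketch with the same cities:

```python
import pandas as pd

# One dict per row; the keys become the column names
rows = [
    {"Capital": "Tokyo", "Country": "Japan", "Population": 37732000},
    {"Capital": "Seoul", "Country": "South Korea", "Population": 23016000},
]

df = pd.DataFrame(rows)
print(df)
```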
Create an Empty DataFrame
Sometimes you need to create an empty pandas DataFrame, with or without columns.
# Create an empty pandas DataFrame
import pandas as pd

df = pd.DataFrame()
print(df)
You can also create an empty DataFrame with just column names but no data.
import pandas as pd

df = pd.DataFrame(columns=["Capital", "Country", "Population"])
print(df)
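To check whether a DataFrame has any data, you can use its empty attribute. A DataFrame that has column names but no rows still counts as empty, as this short sketch shows:

```python
import pandas as pd

df = pd.DataFrame(columns=["Capital", "Country", "Population"])

print(df.empty)          # True: there are no rows
print(list(df.columns))  # the column names are still present
print(len(df))           # number of rows: 0
```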
Create a DataFrame from a CSV File
A common task is reading the contents of a CSV file into a DataFrame.
In pandas, you do this with the pandas.read_csv() function, which returns a DataFrame containing the contents of the CSV file.
import pandas as pd

# Create DataFrame from CSV file
df = pd.read_csv('testdata.csv')
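The contents of testdata.csv aren't shown, so here is a self-contained sketch that reads CSV text from memory through io.StringIO. read_csv accepts a file path or any file-like object, so the result matches reading a file with the same contents. The CSV text below is made up for the example.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for testdata.csv
csv_text = """Capital,Country,Population
Tokyo,Japan,37732000
Seoul,South Korea,23016000
"""

# By default, the first line is used as the header row
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```

Unlike the list example, read_csv parses the Population column as numbers, so no astype() call is needed here.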
Create a DataFrame from a Series
To create a DataFrame from a Series, pass the Series as an argument to the DataFrame() constructor.
import pandas as pd

# Initialize data as a Series
d = pd.Series(['Tokyo', 'Berlin', 'Paris'])

# Create the DataFrame
df = pd.DataFrame(d)

# Print data
print(df)
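Because the Series above has no name, the resulting column is labeled 0. Giving each Series a name (the names here are chosen for the example) turns it into the column label, and a dict of Series builds several columns at once, aligned on their index:

```python
import pandas as pd

capitals = pd.Series(["Tokyo", "Berlin", "Paris"], name="Capital")
countries = pd.Series(["Japan", "Germany", "France"], name="Country")

# A named Series becomes a single column with that name
df_one = pd.DataFrame(capitals)

# A dict of Series: keys become column names, rows align on the index
df_two = pd.DataFrame({"Capital": capitals, "Country": countries})

print(df_one)
print(df_two)
```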
Create a DataFrame Using the zip() Function
zip() pairs up the elements of two lists, and list(zip(...)) turns the result into a list of tuples, one per row. Pass that list to pd.DataFrame() to create the DataFrame.
import pandas as pd

# List 1: capitals
capitals = ['Tokyo', 'Berlin', 'Paris']
# List 2: countries
countries = ['Japan', 'Germany', 'France']

# Merge the two lists into a list of tuples using zip()
list_of_tuples = list(zip(capitals, countries))

# Convert the list of tuples into a pandas DataFrame
df = pd.DataFrame(list_of_tuples, columns=['Capital', 'Country'])

# Print data
print(df)
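The same table can also be built without zip() by passing the lists as a dict of columns. A quick sketch comparing the two approaches:

```python
import pandas as pd

capitals = ["Tokyo", "Berlin", "Paris"]
countries = ["Japan", "Germany", "France"]

# Row-wise: zip pairs the i-th elements into tuples, one tuple per row
df_zip = pd.DataFrame(list(zip(capitals, countries)), columns=["Capital", "Country"])

# Column-wise: each dict key is a column, each list is that column's data
df_dict = pd.DataFrame({"Capital": capitals, "Country": countries})

print(df_zip.equals(df_dict))
```

One practical difference: zip() silently stops at the shortest list, while the dict form raises a ValueError if the lists have different lengths.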