Pandas read_csv() function with Examples

In this article we will look at using the pandas read_csv() function to read a CSV file into a DataFrame.

Syntax

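The following is the syntax of the read_csv() function, as documented for pandas 1.x. Several of these parameters (squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines) were removed in pandas 2.0.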

 

pandas.read_csv(filepath_or_buffer, sep=NoDefault.no_default, delimiter=None, header='infer', names=NoDefault.no_default, index_col=None, usecols=None, squeeze=None, prefix=NoDefault.no_default, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=None, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, error_bad_lines=None, warn_bad_lines=None, on_bad_lines=None, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options=None)

 

The list of parameters can be bewildering, and you will never use many of them.

Read a CSV file into DataFrame

In this example we will load a population CSV file using the read_csv() function.

# Import pandas
import pandas as pd

# Read CSV file into DataFrame
df = pd.read_csv('population.csv')
print(df)

 

When you run this you will see the following output (I ran it in Visual Studio Code):

Rank CCA3 Country/Territory Capital Continent 2022 Population Area (km²) Density (per km²) Growth Rate World Population Percentage
0 36 AFG Afghanistan Kabul Asia 41128771 652230 63.0587 1.0257 0.52
1 138 ALB Albania Tirana Europe 2842321 28748 98.8702 0.9957 0.04
2 34 DZA Algeria Algiers Africa 44903225 2381741 18.8531 1.0164 0.56
3 213 ASM American Samoa Pago Pago Oceania 44273 199 222.4774 0.9831 0.00
4 203 AND Andorra Andorra la Vella Europe 79824 468 170.5641 1.0100 0.00
.. ... ... ... ... ... ... ... ... ... ...
229 226 WLF Wallis and Futuna Mata-Utu Oceania 11572 142 81.4930 0.9953 0.00
230 172 ESH Western Sahara El Aaiún Africa 575986 266000 2.1654 1.0184 0.01
231 46 YEM Yemen Sanaa Asia 33696614 527968 63.8232 1.0217 0.42
232 63 ZMB Zambia Lusaka Africa 20017675 752612 26.5976 1.0280 0.25
233 74 ZWE Zimbabwe Harare Africa 16320537 390757 41.7665 1.0204 0.20

By default, read_csv() uses the first row of the CSV file as the column names and creates an integer index that starts at zero and increases by one for each row.
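The default behaviour can be checked on a small in-memory sample. This is only a sketch: the three-line CSV text below is a made-up stand-in for population.csv, and io.StringIO is used so that no file is needed.

```python
import io
import pandas as pd

# Hypothetical sample standing in for population.csv
csv_text = "Rank,CCA3,Capital\n36,AFG,Kabul\n138,ALB,Tirana\n"

df = pd.read_csv(io.StringIO(csv_text))

# The first row of the text became the column names
print(list(df.columns))

# The index was generated automatically and starts at 0
print(df.index)
```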

You can use either the sep or the delimiter parameter to specify the column separator. The default is a comma, which is also what the sample file uses.
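As a rough illustration of sep, the sketch below reads semicolon-separated text. The data is invented for the example and is not part of the sample file.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data (not from population.csv)
semicolon_text = (
    "Country;Capital;Continent\n"
    "Albania;Tirana;Europe\n"
    "Yemen;Sanaa;Asia\n"
)

# Tell read_csv that columns are separated by ';' instead of ','
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
print(df_semicolon)
```

Without sep=";" each line would be read as a single column, because no commas are present.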

Set Column as Index

You can set a specific column as the index using the index_col parameter.

This parameter accepts an int, a str, a sequence of int / str, or False. It is optional and defaults to None.

I like the look of Rank, so let's use that.

import pandas as pd

# Read CSV file into DataFrame
df = pd.read_csv('population.csv', index_col='Rank')
print(df)

Run this and you will see the following:

CCA3 Country/Territory Capital Continent 2022 Population Area (km²) Density (per km²) Growth Rate World Population Percentage
Rank
36 AFG Afghanistan Kabul Asia 41128771 652230 63.0587 1.0257 0.52
138 ALB Albania Tirana Europe 2842321 28748 98.8702 0.9957 0.04
34 DZA Algeria Algiers Africa 44903225 2381741 18.8531 1.0164 0.56
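index_col also accepts a column position or a list of columns. The sketch below uses a small made-up sample with a subset of the columns; passing a list produces a MultiIndex.

```python
import io
import pandas as pd

# Hypothetical sample with a few of the population.csv columns
csv_text = (
    "Rank,CCA3,Continent,Capital\n"
    "36,AFG,Asia,Kabul\n"
    "138,ALB,Europe,Tirana\n"
)

# An integer selects the index column by position (0 is the first column)
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(by_position.index.name)

# A list of names builds a MultiIndex from several columns
multi = pd.read_csv(io.StringIO(csv_text), index_col=["Continent", "CCA3"])
print(multi.index.names)
```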

Skip Rows

Sometimes you may need to skip rows at the start or the end of the file. You can use the skiprows and skipfooter parameters for this.

import pandas as pd

# Skip the first 5 lines of the file (the header row and the first 4 records).
# Because the header row is skipped, header=None stops pandas from
# treating the next record as column names.
df = pd.read_csv('population.csv', header=None, skiprows=5)
print(df)
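The example above only covers skiprows with an integer. As a sketch of the other options, the snippet below uses skipfooter and a list of line numbers on a made-up sample with an invented footer line. engine="python" is set explicitly because the default C engine does not support skipfooter.

```python
import io
import pandas as pd

# Hypothetical sample with a trailing footer line
csv_text = (
    "Rank,Country\n"
    "36,Afghanistan\n"
    "138,Albania\n"
    "34,Algeria\n"
    "Source: hypothetical footer line\n"
)

# Drop the last line of the file
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
print(trimmed)

# A list gives the 0-based line numbers to skip; line 0 (the header) is kept
no_albania = pd.read_csv(
    io.StringIO(csv_text), skiprows=[2], skipfooter=1, engine="python"
)
print(no_albania)
```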

 

Ignore Column Names

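By default, the first row is used as a header and assigned as the DataFrame column names.

If the file has no header row, or you want to supply your own column names, use header=None together with the names parameter. If the file does contain a header row, as the sample file does, also skip it with skiprows=1 so that it is not read as a data record.

If you use header=None without specifying names, the columns are labelled with integers (0, 1, 2, and so on).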

import pandas as pd

# One name per column in the file (10 columns)
columns = ['rank', 'code', 'country', 'capital', 'continent', 'population', 'area', 'density', 'growth', 'percentage']

# Skip the original header row and use our own column names instead
df = pd.read_csv('population.csv', header=None, names=columns, skiprows=1)
print(df)

Running this gives output like the following (first rows shown):

rank code country capital continent population area density growth percentage
0 36 AFG Afghanistan Kabul Asia 41128771 652230 63.0587 1.0257 0.52
1 138 ALB Albania Tirana Europe 2842321 28748 98.8702 0.9957 0.04
2 34 DZA Algeria Algiers Africa 44903225 2381741 18.8531 1.0164 0.56
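An alternative way to replace an existing header is header=0 combined with names, which tells pandas that line 0 is a header and should be discarded in favour of the supplied names. The sketch below uses a made-up three-column sample, and also shows what header=None does on its own.

```python
import io
import pandas as pd

# Hypothetical sample standing in for population.csv
csv_text = "Rank,CCA3,Capital\n36,AFG,Kabul\n138,ALB,Tirana\n"
new_names = ["rank", "code", "capital"]

# header=0 marks the first line as a header; names replaces it
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)
print(replaced)

# header=None alone keeps the header line as a data row
# and labels the columns 0, 1, 2
unnamed = pd.read_csv(io.StringIO(csv_text), header=None)
print(unnamed)
```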

Load only Certain Columns

You can use the usecols parameter to select which columns to load from the CSV file.

It accepts a list of column names, a list of integer column positions, or a callable that is evaluated against each column name.

import pandas as pd

# Read CSV file into DataFrame
df = pd.read_csv('population.csv', usecols=['Country/Territory', 'Capital', 'Continent'])
print(df)

When you run this you will see the following

Country/Territory Capital Continent
0 Afghanistan Kabul Asia
1 Albania Tirana Europe
2 Algeria Algiers Africa
3 American Samoa Pago Pago Oceania
4 Andorra Andorra la Vella Europe
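usecols can also be given integer positions or a callable. The following sketch runs on a small made-up sample with a few of the sample file's columns; the lambda rule is only an illustration.

```python
import io
import pandas as pd

# Hypothetical sample with a few of the population.csv columns
csv_text = (
    "Rank,Country/Territory,Capital,Continent\n"
    "36,Afghanistan,Kabul,Asia\n"
    "138,Albania,Tirana,Europe\n"
)

# Integer positions: load the 2nd and 4th columns only
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3])
print(by_position)

# A callable receives each column name and returns True to keep it
by_rule = pd.read_csv(
    io.StringIO(csv_text), usecols=lambda name: name.startswith("C")
)
print(by_rule)
```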

 

References

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

https://github.com/programmershelp/maxpython/tree/main/pandas/readcsv
