Reading a csv file in python

In this article, we will learn how to read a CSV file in Python.

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields.

We will look at Python's built-in csv module first.

CSV Module

Here are some of the functions and constants that are available in this module:

  • csv.field_size_limit – return maximum field size
  • csv.get_dialect – get the dialect which is associated with the name
  • csv.list_dialects – show all registered dialects
  • csv.reader – read data from a csv file
  • csv.register_dialect – associate dialect with name
  • csv.writer – write data to a csv file
  • csv.unregister_dialect – delete the dialect associated with the name from the dialect registry
  • csv.QUOTE_ALL – Quote everything, regardless of type.
  • csv.QUOTE_MINIMAL – Quote fields with special characters
  • csv.QUOTE_NONNUMERIC – Quote all fields that aren’t numeric values
  • csv.QUOTE_NONE – Don’t quote anything in output
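
As a rough illustration of how the dialect registry and the quoting constants listed above behave, here is a minimal sketch. The dialect name "semicolon", the helper write_row and the sample row are made up for this example, and the rows are written to an in-memory buffer rather than a real file.

```python
import csv
import io

# Register a dialect under a hypothetical name, "semicolon".
csv.register_dialect("semicolon", delimiter=";", quoting=csv.QUOTE_MINIMAL)
dialect_names = csv.list_dialects()
print("semicolon" in dialect_names)

# Look the dialect up again and inspect its delimiter.
print(csv.get_dialect("semicolon").delimiter)


def write_row(row, quoting):
    """Write one row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue()


# Illustrative row: two strings (one containing a comma) and a float.
sample_row = ["1", "Paris, France", 2.5]
quote_all = write_row(sample_row, csv.QUOTE_ALL)
quote_minimal = write_row(sample_row, csv.QUOTE_MINIMAL)
quote_nonnumeric = write_row(sample_row, csv.QUOTE_NONNUMERIC)
print(quote_all, quote_minimal, quote_nonnumeric, sep="")

# Remove the dialect again so the registry is left as we found it.
csv.unregister_dialect("semicolon")
```

With QUOTE_MINIMAL only the field that contains a comma is quoted, while QUOTE_NONNUMERIC quotes both strings and leaves the float bare.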

Reading CSV files

First, the CSV file is opened with the built-in open() function in read mode, which returns a file object. That file object is then passed to the reader() function of the csv module, which returns a reader object that iterates over the lines of the CSV file and yields each one as a list of strings.

Our CSV file contains the following

Entry, Country, Capital
1, France, Paris
2, Germany, Berlin
3, Spain, Madrid
4, Italy, Rome
5, UK, London

Let's look at an example:

# load csv module
import csv

# open file for reading; newline='' lets the csv module handle line endings itself
with open('countries.csv', newline='') as csvDataFile:

    # read file as csv file 
    csvReader = csv.reader(csvDataFile)

    # for every row, print the row
    for row in csvReader:
        print(row)

When run you should see something like this

Python 3.7.9 (bundled)
>>> %Run readcsv.py
['Entry', ' Country', ' Capital']
['1', ' France', ' Paris']
['2', ' Germany', ' Berlin']
['3', ' Spain', ' Madrid']
['4', ' Italy', ' Rome']
['5', ' UK', ' London']
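
Notice the leading space in values such as ' France': the sample file has a space after each comma, and the reader keeps it. One way to drop it is the skipinitialspace option, sketched below. To keep the example self-contained it reads from an in-memory string that stands in for the first rows of countries.csv.

```python
import csv
import io

# In-memory stand-in for the first rows of countries.csv.
sample = "Entry, Country, Capital\n1, France, Paris\n2, Germany, Berlin\n"

# skipinitialspace=True ignores whitespace that directly follows a delimiter.
rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

for row in rows:
    print(row)
```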

You can read every row in the file. Each row is returned as a list of strings, so individual cells can be accessed by index. Inside the loop, to print the first cell of each row you could simply write:

print(row[0])

For the second cell, you would use:

print(row[1])
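
Since row only exists inside the loop, a fuller sketch may help. The one below skips the header row with next() and collects the third cell of every remaining row. It again uses an in-memory stand-in for countries.csv, and the variable names are illustrative.

```python
import csv
import io

# In-memory stand-in for the first rows of countries.csv.
sample = (
    "Entry, Country, Capital\n"
    "1, France, Paris\n"
    "2, Germany, Berlin\n"
    "3, Spain, Madrid\n"
)

reader = csv.reader(io.StringIO(sample), skipinitialspace=True)

# The reader is an iterator, so next() consumes the header row.
header = next(reader)

# Index 2 is the third cell of each remaining row (the capital).
capitals = [row[2] for row in reader]

print(header)
print(capitals)
```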

If you want to use a different delimiter, such as a semicolon, pass it to the reader call along with the file object:

csvReader = csv.reader(csvDataFile, delimiter=';')
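
To see what the delimiter argument changes, here is a small sketch that parses the same semicolon-separated text twice, once with the default comma delimiter and once with delimiter=';'. The data is a made-up semicolon version of the sample file, held in a string.

```python
import csv
import io

# Hypothetical semicolon-separated version of the sample data.
semicolon_data = "Entry;Country;Capital\n1;France;Paris\n2;Germany;Berlin\n"

# With the default comma delimiter, each line comes back as a single field.
default_rows = list(csv.reader(io.StringIO(semicolon_data)))

# With delimiter=';', the fields are split as intended.
semicolon_rows = list(csv.reader(io.StringIO(semicolon_data), delimiter=";"))

print(default_rows[1])
print(semicolon_rows[1])
```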

 

Read a CSV file into a Dictionary

We can also use the DictReader class to read each row of the CSV file into a dictionary, rather than dealing with a list of individual string elements.

The CSV file is first opened using the open() function. It is then read using the DictReader class of the csv module, which works like a regular reader but maps the information in each row to a dictionary.

The very first line of the file supplies the dictionary keys. Because the sample file has a space after each comma, those keys keep the leading space (' Country', ' Capital') unless the reader is created with skipinitialspace=True.

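import csv

with open('countries.csv', mode='r', newline='') as csv_file:
    # skipinitialspace=True drops the space after each comma, so the keys
    # are 'Country' and 'Capital' rather than ' Country' and ' Capital'
    csv_reader = csv.DictReader(csv_file, skipinitialspace=True)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # DictReader has already consumed the header line; count it here
            print(f'The Column names are as follows {", ".join(row)}')
            line_count += 1
        print(f'\t{row["Capital"]} is the capital of {row["Country"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')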

You will see the following

The Column names are as follows Entry, Country, Capital
	Paris is the capital of France.
	Berlin is the capital of Germany.
	Madrid is the capital of Spain.
	Rome is the capital of Italy.
	London is the capital of UK.
Processed 6 lines.
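
The column names are also available directly through the reader's fieldnames attribute, and the row dictionaries make it easy to build a lookup table. A minimal sketch, assuming an in-memory stand-in for countries.csv and illustrative variable names:

```python
import csv
import io

# In-memory stand-in for the first rows of countries.csv.
sample = "Entry, Country, Capital\n1, France, Paris\n2, Germany, Berlin\n"

dict_reader = csv.DictReader(io.StringIO(sample), skipinitialspace=True)

# fieldnames reads the header line on first access.
column_names = dict_reader.fieldnames

# Map each country to its capital using the remaining rows.
capital_by_country = {row["Country"]: row["Capital"] for row in dict_reader}

print(column_names)
print(capital_by_country)
```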

Reading a csv file with Pandas

Reading a csv file into a pandas DataFrame is easy and only requires a couple of lines of code.

pandas is not part of the Python standard library, so you will need to install it with the pip package manager (pip install pandas). Its read_csv function loads the whole file, with all of its columns, into a DataFrame.

import pandas    
df = pandas.read_csv('countries.csv')    
print(df)   

Run this and you should see something like this

>>> %Run readcsvpandas.py
   Entry  Country Capital
0      1   France   Paris
1      2  Germany  Berlin
2      3    Spain  Madrid
3      4    Italy    Rome
4      5       UK  London

To access a row, pass its index label to loc like this

print(df.loc[0])
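
Columns can be selected by name as well. The sketch below assumes pandas is installed and reads from an in-memory stand-in for countries.csv. It passes skipinitialspace=True so that the column names and values do not carry the leading space from the sample file.

```python
import io

import pandas

# In-memory stand-in for the first rows of countries.csv.
sample = "Entry, Country, Capital\n1, France, Paris\n2, Germany, Berlin\n"

# skipinitialspace=True skips the space that follows each comma.
df = pandas.read_csv(io.StringIO(sample), skipinitialspace=True)

# A single row, selected by its index label.
first_row = df.loc[0]

# A single column, converted to a plain Python list.
capitals = df["Capital"].tolist()

# Rows that match a condition.
germany = df[df["Country"] == "Germany"]

print(first_row)
print(capitals)
print(germany)
```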

Link

github link with examples and csv file
