In this article, we will learn how to read a CSV file in Python.
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields.
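As a small illustration of why a dedicated parser is useful, the sketch below compares naive string splitting with Python's built-in csv module on a single record. The record itself is a made-up example (it is not part of the file used later in this article), chosen because one of its fields contains a comma inside quotes.

```python
import csv
import io

# A hypothetical record whose second field itself contains a comma.
line = '1,"Paris, France",Europe\n'

# Naive splitting cuts the quoted field into two pieces.
naive_fields = line.strip().split(",")

# csv.reader understands the quoting and keeps the field whole.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)   # ['1', '"Paris', ' France"', 'Europe']
print(parsed_fields)  # ['1', 'Paris, France', 'Europe']
```

Here io.StringIO simply stands in for an open file so that the example runs without any file on disk.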
We will look at the csv module first.
CSV Module
Here are the functions and quoting constants that are available in this module:
- csv.field_size_limit – return maximum field size
- csv.get_dialect – get the dialect which is associated with the name
- csv.list_dialects – show all registered dialects
- csv.reader – read data from a csv file
- csv.register_dialect – associate dialect with name
- csv.writer – write data to a csv file
- csv.unregister_dialect – delete the dialect associated with the name from the dialect registry
- csv.QUOTE_ALL – Quote everything, regardless of type.
- csv.QUOTE_MINIMAL – Quote fields with special characters
- csv.QUOTE_NONNUMERIC – Quote all non-numeric fields
- csv.QUOTE_NONE – Don’t quote anything in output
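To make the list above more concrete, here is a minimal sketch that exercises the dialect functions and two of the quoting constants. The dialect name "semicolon" and the sample row are our own choices for illustration, and the output is written to an in-memory buffer rather than a real file.

```python
import csv
import io

# Register a dialect under a name of our own choosing ("semicolon").
csv.register_dialect("semicolon", delimiter=";", quoting=csv.QUOTE_MINIMAL)
registered = csv.list_dialects()          # now includes "semicolon"
dialect = csv.get_dialect("semicolon")    # look the dialect up by name

rows = [["1", "France", "Paris"]]

# QUOTE_ALL wraps every field in quotes.
buffer_all = io.StringIO()
csv.writer(buffer_all, quoting=csv.QUOTE_ALL).writerows(rows)

# Our registered dialect separates fields with semicolons instead.
buffer_semi = io.StringIO()
csv.writer(buffer_semi, dialect="semicolon").writerows(rows)

print(repr(buffer_all.getvalue()))   # '"1","France","Paris"\r\n'
print(repr(buffer_semi.getvalue()))  # '1;France;Paris\r\n'

# Remove the dialect again so the registry is left as we found it.
csv.unregister_dialect("semicolon")
```

Note that the writer ends each row with '\r\n' by default, which is why that sequence shows up in the printed values.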
Reading CSV files
First, the CSV file is opened in read mode using the open() function, which returns a file object. The file object is then passed to the reader() function of the csv module, which returns a reader object that iterates over the lines of the CSV file.
Our CSV file, countries.csv, contains the following:
Entry, Country, Capital
1, France, Paris
2, Germany, Berlin
3, Spain, Madrid
4, Italy, Rome
5, UK, London
Let's look at an example:
# load csv module
import csv

# open file for reading
with open('countries.csv') as csvDataFile:
    # read file as csv file
    csvReader = csv.reader(csvDataFile)
    # for every row, print the row
    for row in csvReader:
        print(row)
When you run this, you should see something like the following:
Python 3.7.9 (bundled)
>>> %Run readcsv.py
['Entry', ' Country', ' Capital']
['1', ' France', ' Paris']
['2', ' Germany', ' Berlin']
['3', ' Spain', ' Madrid']
['4', ' Italy', ' Rome']
['5', ' UK', ' London']
You can read every row in the file. Each row is returned as a list of strings, so its cells can be accessed by index. To print the first cell of a row, we could simply write:
print(row[0])
For the second cell, you would use:
print(row[1])
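Building on the indexing shown above, the following sketch skips the header row and collects the third cell of every remaining row. It uses a shortened in-memory copy of the sample data in place of countries.csv, and it passes skipinitialspace=True, which is our own addition here, so that the space after each comma is not kept as part of the value.

```python
import csv
import io

# Stand-in for the first lines of countries.csv, held in memory.
sample = io.StringIO(
    "Entry, Country, Capital\n"
    "1, France, Paris\n"
    "2, Germany, Berlin\n"
)

# skipinitialspace=True ignores whitespace right after each delimiter.
reader = csv.reader(sample, skipinitialspace=True)

header = next(reader)                   # consume the header row
capitals = [row[2] for row in reader]   # third cell of each data row

print(header)    # ['Entry', 'Country', 'Capital']
print(capitals)  # ['Paris', 'Berlin']
```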
If you want to use a different delimiter, such as a semicolon, simply pass it to the reader call alongside the file object:
csvReader = csv.reader(csvDataFile, delimiter=';')
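To see what the delimiter argument changes, the sketch below parses the same semicolon-separated text twice, once with the default comma delimiter and once with delimiter=';'. The text is a hypothetical semicolon version of our sample data, held in memory for convenience.

```python
import csv
import io

# Hypothetical semicolon-separated version of the sample data.
text = "Entry;Country;Capital\n1;France;Paris\n"

# With the default delimiter (a comma), each line stays one single field.
rows_default = list(csv.reader(io.StringIO(text)))

# With delimiter=';', each line is split into its three fields.
rows_semicolon = list(csv.reader(io.StringIO(text), delimiter=";"))

print(rows_default)    # [['Entry;Country;Capital'], ['1;France;Paris']]
print(rows_semicolon)  # [['Entry', 'Country', 'Capital'], ['1', 'France', 'Paris']]
```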
Read a CSV file into a Dictionary
We can also use the DictReader class to read each row of the CSV file into a dictionary, rather than dealing with a list of individual string elements.
The CSV file is first opened using the open() function. It is then read using the DictReader class of the csv module, which works like a regular reader but maps the information in each row to a dictionary.
The values in the very first line of the file are used as the dictionary keys.
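import csv

# skipinitialspace=True drops the space after each comma in our file,
# so the keys are 'Country' and 'Capital' rather than ' Country' and ' Capital'
with open('countries.csv', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file, skipinitialspace=True)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # the keys of the first row are the column names
            print(f'The Column names are as follows {", ".join(row)}')
            line_count += 1
        print(f'\t{row["Capital"]} is the capital of {row["Country"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')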
You will see the following:
The Column names are as follows Entry, Country, Capital
    Paris is the capital of France.
    Berlin is the capital of Germany.
    Madrid is the capital of Spain.
    Rome is the capital of Italy.
    London is the capital of UK.
Processed 6 lines.
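Because each row is a dictionary, it is easy to build other structures from the file. As a rough sketch, the code below turns a shortened in-memory copy of the sample data into a lookup table from country to capital. The variable names and the use of skipinitialspace=True (to drop the space after each comma) are our own choices.

```python
import csv
import io

# Stand-in for the first lines of countries.csv, held in memory.
sample = io.StringIO(
    "Entry, Country, Capital\n"
    "1, France, Paris\n"
    "2, Germany, Berlin\n"
)

reader = csv.DictReader(sample, skipinitialspace=True)

# fieldnames holds the keys taken from the first line of the file.
fieldnames = reader.fieldnames

# Map each country to its capital using the column names as keys.
capital_by_country = {row["Country"]: row["Capital"] for row in reader}

print(fieldnames)                    # ['Entry', 'Country', 'Capital']
print(capital_by_country["France"])  # Paris
```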
Reading a csv file with Pandas
Reading a csv file into a pandas DataFrame is easy and only requires a couple of lines of code.
Pandas is not part of the Python standard library, so you will need to install it with the pip package manager. The pandas read_csv function loads every column of the file into a DataFrame and, by default, uses the first line as the column names.
import pandas

df = pandas.read_csv('countries.csv')
print(df)
Run this and you should see something like the following:
>>> %Run readcsvpandas.py
   Entry  Country  Capital
0      1   France    Paris
1      2  Germany   Berlin
2      3    Spain   Madrid
3      4    Italy     Rome
4      5       UK   London
To access a row, you can use its index label like this:
print(df.loc[0])
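Beyond selecting a row by its index label, a DataFrame also allows access by column name. The sketch below is one possible way to do this, assuming pandas is installed. It reads a shortened in-memory copy of the sample data, and it passes skipinitialspace=True (our own addition) so that the column names and values do not keep the space that follows each comma.

```python
import io

import pandas

# Stand-in for the first lines of countries.csv, held in memory.
sample = io.StringIO(
    "Entry, Country, Capital\n"
    "1, France, Paris\n"
    "2, Germany, Berlin\n"
)

df = pandas.read_csv(sample, skipinitialspace=True)

# Select a whole column by name and convert it to a plain list.
capitals = df["Capital"].tolist()

# Select the first row by its index label, then one value from it.
first_country = df.loc[0]["Country"]

# Use the Country column as the index to look rows up by country name.
by_country = df.set_index("Country")
german_capital = by_country.loc["Germany", "Capital"]

print(capitals)        # ['Paris', 'Berlin']
print(first_country)   # France
print(german_capital)  # Berlin
```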