How to manipulate comma separated values, or csv files with Python.

Sometimes, your program would need to work on data that is generated by some other program. Or you have some business report that has been exported to a CSV file and your program now needs to read or write to such CSV files.

What are CSV Files?

  • CSV stands for ‘Comma-separated values’
  • It is a plain text file which stores tabular data. For example:

values, separated, by, commas

notice, the, last, word, in, each, row

it, doesn’t, contain, a, comma

instead, it, has, a, newline, character

  • CSV is the most common import and export format for spreadsheets and databases
  • No official ‘CSV standard’ exists, so the format of a CSV varies based on the application reading and writing the file
  • Variations in CSV files across applications cause difficulties in processing multiple-source CSVs
  • Python’s ‘csv’ module streamlines the reading and writing operations across CSVs of various sources, making these operations easier for programmers.

What should I know about Python’s ‘csv’ module?

  • The module implements classes for reading and writing CSV files
  • Programmers can choose their desired format for the CSV file without knowing its exact implementation in that format – e.g. you can choose to read/write data from/to Excel or SQLite as required.
  • Important classes in CSV module:
    • Reader: Reads sequences from a CSV file
    • Writer: Writes sequences to a CSV file
    • DictReader: Reads data in dictionary format {fieldname1 : value1…} from a CSV file
    • DictWriter: Writes data present in dictionary form  {fieldname1 : value1…}  to a CSV file
  • Input to the CSV module should preferably be in UTF-8 or printable ASCII due to the issues in ASCII NULL characters and lack of support for Unicode input.

Here are examples to illustrate how to use the important classes in the Python CSV module:

Suppose you have a CSV file as follows:

Hello.csv

Hi, hi, hi, hello

Hello, hello, hi, hi

1) Use of csv.reader(csvfile, dialect=’excel’, **fmtparams):

1
2
3
4
5
6
7
8
9
import csv
fields = []
rows = []
with open("hello.csv", 'r') as file:
helloiterator = csv.reader(file)  #csv.reader produces an iterator which can traverse through      #hello.csv
    fields = helloiterator.next()
    for row in helloiterator:
        rows.append(row) #Each word in a row is appended to the ‘row’ list. This continues till end of file.
print(rows)

1
2
Output:
[‘Hi’, ‘hi’, ‘hi’, ‘hello’, ‘Hello’, ‘hello’, ‘hi’, ‘hi’]

2) Using csv.writer(csvfile, dialect=’excel’, **fmtparams):

1
2
3
4
import csv
with open('hello.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(['Help', 'Coffee', 'Tea']) #writes the row into the csv file

1
2
3
4
5
6
(Modified CSV file)
hello.csv

Hi, hi, hi, hello
Hello, hello, hi, hi
Help, Coffee, Tea

3) Using DictReader(file, fieldnames=None…):

1
2
3
4
5
import csv
with open('hello.csv') as csvfile:
reader = csv.DictReader(csvfile,fieldnames=[‘Field1’,’Field2’,’Field3’,’Field4’]) #Converts csv to #dictionary form, with fieldnames as keys and entries in the file as values.
             for row in reader:
           print(row['Field1'], row['Field2']) #You can read from the dictionary

1
2
3
4
Output:
Hi hi
Hello hello
Help Coffee

4) Using DictWriter(file, fieldnames=None…):

1
2
3
4
5
6
import csv

with open('hello.csv', 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=[‘Field1’,’Field2’,’Field3’,’Field4’])
    writer.writeheader() #Write the fieldnames into the csv file
    writer.writerow({'Field1': 'New', 'Field2': 'Coffee',’Field3’:’Is’,’Field4’:’Fresh’}) #write a dictionary to the CSV file.

1
2
3
4
5
6
7
8
(Modified CSV file)
hello.csv

Field 1, Field 2, Field 3, Field 4
Hi, hi, hi, hello
Hello, hello, hi, hi
Help, Coffee, Tea
New, Coffee, Is, Fresh

Is there an easier way to read and write CSV files in Python?

Meet the Pandas library. Pandas has the following methods to read from and write to a CSV file, respectively:

  1. Pandas.read_csv
  2. Pandas.DataFrame.to_csv

Note:

  • Reading involves storing csv information in a Pandas DataFrame object.
  • Writing involves converting information from a pandas DataFrame into a comma separated format, which can optionally be stored in a specified URL.

1) Using Pandas.read_csv():

Syntax: pandas.read_csv(filepath_or_buffer, sep=’, , delimiter=None, ….)

Filepath/buffer: URL to the csv file being read

Sep: Separator character (usually a comma, given it is a csv file)

Delimiter: An alias for separator

Here is an example to illustrate this:

1
2
df = pd.read_csv(‘hello.csv', sep=’,’,delimiter=none)
df

1
2
Output:
(Displays the contents of hello.csv)

2) Using Pandas.DataFrame.to_csv():

Syntax: DataFrame.to_csv(path_or_buf=None, sep=’, ‘,index=False..)

Path/buffer: URL to the csv file being written into

Sep: Separator character (usually a comma, given it is a csv file)

Index: If true, then write row names; if false, do not.

Here is an example to illustrate this:

1
2
3
4
5
6
df = pd.DataFrame({'Field1': [‘Cool’],
                   'Field2': [‘Beautiful’],
                    'Field3': [‘Ocean’]})
 df.to_csv(index=False)

df

1
2
Output:
Cool, Beautiful, Ocean


2 Comments

backstreets of hickory · August 13, 2019 at 7:23 pm

Appreciating the dedication you put into your website and in depth information you present.

It’s good to come across a blog every once in a while that isn’t the
same outdated rehashed information. Excellent read! I’ve saved
your site and I’m adding your RSS feeds to my Google account.

    fibg4 · August 16, 2019 at 6:16 am

    Thank you for the positive feedback! It is very encouraging! Lately I haven’t been getting time to work on this site due to my own hectic job but I hope to update the content soon.

Leave a Reply

Your email address will not be published. Required fields are marked *