File IO

Overview

In this session we will look at how to read and write files in Python. We will cover the following topics:

Opening and closing files
Reading and writing text files
Context managers
Handling file exceptions

We will also continue to develop our CSV writer program from the previous session.

What is File I/O?

File I/O stands for File Input/Output. It is the process of reading data from and writing data to files on a computer. In Python, file I/O is done using file objects, which are Python objects that represent files on the computer’s file system. This is an abstraction that allows us to interact with files in a high-level way, without having to worry about the low-level details of how files are stored on disk (as each operating system has its own way of storing files).

There are a lot of complexities with files and paths that we are going to overlook for now. As we are using Linux, paths use the forward slash / as the separator between directories. Windows natively uses the backslash \, although Python on Windows also accepts forward slashes. To build paths that work on any operating system you can use the os.path.join function. Later in this course we will look at the pathlib module, which provides a more object-oriented way of working with files and paths.
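A minimal sketch of os.path.join: it inserts whichever separator (os.sep) is correct for the current operating system. The names data and test.txt are only examples, and nothing is created on disk.

```python
import os

# Join path components with the separator for the current operating system.
# "data" and "test.txt" are example names; this only builds a string.
path = os.path.join("data", "test.txt")

print(os.sep)   # / on Linux and macOS, \ on Windows
print(path)     # data/test.txt on Linux and macOS, data\test.txt on Windows
```

Note that os.path.join only builds the path string; it does not check that the directory exists.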

Opening and Closing Files

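At its simplest, we can use the built-in open function to open a file. It is usually called with two arguments: the first is the name (or path) of the file to open, as a string, and the second is the mode in which to open the file. The mode is optional and defaults to 'r'. It is a string made up of one or more of the following characters, which can be combined (for example 'rb' to read in binary mode, or 'r+' to read and write):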

Character	Meaning
'r'	open for reading (default)
'w'	open for writing, truncating the file first
'x'	open for exclusive creation, failing if the file already exists
'a'	open for writing, appending to the end of the file if it exists
'b'	binary mode
't'	text mode (default)
'+'	open for updating (reading and writing)
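To see how the mode characters behave, here is a small sketch that works inside a temporary directory (created with the standard tempfile module) so that no real files are touched. The file name modes.txt is an arbitrary example.

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes.txt")

    f = open(path, "x")      # 'x': create a new file (text mode by default)
    f.write("first line\n")
    f.close()

    f = open(path, "a")      # 'a': keep the existing contents, add to the end
    f.write("second line\n")
    f.close()

    try:
        open(path, "x")      # the file now exists, so 'x' refuses to open it
    except FileExistsError:
        print("'x' will not overwrite an existing file")

    f = open(path, "rb")     # 'rb': read the raw bytes in binary mode
    raw = f.read()
    f.close()

    f = open(path, "w")      # 'w': truncate, discarding both lines
    f.write("replaced\n")
    f.close()

    f = open(path, "r")      # 'r': read back as text (a str)
    text = f.read()
    f.close()

print(raw)    # b'first line\nsecond line\n' on Linux
print(text)   # replaced
```

Binary mode returns bytes rather than str, which is why raw prints with a leading b.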

To create a simple text file we can use the following code in the REPL:

f = open("test.txt", "w")
f.write("Hello, World!")
f.close()

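This will create a file called test.txt in the current directory (replacing any existing file of that name) and write the string Hello, World! to it. In the REPL, write also echoes the number of characters written, 13 here. The close method closes the file once we have finished writing to it. Writes are buffered in memory, so if we do not close the file the data may never actually be written to disk.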

Reading in a file

To read a file we can use the read method on an open file object. This will read the entire contents of the file into a string. For example:

f = open("test.txt", "r")
contents = f.read()
print(contents)
f.close()
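read is not the only way to get data out of a file. The sketch below, which uses an example file called lines.txt, shows readline (one line at a time), readlines (a list of all lines) and iterating over the file object directly, which reads one line per loop and avoids loading a large file into memory all at once.

```python
# Create a small three-line file to read back (lines.txt is an example name).
f = open("lines.txt", "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

# readline() returns a single line, including its trailing "\n".
f = open("lines.txt", "r")
first = f.readline()
f.close()

# readlines() returns a list containing every line.
f = open("lines.txt", "r")
all_lines = f.readlines()
f.close()

# Iterating over the file object yields one line per loop.
stripped = []
f = open("lines.txt", "r")
for line in f:
    stripped.append(line.rstrip("\n"))  # remove the newline from each line
f.close()

print(repr(first))   # 'alpha\n'
print(all_lines)     # ['alpha\n', 'beta\n', 'gamma\n']
print(stripped)      # ['alpha', 'beta', 'gamma']
```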

Context managers

You will notice that in both examples we have to remember to close the file, and forgetting to do so is a common source of bugs. To help with this, Python has a feature called a context manager: an object that manages a resource, such as a file, and automatically cleans up when it is no longer needed. The file object returned by open is itself a context manager, so we can use it with the with statement. For example:

with open("test.txt", "r") as file:
    contents = file.read()
    print(contents)

This will automatically close the file when the block of code inside the with statement is finished. This is a much safer way to work with files as it ensures that the file is always closed, even if an error occurs while reading or writing to it. This is the recommended way to work with files in modern Python.
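We can check this behaviour using the closed attribute of a file object. The sketch below first creates test.txt so it runs on its own, then shows that the file is closed once the with block ends, including when an exception (here a deliberately raised ValueError) escapes the block.

```python
# Create the file first so the example is self-contained.
with open("test.txt", "w") as file:
    file.write("Hello, World!")

with open("test.txt", "r") as file:
    contents = file.read()
    print(file.closed)   # False: still inside the with block
print(file.closed)       # True: closed automatically when the block ended

# The file is also closed if an exception escapes the block.
try:
    with open("test.txt", "r") as file2:
        raise ValueError("something went wrong while processing")
except ValueError:
    pass
print(file2.closed)      # True
```

Conceptually, the with statement behaves like wrapping the body in try/finally and calling file.close() in the finally clause.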

File exceptions

It may not always be possible to open a file, for example if the file does not exist or the user does not have permission to read or write to it. In these cases, Python will raise a FileNotFoundError or PermissionError exception. To handle these exceptions, we can use a try statement to catch the exception and handle it gracefully. For example:

#!/usr/bin/env python

try:
    with open("nothere", "r") as file:
        contents = file.read()
        print(contents)
except FileNotFoundError:
    print("File not found")

# Opening a system file for writing normally raises a PermissionError
try:
    with open("/etc/passwd", "w") as file:
        ...  # nothing to do; we only want to see whether open succeeds
except PermissionError:
    print("Permission denied")

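Whether you can write to /etc/passwd depends on the permissions of the user running the program. As a normal user you will get a PermissionError exception. Do not run this example as root (for example with sudo): mode 'w' truncates the file as soon as it is opened, so a successful open would empty /etc/passwd.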
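FileNotFoundError and PermissionError are both subclasses of OSError, so a final except OSError clause can catch any other operating-system level failure. A sketch of this pattern, wrapped in a hypothetical helper called read_text:

```python
from typing import Optional


def read_text(path: str) -> Optional[str]:
    """Return the file contents, or None if the file could not be read."""
    try:
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"Permission denied: {path}")
    except OSError as error:
        # Any other OS-level failure, e.g. IsADirectoryError on Linux.
        print(f"Could not read {path}: {error}")
    return None


print(read_text("nothere"))  # prints "File not found: nothere", then None
```

The order of the except clauses matters: Python uses the first one that matches, so the more specific exceptions must come before OSError.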

Exercise

In the previous session we designed a program to parse the command line arguments for a CSV generator. Using this as the starting point, extend it to write some random CSV data to a file.

To recap the program should take the following arguments:

usage: csv_writer.py [-h] [-o OUTPUT] [-r ROWS] [-c COLUMNS] [-s SEPARATOR]

Argument	Description
-h, --help	show this help message and exit
-o OUTPUT, --output OUTPUT	The name of the output file
-r ROWS, --rows ROWS	The number of rows to write
-c COLUMNS, --columns COLUMNS	The number of columns to write
-s SEPARATOR, --separator SEPARATOR	The separator to use between values

You can use either argparse or click to parse the arguments.

The next steps are:

open a file for writing using the output file name
loop over the number of rows and columns writing out random data using the separator

For now we can use the following code to generate random data:

import random

def random_data():
    return random.randint(0, 100)

This will return a random integer between 0 and 100 inclusive (random.randint includes both end points).

Possible solution using argparse


#!/usr/bin/env python

import argparse
import random


def random_data() -> int:
    return random.randint(0, 100)


def main():
    parser = argparse.ArgumentParser(description="Generate Random CSV Data")
    parser.add_argument("-o", "--output", help="The name of the output file", required=True)
    parser.add_argument("-r", "--rows", type=int, help="The number of rows to write", default=10)
    parser.add_argument(
        "-c", "--columns", type=int, help="The number of columns to write", default=10
    )
    parser.add_argument(
        "-s", "--separator", help="The separator to use between values", default=","
    )
    args = parser.parse_args()
    output_file = args.output
    rows = args.rows
    columns = args.columns
    separator = args.separator

    # Write one line per row, with the separator between values but not after the last one
    with open(output_file, "w") as file:
        for _ in range(rows):
            for column in range(columns):
                file.write(f"{random_data()}")
                if column < columns - 1:
                    file.write(separator)
            file.write("\n")


if __name__ == "__main__":
    main()
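The solution above avoids a trailing separator with the column < columns - 1 check. An alternative is str.join, which only places the separator between items. This is a sketch assuming the same random_data function; the helper names make_row and write_csv and the file name example.csv are hypothetical.

```python
import random


def random_data() -> int:
    return random.randint(0, 100)


def make_row(columns: int, separator: str) -> str:
    # join puts the separator only *between* items, so no trailing separator.
    return separator.join(str(random_data()) for _ in range(columns))


def write_csv(output_file: str, rows: int, columns: int, separator: str) -> None:
    with open(output_file, "w") as file:
        for _ in range(rows):
            file.write(make_row(columns, separator) + "\n")


write_csv("example.csv", rows=3, columns=4, separator=",")
```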

Possible solution using click


#!/usr/bin/env python

import random
import click


def random_data() -> int:
    return random.randint(0, 100)


@click.command()
@click.option("-o", "--output", "output_file", help="The name of the output file", required=True)
@click.option("-r", "--rows", type=int, help="The number of rows to write", default=10)
@click.option("-c", "--columns", type=int, help="The number of columns to write", default=10)
@click.option("-s", "--separator", help="The separator to use between values", default=",")
def main(output_file: str, rows: int, columns: int, separator: str) -> None:
    """
    Writes a CSV file with the specified number of rows and columns.

    Args:
        output_file (str): The path to the output CSV file.
        rows (int): The number of rows to write in the CSV file.
        columns (int): The number of columns to write in the CSV file.
        separator (str): The separator to use between columns.

    Returns:
        None
    """

    # Write one line per row, with the separator between values but not after the last one
    with open(output_file, "w") as file:
        for _ in range(rows):
            for column in range(columns):
                file.write(f"{random_data()}")
                if column < columns - 1:
                    file.write(separator)
            file.write("\n")


if __name__ == "__main__":
    main()
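Python's standard library also includes a csv module, which handles details such as quoting values that contain the separator. The sketch below is one possible alternative to writing the separators by hand, not the course's solution; write_csv and example_csv_module.csv are hypothetical names. Be aware that csv.writer requires the delimiter to be a single character, so a multi-character --separator would raise an error.

```python
import csv
import random


def random_data() -> int:
    return random.randint(0, 100)


def write_csv(output_file: str, rows: int, columns: int, separator: str) -> None:
    # The csv documentation recommends newline="" when opening the file.
    with open(output_file, "w", newline="") as file:
        # delimiter must be a single-character string.
        writer = csv.writer(file, delimiter=separator)
        for _ in range(rows):
            # writerow converts each value to a string and adds the separators.
            writer.writerow([random_data() for _ in range(columns)])


write_csv("example_csv_module.csv", rows=2, columns=5, separator=";")
```

By default csv.writer ends each row with \r\n, and opening the file with newline="" stops Python translating those line endings a second time.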

What next

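The next thing we need to think about is how robust our program is. Opening a file that does not exist in 'w' mode simply creates it, but what happens if the user gives an output path in a directory that does not exist, or one they do not have permission to write to?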
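One possible approach is to catch the relevant exceptions and report them instead of letting the program crash. The sketch below uses a hypothetical helper, safe_write, which prints the problem to standard error and returns False on failure; main could then call sys.exit(1) so the shell sees a non-zero exit status.

```python
import os
import sys
import tempfile


def safe_write(output_file: str, text: str) -> bool:
    """Write text to output_file; report the problem and return False on failure."""
    try:
        with open(output_file, "w") as file:
            file.write(text)
    except FileNotFoundError:
        # Mode "w" creates a missing file, but not a missing directory.
        print(f"Directory does not exist for {output_file}", file=sys.stderr)
        return False
    except PermissionError:
        print(f"No permission to write {output_file}", file=sys.stderr)
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    ok = safe_write(os.path.join(tmp, "new.csv"), "1,2,3\n")
    missing = safe_write(os.path.join(tmp, "no_such_dir", "new.csv"), "1,2,3\n")

print(ok, missing)   # True False
```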

How can we make the program more useful? Could we add options to control the range of the random data? Could we add an option to specify the data type of the random data, and how would you approach it?
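As one possible starting point, random_data could take the range and type as parameters, with matching argparse options. The option names --low, --high and --data-type are hypothetical and not part of the exercise specification. The choices argument makes argparse reject any type other than the listed ones.

```python
import argparse
import random
from typing import Union


def random_data(low: int = 0, high: int = 100, data_type: str = "int") -> Union[int, float]:
    """Return a random value between low and high (hypothetical extended version)."""
    if data_type == "int":
        return random.randint(low, high)   # both ends included
    if data_type == "float":
        return random.uniform(low, high)   # a float between low and high
    raise ValueError(f"Unsupported data type: {data_type}")


parser = argparse.ArgumentParser(description="Generate Random CSV Data")
# Hypothetical options; --data-type is stored as args.data_type.
parser.add_argument("--low", type=int, default=0, help="The smallest value to generate")
parser.add_argument("--high", type=int, default=100, help="The largest value to generate")
parser.add_argument(
    "--data-type", choices=["int", "float"], default="int", help="The type of value to generate"
)

# Passing a list to parse_args lets us try the parser without a real command line.
args = parser.parse_args(["--low", "5", "--high", "10", "--data-type", "float"])
value = random_data(args.low, args.high, args.data_type)
print(args.low, args.high, args.data_type)  # 5 10 float
```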

SE for Media Python

Last updated on Sep 19, 2024