File I/O
Overview
In this session we will look at how to read and write files in Python. We will cover the following topics:
- Opening and closing files
- Reading and writing text files
- Context managers
We will also continue to develop our csv writer program from the previous session.
What is File I/O?
File I/O stands for File Input/Output. It is the process of reading data from and writing data to files on a computer. In Python, file I/O is done using file objects, which are Python objects that represent files on the computer’s file system. This is an abstraction that allows us to interact with files in a high-level way, without having to worry about the low-level details of how files are stored on disk (as each operating system has its own way of storing files).
Note: the examples in this session use the forward slash / as a path separator. On Windows the native separator is the backslash \, although Python also accepts / there, and the os.path.join function builds paths with the correct separator for the current platform. Later in this course we will look at the pathlib module, which provides a more object-oriented way of working with files and paths.
Opening and Closing Files
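At its simplest, we can use the built-in open function to open a file. In its basic form, open takes two arguments: the first is the name of the file to open, as a string, and the second is the mode in which to open the file. If the mode is omitted, the file is opened for reading in text mode. The mode is a string built from one or more of the following characters (for example "rb" to read a binary file):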
Character | Meaning |
---|---|
'r' | open for reading (default) |
'w' | open for writing, truncating the file first |
'x' | open for exclusive creation, failing if the file already exists |
'a' | open for writing, appending to the end of file if it exists |
'b' | binary mode |
't' | text mode (default) |
'+' | open for updating (reading and writing) |
To create a simple text file we can use the following code in the REPL:
f = open("test.txt", "w")
f.write("Hello, World!")
f.close()
This will create a file called test.txt in the current directory and write the string Hello, World! to it. The close method closes the file once we have finished writing to it. If we do not close the file, data still held in Python's write buffer may not reach the disk, so the changes we have made may not be saved.
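To see how the writing modes from the table differ in practice, here is a minimal sketch. The file name modes.txt and the temporary directory are assumptions made for illustration, so that no real file is overwritten.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are overwritten.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes.txt")

f = open(path, "w")   # "w" creates the file (or empties an existing one)
f.write("first\n")
f.close()

f = open(path, "w")   # opening with "w" again discards "first"
f.write("second\n")
f.close()

f = open(path, "a")   # "a" keeps the existing contents and adds to the end
f.write("third\n")
f.close()

f = open(path, "r")
contents = f.read()
f.close()
print(contents)       # "second" and "third", each on its own line

exclusive_failed = False
try:
    f = open(path, "x")   # "x" refuses to open a file that already exists
    f.close()
except FileExistsError:
    exclusive_failed = True
    print("modes.txt already exists, so 'x' mode raised FileExistsError")
```

The "x" mode is a useful default when a program must never overwrite existing output by accident.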
Reading in a file
To read a file we can use the read method on an open file object. This will read the entire contents of the file into a string. For example:
f = open("test.txt", "r")
contents = f.read()
print(contents)
f.close()
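Reading the whole file at once is fine for small files. For larger ones it is often more convenient to work a line at a time. The sketch below uses the readline method and iteration over the file object, both of which are part of Python's standard file objects; the file name lines.txt and its contents are made up for the example.

```python
import os
import tempfile

# Create a small three-line file to read back (hypothetical example data).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lines.txt")

f = open(path, "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

f = open(path, "r")
first_line = f.readline()   # reads up to and including the first newline
remaining = []
for line in f:              # iteration carries on from the current position
    remaining.append(line.rstrip("\n"))  # drop the trailing newline
f.close()

print(repr(first_line))
print(remaining)
```

Each line keeps its trailing newline character when read this way, which is why the loop strips it before storing the value.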
Context managers
You will notice that in both examples we have to remember to close the file, and forgetting to do so is a common source of bugs. To help with this, Python has a feature called a context manager. Context managers are objects that manage resources, such as files, and automatically clean up after themselves when they are no longer needed. File objects are context managers, so we can use them with the with statement. For example:
with open("test.txt", "r") as file:
    contents = file.read()
    print(contents)
This will automatically close the file when the block of code inside the with statement is finished. This is a much safer way to work with files as it ensures that the file is always closed, even if an error occurs while reading or writing to it. This is the recommended way to work with files in modern Python.
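The claim that the file is closed even when an error occurs can be checked with the closed attribute of the file object. This is a small sketch; the temporary directory and the deliberate ValueError are assumptions chosen only to make the behaviour visible.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "test.txt")

with open(path, "w") as file:
    file.write("Hello, World!")
closed_after_block = file.closed   # the with block has ended, so this is True

closed_after_error = None
try:
    with open(path, "r") as file:
        contents = file.read()
        int(contents)              # not a number, so this raises ValueError
except ValueError:
    # The exception left the with block, and the file was still closed.
    closed_after_error = file.closed

print(closed_after_block, closed_after_error)
```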
File exceptions
It may not always be possible to open a file, for example if the file does not exist or the user does not have permission to read or write to it. In these cases, Python will raise a FileNotFoundError or PermissionError exception. To handle these exceptions, we can use a try statement to catch the exception and handle it gracefully. For example:
#!/usr/bin/env python
try:
    with open("nothere", "r") as file:
        contents = file.read()
        print(contents)
except FileNotFoundError:
    print("File not found")

# trigger a PermissionError by opening a protected file for writing
try:
    with open("/etc/passwd", "w") as file:
        ...
except PermissionError:
    print("Permission denied")
Whether you can write to the /etc/passwd file depends on the permissions of the user running the program. If you are running as a normal user you will get a PermissionError exception. Do not run this example as root: opening a file in "w" mode truncates it.
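Both FileNotFoundError and PermissionError are subclasses of OSError, so a final except OSError clause can catch any other file-related failure. The helper below is a hypothetical sketch (the name read_text is not part of the course code) showing one way to wrap this up in a function.

```python
def read_text(path):
    """Return the contents of path, or None if it cannot be read."""
    try:
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"Permission denied: {path}")
    except OSError as error:
        # Any other operating system level failure, e.g. the path is a directory.
        print(f"Could not read {path}: {error.strerror}")
    return None


print(read_text("nothere"))
```

The more specific exceptions are listed first because Python uses the first except clause that matches.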
Exercise
In the previous session we designed a program to parse command line arguments for a CSV generator program. Using this as the starting point, write some random CSV data to a file.
To recap the program should take the following arguments:
usage: csv_writer.py [-h] [-o OUTPUT] [-r ROWS] [-c COLUMNS] [-s SEPARATOR]
Argument | Description |
---|---|
-h, --help | show this help message and exit |
-o OUTPUT, --output OUTPUT | The name of the output file |
-r ROWS, --rows ROWS | The number of rows to write |
-c COLUMNS, --columns COLUMNS | The number of columns to write |
-s SEPARATOR, --separator SEPARATOR | The separator to use between values |
You can use either argparse or click to parse the arguments.
The next steps are:
- open a file for writing using the output file name
- loop over the number of rows and columns writing out random data using the separator
For now we can use the following code to generate random data:
import random


def random_data():
    return random.randint(0, 100)
This will return a random integer between 0 and 100 inclusive.
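As a hint for the second step, one row can be built as a list of strings and combined with str.join, which places the separator between values but not after the last one. This is only a sketch of one approach; the helper name make_row is an assumption and is not part of the exercise.

```python
import random


def random_data():
    return random.randint(0, 100)


def make_row(columns, separator):
    """Return one line of `columns` random values joined by `separator`."""
    values = [str(random_data()) for _ in range(columns)]
    return separator.join(values)


# Prints five random values separated by commas; the values differ on each run.
print(make_row(5, ","))
```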
A possible solution using argparse:
#!/usr/bin/env python
import argparse
import random


def random_data() -> int:
    return random.randint(0, 100)


def main():
    parser = argparse.ArgumentParser(description="Generate Random CSV Data")
    parser.add_argument("-o", "--output", help="The name of the output file", required=True)
    parser.add_argument("-r", "--rows", type=int, help="The number of rows to write", default=10)
    parser.add_argument(
        "-c", "--columns", type=int, help="The number of columns to write", default=10
    )
    parser.add_argument(
        "-s", "--separator", help="The separator to use between values", default=","
    )
    args = parser.parse_args()
    output_file = args.output
    rows = args.rows
    columns = args.columns
    separator = args.separator
    with open(output_file, "w") as file:
        for row in range(rows):
            for column in range(columns):
                file.write(f"{random_data()}")
                # no separator after the last value in a row
                if column < columns - 1:
                    file.write(separator)
            file.write("\n")


if __name__ == "__main__":
    main()
A possible solution using click:
#!/usr/bin/env python
import random

import click


def random_data() -> int:
    return random.randint(0, 100)


@click.command()
# the third name maps the --output flag from the usage above onto the output_file parameter
@click.option("-o", "--output", "output_file", help="The name of the output file", required=True)
@click.option("-r", "--rows", type=int, help="The number of rows to write", default=10)
@click.option("-c", "--columns", type=int, help="The number of columns to write", default=10)
@click.option("-s", "--separator", help="The separator to use between values", default=",")
def main(output_file: str, rows: int, columns: int, separator: str) -> None:
    """
    Writes a CSV file with the specified number of rows and columns.

    Args:
        output_file (str): The path to the output CSV file.
        rows (int): The number of rows to write in the CSV file.
        columns (int): The number of columns to write in the CSV file.
        separator (str): The separator to use between columns.

    Returns:
        None
    """
    with open(output_file, "w") as file:
        for row in range(rows):
            for column in range(columns):
                file.write(f"{random_data()}")
                # no separator after the last value in a row
                if column < columns - 1:
                    file.write(separator)
            file.write("\n")


if __name__ == "__main__":
    main()
What next
The next thing we need to think about is how robust our program is. What happens if the user gives an output file name in a directory that does not exist, or in a location they cannot write to?
How can we make the program more useful? Could we add more options to control the range of the random data? Could we add an option to specify the data type of the random data, and if so, how would you approach it?
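As a starting point for these questions, the sketch below moves the writing loop into a function that reports failure instead of crashing, and takes the range of the random values as parameters. The function name write_csv and the parameters low and high are assumptions for illustration only; how they are wired up to command line options is left open.

```python
import random


def write_csv(path, rows, columns, separator=",", low=0, high=100):
    """Write random integers in [low, high] to path; return True on success."""
    try:
        with open(path, "w") as file:
            for _ in range(rows):
                values = [str(random.randint(low, high)) for _ in range(columns)]
                file.write(separator.join(values) + "\n")
    except OSError as error:
        # Covers a missing directory, a read-only location and similar failures.
        print(f"Could not write {path}: {error}")
        return False
    return True
```

Returning a boolean lets the caller decide what to do next, for example exiting with a non-zero status when the file could not be written.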