CSV (Comma-Separated Values) files are one of the most commonly used formats for storing tabular data, and Python provides a simple and powerful way to read them. With Python's built-in csv module, you can easily read data from CSV files and use it in your programs for analysis, processing, or manipulation.
In this guide, we’ll cover:
csv module in Python
csv.reader()
csv.DictReader()
A CSV file is a plain-text file used to store tabular data, where each row represents a data entry, and the values in each row are separated by a delimiter. The most common delimiter is a comma, but others, such as tabs or semicolons, are also used.
Name,Age,Occupation
John,28,Software Engineer
Jane,34,Data Scientist
Doe,45,Project Manager
In the above example, the first row is the header, and each following row is one data entry.
csv Module in Python
Python's csv module provides functions and tools to read and write CSV files. It includes:
csv.reader(): Reads data from a CSV file as lists.
csv.DictReader(): Reads data as dictionaries, using the first row as keys.
csv.reader() and csv.writer(): Both accept a delimiter argument for working with custom delimiters.
We’ll look into these functions in more detail below.
csv.reader()
The csv.reader() function is the most basic way to read a CSV file. It treats each row in the CSV file as a list of values, with each value being a string.
csv.reader()
import csv

# Open the CSV file in read mode (the csv docs recommend newline='')
with open('example.csv', 'r', newline='') as file:
    # Create a CSV reader object
    reader = csv.reader(file)
    # Iterate over each row in the CSV file
    for row in reader:
        print(row)
['Name', 'Age', 'Occupation']
['John', '28', 'Software Engineer']
['Jane', '34', 'Data Scientist']
['Doe', '45', 'Project Manager']
csv.reader(file) returns an iterator that yields each row in the CSV file as a list of strings.
csv.DictReader()
When working with CSV files that have headers, it's more convenient to read them as dictionaries, where the column headers are used as dictionary keys, and the corresponding values are the data in each row. This is done using csv.DictReader().
csv.DictReader()
import csv

# Open the CSV file in read mode
with open('example.csv', 'r', newline='') as file:
    # Create a DictReader object
    reader = csv.DictReader(file)
    # Iterate over each row (now as a dictionary)
    for row in reader:
        print(row)
{'Name': 'John', 'Age': '28', 'Occupation': 'Software Engineer'}
{'Name': 'Jane', 'Age': '34', 'Occupation': 'Data Scientist'}
{'Name': 'Doe', 'Age': '45', 'Occupation': 'Project Manager'}
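Because every value arrives as a string, numeric columns usually need an explicit conversion. The following is a minimal sketch under stated assumptions: it reads the sample data from an in-memory io.StringIO buffer instead of example.csv so that it runs on its own, and the names sample, people, and average_age are illustrative.

```python
import csv
import io

# In-memory stand-in for example.csv (assumed contents, matching the sample data)
sample = io.StringIO(
    "Name,Age,Occupation\n"
    "John,28,Software Engineer\n"
    "Jane,34,Data Scientist\n"
    "Doe,45,Project Manager\n"
)

people = []
for row in csv.DictReader(sample):
    # Every field is read as a string, so convert Age explicitly
    row["Age"] = int(row["Age"])
    people.append(row)

average_age = sum(person["Age"] for person in people) / len(people)
print(average_age)
```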
Scatter plots are used to visualize individual data points. They’re particularly useful when you have discrete data or want to observe the relationship between two variables.
Example: Plotting individual points for the function with random noise.
Bar plots are useful when you need to compare discrete categories. Though they are not used directly for plotting continuous mathematical functions, they can be handy when plotting functions on discrete intervals or comparing several functions.
Example: Bar plot for over discrete intervals.
Histograms are used for frequency distributions of continuous data. When analyzing large datasets, you can plot the distribution of values to understand how data is distributed.
Example: Histogram of values generated from the function over random values of .
Heatmaps are an excellent way to visualize matrix data or 2D functions. They use color gradients to represent the intensity of values in a grid, which can be helpful when you need to explore complex functions or multi-dimensional data.
Example: Visualizing as a heatmap.
Customizing plots is essential to make them more informative, appealing, and easier to understand. Here are a few tips:
Title and Labels: Always add titles and axis labels to clarify the meaning of the plot.
Legends: If you plot multiple functions on the same graph, adding a legend helps to distinguish them.
Grid Lines: Enabling grid lines makes it easier to interpret the plot.
Color and Style: Different colors and line styles can help differentiate between multiple functions in a plot.
Example: Customized line plot for multiple functions.
For multi-variable functions or higher dimensions, 3D plotting can give you a comprehensive view of the function’s behavior. Python’s Matplotlib and Plotly offer excellent tools for 3D visualization.
Example: 3D plot for the function .
Interactive plots allow users to zoom, pan, and explore different parts of the graph. These plots are ideal for web applications, dashboards, and presentations.
Example: Creating interactive plots with Plotly.
In the example above, "Alice" is passed as an argument to the greet function. This is just one simple case, but Python allows for much more flexibility in how arguments are handled.
Python supports several types of arguments, each with its specific use case. Let’s break down each type in detail.
Positional arguments are the most basic and common type of arguments in Python functions. The order in which arguments are passed into the function matters; the first argument is assigned to the first parameter, the second argument to the second parameter, and so on.
Example:
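The listing for this example is not preserved here, so the following is a minimal sketch consistent with the surrounding description. The function name subtract is an assumption; only the parameters a and b and the values 5 and 3 come from the text.

```python
def subtract(a, b):
    # a receives the first argument, b receives the second
    return a - b

result = subtract(5, 3)   # a=5, b=3
swapped = subtract(3, 5)  # a=3, b=5: same values, different order
print(result, swapped)
```

Swapping the arguments changes the result, which is why the order matters.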
In the example above, 5 is assigned to a, and 3 is assigned to b. The order in which you pass arguments is crucial.
Default arguments are function arguments that have a default value. If no argument is provided for such a parameter when the function is called, the default value is used.
Example:
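A minimal sketch of a default argument, assuming a function called greet (the name is illustrative; the parameter name and the default "Stranger" come from the text):

```python
def greet(name="Stranger"):
    # name falls back to "Stranger" when the caller passes nothing
    return f"Hello, {name}!"

print(greet())
print(greet("Alice"))
```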
Here, the parameter name has a default value of "Stranger". If no argument is passed, this value is used.
Keyword arguments allow you to pass arguments to a function by explicitly specifying the parameter names. This makes the function call more readable and allows you to pass arguments in any order.
Example:
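A minimal sketch of keyword arguments. The function name describe_person and the sample values are assumptions; the parameter names age, city, and name come from the text.

```python
def describe_person(name, age, city):
    return f"{name} is {age} years old and lives in {city}."

# Keyword arguments can be given in any order
message = describe_person(age=30, city="Paris", name="Alice")
print(message)
```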
In the above example, the parameters age, city, and name are passed as keyword arguments, and their order does not matter.
In some cases, you may not know in advance how many arguments will be passed to a function. Python allows functions to accept an arbitrary number of arguments using *args (for non-keyword arguments) and **kwargs (for keyword arguments).
The *args syntax allows you to pass a variable number of positional arguments to a function. These arguments are captured as a tuple.
Example:
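A minimal sketch of *args, using the function name print_numbers from the text. Returning the tuple is an addition made here only so the captured value can be inspected.

```python
def print_numbers(*args):
    # args is a tuple holding every positional argument
    for number in args:
        print(number)
    return args  # returned only to make the tuple visible to the caller

collected = print_numbers(1, 2, 3)
```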
Here, the function print_numbers accepts any number of arguments, which are captured in args as a tuple.
The **kwargs syntax allows you to pass a variable number of keyword arguments to a function. These arguments are captured as a dictionary.
Example:
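A minimal sketch of **kwargs, using the function name student_info from the text. The keyword names and values passed in are illustrative assumptions.

```python
def student_info(**kwargs):
    # kwargs is a dictionary mapping each keyword to its value
    for key, value in kwargs.items():
        print(f"{key}: {value}")
    return kwargs  # returned only so the dictionary can be inspected

info = student_info(name="Alice", grade="A", age=20)
```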
In this example, the function student_info accepts an arbitrary number of keyword arguments, which are captured in kwargs as a dictionary.
Using function arguments effectively allows you to create versatile and powerful functions. Here are some examples that demonstrate different ways to use arguments:
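As a first case, here is a minimal sketch of a function with one required and one optional parameter. The function name rectangle_area and the parameter length are assumptions; only width and its default of 5 come from the text.

```python
def rectangle_area(length, width=5):
    # width is optional and defaults to 5
    return length * width

default_area = rectangle_area(4)     # uses width=5
custom_area = rectangle_area(4, 2)   # overrides width
print(default_area, custom_area)
```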
In this example, width has a default value of 5, but it can be overridden by passing a second argument.
You can combine *args and **kwargs in the same function to accept both positional and keyword arguments.
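A minimal sketch of combining both forms. The function name show_arguments is an assumption; the argument values match the description that follows.

```python
def show_arguments(*args, **kwargs):
    print("Positional:", args)
    print("Keyword:", kwargs)
    return args, kwargs  # returned only so both containers can be inspected

positional, keyword = show_arguments(1, 2, 3, name="Alice", age=30)
```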
In this example, the function accepts multiple positional arguments (1, 2, 3) and keyword arguments (name="Alice", age=30). The positional arguments are captured in args, and the keyword arguments are captured in kwargs.
While Python provides a lot of flexibility when it comes to function arguments, following best practices can improve code readability and maintainability:
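Default arguments are great for providing optional parameters, but you should avoid mutable default arguments like lists or dictionaries. The default object is created once, when the function is defined, and is then shared by every call, which can lead to unexpected behavior.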
Bad practice:
Good practice:
By using None as the default value and initializing the list inside the function, you avoid shared mutable state between function calls.
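The difference between the two practices can be sketched as follows. The function names append_bad and append_good are hypothetical and chosen only for this illustration.

```python
def append_bad(item, items=[]):
    # Bad: the default list is created once and reused by every call
    items.append(item)
    return items

def append_good(item, items=None):
    # Good: a fresh list is created on each call that omits items
    if items is None:
        items = []
    items.append(item)
    return items

first_bad = append_bad(1)
second_bad = append_bad(2)    # same list object as first_bad
first_good = append_good(1)
second_good = append_good(2)  # independent list
```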
If a function takes too many arguments, it can be difficult to understand and maintain. Consider using keyword arguments or data structures (like dictionaries) to group related arguments.
Bad practice:
Good practice:
In this case, the dictionary person_info allows the function to accept many parameters in a cleaner and more organized way.
If your function might receive a varying number of arguments, using *args and **kwargs allows you to handle these scenarios dynamically.
Packing arguments allows you to collect a group of arguments into a single variable, while unpacking allows you to split a collection into separate variables. This feature is incredibly useful when you are working with dynamic function calls.
Packing Arguments:
Unpacking Arguments:
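Both directions can be sketched together. All names here (total, describe, values, person) are illustrative assumptions.

```python
def total(*numbers):
    # Packing: the positional arguments are collected into the tuple numbers
    return sum(numbers)

def describe(name, age):
    return f"{name} is {age}"

values = [1, 2, 3]
packed_total = total(*values)        # unpack a list into positional arguments

person = {"name": "Alice", "age": 30}
description = describe(**person)     # unpack a dict into keyword arguments

first, *rest = values                # unpack a sequence into variables
```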
Lambda functions are anonymous functions that can accept arguments. They are particularly useful when you need to define a small function on the fly.
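A minimal sketch of lambdas taking arguments; the names square, pairs, and by_number are illustrative.

```python
# A lambda with one argument
square = lambda x: x * x

# A lambda used on the fly as a sort key
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_number = sorted(pairs, key=lambda pair: pair[1])

print(square(4), by_number)
```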
In this case, x is a global variable and can be accessed both inside and outside the function, while y is a local variable and can only be accessed within my_function.
Python uses different scopes to determine where a variable is visible and accessible. There are four main types of variable scope in Python: local, enclosing, global, and built-in.
The local scope refers to variables that are defined inside a function. These variables are only accessible within that function. Blocks such as if or for do not create a new scope in Python.
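A minimal sketch of local scope, using the names a and my_function from the description that follows. The value 10 and the helper variables are assumptions.

```python
def my_function():
    a = 10  # a exists only while my_function runs
    return a

value = my_function()

raised_name_error = False
try:
    print(a)  # a is not defined at module level
except NameError:
    raised_name_error = True
```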
In the above code, a is a local variable that exists only within the function my_function(). Trying to access a outside the function results in a NameError.
The enclosing scope refers to variables in functions that are nested inside other functions. If a variable is not found in the local scope, Python will look for it in the enclosing scope, which is the scope of the outer function.
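A minimal sketch of the enclosing scope, using the names b, outer_function, and inner_function from the description that follows; the value 20 is an assumption.

```python
def outer_function():
    b = 20  # lives in the scope of outer_function

    def inner_function():
        # b is not local here, so Python finds it in the enclosing scope
        return b

    return inner_function()

result = outer_function()
print(result)
```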
Here, b is a variable in the enclosing scope (the outer_function). The inner_function can access b because it’s enclosed by outer_function. However, b cannot be accessed outside the outer_function.
The global scope refers to variables that are defined at the top level of a script or module, outside any function or class. These variables are accessible from anywhere in the program, as long as they are not shadowed by local or enclosing variables.
In this example, c is a global variable, and both the function my_function() and the script itself can access it.
The built-in scope contains variables and functions that are available in every Python program. These are predefined by Python and are always accessible. For instance, functions like print() and len(), and exceptions like IndexError, are part of the built-in scope.
You can also access other built-in objects, such as the int class and exception classes. Assigning to one of these names in your own code does not change the built-in itself; it creates a new variable that shadows it in that scope.
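Python follows the LEGB (Local, Enclosing, Global, Built-in) rule to resolve the scope of a variable. This means that Python looks for a variable in the following order:
Local: the current function.
Enclosing: any enclosing functions, from the innermost outward.
Global: the top level of the module.
Built-in: the names predefined by Python.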
Let’s see an example of how Python resolves variable names based on this rule:
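A minimal sketch of LEGB resolution, using the names x, outer_function, and inner_function from the description that follows. The string values are assumptions chosen to label each scope.

```python
x = "global"

def outer_function():
    x = "enclosing"

    def inner_function():
        x = "local"
        print(x)  # the local x is found first
        return x

    return inner_function()

resolved = outer_function()
```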
In this example, when inner_function() is called, it prints "local", which is the variable defined in its own scope. If x weren’t defined locally, Python would search in the enclosing scope (outer_function) and then the global scope.
global and nonlocal Keywords
In Python, you can modify variables from an outer scope using the global and nonlocal keywords. These keywords tell Python to refer to variables in a different scope.
global Keyword
The global keyword allows you to modify a variable from the global scope inside a function. Without it, any assignment to a variable within a function would create a new local variable.
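A minimal sketch of the global keyword. The function name modify_global and the values are assumptions; the variable name x comes from the text.

```python
x = 10

def modify_global():
    global x  # assignments below target the module-level x
    x = 20

modify_global()
print(x)
```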
In this example, the global keyword tells Python to modify the global variable x instead of creating a new local variable.
nonlocal Keyword
The nonlocal keyword allows you to modify variables in an enclosing (but not global) scope. This is useful when you need to modify variables in a nested function.
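A minimal sketch of nonlocal, using the names x and outer_function from the description that follows; inner_function and the values are assumptions.

```python
def outer_function():
    x = 10

    def inner_function():
        nonlocal x  # rebind the x that belongs to outer_function
        x = 20

    inner_function()
    return x

result = outer_function()
print(result)
```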
In this case, x is modified in the enclosing scope (outer_function) using the nonlocal keyword.
Managing variable scope effectively is essential for writing maintainable and readable Python code. Here are a few best practices to keep in mind:
Global variables can make code harder to debug and maintain, as they can be modified from anywhere in the program. It’s a good idea to limit their use and rely on passing arguments or returning values from functions.
Avoid using names that shadow built-in functions or objects (like str, int, list, etc.). Overriding these names can lead to unexpected behavior and bugs in your code.
Local variables are the fastest to access and the safest option. They are confined to the function that defines them, making them less prone to accidental modification by other parts of the program.
Use nonlocal and global Keywords Sparingly
While global and nonlocal are useful for modifying variables in outer scopes, they should be used sparingly. Overuse of these keywords can make the code harder to follow and more error-prone.
Let’s put together all the concepts to see how Python resolves variable scope in a complex example:
However, if you want to modify a variable that is defined outside the function, in the global scope, you need to use the global keyword. The global keyword tells Python that a variable should not be treated as local, but rather as a global variable.
global Keyword
To use the global keyword, declare the variable name with global inside the function before assigning to it (for example, global x). This allows you to modify the variable in the global scope instead of creating a new local variable.
Here’s how you can modify a global variable from within a function:
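A minimal sketch, using the names x and update_global_variable() from the description that follows. The contrasting function update_local_only and all values are assumptions added to show what happens without the keyword.

```python
x = 5

def update_global_variable():
    global x   # refer to the module-level x
    x = x + 10

def update_local_only():
    x = 100    # creates a new local x; the global x is untouched
    return x

update_local_only()
after_local = x    # still 5
update_global_variable()
after_global = x   # now 15
```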
global keyword to modify a global variable
In the example above, the global variable x is modified inside the function update_global_variable(). Without the global keyword, x would be treated as a local variable, and any changes made to it inside the function would not affect the global x.
global Keyword
The global keyword should be used only when you explicitly need to modify a global variable from within a function. Here are some common scenarios where the global keyword may be useful:
It’s important to note that while global can be useful in certain cases, it should be used cautiously. Overusing global variables can lead to code that is difficult to understand, test, and maintain. It’s often better to pass variables as function arguments or return values to avoid relying too much on the global state.
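global Keyword Usage
In this example, the global variable sum is modified inside the function add_to_sum(). Each time the function is called, the global variable is updated. Note that naming a variable sum shadows the built-in sum() function; a name such as total avoids this.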
global for Configuration Settings
In this example, the global configuration dictionary config is modified inside the function enable_debug(). The global keyword is required only if the function rebinds config to a new object; changing an item of the existing dictionary persists outside the function even without it.
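The distinction between mutating and rebinding a global can be sketched as follows. The names config and enable_debug() come from the text; the "debug" key and reset_config() are assumptions added for contrast.

```python
config = {"debug": False}

def enable_debug():
    # Mutating the existing dictionary: no global statement needed
    config["debug"] = True

def reset_config():
    # Rebinding the name to a new dictionary: global is required
    global config
    config = {"debug": False}

enable_debug()
after_enable = config["debug"]
reset_config()
after_reset = config["debug"]
```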
global Keyword
While the global keyword is useful in specific scenarios, it is important to use it responsibly. Here are some best practices:
Relying too much on global variables can make your code difficult to understand and maintain. Ideally, functions should operate independently, passing data through arguments and returning results, rather than modifying global variables. This promotes modularity and makes your code more reusable and testable.
If you need to use global variables, they are best suited for values that remain consistent throughout your program, such as configuration settings or constants. This ensures that the global variables serve a clear purpose and are less likely to be modified unpredictably.
Avoid using global variables for complex or mutable states. For example, rather than using a global list to track user sessions or other dynamic data, consider using classes or passing data between functions explicitly.
If you must use global variables, ensure they are well-documented. A global variable’s purpose should be clear to anyone reading the code, as it’s accessible from many parts of the program.
In some cases, you may want to work with both global and local variables in a function. Here’s an example:
Let’s implement a recursive function to compute the factorial of a number:
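A minimal sketch of the recursive factorial, written to match the step-by-step description that follows (base case for 0 and 1, recursive call with n-1):

```python
def factorial(n):
    # Base case: 0! and 1! are both 1
    if n == 0 or n == 1:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * factorial(n - 1)

print(factorial(5))
```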
factorial(n) checks if n is 0 or 1. If so, it returns 1 (the base case).
If n is greater than 1, the function calls itself with n-1, which reduces the problem step by step until the base case is reached.
The factorial of a number n (denoted as n!) is the product of all positive integers less than or equal to n. We can calculate it recursively:
Here, factorial(5) is computed as 5 * factorial(4), and so on, until the base case factorial(1) is reached.
The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones. The sequence starts with 0 and 1. The recursive formula is: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1.
Here is a Python function to compute the nth Fibonacci number recursively:
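A minimal sketch of such a function, assuming the sequence is indexed from fibonacci(0) = 0 as stated above:

```python
def fibonacci(n):
    # Base cases: F(0) = 0 and F(1) = 1
    if n < 2:
        return n
    # Recursive case: F(n) = F(n-1) + F(n-2)
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(6))
```

This direct form recomputes the same values many times, so it is only practical for small n.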
In this example, fibonacci(6) computes as fibonacci(5) + fibonacci(4), and so on, until it reaches the base case of fibonacci(1) or fibonacci(0).
Recursion is very useful for tree traversal problems. Consider a simple binary tree structure and how recursion can be used to traverse it.
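A minimal sketch of in-order traversal. The Node class and the three-node tree are assumptions made for illustration; the function name in_order_traversal() comes from the description that follows. This version returns the visited values as a list rather than printing them.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def in_order_traversal(node):
    # An empty subtree contributes nothing
    if node is None:
        return []
    # Left subtree, then the node itself, then the right subtree
    return (in_order_traversal(node.left)
            + [node.value]
            + in_order_traversal(node.right))

root = Node(2, Node(1), Node(3))
visited = in_order_traversal(root)
print(visited)
```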
In this example, the function in_order_traversal() recursively visits the left child, the node itself, and then the right child.
Python does not optimize tail recursion (unlike some other languages such as Scheme or Haskell), meaning that even tail-recursive functions create a new stack frame for every call. Very deep recursion therefore raises a RecursionError once the interpreter's recursion limit (1000 frames by default) is exceeded.
In tail recursion, the recursive call is the last operation in the function. Here’s an example of a tail-recursive function for factorial calculation:
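A minimal sketch of a tail-recursive factorial. The name factorial_tail and the accumulator parameter are assumptions; the running product is carried in the accumulator so that nothing remains to be done after the recursive call returns.

```python
def factorial_tail(n, accumulator=1):
    # Base case: the accumulator already holds the full product
    if n <= 1:
        return accumulator
    # The recursive call is the last operation in the function
    return factorial_tail(n - 1, accumulator * n)

print(factorial_tail(5))
```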
The file math_operations.py is a Python module containing two functions: add() and subtract().
To use the functionality defined in this module, we can import it into another Python script.
Python comes with a large collection of built-in modules that are available as part of the Python standard library. These modules provide functionality for file handling, regular expressions, math operations, and more. Some commonly used built-in modules include:
math: Provides mathematical functions (e.g., math.sqrt(), math.pow())
os: Provides functions to interact with the operating system (e.g., os.getcwd(), os.path)
sys: Provides access to system-specific parameters (e.g., sys.argv, sys.exit())
datetime: Provides classes for working with dates and times (e.g., datetime.datetime.now())
User-defined modules are Python files you create that contain functions, classes, and variables. These modules allow you to organize and reuse your code.
For instance, if you create a file called my_module.py with the following code:
You can import and use it in another Python script.
import Statement
To use a module in Python, you need to import it. The simplest way to import a module is by using the import statement. Here's how you can import the math module and use its functions:
In this example, the import math statement makes the math module available to the script, and we can call its sqrt() function to calculate the square root.
from ... import Statement
You can also import specific functions or variables from a module using the from ... import syntax. This allows you to avoid prefixing the module name each time you call a function.
In this example, we imported the sqrt function directly, so we don't need to reference the math module every time.
While generally not recommended (due to possible naming conflicts), you can import all public names from a module using the from module import * syntax.
Note that using import * imports all public functions and variables from the module, which may overwrite existing functions or variables in your code.
Once you have imported a module, you can access its functions and variables by using the dot (.) notation.
In this example, we used the pow() function from the math module to calculate 2 raised to the power of 3.
You can assign an alias to a module using the as keyword. This is especially useful when dealing with long module names or to avoid naming conflicts.
Here, we imported the numpy module and assigned it an alias np, so we can use np instead of typing numpy every time.
If you make changes to a module and want to reload it without restarting the Python interpreter, you can use the reload() function from the importlib module.
This is useful when working interactively in environments like Jupyter notebooks, where you might modify a module during a session.
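A minimal sketch of reloading. The standard-library json module stands in here for a module you are editing; in practice you would pass your own module object.

```python
import importlib
import json  # stand-in for a module under development (assumption)

# reload() re-executes the module's code and returns the module object
reloaded = importlib.reload(json)
print(reloaded.__name__)
```

Names imported earlier with from module import name keep pointing at the old objects, so they need to be imported again after a reload.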
Python has a vast collection of built-in modules, but here are some of the most commonly used ones:
os: Provides functions to interact with the operating system (e.g., file handling, directories, environment variables).
math: Contains mathematical functions such as square roots, trigonometry, and logarithms.
random: Provides functions for generating random numbers and selecting random items.
datetime: Provides functions for working with dates and times.
json: Provides functions for working with JSON data (serialization and deserialization).
Keep Modules Small and Focused: Try to keep your modules focused on one thing. For example, a module could handle string manipulation, math operations, or file handling. This makes it easier to maintain and test.
Use Descriptive Names: Name your modules and functions meaningfully so that others can easily understand their purpose.
Avoid Using from module import *: This can lead to naming conflicts and make your code less readable. Instead, use import module or from module import function_name.
Organize Modules into Packages: As your codebase grows, organize related modules into packages (directories containing an __init__.py file). This makes your code more modular and reusable.
Document Your Modules: Always document your modules, functions, and classes. This helps other developers understand how to use them and what they do.
Creating a Python module is simple. You just need to save your Python code in a file with the .py extension. Here's a step-by-step guide to creating a module:
Create a Python file called my_module.py and write some functions in it.
Now you can import and use the functions defined in my_module.py.
To run the program, simply execute main.py, and it will import my_module.py and call its functions.
This will output:
In this structure:
math_package is the package.
__init__.py marks the directory as a package.
addition.py and subtraction.py are individual modules within the package that handle different mathematical operations.
Once the package is created, we can import the modules from it and use the functions defined within them.
Packages help to organize your Python code by logically grouping related modules together. They offer several advantages:
Creating a Python package is straightforward. Here’s how you can do it:
The first step is to create a directory for your package. This directory will contain all the modules that make up the package.
__init__.py File
The __init__.py file marks the directory as a package. You can leave it empty, or use it to initialize your package with specific functionality.
The __init__.py file allows you to import modules from the package and can also include package-level variables and functions.
Inside your package directory, you can create individual Python modules. Let’s create two modules, addition.py and subtraction.py, that contain functions for basic math operations.
Now that your package is set up, you can import and use the modules from the package. Here's an example of how to import and use the functions from the math_package.
The final structure of your package would look like this:
A Python package typically has the following structure:
__init__.py: This file is required to mark the directory as a package. It can also initialize the package when imported.
Modules (module1.py, module2.py): These are the individual Python files that contain the code for specific functionality.
Sub-packages (sub_package1, sub_package2): Packages can also contain sub-packages, which are essentially packages within packages. These help in organizing even more complex projects.
To use a module from a package, you can import it using any of these methods:
You can import an entire module from a package:
You can import specific functions or classes directly from a module, so you don’t have to use the full module path:
While generally not recommended due to potential naming conflicts, you can import all functions from a module:
You can give an alias to a module or a package to make it easier to refer to:
Inside a package, you can use relative imports to import modules from the same package or subpackages. This is useful when working with a large package structure.
For example, suppose you have a package structure like this:
In calculator.py, you can use relative imports to access addition.py and subtraction.py:
If your package depends on external libraries, you can include these dependencies in a requirements.txt file. This file lists all the packages that your package depends on, and you can use it to install dependencies using pip.
Example requirements.txt:
To install the dependencies, run:
pip
The most common way to install packages is through pip, Python's package installer. To install a third-party package from the Python Package Index (PyPI), run the following command:
For example, to install the popular HTTP library requests, use:
if __name__ == "__main__"
The line if __name__ == "__main__": is a common Python idiom. It checks whether the Python script is being run directly or imported as a module into another script. This allows your script to act as both an executable program and a reusable module.
When a script is run directly, Python assigns the value "__main__" to the special built-in variable __name__.
In that case __name__ is "__main__", so the code block under if __name__ == "__main__": will execute.
When the script is imported as a module, __name__ will be the name of the module (not "__main__"), and the code inside the if block will not execute.
This allows the main() function to be called only when the script is executed directly, but not when it is imported as a module.
greet() prints a greeting to the user.
The main() function gets the user's name and calls the greet() function.
The script checks if __name__ == "__main__", and if so, it calls the main() function.
To call the main() function, you simply place the main() function call within the if __name__ == "__main__": block. Here’s an example:
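A minimal sketch of the pattern. To keep it runnable without user input, main() uses a fixed name ("World") instead of prompting; that value and the greeting text are assumptions.

```python
def greet(name):
    # Build and print a greeting for the given name
    message = f"Hello, {name}!"
    print(message)
    return message

def main():
    # A real script might read the name with input(); a fixed value is used here
    greet("World")

if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    main()
```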
When you run this script, Python will first check if the script is being run directly. If it is, it will call the main() function and start executing the program.
Encapsulate Program Logic: Place the main logic of your program inside the main() function to keep your code organized and readable.
Avoid Running Code Outside Functions: Avoid placing executable code outside functions, including the main function. This makes it harder to import your script as a module and increases the risk of unexpected behavior.
Use if __name__ == "__main__": Always include the if __name__ == "__main__": construct to allow your script to be used as both a standalone program and an importable module.
Keep Main Function Simple: The main() function should serve as a coordinator of your program. Avoid putting detailed logic inside the main function. Instead, call other functions that handle specific tasks.
Testing: Use unit tests or other testing frameworks, and ensure the logic in the main() function can be tested in isolation.
Command-line Arguments: When creating larger scripts, you can pass command-line arguments to your script through the sys.argv list or using the argparse module. This allows the script to be more dynamic.
Here’s a full example that demonstrates the use of a main() function:
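The following is a minimal sketch consistent with the walkthrough below. Two choices are assumptions made here for testability: main() takes the argument list as a parameter, and it returns a status code instead of exiting, which a real script could pass to sys.exit(). The message texts are also illustrative.

```python
import sys

def greet(name):
    print(f"Hello, {name}!")

def farewell(name):
    print(f"Goodbye, {name}!")

def main(argv):
    # Expect exactly one argument after the script name
    if len(argv) != 2:
        print("Usage: python script.py <name>")
        return 1  # non-zero status signals incorrect usage
    name = argv[1]
    greet(name)
    farewell(name)
    return 0

if __name__ == "__main__":
    # A real script might call sys.exit(main(sys.argv)) here
    main(sys.argv)
```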
The script imports sys to access command-line arguments.
It defines greet() and farewell() functions to greet and bid farewell to the user.
The main() function checks if the script is being run with the correct number of command-line arguments. If not, it prints a usage message and exits. Otherwise, it extracts the name from the command-line arguments and calls the greet() and farewell() functions.
To run the script, save it as script.py and execute it in the terminal:
Output:
You can delete an empty directory with os.rmdir(). To delete non-empty directories, use shutil.rmtree().
To list all the files and directories in a given directory, you can use os.listdir(). It returns a list of filenames in the specified directory.
If you want to list only files or directories, you can filter the results:
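One way to filter the listing is with os.path.isfile() and os.path.isdir(). This sketch builds a throwaway directory with tempfile so that it runs anywhere; the entry names notes.txt and subdir are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Create one sub-directory and one file to list
    os.mkdir(os.path.join(base, "subdir"))
    with open(os.path.join(base, "notes.txt"), "w") as handle:
        handle.write("hello")

    entries = os.listdir(base)
    # os.listdir() returns bare names, so join them with the directory first
    files = sorted(e for e in entries if os.path.isfile(os.path.join(base, e)))
    dirs = sorted(e for e in entries if os.path.isdir(os.path.join(base, e)))

print(files, dirs)
```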
You can change the current working directory using os.chdir().
To get the current working directory, you can use os.getcwd().
Python provides multiple ways to read files, such as using open(). Here's an example:
'r' is the mode for reading the file (default).
'w' is the mode for writing to a file (creates a new file or overwrites an existing file).
'a' is for appending data to an existing file.
To write data to a file, you can use open() in write ('w') or append ('a') mode.
shutil and os
The shutil module provides several utilities for file management, such as copying and moving files. Here are some examples:
You can rename a file using os.rename():
os.path and pathlib
Python provides two primary ways to work with file paths: os.path (traditional) and pathlib (modern and object-oriented).
os.path
os.path provides a variety of functions for manipulating file paths:
pathlib
The pathlib module provides a modern, object-oriented way to work with file paths.
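A minimal sketch of pathlib's object-oriented style. The path reports/2024/summary.csv is hypothetical and does not need to exist, since only pure path operations are used.

```python
from pathlib import Path

# The / operator joins path components
path = Path("reports") / "2024" / "summary.csv"

name = path.name        # final component
stem = path.stem        # final component without the extension
suffix = path.suffix    # extension, including the dot
parent = path.parent    # everything except the final component

print(name, stem, suffix, parent)
```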
Python’s shutil module makes it easy to copy, move, and rename files:
shutil.copy(src, dst) copies a file; shutil.copy2(src, dst) also preserves metadata.
shutil.move(src, dst) moves or renames files.
os.rename(src, dst) renames a file.
To delete files or directories, you can use os.remove() for files and os.rmdir() for empty directories. For non-empty directories, shutil.rmtree() is the recommended option.
In this case, the first row represents the header, and the following rows represent the data entries.
csv Module to Read and Write CSV Files
Python’s built-in csv module allows us to read from and write to CSV files with ease. Let’s explore some common operations.
To read a CSV file, use the csv.reader() function. This function returns an iterator that can be used to iterate through each row in the file.
To write to a CSV file, you use the csv.writer() function, which writes a sequence of rows to the file.
The newline='' argument is used to prevent extra blank lines from being inserted between rows when writing on Windows systems.
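A minimal sketch of writing with csv.writer() and newline=''. The file name people.csv and the rows are assumptions; the file is created in a temporary directory and read back to show what was stored.

```python
import csv
import os
import tempfile

rows = [["Name", "Age"], ["John", 28], ["Jane", 34]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")

    # newline='' lets the csv module control line endings itself
    with open(path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerows(rows)

    # Read the file back; numbers come back as strings
    with open(path, "r", newline="") as file:
        read_back = list(csv.reader(file))

print(read_back)
```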
CSV files often contain a header row with column names. The csv.DictReader() and csv.DictWriter() functions allow for more flexible handling of CSV files by working with dictionaries instead of lists.
The csv.DictReader() function reads a CSV file and maps each row to a dictionary where the keys are the column names (from the header).
The csv.DictWriter() function allows you to write dictionaries to a CSV file, where the keys of the dictionary correspond to the column names.
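A minimal sketch of csv.DictWriter(). It writes to an in-memory io.StringIO buffer so that it runs on its own; the field names and records are illustrative assumptions.

```python
import csv
import io

people = [
    {"Name": "John", "Age": 28},
    {"Name": "Jane", "Age": 34},
]

buffer = io.StringIO()
# fieldnames fixes the column order and the header row
writer = csv.DictWriter(buffer, fieldnames=["Name", "Age"])
writer.writeheader()
writer.writerows(people)

lines = buffer.getvalue().splitlines()
print(lines)
```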
By default, the csv module uses a comma (,) as the delimiter, but you can specify a custom delimiter if your CSV file uses a different character.
CSV files may have different encodings, such as UTF-8 or ISO-8859-1. You can specify the encoding when opening the file to handle different character sets.
with Statements: Always use the with open() syntax to ensure that the file is properly closed after reading or writing.
Use None or custom placeholders to handle missing values.
To add rows to an existing file, open it in 'a' mode.
DictReader and DictWriter: For more flexibility and readability, prefer using csv.DictReader and csv.DictWriter when dealing with CSV files that have headers.
Suppose you have a list of dictionaries and you want to write it to a CSV file. You can use csv.DictWriter for this.
You might need to process a CSV file and calculate the sum of certain columns. Here’s how you can do that:
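A minimal sketch of summing one column. The column names Item and Amount and the figures are hypothetical, and the data is read from an in-memory buffer rather than a file.

```python
import csv
import io

# Hypothetical sales data standing in for a CSV file on disk
sales = io.StringIO(
    "Item,Amount\n"
    "Pen,1.50\n"
    "Book,12.00\n"
    "Bag,30.25\n"
)

total = 0.0
for row in csv.DictReader(sales):
    # Values are strings, so convert before adding
    total += float(row["Amount"])

print(total)
```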
csv.DictReader(file) automatically uses the first row of the CSV as the dictionary keys.
By default, the csv.reader() and csv.DictReader() functions use a comma as the delimiter between values. However, sometimes you may encounter CSV files that use different delimiters, such as semicolons (;) or tabs (\t). In such cases, you can specify a custom delimiter.
Semicolon (;) as the Delimiter
import csv

# Open the CSV file in read mode
with open('semicolon_example.csv', 'r', newline='') as file:
    # Create a CSV reader object with a semicolon delimiter
    reader = csv.reader(file, delimiter=';')
    # Iterate over each row
    for row in reader:
        print(row)
['Name', 'Age', 'Occupation']
['John', '28', 'Software Engineer']
['Jane', '34', 'Data Scientist']
['Doe', '45', 'Project Manager']
In this example, we specify delimiter=';' to handle CSV files where values are separated by semicolons instead of commas.
CSV files can be encoded in various formats, such as UTF-8, ISO-8859-1, etc. When opening a CSV file, you might encounter encoding issues. You can specify the encoding explicitly using the encoding argument in the open() function.
import csv

# Open the CSV file with UTF-8 encoding
with open('utf8_example.csv', 'r', encoding='utf-8', newline='') as file:
    reader = csv.reader(file)
    # Iterate over each row
    for row in reader:
        print(row)
If you know that the file is encoded in a different format, such as ISO-8859-1 (Latin-1), you can adjust the encoding like this:
with open('latin1_example.csv', 'r', encoding='iso-8859-1', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
CSV files may sometimes contain missing values or malformed rows. Python’s csv module provides ways to handle these cases.
import csv

with open('example_with_missing_values.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Check if a value is missing, and handle accordingly
        if not row['Age']:
            print(f"Missing value in Age for {row['Name']}")
        else:
            print(row)
In the above example, if the Age value is empty or absent in any row, we print a message indicating the missing value.
To handle malformed rows, you can validate each row inside a try-except block or catch the csv.Error exceptions raised by the parser.
import csv

with open('malformed_example.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        try:
            # Process the row
            if len(row) < 3:
                raise ValueError("Malformed row: not enough values")
            print(row)
        except ValueError as e:
            print(e)
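The example above validates rows by hand. As a complement, here is a minimal sketch of catching csv.Error itself. It assumes the reader is created with strict=True, which makes the parser raise on badly quoted input instead of tolerating it; the sample data is made up, with a stray character after a closing quote on the third line.

```python
import csv
import io

# Third line is malformed: "3"4 has a character after the closing quote
data = io.StringIO('Name,Age\nJohn,28\nJane,"3"4\n')

reader = csv.reader(data, strict=True)
rows = []
error_message = None
try:
    for row in reader:
        rows.append(row)
except csv.Error as error:
    # reader.line_num reports how many source lines have been read so far
    error_message = f"line {reader.line_num}: {error}"

print(rows)
print(error_message)
```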
Always Use with Statements: The with statement ensures that the file is properly closed after being processed, even if an error occurs during reading.
Specify the Correct Encoding: If you're dealing with non-ASCII characters, always specify the encoding when opening a file to avoid encoding errors.
Handle Missing or Corrupt Data: Be prepared for missing or malformed data by validating each row or using exception handling.
Use DictReader() for Header Rows: If your CSV file contains headers, use csv.DictReader() to automatically use the first row as column names, making it easier to work with the data.
Test with Sample Files: Always test your code with different CSV files (various delimiters, encoding, headers, etc.) to ensure robustness.