How To Read CSV Files Using Pandas?
Table Of Contents:
(1) How To Read CSV Files Using Pandas?
(2) Examples Of Reading CSV Files.
(1) How To Read CSV Files Using Pandas?
- Use the ‘read_csv()’ function from pandas to read a CSV file.
- It reads a comma-separated values (CSV) file into a DataFrame.
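Syntax (signature as of pandas 1.5; some of these parameters, such as squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines, were removed in pandas 2.0):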
pandas.read_csv(filepath_or_buffer, *,
sep=_NoDefault.no_default,
delimiter=None,
header='infer',
names=_NoDefault.no_default,
index_col=None,
usecols=None,
squeeze=None,
prefix=_NoDefault.no_default,
mangle_dupe_cols=True,
dtype=None,
engine=None,
converters=None,
true_values=None,
false_values=None,
skipinitialspace=False,
skiprows=None,
skipfooter=0,
nrows=None,
na_values=None,
keep_default_na=True,
na_filter=True,
verbose=False,
skip_blank_lines=True,
parse_dates=None,
infer_datetime_format=False,
keep_date_col=False,
date_parser=None,
dayfirst=False,
cache_dates=True,
iterator=False,
chunksize=None,
compression='infer',
thousands=None,
decimal='.',
lineterminator=None,
quotechar='"',
quoting=0,
doublequote=True,
escapechar=None,
comment=None,
encoding=None,
encoding_errors='strict',
dialect=None,
error_bad_lines=None,
warn_bad_lines=None,
on_bad_lines=None,
delim_whitespace=False,
low_memory=True,
memory_map=False,
float_precision=None,
storage_options=None)
(2) Examples Of Reading CSV Files.
Example-1: Reading CSV From A CSV File
Parameter:
- filepath_or_buffer: str, path object or file-like object :
- Any valid string path is acceptable.
- The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file.
- For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
import pandas as pd
# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences.
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details
Output:
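Because filepath_or_buffer also accepts a file-like object, the same call can be tried without the file on disk. The sketch below assumes a small in-memory sample: the column names follow the article's file, but the rows and the name sample_csv are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

# io.StringIO wraps the text in a file-like object that read_csv can consume.
customer_details = pd.read_csv(io.StringIO(sample_csv))

print(customer_details.shape)
print(list(customer_details.columns))
```

The remaining parameters behave the same way whether the source is a path, a URL or a buffer, so a small buffer like this is a convenient way to experiment.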
Example-2: Using ‘usecols’ in read_csv()
Parameter:
- usecols: list-like or callable, optional :
- Returns only the specified subset of columns.
- You can specify column names inside a list, like ['foo', 'bar', 'baz'].
- You can also specify column positions, like [0, 1, 2].
- The first column is at position zero.
Before:
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# No usecols: every column in the file is read.
customer_details = pd.read_csv(path)
customer_details
After: With Column Names
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Keep only the three named columns.
customer_details = pd.read_csv(path, usecols=['CustomerID', 'Genre', 'Age'])
customer_details
Note:
- Here the result has only three columns, because we specified them with the ‘usecols’ parameter.
After: With Column Position Values
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Keep the first three columns, selected by position.
customer_details = pd.read_csv(path, usecols=[0, 1, 2])
customer_details
Note:
- Here I have specified the column position values as [0, 1, 2].
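The parameter description also mentions that usecols can be a callable. A minimal sketch of all three forms, assuming an in-memory sample whose rows are made up for illustration (sample_csv is a hypothetical name):

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
"""

# Select columns by name.
by_name = pd.read_csv(io.StringIO(sample_csv), usecols=["CustomerID", "Genre", "Age"])

# Select the same columns by position (the first column is 0).
by_position = pd.read_csv(io.StringIO(sample_csv), usecols=[0, 1, 2])

# Select with a callable: it is called with each column name and keeps
# the columns for which it returns True.
by_callable = pd.read_csv(io.StringIO(sample_csv), usecols=lambda name: name.startswith("A"))

print(list(by_name.columns))
print(list(by_position.columns))
print(list(by_callable.columns))
```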
Example-3: Using ‘index_col’ in read_csv()
Parameter:
- index_col: int, str, sequence of int / str, or False, optional, default None:
- To use one or more columns as the index, pass them to index_col.
- It accepts a single column or several columns, given by name or by position.
Before:
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# No index_col: pandas creates a default integer index.
customer_details = pd.read_csv(path)
customer_details
After: With index_col
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Use two columns together as the index.
customer_details = pd.read_csv(path, index_col=['Age', 'Genre'])
customer_details
Note:
- Here the ‘Age’ and ‘Genre’ columns together form the index.
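Passing more than one column to index_col produces a MultiIndex, and those columns no longer appear among the regular columns. A small sketch on made-up rows (sample_csv is a hypothetical in-memory stand-in for the file):

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
"""

indexed = pd.read_csv(io.StringIO(sample_csv), index_col=["Age", "Genre"])

# The two index columns become levels of a MultiIndex.
print(type(indexed.index).__name__)
print(list(indexed.index.names))

# Only the remaining columns are left as data columns.
print(list(indexed.columns))
```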
Example-4: Using ‘nrows’ in read_csv()
Parameter:
- nrows: int, optional:
- Number of rows of file to read. Useful for reading pieces of large files.
With nrows
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Read only the first four data rows.
customer_details = pd.read_csv(path, nrows=4)
customer_details
Note:
- Here you can see that only 4 rows are read.
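A quick check on a made-up in-memory sample (sample_csv is a hypothetical name) suggests that nrows counts data rows only, so the header line does not use up one of the rows:

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

first_rows = pd.read_csv(io.StringIO(sample_csv), nrows=2)

# Two data rows are returned, and the header is still used for the column names.
print(len(first_rows))
print(first_rows["CustomerID"].tolist())
```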
Example-5: Using ‘skiprows’ in read_csv()
Parameter:
- skiprows: list-like, int or callable, optional
- To skip particular lines, list their line numbers, indexed from 0 (the header line is line 0).
- If you pass a single integer, that many lines are skipped from the start of the file.
With ‘skiprows’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Skip file lines 1 and 3 (line 0 is the header).
customer_details = pd.read_csv(path, skiprows=[1, 3])
customer_details
Note:
- Here you can see that the rows with CustomerID ‘1’ and ‘3’ have been skipped.
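The three accepted forms of skiprows (list, int and callable) behave differently, which the sketch below tries to show on made-up rows. The name sample_csv and its values are assumptions for illustration; header=None is added in the integer case because skipping from the top also skips the header line.

```python
import io
import pandas as pd

# Hypothetical sample: line 0 is the header, lines 1-5 are data rows.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

# List: skip exactly file lines 1 and 3.
skip_list = pd.read_csv(io.StringIO(sample_csv), skiprows=[1, 3])

# Integer: skip the first two lines of the file (header and first data row).
skip_count = pd.read_csv(io.StringIO(sample_csv), skiprows=2, header=None)

# Callable: called with each line number; skip the line when it returns True.
skip_callable = pd.read_csv(io.StringIO(sample_csv), skiprows=lambda i: i > 0 and i % 2 == 0)

print(skip_list["CustomerID"].tolist())
print(skip_count[0].tolist())
print(skip_callable["CustomerID"].tolist())
```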
Example-6: Using ‘header’ in read_csv()
Parameter:
- header: int, list of int, None, default ‘infer’
- Use the ‘header’ parameter to choose which row of the file is used as the column header.
- By default, header='infer', which behaves like header=0 (the file’s first row) when no ‘names’ are passed.
With ‘header=0’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Use the first line of the file as the header.
customer_details = pd.read_csv(path, header=0)
customer_details
With ‘header=1’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Use the second line of the file as the header; the line before it is discarded.
customer_details = pd.read_csv(path, header=1)
customer_details
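To make the effect of header visible without the original file, here is a sketch on made-up rows (sample_csv is a hypothetical name). It also includes header=None, which the parameter description lists but the examples above do not show.

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

# header=0: the first line supplies the column names.
h0 = pd.read_csv(io.StringIO(sample_csv), header=0)

# header=1: the second line supplies the column names, so the real header
# is dropped and the first data row is consumed as column labels.
h1 = pd.read_csv(io.StringIO(sample_csv), header=1)

# header=None: no line is treated as a header; columns are numbered 0..n-1
# and the original header line becomes the first row of data.
hn = pd.read_csv(io.StringIO(sample_csv), header=None)

print(list(h0.columns), len(h0))
print(list(h1.columns), len(h1))
print(list(hn.columns), len(hn))
```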
Example-7: Using ‘names’ in read_csv()
Parameter:
- names: array-like, optional
- You can supply your own column names using the ‘names’ parameter.
- If the file contains a header row, you should explicitly pass header=0 so that the supplied names replace the existing column names.
- Duplicates in this list are not allowed.
With ‘names’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Supply new column names without telling pandas about the existing header row.
customer_details = pd.read_csv(path, names=['A', 'B', 'C', 'D', 'E'])
customer_details
Note:
- Here the file already has a header row, [‘CustomerID’, ‘Genre’, ‘Age’, ‘Annual_Income_(k$)’, ‘Spending_Score’], so it is read as the first row of data.
- To replace this row with the new names, pass header=0.
With ‘names and header=0’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# header=0 marks the first line as a header, and names replaces it.
customer_details = pd.read_csv(path, header=0, names=['A', 'B', 'C', 'D', 'E'])
customer_details
Note:
- Here you can see that the header row, [‘CustomerID’, ‘Genre’, ‘Age’, ‘Annual_Income_(k$)’, ‘Spending_Score’], has been replaced by the new names.
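The difference between the two calls can be checked on a small made-up sample (sample_csv is a hypothetical name): without header=0 the old header survives as a data row, which also keeps the numeric columns from being read as numbers.

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
"""

new_names = ["A", "B", "C", "D", "E"]

# names only: the existing header line is treated as data.
names_only = pd.read_csv(io.StringIO(sample_csv), names=new_names)

# names with header=0: the existing header line is replaced.
names_and_header = pd.read_csv(io.StringIO(sample_csv), header=0, names=new_names)

print(len(names_only), names_only.iloc[0, 0])
print(len(names_and_header), names_and_header["A"].tolist())
```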
Example-8: Using ‘dtype’ in read_csv()
Parameter:
- dtype: Type name or dict of column -> type, optional
- You can explicitly specify the data types of the columns using the ‘dtype’ parameter.
- E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
Without ‘dtype’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details
# Print the dtype that pandas inferred for each column.
print(customer_details['CustomerID'].dtype)
print(customer_details['Genre'].dtype)
print(customer_details['Age'].dtype)
print(customer_details['Annual_Income_(k$)'].dtype)
print(customer_details['Spending_Score'].dtype)
With ‘dtype’
import pandas as pd
import numpy as np
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Map each column name to the dtype it should be read as.
column_types = {
    'CustomerID': np.int32,
    'Genre': str,
    'Age': np.int32,
    'Annual_Income_(k$)': np.int32,
    'Spending_Score': np.int32,
}
customer_details = pd.read_csv(path, dtype=column_types)
print(customer_details['CustomerID'].dtype)
print(customer_details['Genre'].dtype)
print(customer_details['Age'].dtype)
print(customer_details['Annual_Income_(k$)'].dtype)
print(customer_details['Spending_Score'].dtype)
Note:
- Here you can see that I have changed the integer type from ‘int64’ to ‘int32’.
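One practical reason to narrow integer columns is memory. The sketch below compares the bytes used by one column under the inferred and the requested dtype, on made-up rows (sample_csv is a hypothetical name, and the saving on a real file depends on its size):

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

inferred = pd.read_csv(io.StringIO(sample_csv))
typed = pd.read_csv(io.StringIO(sample_csv), dtype={"Age": np.int32})

# Bytes used by the values of the 'Age' column, excluding the index.
inferred_bytes = inferred["Age"].memory_usage(index=False)
typed_bytes = typed["Age"].memory_usage(index=False)

print(inferred["Age"].dtype, inferred_bytes)
print(typed["Age"].dtype, typed_bytes)
```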
Example-9: Using ‘converters’ in read_csv()
Parameter:
- converters: dict, optional
- To transform the values of a column while reading, use the ‘converters’ parameter.
- Pass a dictionary in which each key is a column name (or position) and each value is a function that is applied to every value in that column.
With ‘converters’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Converters receive each value as a string, so cast to int before halving.
fun = lambda x: int(x) / 2
customer_details = pd.read_csv(path, converters={'Age': fun})
customer_details
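A self-contained sketch of the same idea on made-up rows (sample_csv and halve_age are hypothetical names). It also applies a second converter to a text column, to show that each function receives the raw string from the file:

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the article's file, values are illustrative.
sample_csv = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
"""

def halve_age(value):
    """Convert the raw string to an int and return half of it."""
    return int(value) / 2

converted = pd.read_csv(
    io.StringIO(sample_csv),
    converters={"Age": halve_age, "Genre": str.upper},
)

print(converted["Age"].tolist())
print(converted["Genre"].tolist())
```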
Example-10: Using ‘na_values’ in read_csv()
Parameter:
- na_values: scalar, str, list-like, or dict, optional
- Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values.
- By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
Without ‘na_values’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Only the default NaN markers are recognised here.
customer_details = pd.read_csv(path)
customer_details
Input:
Output:
Note:
- Here the input file contains values such as ‘NULL’, ‘NA’ and ‘N/A’, which are on the default list and are automatically read as NaN.
- I also want ‘VOID’ to be read as NaN. To do that, I can use the ‘na_values’ parameter.
With ‘na_values’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Treat 'VOID' as NaN in addition to the default NaN markers.
customer_details = pd.read_csv(path, na_values=['VOID'])
customer_details
Input:
Output:
Note:
- Here you can see that all the ‘VOID’ values have been converted to NaN.
Input:
Output:
Note:
- Here you can see that all the blank values have been converted to NaN.
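The behaviour described in these notes can be reproduced on a small made-up sample (sample_csv is a hypothetical name, with ‘VOID’, ‘NA’ and a blank field planted in it). The last call uses the dict form mentioned in the parameter description, which limits the extra marker to one column.

```python
import io
import pandas as pd

# Hypothetical sample containing 'VOID', 'NA' and one blank field.
sample_csv = """CustomerID,Genre,Age,Spending_Score
1,Male,19,39
2,VOID,21,VOID
3,Female,,6
4,NA,23,77
"""

# Defaults: 'NA' and the blank field become NaN, 'VOID' stays a string.
default = pd.read_csv(io.StringIO(sample_csv))

# 'VOID' is added to the NaN markers for every column.
with_void = pd.read_csv(io.StringIO(sample_csv), na_values=["VOID"])

# Dict form: 'VOID' is a NaN marker only in 'Spending_Score'.
per_column = pd.read_csv(io.StringIO(sample_csv), na_values={"Spending_Score": ["VOID"]})

print(default.isna().sum().to_dict())
print(with_void.isna().sum().to_dict())
print(per_column.isna().sum().to_dict())
```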
Example-11: Using ‘verbose’ in read_csv()
Parameter:
- verbose: bool, default False
- The verbose parameter, when set to True, prints additional information while reading a CSV file, such as the time taken for:
- type conversion,
- memory cleanup, and
- tokenization.
With ‘verbose’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Print timing information while the file is parsed.
customer_details = pd.read_csv(path, verbose=True)
customer_details
Output:
Example-12: Using ‘parse_dates’ in read_csv()
Parameter:
- parse_dates: bool or list of int or names or list of lists or dict, default: False
- By default, date columns are represented as objects when loading data from a CSV file.
- To read the date column correctly, we can use the argument parse_dates to specify a list of date columns.
Without ‘parse_dates’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# No parse_dates: date strings are left as plain text.
customer_details = pd.read_csv(path)
customer_details
Output:
customer_details.info()
Output:
Note:
- The date column gets read as an object data type using the default read_csv().
With ‘parse_dates’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Parse the 'date' column into datetime values while reading.
customer_details = pd.read_csv(path, parse_dates=['date'])
customer_details
Output:
customer_details.info()
Output:
Note:
- Now the date column is read with a datetime dtype (such as datetime64[ns]) because of the parse_dates parameter.
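The before and after can be checked without calling info(). The sketch below assumes a made-up sample with a ‘date’ column in ISO format (sample_csv and its values are hypothetical) and compares the column type in both cases:

```python
import io
import pandas as pd

# Hypothetical sample with an ISO-formatted 'date' column.
sample_csv = """CustomerID,date
1,2023-01-15
2,2023-02-20
3,2023-03-05
"""

plain = pd.read_csv(io.StringIO(sample_csv))
parsed = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date"])

# Without parse_dates the column holds text; with it, datetime values.
print(pd.api.types.is_datetime64_any_dtype(plain["date"]))
print(pd.api.types.is_datetime64_any_dtype(parsed["date"]))

# Datetime columns expose the .dt accessor for date arithmetic and parts.
print(parsed["date"].dt.month.tolist())
```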
Example-13: Using ‘infer_datetime_format’ in read_csv()
Parameter:
- infer_datetime_format: bool, default False
- If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
- Essentially, pandas deduces the format of your datetime strings from the first element(s) and then assumes all other elements in the series use the same format.
- This means pandas does not need to check multiple formats when attempting to convert a string to datetime.
- If the format cannot be inferred, pandas falls back to its slower general-purpose date parsing.
- This parameter is deprecated since pandas 2.0, where a strict version of this behaviour is the default.
With ‘infer_datetime_format’
import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# infer_datetime_format is left at False here, so no format inference is attempted.
customer_details = pd.read_csv(path, infer_datetime_format=False, parse_dates=['date'])
customer_details
Output:
customer_details.info()
Output:
Note:
- Here you can see that the ‘date’ column was not converted to a datetime type; it is still of ‘object’ type. read_csv returns a column unaltered as object when its values cannot be represented as datetimes.
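Since infer_datetime_format is deprecated in pandas 2.0 and later, one alternative is to read the column as text and convert it with pd.to_datetime and an explicit format. This is a swapped-in technique, not the article's method, sketched on a made-up sample (sample_csv, the day/month/year format and the bad value are assumptions for illustration). With errors='coerce', values that do not match the format become NaT instead of leaving the whole column as object.

```python
import io
import pandas as pd

# Hypothetical sample: day/month/year dates plus one value that is not a date.
sample_csv = """CustomerID,date
1,15/01/2023
2,20/02/2023
3,not a date
"""

# Read the column as plain text first.
customer_details = pd.read_csv(io.StringIO(sample_csv))

# Convert with an explicit format; unparsable values become NaT.
customer_details["date"] = pd.to_datetime(
    customer_details["date"], format="%d/%m/%Y", errors="coerce"
)

print(customer_details["date"].dtype)
print(customer_details["date"].isna().sum())
```

Stating the format explicitly avoids any guessing about day-first versus month-first dates, at the cost of having to know the format in advance.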