How To Read CSV Files Using Pandas?



Table Of Contents:

(1) How To Read CSV Files Using Pandas?

(2) Examples Of Reading CSV Files Using Pandas.

(1) How To Read CSV Files Using Pandas?

  • Use the ‘read_csv()’ function from pandas to read a CSV file.
  • It reads a comma-separated values (CSV) file into a DataFrame.

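Syntax (signature as shown in the pandas 1.x documentation; ‘squeeze’, ‘prefix’, ‘mangle_dupe_cols’, ‘error_bad_lines’ and ‘warn_bad_lines’ were removed in pandas 2.0):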

pandas.read_csv(filepath_or_buffer, *,
     sep=_NoDefault.no_default, 
     delimiter=None, 
     header='infer',
     names=_NoDefault.no_default,
     index_col=None, 
     usecols=None, 
     squeeze=None, 
     prefix=_NoDefault.no_default,
     mangle_dupe_cols=True, 
     dtype=None, 
     engine=None, 
     converters=None, 
     true_values=None, 
     false_values=None,
     skipinitialspace=False, 
     skiprows=None, 
     skipfooter=0, 
     nrows=None, 
     na_values=None, 
     keep_default_na=True, 
     na_filter=True, 
     verbose=False, 
     skip_blank_lines=True, 
     parse_dates=None, 
     infer_datetime_format=False, 
     keep_date_col=False, 
     date_parser=None, 
     dayfirst=False, 
     cache_dates=True, 
     iterator=False, 
     chunksize=None, 
     compression='infer', 
     thousands=None, 
     decimal='.', 
     lineterminator=None, 
     quotechar='"', 
     quoting=0, 
     doublequote=True, 
     escapechar=None, 
     comment=None, 
     encoding=None, 
     encoding_errors='strict', 
     dialect=None, 
     error_bad_lines=None, 
     warn_bad_lines=None, 
     on_bad_lines=None, 
     delim_whitespace=False, 
     low_memory=True, 
     memory_map=False, 
     float_precision=None, 
     storage_options=None)

(2) Examples Of Reading CSV Files Using Pandas

Example-1: Reading Data From A CSV File

Parameter:

  • filepath_or_buffer: str, path object or file-like object :
    • Any valid string path is acceptable.
    • The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file.
    • For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

import pandas as pd
# Raw string (r"...") so the backslashes in the Windows path are not treated as escape characters.
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details
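
Because filepath_or_buffer also accepts file-like objects, the same call can be tried without any file on disk. The following is a minimal sketch that reads from an in-memory io.StringIO buffer; the sample rows are made up for illustration and only borrow the column names used in this post.

```python
import io
import pandas as pd

# Made-up sample rows (an assumption for illustration); only the column
# names mirror the Mall_Customers file used in this post.
csv_text = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
"""

# Any object with a read() method can be passed in place of a path string.
customer_details = pd.read_csv(io.StringIO(csv_text))
print(customer_details.shape)  # (number of rows, number of columns)
```

This is handy for experimenting with the parameters discussed in the examples that follow.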

Output:

Example-2: Using ‘usecols’ in read_csv()

Parameter:

  • usecols: list-like or callable, optional :
    • Reads only the specified columns.
    • You can specify the column names inside a list, like ['foo', 'bar', 'baz'].
    • You can also specify the column positions, like [0, 1, 2].
    • Positions are zero-based: the first column is at position 0.

Before:

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details

After: With Column Names

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, usecols=['CustomerID', 'Genre', 'Age'])
customer_details

Note:

  • The result contains only three columns, because we specified them with the ‘usecols’ parameter.

After: With Column Position Values

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, usecols=[0, 1, 2])
customer_details

Note:

  • Here the columns are selected by their position values, [0, 1, 2].
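
The ‘usecols’ parameter is also described as accepting a callable. The sketch below, using made-up sample rows, shows that form and also shows that the order of the names in a list does not change the order of the resulting columns.

```python
import io
import pandas as pd

# Made-up sample rows; column names follow the ones used in this post.
csv_text = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
"""

# A callable is evaluated against each column name; columns for which it
# returns True are kept.
by_callable = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name in {"CustomerID", "Age"},
)

# The order inside usecols is ignored: columns come back in file order.
by_list = pd.read_csv(io.StringIO(csv_text), usecols=["Age", "CustomerID"])

print(list(by_callable.columns))
print(list(by_list.columns))
```

If a specific column order is needed, reorder after reading, for example with by_list[["Age", "CustomerID"]].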

Example-3: Using ‘index_col’ in read_csv()

Parameter:

  • index_col: int, str, sequence of int / str, or False, optional, default None:
    • If you want to use one or more columns as the index, you can use index_col.
    • It accepts a single column or multiple columns; multiple columns produce a MultiIndex.

Before:

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details

After: With index_col

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, index_col=['Age', 'Genre'])
customer_details

Note:

  • Here the ‘Age’ and ‘Genre’ columns are used as the index columns.
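
To see the difference between a single index column and several, here is a small sketch on made-up sample rows. It only inspects the resulting index, so it makes no claim about the real Mall_Customers data.

```python
import io
import pandas as pd

# Made-up sample rows; column names follow the ones used in this post.
csv_text = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
"""

# One column as the index.
single = pd.read_csv(io.StringIO(csv_text), index_col="CustomerID")

# Two columns as the index: the result has a two-level MultiIndex.
multi = pd.read_csv(io.StringIO(csv_text), index_col=["Age", "Genre"])

print(single.index.name)
print(list(multi.index.names), multi.index.nlevels)
print(list(multi.columns))  # the index columns are no longer regular columns
```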

Example-4: Using ‘nrows’ in read_csv()

Parameter:

  • nrows: int, optional: 
    • Number of rows of file to read. Useful for reading pieces of large files.

With nrows

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, nrows=4)
customer_details

Note:

  • Here you can see that only the first 4 rows are fetched.
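
For files that are too large to load at once, ‘nrows’ gives a quick peek, while the ‘chunksize’ parameter from the signature above (not covered elsewhere in this post) lets you iterate over the file piece by piece. A minimal sketch with made-up sample rows:

```python
import io
import pandas as pd

# Made-up sample rows; column names follow the ones used in this post.
csv_text = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

# nrows: read only the first 2 data rows.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# chunksize: get an iterator of DataFrames with at most 2 rows each,
# so the whole file never has to be held in memory at once.
chunk_lengths = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)
]

print(len(first_two))
print(chunk_lengths)
```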

Example-5: Using ‘skiprows’ in read_csv()

Parameter:

  • skiprows: list-like, int or callable, optional
    • If you want to skip particular lines, pass their line numbers inside a list, indexed from ‘0’ (in a file with a header, line 0 is the header row).
    • If you pass a single integer, that many lines are skipped from the start of the file.

With ‘skiprows’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, skiprows=[1, 3])
customer_details

Note:

  • Here you can see that the rows with CustomerID ‘1’ and ‘3’ have been skipped.
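
The three accepted forms of ‘skiprows’ behave quite differently, which the following sketch tries to make visible on made-up sample rows. Note in particular that a plain integer also skips the header line.

```python
import io
import pandas as pd

# Made-up sample rows; column names follow the ones used in this post.
csv_text = """CustomerID,Genre,Age,Annual_Income_(k$),Spending_Score
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

# List: skip file lines 1 and 3 (line 0 is the header).
skip_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3])

# Integer: skip the first line of the file, which here is the header,
# so the first data row is promoted to the header.
skip_int = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# Callable: receives the line index and returns True for lines to skip.
# Here: every even-numbered line except the header.
skip_func = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and i % 2 == 0,
)

print(skip_list["CustomerID"].tolist())
print(list(skip_int.columns))
print(skip_func["CustomerID"].tolist())
```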

Example-6: Using ‘header’ in read_csv()

Parameter:

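  • header: int, list of int, None, default ‘infer’
    • You can use the ‘header’ parameter to choose which row is used as the header.
    • By default (‘infer’), pandas behaves as if header = 0 and takes the column names from the file’s first row, unless ‘names’ is passed, in which case it behaves as if header = None.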

With ‘header=0’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, header=0)
customer_details

With ‘header=1’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, header=1)
customer_details
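
Two situations where ‘header’ matters are files with no header row at all and files with an extra title line before the header. The sketch below uses two made-up inputs (both are assumptions for illustration) to show each case.

```python
import io
import pandas as pd

# Made-up file without any header row.
no_header_text = "1,Male,19\n2,Male,21\n3,Female,20\n"

# Made-up file with a title line above the real header row.
titled_text = "Mall customers export\nCustomerID,Genre,Age\n1,Male,19\n2,Female,21\n"

# header=None: no row is used as the header; columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(no_header_text), header=None)

# header=1: the second line supplies the column names and the
# line before it is discarded.
titled = pd.read_csv(io.StringIO(titled_text), header=1)

print(list(no_header.columns), len(no_header))
print(list(titled.columns), len(titled))
```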

Example-7: Using ‘names’ in read_csv()

Parameter:

  • names: array-like, optional
    • You can provide your own column names using the ‘names’ parameter.
    • If the file contains a header row, you should explicitly pass header=0 so that the new names replace it.
    • Duplicates in this list are not allowed.

With ‘names’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, names=['A', 'B', 'C', 'D', 'E'])
customer_details

Note:

  • The file already has the header row [‘CustomerID’, ‘Genre’, ‘Age’, ‘Annual_Income_(k$)’, ‘Spending_Score’], so without header=0 it is read as the first data row.
  • To replace this row with the new names, pass header = 0.

With ‘names and header=0’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, header=0, names=['A', 'B', 'C', 'D', 'E'])
customer_details

Note:

  • Here you can see that the original header row, [‘CustomerID’, ‘Genre’, ‘Age’, ‘Annual_Income_(k$)’, ‘Spending_Score’], has been replaced.
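
The effect of combining ‘names’ with header=0 can be checked directly by comparing the first row of each result. A minimal sketch on made-up sample rows with three columns:

```python
import io
import pandas as pd

# Made-up sample rows; column names follow the ones used in this post.
csv_text = """CustomerID,Genre,Age
1,Male,19
2,Female,21
"""
new_names = ["A", "B", "C"]

# names only: the original header line is kept as the first data row.
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names + header=0: the original header line is replaced by the new names.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

print(kept.iloc[0].tolist(), len(kept))
print(replaced.iloc[0].tolist(), len(replaced))
```

In the first case every column also ends up holding text, because the old header strings are mixed in with the data.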

Example-8: Using ‘dtype’ in read_csv()

Parameter:

  • dtype: Type name or dict of column -> type, optional
    • You can explicitly specify the data types of the columns by using the ‘dtype’ parameter.
    • E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}

Without ‘dtype’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details

print(customer_details['CustomerID'].dtype)
print(customer_details['Genre'].dtype)
print(customer_details['Age'].dtype)
print(customer_details['Annual_Income_(k$)'].dtype)
print(customer_details['Spending_Score'].dtype)

With ‘dtype’

import pandas as pd
import numpy as np
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, dtype={'CustomerID': np.int32, 'Genre': str,
                                            'Age': np.int32, 'Annual_Income_(k$)': np.int32,
                                            'Spending_Score': np.int32})

print(customer_details['CustomerID'].dtype)
print(customer_details['Genre'].dtype)
print(customer_details['Age'].dtype)
print(customer_details['Annual_Income_(k$)'].dtype)
print(customer_details['Spending_Score'].dtype)

Note:

  • Here you can see that I have changed the integer type from ‘int64’ to ‘int32’.
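
The ‘Int64’ (capital I) type mentioned in the parameter description is pandas' nullable integer type. It is useful when an integer column has missing values, which would otherwise turn the column into floats. A minimal sketch on made-up data with one missing ‘Age’:

```python
import io
import pandas as pd

# Made-up sample with a missing Age in the second row.
csv_text = """CustomerID,Age
1,19
2,
3,20
"""

# Default: the missing value forces the column to float64.
default = pd.read_csv(io.StringIO(csv_text))

# Nullable integer type: values stay integers, the gap becomes <NA>.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"Age": "Int64"})

print(default["Age"].dtype)
print(nullable["Age"].dtype)
print(nullable["Age"].isna().sum())
```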

Example-9: Using ‘converters’ in read_csv()

Parameter:

  • converters: dict, optional
    • If you want to perform some operation on the values of a column while reading, you can use the ‘converters’ parameter.
    • You pass a dictionary: the ‘key’ is the column name and the ‘value’ is a custom function that you want to apply to the values of that column.

With ‘converters’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Converters receive each cell as a string, so convert it before doing arithmetic.
fun = lambda x: int(x) / 2
customer_details = pd.read_csv(path, converters={'Age': fun})
customer_details
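
Converters are also a convenient place for light clean-up of text columns. The sketch below uses made-up rows with inconsistent ‘Genre’ values; the helper names clean_genre and half_age are hypothetical and exist only for this illustration.

```python
import io
import pandas as pd

# Made-up sample rows with untidy Genre values.
csv_text = """CustomerID,Genre,Age
1, male ,19
2,Female,21
"""

def clean_genre(value):
    # The raw cell arrives as a string: trim spaces and normalise the case.
    return value.strip().capitalize()

def half_age(value):
    # Convert the raw string to int before dividing.
    return int(value) / 2

customer_details = pd.read_csv(
    io.StringIO(csv_text),
    converters={"Genre": clean_genre, "Age": half_age},
)

print(customer_details["Genre"].tolist())
print(customer_details["Age"].tolist())
```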

Example-10: Using ‘na_values’ in read_csv()

Parameter:

  • na_values: scalar, str, list-like, or dict, optional
    • Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values.
    • By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

Without ‘na_values’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details

Input:

Output:

Note:

  • Here the input file contains [‘NULL’, ‘NA’, ‘NAN’, ‘N/A’], which are automatically treated as ‘NaN’.
  • I also want ‘VOID’ to be treated as ‘NaN’. To do that, I can use the ‘na_values’ parameter.

With ‘na_values’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
# Treat 'VOID' as a missing value in addition to the default NaN markers.
customer_details = pd.read_csv(path, na_values=['VOID'])
customer_details

Input:

Output:

Note:

  • Here you can see that all the ‘VOID’ values have been converted to ‘NaN’ values.

Input:

Output:

Note:

  • Here you can see that all the blank values have been converted to ‘NaN’ values.
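
The parameter description for ‘na_values’ mentions both a list form and a per-column dict form. The following sketch, on made-up rows containing the marker ‘VOID’, counts the missing values produced by each form.

```python
import io
import pandas as pd

# Made-up sample rows that use 'VOID' as a missing-value marker.
csv_text = """CustomerID,Genre,Age
1,Male,19
2,VOID,21
3,Female,VOID
"""

# Default: 'VOID' is just ordinary text.
plain = pd.read_csv(io.StringIO(csv_text))

# List form: 'VOID' is treated as NaN in every column.
all_columns = pd.read_csv(io.StringIO(csv_text), na_values=["VOID"])

# Dict form: 'VOID' is treated as NaN only in the Age column.
age_only = pd.read_csv(io.StringIO(csv_text), na_values={"Age": ["VOID"]})

print(plain.isna().sum().sum())
print(all_columns.isna().sum().sum())
print(age_only.isna().sum().to_dict())
```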

Example-11: Using ‘verbose’ in read_csv()

Parameter:

  • verbose: bool, default False
    • When set to True, the verbose parameter prints additional information while reading a CSV file, such as the time taken for:
      • type conversion,
      • memory cleanup, and
      • tokenization.

With ‘verbose’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, verbose=True)
customer_details

Output:

Example-12: Using ‘parse_dates’ in read_csv()

Parameter:

  • parse_dates: bool or list of int or names or list of lists or dict, default: False
    • By default, date columns are represented as objects when loading data from a CSV file.
    • To read the date column correctly, we can use the argument parse_dates to specify a list of date columns.

Without ‘parse_dates’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path)
customer_details

Output:

customer_details.info()

Output:

Note:

  • The date column gets read as an object data type using the default read_csv().

With ‘parse_dates’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, parse_dates=['date'])
customer_details

Output:

customer_details.info()

Output:

Note:

  • Now the ‘date’ column is read as a datetime data type, because it was listed in the parse_dates parameter.
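
The same before/after comparison can be reproduced on a small in-memory sample. The rows and the ‘date’ values below are made up for illustration; the check uses pandas' own type helper rather than printing info().

```python
import io
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Made-up sample rows with an ISO-formatted date column.
csv_text = """CustomerID,date
1,2023-01-15
2,2023-02-20
"""

# Without parse_dates the column stays as text.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to a datetime type.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(is_datetime64_any_dtype(as_text["date"]))
print(is_datetime64_any_dtype(as_dates["date"]))

# Datetime columns expose the .dt accessor for date parts.
print(as_dates["date"].dt.month.tolist())
```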

Example-13: Using ‘infer_datetime_format’ in read_csv()

Parameter:

  • infer_datetime_format: bool, default False
    • Essentially, pandas deduces the format of your datetime strings from the first element(s) and then assumes all other elements in the column use the same format.
    • This means pandas does not need to check multiple formats when converting each string to datetime.
    • If the format cannot be inferred, pandas falls back to its slower general-purpose date parsing.
    • If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.

With ‘infer_datetime_format’

import pandas as pd
path = r"E:\Blogs\Pandas\Documents\Mall_Customers.csv"
customer_details = pd.read_csv(path, infer_datetime_format=False, parse_dates=['date'])
customer_details

Output:

customer_details.info()

Output:

Note:

  • Here you can see that the ‘date’ column is not converted to a datetime type; it is still of the ‘object’ type.
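
Note that ‘infer_datetime_format’ is deprecated from pandas 2.0 onward, where consistent format inference became the default behaviour. One approach that does not depend on the pandas version is to read the column as text and convert it afterwards with pd.to_datetime and an explicit format. This is a minimal sketch with made-up day/month/year dates, not the method used in the example above.

```python
import io
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Made-up sample rows with day/month/year dates.
csv_text = """CustomerID,date
1,15/01/2023
2,20/02/2023
"""

# Read the column as plain text first.
customer_details = pd.read_csv(io.StringIO(csv_text))

# Convert with an explicit format so day and month cannot be confused.
customer_details["date"] = pd.to_datetime(
    customer_details["date"], format="%d/%m/%Y"
)

print(is_datetime64_any_dtype(customer_details["date"]))
print(customer_details["date"].dt.day.tolist())
```

Stating the format explicitly also makes a parsing failure surface as an error instead of silently leaving the column as text.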
