How To Convert DataFrame To Parquet Format?
Table Of Contents:
- Syntax Of The ‘to_parquet()’ Method In Pandas.
- Examples Of The ‘to_parquet()’ Method.
(1) Syntax:
DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None,
partition_cols=None, storage_options=None, **kwargs)
Description:
Write a DataFrame to the binary parquet format.
This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.
Parameters:
- path: str, path object, file-like object, or None, default None – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes. If a string or path, it will be used as the root directory path when writing a partitioned dataset. (See the sketch after the Returns section below.)
- engine: {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’ – Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
- compression: {‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’ – Name of the compression to use. Use None for no compression.
- index: bool, default None – If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True, the dataframe’s index(es) will be saved. However, instead of being saved as values, a RangeIndex will be stored as a range in the metadata, so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
- partition_cols: list, optional, default None – Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
- storage_options: dict, optional – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see the fsspec and urllib documentation for more details.
- **kwargs – Additional arguments passed to the parquet library. See the pandas IO user guide for more details.
Returns:
- bytes if no path argument is provided else None
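As a quick illustration of the path=None, partition_cols, and index behaviors described above, here is a minimal sketch. It assumes pyarrow is installed; the column names and the 'parquet_dataset' directory name are made up for the example:

import pandas as pd

df = pd.DataFrame({'year': [2022, 2022, 2023], 'value': [10, 20, 30]})

# path=None: nothing is written to disk; the parquet file is returned as bytes
raw_bytes = df.to_parquet(path=None, engine='pyarrow')
print(type(raw_bytes))  # <class 'bytes'>

# partition_cols: 'parquet_dataset' becomes the root directory, with one
# subdirectory per distinct value of 'year' (year=2022/, year=2023/)
df.to_parquet('parquet_dataset', partition_cols=['year'], engine='pyarrow')

# index=False: the RangeIndex is dropped instead of being stored in the metadata
df.to_parquet('no_index.parquet', index=False)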
(2) Examples Of to_parquet() Method:
Example-1:
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
df
Output:
   col1  col2
0     1     3
1     2     4
# Converting DataFrame To Parquet
df.to_parquet('df.parquet.gzip', compression='gzip', engine='pyarrow')
# Reading The Parquet File
pd.read_parquet('df.parquet.gzip')
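Example-1 writes to a file path, but as noted in the path parameter description, to_parquet() also accepts a file-like object opened for binary writing. Here is a minimal sketch of that variant; the buffer name is arbitrary and pyarrow is assumed to be installed:

import io

import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

# Write the parquet data into an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
df.to_parquet(buffer, engine='pyarrow', compression='snappy')

# Rewind and read the DataFrame back from the same buffer
buffer.seek(0)
restored = pd.read_parquet(buffer)
print(restored)
#    col1  col2
# 0     1     3
# 1     2     4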