How To Convert A DataFrame To An HDF File?
Table Of Contents:
- Syntax Of The ‘to_hdf()’ Method In Pandas
- Examples Of The ‘to_hdf()’ Method
(1) Syntax:
DataFrame.to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False,
format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None,
data_columns=None, errors='strict', encoding='UTF-8')
Description:
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file, please use append mode and a different key.
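The point above can be sketched as follows. This is a minimal example, assuming the PyTables package (`tables`) is installed; the file path and the keys `df1`/`df2` are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file path for illustration
path = os.path.join(tempfile.gettempdir(), "demo.h5")

df1 = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [4.0, 5.0]})

# mode='w' creates (or overwrites) the file with the first object
df1.to_hdf(path, key="df1", mode="w")

# mode='a' opens the same file and stores a second object under a new key
df2.to_hdf(path, key="df2", mode="a")

# Each object can now be read back individually by its key
print(pd.read_hdf(path, "df1"))
print(pd.read_hdf(path, "df2"))
```

Because each key identifies its own group in the store, one `.h5` file can act as a small container for several related DataFrames or Series.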
Parameters:
- path_or_buf: str or pandas.HDFStore – File path or HDFStore object.
- key: str – An identifier for the group in the store.
- mode: {‘a’, ‘w’, ‘r+’}, default ‘a’ –
Mode to open file:
‘w’: write, a new file is created (an existing file with the same name would be deleted).
‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
‘r+’: similar to ‘a’, but the file must already exist.
- complevel: {0-9}, default None – Specifies a compression level for data. A value of 0 or None disables compression.
- complib: {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’ – Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available raises a ValueError.
- append: bool, default False – For Table formats, append the input data to the existing table on disk.
- format: {‘fixed’, ‘table’, None}, default ‘fixed’:
Possible values:
‘fixed’: Fixed format. Fast writing/reading. Not appendable, nor searchable.
‘table’: Table format. Write as a PyTables Table structure, which may perform worse but allows more flexible operations like searching / selecting subsets of the data.
If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to “fixed”.
- index: bool, default True – Write DataFrame index as a column.
- min_itemsize: dict or int, optional – Map column names to minimum string sizes for columns.
- nan_rep: Any, optional – How to represent null values as str. Not allowed with append=True.
- dropna: bool, default False – Remove missing values.
- data_columns: list of columns or True, optional – List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See “Query via data columns” for more information. Applicable only to format=’table’.
- errors: str, default ‘strict’ – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
- encoding: str, default ‘UTF-8’
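Several of the parameters above can be combined. The sketch below is a hypothetical illustration (file path, key, and column names are made up) showing compression via complevel/complib, format=’table’, and an indexed data column queried on disk; it assumes the PyTables package (`tables`) is installed.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.gettempdir(), "params_demo.h5")

df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune"],
                   "temp": [31, 35, 29]})

# Table format with zlib compression and 'city' as an indexed data column
df.to_hdf(path, key="weather", mode="w", format="table",
          complevel=9, complib="zlib", data_columns=["city"])

# format='table' permits on-disk queries via the where clause,
# without loading the whole dataset into memory
subset = pd.read_hdf(path, "weather", where="city == 'Pune'")
print(subset)
```

The where-clause query only works because format=’table’ was used and ‘city’ was declared in data_columns; a ‘fixed’-format store would raise an error here.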
(2) Examples Of to_hdf() Method:
Example-1:
import pandas as pd
student = {'Name':['Subrat','Abhispa','Arpita','Anuradha','Namita'],
'Roll_No':[100,101,102,103,104],
'Subject':['Math','English','Science','History','Commerce'],
'Mark':[95,88,76,73,93]}
student_object = pd.DataFrame(student)
student_object
Output:
![](https://www.praudyog.com/wp-content/uploads/2023/01/282.jpg)
# Converting DataFrame To HDF File
student_object.to_hdf('C:/Users/SuSahoo/Blogs/Files/data.h5', key='read_students', mode='w')
Output:
![](https://www.praudyog.com/wp-content/uploads/2023/01/283.png)
# Reading The HDF File
pd.read_hdf('C:/Users/SuSahoo/Blogs/Files/data.h5', 'read_students')
Output:
![](https://www.praudyog.com/wp-content/uploads/2023/01/282.jpg)
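As a follow-up to the example above, rows can also be appended to an existing key when format=’table’ is used. The snippet below is a small sketch (the file path and key are made up, and it assumes the PyTables package is installed), not part of the original example.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.gettempdir(), "append_demo.h5")

first = pd.DataFrame({"Name": ["Subrat", "Abhispa"], "Mark": [95, 88]})
more = pd.DataFrame({"Name": ["Arpita"], "Mark": [76]})

# Row-wise appends require the 'table' format; 'fixed' is not appendable
first.to_hdf(path, key="students", mode="w", format="table")
more.to_hdf(path, key="students", mode="a", format="table", append=True)

# Reading the key back returns all appended rows together
combined = pd.read_hdf(path, "students")
print(len(combined))  # 3 rows
```

Note that append=True stacks rows under the same key, while using a different key (as shown earlier) stores a separate object in the same file.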