Using HDF5 with NumPy Tables for Efficient Data Storage and Retrieval

The Python example below packs a small dataset into a NumPy record array, writes it to an HDF5 table with PyTables, and then queries the table by condition.

Code Implementation

import numpy as np
import tables

# Define the dataset
data_dict = {
    'Form': ['SUV', 'Truck'],
    'Make': ['Ford', 'Chevy'],
    'Color': ['Red', 'Blue'],
    'Driver_age': [25, 30],
    'Data': [[1.0, 2.0], [3.0, 4.0]]
}

# Define the NumPy dtype for the table
recarr_dt = np.dtype([
    ('Form', 'S10'),
    ('Make', 'S10'),
    ('Color', 'S10'),
    ('Driver_age', int),
    ('Data', float, (2,))  # two floats per row, matching data_dict['Data']
])

nrows = max(len(v) for v in data_dict.values())

# Initialize the table with zeros
recarr = np.zeros(shape=(nrows,), dtype=recarr_dt)

# Fill the table one column at a time (each dict value is a list with one entry per row)
for name, values in data_dict.items():
    recarr[name] = values

# Create an HDF5 file and write the table to it
with tables.open_file('hdf5_table.h5', 'w') as h5w:
    h5w.create_table('/', 'test', obj=recarr)

# Open the HDF5 file and read data from it
with tables.open_file('hdf5_table.h5', 'r') as h5r:
    data_tbl = h5r.root.test

    # Search for rows where Form is 'SUV' and Driver_age is between 20 and 40
    condition = '(Form == b"SUV") & (Driver_age >= 20) & (Driver_age <= 40)'
    data_arr = data_tbl.read_where(condition)
    print(f'\nFor search condition: {condition}')
    print(f'# of rows found: {len(data_arr)}')
    for row in data_arr:
        print(row)

    # Search for rows where Form is 'SUV' and Make is 'Ford'
    condition = '(Form == b"SUV") & (Make == b"Ford")'
    data_arr = data_tbl.read_where(condition)
    print(f'\nFor search condition: {condition}')
    print(f'# of rows found: {len(data_arr)}')
    for row in data_arr:
        print(row)

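This code writes the record array to a table named test in hdf5_table.h5, then reopens the file and selects rows with the table's read_where method, which returns the matching rows as a NumPy structured array. The two searches show how to combine comparisons with & inside a condition string. String columns are compared against bytes literals such as b"SUV" because the 'S10' fields are stored as bytes.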
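
The bounds in a condition do not have to be hard-coded. The following is a minimal sketch, assuming the condvars and field arguments of PyTables' read_where. The file name condvars_demo.h5 and the variable names lo and hi are illustrative and not part of the original example.

```python
import os
import tempfile

import numpy as np
import tables

# Small table with a subset of the columns used in the example above
dt = np.dtype([('Form', 'S10'), ('Make', 'S10'), ('Driver_age', np.int64)])
rows = np.array([(b'SUV', b'Ford', 25), (b'Truck', b'Chevy', 30)], dtype=dt)

# Illustrative file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'condvars_demo.h5')

with tables.open_file(path, 'w') as h5w:
    h5w.create_table('/', 'test', obj=rows)

with tables.open_file(path, 'r') as h5r:
    tbl = h5r.root.test

    # lo and hi are supplied through condvars instead of being written into the string
    found = tbl.read_where(
        '(Form == b"SUV") & (Driver_age >= lo) & (Driver_age <= hi)',
        condvars={'lo': 20, 'hi': 40},
    )

    # field= returns only the named column for the matching rows
    makes = tbl.read_where('Driver_age >= lo', condvars={'lo': 20}, field='Make')

print(len(found), found['Make'])
print(makes)
```

Passing values through condvars keeps the condition string fixed while the search range changes, which is convenient when the same query runs with different limits.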

Note: The HDF5 file is created in the current working directory, which is not necessarily the directory that contains the script. To create it somewhere else, pass the full path as the file name when opening the file for writing.
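
As a sketch of writing to another location, the snippet below builds a full path with pathlib. The results subdirectory is a hypothetical name, and a temporary directory stands in for the real destination so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import numpy as np
import tables

# Hypothetical output directory; replace with the real destination
out_dir = Path(tempfile.mkdtemp()) / 'results'
# The directory must exist before the HDF5 file is created inside it
out_dir.mkdir(parents=True, exist_ok=True)

h5_path = out_dir / 'hdf5_table.h5'

# Minimal one-column table, just enough to have something to write
recarr = np.zeros(2, dtype=[('Driver_age', np.int64)])
recarr['Driver_age'] = [25, 30]

# str() passes a plain string path, which every PyTables version accepts
with tables.open_file(str(h5_path), 'w') as h5w:
    h5w.create_table('/', 'test', obj=recarr)

print(h5_path, h5_path.exists())
```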


Last modified on 2025-02-21