
Commit

Update Intro VDS and SWMR pages
bmribler committed Apr 21, 2024
1 parent 4280c2b commit 6495dd7
Showing 4 changed files with 85 additions and 140 deletions.
141 changes: 58 additions & 83 deletions documentation/hdf5-docs/advanced_topics/intro_SWMR.md
@@ -1,142 +1,117 @@
---
title: Introduction to Single-Writer\_Multiple-Reader (SWMR)
redirect\_from:

---
##\*\*\* Work-in-Progress \*\*\*

# Introduction to Single-Writer\_Multiple-Reader (SWMR)

Introduction to SWMR
--------------------

The Single-Writer / Multiple-Reader (SWMR) feature enables multiple processes to read an HDF5 file while it is being written to (by a single process) without using locks or requiring communication between processes.

![tutr-swmr1.png](tutr-swmr1.png)

All communication between processes must be performed via the HDF5 file. The HDF5 file under SWMR access must reside on a system that complies with POSIX write() semantics.

The basic engineering challenge for this to work was to ensure that the readers of an HDF5 file always see a coherent (though possibly not up to date) HDF5 file.

The issue is that when writing data there is information in the metadata cache in addition to the physical file on disk:

![tutr-swmr2.png](tutr-swmr2.png)

However, the readers can only see the state contained in the physical file:

![tutr-swmr3.png](tutr-swmr3.png)

The SWMR solution implements dependencies on when the metadata can be flushed to the file. This ensures that metadata cache flush operations occur in the proper order, so that there will never be internal file pointers in the physical file that point to invalid (unflushed) file addresses.

A beneficial side effect of using SWMR access is better fault tolerance. It is more difficult to corrupt a file when using SWMR.


Documentation
-------------

### [SWMR User's Guide](https://docs.hdfgroup.org/hdf5/tn/HDF5_SWMR_User_Guide.pdf)

### HDF5 Library APIs

* [H5F\_START\_SWMR\_WRITE](https://docs.hdfgroup.org/hdf5/develop/group___s_w_m_r.html#ga159be34fbe7e4a959589310ef0196dfe) — Enables SWMR writing mode for a file
* [H5DO\_APPEND](https://docs.hdfgroup.org/hdf5/develop/group___h5_d_o.html#ga316caac160af15192e0c78228667341e) — Appends data to a dataset along a specified dimension
* H5P\_SET\_OBJECT\_FLUSH\_CB — Sets a callback function to invoke when an object flush occurs in the file
* H5P\_GET\_OBJECT\_FLUSH\_CB — Retrieves the object flush property values from the file access property list
* H5O\_DISABLE\_MDC\_FLUSHES — Prevents metadata entries for an HDF5 object from being flushed from the metadata cache to storage
* H5O\_ENABLE\_MDC\_FLUSHES — Enables flushing of dirty metadata entries from a file’s metadata cache
* H5O\_ARE\_MDC\_FLUSHES\_DISABLED — Determines if an HDF5 object has had flushes of metadata entries disabled

### Tools

* h5watch — Outputs new records appended to a dataset as the dataset grows
* h5format\_convert — Converts the layout format version and chunked indexing types of datasets created with HDF5-1.10 so that applications built with HDF5-1.8 can access them
* h5clear — Clears superblock status\_flags field, removes metadata cache image, prints EOA and EOF, or sets EOA of a file

### Design Documents


Programming Model
-----------------

Please be aware that the SWMR feature requires that an HDF5 file be created with the latest file format. See H5P\_SET\_LIBVER\_BOUNDS for more information.

To use SWMR, follow the general programming model for creating and accessing HDF5 files and objects along with the steps described below.

### SWMR Writer:

The SWMR writer either opens an existing file and objects or creates them as follows.

Open an existing file:

* Call H5Fopen using the H5F\_ACC\_SWMR\_WRITE flag.
* Begin writing datasets.
* Periodically flush data.

Create a new file:

* Call H5Fcreate using the latest file format.
* Create groups, datasets and attributes, and then close the attributes.
* Call H5F\_START\_SWMR\_WRITE to start SWMR access to the file.
* Periodically flush data.
#### Example Code:

Create the file using the latest file format property:

fapl = H5Pcreate (H5P\_FILE\_ACCESS);
status = H5Pset\_libver\_bounds (fapl, H5F\_LIBVER\_LATEST, H5F\_LIBVER\_LATEST);
fid = H5Fcreate (filename, H5F\_ACC\_TRUNC, H5P\_DEFAULT, fapl);

\[Create objects (files, datasets, ...). Close any attributes and named datatype objects. Groups and datasets may remain open before starting SWMR access to them.\]

Start SWMR access to the file:

status = H5Fstart\_swmr\_write (fid);

Reopen the datasets and start writing, periodically flushing data:

status = H5Dwrite (dset\_id, ...);
status = H5Dflush (dset\_id);

### SWMR Reader:

The SWMR reader must continually poll for new data:

* Call H5Fopen using the H5F\_ACC\_SWMR\_READ flag.
* Poll, checking the size of the dataset to see if there is new data available for reading.
* Read new data, if any.

#### Example Code:

Open the file using the SWMR read flag:

fid = H5Fopen (filename, H5F\_ACC\_RDONLY | H5F\_ACC\_SWMR\_READ, H5P\_DEFAULT);

Open the dataset and then repeatedly poll the dataset, by getting the dimensions, reading new data, and refreshing:

dset\_id = H5Dopen (...);
space\_id = H5Dget\_space (...);
while (...) {
    status = H5Dread (dset\_id, ...);
    status = H5Drefresh (dset\_id);
    space\_id = H5Dget\_space (...);
}

Limitations and Scope
---------------------

An HDF5 file under SWMR access must reside on a system that complies with POSIX write() semantics. It is also limited in scope as follows:

The writer process is only allowed to modify raw data of existing datasets by:

* Appending data along any unlimited dimension.
* Modifying existing data.

The following operations are not allowed (and the corresponding HDF5 files will fail):

* The writer cannot add new objects to the file.
* The writer cannot delete objects in the file.
* The writer cannot modify or append data with variable length, string or region reference datatypes.
* File space recycling is not allowed. As a result the size of a file modified by a SWMR writer may be larger than a file modified by a non-SWMR writer.

Tools for Working with SWMR
---------------------------

Two new tools, h5watch and h5clear, are available for use with SWMR. The other HDF5 utilities have also been modified to recognize SWMR:

* The h5watch tool allows a user to monitor the growth of a dataset.
* The h5clear tool clears the status flags in the superblock of an HDF5 file.
* The rest of the HDF5 tools will exit gracefully but not work with SWMR otherwise.

Programming Example
-------------------

A good example of using SWMR is included with the HDF5 tests in the source code. You can run it while reading the file it creates. If you then interrupt the application and reader and look at the resulting file, you will see that the file is still valid. Follow these steps:

Download the HDF5-1.10 source code to a local directory on a filesystem (that complies with POSIX write() semantics). Build the software. No special configuration options are needed to use SWMR.
@@ -149,6 +124,6 @@ In the other window (in the bin/ directory) run h5watch on the file created by u

Interrupt use\_append\_chunk while it is running, and stop h5watch.

Use h5clear to clear the status flags in the superbock of the HDF5 file (use\_append\_chunk.h5).
Use h5clear to clear the status flags in the superblock of the HDF5 file (use\_append\_chunk.h5).

View the file with h5dump. You will see that it is a valid file even though the application did not close properly. It will contain data up to the point that it was interrupted.
81 changes: 25 additions & 56 deletions documentation/hdf5-docs/advanced_topics/intro_VDS.md
@@ -1,109 +1,78 @@
---
title: Introduction to the Virtual Dataset - VDS
redirect\_from:

---
##\*\*\* Work-in-Progress \*\*\*

# Introduction to the Virtual Dataset - VDS

The HDF5 Virtual Dataset (VDS) feature enables users to access data in a collection of HDF5 files as a single HDF5 dataset and to use the HDF5 APIs to work with that dataset.

For example, your data may be collected into four files:


![tutrvds-multimgs.png](tutrvds-multimgs.png)

You can map the datasets in the four files into a single VDS that can be accessed just like any other dataset:




![tutrvds-snglimg.png](tutrvds-snglimg.png)

The mapping between a VDS and the HDF5 source datasets is persistent and transparent to an application. If a source file is missing the fill value will be displayed.

See the Virtual (VDS) Documentation for complete details regarding the VDS feature.

The VDS feature was implemented using hyperslab selection (H5S\_SELECT\_HYPERSLAB). See the tutorial on Reading From or Writing to a Subset of a Dataset for more information on selecting hyperslabs.

Programming Model

To create a Virtual Dataset you simply follow the HDF5 programming model and add a few additional API calls to map the source datasets to the VDS.

Following are the steps for creating a Virtual Dataset:

* Create the source datasets that will comprise the VDS
* Create the VDS:
    * Define a datatype and dataspace (can be unlimited)
    * Define the dataset creation property list (including fill value)
    * (Repeat for each source dataset) Map elements from the source dataset to elements of the VDS:
        * Select elements in the source dataset (source selection)
        * Select elements in the virtual dataset (destination selection)
        * Map destination selections to source selections (see Functions for Working with a VDS)
    * Call H5Dcreate using the properties defined above
* Access the VDS as a regular HDF5 dataset
* Close the VDS when finished

Functions for Working with a VDS

The H5P\_SET\_VIRTUAL API sets the mapping between virtual and source datasets. The mapping is stored as a dataset creation property. Using this API will change the layout of the dataset to H5D\_VIRTUAL. As with specifying any dataset creation property list, an instance of the property list is created, modified, passed into the dataset creation call and then closed:

dcpl = H5Pcreate (H5P\_DATASET\_CREATE);

src\_space = H5Screate\_simple ...
status = H5Sselect\_hyperslab (space, ...
status = H5Pset\_virtual (dcpl, space, SRC\_FILE\[i\], SRC\_DATASET\[i\], src\_space);

dset = H5Dcreate2 (file, DATASET, H5T\_NATIVE\_INT, space, H5P\_DEFAULT, dcpl, H5P\_DEFAULT);

status = H5Pclose (dcpl);

There are several other APIs introduced with Virtual Datasets, including query functions. For details see the complete list of HDF5 library APIs that support Virtual Datasets.

Limitations

This feature requires HDF5-1.10 or later.
The number of source datasets is unlimited. However, there is a limit on the size of each source dataset.

Programming Examples

Example 1

This example creates three HDF5 files, each with a one-dimensional dataset of 6 elements. The datasets in these files are the source datasets that are then used to create a 4 x 6 Virtual Dataset with a fill value of -1. The first three rows of the VDS are mapped to the data from the three source datasets as shown below:

![tutrvds-ex.png](tutrvds-ex.png)

In this example the three source datasets are mapped to the VDS with this code:

src\_space = H5Screate\_simple (RANK1, dims, NULL);
for (i = 0; i < 3; i++) {
    start\[0\] = (hsize\_t)i;
    /* Select i-th row in the virtual dataset; selection in the source datasets is the same. */
    status = H5Sselect\_hyperslab (space, H5S\_SELECT\_SET, start, NULL, count, block);
    status = H5Pset\_virtual (dcpl, space, SRC\_FILE\[i\], SRC\_DATASET\[i\], src\_space);
}


After the VDS is created and closed, it is reopened. The property list is then queried to determine the layout of the dataset and its mappings, and the data in the VDS is read and printed.

This example is in the HDF5 source code and can be obtained from here:

C Example

For details on compiling an HDF5 application: \[ Compiling HDF5 Applications \]

Example 2

This example shows how to use a C-style printf statement for specifying multiple source datasets as one virtual dataset. Only one mapping is required. In other words only one H5P\_SET\_VIRTUAL call is needed to map multiple datasets. It creates a 2-dimensional unlimited VDS. Then it re-opens the file, makes queries, and reads the virtual dataset.

The source datasets are specified as A-0, A-1, A-2, and A-3. These are mapped to the virtual dataset with one call:

status = H5Pset\_virtual (dcpl, vspace, SRCFILE, "/A-%b", src\_space);


The %b indicates that the block count of the selection in the dimension should be used.
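
To make the substitution concrete, here is a small standalone sketch of how a pattern such as "/A-%b" resolves to the names A-0 through A-3. It only illustrates the naming idea, assuming zero-based block numbers as in the dataset names above; expand\_block\_name is a hypothetical helper and is not how the HDF5 library implements the feature.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 API): copies `pattern` into `out`,
 * replacing the first "%b" with the decimal block number.
 * Returns 0 on success, -1 if the pattern has no "%b" or `out` is too small. */
static int expand_block_name(const char *pattern, unsigned long block,
                             char *out, size_t out_size)
{
    const char *marker = strstr(pattern, "%b");
    int written;

    if (marker == NULL)
        return -1;

    /* prefix before "%b" + block number + remainder after "%b" */
    written = snprintf(out, out_size, "%.*s%lu%s",
                       (int)(marker - pattern), pattern, block, marker + 2);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

For example, expanding "/A-%b" for blocks 0 through 3 yields the four source dataset names used in this example, which is why a single mapping call is enough.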

C Example


For details on compiling an HDF5 application: \[ Compiling HDF5 Applications \]

Using h5dump with a VDS

The h5dump utility can be used to view a VDS. The h5dump output for a VDS looks exactly like that for any other dataset. If h5dump cannot find a source dataset then the fill value will be displayed.

You can determine that a dataset is a VDS by looking at its properties with h5dump -p. It will display each source dataset mapping, beginning with Mapping 0. Below is an excerpt of the output of h5dump -p on the vds.h5 file created in Example 1. You can see that the entire source file a.h5 is mapped to the first row of the /VDS dataset:
