Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add docs for editing NWB files #1836

Merged
merged 17 commits into from
Jan 31, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
161 changes: 161 additions & 0 deletions docs/gallery/advanced_io/plot_editing.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
"""
.. _editing:

Editing NWB files
=================

This tutorial demonstrates how to edit NWB files in-place to make small changes to
existing containers. To add or remove containers from an NWB file, see
:ref:`modifying_data`. How and whether it is possible to edit an NWB file depends on the
storage backend and the type of edit.

.. warning::

Manually editing an existing NWB file can make the file invalid if you are not
careful. We highly recommend making a copy before editing and running a validation
check on the file after editing it. See :ref:`validating`.


Editing datasets
----------------
When reading an HDF5 NWB file, PyNWB exposes :py:class:`h5py.Dataset` objects, which can
be edited in place. For this to work, you must open the file in read/write mode
(``"r+"`` or ``"a"``).

First, let's create an NWB file with data:
"""
from pynwb import NWBHDF5IO, NWBFile, TimeSeries
from datetime import datetime
from dateutil.tz import tzlocal
import numpy as np

nwbfile = NWBFile(
session_description="my first synthetic recording",
identifier="EXAMPLE_ID",
session_start_time=datetime.now(tzlocal()),
session_id="LONELYMTN",
)

nwbfile.add_acquisition(
TimeSeries(
name="synthetic_timeseries",
description="Random values",
data=np.random.randn(100, 100),
unit="m",
rate=10e3,
)
)

with NWBHDF5IO("test_edit.nwb", "w") as io:
io.write(nwbfile)

##############################################
# Now, let's edit the values of the dataset

with NWBHDF5IO("test_edit.nwb", "r+") as io:
nwbfile = io.read()
nwbfile.acquisition["synthetic_timeseries"].data[:10] = 0.0


##############################################
# You can edit the attributes of that dataset through the ``attrs`` attribute:

with NWBHDF5IO("test_edit.nwb", "r+") as io:
nwbfile = io.read()
nwbfile.acquisition["synthetic_timeseries"].data.attrs["unit"] = "volts"

##############################################
# Changing the shape of dataset
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Whether it is possible to change the shape of a dataset depends on how the dataset was
# created. If the dataset was created with a flexible shape, then it is possible to
# change in-place. Creating a dataset with a flexible shape is done by specifying the
# ``maxshape`` argument of the :py:class:`~hdmf.backends.hdf5.h5_utils.H5DataIO` class
# constructor. Using a ``None`` value for a component of the ``maxshape`` tuple allows
# the size of the corresponding dimension to grow, such that is can be be reset arbitrarily long
# in that dimension. Chunking is required for datasets with flexible shapes. Setting ``maxshape``,
# hence, automatically sets chunking to ``True``, if not specified.
#
# First, let's create an NWB file with a dataset with a flexible shape:

from hdmf.backends.hdf5.h5_utils import H5DataIO

nwbfile = NWBFile(
session_description="my first synthetic recording",
identifier="EXAMPLE_ID",
session_start_time=datetime.now(tzlocal()),
session_id="LONELYMTN",
)

data_io = H5DataIO(data=np.random.randn(100, 100), maxshape=(None, 100))

nwbfile.add_acquisition(
TimeSeries(
name="synthetic_timeseries",
description="Random values",
data=data_io,
unit="m",
rate=10e3,
)
)

with NWBHDF5IO("test_edit2.nwb", "w") as io:
io.write(nwbfile)

##############################################
# The ``None``value in the first component of ``maxshape`` means that the
# the first dimension of the dataset is unlimited. By setting the second dimension
# of ``maxshape`` to ``100``, that dimension is fixed to be no larger than ``100``.
# If you do not specify a``maxshape``, then the shape of the dataset will be fixed
# to the shape that the dataset was created with. Here, you can change the shape of
# the first dimension of this dataset.


with NWBHDF5IO("test_edit2.nwb", "r+") as io:
nwbfile = io.read()
nwbfile.acquisition["synthetic_timeseries"].data.resize((200, 100))

##############################################
# This will change the shape of the dataset in-place. If you try to change the shape of
# a dataset with a fixed shape, you will get an error.
#
# .. note::
# There are several types of dataset edits that cannot be done in-place: changing the
# shape of a dataset with a fixed shape, or changing the datatype, compression,
# chunking, max-shape, or fill-value of a dataset. For any of these, we recommend using
# the :py:class:`pynwb.NWBHDF5IO.export` method to export the data to a new file. See
# :ref:`modifying_data` for more information.
#
# Editing groups
# --------------
# Editing of groups is not yet supported in PyNWB.
# To edit the attributes of a group, open the file and edit it using :py:mod:`h5py`:
bendichter marked this conversation as resolved.
Show resolved Hide resolved

import h5py

with h5py.File("test_edit.nwb", "r+") as f:
f["acquisition"]["synthetic_timeseries"].attrs["description"] = "Random values in volts"

##############################################
# .. warning::
# Be careful not to edit values that will bring the file out of compliance with the
# NWB specification.
#
# Renaming groups and datasets
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Rename groups and datasets in-place using the :py:meth:`~h5py.Group.move` method. For example, to rename
# the ``"synthetic_timeseries"`` group:

with h5py.File("test_edit.nwb", "r+") as f:
f["acquisition"].move("synthetic_timeseries", "synthetic_timeseries_renamed")

##############################################
# You can use this same technique to move a group or dataset to a different location in
# the file. For example, to move the ``"synthetic_timeseries_renamed"`` group to the
# ``"analysis"`` group:

with h5py.File("test_edit.nwb", "r+") as f:
f["acquisition"].move(
"synthetic_timeseries_renamed",
"/analysis/synthetic_timeseries_renamed",
)
28 changes: 5 additions & 23 deletions docs/gallery/general/add_remove_containers.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,31 +70,13 @@
# file path, and it is not possible to remove objects from an NWB file. You can use the
# :py:meth:`NWBHDF5IO.export <pynwb.NWBHDF5IO.export>` method, detailed below, to modify an NWB file in these ways.
#
# .. warning::
#
# NWB datasets that have been written to disk are read as :py:class:`h5py.Dataset <h5py.Dataset>` objects.
# Directly modifying the data in these :py:class:`h5py.Dataset <h5py.Dataset>` objects immediately
# modifies the data on disk
# (the :py:meth:`NWBHDF5IO.write <pynwb.NWBHDF5IO.write>` method does not need to be called and the
# :py:class:`~pynwb.NWBHDF5IO` instance does not need to be closed). Directly modifying datasets in this way
# can lead to files that do not validate or cannot be opened, so exercise caution when using this method.
# Note: only chunked datasets or datasets with ``maxshape`` set can be resized.
# See the `h5py chunked storage documentation <https://docs.h5py.org/en/stable/high/dataset.html#chunked-storage>`_
# for more details.

###############################################################################
# .. note::
#
# It is not possible to modify the attributes (fields) of an NWB container in memory.

###############################################################################
# Exporting a written NWB file to a new file path
# ---------------------------------------------------
# -----------------------------------------------
# Use the :py:meth:`NWBHDF5IO.export <pynwb.NWBHDF5IO.export>` method to read data from an existing NWB file,
# modify the data, and write the modified data to a new file path. Modifications to the data can be additions or
# removals of objects, such as :py:class:`~pynwb.base.TimeSeries` objects. This is especially useful if you
# have raw data and processed data in the same NWB file and you want to create a new NWB file with all of the
# contents of the original file except for the raw data for sharing with collaborators.
# have raw data and processed data in the same NWB file and you want to create a new NWB file with all the contents of
# the original file except for the raw data for sharing with collaborators.
#
# To remove existing containers, use the :py:class:`~hdmf.utils.LabelledDict.pop` method on any
# :py:class:`~hdmf.utils.LabelledDict` object, such as ``NWBFile.acquisition``, ``NWBFile.processing``,
Expand Down Expand Up @@ -200,7 +182,7 @@
export_io.export(src_io=read_io, nwbfile=read_nwbfile)

###############################################################################
# More information about export
# ---------------------------------
# For more information about the export functionality, see :ref:`export`
# and the PyNWB documentation for :py:meth:`NWBHDF5IO.export <pynwb.NWBHDF5IO.export>`.
#
# For more information about editing a file in place, see :ref:`editing`.
Loading