
settings: nvs: Fix first write issue with cache #69103

Conversation

PavelVPV
Collaborator

Issue:

When the settings NVS cache is disabled and settings_nvs_save is called, the function reads stored setting name entries from NVS until it either finds the desired setting name entry or reaches the last stored one.

With the settings NVS cache enabled, settings_nvs_save runs through the cached setting name entries first. If a cached entry matches the desired one, the function immediately writes the new setting value to the NVS record that corresponds to that cached name entry.

However, if the setting name entry is not found in the cache (which is the case for every new entry), settings_nvs_save reads all stored setting name entries from NVS again. This means that even when the number of stored entries is smaller than the cache size, for each new setting entry settings_nvs_save first runs through the cache, then reads all stored setting name entries from NVS, and only then picks the next free name id for the new setting name entry and finally stores it.

This makes the cache inefficient for every new entry to be stored, even when the cache is large enough to hold all setting entries that will ever be stored in NVS.
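The lookup order described above can be sketched with a toy model (all names, array sizes, and structures here are illustrative assumptions for this sketch, not the real settings_nvs_save() code):

```c
#include <stdint.h>

#define CACHE_SIZE 8
#define NVS_MAX    32

/* Toy model: cached name ids and the full set stored in NVS. */
static uint16_t cache_ids[CACHE_SIZE];
static uint16_t cache_count;
static uint16_t nvs_ids[NVS_MAX];
static uint16_t nvs_count;
static unsigned nvs_reads; /* counts simulated flash reads */

/* Returns the matching name id, or 0 when the entry is new. */
static uint16_t find_name_id(uint16_t wanted)
{
	/* 1. Cheap pass over the cached name entries. */
	for (uint16_t i = 0; i < cache_count; i++) {
		if (cache_ids[i] == wanted) {
			return cache_ids[i];
		}
	}
	/* 2. Cache miss: read every stored name entry from NVS.
	 * For a genuinely new entry this full scan always runs,
	 * which is the inefficiency this PR removes. */
	for (uint16_t i = 0; i < nvs_count; i++) {
		nvs_reads++;
		if (nvs_ids[i] == wanted) {
			return nvs_ids[i];
		}
	}
	return 0; /* new entry: caller picks the next free name id */
}
```

With an empty cache, every miss pays the full NVS scan, so each new entry costs reads proportional to the number of entries already stored.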

Use-case:

In the Bluetooth mesh stack there is a Replay Protection List (RPL), which keeps the sequence numbers of all nodes the device has received messages from. The RPL is stored persistently in NVS: the setting name entry is the source address of a node, and the setting value entry is its sequence number. A common use case is a fairly large RPL (for example, 255 entries).

With the current settings NVS cache implementation, every time the node stores a new RPL entry in settings (i.e. on the first message received from a particular source address), settings_nvs_save always checks the cache first, then also reads all stored entries from NVS, and only then concludes that this is a new entry. This search time grows with every new RPL entry, making the first write much slower than subsequent writes of the same entry. For example, on an nRF52840 with a bare minimal mesh stack configuration, when the cache is bigger than (or close to) the number of stored entries, storing 255 new RPL entries takes ~25 seconds, while a subsequent store of the same 255 RPL entries takes ~2 seconds with the cache.
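The growth of the first-write cost can be illustrated with a simple cost model (an assumption for illustration only: each new entry scans all previously stored name entries once):

```c
/* Cost model: writing the i-th new entry first scans the i entries
 * already in NVS, so N first writes cost N*(N-1)/2 reads in total,
 * i.e. the first population of the store is O(N^2). */
static unsigned long first_write_reads(unsigned n)
{
	unsigned long total = 0;

	for (unsigned i = 0; i < n; i++) {
		total += i; /* entries stored before entry i */
	}
	return total;
}
```

Under this model, 255 RPL entries cost 255 * 254 / 2 = 32385 name-entry reads on first write, which is consistent with the quadratic slowdown observed on the nRF52840.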

Solution:

This commit improves the behavior of the first write by bypassing the read from NVS if the following conditions are met:

  1. settings_nvs_load was called, and
  2. the cache has not overflowed (the cache size is at least the number of stored entries).

As long as these two conditions are met, it is safe to skip reading from NVS, pick the next free name id, and write the value immediately.
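A minimal sketch of the fast-path check, using hypothetical field names (the loaded flag and cache counter mirror the snippets quoted later in this thread, but the real structure in settings_nvs.c may differ):

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SIZE 16 /* illustrative cache size */

struct settings_nvs_model {
	bool loaded;          /* settings_nvs_load() ran to completion */
	uint16_t cache_total; /* entries ever added to the cache */
};

/* When both conditions hold, every stored name entry is guaranteed
 * to be in the cache, so a cache miss proves the entry is new and
 * the full NVS scan can be skipped. */
static bool can_skip_nvs_scan(const struct settings_nvs_model *cf)
{
	return cf->loaded && cf->cache_total <= CACHE_SIZE;
}
```

If either condition fails, the code must fall back to the full NVS scan, since an entry missing from the cache might still exist in flash.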

@zephyrbot zephyrbot added the area: Settings Settings subsystem label Feb 16, 2024
@PavelVPV
Collaborator Author

Without the fix (first write ~25 seconds):

```
*** Booting Zephyr OS build v3.6.0-rc2-113-gaa28f00370a7 ***
Initializing...
Bluetooth initialized
Mesh initialized
Self-provisioning with address 0x0001
Provisioned and configured!
*** storing over 255 RPL entries completed ***
total calculated: 24747ms, total measured: 24756ms
entry max: 348ms, entry min: 13ms, entry middle: 277ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1678ms, total measured: 1686ms
entry max: 9ms, entry min: 4ms, entry middle: 4ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1793ms, total measured: 1804ms
entry max: 99ms, entry min: 3ms, entry middle: 5ms
**********************************************
```

With the fix (first write ~4 seconds):

```
*** Booting Zephyr OS build v3.6.0-rc2-113-gaa28f00370a7 ***
Initializing...
Bluetooth initialized
Mesh initialized
Self-provisioning with address 0x0001
Provisioned and configured!
*** storing over 255 RPL entries completed ***
total calculated: 4121ms, total measured: 4127ms
entry max: 109ms, entry min: 13ms, entry middle: 18ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1701ms, total measured: 1703ms
entry max: 19ms, entry min: 4ms, entry middle: 4ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1780ms, total measured: 1787ms
entry max: 98ms, entry min: 4ms, entry middle: 5ms
**********************************************
```

@PavelVPV
Collaborator Author

@Laczen, when a new entry is stored in settings, it is cached regardless of hash collisions (settings_nvs_cache_add always adds the new entry). Until cache_next wraps around, provided that all settings were loaded beforehand, the cache holds all stored entries. Therefore, the second case can only happen if the cache has overflowed.

@PavelVPV
Collaborator Author

> when a new entry is stored in settings, it is cached regardless of the hash collision (settings_nvs_cache_add always adds new entry).

The same applies to loading, because loading also uses settings_nvs_cache_add.
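The always-append behavior can be modeled like this (a hedged sketch of what settings_nvs_cache_add is described to do in this thread; the real implementation details may differ):

```c
#include <stdint.h>

#define CACHE_SIZE 4 /* deliberately tiny for illustration */

struct cache_entry {
	uint32_t name_hash;
	uint16_t name_id;
};

static struct cache_entry cache[CACHE_SIZE];
static uint16_t cache_next; /* next slot to fill */

/* A new entry is always appended, even on a hash collision. Once
 * cache_next wraps around, older entries are overwritten and the
 * cache no longer holds every stored entry (the "overflow" case
 * that disables the fast path). */
static void cache_add(uint32_t hash, uint16_t id)
{
	cache[cache_next].name_hash = hash;
	cache[cache_next].name_id = id;
	cache_next = (cache_next + 1) % CACHE_SIZE;
}
```

Because both storing and loading go through the same append path, the cache is complete exactly until the first wrap-around.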

@PavelVPV
Collaborator Author

What I think I should change is to set cf->loaded here, before the break:

```c
if (name_id == NVS_NAMECNT_ID) {
	break;
}
```

Because in case of loading with a callback that returns an error, settings_nvs_load will exit earlier:

```c
if (load_arg && load_arg->cb) {
	rc = load_arg->cb(name_key, len, read_cb, read_cb_arg,
			  load_arg->param);
} else {
	ret = settings_call_set_handler(
		name, rc2,
		settings_nvs_read_fn, &read_fn_arg,
		(void *)arg);
	if (ret) {
		break;
	}
```
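The suggested ordering can be modeled with a toy traversal (the NVS_NAMECNT_ID value, the fail_at parameter, and the loop shape are illustrative assumptions for this sketch, not the real settings_nvs_load()):

```c
#include <stdbool.h>
#include <stdint.h>

#define NVS_NAMECNT_ID 0x8000 /* illustrative sentinel value */

struct settings_nvs_model {
	bool loaded;
};

/* Setting cf->loaded right before the break ties the flag to actually
 * reaching the last stored name entry; an error returned by a load
 * callback on an earlier iteration then correctly leaves it false. */
static void load_names(struct settings_nvs_model *cf,
		       const uint16_t *ids, unsigned n, unsigned fail_at)
{
	cf->loaded = false;
	for (unsigned i = 0; i < n; i++) {
		if (ids[i] == NVS_NAMECNT_ID) {
			cf->loaded = true; /* full traversal completed */
			break;
		}
		if (i == fail_at) {
			break; /* models a callback returning an error */
		}
	}
}
```

A full traversal sets the flag; an early exit through the error path leaves it unset, so the fast path stays disabled when the cache may be incomplete.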

@Laczen
Collaborator

Laczen commented Feb 17, 2024

> @Laczen, when a new entry is stored in settings, it is cached regardless of the hash collision (settings_nvs_cache_add always adds new entry). Until cache_next wraps around, provided that all settings were loaded before, the cache holds all stored entries. Therefore, the second can only happen if the cache was overflowed.

You are right, that is why I removed my comment.

Signed-off-by: Pavel Vasilyev <[email protected]>
@PavelVPV PavelVPV force-pushed the fix_settings_nvs_first_write_cache_issue_public branch from 5d1548d to e94bb27 Compare February 17, 2024 20:30
```c
#if CONFIG_SETTINGS_NVS_NAME_CACHE
	uint16_t cached = 0;

	cf->loaded = false;
```
Collaborator

This will disable the optimization if settings_load_subtree_direct with a handler that breaks the traversal before reaching NVS_NAMECNT_ID is used. It's still better than the existing behavior, so it's not a blocking comment, but just letting you know.

Collaborator Author

Yeah, true. And also, if an app loads a subtree (directly or not) several times, the cache will contain duplicated entries. The only way to make the cache work correctly is to use only settings_load(), unfortunately.

Collaborator

Should we add info on that behavior to documentation?
Should we log an issue for that to be solved at some point?

Collaborator Author

Regarding an issue, I'm not certain. The cache kind of does what it is asked to do; an application should just be aware of this. I think documenting this behavior should be enough. I'll need to allocate time to do this, though.

@fabiobaltieri fabiobaltieri merged commit a35bb66 into zephyrproject-rtos:main Feb 26, 2024
16 checks passed