
settings: nvs: Fix first write issue with cache #69103

Conversation

PavelVPV
Collaborator

Issue:

When the settings NVS cache is disabled and settings_nvs_save is called, the function reads stored setting name entries from NVS until it either finds the desired setting name entry or reaches the last stored one.

With the settings NVS cache enabled, settings_nvs_save runs through the cached setting name entries first. If a cached entry matches the desired one, the function immediately writes the new setting value to the NVS record that corresponds to that cached name entry.

However, if the setting name entry is not found in the cache (which is the case for every new entry), settings_nvs_save reads all stored setting name entries from NVS again. This means that even when the number of stored entries is smaller than the cache size, for each new setting entry settings_nvs_save first runs through the cache, then reads all stored setting name entries from NVS, and only then picks the next free name id for the new setting name entry and finally stores it.

This makes the cache inefficient for every new entry to be stored, even when the cache is large enough to hold all setting entries that will ever be stored in NVS.
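The lookup order described above can be sketched with a toy model (all names, array sizes, and structures here are illustrative assumptions for this sketch, not the real settings_nvs_save() code):

```c
#include <stdint.h>

#define CACHE_SIZE 8
#define NVS_MAX    32

/* Toy model: cached name ids and the full set stored in NVS. */
static uint16_t cache_ids[CACHE_SIZE];
static uint16_t cache_count;
static uint16_t nvs_ids[NVS_MAX];
static uint16_t nvs_count;
static unsigned nvs_reads; /* counts simulated flash reads */

/* Returns the matching name id, or 0 when the entry is new. */
static uint16_t find_name_id(uint16_t wanted)
{
	/* 1. Cheap pass over the cached name entries. */
	for (uint16_t i = 0; i < cache_count; i++) {
		if (cache_ids[i] == wanted) {
			return cache_ids[i];
		}
	}
	/* 2. Cache miss: read every stored name entry from NVS.
	 * For a genuinely new entry this full scan always runs,
	 * which is the inefficiency this PR removes. */
	for (uint16_t i = 0; i < nvs_count; i++) {
		nvs_reads++;
		if (nvs_ids[i] == wanted) {
			return nvs_ids[i];
		}
	}
	return 0; /* new entry: caller picks the next free name id */
}
```

With an empty cache, every miss pays the full NVS scan, so each new entry costs reads proportional to the number of entries already stored.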

Use-case:

In the Bluetooth mesh stack there is a Replay Protection List (RPL), which keeps the sequence numbers of all nodes the device has received messages from. The RPL is stored persistently in NVS: the setting name entry is the source address of a node, and the setting value entry is its sequence number. A common use case is a fairly large RPL (for example, 255 entries).

With the current settings NVS cache implementation, every time the node stores a new RPL entry in settings (i.e. on the first message received from a particular source address), settings_nvs_save always checks the cache first, then also reads all stored entries from NVS, and only then concludes that this is a new entry. This search time grows with every new RPL entry, making the first write much slower than subsequent writes of the same entry. For example, on an nRF52840 with a bare minimal mesh stack configuration, when the cache is bigger than (or close to) the number of stored entries, storing 255 new RPL entries takes ~25 seconds, while a subsequent store of the same 255 RPL entries takes ~2 seconds with the cache.
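The growth of the first-write cost can be illustrated with a simple cost model (an assumption for illustration only: each new entry scans all previously stored name entries once):

```c
/* Cost model: writing the i-th new entry first scans the i entries
 * already in NVS, so N first writes cost N*(N-1)/2 reads in total,
 * i.e. the first population of the store is O(N^2). */
static unsigned long first_write_reads(unsigned n)
{
	unsigned long total = 0;

	for (unsigned i = 0; i < n; i++) {
		total += i; /* entries stored before entry i */
	}
	return total;
}
```

Under this model, 255 RPL entries cost 255 * 254 / 2 = 32385 name-entry reads on first write, which is consistent with the quadratic slowdown observed on the nRF52840.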

Solution:

This commit improves the behavior of the first write by bypassing the read from NVS if the following conditions are met:

  1. settings_nvs_load was called, and
  2. the cache has not overflowed (the cache size is at least the number of stored entries).

As long as these two conditions are met, it is safe to skip reading from NVS, pick the next free name id, and write the value immediately.
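A minimal sketch of the fast-path check, using hypothetical field names (the loaded flag and cache counter mirror the snippets quoted later in this thread, but the real structure in settings_nvs.c may differ):

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SIZE 16 /* illustrative cache size */

struct settings_nvs_model {
	bool loaded;          /* settings_nvs_load() ran to completion */
	uint16_t cache_total; /* entries ever added to the cache */
};

/* When both conditions hold, every stored name entry is guaranteed
 * to be in the cache, so a cache miss proves the entry is new and
 * the full NVS scan can be skipped. */
static bool can_skip_nvs_scan(const struct settings_nvs_model *cf)
{
	return cf->loaded && cf->cache_total <= CACHE_SIZE;
}
```

If either condition fails, the code must fall back to the full NVS scan, since an entry missing from the cache might still exist in flash.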

@zephyrbot zephyrbot added the area: Settings Settings subsystem label Feb 16, 2024
@PavelVPV
Collaborator Author

Without the fix (first write ~25 seconds):

```
*** Booting Zephyr OS build v3.6.0-rc2-113-gaa28f00370a7 ***
Initializing...
Bluetooth initialized
Mesh initialized
Self-provisioning with address 0x0001
Provisioned and configured!
*** storing over 255 RPL entries completed ***
total calculated: 24747ms, total measured: 24756ms
entry max: 348ms, entry min: 13ms, entry middle: 277ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1678ms, total measured: 1686ms
entry max: 9ms, entry min: 4ms, entry middle: 4ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1793ms, total measured: 1804ms
entry max: 99ms, entry min: 3ms, entry middle: 5ms
**********************************************
```

With the fix (first write ~4 seconds):

```
*** Booting Zephyr OS build v3.6.0-rc2-113-gaa28f00370a7 ***
Initializing...
Bluetooth initialized
Mesh initialized
Self-provisioning with address 0x0001
Provisioned and configured!
*** storing over 255 RPL entries completed ***
total calculated: 4121ms, total measured: 4127ms
entry max: 109ms, entry min: 13ms, entry middle: 18ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1701ms, total measured: 1703ms
entry max: 19ms, entry min: 4ms, entry middle: 4ms
**********************************************
*** storing over 255 RPL entries completed ***
total calculated: 1780ms, total measured: 1787ms
entry max: 98ms, entry min: 4ms, entry middle: 5ms
**********************************************
```

@PavelVPV
Collaborator Author

@Laczen, when a new entry is stored in settings, it is cached regardless of hash collisions (settings_nvs_cache_add always adds the new entry). Until cache_next wraps around, provided that all settings were loaded beforehand, the cache holds all stored entries. Therefore, the second case can only happen if the cache has overflowed.

@PavelVPV
Collaborator Author

> when a new entry is stored in settings, it is cached regardless of the hash collision (settings_nvs_cache_add always adds new entry).

The same applies to loading, because loading also uses settings_nvs_cache_add.
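The always-append behavior can be modeled like this (a hedged sketch of what settings_nvs_cache_add is described to do in this thread; the real implementation details may differ):

```c
#include <stdint.h>

#define CACHE_SIZE 4 /* deliberately tiny for illustration */

struct cache_entry {
	uint32_t name_hash;
	uint16_t name_id;
};

static struct cache_entry cache[CACHE_SIZE];
static uint16_t cache_next; /* next slot to fill */

/* A new entry is always appended, even on a hash collision. Once
 * cache_next wraps around, older entries are overwritten and the
 * cache no longer holds every stored entry (the "overflow" case
 * that disables the fast path). */
static void cache_add(uint32_t hash, uint16_t id)
{
	cache[cache_next].name_hash = hash;
	cache[cache_next].name_id = id;
	cache_next = (cache_next + 1) % CACHE_SIZE;
}
```

Because both storing and loading go through the same append path, the cache is complete exactly until the first wrap-around.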

@PavelVPV
Collaborator Author

What I think I should change is to set cf->loaded here, before the break:

```c
if (name_id == NVS_NAMECNT_ID) {
	break;
}
```

Because in case of loading with a callback that returns an error, settings_nvs_load will exit earlier:

```c
if (load_arg && load_arg->cb) {
	rc = load_arg->cb(name_key, len, read_cb, read_cb_arg,
			  load_arg->param);
} else {
	ret = settings_call_set_handler(
		name, rc2,
		settings_nvs_read_fn, &read_fn_arg,
		(void *)arg);
	if (ret) {
		break;
	}
```
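The suggested ordering can be modeled with a toy traversal (the NVS_NAMECNT_ID value, the fail_at parameter, and the loop shape are illustrative assumptions for this sketch, not the real settings_nvs_load()):

```c
#include <stdbool.h>
#include <stdint.h>

#define NVS_NAMECNT_ID 0x8000 /* illustrative sentinel value */

struct settings_nvs_model {
	bool loaded;
};

/* Setting cf->loaded right before the break ties the flag to actually
 * reaching the last stored name entry; an error returned by a load
 * callback on an earlier iteration then correctly leaves it false. */
static void load_names(struct settings_nvs_model *cf,
		       const uint16_t *ids, unsigned n, unsigned fail_at)
{
	cf->loaded = false;
	for (unsigned i = 0; i < n; i++) {
		if (ids[i] == NVS_NAMECNT_ID) {
			cf->loaded = true; /* full traversal completed */
			break;
		}
		if (i == fail_at) {
			break; /* models a callback returning an error */
		}
	}
}
```

A full traversal sets the flag; an early exit through the error path leaves it unset, so the fast path stays disabled when the cache may be incomplete.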

@Laczen
Collaborator

Laczen commented Feb 17, 2024

> @Laczen, when a new entry is stored in settings, it is cached regardless of the hash collision (settings_nvs_cache_add always adds new entry). Until cache_next wraps around, provided that all settings were loaded before, the cache holds all stored entries. Therefore, the second can only happen if the cache was overflowed.

You are right, that is why I removed my comment.

Signed-off-by: Pavel Vasilyev <[email protected]>
@PavelVPV PavelVPV force-pushed the fix_settings_nvs_first_write_cache_issue_public branch from 5d1548d to e94bb27 Compare February 17, 2024 20:30
```c
#if CONFIG_SETTINGS_NVS_NAME_CACHE
	uint16_t cached = 0;

	cf->loaded = false;
```
Collaborator

This will disable the optimization if settings_load_subtree_direct with a handler that breaks the traversal before reaching NVS_NAMECNT_ID is used. It's still better than the existing behavior, so it's not a blocking comment, but just letting you know.

Collaborator Author

Yeah, true. And also, if an app loads a subtree (directly or not) several times, the cache will contain duplicated entries. The only way to make the cache work correctly is to use only settings_load(), unfortunately.

Collaborator

Should we add info on that behavior to documentation?
Should we log an issue for that to be solved at some point?

Collaborator Author

Regarding an issue, I'm not certain. The cache kind of does what it is asked to do; an application should just be aware of this. I think documenting this behavior should be enough. I'll need to allocate time to do this, though.

@fabiobaltieri fabiobaltieri merged commit a35bb66 into zephyrproject-rtos:main Feb 26, 2024
16 checks passed