
[Bug]: RAK4631 not responding #5491

Open
cracky22 opened this issue Dec 2, 2024 · 99 comments
Labels
bug Something isn't working

Comments

@cracky22
Copy link

cracky22 commented Dec 2, 2024

Category

Other

Hardware

Rak4631

Firmware Version

2.5.15.79da236

Description

I've already tried restarting, resetting and even re-flashing, but my RAK4631 with the nRF52 chip can't cope with 80 nodes. I'm fairly sure the problem is related to the number of nodes, because on a channel I created with only 8 nodes the device stays online.

With over 80 nodes the RAK crashes from time to time (after between 6 and 9 hours) and the green LED stays lit continuously.

What can I do?

Relevant log output

No response

@cracky22 cracky22 added the bug Something isn't working label Dec 2, 2024
@SimbimChimbetov
Copy link

This problem only on 2.5.15?

@cracky22
Copy link
Author

cracky22 commented Dec 4, 2024

This problem only on 2.5.15?

No, unfortunately not. I have already tried older firmware versions and always ran into this problem. Do you know a working firmware version? It actually works fine if I just switch to the "Our place" channel, but then I don't get the nodes from the city.

@SimbimChimbetov
Copy link

I'm not sure, but maybe I have the same problem with v2.5.11.
I recently updated a node on a mountain with over 100 nodes in reach. A few days ago I lost the signal; the last packet showed 88% battery and the weather was fine.
Because it is really hard to access and I didn't have the time to go there, I can't confirm it yet; maybe it was just stolen.
I hope I have the time this weekend.

@cracky22
Copy link
Author

cracky22 commented Dec 4, 2024

It would be interesting to know whether the same happens on other nRF52 devices. I have a T1000-E and it works fine there. Could it be due to the different storage options, such as RAM and EEPROM?

@markbirss
Copy link
Contributor

markbirss commented Dec 4, 2024

It would be interesting to know whether the same happens on other nRF52 devices. I have a T1000-E and it works fine there. Could it be due to the different storage options, such as RAM and EEPROM?

All nRF52 devices have only 28 KB of littlefs storage available, where the preferences, BLE pairings and nodedb are stored (the littlefs block size also wastes some space).

As a measure to prevent this issue, the nodedb (db.proto) size was recently reduced from 100 to 80 entries.

#5346

image

You can confirm the size of the nodedb file by running this test firmware:

image

(https://discord.com/channels/867578229534359593/919642584480112750/1305904252626927729)

Use "list-files-s140_nrf52_611_softdevice-1.0.0.4265ae9.uf2" for the RAK4631;
the other one is for Seeed boards with a newer SoftDevice.

If you are able to share the output, it could help us understand the issue further.
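
For anyone reading along who would rather build the file listing themselves instead of flashing the test uf2, here is a minimal sketch in the same spirit. It assumes the Adafruit nRF52 Arduino core's InternalFS / Adafruit_LittleFS File API (which the firmware builds on); the output format is illustrative and is not taken from the test firmware's source.

// Illustrative sketch: print the entries under /prefs with their sizes,
// roughly what the "list-files" test uf2 reports over serial.
#include <Adafruit_LittleFS.h>
#include <InternalFileSystem.h>

using namespace Adafruit_LittleFS_Namespace;

void setup()
{
    Serial.begin(115200);
    while (!Serial) delay(10); // wait for the USB serial console

    InternalFS.begin();

    File dir = InternalFS.open("/prefs", FILE_O_READ);
    if (!dir || !dir.isDirectory()) {
        Serial.println("/prefs not found");
        return;
    }

    File entry = dir.openNextFile();
    while (entry) {
        Serial.print(entry.name());
        if (entry.isDirectory()) {
            Serial.println(" (directory)");
        } else {
            Serial.print(" (");
            Serial.print(entry.size());
            Serial.println(" bytes)");
        }
        entry.close();
        entry = dir.openNextFile();
    }
    dir.close();
}

void loop() {}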

@cracky22
Copy link
Author

cracky22 commented Dec 4, 2024

(quoting markbirss's reply above in full)

I can't even open that uf2; can you attach it here?


@cracky22
Copy link
Author

cracky22 commented Dec 4, 2024

So that means I connect the RAK to my computer and send you the logs?


@thebentern
Copy link
Contributor

thebentern commented Dec 4, 2024

Please note that this file will not work for the T1000-E, as it uses a different SoftDevice version
list-files-s140_nrf52_611_softdevice-1.0.0.4265ae9(1).uf2.zip

@cracky22
Copy link
Author

cracky22 commented Dec 4, 2024

So that means I connect the RAK to my computer and send you the logs?

.

@thebentern
Copy link
Contributor

I have had rotten luck reproducing this issue so far, but today I am trying an all day run of a RAK board connected to msh/US against MQTT (client proxy) to see if it triggers at all for me. >140 nodes witnessed so far.
image

@cracky22
Copy link
Author

cracky22 commented Dec 5, 2024

I have had rotten luck reproducing this issue so far, but today I am trying an all day run of a RAK board connected to msh/US against MQTT (client proxy) to see if it triggers at all for me. >140 nodes witnessed so far.
image

Hi, how do you get this output? Is there a tool that can log and save all the important information?

@cracky22
Copy link
Author

cracky22 commented Dec 5, 2024

Or is it just serial?

@thebentern
Copy link
Contributor

Or is it just serial?

It is just serial logs. I like to use tio (https://github.com/tio/tio) because it will re-attach to the device in the case of a failure or reboot.

I ran the RAK node for about 7 hours yesterday on the msh/US topic and picked up over 600 nodes with no crashes or failures. To rule out any file corruption issues, have you tried a factory reset (or even just a nodedb reset)?

@cracky22
Copy link
Author

cracky22 commented Dec 6, 2024

Or is it just serial?

It is just serial logs. I like to use tio (https://github.com/tio/tio) because it will re-attach to the device in the case of a failure or reboot.

I ran the RAK node for about 7 hours yesterday on the msh/US topic and picked up over 600 nodes with no crashes or failures. To rule out any file corruption issues, have you tried a factory reset (or even just a nodedb reset)?

How did you get 600 nodes? We're not talking about MQTT, are we?

@garthvh
Copy link
Member

garthvh commented Dec 6, 2024

Or is it just serial?

It is just serial logs. I like to use tio (https://github.com/tio/tio) because it will re-attach to the device in the case of a failure or reboot.
I ran the RAK node for about 7 hours yesterday on the msh/US topic and picked up over 600 nodes with no crashes or failures. To rule out any file corruption issues, have you tried a factory reset (or even just a nodedb reset)?

How did you get 600 nodes? We're not talking about MQTT, are we?

Yes, that is how you get to 600 nodes quickly; the topics are for MQTT.

@cracky22
Copy link
Author

cracky22 commented Dec 7, 2024

image

@cracky22
Copy link
Author

I know this is probably unnecessary, but it would be nice if there were an option in the firmware to query how much "memory" is available on the board, and if that could also be displayed in the Android app as a small graphic or text.

@markbirss
Copy link
Contributor

I know this is probably unnecessary, but it would be nice if there were an option in the firmware to query how much "memory" is available on the board, and if that could also be displayed in the Android app as a small graphic or text.

The littlefs support for nrf52 doesn't currently have a function to report free space.
You could open a feature request on the Android app for showing free memory:
https://github.com/meshtastic/Meshtastic-Android/issues
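
Since there is no free-space call, a rough used-space figure can still be approximated by walking the filesystem and rounding every file up to the littlefs block size, in the spirit of the listing sketch earlier in the thread. This is a sketch only: it assumes the same Adafruit nRF52 core API, assumes a 128-byte block size (not verified here), does not recurse into subdirectories, and ignores the blocks used by directories and littlefs metadata, so treat the result as a lower bound.

#include <Adafruit_LittleFS.h>
#include <InternalFileSystem.h>

using namespace Adafruit_LittleFS_Namespace;

static const uint32_t kAssumedBlockSize = 128; // assumption, not read from the core headers

// Round a file size up to a whole number of (assumed) littlefs blocks.
static uint32_t roundUpToBlock(uint32_t bytes)
{
    return ((bytes + kAssumedBlockSize - 1) / kAssumedBlockSize) * kAssumedBlockSize;
}

// Sum the rounded sizes of the plain files directly under `path`.
static uint32_t estimateUsedBytes(const char *path)
{
    uint32_t total = 0;
    File dir = InternalFS.open(path, FILE_O_READ);
    if (!dir) return 0;

    File entry = dir.openNextFile();
    while (entry) {
        if (!entry.isDirectory()) total += roundUpToBlock(entry.size());
        entry.close();
        entry = dir.openNextFile();
    }
    dir.close();
    return total;
}

// Example use, after InternalFS.begin():
//   Serial.println(estimateUsedBytes("/prefs"));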

@cracky22
Copy link
Author

OK, I can do that. But what about my problem with the RAK? What can I do, or how can I debug this further?

@markbirss
Copy link
Contributor

OK, I can do that. But what about my problem with the RAK? What can I do, or how can I debug this further?

OK, are you able to capture a log when the reboot/crash occurs?
Does the file listing still show the db.proto size after it has crashed? (Or is that the listing you already provided?)

@cracky22
Copy link
Author

That is already the listing.
No, I can't capture a log of the crash, because it takes between 6 and 9 hours to happen.

@cracky22
Copy link
Author

@tavdog hi, I just discovered your commit #5670 and updated straight away; I hope this solves my problems with the RAK. Is there any chance that the watchdog, for example, has a fixed memory area in which it logs data? That would be very useful if my RAK crashes again. Unfortunately, I can't keep it connected to the PC 24/7 to save the logs.

@tavdog
Copy link
Contributor

tavdog commented Dec 29, 2024

@tavdog hi, I just discovered your commit #5670 and updated straight away; I hope this solves my problems with the RAK. Is there any chance that the watchdog, for example, has a fixed memory area in which it logs data? That would be very useful if my RAK crashes again. Unfortunately, I can't keep it connected to the PC 24/7 to save the logs.

My fix only formats the filesystem when the lfs assert is triggered. I don't think it will help your issue, and if it does, it will probably result in a wiped state.

@analogman2
Copy link

I am experiencing the same or a similar issue on a RAK WisBlock 4631. The node crashed after a week; the last time I checked it had >150 nodes. I got here from issue #5648.

DEBUG | ??:??:?? 1 Filesystem files:
DEBUG | ??:??:?? 1 prefs (directory)
DEBUG | ??:??:?? 1 channels.proto (147 Bytes)
DEBUG | ??:??:?? 2 module.proto (118 Bytes)
DEBUG | ??:??:?? 2 config.proto (169 Bytes)
DEBUG | ??:??:?? 2 db.proto.tmp (27377 Bytes)
DEBUG | ??:??:?? 2 adafruit (directory)
DEBUG | ??:??:?? 2 bond_prph (directory)
DEBUG | ??:??:?? 2 bond_cntr (directory)
DEBUG | ??:??:?? 2 Power::lipoInit lipo sensor is not ready yet
DEBUG | ??:??:?? 2 Use analog input 5 for battery level
INFO | ??:??:?? 2 Scan for i2c devices
DEBUG | ??:??:?? 2 Scan for I2C devices on port 1
INFO | ??:??:?? 2 No I2C devices found
DEBUG | ??:??:?? 2 acc_info = 0
INFO | ??:??:?? 2 S:B:9,2.5.16.f81d3b0
INFO | ??:??:?? 2 Build timestamp: 1733662250
DEBUG | ??:??:?? 2 Reset reason: 0x0
DEBUG | ??:??:?? 2 Set random seed 1102827228
INFO | ??:??:?? 2 Init NodeDB
ERROR | ??:??:?? 2 Could not open / read /prefs/db.proto
WARN | ??:??:?? 2 Devicestate 0 is old, discard
INFO | ??:??:?? 2 Install default DeviceState
DEBUG | ??:??:?? 2 Initial packet id 913518860
DEBUG | ??:??:?? 2 Partially randomized packet id 544149773
DEBUG | ??:??:?? 2 Use nodenum 0x9e76f91f
INFO | ??:??:?? 2 Load /prefs/config.proto
INFO | ??:??:?? 2 Loaded /prefs/config.proto successfully
INFO | ??:??:?? 2 Loaded saved config version 23
INFO | ??:??:?? 2 Load /prefs/module.proto
INFO | ??:??:?? 2 Loaded /prefs/module.proto successfully
INFO | ??:??:?? 2 Loaded saved moduleConfig version 23
INFO | ??:??:?? 2 Load /prefs/channels.proto
INFO | ??:??:?? 2 Loaded /prefs/channels.proto successfully
INFO | ??:??:?? 2 Loaded saved channelFile version 23
DEBUG | ??:??:?? 2 cleanupMeshDB purged 0 entries
DEBUG | ??:??:?? 2 Use nodenum 0x9e76f91f
INFO | ??:??:?? 2 Adding node to database with 1 nodes and 153976 bytes free!
DEBUG | ??:??:?? 2 Expand short PSK #1
INFO | ??:??:?? 2 Wanted region 1, using US
INFO | ??:??:?? 2 Save /prefs/db.proto
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
lfs warn:314: No more free space 224
ERROR | ??:??:?? 2 Error: can't encode protobuf io error
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: block < lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count
ERROR | ??:??:?? 2 LFS assert: head >= 2 && head <= lfs->cfg->block_count #75ish more times
ERROR | ??:??:?? 2 LFS asser
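
For context, some back-of-the-envelope arithmetic using the figures in this thread plus one assumption (a 128-byte littlefs block size, which is not stated anywhere above): if the 28 KB quoted earlier means 28 KiB, that is 28672 bytes, or exactly 224 blocks of 128 bytes, and 224 is also the number in the "lfs warn:314: No more free space 224" line, so that reading of the geometry looks plausible. Against such a budget, the 27377-byte db.proto.tmp alone takes about 27.4 KB, leaving well under 1.5 KB for channels.proto (147 B), config.proto (169 B), module.proto (118 B), the BLE bond directories and littlefs metadata, so a failed save followed by an assert storm is consistent with the filesystem simply being full.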

@esev
Copy link
Contributor

esev commented Jan 12, 2025

Just an observation: for the connection just prior to the LFS failures, there is no "Client wants config" log.

INFO  | 15:14:57 3360 [EInkDynamicDisplay] BLE Disconnected, reason = 0x8
DEBUG | 15:14:57 3360 [EInkDynamicDisplay] PhoneAPI::close()
DEBUG | 15:14:57 3361 [EInkDynamicDisplay] Async full-refresh complete
DEBUG | 15:14:57 3361 [RadioIf] Started Tx (id=0x211f2819 fr=0x3c to=0xff, WantAck=0, HopLim=2 Ch=0x8 encrypted rxtime=1736694894 rxSNR=6 rxRSSI=-86 priority=64)
DEBUG | 15:14:57 3361 [RadioIf] Packet TX: 509ms
DEBUG | 15:14:58 3361 [RadioIf] Completed sending (id=0x211f2819 fr=0x3c to=0xff, WantAck=0, HopLim=2 Ch=0x8 encrypted rxtime=1736694894 rxSNR=6 rxRSSI=-86 priority=64)
INFO  | 15:15:03 3367 BLE Connected to "android phone"
INFO  | 15:15:04 3367 BLE connection secured

@cracky22
Copy link
Author

@garthvh So, the odd thing about the problem is that the RAK keeps blinking the whole time as long as it is connected to the serial console and that tab is active. As soon as several log messages accumulate in the background and the RAK freezes (i.e. the LED stays lit continuously), it looks as if it is boot-looping, which is the problem I currently have; at that point I have no connection on the phone. However, if I then go to the tab with the serial console on the PC and scroll down to the end of the logs with the mouse, the RAK comes out of this state. As soon as all the logs have loaded, the Android device reconnects and the RAK starts blinking again.

here are my logs
meshtastic-log-2025-01-13T18-17-54.737Z.log

@c0dexter
Copy link

A friend of mine had the same issue on a custom-built nRF node. He flashed a modified 2.5.19 firmware with the node list limited to 80 entries. Now his node is stable: no issues after connecting and disconnecting Bluetooth, loading the collected messages after re-connecting to the node, and so on. A few days now without any freezes, uncontrolled resets, or lost configs.

@esev
Copy link
Contributor

esev commented Jan 14, 2025

@c0dexter Do you mean 2.5.18 FW? Installing 2.5.19 on NRF52 nodes can/will cause a different issue.

@c0dexter
Copy link

@esev No, I mean 2.5.19; he compiled this version with his own changes and limited the nodedb to 80 items. Maybe this version has more issues, but for him it has been working for more than 72 h without unexpected behavior.

@esev
Copy link
Contributor

esev commented Jan 14, 2025

Oh, I see. 2.5.19 was only released 20 hours ago. But I bet the build string in his version contains 2.5.19. The version released 20 hours ago really shouldn't be used.

What were his changes? The limit of 80 was introduced a couple of releases ago. In 2.5.13 IIRC.

@c0dexter
Copy link

@esev I will ask him tomorrow about his changes, I will let you know

@c0dexter
Copy link

c0dexter commented Jan 15, 2025 via email

@cracky22
Copy link
Author

cracky22 commented Jan 15, 2025

@c0dexter But what did he change the 80 nodes to?

@cracky22
Copy link
Author

cracky22 commented Jan 15, 2025

@garthvh @fifieldt @GUVWAF Interesting info: the RAK stays online for more than 3 days if it is not charging but just running from the battery.

Edit: in case it's relevant, I have the RAK connected to a solar panel (a Soshine 6 W).

@c0dexter
Copy link

c0dexter commented Jan 15, 2025 via email

@cracky22
Copy link
Author

@c0dexter ok thanks. Can I limit the nodes myself, for example to 60?
I still have the freezing problem

@c0dexter
Copy link

c0dexter commented Jan 15, 2025 via email

@c0dexter
Copy link

c0dexter commented Jan 15, 2025 via email

@cracky22
Copy link
Author

While testing and looking for the reason for the freezing issue on the NRF-based node, he tried to limit the nodedb, because he thought it could be a potential stability problem if the device has too little free space in storage. That's why he limited the node list to 80.

Maybe it's not a solution to the main problem, but limiting the nodes to 80 on the 2.5.19 FW shows, in his case, that the node stays alive.


I can't see the 80 node limit in https://github.com/meshtastic/firmware/blob/master/variants%2Frak4631%2Fvariant.h

@thebentern
Copy link
Contributor

There is already an 80 node limit on NRF52 devices. It's defined in nrf52.ini, not in the variant.

Image

@cracky22
Copy link
Author

Hi Ben, can you please build me something that will automatically restart my node every 12 hours? Something that I can put into the firmware myself. I would really like to temporarily stop my crashes.

@garthvh
Copy link
Member

garthvh commented Jan 15, 2025

It does not happen on iOS. We do not do workarounds where things reboot on timers; we need to find the source of the bug.

@cracky22
Copy link
Author

Yes, I know that, and I appreciate that you guys fix most of the bugs, but I currently need a workaround to temporarily mitigate the problem. It's quite annoying to have to power-cycle the RAK, which in my case is meant to be an off-site node, every 6-9 hours!
To my knowledge there is no firmware version where the problem does not occur; before 2.5 everything was fine.

@esev
Copy link
Contributor

esev commented Jan 15, 2025

For folks with the lfs debug error messages, how often do you move your phone outside of Bluetooth range with your nodes without disconnecting first? I think the busyTx errors might be a separate issue.
See #5839 for the lfs debug issue. We may be real close to a fix.

@JimTheCowboy
Copy link

For folks with the lfs debug error messages, how often do you move your phone outside of Bluetooth range with your nodes without disconnecting first? I think the busyTx errors might be a separate issue. See #5839 for the lfs debug issue. We may be real close to a fix.

I am not sure which issue we have, "lfs debug" or "busyTx".

But I can answer your BT range question:
Most users won't care about disconnecting before walking off.

Some nodes are in a fixed position, the user comes and goes as he pleases.
Other nodes are in cars, the user moves about his house, in and out of range.
I have a portable node which I carry on my belt but have to leave in my car or backpack randomly during the day.

Each of those behaviors will trigger the error over enough elapsed time.

It's just not practical to disconnect it. imho

@esev
Copy link
Contributor

esev commented Jan 15, 2025

It's just not practical to disconnect it. imho

Definitely agree. That's just what we've diagnosed to be the trigger. There is a fix in the works that addresses the lfs debug error cases.

@JimTheCowboy
Copy link

Yes, I know that, and I appreciate that you guys fix most of the bugs, but I currently need a workaround to temporarily mitigate the problem. It's quite annoying to have to power-cycle the RAK, which in my case is meant to be an off-site node, every 6-9 hours! To my knowledge there is no firmware version where the problem does not occur; before 2.5 everything was fine.

before 2.5 everything was fine

With absolute certainty that is not true. The issue started way before 2.5, probably it just got worse afterwards.

One of our users has a T-Echo that had no updates for at least 6 months: no issues at all.
His other T-Echo, which I updated with beta versions, started having issues.

Another user runs his RAK19007/4631 home node and reboots it every day because of the boot-loop/memory-loss issue.
This started around late spring/early summer 2024, but we haven't been able to pinpoint the firmware version.
(And we stopped trying, sorry.)

@esev
Copy link
Contributor

esev commented Jan 15, 2025

What is your current solution for getting out of the boot-loop?

@JimTheCowboy
Copy link

JimTheCowboy commented Jan 15, 2025

nrf factory reset...

but most of "our users" are just this: users.
They can't be bothered with python CLI and such, which I understand.

So we who do the updates and settings get the "nice feedback" )=

@esev
Copy link
Contributor

esev commented Jan 15, 2025

So here's my theory. Each of the lfs (LittleFS) errors causes the small file system on the devices to shrink by one block. As those errors keep occurring the file system gets too small to store the preferences/nodedb/etc files. If we can stop the flash write errors (which we've identified to be related to Bluetooth timeouts/disconnects), we can stop LFS from losing a block. #5839
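
If that theory holds, the arithmetic earlier in the thread makes the failure mode easy to believe: under the assumed 224 × 128-byte geometry, each lost block permanently removes 128 bytes from a region in which a db.proto of roughly 27 KB has already been observed, so only a handful of such events would be needed before ordinary saves stop completing.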

@thebentern
Copy link
Contributor

With absolute certainty that is not true. The issue started way before 2.5, probably it just got worse afterwards.

Absolutely correct. Ironically, the intended set of fixes in late 2.4 / early 2.5 to add more atomic save operations actually seems to have exacerbated the issues by introducing more FS contention and less free space. I am optimistic based on some of the recent discoveries that we will improve the reliability dramatically.

@cracky22
Copy link
Author

cracky22 commented Jan 15, 2025

So my problem on the RAK is not only the lfs problem but also busyTx, which is why I get a critical error.

@esev
Copy link
Contributor

esev commented Jan 15, 2025

So my problem on the RAK is not only the lfs problem but also busyTx, which is why I get a critical error.

Is it possible the busyTX issue was fixed with #5820?

Edit: Oh! @c0dexter Did your custom firmware include that change?

@cracky22
Copy link
Author

cracky22 commented Jan 15, 2025

@esev So my custom firmware included the fix for this, and it still occurs on the RAK...

Edit: I used nrf erase before flashing the custom firmware.
