Fatal crash: boost "Invalid cross-device link" #843

Open
PhilRW opened this issue Aug 29, 2023 · 9 comments

@PhilRW

PhilRW commented Aug 29, 2023

  • Linux <hostname-redacted> 5.10.0-25-amd64 #1 SMP Debian 5.10.191-1 (2023-08-16) x86_64 GNU/Linux
  • Docker version 24.0.5, build ced0996
  • Using the edge Docker image from Docker Hub
  • /data is a mounted volume

Program crashes with the following:

```
boost::filesystem::copy_file: Invalid cross-device link: "/dev/shm/codtrs5/9580-1693338509_852987500.wav", "/data/codtrs5/2023/8/29/9580-1693338509_852987500.wav"
0x7f443a0325d9: (gr::tagged_stream_block::check_topology(int, int)+0x2e49)
0x7f4439c2f24c: (std::rethrow_exception(std::__exception_ptr::exception_ptr)+0x7c)
0x7f4439c2f2b7: (std::terminate()+0x17)
0x7f4439c2f23e: (std::rethrow_exception(std::__exception_ptr::exception_ptr)+0x6e)
0x5595a221e941: (Call_Concluder::manage_call_data_workers()+0xeb1)
0x5595a2140604: (monitor_messages()+0x394)
0x5595a2134210: (main+0x740)
0x7f443987bd90: (__libc_init_first+0x90)
0x7f443987be40: (__libc_start_main+0x80)
0x5595a2137ab5: (_start+0x25)
```

@PhilRW
Author

PhilRW commented Aug 29, 2023

Problem seems to be mitigated by setting transmissionArchive to false.

@taclane
Contributor

taclane commented Aug 29, 2023

If you still want to keep transmission archives, the other option is to set tempDir to the same directory (or at least the same drive) as captureDir in the config file. Keeping both of those on the same device should avoid the issue, but you'll miss any benefit of recording all the individual transmissions to a tmpfs instead of storage media.
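A quick way to check whether two directories are on the same device is to compare the `st_dev` field that `stat(2)` reports for each. This is a minimal sketch; `same_device` is a hypothetical helper, not part of trunk-recorder.

```cpp
#include <sys/stat.h>  // stat(2), struct stat
#include <optional>
#include <string>

// Hypothetical helper (not trunk-recorder code).
// Returns true if both paths live on the same device (same st_dev),
// false if they do not, and std::nullopt if either path can't be stat'ed.
// Copies or moves that cross devices are where EXDEV
// ("Invalid cross-device link") can show up.
std::optional<bool> same_device(const std::string& a, const std::string& b) {
    struct stat sa{}, sb{};
    if (::stat(a.c_str(), &sa) != 0 || ::stat(b.c_str(), &sb) != 0) {
        return std::nullopt;
    }
    return sa.st_dev == sb.st_dev;
}
```

In the setup from this issue, `same_device("/dev/shm", "/data")` would be expected to return `false`, since `/dev/shm` is a tmpfs and `/data` is a mounted volume.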

There are a handful of Boost library/kernel combos that can cause this, but it's ultimately related to a kernel issue that existed between Linux 5.3 and 5.18. Boost created a workaround at some point, and it was fixed in the 6.x kernel, but some distros like Debian 11 might still run into the "cross-device link" error.
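For context, a common fallback for this class of error is to attempt the in-kernel copy, `copy_file_range(2)`, and finish with an ordinary `read()`/`write()` loop if the kernel refuses with `EXDEV` or a similar error. This is a Linux-only sketch of that general technique, assuming the failing call is `copy_file_range`; it is not Boost's actual implementation, `copy_with_fallback` is a hypothetical name, and it needs glibc 2.27 or newer for the `copy_file_range` wrapper.

```cpp
#include <fcntl.h>   // open, O_* flags
#include <unistd.h>  // copy_file_range (glibc >= 2.27), read, write, close
#include <cerrno>
#include <string>

namespace {

// Errors meaning "the kernel won't do this copy for us", not "the data is bad".
bool kernel_copy_unavailable(int err) {
    return err == EXDEV || err == ENOSYS || err == EOPNOTSUPP ||
           err == EINVAL || err == EPERM;
}

// Plain userspace copy from the current offset of `in` to the current offset of `out`.
bool copy_read_write(int in, int out) {
    char buf[64 * 1024];
    for (;;) {
        ssize_t r = ::read(in, buf, sizeof buf);
        if (r == 0) return true;  // end of input
        if (r < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        for (ssize_t done = 0; done < r;) {
            ssize_t w = ::write(out, buf + done, static_cast<size_t>(r - done));
            if (w < 0) {
                if (errno == EINTR) continue;
                return false;
            }
            done += w;
        }
    }
}

}  // namespace

// Hypothetical helper (not Boost or trunk-recorder code): copy `from` to `to`,
// preferring copy_file_range(2) and falling back to read()/write() when the
// in-kernel copy is refused (e.g. EXDEV across devices).
bool copy_with_fallback(const std::string& from, const std::string& to) {
    int in = ::open(from.c_str(), O_RDONLY | O_CLOEXEC);
    if (in < 0) return false;
    int out = ::open(to.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (out < 0) {
        ::close(in);
        return false;
    }

    bool ok = true;
    for (;;) {
        // Null offsets: the kernel advances both file offsets, so the
        // fallback resumes exactly where the in-kernel copy stopped.
        ssize_t n = ::copy_file_range(in, nullptr, out, nullptr, 1 << 20, 0);
        if (n > 0) continue;  // copied a chunk, keep going
        if (n == 0) break;    // end of input
        if (errno == EINTR) continue;
        ok = kernel_copy_unavailable(errno) && copy_read_write(in, out);
        break;
    }

    if (::close(out) != 0) ok = false;
    ::close(in);
    return ok;
}
```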

Since this only really happens under a certain set of circumstances, it might even be best that transmissionArchive: true disables the use of a temp space. If you're keeping all those wavs, it's not like the tempDir is saving any drive wear; it's just adding complexity.
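The proposal above could look roughly like this. `RecorderDirs` and `transmission_write_dir` are hypothetical names that mirror the `captureDir`, `tempDir`, and `transmissionArchive` settings; this is a sketch, not trunk-recorder's actual config handling.

```cpp
#include <string>

// Hypothetical settings struct mirroring the config keys discussed here.
struct RecorderDirs {
    std::string capture_dir;    // captureDir
    std::string temp_dir;       // tempDir (e.g. /dev/shm)
    bool transmission_archive;  // transmissionArchive
};

// Hypothetical: where each transmission wav is written while a call is active.
// If every transmission is archived anyway, skip the temp space and write to
// the capture device directly, so no temp-to-archive copy is needed.
std::string transmission_write_dir(const RecorderDirs& d) {
    return d.transmission_archive ? d.capture_dir : d.temp_dir;
}
```

With `transmissionArchive` enabled, every wav would land on its final device once, so there is no cross-device copy left to fail.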

@sally-yachts

Just for posterity's sake, I'd like to confirm taclane's findings. My main recorder ran the official TR Docker image on a Debian 11 box with a backported 6.x kernel and still ran into this error. It was configured to archive transmissions and tempDir wasn't set; I configured it to use a directory on the same volume as the existing audio storage, and I can now run newer code without problems.

@sally-yachts

For more context, this still happens with the latest edge code on a fresh Debian 12 (bookworm) install with kernel 6.1.0-13. Would love any input on known working Boost/kernel versions to address this, as using something like shm for temp data keeps latency-sensitive IO off of storage altogether, which enables a lot more flexibility in deployment.

This workaround also unfortunately triggered a corner case in concert with bad firmware from Samsung and caused two brand new SSDs to burn through their usable life in a couple of months, necessitating an RMA.
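If the goal is to keep temp data off storage entirely, a startup sanity check can confirm that the temp directory really is a tmpfs (as `/dev/shm` normally is) by comparing the `f_type` reported by `statfs(2)` with `TMPFS_MAGIC`. This is a Linux-only sketch; `is_tmpfs` is a hypothetical helper, not an existing trunk-recorder option.

```cpp
#include <sys/vfs.h>      // statfs(2)
#include <linux/magic.h>  // TMPFS_MAGIC
#include <optional>
#include <string>

// Hypothetical helper (not trunk-recorder code).
// true  -> path is on a tmpfs (RAM-backed, no SSD wear)
// false -> path is on some other filesystem
// nullopt -> statfs failed (e.g. the directory doesn't exist)
std::optional<bool> is_tmpfs(const std::string& path) {
    struct statfs info{};
    if (::statfs(path.c_str(), &info) != 0) {
        return std::nullopt;
    }
    return info.f_type == TMPFS_MAGIC;
}
```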

@taclane
Contributor

taclane commented Dec 6, 2023

It was a little convoluted to map out, but for those using transmissionArchive, the problem seems to be along the lines of:

The current boost::filesystem::copy_file will error if BOTH:

  • boost < 1.76
  • linux kernel 5.3 or greater (6.x included)

But std::filesystem::copy_file will only error if:

  • linux kernel 5.3 or greater, but older than 6.x (6.x NOT included)
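The two cases above can be written down as predicates, which makes the overlap easier to see: on kernel 5.3 through 5.x, `std::filesystem::copy_file` is reported to fail and `boost::filesystem::copy_file` only works with Boost 1.76 or newer, while on 6.x only the older Boost path fails. This only encodes the observations reported in this thread, not a rule verified against the kernel or Boost changelogs; the function names are hypothetical, and Boost versions use the `BOOST_VERSION` encoding (`major * 100000 + minor * 100 + patch`, so 1.74.0 is `107400`).

```cpp
// Kernel version as (major, minor).
struct KernelVersion {
    int major;
    int minor;
};

constexpr bool at_least(KernelVersion k, int major, int minor) {
    return k.major > major || (k.major == major && k.minor >= minor);
}

// Hypothetical: boost::filesystem::copy_file reported to fail with
// Boost < 1.76 on kernel >= 5.3 (6.x included).
constexpr bool boost_copy_may_fail(int boost_version, KernelVersion k) {
    return boost_version < 107600 && at_least(k, 5, 3);
}

// Hypothetical: std::filesystem::copy_file reported to fail only on
// 5.3 <= kernel < 6.0.
constexpr bool std_copy_may_fail(KernelVersion k) {
    return at_least(k, 5, 3) && !at_least(k, 6, 0);
}
```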

#886 should address this by checking the Boost version and attempting a std::filesystem::copy_file if it detects that the Boost library hasn't been updated yet. If the installed Boost lib is new enough, it will use that instead, which should be a better workaround for anyone using kernel 5.3-5.18.
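A version gate of this kind can be sketched with the `BOOST_VERSION` macro from `<boost/version.hpp>`. This is not the code from #886: only the `std::filesystem` branch is shown, the Boost call is left out so the sketch builds without linking Boost, and `archive_copy_backend` / `archive_transmission` are hypothetical names.

```cpp
#include <filesystem>
#include <system_error>

#if __has_include(<boost/version.hpp>)
#include <boost/version.hpp>  // defines BOOST_VERSION, e.g. 107400 for 1.74.0
#endif

namespace fs = std::filesystem;

// Hypothetical: which copy implementation a Boost-version gate would pick in
// this build. BOOST_VERSION is undefined if Boost headers are absent.
constexpr const char* archive_copy_backend() {
#if defined(BOOST_VERSION) && BOOST_VERSION >= 107600
    return "boost::filesystem::copy_file";
#else
    return "std::filesystem::copy_file";
#endif
}

// Hypothetical std::filesystem branch: copy a finished transmission from the
// temp dir into the archive tree, creating dated subdirectories as needed.
// With no copy_options, copy_file reports an error if `to` already exists.
// The error_code overloads report failures instead of throwing.
bool archive_transmission(const fs::path& from, const fs::path& to,
                          std::error_code& ec) {
    ec.clear();
    if (to.has_parent_path()) {
        fs::create_directories(to.parent_path(), ec);
        if (ec) return false;
    }
    return fs::copy_file(from, to, ec);
}
```

Using the `std::error_code` overloads means a failed copy comes back as a value the caller can log; the stack trace at the top of this issue appears to show the exception from a throwing copy being rethrown until it reaches `std::terminate`.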

I just tried this with kernel 6.5 / libboost 1.74, and it prevented a previous error from occurring as the transmission wavs were copied out of the /dev/shm tmpfs to disk.

@sally-yachts

Pulled the latest docker image (edge tag that includes #886), let tempDir default back to shm, and it ran all night without an issue. Good stuff!

@taclane
Contributor

taclane commented Dec 8, 2023

Cool!
All that's left is to pull in #887 to fix a typo for boost compatibility going forward (>1.76), and that should hopefully be the end of this issue.

@robotastic
Owner

MERGED!! 3 Cheers to @taclane for squashing this bug 🎉

@sally-yachts

Looks like there still might be a race condition hiding in the workaround somewhere; I get crashes about every 24-36 hours that seem to reference copying a transmission from temp to archive when the file already exists. The Docker image still has Boost 1.74, so I expect that bumping it to something newer than 1.76 will probably defuse the landmine for good.
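Whatever the root cause of the race turns out to be, one defensive option is to make the temp-to-archive copy non-throwing and decide explicitly what an existing destination means. The sketch below uses `std::filesystem::copy_options::skip_existing`, so a second copy of the same transmission is reported as `AlreadyPresent` instead of aborting the process. Treating an existing file as success is an assumption here (overwriting would be the other choice), and `copy_no_throw` / `CopyResult` are hypothetical names, not trunk-recorder code.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type for the archive copy.
enum class CopyResult { Copied, AlreadyPresent, Failed };

// Hypothetical: copy a transmission into the archive without ever throwing.
// With copy_options::skip_existing, if `to` already exists (and is a different
// file), copy_file returns false with no error and leaves it untouched.
CopyResult copy_no_throw(const fs::path& from, const fs::path& to,
                         std::string& error_out) {
    std::error_code ec;
    const bool copied =
        fs::copy_file(from, to, fs::copy_options::skip_existing, ec);
    if (ec) {
        error_out = ec.message();  // e.g. missing source, permission denied
        return CopyResult::Failed;
    }
    return copied ? CopyResult::Copied : CopyResult::AlreadyPresent;
}
```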
