Mysterious Crash #397

Closed
tsujan opened this issue Sep 6, 2016 · 30 comments

@tsujan
Member

tsujan commented Sep 6, 2016

Apparently it's related to libfm:

#0  0x00007fed179dd446 in strlen () from /usr/lib/libc.so.6
#1  0x00007fed1898901d in ?? () from /usr/lib/libfm.so.4
#2  0x00007fed18989e2e in fm_path_to_str () from /usr/lib/libfm.so.4
#3  0x00007fed18989e5b in fm_path_to_uri () from /usr/lib/libfm.so.4
#4  0x00007fed1898e463 in ?? () from /usr/lib/libfm.so.4
#5  0x00007fed180f3345 in ?? () from /usr/lib/libglib-2.0.so.0
#6  0x00007fed17747454 in start_thread () from /usr/lib/libpthread.so.0
#7  0x00007fed17a457df in clone () from /usr/lib/libc.so.6

I haven't found a way to reproduce it yet.

@tsujan
Member Author

tsujan commented Oct 2, 2016

Yet another rare crash related to libfm (a coredump I saw a few minutes ago), whose trace I add for reference:

#0  0x00007f875be5804f in raise () from /usr/lib/libc.so.6
#1  0x00007f875be5947a in abort () from /usr/lib/libc.so.6
#2  0x00007f875c5ba515 in g_assertion_message () from /usr/lib/libglib-2.0.so.0
#3  0x00007f875c5ba5aa in g_assertion_message_expr () from /usr/lib/libglib-2.0.so.0
#4  0x00007f875c583d1e in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#5  0x00007f875ce4f9b4 in fm_mime_type_from_name () from /usr/lib/libfm.so.4
#6  0x00007f875ce4fb9c in fm_mime_type_from_file_name () from /usr/lib/libfm.so.4
#7  0x00007f875ce4b1a0 in ?? () from /usr/lib/libfm.so.4
#8  0x00007f875ce4b570 in fm_file_info_new_from_native_file () from /usr/lib/libfm.so.4
#9  0x00007f875ce59642 in ?? () from /usr/lib/libfm.so.4
#10 0x00007f875ce5ee0d in ?? () from /usr/lib/libfm.so.4
#11 0x00007f875c5bbd3e in ?? () from /usr/lib/libglib-2.0.so.0
#12 0x00007f875c5bb345 in ?? () from /usr/lib/libglib-2.0.so.0
#13 0x00007f875bc0f454 in start_thread () from /usr/lib/libpthread.so.0
#14 0x00007f875bf0d7df in clone () from /usr/lib/libc.so.6

@tsujan
Member Author

tsujan commented Oct 2, 2016

I think (the new?) libfm has a bug:

#0  0x00007f7acdba6870 in fm_file_info_get_path () from /usr/lib/libfm.so.4
#1  0x00007f7acdba92ff in ?? () from /usr/lib/libfm.so.4
#2  0x00007f7acd5c61b4 in ?? () from /usr/lib/libgobject-2.0.so.0
#3  0x00007f7acd5e08ed in g_signal_emit_valist () from /usr/lib/libgobject-2.0.so.0
#4  0x00007f7acd5e0fdf in g_signal_emit () from /usr/lib/libgobject-2.0.so.0
#5  0x00007f7acdbba9f0 in ?? () from /usr/lib/libfm.so.4
#6  0x00007f7acd2f0d1a in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#7  0x00007f7acd2f10d0 in ?? () from /usr/lib/libglib-2.0.so.0
#8  0x00007f7acd2f117c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#9  0x00007f7ace0ae57f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
   from /usr/lib/libQt5Core.so.5
#10 0x00007f7ace0580da in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
   from /usr/lib/libQt5Core.so.5
#11 0x00007f7ace0605cc in QCoreApplication::exec() () from /usr/lib/libQt5Core.so.5
#12 0x000000000041caa4 in ?? ()
#13 0x00007f7accba1291 in __libc_start_main () from /usr/lib/libc.so.6
#14 0x000000000041cb0a in _start ()

Will investigate the situation.

tsujan added a commit to lxqt/libfm-qt that referenced this issue Oct 2, 2016
This is a shot in the dark to prevent rare crashes (lxqt/pcmanfm-qt#397).
@tsujan
Member Author

tsujan commented Nov 14, 2016

Since I opened this issue, I've found 8 coredumps with the same trace as in #397 (comment) -- but nothing else. It's always about fm_mime_type_from_name(). The crash isn't visible, so it should occur on exiting. I haven't found a way of reproducing it but this time I'll dig deeper.

@tsujan
Member Author

tsujan commented Nov 15, 2016

A wild guess: this may be related to the new C++ wrappers of libfm-qt (mimetype.h).
@PCMan, any suggestion?

EDIT1. mimetype.h isn't used.

EDIT2. My theory (may be wrong):

The g_assert() in question should be in g_hash_table_lookup_node() in ghash.c. Somehow, the ref count of mime_hash (in fm-mime-type.c) becomes zero when fm_mime_type_from_name() is called by fm_file_info_new_from_native_file() (on exit?).

@PCMan
Member

PCMan commented Nov 15, 2016

@tsujan Thank you for tracking this. I have no idea (yet).
It might help to do the following.

  1. export G_SLICE=always-malloc
  2. Use valgrind to monitor for memory errors

G_SLICE is needed because libfm internally uses the slab allocator provided by glib, which is more memory-efficient and faster than malloc when allocating small objects.
The drawback is that normal memory debugging tools might not catch gslice errors. Forcing the use of malloc() with this variable lets memory debuggers like valgrind help.

@PCMan
Member

PCMan commented Nov 15, 2016

@tsujan BTW I think you might be right. This looks like a ref counting issue of FmMimeType. :-(
If done correctly, smart pointers should help automate ref-counting, but it seems that I messed it up.
Busy this week. I'll find some time to check this part. Thanks a lot!

PCMan self-assigned this Nov 15, 2016
PCMan added the bug label Nov 15, 2016
@tsujan
Member Author

tsujan commented Nov 15, 2016

@PCMan
The ref count of mime_hash should have become zero before fm_mime_type_from_name() is called but I couldn't find the cause. Looking forward to your solution.

@tsujan
Member Author

tsujan commented Nov 17, 2016

@PCMan
A few minutes ago, another crash happened here, this time with the backtrace of #397 (comment). Is it possible that it's related to C++ wrappers? Both crashes are random and infrequent.

@PCMan
Member

PCMan commented Nov 19, 2016

@tsujan I spent two hours on this but I'm not able to reproduce the crash. :-(
For the mime-type part, the ref count of "mime_hash" becomes zero only on program termination, since the hash table is destroyed by the libfm finalization handler. That means that when the program exits, there might still be other threads with work in progress that reference the mime type data.
A thread join operation can be added to the libfm finalization process to solve this.
If this crash does not happen on program termination, then it's memory corruption that causes the address of "mime_hash" to be overwritten by something else.
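
For illustration, the "join before finalizing" idea might look roughly like the sketch below. It uses plain pthreads rather than glib, and every name in it (MimeTable, demo_init, demo_lookup, demo_worker, demo_finalize, demo_run) is a hypothetical stand-in, not libfm code. It only shows the ordering: the finalizer waits for the background job before it frees the table the job is reading.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libfm's mime_hash: a tiny table of known names. */
typedef struct {
    const char **names;
    size_t count;
} MimeTable;

static MimeTable *mime_table = NULL;  /* shared by all threads */
static pthread_t worker;
static int worker_started = 0;
static int lookups_done = 0;          /* written by the worker, read after join */

static void demo_init(void) {
    static const char *known[] = { "image/jpeg", "text/plain" };
    mime_table = malloc(sizeof *mime_table);
    if (mime_table == NULL)
        abort();
    mime_table->names = known;
    mime_table->count = sizeof known / sizeof known[0];
}

/* Plays the role of a mime lookup: it needs a live table. */
static int demo_lookup(const char *name) {
    assert(mime_table != NULL);
    for (size_t i = 0; i < mime_table->count; i++)
        if (strcmp(mime_table->names[i], name) == 0)
            return 1;
    return 0;
}

/* Background job that keeps using the table, like the file-info job. */
static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++)
        lookups_done += demo_lookup("image/jpeg");
    return NULL;
}

/* Finalization: wait for the worker first, only then destroy the table. */
static void demo_finalize(void) {
    if (worker_started) {
        pthread_join(worker, NULL);
        worker_started = 0;
    }
    free(mime_table);
    mime_table = NULL;
}

/* Starts work and finalizes right away, mimicking an immediate exit.
 * Returns the number of lookups the worker completed. */
int demo_run(void) {
    lookups_done = 0;
    demo_init();
    int rc = pthread_create(&worker, NULL, demo_worker, NULL);
    if (rc != 0)
        abort();
    worker_started = 1;
    demo_finalize();
    return lookups_done;
}
```

Without the pthread_join() call, demo_finalize() could free the table while demo_worker() is still in the middle of a lookup, which is the kind of timing-dependent failure described here.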

@PCMan
Member

PCMan commented Nov 19, 2016

@tsujan About the "fm_file_info_get_path" crash, I still don't have an idea. Debugging now.

@PCMan
Member

PCMan commented Nov 19, 2016

@tsujan not able to reproduce the bug yet. :-(
What kind of operations are most frequent during your daily usage?

@tsujan
Member Author

tsujan commented Nov 19, 2016

@PCMan
None of these crashes can be reproduced by me either. Both are random. They may happen once in a few days.

File creation and deletion are frequent in my daily usage (because of programming).

I'll remove all C++ wrappers to see if the crash happens without them and will report the result.

P.S. It may take a while because of the random nature of crashes.

@tsujan
Member Author

tsujan commented Nov 19, 2016

@PCMan
The crash wasn't related to wrappers because I removed them and it happened again.

Although it may sound strange, the crash may be related to changing wallpaper. This time, I saw this line too:

Core was generated by `pcmanfm-qt --set-wallpaper ~/Pictures/Wallpapers/418.jpg'.

That explains why others don't see the crash: I have a startup bash script that changes the wallpaper once an hour with pcmanfm-qt --set-wallpaper IMAGE, where IMAGE is always a valid JPEG image randomly chosen from a folder. Actually, I entered the above line in a terminal and the wallpaper was changed correctly.

I don't know how changing wallpaper may cause a crash, especially a mimetype related one, but I'll investigate it.

@tsujan
Member Author

tsujan commented Nov 19, 2016

At last found an easy way to make pcmanfm-qt crash: In a terminal, enter pcmanfm-qt --set-wallpaper CURRENT_WALLPAPER, where "CURRENT_WALLPAPER" is the wallpaper currently shown on desktop.

@ everyone, A confirmation of this crash would be appreciated.

@tsujan
Member Author

tsujan commented Nov 20, 2016

@PCMan
This is my analysis after experimenting with different code changes:

When the program exits immediately without doing anything, as is the case when setting a wallpaper that's already set, the ref count of mime_hash becomes zero but another thread still calls fm_mime_type_from_name(), which produces a coredump.

This situation started to exist only after lxde/libfm@bfbd882. If libfm is downgraded to a commit before it, no crash will happen. More specifically, if g_file_query_info() is removed from the same commit, the crash will be fixed (but, of course, there will be no emblem). Somehow, g_file_query_info() causes a thread to continue after an immediate exit of pcmanfm-qt.

Anyway, there's no mechanism to guarantee that mime_hash doesn't have a zero ref count when fm_mime_type_from_name() is called. IMO, g_file_query_info() just reveals a hitherto hidden issue.

EDIT1: I tested on an old computer and saw that if CPU was busy, coredumping might not happen. This shows that it's a matter of timing. IMHO, we need a way to call _fm_mime_type_finalize() only when there's no pending call.

EDIT2: I succeeded in fixing the crash by using a function "gboolean fm_mime_type_is_finalized()" in such a way that if it returns TRUE, no fm_mime_type functions will be used in "fm-file-info.c" (and GError will be set if needed). It also fixes the crash with pcmanfm-qt --quit. However, I see it as a dirty hack.
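
A rough sketch of what such a guard could look like is given below. It is not the actual patch: the demo_* names and the suffix-based lookup are assumptions made for illustration, and only the idea of checking an "is finalized" flag before doing any mime work comes from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag; it reads as "finalized" until demo_mime_init() runs. */
static atomic_bool mime_finalized = true;

void demo_mime_init(void)         { atomic_store(&mime_finalized, false); }
void demo_mime_finalize(void)     { atomic_store(&mime_finalized, true); }
bool demo_mime_is_finalized(void) { return atomic_load(&mime_finalized); }

/* Stand-in for a mime lookup that would normally consult the hash table. */
static const char *demo_mime_from_name(const char *file_name) {
    const char *dot = strrchr(file_name, '.');
    if (dot != NULL && strcmp(dot, ".jpg") == 0)
        return "image/jpeg";
    return "application/octet-stream";
}

/* Caller in the role of the file-info code: skip mime work once the mime
 * subsystem is finalized and report failure (NULL) to its own caller. */
const char *demo_file_info_mime(const char *file_name) {
    if (demo_mime_is_finalized())
        return NULL;
    /* Race window: finalization can still happen between check and use. */
    return demo_mime_from_name(file_name);
}
```

The check narrows the window but does not close it, because nothing stops finalization from running between the test of the flag and the lookup. That is presumably why it feels like a hack.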

tsujan added a commit to lxde/libfm that referenced this issue Nov 21, 2016
If `_fm_file_info_finalize()` and `_fm_mime_type_finalize()` are called "immediately" after initialization, some FmFileInfo functions might still be called and so, a crash might happen (see lxqt/pcmanfm-qt#397 and especially, lxqt/pcmanfm-qt#397 (comment)). This commit is a hack rather than a nice solution.
PCMan pushed a commit to lxqt/libfm-qt that referenced this issue Nov 26, 2016
This is a shot in the dark to prevent rare crashes (lxqt/pcmanfm-qt#397).
PCMan pushed a commit to lxqt/libfm-qt that referenced this issue Nov 26, 2016
This is a shot in the dark to prevent rare crashes (lxqt/pcmanfm-qt#397).
@tsujan
Member Author

tsujan commented Nov 26, 2016

@PCMan, could you replace lxde/libfm@2321195 with a more fundamental method? Please also see #397 (comment).

@ghost

ghost commented Feb 10, 2017

Deleting folders/files sometimes crashes pcman (0.11.3 and 0.11.2 before that) on an Arch/Fluxbox system with Qt (5.8) theme set by qt5ct if that matters. I've seen the same issue with pcman-gtk as well.

Not sure if I've managed to get a proper backtrace but here it is:

#0  0x00007fc1face848d in poll () at /usr/lib/libc.so.6
#1  0x00007fc1f8380786 in  () at /usr/lib/libglib-2.0.so.0
#2  0x00007fc1f838089c in g_main_context_iteration ()
    at /usr/lib/libglib-2.0.so.0
#3  0x00007fc1f914904f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#4  0x00007fc1f90f289a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#5  0x00007fc1f90fade4 in QCoreApplication::exec() ()
    at /usr/lib/libQt5Core.so.5
#6  0x000000000041e8d8 in  ()
#7  0x00007fc1fac29291 in __libc_start_main () at /usr/lib/libc.so.6
#8  0x000000000041fa0a in _start ()

@tsujan
Member Author

tsujan commented Feb 10, 2017

@AcarBurak Thanks! I think the single cause of all such crashes is what I explained at lxde/libfm#19. That PR could prevent most of them but a more fundamental approach is needed.

@ghost

ghost commented Feb 10, 2017

Would you care to elaborate a bit about the kind of "more fundamental approach"? Just curious.

@tsujan
Member Author

tsujan commented Feb 10, 2017

libfm should be finalized only when none of its functions is being called or, put another way, libfm functions should return immediately if libfm is finalized. There's no mechanism for that yet.
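
One possible shape for such a mechanism, purely as a sketch: count the calls that are in flight, refuse new ones once finalization has started, and let the finalizer block until the count drops to zero. The lib_* names below are hypothetical and use pthreads for brevity; nothing like this exists in libfm as far as this thread goes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical guard state shared by all library entry points. */
static pthread_mutex_t guard_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  guard_idle = PTHREAD_COND_INITIALIZER;
static int  calls_in_flight = 0;
static bool lib_finalized = false;

/* Call at the top of every library function; false means "return now". */
bool lib_enter(void) {
    pthread_mutex_lock(&guard_lock);
    bool ok = !lib_finalized;
    if (ok)
        calls_in_flight++;
    pthread_mutex_unlock(&guard_lock);
    return ok;
}

/* Call on every exit path of a function whose lib_enter() returned true. */
void lib_leave(void) {
    pthread_mutex_lock(&guard_lock);
    assert(calls_in_flight > 0);
    if (--calls_in_flight == 0)
        pthread_cond_broadcast(&guard_idle);
    pthread_mutex_unlock(&guard_lock);
}

/* Refuse new calls, then block until the pending ones have left. */
void lib_finalize(void) {
    pthread_mutex_lock(&guard_lock);
    lib_finalized = true;
    while (calls_in_flight > 0)
        pthread_cond_wait(&guard_idle, &guard_lock);
    pthread_mutex_unlock(&guard_lock);
    /* Shared state (hash tables etc.) could be destroyed safely from here. */
}

/* Number of calls currently between lib_enter() and lib_leave(). */
int lib_pending_calls(void) {
    pthread_mutex_lock(&guard_lock);
    int n = calls_in_flight;
    pthread_mutex_unlock(&guard_lock);
    return n;
}
```

Because the flag and the counter are updated under the same lock, a call either registers itself before finalization begins (and the finalizer waits for it) or sees the flag and returns without touching shared state.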

lxde/libfm#19 just makes some functions return immediately when finalization is done. I've seen no crash in pcmanfm-qt with it but a crash happened in lximage-qt recently.

@ghost

ghost commented Feb 10, 2017

All right, thank you. It's beyond my very limited understanding. I was wondering if it's anything to do with a possible gtk cruft inheritance!

@tsujan
Member Author

tsujan commented Feb 10, 2017

> I was wondering if it's anything to do with a possible gtk cruft inheritance!

libfm itself (not libfm-gtkX) doesn't have any gtk dependency. It depends only on glib.

@ghost

ghost commented Feb 10, 2017

Yes, I know, but glib is developed by GTK guys, is it not? Though almost everything depends on it.

@tsujan
Member Author

tsujan commented Feb 10, 2017

Yes, but it's so fundamental that it isn't (and, perhaps, can't be) affected by their annoying decisions, as far as I know.

@pmattern
Contributor

FWIW I've also seen sporadic crashes of PCManFM-Qt due to deleting files. Unfortunately I forgot to save the traces before they were automatically scrubbed from the disk so cannot provide any really helpful information.

The crashes never took place when the files were deleted from within PCManFM-Qt itself but only when they were deleted by another application, e.g. a command invoked in a terminal emulator, while PCManFM-Qt was displaying the corresponding folder. (@AcarBurak Is this also the context in which you've been seeing the crashes in #451, or did the crashes take place when you were deleting from PCManFM-Qt's GUI itself?)
The problem involves FHS objects on local storage devices only and is hence probably different from the one involving network shares as discussed in #364.

Btw. a similar problem has already been addressed in #70.

@ghost

ghost commented Feb 11, 2017

Crashes occur randomly when I delete something from pcman gui itself.

@ghost

ghost commented Feb 11, 2017

Seemingly the same issue with the pcman gtk version:

https://sourceforge.net/p/pcmanfm/bugs/964/

@pmattern
Contributor

> At last found an easy way to make pcmanfm-qt crash: In a terminal, enter pcmanfm-qt --set-wallpaper CURRENT_WALLPAPER, where "CURRENT_WALLPAPER" is the wallpaper currently shown on desktop.
>
> @ everyone, A confirmation of this crash would be appreciated.

Running libfm 1.2.5, lxqt/libfm-qt@e3e2839 and 1d38bbd on Arch Linux I cannot reproduce this crash. Invoking pcmanfm-qt --set-wallpaper <absolute or relative path to graphics file currently used as wallpaper> simply does not seem to have any effect at all.
Tested with several files at various locations both in $HOME and /usr/share in a virtual machine with two cores at 3.5GHz, both of which had nothing to do (as resources seem to matter).

@tsujan
Member Author

tsujan commented Apr 11, 2017

I think all such problems are fixed by lxqt/libfm-qt#63 but I leave this issue open for now.

@tsujan
Member Author

tsujan commented Apr 29, 2017

Fixed by the C++11 port (the latest libfm-qt and pcmanfm-qt).
