I stumbled on the following software situation once more on 2017-12-24. The windows of this file manager did not respond any more as expected. Their status lines displayed the message “Ordnerinhalt wird geladen …” (Loading folder contents...). elfring@Sonne:~> ps -ef|grep Thunar elfring 3346 3336 0 10:49 tty2 00:00:02 Thunar --sm-client-id 2e621412a-b89a-4db2-9ba2-bd3cc0e1d1ce --daemon … elfring@Sonne:~> strace -p 3346 … futex(0x55df5419db28, FUTEX_WAIT_PRIVATE, 2, NULL elfring@Sonne:~> gdb -p 3346 … (gdb) bt #0 0x00007fbafaec7fd9 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 #1 0x00007fbafb43febc in g_mutex_lock_slowpath (mutex=mutex@entry=0x55df5419db28) at gthread-posix.c:1313 #2 0x00007fbafb440702 in g_mutex_lock (mutex=mutex@entry=0x55df5419db28) at gthread-posix.c:1337 #3 0x000055df5230aa49 in thunar_thumbnailer_queue_async_reply (proxy=0x55df53804550, handle=2, error=0x0, user_data=0x55df53eb63a0) at thunar-thumbnailer.c:617 #4 0x000055df5230a63e in thunar_thumbnailer_proxy_queue_async_callback (proxy=0x55df53804550, call=<optimized out>, user_data=0x55df5418e790) at ../thunar/thunar-thumbnailer-proxy.h:36 #5 0x00007fbafbcc98a2 in complete_pending_call_and_unlock (connection=connection@entry=0x55df534fa1e0, pending=0x55df543e4da0, message=message@entry=0x55df543e4310) at dbus-connection.c:2331 #6 0x00007fbafbccd21f in dbus_connection_dispatch (connection=connection@entry=0x55df534fa1e0) at dbus-connection.c:4652 #7 0x00007fbafbf124f5 in message_queue_dispatch (source=source@entry=0x55df534fc090, callback=<optimized out>, user_data=<optimized out>) at dbus-gmain.c:90 #8 0x00007fbafb3faf97 in g_main_dispatch (context=0x55df534818b0) at gmain.c:3148 #9 0x00007fbafb3faf97 in g_main_context_dispatch (context=context@entry=0x55df534818b0) at gmain.c:3813 #10 0x00007fbafb3fb1d0 in g_main_context_iterate (context=0x55df534818b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3886 #11 0x00007fbafb3fb4e2 in g_main_loop_run (loop=0x55df54142fc0) at gmain.c:4082 #12 0x00007fbafd308ad7 in gtk_main () at /usr/lib64/libgtk-x11-2.0.so.0 #13 0x000055df522b3799 in main (argc=<optimized out>, argv=<optimized out>) at main.c:312 The directory contents were correctly displayed when I started another instance of the application “thunar 1.6.13-124.1” on my openSUSE Tumbleweed system. But a software component got stuck after my login for this XFCE session. How often can it happen that the call of the macro “_thumbnailer_lock” will be reached in the implementation of the function “thunar_thumbnailer_queue_async_reply” before the corresponding statement “_thumbnailer_unlock” will be completely executed? https://git.xfce.org/xfce/thunar/tree/thunar/thunar-thumbnailer.c?id=89de3451891c36b1b86cb626de820d11e0f4bd9e#n265
I think I have finally understood the problem here. I suppose is it a use after free. So here is what I think happens: 1. The Queue method of Tumbler is called: https://gitlab.xfce.org/xfce/thunar/-/blob/112a0753c90a0fe5259f540cf0d6b5fc262bca82/thunar/thunar-thumbnailer.c#L475 2. It returns 0, which is wrong 3. This causes the `job->handle` field to remain at 0: https://gitlab.xfce.org/xfce/thunar/-/blob/112a0753c90a0fe5259f540cf0d6b5fc262bca82/thunar/thunar-thumbnailer.c#L350 4. Next, some other job starts, but does not receive a handle yet. This results in 2 jobs having their `job->handle` field set to 0 5. During this time, the job with handle 0 finishes executing, so it gets deallocated (`handle` function argument here is 0): https://gitlab.xfce.org/xfce/thunar/-/blob/112a0753c90a0fe5259f540cf0d6b5fc262bca82/thunar/thunar-thumbnailer.c#L873 But instead of real job with the handle 0, the one started on step 4 is found (one without a handle assigned), so it gets deallocated 6. Enqueuing of the second job finally finishes: https://gitlab.xfce.org/xfce/thunar/-/blob/112a0753c90a0fe5259f540cf0d6b5fc262bca82/thunar/thunar-thumbnailer.c#L327 But this job is already deallocated on step 4 7. So already freed memory is accessed and everything goes wrong I am not sure if these steps are exactly correct. Here is how it looks in the debugger: after setting these breakpoints in GDB (line numbers as of Thunar commit 112a0753c90a0fe5259f540cf0d6b5fc262bca82): > break thunar-thumbnailer.c:330 > command > silent > printf "thunar_thumbnailer_queue_async_reply: %p\n", job > continue > end > break thunar-thumbnailer.c:865 > command > silent > printf "thunar_thumbnailer_thumbnailer_finished: %p %d\n", job, job->handle > continue > end > break thunar-thumbnailer.c:306 > command > silent > printf "thunar_thumbnailer_free_job: %p\n", job > continue > end This is a possible output: > thunar_thumbnailer_thumbnailer_finished: 0x555555a12e40 0 > thunar_thumbnailer_free_job: 0x555555a12e40 > thunar_thumbnailer_queue_async_reply: 0x555555a12e40 > > (thunar:1339130): GLib-GObject-WARNING **: 00:09:57.296: invalid uninstantiatable type '(null)' in cast to 'ThunarThumbnailer' > > (thunar:1339130): thunar-CRITICAL **: 00:09:57.296: thunar_thumbnailer_queue_async_reply: assertion '(((__extension__ ({ GTypeInstance *__inst = (GTypeInstance*) ((thumbnailer)); GType __t = ((thunar_thumbnailer_get_type ())); gboolean __r; if (!__inst) __r = (0); else if (__inst->g_class && __inst->g_class->g_type == __t) __r = (!(0)); else __r = g_type_check_instance_is_a (__inst, __t); __r; }))))' failed I am not really sure not to reproduce the issue, but what I did was repetitively navigating between folders with many files in Thunar and terminating Tumbler; this way the chance of reproducing was fairly high. I think that just making Tumbler follow the specification should be enough to fix this issue, so here is the bug report (fix included): https://bugzilla.xfce.org/show_bug.cgi?id=16814 Hopefully, all of this makes some sense.
(In reply to afdw from comment #1) > 2. It returns 0, which is wrong Thanks for your discovery. > 4. Next, some other job starts, but does not receive a handle yet. I interpreted the implementation of the function “thunar_thumbnailer_queue_async_reply” in the way that there is a specific lock type applied. I imagined that a deadlock can only happen then if this function is executed once more before an unlock would be performed. * Does the aspect “reentrancy” matter here? * Is there a need to use recursive locks? Is a job queue management relevant for the affected use case?
(In reply to Markus Elfring from comment #2) > (In reply to afdw from comment #1) > > 4. Next, some other job starts, but does not receive a handle yet. > > I interpreted the implementation of the function > “thunar_thumbnailer_queue_async_reply” in the way that there is a specific > lock type applied. > I imagined that a deadlock can only happen then if this function is executed > once more before an unlock would be performed. > * Does the aspect “reentrancy” matter here? > * Is there a need to use recursive locks? I do not think that locking is really the problem here. https://gitlab.xfce.org/xfce/thunar/-/blob/112a0753c90a0fe5259f540cf0d6b5fc262bca82/thunar/thunar-thumbnailer.c#L317-334 : > static void > thunar_thumbnailer_queue_async_reply (GObject *proxy, > GAsyncResult *res, > gpointer user_data) > { > ThunarThumbnailerJob *job = user_data; > ThunarThumbnailer *thumbnailer; > GError *error = NULL; > guint handle; > > _thunar_return_if_fail (THUNAR_IS_THUMBNAILER_DBUS (proxy)); > _thunar_return_if_fail (job != NULL); > > thumbnailer = THUNAR_THUMBNAILER (job->thumbnailer); > > _thunar_return_if_fail (THUNAR_IS_THUMBNAILER (thumbnailer)); > > _thumbnailer_lock (thumbnailer); But the time we got to the `_thumbnailer_lock (thumbnailer);` line, the use after free has already occurred, because the `user_data` is might be a dangling pointer by now, so `thumbnailer = THUNAR_THUMBNAILER (job->thumbnailer);`, which means accessing `((ThunarThumbnailerJob *) user_data)->thumbnailer`, is already wrong. Here is how this pointer behaves in the debugger (trying to get the type name and trying to inspect data behind it): > (gdb) p g_type_name(((GTypeInstance *) job)->g_class->g_type) > $1 = (const gchar *) 0x0 > (gdb) p *((ThunarThumbnailerJob *) user_data) > $2 = {thumbnailer = 0x7ffff7eb09c0, cancelled = 0, lazy_checks = 1, > files = 0x555555c01700 = {0x8, <error reading variable> Compare that to a good pointer (other argument of this function): > (gdb) p g_type_name(((GTypeInstance *) res)->g_class->g_type) > $3 = (const gchar *) 0x7ffff73f1279 "GTask" > (gdb) p *((GTask *) res) > $4 = {parent_instance = {g_type_instance = {g_class = 0x55555567fd90 [g_type: None]}, > ref_count = 2, qdata = 0x0}, source_object = 0x55555585ce20, > source_tag = 0x7ffff730e8b0 <g_dbus_proxy_call_internal>, name = 0x0, task_data = 0x0, > task_data_destroy = 0x0, context = 0x55555568c800, creation_time = 341128963267, > priority = 0, cancellable = 0x0, callback = 0x5555555f76f0 <thunar_thumbnailer_queue_async_reply>, > callback_data = 0x5555559ffae0, task_func = 0x0, lock = {p = 0x0, i = {0, 0}}, > cond = {p = 0x0, i = {0, 0}}, thread_cancelled = 0, thread_complete = 0, return_on_cancel = 0, > completed = 0, had_error = 0, result_set = -1, ever_returned = -1, check_cancellable = -1, > synchronous = 0, blocking_other_task = 0, error = 0x0, result = {pointer = 0x555555a4cec0, > size = 93824997445312, boolean = 1436864192}, result_destroy = 0x7ffff7306dc0 <reply_data_free>} I have seen 3 outcomes of the bug occurring randomly: 1. Just warning in the output 2. Freeze while loading folder contents 3. Segmentation fault So problems with locking can occur (as this code can lock on some random data from memory), but I think it is not the root cause of the issue.
Is bug 15211 related to this issue?
(In reply to afdw from comment #3) > So problems with locking can occur (as this code can lock on some random > data from memory), but I think it is not the root cause of the issue. Is this software designed in the way that multiple jobs (like generation of thumbnails) can be executed in parallel?
(In reply to Theo Linkspfeifer from comment #4) > Is bug 15211 related to this issue? Most likely this one and https://bugs.launchpad.net/ubuntu/+source/thunar/+bug/1832992 are exactly the same!
(In reply to Markus Elfring from comment #5) > (In reply to afdw from comment #3) > > So problems with locking can occur (as this code can lock on some random > > data from memory), but I think it is not the root cause of the issue. > Is this software designed in the way that multiple jobs (like generation of > thumbnails) can be executed in parallel? Yes, I think it does. In fact, the generation of thumbnails is delegated with asynchronous requests to a DBus service that can have multiple implementations. One from Xfce is Tumbler, and I think the root cause of the problem lies in it.
(In reply to afdw from comment #7) Does the file manager check for questionable job handles (according to involved services)?
(In reply to Markus Elfring from comment #8) > (In reply to afdw from comment #7) > Does the file manager check for questionable job handles (according to > involved services)? No, I do not think it does, it just trusts that thumbnailing service works as intended and that incoming handles are correct. Maybe adding more checks is not a bad idea?
Reading the comments on Bug #16814, I guess this Bug can be closed as a duplicate ? (In reply to afdw from comment #9) > Maybe adding more checks is not a bad idea? If you have an idea on what best to check to stay safe, patches would be very welcome !
(In reply to alexxcons from comment #10) > Reading the comments on Bug #16814, I guess this Bug can be closed as a > duplicate ? They are not exactly duplicates of each other. There are 2 problems here: 1. Tumbler returns 0 handles 2. Thunar does not check for them, which leads to undefined behavior > (In reply to afdw from comment #9) > > Maybe adding more checks is not a bad idea? > If you have an idea on what best to check to stay safe, patches would be > very welcome ! Merge Request: https://gitlab.xfce.org/xfce/thunar/-/merge_requests/4
(In reply to afdw from comment #11) > Merge Request: https://gitlab.xfce.org/xfce/thunar/-/merge_requests/4 Thanks alot, much appreciated !
Anton Danilkin referenced this bugreport in commit 98394ef960884af83b801824757a9a69b9bd8da0 Add checks for 0 handles (Bug #14122) https://gitlab.xfce.org/xfce/thunar/commit/98394ef960884af83b801824757a9a69b9bd8da0
Anton Danilkin referenced this bugreport in commit 480c0663af64753131182a37c7896988586b607c Add checks for 0 handles (Bug #14122) https://gitlab.xfce.org/xfce/thunar/commit/480c0663af64753131182a37c7896988586b607c
Pushed to master and 4.14 branch. To be released in 1.9.0 and 1.8.15 So I gues the thunar part is fine now. Thank alot for working on it, afdw !
When I kill tumblerd and navigate to a folder containing an image file, I get this log output: ThunarThumbnailer: got 0 handle (Queue) ThunarThumbnailer: got 0 handle (Error or Ready) ThunarThumbnailer: got 0 handle (Finished) Is this supposed be written to the session log every time tumblerd is (re)started?
(In reply to Theo Linkspfeifer from comment #16) > When I kill tumblerd and navigate to a folder containing an image file, I > get this log output: > > ThunarThumbnailer: got 0 handle (Queue) > ThunarThumbnailer: got 0 handle (Error or Ready) > ThunarThumbnailer: got 0 handle (Finished) > > Is this supposed be written to the session log every time tumblerd is > (re)started? Yes .. though isnt the text prefixed with "Warning" or "Error" ? Guess we should do that. These messges should vanish after tumbler got patched: Bug #16814 https://bugzilla.xfce.org/show_bug.cgi?id=16814