
Debugging a reference count bug

In Bluefish, gtk+, open source, Programming, Ubuntu on February 5, 2012 by oli4444

For the last few days I have been debugging some strange bug reports. They all share the same characteristics:

  • the users are on Ubuntu 11.10
  • they use Bluefish compiled against gtk 3.2 (so not the Bluefish package provided by Ubuntu, but a newer one)
  • during the Bluefish run, the sort function of a GtkTreeModelSort is called after that GtkTreeModelSort should already have been finalized and freed.

First I used gobject-list.c from http://people.gnome.org/~mortenw/gobject-list.c to see all refs and unrefs on all GtkTreeModelSort objects in Bluefish (luckily Bluefish uses only one). This showed that there was indeed a GtkTreeModelSort with lots of references left after it should have been finalized. I tried the same thing on Fedora 16 (also gtk 3.2), but the problem can only be reproduced on Ubuntu 11.10. I tried to get backtraces with gobject-list (which uses libunwind for that), but those backtraces turned out to be useless.

Luckily I received some help on IRC (#gtk+) from Company and alex. The first idea was to use systemtap, but since there is no kernel usable with systemtap available for Ubuntu, I had to use something more low-tech suggested by Company: I set a breakpoint on gtk_tree_model_sort_new to retrieve the pointer of the GtkTreeModelSort. Once I had that pointer, I could set breakpoints on g_object_ref and g_object_unref with a condition on this pointer. Then I attached commands to each breakpoint so that gdb prints a backtrace and continues automatically:

break g_object_ref if object == 0x123123123
commands
bt
c
end
break g_object_unref if object == 0x123123123
commands
bt
c
end

I configured gdb to log everything to a file and did a Bluefish run. This produced a 2.1 MB log file full of backtraces, and the log also confirmed that there were more refs than unrefs.
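Checking the log for that imbalance is easy to script. Below is a minimal sketch in Python, assuming gdb's usual backtrace format in which each logged backtrace starts with a `#0` frame in either g_object_ref or g_object_unref. The function name `count_refs` and the log file name are hypothetical, not taken from the original setup.

```python
import re
from collections import Counter

# Matches the top frame (#0) of a gdb backtrace that stopped in
# g_object_ref or g_object_unref, e.g.
#   "#0  g_object_ref (_object=0x8e3c10) at gobject.c:2650"
# The required " (" after the name keeps g_object_ref_sink from matching.
TOP_FRAME_RE = re.compile(
    r"^#0\s+(?:0x[0-9a-fA-F]+ in )?(g_object_ref|g_object_unref) \("
)


def count_refs(lines):
    """Count how many logged backtraces are refs and how many are unrefs."""
    counts = Counter()
    for line in lines:
        m = TOP_FRAME_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, `count_refs(open("refcount.log"))` (hypothetical file name) returns a `Counter`; a positive `counts["g_object_ref"] - counts["g_object_unref"]` means references are still outstanding at the end of the run.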

The log contained many similar backtraces in which the same function did both a ref and an unref. I wrote a short Python script to parse the backtraces and skip all such ‘valid pairs’.

After this step I had only 15 backtraces left. And from these backtraces the leaking references were easily identified.
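The pair filtering described above can be sketched in Python. This is not the author's script, only a minimal sketch under stated assumptions: backtraces are split on gdb's `#0` frames, "the same function" is taken to mean the caller in frame `#1`, and each unref cancels the most recent unmatched ref from that caller. The names `parse_backtraces` and `filter_pairs` are hypothetical.

```python
import re

# Matches gdb frame lines such as
#   "#0  g_object_ref (_object=0x8e3c10) at gobject.c:2650"
#   "#1  0x00007ffff7b5c1d2 in some_caller (arg=0x1) at file.c:42"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def parse_backtraces(lines):
    """Split a gdb log into backtraces, each a list of function names.

    A new backtrace starts at every "#0" frame; non-frame lines
    (e.g. "Breakpoint 2, ...") are ignored.
    """
    traces = []
    for line in lines:
        m = FRAME_RE.match(line)
        if not m:
            continue
        if m.group(1) == "0":
            traces.append([])
        if traces:  # ignore stray frames before the first "#0"
            traces[-1].append(m.group(2))
    return traces


def filter_pairs(traces):
    """Cancel each unref against an earlier unmatched ref from the same caller.

    The caller is assumed to be frame #1. Returns the backtraces that
    could not be paired, in log order.
    """
    open_refs = {}  # caller -> indices of refs still waiting for an unref
    unmatched = []  # indices of unrefs with no earlier ref from that caller
    for i, bt in enumerate(traces):
        kind = bt[0]
        caller = bt[1] if len(bt) > 1 else "??"
        if kind == "g_object_ref":
            open_refs.setdefault(caller, []).append(i)
        elif kind == "g_object_unref":
            pending = open_refs.get(caller)
            if pending:
                pending.pop()  # drop the most recent unmatched ref
            else:
                unmatched.append(i)
    leftover = unmatched + [i for refs in open_refs.values() for i in refs]
    return [traces[i] for i in sorted(leftover)]
```

Matching on the caller alone is a coarse heuristic: code that legitimately takes a reference in one function and releases it in another will also survive the filter, so some normal unref backtraces can remain alongside the real leaks.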

Because I was unsure whether this is an Ubuntu-specific bug or a generic gtk bug, the resulting bug report can be found both at https://bugzilla.gnome.org/show_bug.cgi?id=669376 and at https://bugs.launchpad.net/bugs/926889

Now I am wondering whether this approach would work for any reference-count leak. I guess the hardest part is finding the pointer value of the leaking object when there are many objects of the same type. Any suggestions on how to do this?


8 Responses to “Debugging a reference count bug”

  1. In general you can use valgrind --show-leaks=yes (or similar options) to identify the stack trace that allocates the object you are after. Then you have to find a place in that code that only gets run once.

    And I think this method is quite potent. I used it for fixing https://bugzilla.gnome.org/show_bug.cgi?id=664137 and that’s a rather different refcount bug.

  2. Hi, Danielle Madeley also created a tool for this purpose a while back:
    http://blogs.gnome.org/danni/2011/02/17/ld_preload-gobject-lifetime-debugging-tool/

  3. You mentioned the launchpad bug in GNOME Bugzilla. It would be nice to do the reverse as well.

  4. Due to -Bsymbolic use, people need to rebuild their libs to use refdbg :/ which would otherwise be the easier way to get traces. On the gdb exercise and the filtering: I did the same (and blogged about this a few times on Planet GNOME):
    http://buzztard.git.sourceforge.net/git/gitweb.cgi?p=buzztard/buzztard;a=blob;f=tests/refcount.gdb;h=789c44ee65dfa37b2a121ce3b09133ee5a548da0;hb=HEAD
    http://buzztard.git.sourceforge.net/git/gitweb.cgi?p=buzztard/buzztard;a=blob;f=tests/refcountfilter.pl;h=167954086876b3d1b1e63648f8fbccf6b01a26f9;hb=HEAD

    It would be good to have a working solution.

  5. A great debugging case study. I am curious about the remaining 15 backtraces. Was there just one function responsible for the leaking reference?

    • The 15 remaining backtraces were mostly the leaking references (multiple identical backtraces that each leaked a reference), plus the normal widget unref functions called when closing the application.
