Statuses

a weird for loop

In Bluefish, Programming on November 5, 2013 by oli4444

Today I had the weirdest debugging session ever. A bug https://bugzilla.gnome.org/show_bug.cgi?id=704108 was reported that I could reproduce on Fedora 19, but not on Fedora 18, not on OSX, not on Ubuntu 12.04 (tried both 32bits and 64bits), and not on Ubuntu 13.10.

The bug report mentioned that the highlighting engine did not end single line comments on the first newline. First I tracked this down to a difference in the compiled DFA table. After further debugging I found that a variable in the language file compiler was not set correctly. This variable, only_symbols was defined in a loop:

gboolean only_symbols = TRUE;
gint j;
for (j = 0; j <=127 ; j++) {
  if (characters[j] == 1 && !character_is_symbol(st,context, j)) {
    only_symbols = FALSE;
    break;
  }
}

what I found: j not only loops from 0 to 127, but continues far beyond that number!?!? The function character_is_symbol() is a simple array lookup.

I even added an extra check inside the loop

if (j > 127) {
   g_assert_not_reached();
}

but although j reached values of 10000, this assertion was never reached????

The fix for the bug was to reverse the loop:
for (j = 127; j >=0 ; j--)

I am still completely baffled by this bug. How is this possible? Does this have to do with loop vectorization? Is this a bug in gcc? Is this a bug in my code? What is going on?

edit: thanks for your comments. It most likely was a combination of a 1-over array size (undefined) with the aggressive loop optimization that the latest gcc has (which then seems to remove the condition in the loop). Problem solved, code fix in svn!

About these ads

12 Responses to “a weird for loop”

  1. It seems (from looking at the source code) that “characters” has 127 elements. You loop reaches 127 which accesses the 128th element which is invalid and results in undefined behavior.
    There have been some changes to gcc (I think version 4.7) which result in weird behavior when it detects undefined cases.

    • Thanks! I spent so many hours staring at the screen, but I didn’t check the array size. Comitted this fix already.

      I hope gcc will warn about this sort of situations in the future.

      • So is is a compiler bug here. Clearly the compiler shouldn’t have stopped at removing the loop. It should have inserted code to reformat the hard drive but only after mailing his browser history to all his contacts.

  2. Compile (the old code) without optimization (or loop optimizations) and see if the error is still reproduciable…

  3. Can’t see anything obvious… have you tried running it a debugger, or adding printf() statements inside the loop to monitor the value of j on each iteration?

  4. Try with -Wextra, -Wall does not show all warnings. -Wextra includes -Wtype-limits and might help.

  5. Could you try changing gint to guint?

  6. gcc 4.8 has -Waggressive-loop-optimizations which I think is on by default which should help with situations like this. If the array size is known at compile time then using G_N_ELEMENTS rather than hard coding loop limits can be useful. In general compilers are only getting more agressive at exploiting undefined behaviour. There are some good posts about this on John Regehr’s blog – http://blog.regehr.org/archives/213, the clang blog also has a series ‘what every programmer should know about undefined behaviour’

  7. “code fix in svn!” is the most depressing part of this otherwise very interesting story

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: