Solving the gcc 4.4 strict aliasing problems

Publish date: February 4, 2010
Tags: bugs c c++ coding hacking

A couple of days ago Jeff Stedfast ran into some problems with gcc 4.4, strict aliasing and optimizations. Being a geeky sort of person, I found the problem really interesting, not only because it shows just how hard it is to write a good, clear standard, even when you’re dealing with highly technical (and supposedly unambiguous) language, but also because I never did “get” the aliasing rules, so it was a nice excuse to read up on the subject.

Basically, the standard says that you can’t do this:

int a = 0x12345678; short *b = (short *)&a;

I’m forcing a cast here, and since the types are not compatible, they can’t be “alias” of each other, and therefore I’m breaking strict-aliasing rules. Note that if you compile this with -O2 -Wall, it will not warn you that you’re breaking the rules, even though -O2 activates -fstrict-aliasing and -Wall is supposed to complain about everything (right??). Apparently, this is by design, though why would anyone not want warnings in -Wall for something that will obviously break code is beyond me. If you want to be told that you’re not playing by the rules, make sure you build with -Wstrict-aliasing=2, which will say:

line 2 - warning: dereferencing type-punned pointer will break strict-aliasing rules

So now you know you’re being naughty. Of course, if you did try to access the variable, even just with -Wall it will complain at you - this more complete snippet will give you several warnings with -Wall:

int a = 0x12345678;
short *b = (short *)&a;
b[1] = 0;
if (a == 0x12345678)
  printf ("error\n");
else
  printf ("good\n");

line 3 - warning: dereferencing pointer ‘({anonymous})’ does break strict-aliasing rules

The problem gets ugly when you’re dealing with structs and pointers to them - then -Wall is completely silent about possible issues, and only -Wstrict-aliasing=2 will work, like in this little snippet:

typedef struct type {
  struct type *next;
  int val;
} Type;

...

Type *t1, *t2, *t3;
t1 = t2 = NULL;
t1 = (Type*) &t2;
int i;
for (i = 0; i < 2; i++) {
  t3 = malloc (sizeof (Type));
  t1->next = t3;
  t1 = t3;
}
if (!t2)
  printf ("error\n");
else
  printf ("good\n");

This doesn’t emit any warnings on -Wall because the loop makes it slightly fuzzy for gcc to tell whether things are getting assigned or not. -O2 will optimize away the assignment to t1 on line 3, which will make things not work later on.

So how to fix this? The attribute may_alias allows a type to bypass the aliasing rules, just like character types do (character types are allowed to alias any other type, according to the c99 standard). Changing the definition of Type to the following will make the compiler happy:

typedef struct type {
  struct type *next;
  int val;
} **__attribute__((__may_alias__))** Type;

One final note: if you mix up code with aliased types and non-aliased types, gcc will not enforce aliasing optimizations on your non-aliased-possibly-broken code… i.e., if you define this type two times, one with the attribute, one without, and then do the loop above with both types (separately mind you, with separate variables, the code just happens to be in the same method), the non-aliased type won’t fail. Aren’t optimizations fun?

Update: People have pointed out that the first statement short *b = (short *)&a; is totally legal and has nothing to do with aliasing.

Yes, that’s true, I should have been more precise. The statement is perfectly legal. It’s when you try to access the data via the pointer that was assigned on that line that breaks the standard. So when your code blows up, it blows up accessing the data, but that’s not the cause, that’s the consequence. The cause of said explosion is that optimizations + strict-aliasing look at that (totally legal) statement and say “oh, dude, come on, this is bogus” and throw it away while munching on scooby snacks. Well, not sure about that last part.

Anyways, where was I? Oh yes, so, two things: if you don’t want to change your code, you can use may_alias , gcc will say “that’s so awesome” and everyone will make merry. Or something. The second thing is, and let me add a little emphasis to this part, because I’m sometimes a bit too subtle, and apparently some things should be said very clearly: when a statement is perfectly legal, and yet it IS removed via a combination of default flags with NO warnings whatsoever, something is WRONG, and in my opinion, the problem here is lack of warnings.

And that, as someone said, is that. Or not, whatever tickles your fancy. Hmmmm, tickles…