
Why software people need to understand hardware

Today I was presented with a very interesting bug. It shows, with a very simple example, why software people need to know how CPUs handle data, and why compilers can’t do everything for us.

The piece of code was very simple, but I’ll simplify it even more for the purposes of this example. Imagine we have something like this:

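#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

void myFunc (void) {
    uint64_t result;
    int32_t x = 495616;
    int32_t y = 8192;

    /* x * y is evaluated as a 32-bit signed int and only then widened. */
    result =  x * y;
    printf("Result1 = %" PRIu64 ", Result2=%u\n", result, (unsigned)(x * y));
}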

That certainly looks simple enough. However, running that code produces something like this:

Result1 = 18446744073474670592, Result2=4060086272

See the problem there? They’re supposed to be the same value! The correct value is the one in Result2. That number fits into an unsigned 32-bit field, so it’s not as if the product was truncated on the way. Let’s take a look at the hexadecimal values:

Result1 = 0xFFFFFFFFF2000000, Result2=0xF2000000

Now, if you know how conversions are done by the electronics and the compiler, then after seeing those values you know exactly what happened. But if you’re not familiar with that, you’re at a total loss here and you’ll probably never figure it out. This bug is caused by sign extension.

Whenever a signed number is converted to a bigger size, the sign bit is “extended,” or copied into the new upper bits, so that the value keeps its sign. That’s what’s causing all those ones in our Result1 print. Both x and y are int32_t, so the multiplication is done in 32-bit signed arithmetic, and only the finished product is converted to uint64_t. Even though x and y were small enough to fit in a signed int, the result of their multiplication wasn’t; it would only fit in an unsigned int. (Strictly speaking, overflowing a signed int is undefined behavior in C, so the compiler isn’t even obliged to produce this result; what we saw is simply what typical two’s complement hardware does.) The product was so big that its most significant bit (MSB, for those who still remember their computer architecture courses) was a 1, and since we were using signed values, that bit was taken as the sign. Since the compiler can’t read minds, it preserved the sign, and we ended up with a 64-bit value that makes no sense in our application.
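To see the two kinds of widening side by side, here is a minimal sketch. The helper names widen_signed and widen_unsigned are made up for this illustration, and the cast from uint32_t to int32_t assumes the usual two’s complement behavior of mainstream compilers (the C standard leaves that particular conversion implementation-defined).

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as signed, then widen.
   Bit 31 (the sign bit) is copied into bits 32..63. */
uint64_t widen_signed(uint32_t bits) {
    /* Assumes a two's complement wraparound for values above INT32_MAX. */
    int32_t as_signed = (int32_t)bits;
    return (uint64_t)as_signed;
}

/* Hypothetical helper: widen the same pattern as unsigned.
   Bits 32..63 are filled with zeros. */
uint64_t widen_unsigned(uint32_t bits) {
    return (uint64_t)bits;
}
```

Feeding both helpers the 32-bit pattern 0xF2000000 from the example should give 0xFFFFFFFFF2000000 for the signed path and 0xF2000000 for the unsigned one, which are exactly the two values printed above.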

The fix is incredibly simple: use uint64_t for x and y. If that’s not possible, which could be the case if you are getting x and y as a result of other functions that you can’t modify, then the following will provide some protection:

result = (uint64_t)x * (uint64_t)y;

However, that only helps as long as x and y don’t have a 1 in their own MSB, that is, as long as neither one is negative; a negative operand is itself sign-extended by the cast. If your numbers will be that big and there’s no way to get them as anything other than signed values, there’s no need to drop down to assembly: cast through uint32_t first, as in (uint64_t)(uint32_t)x, so that the widening to 64 bits fills the upper half with zeros instead of copies of the sign bit.
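As a rough sketch of the difference, the two functions below multiply the same signed inputs in two ways. The names mul_direct_cast and mul_as_unsigned are invented for this example, and the second one assumes you really do want x and y treated as raw 32-bit patterns rather than as negative numbers. Going through uint32_t before widening is plain C and keeps the sign from being extended.

```c
#include <stdint.h>

/* The cast shown above: each operand is widened straight from int32_t,
   so a negative operand is sign-extended before the multiply. */
uint64_t mul_direct_cast(int32_t x, int32_t y) {
    return (uint64_t)x * (uint64_t)y;
}

/* Hypothetical alternative: convert to uint32_t first, so the widening
   to 64 bits zero-extends, then multiply in 64-bit unsigned arithmetic. */
uint64_t mul_as_unsigned(int32_t x, int32_t y) {
    return (uint64_t)(uint32_t)x * (uint64_t)(uint32_t)y;
}
```

With the values from the example (495616 and 8192) both functions return 4060086272. They only disagree once an operand has its MSB set: for x = -1 and y = 2, the direct cast yields 0xFFFFFFFFFFFFFFFE, while the unsigned route yields 0x1FFFFFFFE.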

So, the moral of the story is: don’t forget your computer architecture classes!

  1. March 17th, 2011 at 22:17 | #1

    Haha, I know the bug you’re talking about!
