Friday, 19. February 2010
What does it mean for the byte order to be little-endian versus big-endian? Basically, it's about how the computer orders, and therefore interprets, the bytes of a multi-byte value. Let's start with a single byte: an unsigned 8-bit byte can take on the values 0-255 inclusive, and the arrangement of the bits within that byte is pretty standard. For example:
(1)2^7 + (0)2^6 + (0)2^5 + (0)2^4 + (1)2^3 + (0)2^2 + (0)2^1 + (1)2^0 = 137
Reading off just the coefficients (the ones and zeroes), we get the bit pattern 10001001, which is 137 in decimal.
By convention, we write the least significant bit (LSB) on the right and the most significant bit (MSB) on the left. That covers a single byte. What about multi-byte values?
First, let's discuss memory addressing. Suppose you have a pointer to some memory address, something like char* p = ……; In C a char is always exactly one byte (sizeof(char) == 1), so p points to a single byte.
If we increment it with ++p, p will point to the next byte in memory. It's important to note that each memory address identifies a single byte: on byte-addressable machines such as x86, you cannot address individual bits. You can, however, manipulate the bits within a byte using the bitwise and shift operators (&, |, <<, >>).
Let's expand on this a little. Say we have an array holding the four characters a, b, c and d.
So we have four characters stored contiguously in memory. Each character has a numeric code defined by the ASCII table: a = 0x61, b = 0x62, c = 0x63, d = 0x64. I'm not going to cover hex notation since you should already be familiar with it, but keep in mind that two hex digits represent a single byte, so a value written as 0x?? is one byte.
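So in memory, we have something like this (the addresses are made up for illustration; what matters is that consecutive bytes are one address apart):

Address   Value
0x70      0x61 ('a')
0x71      0x62 ('b')
0x72      0x63 ('c')
0x73      0x64 ('d')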
Note that ASCII is simply an encoding that maps characters such as English letters to numbers (0-127). The letters are stored in memory as those numbers, that is, as bit patterns; hex is just a compact way of writing those bits, since each hex digit corresponds to exactly four bits. Endianness is the rule for how the bytes within a multi-byte value are ordered and interpreted. Multi-byte values include, for example, short (typically 16 bits) and int (typically 32 bits).
I'm mainly concerned with the Intel x86 architecture here, which is little-endian. PowerPC, on the other hand, has traditionally been run big-endian (as in PowerPC Macs). So what is the difference? It lies in how the architecture interprets multi-byte values in memory. Let's look at an example by taking our four-byte array and treating it as an int (32 bits = 4 bytes).
We don't actually have to convert anything, since we can interpret these four bytes however we want: as the letters a, b, c, d, as an int, or as something else entirely. So let's figure out what this “int” is; after all, the point of this article is the difference between little-endian and big-endian. The thing to be careful about is that even though our four bytes sit in memory as 0x61, 0x62, 0x63, 0x64 in order of increasing address, reading them as an int gives a different result on a little-endian machine than on a big-endian one. Let's start with little-endian.
Little Endian: lowest address = least significant byte
memory addresses: 0x70, 0x71, 0x72, 0x73
memory contents: 0x61, 0x62, 0x63, 0x64
memory contents interpreted as an “int” in hex using little-endian ordering: 0x64636261
conversion from the hex value into a decimal value: 1684234849
Note: even though the bytes in memory are 0x61, 0x62, 0x63, 0x64 in order of increasing address, a little-endian CPU treats the byte at the highest address as the most significant and the byte at the lowest address as the least significant.
Big Endian: lowest address = most significant byte (equivalently, highest address = least significant byte)
memory addresses: 0x70, 0x71, 0x72, 0x73
memory contents: 0x61, 0x62, 0x63, 0x64
memory contents interpreted as an “int” in hex using big-endian ordering: 0x61626364
conversion from the hex value into a decimal value: 1633837924
The decimal values are NOT the same. Even though the bytes are stored identically in memory, the CPU loads those four bytes and assembles them into a 32-bit value according to its own byte order, so a little-endian and a big-endian machine arrive at different hex values, and therefore different decimal values.
So, as you can see, endianness is just a matter of interpretation. After all, we started with the letters a, b, c, d stored contiguously in memory.
I don’t make any claims as to the accuracy of this article, but if something is incorrect, please let me know.