|home| |posts| |projects| |cv| |bookmarks| |github|

Size of structs matters

Here's a simple question: given the following definition of a struct, what is its size?

struct S {
    char a;
    uint32_t b;
    char c;
};

If your answer was 6(1 byte for a, 4 bytes for b and 1 byte for c), you were wrong.

The correct answer is 12.

Why is that?

Alignment and size

Every type has a natural alignment and a size.

For example(in most cases):

Now, size needs no explanation. But what is alignment?

To explain alignment we need to realize that CPUs work with words not bytes and that different CPU architectures have different word size(for x86 word size is 4, for x86_64 word size is 8).

What does that mean?

It means, in case of our struct above, that, for example if we want to access the field b on a 32 bit CPU, the CPU would take 2 cycles to load that value.

Here's why:

To access b we load the first word(first 4 bytes) which contain the byte for a and the first 3 bytes of b. Then, to finish loading b, we also need to load the second word which has the last byte of b.

To fix this two step load, alignment comes into the picture.

Let's take each field of the stuct:

With the padding added, if we want to load the field b, we can do it in one CPU cycle, since it's aligned to the CPU word.

So, in reality, the struct will look something like this:

struct S {
    char a;
    char padding[3]; // 3 bytes padding
    uint32_t b;
    char c;
};

Now the size of the struct is 9, so where are the other 3 bytes comming from?

Like said before, every type has a natural alignment and a size. Structs are also types so the alignment rule applies to them also.

How do we determine the alignment of a struct?

It's the maximum alignment of its fields. In our case is 4, since 4(alignment of b) is the biggest alignment of all fields.

This means that in order to have all instances of our struct with alignment 4, we have to make sure that the size of our struct is a multiple of 4.

Right now, like seen above, our struct has size 9. So we need to add 3 more bytes to bring it to 12, which is a multiple of 4.

In the end, the struct looks like this:

struct S {
    char a;
    char padding[3]; // 3 bytes padding
    uint32_t b;
    char c;
    char padding2[3]; // 3 bytes padding
};

Why should I care about all this?!

Many ask this question. At first sight, it's just unnecessary information.

But let's see what happens if we change the order of declaration for the fields.

From this(which we established results in size=12):

struct S {
    char a;
    uint32_t b;
    char c;
};

To this:

struct S {
    uint32_t b;
    char a;
    char c;
};

What's the difference? The difference is 4 because now the size of the struct is 8.

Why? Because now we start with b which now doesn't need any padding before it, then we have a and c which also don't need any padding, and we only need 2 bytes of padding at the end to make the struct aligned to 4.

(Sidenote: this is a best practice to define struct fields in decreasing order of their alignment)

Again the question: why should I care? 4 bytes is not that big of a deal.

That's true, if you only have one instance of the struct in memory, 4 bytes worth of savings don't matter.

But what if, for example, you have 1000 instances of that struct?

Then the difference would be almost 4KB.

Again, you may say: 4KB? That's nothing for modern computers, most have gigabytes of memory and a 4KB saving is nothing.

And now, finally, we reach the point of this whole post.

If you want your code to be fast, memory(i.e. RAM) is the new bottleneck in modern computers (assuming you don't do other really stupid things). Its size doesn't really matter.

The size that really matters is the size of the CPU caches. And for CPU caches, a 4KB saving is worth a lot if we consider their size.

Typically, size of L1 is in KB, size of L2 and L3 is in MB.

For example, here are the sizes of the CPU caches on my laptop:

  L1d:                   256 KiB
  L1i:                   256 KiB
  L2:                    4 MiB
  L3:                    8 MiB

To conclude, if you want your code to go fast, make your data structures as small as possible so that more of them can fit into the CPU caches and therefore avoiding cache misses.