Why interrupt affinity with multiple cores is not such a good thing

One of the features of the x86 architecture is the ability to spread interrupts evenly among multiple cores. The benefits of such a configuration seem obvious: interrupts consume CPU time, and by spreading them across all cores we avoid bottlenecks.

I’ve written an article explaining this mechanism in greater detail, but let me remind you how it works in a few words.

Every x86 motherboard has a chip called the IO-APIC. This device controls interrupt delivery within the system: it knows how many CPUs you have and can direct individual interrupts to individual CPUs, using the so-called local APIC ID to identify each processor.

It has two modes of operation. In the first, it delivers interrupts from a given device to a single, predefined core; this is called fixed (physical) delivery mode. In the second, it can deliver interrupts from a given device to multiple cores; this is called logical (low-priority) delivery mode.

When in logical mode, the IO-APIC can deliver interrupts to at most eight cores. The source of this limitation is the bitmask register that tells which CPUs should receive the interrupt: it is only eight bits long.
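To see why eight bits means eight cores, here is a small illustrative sketch (the function name is mine, not part of any real API) that decodes such a bitmask into the set of CPUs that may receive the interrupt:

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU indices whose bit is set in an 8-bit destination mask."""
    return [cpu for cpu in range(8) if mask & (1 << cpu)]

print(cpus_in_mask(0b00000011))  # CPUs 0 and 1
print(cpus_in_mask(0xFF))        # all eight CPUs -- the maximum the mask can express
```

With only eight bit positions available, no mask value can ever name a ninth core.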

When considering the round-robin type of interrupt delivery (the logical mode), I cannot stop thinking about how it degrades performance.

You see, spreading the burden of interrupt handling among multiple cores may solve some bottlenecks, but it creates a problem of its own.

Consider a network interface card, for example. Let's say we have a TCP connection to some host out there. When a packet arrives, the network card issues an interrupt and the IO-APIC directs it to one of the cores. Next, the core handling the packet has to fetch the TCP connection object from memory into its cache.

The IO-APIC does not guarantee that the next packet belonging to the connection will be handled by the same core. So it is likely that two cores will have to work with the TCP connection object, and both will have to fetch its contents into their caches. This causes cache-coherency problems (cache misses), and as you can learn from my article on misaligned memory accesses, accessing memory that is not in the cache can take up to 30 times longer than accessing cached RAM.

Moreover, assuming the TCP connection object is properly protected by synchronization primitives, one of the cores will inevitably have to wait for the other, adding unnecessary delay to packet processing.

My point is that round-robin style interrupt delivery can be quite nasty for performance. It is much better to deliver interrupts from a given device to a single, fixed core.

Luckily, the smp_affinity interface that I mentioned in my old article allows you to bind interrupts from a given device to a certain (single) core.
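In practice, binding means writing a hex bitmask with a single bit set into /proc/irq/&lt;N&gt;/smp_affinity. Here is a minimal sketch of how that could look from Python, assuming a Linux machine and root privileges (the function names and the IRQ number in the example are mine):

```python
def cpu_to_mask(cpu: int) -> str:
    """Hex bitmask string with only this CPU's bit set."""
    return format(1 << cpu, "x")

def bind_irq_to_cpu(irq: int, cpu: int) -> None:
    """Direct all interrupts of the given IRQ line to a single core (needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_to_mask(cpu) + "\n")

# Example (hypothetical IRQ number):
# bind_irq_to_cpu(19, 2)  # deliver IRQ 19 only to CPU 2
```

The same effect can be achieved from a shell with echo; the point is that a one-bit mask forces fixed delivery to one core.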

On some computers the IO-APIC does not support logical delivery mode, whether because of a buggy BIOS or because there are too many CPUs. On such computers physical delivery mode is the only one that works, so binding each interrupt to a single core is the only choice; the most you can do is switch that core from one to another.

To sum up, round-robin style interrupt delivery can:

  1. Malfunction.
  2. Degrade performance.

So when might it still be useful, you may ask? It depends on what you do with your computer. Usually you don’t need round-robin interrupt delivery. You only need it if you know that your computer receives lots of interrupts and you run real-time applications.

In this case, the scheduler (which has no idea about interrupts) can schedule a thread that requires lots of CPU time onto the core that serves interrupts. Since interrupts have higher priority, the thread will receive less CPU time; for a real-time application, this may result in reduced responsiveness.

Even so, you can still assign all interrupts to one core and use thread-affinity techniques to make sure that your application stays off that core.
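The thread-affinity side of that recipe can be sketched as follows, assuming a Linux box where Python's os.sched_setaffinity is available (the helper names and the choice of CPU 0 as the interrupt core are mine):

```python
import os

def allowed_cpus(total: int, irq_cpu: int) -> set[int]:
    """All CPUs except the one dedicated to interrupt handling."""
    return {c for c in range(total) if c != irq_cpu}

def avoid_irq_core(irq_cpu: int = 0) -> None:
    """Pin the calling process away from the core that serves interrupts (Linux-only)."""
    os.sched_setaffinity(0, allowed_cpus(os.cpu_count(), irq_cpu))  # 0 = this process
```

Combined with binding all IRQs to that one core via smp_affinity, the application and the interrupt load never compete for the same CPU.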


13 Comments

  1. arkon says:

    Hey there,

    I was wondering, tell me if I’m wrong: if you direct the interrupts from the network device (for instance) to a single specific core, the other processors can still be locked waiting for some network object, so I don’t see how it’s better. Either way, other processors will end up waiting on the synchronization objects in order to access some data.

    However, it seems it might help cache coherency, but I am not sure.

  2. @arkon
    Other cores won’t be locked waiting for some network object, because all work with network objects is done on a single core. This eliminates the possibility of a race.

  3. arkon says:

    @Alexander Sandler
    But how can you say or justify that a single core handling the network is better than a few? In other words, you are saying that you prefer not being locked on the object over handling the data concurrently?

    I don’t see solid proof; I think it should be measured by its throughput. Unless there is something I don’t understand clearly.

  4. @arkon
    My point is simple. It is probably better to handle a small number of packets per second on a single core rather than on multiple cores, because handling packets on multiple cores will cause cache misses.
    So, indeed, throughput plays a major role here.


  6. Bhushan says:

    Hi Alex,

    As you mentioned, throughput plays a major role here. So if a system has high throughput along with some busy processes, would IRQ handling on some of the cores, combined with process affinity, help or not? I am also not sure what the default process affinity of all the processes would be — would all these processes be served by cpu0?

  7. @Bhushan
    I think you should really reread the post. Also, take a look at the related posts.

  8. Mike Waychison says:

    RR interrupt delivery for received network packets can also cause congestion collapse of open streams. TCP assumes in its design that the underlying network delivers packets _in order_. Spraying packets randomly across the cores of a machine can look like intermittent failure on the network path, and the only way TCP knows to handle that situation is to collapse the congestion window.

