Why interrupt affinity with multiple cores is not such a good thing

One of the features of the x86 architecture is the ability to spread interrupts evenly among multiple cores. The benefits of such a configuration seem obvious: interrupts consume CPU time, and by spreading them across all cores we avoid bottlenecks.

I’ve written an article explaining this mechanism in greater detail, but let me remind you how it works in a few words.

Every x86 motherboard has a chip called the IO-APIC. This device controls interrupt delivery within the system: it knows how many CPUs you have and can direct individual interrupts to individual CPUs, using the so-called local APIC ID to identify each processor.

It has two modes of operation. In the first, it delivers interrupts from a given device to a single, predefined core; this is called fixed (physical) delivery mode. In the second, it can deliver interrupts from a given device to multiple cores; this is called logical (low-priority) delivery mode.

When in logical mode, the IO-APIC can deliver interrupts to at most eight cores. The source of this limitation is the bitmask register that tells which CPUs should receive the interrupt: it is only eight bits long.
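To see why eight bits means eight cores, here is a small illustrative sketch (the function name is mine, not part of any real API) that decodes such a bitmask into the set of CPUs that may receive the interrupt:

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU indices whose bit is set in an 8-bit destination mask."""
    return [cpu for cpu in range(8) if mask & (1 << cpu)]

print(cpus_in_mask(0b00000011))  # CPUs 0 and 1
print(cpus_in_mask(0xFF))        # all eight CPUs -- the maximum the mask can express
```

With only eight bit positions available, no mask value can ever name a ninth core.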

When considering the round-robin type of interrupt delivery (the logical mode), I cannot stop thinking about how it degrades performance.

You see, spreading the burden of interrupt handling among multiple cores may solve some bottlenecks, but it creates a problem of its own.

Consider a network interface card, for example. Let's say we have a TCP connection to some host out there. When a packet arrives, the network card issues an interrupt and the IO-APIC directs it to one of the cores. Next, the core handling the packet has to fetch the TCP connection object from memory into its cache.

The IO-APIC does not guarantee that the next packet belonging to the connection will be handled by the same core. So it is likely that two cores will have to work with the TCP connection object, and both will have to fetch its contents into their caches. This causes cache-coherency problems (cache misses), and as you can learn from my article on misaligned memory accesses, accessing memory that is not in the cache can take up to 30 times longer than accessing cached RAM.

Moreover, assuming the TCP connection object is properly protected by synchronization primitives, one of the cores will inevitably have to wait for the other, adding unnecessary delay to packet processing.

My point is that round-robin style interrupt delivery can be quite nasty for performance. It is much better to deliver interrupts from a given device to a single, fixed core.

Luckily, the smp_affinity interface that I mentioned in my old article allows you to bind interrupts from a given device to a certain (single) core.
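In practice, binding means writing a hex bitmask with a single bit set into /proc/irq/&lt;N&gt;/smp_affinity. Here is a minimal sketch of how that could look from Python, assuming a Linux machine and root privileges (the function names and the IRQ number in the example are mine):

```python
def cpu_to_mask(cpu: int) -> str:
    """Hex bitmask string with only this CPU's bit set."""
    return format(1 << cpu, "x")

def bind_irq_to_cpu(irq: int, cpu: int) -> None:
    """Direct all interrupts of the given IRQ line to a single core (needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_to_mask(cpu) + "\n")

# Example (hypothetical IRQ number):
# bind_irq_to_cpu(19, 2)  # deliver IRQ 19 only to CPU 2
```

The same effect can be achieved from a shell with echo; the point is that a one-bit mask forces fixed delivery to one core.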

On some computers the IO-APIC does not support logical delivery mode, whether because of a buggy BIOS or because there are too many CPUs. On such computers physical delivery mode is the only one that works, so binding each interrupt to a single core is the only choice; the most you can do is switch that core from one to another.

To sum up, round-robin style interrupt delivery can:

  1. Malfunction.
  2. Degrade performance.

So when might it still be useful, you may ask? It depends on what you do with your computer. Usually you don’t need round-robin interrupt delivery. You only need it if you know that your computer receives lots of interrupts and you run real-time applications.

In this case, the scheduler (which has no idea about interrupts) can schedule a thread that requires lots of CPU time onto the core that serves interrupts. Since interrupts have higher priority, the thread will receive less CPU time; for a real-time application, this may result in reduced responsiveness.

Even so, you can still assign all interrupts to one core and use thread-affinity techniques to make sure that your application stays off that core.
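The thread-affinity side of that recipe can be sketched as follows, assuming a Linux box where Python's os.sched_setaffinity is available (the helper names and the choice of CPU 0 as the interrupt core are mine):

```python
import os

def allowed_cpus(total: int, irq_cpu: int) -> set[int]:
    """All CPUs except the one dedicated to interrupt handling."""
    return {c for c in range(total) if c != irq_cpu}

def avoid_irq_core(irq_cpu: int = 0) -> None:
    """Pin the calling process away from the core that serves interrupts (Linux-only)."""
    os.sched_setaffinity(0, allowed_cpus(os.cpu_count(), irq_cpu))  # 0 = this process
```

Combined with binding all IRQs to that one core via smp_affinity, the application and the interrupt load never compete for the same CPU.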


13 Comments

  1. arkon says:

    Hey there,

    I was wondering, tell me if I’m wrong: if you direct the interrupts from the network device (for instance) to a single specific core, the other processors can still be locked waiting for some network object, so I don’t see how it’s better. Either way, other processors will end up waiting on the synchronization objects in order to access some data.

    However, it seems it might help cache coherency, but I am not sure.

  2. @arkon
    Other cores won’t be locked waiting for some network object, because all work with network objects is done on a single core. This eliminates the possibility of a race.

  3. arkon says:

    @Alexander Sandler
    But how can you say or justify that a single core handling the network is better than a few? In other words, you are saying that you prefer not being locked on the object over handling the data concurrently?

    I don’t see solid proof; I think it should be measured by its throughput. Unless there is something I don’t understand clearly.

  4. @arkon
    My point is simple. It is probably better to handle a small number of packets per second on a single core rather than on multiple cores, because handling packets on multiple cores will cause cache misses.
    So, indeed, throughput plays a major role here.


  6. Bhushan says:

    Hi Alex,

    As you mentioned, throughput plays a major role here. So if a system has high throughput along with some busy processes, would IRQ handling on some of the cores, combined with process affinity, help or not? I am also not sure what the default process affinity of all the processes would be — would all these processes be served by cpu0?

  7. @Bhushan
    I think you should really reread the post. Also, take a look at the related posts.

  8. Mike Waychison says:

    RR interrupt delivery for received network packets can also cause congestion collapse of open streams. TCP assumes in its design that the underlying network delivers packets _in order_. Spraying packets randomly across the cores of a machine can look like intermittent failure on the network path, and the only way TCP knows to handle that situation is to collapse the congestion window.

