SMP affinity and proper interrupt handling in Linux

Introduction

Hardware interrupts have always been expensive. Interrupt handlers are small pieces of software, yet somehow they consume a lot of CPU power, and hardware and software engineers have long been trying to change this state of affairs. Significant progress has been made. Still, hardware interrupts consume lots of CPU power.

You will rarely notice the effects of interrupt handling on desktop systems. Take a look at your /proc/interrupts file. It lists your hardware devices and how many interrupts each of them has delivered to each CPU. On a regular desktop system, you will see that the number of interrupts your computer handles is relatively small. Even powerful servers handling millions of packets per second handle only tens of thousands of interrupts per second. Yet these interrupts consume CPU power, and handling them properly undoubtedly helps improve the system’s performance.

But really, what can we do about interrupts?

There are many things that can be done. Many Linux distributions ship kernels that include modifications that significantly improve the situation. Technologies such as NAPI reduce the number of interrupts and the interrupt handling overhead so dramatically that, without them, a modern server probably would not be able to sustain a 1 Gbps Ethernet link. NAPI has been part of the kernel for quite some time. Other techniques include interrupt coalescing.

In this article I would like to address one of the most powerful techniques for optimizing interrupt handling.

SMP affinity

The term SMP affinity, or processor affinity, has quite a broad meaning and requires an explanation. The word affinity refers to the proximity of a certain task to a certain processor within a multi-processor system: when processor X runs process Y, they are affine to each other. The processor holds parts of the process’s memory in its cache, so constantly moving the process to a different processor when scheduling it would probably make scheduling less efficient.

As far as interrupts are concerned, SMP affinity refers to the question of which processor handles a given interrupt. Contrary to processes, binding interrupts to a certain CPU will most likely cause performance degradation, and here’s why. Interrupt handlers are usually very small, so an interrupt’s memory footprint is relatively small, and keeping the interrupt on a certain CPU will not improve cache hits much. Instead, the stream of interrupts keeps one of the cores overloaded while the others remain relatively free. The scheduler has no idea about this state of affairs: it assumes that our interrupt-handling core is as available as any other core. As a result, you may face bottlenecks, because one of the processes or threads will occasionally run on a core that has only 90% of its power available.

Things may be even worse, because often core 0 handles all interrupts by default. On busy systems, interrupts may consume as much as 30% of core 0’s power. Because we assume that all cores are equally powerful, work that is split evenly across the cores is finished only when core 0 finishes its share, so we may find ourselves in a situation where our software system effectively uses only 70% of the total CPU power.
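To make the 70% figure concrete, here is a toy model in Python. It assumes (a simplification for illustration, not a measurement) that the work is divided into one equal share per core and is done only when every share is finished, so the slowest core sets the pace. The function name effective_capacity is hypothetical.

```python
def effective_capacity(core_speeds):
    """Fraction of total CPU capacity put to use when work is split evenly.

    core_speeds[i] is the share of core i left over after interrupt handling
    (1.0 = the whole core). The work is assumed to be divided into one equal
    share per core, so the slowest core finishes last and sets the pace.
    """
    share = 1.0 / len(core_speeds)
    finish_time = max(share / speed for speed in core_speeds)
    ideal_time = share  # time needed if every core were fully available
    return ideal_time / finish_time


# All interrupts on core 0, costing 30% of that core (the figure above).
print(effective_capacity([0.7, 1.0, 1.0, 1.0]))
# The same 30% of one core spread evenly over four cores (7.5% each).
print(effective_capacity([0.925, 0.925, 0.925, 0.925]))
```

With all interrupts on core 0 it prints a value of about 0.7; spreading the same interrupt load prints about 0.925. The model ignores the cache effects discussed above, so it only illustrates the arithmetic behind the 70% figure.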

Who’s responsible

The APIC, or Advanced Programmable Interrupt Controller, has been an integral part of all modern x86-based systems for many years, both SP (single-processor) and MP (multi-processor). This component is responsible for delivering interrupts. It also decides which core each interrupt goes to.

By default, the APIC delivers ALL interrupts to core 0. This is why /proc/interrupts looks like this on the vast majority of modern Linux systems:

         CPU0     CPU1     CPU2     CPU3
  0:   123357        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
 11:        0        0        0        0  IO-APIC-level  acpi
169:        0        0        0        0  IO-APIC-level  uhci_hcd:usb1
177:        0        0        0        0  IO-APIC-level  qla2xxx
185:        0        0        0        0  IO-APIC-level  qla2xxx
193:    12252        0        0        0  IO-APIC-level  ioc0
209:        0        0        0        0  IO-APIC-level  uhci_hcd:usb2
217:      468        0        0        0  IO-APIC-level  eth0
225:      285        0        0        0  IO-APIC-level  eth1
NMI:      120       66       76       45
LOC:   123239   123220   123187   123065
ERR:        0
MIS:        0

See anything suspicious? CPU0 is handling all hardware interrupts. All of them. This is the situation you see on a system with misconfigured interrupt SMP affinity.
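Spotting this by eye is easy on a four-core machine; on a bigger one a short script helps. The sketch below uses hypothetical helper names and is not an existing tool: it parses /proc/interrupts-style text and lists the device IRQs whose interrupts all landed on a single CPU. It assumes the layout shown above (a CPU header row, then one `label: counts... description` row per IRQ); rows such as ERR: that carry fewer counts than there are CPUs are tolerated.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts output into {label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: counts ..." row
        fields = rest.split()
        counts = []
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break  # rows such as ERR: carry fewer counts than CPUs
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def single_cpu_irqs(table):
    """List (irq, cpu, description) for device IRQs served by exactly one CPU."""
    result = []
    for label, (counts, description) in table.items():
        if not label.isdigit():
            continue  # skip summary rows: NMI, LOC, ERR, MIS, ...
        busy = [cpu for cpu, count in enumerate(counts) if count > 0]
        if len(busy) == 1:
            result.append((int(label), busy[0], description))
    return result


# An excerpt of the listing above; on a live system use
# open("/proc/interrupts").read() instead.
SAMPLE = """\
         CPU0     CPU1     CPU2     CPU3
  0:   123357        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
193:    12252        0        0        0  IO-APIC-level  ioc0
217:      468        0        0        0  IO-APIC-level  eth0
225:      285        0        0        0  IO-APIC-level  eth1
NMI:      120       66       76       45
ERR:        0
"""

for irq, cpu, description in single_cpu_irqs(parse_interrupts(SAMPLE)):
    print(f"IRQ {irq} ({description}) is served only by CPU{cpu}")
```

For the excerpt it reports IRQs 0, 193, 217 and 225, all on CPU0; IRQ 8 is skipped because it has not fired at all.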

A simple solution to the problem

A solution to this problem has been around pretty much since the introduction of the APIC. The APIC supports several interrupt destination modes (physical and logical) and delivery modes (fixed and lowest priority, among others). The important fact is that it is capable of delivering interrupts to any of the cores and can even load-balance them across cores.

However, in the mode that lets it spread interrupts among cores (logical mode), it can address only the first eight cores. So if you have more than eight cores, don’t expect it to balance interrupts onto any core higher than 7.

By default it operates in physical/fixed mode. This means it delivers a given interrupt to one specific core, and you already know that by default this is core 0. The good news is that you can easily change which core receives a given interrupt.

For each IRQ number in the first column of /proc/interrupts, there is a subdirectory in /proc/irq/. That directory contains a file named smp_affinity, which controls which core handles that interrupt. Reading this file produces a hexadecimal number: a bitmask with one bit per core. When a bit is set, the APIC can deliver the interrupt to the corresponding core.
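Decoding and encoding these masks is plain bit arithmetic. A minimal Python sketch follows; the function names are hypothetical. It also accepts the comma-separated form, because on machines with more than 32 CPUs the kernel prints the mask as comma-separated groups of 32 bits.

```python
def mask_to_cpus(mask_text):
    """Decode an smp_affinity string such as "2" or "00000001,00000000"."""
    # Drop the trailing newline and the commas between 32-bit groups.
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def cpus_to_mask(cpus):
    """Encode a list of CPU numbers as the hex string to write to smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N set means CPU N may receive the interrupt
    return format(mask, "x")


print(mask_to_cpus("f"))        # [0, 1, 2, 3]
print(cpus_to_mask([1]))        # 2
print(cpus_to_mask([0, 1]))     # 3
```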

Let’s see an example…

#
# cat /proc/interrupts
         CPU0     CPU1     CPU2     CPU3
  0: 19599546        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
 11:        0        0        0        0  IO-APIC-level  acpi
169:        0        0        0        0  IO-APIC-level  uhci_hcd:usb1
177:        0        0        0        0  IO-APIC-level  qla2xxx
185:        0        0        0        0  IO-APIC-level  qla2xxx
193:    95337        0        0        0  IO-APIC-level  ioc0
209:        0        0        0        0  IO-APIC-level  uhci_hcd:usb2
217:   100778        0        0        0  IO-APIC-level  eth0
225:    56651        0        0        0  IO-APIC-level  eth1
NMI:      466      393      422      372
LOC: 19600453 19600434 19600401 19600279
ERR:        0
MIS:        0
#
#
# echo "2" > /proc/irq/217/smp_affinity
# cat /proc/interrupts
         CPU0     CPU1     CPU2     CPU3
  0: 19606722        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
 11:        0        0        0        0  IO-APIC-level  acpi
169:        0        0        0        0  IO-APIC-level  uhci_hcd:usb1
177:        0        0        0        0  IO-APIC-level  qla2xxx
185:        0        0        0        0  IO-APIC-level  qla2xxx
193:    95349        0        0        0  IO-APIC-level  ioc0
209:        0        0        0        0  IO-APIC-level  uhci_hcd:usb2
217:   101027       49        0        0  IO-APIC-level  eth0
225:    56655        0        0        0  IO-APIC-level  eth1
NMI:      466      393      422      372
LOC: 19607629 19607610 19607577 19607455
ERR:        0
MIS:        0
#

As we can see, once we enter the magic command, CPU1 begins receiving interrupts from eth0 instead of CPU0. The echo command that changed the state of affairs is especially interesting: it is “2” that we write into the file. Writing “4” instead would cause the eth0 interrupt to be handled by CPU2 rather than CPU1. As I already mentioned, the value is a bitmask in which each bit corresponds to a single CPU.
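The same step can be scripted. This is a minimal sketch with a hypothetical function name: it builds the one-bit mask for the target CPU and writes it to the IRQ’s smp_affinity file, just like the echo above. The proc_root parameter is an assumption added only so the function can be tried against a scratch directory instead of the real /proc.

```python
import os


def set_irq_affinity(irq, cpu, proc_root="/proc"):
    """Route `irq` to a single `cpu` by writing a one-bit mask to smp_affinity.

    Equivalent to: echo "<mask>" > /proc/irq/<irq>/smp_affinity (needs root).
    `proc_root` is a hypothetical parameter for testing against a scratch
    directory; leave it at "/proc" on a real system.
    """
    mask = format(1 << cpu, "x")  # CPU1 -> "2", CPU2 -> "4", CPU15 -> "8000"
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(mask + "\n")
    return mask
```

Calling set_irq_affinity(217, 1) as root reproduces the echo “2” from the transcript. The write must come from a process that itself runs as root; in a shell, `sudo echo 2 > file` fails because the redirection is performed by the unprivileged shell, not by sudo.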

How about writing “3” into the file? In theory, this should cause the APIC to deliver interrupts to both CPU0 and CPU1. Unfortunately, things are a little more complicated here. It all depends on whether the APIC works in logical “destination mode” with lowest-priority “delivery mode”. If it does, then you most likely would not see CPU0 handling all interrupts in the first place, because when the kernel configures the APIC in logical/lowest-priority mode, it automatically tells the APIC to load-balance interrupts among the first eight cores.
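Whether a multi-bit mask such as “3” actually spreads the load is easy to check: take two snapshots of /proc/interrupts and see which CPUs’ counters moved for that IRQ. A small sketch with hypothetical helper names, fed with the IRQ 217 lines from the two listings above:

```python
from itertools import takewhile


def irq_counts(text, irq):
    """Per-CPU counts for one IRQ row of /proc/interrupts output."""
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == str(irq):
            # The leading numeric fields are the per-CPU counters.
            return [int(f) for f in takewhile(str.isdigit, rest.split())]
    raise KeyError(irq)


def cpus_receiving(before_text, after_text, irq):
    """Return (per-CPU delta, CPUs that received at least one interrupt)."""
    before = irq_counts(before_text, irq)
    after = irq_counts(after_text, irq)
    delta = [a - b for b, a in zip(before, after)]
    return delta, [cpu for cpu, d in enumerate(delta) if d > 0]


BEFORE = "217:   100778        0        0        0  IO-APIC-level  eth0"
AFTER = "217:   101027       49        0        0  IO-APIC-level  eth0"
print(cpus_receiving(BEFORE, AFTER, 217))  # ([249, 49, 0, 0], [0, 1])
```

Here CPU0 still shows 249 new interrupts because some arrived before the echo took effect; repeat the measurement after the change to see the steady state.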

So if CPU0 handles all interrupts on your system by default, this probably means that the APIC has not been configured in this mode.

Ultimate solution

Unfortunately, there is no choice but to replace the kernel. The software that configures the APIC is part of the kernel, and APIC-related settings are not configurable, so if we want to change things, we have to change the kernel. The only question is: replace the kernel with what?

I tested this with OpenSuSE 10.2, which comes with kernel 2.6.18. Installing kernel 2.6.24.3 (the latest at the time of writing) with OpenSuSE’s default kernel configuration (from /proc/config.gz) fixes the problem. With this kernel, things look like this right from the start:

# cat /proc/interrupts
         CPU0     CPU1     CPU2     CPU3
  0:   728895   728796   728624   728895  IO-APIC-edge     timer
  8:        0        0        0        0  IO-APIC-edge     rtc
 11:        0        0        0        0  IO-APIC-fasteoi  acpi
 16:        0        0        0        0  IO-APIC-fasteoi  uhci_hcd:usb1
 19:        0        0        0        0  IO-APIC-fasteoi  uhci_hcd:usb2
 24:    14090    14090    14327    14056  IO-APIC-fasteoi  ioc0
 49:        7        9        7        8  IO-APIC-fasteoi  qla2xxx
 50:        8       12       11       10  IO-APIC-fasteoi  qla2xxx
 77:     2849     2759     2841     2827  IO-APIC-fasteoi  eth0
 78:    25072    25138    24996    24980  IO-APIC-fasteoi  eth1
NMI:        0        0        0        0
LOC:  2915270  2915256  2915228  2915092
ERR:        0

Looks good, doesn’t it? All cores handle interrupts, sharing the load evenly. Now, how about getting this result with any kernel version? It appears to be doable.

There’s a kernel configuration option that stands in our way; once it is removed, you will get a similar result with probably any kernel newer than 2.6.10. The option is CONFIG_HOTPLUG_CPU. It adds support for hot-pluggable CPUs. It appears that turning this option off makes the kernel configure the APIC properly.
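To check whether your running kernel was built with this option, look in /proc/config.gz (it exists only when the kernel was built with CONFIG_IKCONFIG_PROC; many distributions also ship the configuration as /boot/config-&lt;version&gt;). From the shell, `zcat /proc/config.gz | grep CONFIG_HOTPLUG_CPU` does the job. The Python sketch below does the same; the helper name kernel_option is hypothetical.

```python
import gzip


def kernel_option(name, config_path="/proc/config.gz"):
    """Return the value of a kernel config option ("y", "m", ...) or None."""
    # /proc/config.gz is gzip-compressed; /boot/config-* files are plain text.
    opener = gzip.open if config_path.endswith(".gz") else open
    with opener(config_path, "rt") as f:
        for line in f:
            line = line.strip()
            if line.startswith(name + "="):
                return line.split("=", 1)[1]
    return None  # absent, or listed as "# CONFIG_... is not set"
```

For example, kernel_option("CONFIG_HOTPLUG_CPU") returns "y" on a kernel built with CPU hotplug support and None when the option is off.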

Actually, this is quite understandable. The APIC has to be told which processors should receive interrupts, and you need an additional piece of code that tells the APIC how to handle processor removal, which is one of the things CONFIG_HOTPLUG_CPU allows you to do. I assume that this functionality was missing from earlier kernels and was added by 2.6.24.3.

Conclusion

We saw that we can achieve really nice results with a small change to the kernel configuration. On a very busy system, this small change can boost the server’s productivity by a large margin.

I hope you will find this information useful and use techniques I described in this article.


66 Comments

  1. Jim M says:

    How about MSI/MSI-X interrupts? Is the information you provided only for “INTA”? Isn’t the goal of MSI/MSI-X to spread the IRQs around a server?
    Thanks,
    Jim

    • @Jim M
      Jim, to be honest I don’t know :-) MSI interrupts are delivered directly to the local APIC (which sits right in your core), and you can configure where each interrupt goes. Apparently this has nothing to do with the interrupt load balancing that the IO-APIC does. On the other hand, I saw that once IO-APIC load balancing works well, MSI interrupts are load-balanced too.
      I think my best advice would be to go ahead and check.

  2. Harm-Jan says:

    I saw in another post that irqbalanced can also solve this problem, as it has on my SMP system. What is your opinion of this tool? (see: http://www.uwsg.iu.edu/hypermail/linux/kernel/0302.2/1861.html)
    Should I use this tool, or rebuild my kernel?

    • @Harm
      I am familiar with this tool. It uses the same technique I described in the article, i.e. it writes into /proc/irq/<irq number>/smp_affinity. As far as I know (and please don’t hold me to this, because things evolve and I could have missed something), this tool is good when you have a power-saving mode that turns off certain CPUs, so that interrupts have to be routed to the remaining CPUs. In other words, this tool is meant more for desktop machines than for servers, and as such I believe it does its job well. However, because it uses the same technique I described, it may need the CONFIG_HOTPLUG_CPU kernel configuration trick to work.

  3. Pavel says:

    Thank you very much for your article. I replaced the 2.6.22 kernel with 2.6.28, and suddenly all interrupts were processed by CPU0 and performance degraded drastically. While googling for the cause I found your article, and now things are clear to me.

    However, I have a question:
    It is possible to disable CONFIG_HOTPLUG_CPU only if power management support is disabled as well. This means disabling ACPI support too. Is it a correct thing to do?

    Thanks in advance.

  4. @Pavel
    I am surprised to hear that you have problems with 28. I had kernel 24 working well, so I expected 28 to work as well. Can you please post here whether you managed to solve this problem and, if so, how? Thanks!
    As for your question, ACPI and CPU hotplug are almost entirely unrelated to each other so you can safely disable one and keep the other.

  5. Pavel says:

    Well, I didn’t try irqbalance or /proc/irq//smp_affinity trick on 28, I’ll give it a try tomorrow.

    About kernel config – there is no option in menuconfig to enable ACPI and disable HOTPLUG_CPU :(
    As soon as I enable “power management” section under which ACPI resides, HOTPLUG_CPU is enabled automatically and cannot be deselected :(

  6. @Pavel
    Please give me a day or two to check this out. Is this a vanilla kernel 2.6.28 we’re talking about? I mean official Linux kernel you get from http://www.kernel.org?

  7. Pavel says:

    Thank you for your readiness to spend some time on my problem! Please don’t be in a hurry, this is not an urgent question :)

    Yes, I’m using vanilla kernel:
    http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.28.7.tar.bz2

    Thanks in advance.

  8. @Pavel
    Pavel, I’ve just installed kernel 2.6.28.7 and it works well for me. I am using the default OpenSuSE 10.2 configuration (copied from /proc/config.gz). I have both the ACPI settings and HOTPLUG_CPU enabled. Interrupt sharing works properly, i.e. all CPUs handle all interrupts.

  9. Pavel says:

    Thank you very much for your tests.
    Alexander, are you sure your system is not running irqbalance(d) and interrupts are distributed by kernel?
    If irqbalance or irqbalanced are not running indeed, could you please share your kernel config with me?

  10. @Pavel
    I explicitly disabled irqbalance. Grab the config file here: http://www.alexonlinux.com/config-2.6.28.7
    Note that it has both CONFIG_ACPI and CONFIG_HOTPLUG_CPU set to ‘y’. Please let me know how it is going.

  11. Pavel says:

    Once again, thank you :)
    I hope to be able to try new kernel this weekend.

  12. Pavel says:

    Alexander, with your config the kernel distributes interrupts over different CPUs. I guess I’ll have to examine my kernel config closely to find what I did wrong.

    Thank you!

  13. @Pavel
    I am glad to hear that. Please post here your findings – this might help someone in the future :D

  14. yuanbor says:

    I use the same Linux 2.6.25 kernel and the same config file, but the behavior differs between the AMD and Intel platforms. On the Intel machine, interrupts are distributed over all CPUs, while on the AMD machine they go only to the first one. I don’t know the reason.

  15. @yuanbor
    To be honest, I don’t know how to help you. As far as I know, interrupt load balancing does work in kernel 2.6.25. I suggest that you ask the vendor of the AMD platform whether they know anything about this issue.

  16. Praveen says:

    Hi
    My kernel is 2.6.21, the driver is e1000 and the adapter is an Intel 82542.
    I have disabled NAPI in my driver, and now I find that all the interrupts (eth0) fall on a single CPU. If I try to divert them to other CPUs by writing to /proc/irq/n/smp_affinity, the system crashes. Although the value is ffff by default, interrupts reach only one CPU at a time. I tried your option, but it did not work: my config had CONFIG_HOTPLUG, so I turned it off. Let me know, is this normal behaviour?

  17. Praveen says:

    I think that two interrupt handlers for one interface running on many CPUs when NAPI is disabled could create contention, so all the interrupts are intentionally delivered to a single CPU. Is that a correct interpretation?

  18. Originally Posted By Praveen

    Hi
    My kernel is 2.6.21, the driver is e1000 and the adapter is an Intel 82542.
    I have disabled NAPI in my driver, and now I find that all the interrupts (eth0) fall on a single CPU. If I try to divert them to other CPUs by writing to /proc/irq/n/smp_affinity, the system crashes. Although the value is ffff by default, interrupts reach only one CPU at a time. I tried your option, but it did not work: my config had CONFIG_HOTPLUG, so I turned it off. Let me know, is this normal behaviour?

    No, it is not normal behavior. Yet I cannot tell you for sure, because I’ve never worked with e1000.
    In my opinion you should leave everything as is. Remember that serving interrupts on multiple cores causes cache misses, which can be very expensive. So I honestly don’t think you’ll get any performance boost by configuring this round-robin interrupt delivery scheme.
    This entire thing is more suitable for servers.

    As for your second question, the answer is no. NAPI has nothing to do with which core receives the interrupts.

  19. Praveen says:

    Thanks for the reply.
    If one packet is handled by one core and another packet by another core, then each core will hold the data of the packet it handled, so with my limited knowledge I don’t see how a cache miss would occur. For example, core1 will hold packet1’s data, core2 will hold packet2’s data, and so on, so the upper layer will receive data from the corresponding cores. Please explain the cache-miss scenario in this case.

    In fact, this issue concerns servers where one core receives all the interrupts. If the driver uses NAPI, I understand the reason, but when NAPI is disabled, why do all the interrupts reach a single CPU? That is what bothers me. I assume that one packet is processed by one CPU and others by different CPUs, so the load should be distributed. But in my case, when I try to distribute the load, the system crashes.

  20. @Praveen
    First, I owe you an apology. I confused e1000 with the eeePC (the netbook). I do work with e1000 and am well familiar with it. My suggestion to leave this whole thing alone was also a result of the same misunderstanding.

    Anyway,
    1. There is no problem with two completely unrelated packets. But what if two packets belong to the same TCP connection? Then two cores will attempt to access a single connection descriptor, which will cause cache misses.
    How severe are the cache misses? That depends on how much traffic your computer receives and what else it does. If it handles a moderate number of packets and is not very busy, then it is better to leave things as they are (even though this is not a netbook LOL). If you are really pushing it in terms of CPU consumption, IO and the rest, then it is better to find a way to spread those interrupts among all cores.

    2. NAPI is a feature that reduces the number of interrupts. When your computer receives X packets per second, without NAPI it has to handle X interrupts; with NAPI, the number of interrupts is smaller. In other words, without NAPI a single interrupt delivers a single packet, while with NAPI a single interrupt delivers many packets. So when all interrupts land on CPU0, it does not matter whether you have NAPI or not, because even with NAPI, CPU0 has to handle the packets. The bottom line is that NAPI does not affect how your interrupts are delivered. It only affects the number of interrupts.

    As for the issue itself: first of all, e1000 has nothing to do with how interrupts are delivered. The component that decides which CPU handles a given interrupt is called the IO-APIC. It is on your motherboard, and your computer’s ability to spread interrupts among all cores depends on it.
    I can’t tell you why the trick I described in the article does not work for you. One possible reason is that you have too many cores. When the IO-APIC is configured to spread interrupts among cores, it can handle up to eight cores. If you have more than eight cores, the kernel will not configure the IO-APIC to spread interrupts, so the trick I described in the article will not work.
    Otherwise, it may be caused by a buggy BIOS or even buggy hardware.

  21. nitr0 says:

    Kernel 2.6.30: disabling HOTPLUG_CPU does nothing for me :( Almost all interrupts are on CPU1 (the LAN uses MSI), so in my situation the only remaining way to use both cores is bonding :(
    IRQ stats on test machine:

    # cat /proc/interrupts
               CPU0       CPU1
      0:      13320      33974   IO-APIC-edge      timer
      1:          1          7   IO-APIC-edge      i8042
      4:          0          2   IO-APIC-edge
      7:          1          0   IO-APIC-edge
      8:          1        111   IO-APIC-edge      rtc0
      9:          0          0   IO-APIC-fasteoi   acpi
     14:        188       3289   IO-APIC-edge      pata_amd
     15:          0          0   IO-APIC-edge      pata_amd
     17:         88        750   IO-APIC-fasteoi   eth1
     20:          0          0   IO-APIC-fasteoi   ohci_hcd:usb3
     21:          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
     22:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
     23:          0          0   IO-APIC-fasteoi   ohci_hcd:usb4
     26:          0          0   PCI-MSI-edge      ahci
     27:        952    1733073   PCI-MSI-edge      eth0
    NMI:          0          0   Non-maskable interrupts
    LOC:      16823      17504   Local timer interrupts
    SPU:          0          0   Spurious interrupts
    RES:       5590       4581   Rescheduling interrupts
    CAL:         39         20   Function call interrupts
    TLB:        810        784   TLB shootdowns
    TRM:          0          0   Thermal event interrupts
    THR:          0          0   Threshold APIC interrupts

  22. […] written an article explaining this mechanism in greater detail. Yet let me remind you how it works in two […]

  23. nitr0 says:

    Yes, I know about cache misses and other bad things. But I really need to use all cores to process interrupts, because there is a shaper and a PPTP server on that machine, and both of them need enough CPU resources (I have up to 50 kpps on receive and on transmit).

    P.S. And how about 2.4 kernels?

  24. @nitr0
    Actually, I am writing another post that presents another alternative to smp affinity, in case it doesn’t work. So stay tuned.

    I don’t know about kernel 2.4, but I doubt that something that doesn’t work with kernel 2.6 would work in 2.4. Give it a shot and see if it helps.

  25. […] SMP affinity and proper interrupt handling in Linux […]

  26. Chris says:

    The kernel option which appears to disable IRQ sharing among CPUs is CONFIG_CGROUPS

    Turning it off allows IRQs to be shared.

    I have not investigated much further than that yet.

  27. allexch says:

    Hi, I’ ve got a question about “Its configuration is limited to first eight cores. I.e. if you have more than eight cores, don’t expect any core higher than 7 to receive interrupts.”

    I have an HP server with 16 cores (2 processors with 8 cores each) and 8 NICs. I am able to assign interrupts from all of these 8 NICs to 8 cores, so at least 6 cores are not used. I tried to find out how to overcome this restriction, but I was not successful. So I am really interested: is there any way to start using the other cores in my OS?
    ——————————
    HP proliant Dl360 G6
    Linux version 2.6.31-gentoo-r6
    ——————————

  28. Chris says:

    You can set affinity to cpus higher than 8:

    root@clusterdb-tp-02 ~ # egrep 'CPU|66:' /proc/interrupts
            CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8  CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
     66: 1194464     0     0     0     0     0     0     0    42     0     0     0     0     0     0  1323  IO-APIC-level  aacraid

  29. Chris says:

    @Chris – echo 8000 > /proc/irq/66/smp_affinity

  30. @allexch
    It depends on the mode in which the IO-APIC works. There are two modes. In logical mode, the IO-APIC can spread interrupts among multiple cores, but this mode is limited to 8 cores. So if you have more than 8 cores, the IO-APIC will not work in logical mode and will switch to physical mode.
    In physical mode, the IO-APIC delivers every interrupt to a single core. In this mode, the IO-APIC is limited to 256 cores.

  31. @Chris
    I see three of your comments now, but I struggle to understand how any of them has to do with other questions in comments and the theme of this article :-) Can you please elaborate a bit?

  32. allexch says:

    So if I am going to install 2*10G NICs, will I certainly have problems? Because in logical mode, 4 cores per 10G NIC is too little, and in that case there’s no sense in physical mode, am I right?

  33. Originally Posted By allexch

    So if I am going to install 2*10G NICs I will certainly have problems?

    No. It doesn’t matter how many NICs you have. It’s all about number of cores.

    because in logical mode 4 cores to one 10G NIC is too little, and in that instance there’s no sense in physical mode, am I right?

    In your place, I would not be concerned about the IO-APIC at all. 10G NICs tend to support MSI-X, which means you will have multiple interrupt vectors per NIC. So stick to physical mode and redirect each interrupt vector to a different core. This way you’ll spread the weight of interrupt handling among multiple cores.
    For more details, see this:
    http://www.alexonlinux.com/msi-x-the-right-way-to-spread-interrupt-load
    and this:
    http://www.alexonlinux.com/why-interrupt-affinity-with-multiple-cores-is-not-such-a-good-thing
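    The “one vector per core” advice in the reply above can be planned with a few lines of Python. This is a sketch with a hypothetical function name; the IRQ numbers are made up, so read the real ones for your NIC’s vectors from /proc/interrupts. Each vector gets a one-bit mask (physical mode delivers to a single core anyway), assigned round-robin over the cores.

```python
def spread_vectors(irqs, n_cpus):
    """Map each MSI-X vector's IRQ to one CPU, round-robin, as smp_affinity masks."""
    return {irq: format(1 << (i % n_cpus), "x") for i, irq in enumerate(irqs)}


# Hypothetical IRQ numbers for a NIC with five MSI-X vectors on a 4-core box.
for irq, mask in spread_vectors([90, 91, 92, 93, 94], 4).items():
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

    This prints one echo command per vector with masks 1, 2, 4, 8 and then 1 again, since the fifth vector wraps around to CPU0.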

  34. pkun says:

    Hello. Thanks for the article.
    We have a mystery with IRQ balancing. We checked several computers for IRQ balancing. The same Linux distribution (Ubuntu Server Edition, Ubuntu Desktop) uses IRQ balancing on some computers and doesn’t on others. We cannot find fundamental differences between these platforms. For example, we have two computers with Gigabyte motherboards (different models with AWARD BIOS) with the same APIC controller (v01 GBT GBTUACPI 42302E31 GBTU 01010101). IRQ balancing works on the first computer and doesn’t work on the other. The kernels are the same (without any changes in .config).

    Then we tried to build a custom vanilla kernel 2.6.33.3. We set the maximum CPU number to 8 (the kernel checks whether the maximum CPU number is bigger than 8 and, if so, sets physical APIC mode), disabled CGROUPS, and disabled HOTPLUG_CPU (and ACPI too, because of a hard dependency), and it didn’t help. dmesg says the APIC is in flat mode, but there is no IRQ balancing.

    Another computer has 8 CPUs and an ASUSTEK motherboard with an AMI BIOS (a different manufacturer and BIOS), and it doesn’t have IRQ balancing either.

    Is this such a widespread problem with BIOS or hardware? We have no other ideas about it.

  35. allexch says:

    to pkun
    Hi. I have the same problem: I can’t spread interrupts between cores.
    Instances:
    1)
    Xeon 5550, 2CPU’s with 4 core each, and NC522SFP+ NIC (10Ge)
    2.6.31-gentoo-r10
    2.6.32-gentoo-r7
    2.6.34-rc7-git2

    2)
    Core i7 with 4 cores, Intel 82576 NIC with igb 2.1.9 driver
    2.6.32-gentoo-r7
    None of these instances allows me to spread interrupts; I tried legacy mode, MSI and MSI-X. :(

  36. pkun says:

    It’s bad that the same problem exists on a modern system (i7). :( One of my thoughts was that it was a problem of ancient hardware/BIOS that had been fixed later.

    Today I tried Ubuntu with the 2.6.28 kernel and Debian with the 2.6.26 kernel. It didn’t help.

    The experiments (on our hardware) don’t show any dependence on MSI or non-MSI interrupt sources. The systems with good IRQ balancing used MSI devices successfully. On the bad systems (all interrupts on CPU0) we disabled MSI, so the MSI devices fell back to the common interrupt mechanism. That didn’t help IRQ balancing either…

    Probably I’ll try 2.6.24 vanilla kernel. Alexander Sandler used it for the article.

  37. allexch says:

    to pkun
    I have 3 more HP servers with Xeon 5550 with 2.6.31-r6 – smp_affinity works.

    I tried this (2.6.31-r6) on the new servers mentioned above, and it does not work. I tried plenty of patches like https://patchwork.kernel.org/patch/73766/, but they did not help me. So I am full of curiosity: what is wrong? Is it the new motherboard, an outdated BIOS, or something else? (I hardly believe now that it depends on the kernel version.) I am looking forward to your test results with the 2.6.24 vanilla kernel.

  38. pkun says:

    Yes, 2.6.24.7 doesn’t work either. This kernel is rather old, so I had some problems with compilation and testing (the environment is newer). But I got a system with loaded drivers and /proc/interrupts. There are no changes in IRQ balancing.

    So I think it’s not a kernel problem, or it’s a very, very :) internal kernel problem that doesn’t depend on CONFIG_…
    I used the apic=debug show_lapic=all kernel parameters and could not find fundamental differences in dmesg between the “good” and “bad” systems. Both use the IOAPIC for interrupt routing, both use flat APIC mode, and the loaded routing tables are very similar.

    At the same time, it’s very strange to me that so many platforms have hardware/BIOS/APIC bugs: about 1/2 or 1/3 of all systems. I understand that our tests and allexch’s tests are not enough for statistics, but there are too many “broken” systems. Worst of all, we have no criterion for choosing a “good” platform to buy. It’s a lottery.

    Is it a Linux problem? Have you tried other systems on the broken platforms? Maybe BSD or even Windows.

  39. allexch says:

    I guess it is not a Linux problem but a hardware one! Maybe there are problems with interrupts on the motherboard slot, but I have no idea how to investigate it.

  40. @pkun
    @allexch
    Hi Guys.

    Sorry it took me some time to get back to you (vacation :-)).
    There’s very little that can be done to address the problem you’re facing. If it helps, I can find where the kernel decides how to handle interrupts, but IMHO debugging this is by far the worst way to address the problem. I’d go to the motherboard vendor’s technical support, but the chances they’ll help with a problem of this kind are slim.
    IMHO the only thing you can do is try another way to spread the interrupt load. I have a couple of posts on the subject in Related Posts above.

    @pkun
    Note that the i7 comes with a new version of the APIC controller, called x2APIC. See here: http://www.intel.com/Assets/pdf/manual/318148.pdf
    It has some advantages over xAPIC: it can direct interrupts to up to 32 cores simultaneously. But it has to be supported by the kernel (recent kernels support it, so no problem here) and enabled.

  41. Douglas says:

    Hello Alexander,

    Great Post! It really answered a lot of questions that I had.

    Thanks,

    Douglas M.

  42. […] going to be loaded up on CPU0 and that was probably the source of my issue. He pointed me at this article to help translate to my […]

  43. […] the process for managing multi cpu threading. See http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux for the answers you seek on how to lower it down, but basically its the way the system handles […]

  44. Han says:

    Great post. It helped me understand the logic of IRQ balancing a lot.

    There is still a problem with your solution.
    I have an 8-core machine. When running a web performance test tool, all interrupts on eth0 (which has IO-APIC support) went to CPU0, so I applied your solution. It does work: the interrupts are distributed to all CPUs. But the distribution is not even; most of the interrupts still go to one core, which on my server is the last CPU core. Any clue?

    Thanks,
    Han

  45. […] …. Yes, I believe it's doable via what is called SMP affinity. Try to read: http://www.alexonlinux.com/smp-affin…dling-in-linux Good […]

  46. Sysadmin says:

    @Harm-Jan, yes, you are right. I confirm that the irqbalance daemon does its job very well. I was experiencing high CPU load on one of my servers until I noticed that irqbalance had died. Looking into /proc/interrupts, I saw that almost all interrupts were on core0 (out of 16 cores in total). When I restarted irqbalance, all the trouble went away.

  47. @Han
    Note that it depends on load on your network cards. More packets arriving means more interrupts so the load may not be even. It also depends on type of traffic. What matters here is number of packets and not necessarily incoming traffic.

  48. Gustav says:

    Hi Alexander,

    I have a server with 24 cores and several network cards. The NICs are in PCI-MSI-edge mode. All interrupts are handled by CPU0. The kernel is 2.6.32-32-preempt. I want to redirect the interrupts caused by eth1 to CPU1, eth2 to CPU2, eth3 to CPU3 and so on. But when I do “sudo echo 2 > smp_affinity”, permission is denied, and when I try to edit the smp_affinity file with vim, I get “smp_affinity” E667: Fsync failed (and I do have enough disk space). I don’t get it.

    regards
