MSI-X – the right way to spread interrupt load

Posted on November 18, 2009, 9:46 am, by Alexander Sandler, under Blog, Programming Articles, System Administrator Articles.

When considering ways to spread interrupts from one device among multiple cores, I can’t help but mention MSI-X. As it happens, MSI-X is actually the right way to do the job.

Interrupt affinity, which I discussed in earlier posts, has a fundamental problem: inevitable CPU cache misses. To illustrate, think about what happens when your computer receives a packet from the network. The packet belongs to some connection. With interrupt affinity the packet may land on core X, while chances are that the previous packet on the same TCP connection landed on core Y (X ≠ Y).

Handling the packet requires the kernel to load the TCP connection object into X’s cache. But this is inefficient: the TCP connection object is already in Y’s cache. Wouldn’t it be better to handle the second packet on core Y as well?

This is the problem with interrupt affinity. On one hand, we want to spread interrupts to even out the load across cores. On the other hand, simple round robin isn’t enough. The little fella that decides where each interrupt goes should be able to look into the packet, work out which TCP connection it belongs to, and send the interrupt to the core that handles all packets of that connection.

Ideally, NICs should be able to:

Look into packets and identify connections.
Direct the interrupt to the core that handles the connection.
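To make the cache-miss argument concrete, here is a toy model (my own illustration, not kernel code) that counts how often a packet is handled on a different core than the previous packet of its connection. Connections are plain integer ids, and `count_core_switches`, `round_robin` and `by_connection` are hypothetical helpers; `conn % num_cores` merely stands in for whatever hash a real NIC would compute.

```python
from itertools import cycle


def count_core_switches(packets, pick_core):
    """Count packets that land on a different core than the previous packet
    of the same connection (a rough proxy for handling it with a cold cache)."""
    last_core = {}
    switches = 0
    for conn in packets:
        core = pick_core(conn)
        if conn in last_core and last_core[conn] != core:
            switches += 1
        last_core[conn] = core
    return switches


def round_robin(num_cores):
    """Plain interrupt spreading: next core in turn, whatever the connection."""
    cores = cycle(range(num_cores))
    return lambda conn: next(cores)


def by_connection(num_cores):
    """Flow-aware steering: every packet of a connection goes to one core.
    `conn % num_cores` is a stand-in for a real header hash."""
    return lambda conn: conn % num_cores


if __name__ == "__main__":
    packets = [0, 1, 2] * 3  # three connections (ids 0..2), packets interleaved
    print(count_core_switches(packets, round_robin(2)))    # 6
    print(count_core_switches(packets, by_connection(2)))  # 0
```

Round robin balances the load but moves every connection back and forth between cores; steering by connection never does, at the price of a less even split when there are few connections.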

As it turns out, this functionality is already here. Network cards that support MSI-X and multiple queues do exactly this.

Meet MSI-X

MSI-X is an extension of MSI (Message Signaled Interrupts). MSI replaces the good old pin-based interrupt delivery mechanism.

With pin-based delivery, each IO-APIC chip (x86 permits up to 5) has 24 pins, each connected to one or more devices. When the IO-APIC receives an interrupt, it redirects it to one of the local APICs. Each local APIC is attached to a core, and that core receives the interrupt.

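MSI provides a kind of protocol for interrupt delivery. Instead of raising a signal on a pin, a PCI device writes a message (a data value written to a special address), and the platform delivers it as an interrupt directly to the target local APIC; on x86 the IO-APIC is not involved. In theory this means that each device can have a number of interrupt vectors. Plain MSI does allow up to 32, but they must be allocated as one contiguous block that shares a single target address, so they cannot be steered to different cores independently. MSI-X removes this restriction: a device can have up to 2048 vectors, each with its own address and data, so each vector can be routed to its own core.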
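To see why per-vector addresses matter, the sketch below builds MSI-X table entries that send each queue’s vector to a different core. It assumes x86 with an xAPIC in physical destination mode and no interrupt remapping; the APIC IDs and vector numbers are made up, and in practice it is the kernel’s PCI/MSI code, not the NIC driver author, that programs these values.

```python
import struct

# On x86, an MSI is a 32-bit write into the 0xFEExxxxx window, which the
# platform turns into an interrupt for the local APIC named in the address.
MSI_ADDR_BASE = 0xFEE00000


def msi_address(dest_apic_id: int) -> int:
    """Message address: destination APIC ID in bits 19..12,
    physical destination mode, no redirection hint (bits 3 and 2 clear)."""
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("xAPIC destination IDs are 8 bits wide")
    return MSI_ADDR_BASE | (dest_apic_id << 12)


def msi_data(vector: int) -> int:
    """Message data: vector in bits 7..0; fixed delivery mode and
    edge trigger are encoded as zeros in the upper bits."""
    if not 32 <= vector <= 255:
        raise ValueError("vectors 0..31 are reserved for CPU exceptions")
    return vector


def msix_table_entry(dest_apic_id: int, vector: int, masked: bool = False) -> bytes:
    """One 16-byte MSI-X table entry (little-endian dwords): message address
    low, message address high, message data, vector control (bit 0 = mask)."""
    return struct.pack("<IIII", msi_address(dest_apic_id), 0,
                       msi_data(vector), 1 if masked else 0)


if __name__ == "__main__":
    # Hypothetical 4-queue NIC: queue i gets vector 0x41 + i on APIC ID i.
    for queue in range(4):
        entry = msix_table_entry(dest_apic_id=queue, vector=0x41 + queue)
        print(queue, entry.hex())
```

With plain MSI there is only one address register for the whole block of vectors, so all of them would carry the same destination APIC ID; the per-entry address in the MSI-X table is what lets each queue’s interrupt go to its own core.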

Modern high-end network cards that support MSI-X implement multiple tx and rx queues. Each queue is tied to an interrupt vector, and each NIC has plenty of them. I checked Intel’s 82575 chipset: with the igb driver compiled properly, it has up to eight queues, four rx and four tx. Broadcom’s 5709 chipset provides eight queues (and eight interrupt vectors), each handling both rx and tx.
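Each of these vectors shows up as its own line in /proc/interrupts. The helper below picks out a NIC’s per-queue vectors from that file. The sample text and the `eth0-rx-0` style names are illustrative assumptions: the naming scheme differs between drivers and kernel versions, and so does the controller column (shown here as PCI-MSI-edge). `queue_irqs` is a hypothetical helper.

```python
SAMPLE = """\
            CPU0       CPU1
  0:         45          0   IO-APIC-edge      timer
 58:      12345          0   PCI-MSI-edge      eth0-rx-0
 59:        234      56789   PCI-MSI-edge      eth0-rx-1
 60:         10          0   PCI-MSI-edge      eth0-tx-0
 61:          5          0   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
"""


def queue_irqs(proc_interrupts: str, iface: str) -> dict:
    """Return {irq_number: vector_name} for vectors named '<iface>-...'."""
    queues = {}
    for line in proc_interrupts.splitlines():
        fields = line.split()
        # Numbered IRQ lines start with e.g. "58:"; skip the CPU header and NMI/LOC rows.
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        name = fields[-1]
        if name.startswith(iface + "-"):
            queues[int(fields[0][:-1])] = name
    return queues


if __name__ == "__main__":
    print(queue_irqs(SAMPLE, "eth0"))
    # On a live system:
    # with open("/proc/interrupts") as f:
    #     print(queue_irqs(f.read(), "eth0"))
```

The per-CPU columns also show at a glance whether a queue’s interrupts are being handled on one core or scattered across several.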

In kernel 2.6.24, the kernel developers introduced a new member of struct sk_buff called queue_mapping. It tells the NIC driver which queue to use when transmitting the packet.

Before transmitting a packet, the kernel decides which queue to use for it (net/core/dev.c:dev_queue_xmit()). It does so in one of two ways. First, the kernel can ask the NIC driver to provide a queue number for the packet. This hook, however, is optional in NIC drivers, and at the moment neither the Intel nor the Broadcom driver provides it. Otherwise, the kernel uses a simple hashing algorithm that produces a 16-bit number from the two IP addresses and (in the case of TCP or UDP) the two port numbers. All this happens in a function named simple_tx_hash() in net/core/dev.c.
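The hash-based fallback can be modelled in a few lines. This is a simplified sketch, not the kernel code: as far as I know, simple_tx_hash() feeds the addresses and ports into jhash with a random seed and then scales the 32-bit result into the range of available tx queues. Here `zlib.crc32` is swapped in for jhash, the scaling is done with a multiply and shift, and `tx_queue_for_flow` is a hypothetical name, so the queue numbers will not match a real kernel’s.

```python
import ipaddress
import struct
import zlib


def tx_queue_for_flow(saddr: str, daddr: str, sport: int, dport: int,
                      num_tx_queues: int, l4_proto: str = "tcp", seed: int = 0) -> int:
    """Pick a tx queue for an IPv4 flow, loosely modelled on simple_tx_hash().

    zlib.crc32 stands in for the kernel's hash function; only the
    structure (addresses + ports -> 32-bit hash -> queue index) is the same.
    """
    addr1 = int(ipaddress.IPv4Address(saddr))
    addr2 = int(ipaddress.IPv4Address(daddr))
    # Ports only take part for TCP and UDP; other protocols hash addresses only.
    ports = ((sport << 16) | dport) if l4_proto in ("tcp", "udp") else 0
    hash32 = zlib.crc32(struct.pack("!IIII", addr1, addr2, ports, seed))
    # Scale the 32-bit hash into [0, num_tx_queues) with a multiply and shift.
    return (hash32 * num_tx_queues) >> 32


if __name__ == "__main__":
    for sport in (40000, 40001, 40002):
        print(sport, tx_queue_for_flow("10.0.0.1", "10.0.0.2", sport, 80, 8))
```

Because the inputs are fixed for the lifetime of a connection, every packet of that connection maps to the same tx queue, while different connections spread across all queues.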

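When receiving packets, things are even easier, because the NIC itself (its hardware or firmware, as configured by the driver) decides which queue delivers the packet to the kernel. Typically the NIC hashes the same kind of header fields, addresses and ports, so all packets of one connection arrive on the same rx queue.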
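NICs commonly implement this as Receive Side Scaling (RSS): a Toeplitz hash over the addresses and ports, followed by a lookup in an indirection table that maps hash values to rx queues. The sketch below models that in software for an IPv4/TCP packet. It illustrates the technique and is not the 82575 or 5709 firmware; the 128-entry round-robin table and the function names are assumptions.

```python
import ipaddress
import struct

# 40-byte hash key. As far as I know this is the example key from Microsoft's
# RSS documentation; real drivers may program a different (often random) key.
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit i (MSB first), XOR in
    the 32-bit window of the key that starts at bit i."""
    key_bits = len(key) * 8
    if len(data) * 8 > key_bits - 32:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_rx_queue(saddr: str, daddr: str, sport: int, dport: int,
                 num_rx_queues: int, table_size: int = 128) -> int:
    """Hash the IPv4/TCP 4-tuple and look the queue up in an indirection table."""
    data = (ipaddress.IPv4Address(saddr).packed + ipaddress.IPv4Address(daddr).packed
            + struct.pack("!HH", sport, dport))
    hash32 = toeplitz_hash(RSS_KEY, data)
    # Hypothetical indirection table filled round-robin with queue numbers;
    # the low bits of the hash select the entry.
    indirection = [entry % num_rx_queues for entry in range(table_size)]
    return indirection[hash32 % table_size]
```

The indirection table is what lets software rebalance rx traffic later: rewriting table entries moves groups of flows to another queue without changing the hash.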

Using this simple technique, the kernel and modern NICs can ensure that packets belonging to a given connection land on the same queue. With interrupt affinity you can then bind each queue’s interrupt vector to a particular core (by writing to /proc/irq/<irq>/smp_affinity, etc.). Thus you can spread interrupts among multiple cores and still avoid the cross-core cache misses described above.
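Binding a vector to a core means writing a hex CPU mask to /proc/irq/<irq>/smp_affinity for that queue’s IRQ. A minimal sketch, assuming you already know the queue IRQ numbers (the ones below are hypothetical) and want one queue per core; `pin_queue_irqs` only returns its plan unless `apply=True` is passed, and writing requires root.

```python
def affinity_mask(cpu: int) -> str:
    """Hex CPU mask with only `cpu` set. Masks wider than 32 CPUs are split
    into comma-separated 32-bit groups, the format smp_affinity uses."""
    mask = 1 << cpu
    groups = []
    while True:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if not mask:
            break
    return ",".join(reversed(groups))


def pin_queue_irqs(irqs, cpus, apply=False):
    """Give each IRQ one CPU, cycling through `cpus`.

    Returns (path, mask) pairs; with apply=True also writes them, which
    needs root. A running irqbalance daemon may later override the masks.
    """
    plan = []
    for i, irq in enumerate(irqs):
        path = f"/proc/irq/{irq}/smp_affinity"
        mask = affinity_mask(cpus[i % len(cpus)])
        plan.append((path, mask))
        if apply:
            with open(path, "w") as f:
                f.write(mask)
    return plan


if __name__ == "__main__":
    # Hypothetical queue IRQs 58..61 spread over CPUs 0..3 (dry run only).
    for path, mask in pin_queue_irqs([58, 59, 60, 61], [0, 1, 2, 3]):
        print(path, mask)
```

Giving each vector a single-CPU mask, rather than a mask covering several cores, is what keeps every connection hashed to that queue on one core.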


28 Comments

telenne barz says:

May 17, 2010 at 12:06 pm

Hi Alex !

Once again, here’s a nice article… Thanks for sharing your knowledge.

For outbound packets, the kernel builds a hash based on IP addresses and port numbers (source and destination, I suppose?) in order to bind the corresponding flow to a given TX queue. I was wondering whether the hash is built in the same manner for inbound packets / RX queues?

What I understand is that the driver is in charge of binding a given ingress flow to a given RX queue. Does that mean that the sysadmin cannot configure it a posteriori (with ethtool, for instance)?

“Using interrupt affinity binding techniques you can bind certain interrupt vector to certain core”: can you please give us further details on how to set that up? Does that mean that each queue will appear as a separate entry in /proc/interrupts?

Finally, have you heard about the TNAPI and PF_RING patches by Luca Deri (http://www.ntop.org/TNAPI.html)? If the MSI-X feature is already implemented in the drivers concerned (Intel igb, ixgbe), I don’t see what the benefit of the TNAPI patch is. What is your opinion about this?

Telenn

ninez says:

July 3, 2010 at 6:23 pm

MSI-X is great, and is also now used by default in the Linux kernel.

2.6.33 and onwards!

No more IRQ sharing on my laptop.

Great article!

Alexander Sandler says:

July 4, 2010 at 10:38 pm

@ninez
Thanks for sharing your experience and for a warm comment. Please come again!

marvniek says:

August 25, 2010 at 9:02 pm

Great article Alex, thanks a lot

Reply to this comment
Mitra (India) says:

January 30, 2012 at 1:24 pm

Thanks for the knowledge, god bless you.

Alexander Sandler says:

February 5, 2012 at 11:17 pm

Thank you.

Madhav says:

November 22, 2012 at 7:07 pm

Very informative.

Anil says:

February 6, 2013 at 6:57 am

Quite an informative article. Thanks!
While the network stack has been modified to take advantage of the multiple queues provided by devices, is there something similar planned for storage traffic as well?
Something similar to what other OSes (VMware, Windows) have to offer.

Reply to this comment