pthread spinlocks

Continuing my previous post, I would like to talk about a relatively new feature in glibc, and in pthreads in particular: spinlocks.

Quite often you want to protect some simple data structure from simultaneous access by two or more threads. As in my previous post, this can be as simple as a single integer. Using mutexes and semaphores (critical sections if you wish) to protect a simple integer is overkill, and here's why.

Modifying or reading the value of a single integer quite often requires as few as two or three CPU instructions. Acquiring a semaphore, on the other hand, is a huge operation: it involves at least one system call, which translates into thousands of CPU instructions. The same goes for releasing the semaphore.

Things are a little better with mutexes, but still far from perfect. POSIX mutexes in Linux are implemented using futexes. Futex stands for Fast Userspace muTEX. The idea behind the futex is to avoid a system call when locking an uncontended futex. Waiting for a locked futex still requires a system call, because that is how processes wait for things in Linux. Yet locking an unlocked futex can be done without asking the kernel to do anything for you. Therefore, locking a futex is, at least in some cases, very fast.

The problem is that in the rest of the cases a mutex in Linux is as slow as a semaphore. And as with semaphores, spending thousands of CPU cycles just to protect a single integer is definitely overkill.

This is exactly the problem spinlocks solve. A spinlock is another synchronization mechanism. It works in the same manner as a mutex, i.e. only one thread can hold it at a time. However, there is a difference.

When a thread tries to lock an already locked spinlock, it won't sleep waiting for the spinlock to be unlocked. It will busy-wait, i.e. spin in a while loop. This is why it's called a spinlock.

Locking a spinlock takes only tens of CPU cycles. The one important thing with spinlocks is to hold them only for short periods of time. Don't be surprised when your program begins consuming too much CPU just because one of your threads held a spinlock for too long. To avoid this, don't execute big chunks of code while holding a spinlock, and by all means avoid doing I/O while holding one.

Now for the easy part.

First, to make things work you have to include pthread.h. pthread_spin_init() initializes a spinlock, pthread_spin_lock() locks it and pthread_spin_unlock() unlocks it, all as with pthread mutexes. And of course there are manual pages for each and every one of them.

6 Comments

  1. Cyril says:

    Well, this is wrong.
    It's only true for SMP machines (ones with REAL multiple processors, not hyperthreading).

    If you only have a single CPU (or HT), then whatever you do, only one thread at a time can actually modify a memory value.
    So while you're spinning, you're simply consuming CPU power for nothing: to actually get the spinlock, you'll have to wait for a kernel context switch, and hope you're not elected back.

    This means that, depending on kernel scheduler :
    Thread A
    take spin lock
    write value to protected int

    Thread B
    want to take spin lock
    (use 100% cpu for the whole thread time slice)
    Here, the kernel can decide to re-elect thread B, as it's the thread doing the most work
    Thread B
    want to take spin lock
    (use 100% cpu for the whole thread time slice)
    and so on…

    This scheme is very, very dangerous; that's why in the Linux kernel, spinlock is a macro (not a function) that gets disabled on UP machines.

  2. Frédérik says:

    Well, I have run the test code on a machine with 2 dual-core CPUs, so 4 cores (dual Intel Xeon E5420 @ 2.50GHz), running Linux 2.6.22. The result is that mutexes are faster.

    With mutex:
    Consumer TID 14146
    Consumer TID 14147
    Result – 4.739984

    With Spinlock:
    Consumer TID 14464
    Consumer TID 14465
    Result – 5.785072

    With the -O2 optimisation flag the difference is reduced, but mutexes are still faster.

  3. peterix says:

    When you do IPC (setting PTHREAD_PROCESS_SHARED and putting the spinlock into shared memory to control access to some shared resource), the scheduler becomes your enemy.

    Say you have a dual-core CPU and two processes use a spinlock to regulate access to a shared memory segment (sharing an int for example). Then comes a third CPU-hungry process and suddenly your performance goes to hell with latency going from microseconds to seconds.

    So, beware. Spinlocks are of limited usefulness even on SMP machines. The kernel can afford that, because it has no ‘completely fair’ scheduler to deal with :/

  4. Mike says:

    Ngargh! This post is ancient but the commenters compel me to reply!

    The point of using spinlocks is never to hold them very long, e.g. for modifying a small bit of memory like Alex said. Then release them again. You shouldn’t even make system calls or anything while holding them. They are extremely hot potatoes that you do not want to hold.

    Cyril: After thread A gets the spinlock, it should also be at 100% cpu until it releases it again. It should also hold it for very little time; much less than a timeslice. If by luck its timeslice ends and B is scheduled, B may spin for a timeslice, it’s true. But B should be no busier than A. Thread A should be hustling while it has that lock.

    peterix: Same as above. You’re right, they don’t play nice with schedulers. The idea is not to let it get to the scheduler most of the time. Grab and release the lock quickly so that the holding thread rarely gets descheduled while holding it. Don’t make syscalls which could deschedule the thread while holding it.

    If the contending threads are constantly, rapidly taking and releasing the spinlock, such that they're likely to be interrupted while holding the lock, you're spending too much time on synchronization anyway. It would be better to batch the communications somehow, then use mutexes that can safely be held for longer times. And maybe use spinlocks for short, uncommon metadata transfers outside the batch.

