pthread mutex vs pthread spinlock

Update 06/16/2009: On several occasions, commenters on this article have pointed out that this post is somewhat incomplete. Therefore, before I continue, I would like to make a few things clear.

Before considering the code below, please note that spinlocks are totally useless on uni-processor computers. This is due to the nature of spinlocks, which I will describe in future posts.

Also, it is not the intent of this article to show that spinlocks are better than mutexes in all circumstances. Spinlocks perform better than mutexes in this particular setup, on a multi-processor computer. It does not mean that spinlocks are good for you. Apply your own judgment when considering using the material presented in this article for your needs.

In this article I would like to demonstrate how spinlocks can be more efficient than mutexes. I’ll do that with a program that does the following.

It builds a list of integers and then starts two threads, each removing entries from the list. Since both threads work with the same list at the same time, the list has to be protected by a synchronization mechanism of some kind.

Depending on whether the program has been compiled with USE_SPINLOCK or not, it will use a spinlock or a mutex to protect the list. In either case, it will display the number of seconds that passed from the moment the threads were started until they finished.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <sys/time.h>

#include <list>

#define LOOPS 10000000

using namespace std;

list<int> the_list;

#ifdef USE_SPINLOCK
pthread_spinlock_t spinlock;
#else
pthread_mutex_t mutex;
#endif

pid_t gettid() { return syscall( __NR_gettid ); }

void *consumer(void *ptr)
{
    int i;

    printf("Consumer TID %lu\n", (unsigned long)gettid());

    while (1)
    {
#ifdef USE_SPINLOCK
        pthread_spin_lock(&spinlock);
#else
        pthread_mutex_lock(&mutex);
#endif

        if (the_list.empty())
        {
#ifdef USE_SPINLOCK
            pthread_spin_unlock(&spinlock);
#else
            pthread_mutex_unlock(&mutex);
#endif
            break;
        }

        i = the_list.front();
        the_list.pop_front();

#ifdef USE_SPINLOCK
        pthread_spin_unlock(&spinlock);
#else
        pthread_mutex_unlock(&mutex);
#endif
    }

    return NULL;
}

int main()
{
    int i;
    pthread_t thr1, thr2;
    struct timeval tv1, tv2;

#ifdef USE_SPINLOCK
    pthread_spin_init(&spinlock, 0);
#else
    pthread_mutex_init(&mutex, NULL);
#endif

    // Creating the list content...
    for (i = 0; i < LOOPS; i++)
        the_list.push_back(i);

    // Measuring time before starting the threads...
    gettimeofday(&tv1, NULL);

    pthread_create(&thr1, NULL, consumer, NULL);
    pthread_create(&thr2, NULL, consumer, NULL);

    pthread_join(thr1, NULL);
    pthread_join(thr2, NULL);

    // Measuring time after threads finished...
    gettimeofday(&tv2, NULL);

    if (tv1.tv_usec > tv2.tv_usec)
    {
        tv2.tv_sec--;
        tv2.tv_usec += 1000000;
    }

    printf("Result - %ld.%ld\n", tv2.tv_sec - tv1.tv_sec,
        tv2.tv_usec - tv1.tv_usec);

#ifdef USE_SPINLOCK
    pthread_spin_destroy(&spinlock);
#else
    pthread_mutex_destroy(&mutex);
#endif

    return 0;
}

Let’s see how this little program works in greater detail. main() starts with the initialization of a mutex or a spinlock, depending on whether USE_SPINLOCK is defined. Next it initializes the list; the number of entries it pushes is controlled by the LOOPS definition. Finally, it records the current time and spawns two threads. Then it waits for both threads to finish their job.

Each thread runs the consumer() routine: a loop that pops entries from the list until the list is empty, at which point the thread exits. On each iteration of the loop it locks the mutex/spinlock before touching the list and unlocks it immediately afterwards.

Once both threads are done, main() measures the time again and prints the difference between the two measurements. This difference is how we measure effectiveness: the less time the run took, the better.
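
Incidentally, the manual microsecond normalization and the “%ld.%ld” format in main() are easy to get wrong (more on that in the comments below). Here is an alternative sketch, not part of the program above, that takes the same measurement with a monotonic clock:

// Sketch of an alternative timing approach, separate from the program
// above. CLOCK_MONOTONIC is immune to wall-clock adjustments; on older
// glibc versions you may need to link with -lrt.
#include <stdio.h>
#include <time.h>

static double elapsed_seconds(const struct timespec *a,
                              const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Usage:
 *     struct timespec ts1, ts2;
 *     clock_gettime(CLOCK_MONOTONIC, &ts1);
 *     ... create and join the threads ...
 *     clock_gettime(CLOCK_MONOTONIC, &ts2);
 *     printf("Result - %f\n", elapsed_seconds(&ts1, &ts2));
 */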

Let’s run the code and see how it performs, first with a mutex and then with a spinlock.

~/lab/mutex-vs-spinlock --> g++ -Wall -pthread main.cc
~/lab/mutex-vs-spinlock --> ./a.out
Consumer TID 19602
Consumer TID 19603
Result - 7.99794
~/lab/mutex-vs-spinlock --> g++ -DUSE_SPINLOCK -Wall -pthread main.cc
~/lab/mutex-vs-spinlock --> ./a.out
Consumer TID 19610
Consumer TID 19611
Result - 2.587424
~/lab/mutex-vs-spinlock -->

As we can see, the picture is clear. Using the spinlock, the test completed in about 2.6 seconds. With the mutex, it took almost 8 seconds, roughly three times as long.

This proves that the spinlock is more effective in terms of CPU consumption and speed.


58 Comments

  1. oleggal says:

Thanks for the interesting article. But I think it’s worth mentioning that the results are relevant only to multi-core CPUs.
    I ran the tests on a single-CPU ARM board and got the following results:
    # ./test_mutex_arm
    Consumer TID 343
    Consumer TID 344
    Result – 26.367577
    # ./test_spinlock_arm
    Consumer TID 347
    Consumer TID 348
    Result – 27.14807

    Here we can see that the spinlock version takes more time.

  2. @oleggal
Thanks for pointing this out. You are absolutely right. I should’ve mentioned this – I work with SMP machines all the time and forgot that there are other kinds of computers out there.

The truth is that spinlocks have no meaning on single-processor machines. On an SP computer, two threads cannot really run at the same time; concurrency is simulated by the operating system’s scheduler, using preemption.
    Preemption itself implies a context switch and putting a thread to sleep. But this defeats the reason we wanted to use spinlocks in the first place: we wanted to avoid context switches.

What intrigues me is the fact that, according to your test, the spinlock was actually slower than the mutex. I’d expect them to perform equally.
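
    Incidentally, a program can detect this situation at runtime and simply refuse to spin on a UP machine. A quick sketch (mine, untested on such hardware; it assumes Linux/glibc’s sysconf()):

    // Sketch: pick the locking primitive at runtime based on the number
    // of online processors. Spinning only makes sense when another CPU
    // can release the lock while we spin.
    #include <unistd.h>
    #include <pthread.h>

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_spinlock_t spinlock;
    int use_spinlock; /* decided once at startup */

    void lock_init(void)
    {
        use_spinlock = sysconf(_SC_NPROCESSORS_ONLN) > 1;
        if (use_spinlock)
            pthread_spin_init(&spinlock, PTHREAD_PROCESS_PRIVATE);
    }

    void lock_acquire(void)
    {
        if (use_spinlock)
            pthread_spin_lock(&spinlock);
        else
            pthread_mutex_lock(&mutex);
    }

    void lock_release(void)
    {
        if (use_spinlock)
            pthread_spin_unlock(&spinlock);
        else
            pthread_mutex_unlock(&mutex);
    }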

  3. pigiron says:

“This proves that the spinlock is more effective in terms of CPU consumption and speed.”

    That statement caused my eyebrows to go up a bit.

While I can believe that a spin lock may be faster than a mutex in some circumstances (and slower in others), having “more effective” CPU consumption puzzled me, and I don’t see where the test case even measures that resource.

I always thought a spin lock does what it says by simply spinning until an event occurs, but the spec leaves very wide latitude in how this function can be implemented. So I cruised through the latest glibc code, and it appears that assumption is correct in my case. That made me wonder how a thread that is spinning can consume less CPU than one that is sleeping. Perhaps if the lock becomes available to the spinning thread in fewer CPU cycles than the code path length to perform a context switch, get off and back on the system’s run queue, etc., then this may be the case.

    But after running the test cases multiple times using /usr/bin/time, that somewhat inaccurate utility reports that the spin lock indeed consumes about 30% more CPU resources on my puny little Intel Atom 330 SMP box. So perhaps this is another case of how difficult it is to make absolute statements in this crazy and infinitely variable world of computers ;)

  4. Originally Posted By pigiron
I always thought a spin lock does what it says by simply spinning until an event occurs, but the spec leaves very wide latitude in how this function can be implemented. So I cruised through the latest glibc code, and it appears that assumption is correct in my case. That made me wonder how a thread that is spinning can consume less CPU than one that is sleeping. Perhaps if the lock becomes available to the spinning thread in fewer CPU cycles than the code path length to perform a context switch, get off and back on the system’s run queue, etc., then this may be the case.

Spinlocks are indeed more effective than mutexes when doing the context switch and returning from it takes more time than grabbing the spinlock, doing the work and releasing it. Moreover, this is the only case when spinlocks are actually useful; otherwise, using a spinlock is totally pointless.

    But after running the test cases multiple times using /usr/bin/time, that somewhat inaccurate utility reports that the spin lock indeed consumes about 30% more CPU resources on my puny little Intel Atom 330 SMP box. So perhaps this is another case of how difficult it is to make absolute statements in this crazy and infinitely variable world of computers ;)

This is the result of a mistake I made, and I should have mentioned it earlier. You see, spinlocks are ineffective on single CPU machines. This is from one of my comments to this article:

The truth is that spinlocks have no meaning on single-processor machines. On an SP computer, two threads cannot really run at the same time; concurrency is simulated by the operating system’s scheduler, using preemption.
    Preemption itself implies a context switch and putting a thread to sleep. But this defeats the reason we wanted to use spinlocks in the first place: we wanted to avoid context switches.

    Perhaps this entire subject deserves a new article :-)
Silly me. This is an SMP machine.

This is indeed a mystery and I have no idea what the problem could be. Would you mind sharing how exactly you ran the test?

  5. pigiron says:

    There is no problem. I was just digging into this statement in the article:

“This proves that the spinlock is more effective in terms of CPU consumption…”

    And I think we may be in agreement that it’s not necessarily true. Faster, possibly; less CPU consumption, possibly (and probably) not.

Anyway, I simply used the “time” command (see the “time” man page) to measure the CPU consumption of your program(s) on one of my systems. You have to enter “/usr/bin/time ./a.out”, otherwise bash will use its built-in (and inferior) time command. That works too; it just presents less info.

  6. @pigiron
Hmm… I don’t get it. I can’t reproduce your results. Here is what I get on a dual Xeon 3GHz system.

    With mutex:
    7.68user 7.56system 0:08.74elapsed 174%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+78435minor)pagefaults 0swaps

    With spinlock:
    6.64user 0.43system 0:04.53elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+78440minor)pagefaults 0swaps

And this is what I get on a quad Xeon 3GHz – a little closer to your Atom 330 (it reports 4 CPUs as well).

    With mutex:
    9.98user 11.91system 0:12.60elapsed 173%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+78468minor)pagefaults 0swaps

    With spinlock:
    12.05user 0.35system 0:07.04elapsed 176%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+78468minor)pagefaults 0swaps

Note that in the last test, user time was indeed longer for the spinlock. Yet with the spinlock the program spent almost no time in the kernel, while with the mutex it was in the kernel most of the time.
    This behavior is natural for both of them: spinlocks don’t make system calls, so the spinlock test remains in userspace and doesn’t sleep, while the mutex test does fall asleep.

The reason why I continue this argument, and why I did these tests, is that I am sure my original statement is true: spinlocks are supposed to be more CPU efficient than mutexes.
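
    By the way, you can see the same user/system split from within the program itself, without /usr/bin/time. A small sketch (my addition, assuming Linux/glibc):

    // Sketch: print user vs. system CPU time consumed by the process.
    // A mutex-heavy run should show significant system time (futex
    // calls); a spinlock run should be almost pure user time.
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    void print_cpu_times(void)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == 0)
        {
            printf("user   %ld.%06ld\n",
                   (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
            printf("system %ld.%06ld\n",
                   (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        }
    }

    Calling it right after the pthread_join() calls in the test program would print the numbers for the whole run.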

  7. pigiron says:

    Chuckle. I get the following:

    With mutex:

    9.90user 6.13system 0:13.56elapsed 118%CPU (0avgtext+0avgdata 0maxresident)k
    32inputs+0outputs (0major+39338minor)pagefaults 0swaps

    With spinlock:

    24.49user 0.28system 0:14.49elapsed 170%CPU (0avgtext+0avgdata 0maxresident)k
    32inputs+0outputs (0major+39339minor)pagefaults 0swaps

    Using:

    gcc-libs 4.4.0
    glibc 2.10.1
    kernel 2.6.29.4

On that lightly loaded Atom system, which actually shows a longer “wall clock” time for the spinlock… go figure.

    I only used cut and paste to create/compile your test cases… so there was no cheating involved :)

    So I tried again but added the displaying of the number of context switches:

    With mutex:

    /usr/bin/time -f “%Uuser %Ssystem %Eelapsed %PCPU %cFCSW %wVCSW” ./a.out
    Consumer TID 21754
    Consumer TID 21755
    Result – 9.784326
    10.22user 7.04system 0:13.93elapsed 123%CPU 142FCSW 632642VCSW

    With spinlock:

    /usr/bin/time -f “%Uuser %Ssystem %Eelapsed %PCPU %cFCSW %wVCSW” ./a.out
    Consumer TID 21800
    Consumer TID 21801
    Result – 10.299240
    24.56user 0.21system 0:14.50elapsed 170%CPU 117FCSW 2VCSW

This doesn’t explain the differences either (though notice the large number of context switches for the faster mutex), but that’s not too shocking since there are an infinite number of things that can affect performance between (and in) systems.

    For instance, there’s an excellent article by Ulrich Drepper here:

    http://people.redhat.com/drepper/cpumemory.pdf

    that goes deep into memory management and cache lines. It’s well worth the long and technical read if you’re interested in some “down and dirty”.

This whole subject is why I’ll be happy when the Linux kernel folks finalize tracing in the kernel. That’s usually my main method for figuring out these types of performance issues/differences in the *nix kernel guts I worked in.

  8. Originally Posted By pigiron
    With mutex:

    /usr/bin/time -f “%Uuser %Ssystem %Eelapsed %PCPU %cFCSW %wVCSW” ./a.out
    Consumer TID 21754
    Consumer TID 21755
    Result – 9.784326
    10.22user 7.04system 0:13.93elapsed 123%CPU 142FCSW 632642VCSW

    With spinlock:

    /usr/bin/time -f “%Uuser %Ssystem %Eelapsed %PCPU %cFCSW %wVCSW” ./a.out
    Consumer TID 21800
    Consumer TID 21801
    Result – 10.299240
    24.56user 0.21system 0:14.50elapsed 170%CPU 117FCSW 2VCSW

This doesn’t explain the differences either (though notice the large number of context switches for the faster mutex), but that’s not too shocking since there are an infinite number of things that can affect performance between (and in) systems.

I guess you’re right. Things are not as simple as I thought. I will investigate it further, but I don’t have a netbook just yet, so it will have to wait.
    If you have any thoughts on why our results are so different, please spit them out.

    For instance, there’s an excellent article by Ulrich Drepper here:

    http://people.redhat.com/drepper/cpumemory.pdf

    that goes deep into memory management and cache lines. It’s well worth the long and technical read if you’re interested in some “down and dirty”.

    Actually this paper inspired me to write an article on the subject. Perhaps you’ll find it interesting: http://www.alexonlinux.com/aligned-vs-unaligned-memory-access :-)

  9. anon says:

This is really bad advice. Spinlocks are only useful on SMP machines, and they are only useful in the very limited case where the mutex being locked will ‘very soon’ be freed by a separate core, such that busy-waiting is faster than context switching. A spinlock basically blocks the current core and waits for the other one to let go of the mutex. It makes no sense to spinlock when you have only one core; you need a context switch in order for the other thread to do its work.

The reason spinlocks are worse than mutexes on single processor machines (e.g. Atom processors, ARM, or really any processor more than a few years old) is that while the spinlock is spinning, it has the only processor blocked, so nothing is happening; your CPU is basically livelocked. After a short while the thread’s time slice runs out and you get a context switch to the other thread, which can complete its operation. If you were using a mutex, the context switch would happen right away, wasting no time waiting on something that is impossible to happen.

    This is why spinlocks are rarely used outside of operating system kernels; it is almost never the case that you will get a real-life scenario as contrived as your above example that would benefit from it. If you replace all mutexes with spinlocks, they’ll usually just time out and context switch anyway. Even if they don’t, the time spent spinning might be better spent on another thread waiting to do work. And you don’t even know that the operating system will properly spread out your threads so that the thread spinning is on a different core than the one holding the lock.

This is also why, on Linux for example, you get a totally separate kernel if you have an SMP processor. It’s compiled with different code (of which spinlocks are one aspect); spinlocks are only useful *sometimes* on SMP, and never useful without an SMP processor. Linux can do this because it specifically controls which cores execute which of its own threads.

In the more general sense, I often wish people would have a lot less hubris in giving such general optimization advice. If there were a magic solution to making something faster, it would surely be in the kernel and in your compiler and libraries already. These guys have thought about this stuff a lot more than we have. Unfortunately, optimization always requires a lot of thought, analysis, and understanding. I don’t mean to discourage you; I understand that you have good intentions. Giving optimization advice about algorithms rather than abstraction layers is much more worthwhile.

    • Marc says:

      Hi there,

you are completely right (anon) when thinking about multithreading on a single CPU. But in the area of supercomputing, spinlocks are a blessing. Modeling data dependencies with locks can be really slow if a CPU goes idle because of mutex locks. For example, solving a time-based equation and using the solution for two other computations while the new timestep is calculated is much slower with mutex locks: the two cores responsible for the dependent calculations end up doing system tasks while the timestep-calculating core waits for their solutions to check the error.
      In general, for fast computations on multicore shared-memory systems, spinlocks are the stuff.

  10. Originally Posted By anon
Before I begin answering this, I must say that I tend to introduce myself, especially before I go criticizing someone’s work, no matter how much I dislike it. This is because I myself don’t like anonymous criticism.

This is really bad advice. Spinlocks are only useful on SMP machines, and they are only useful in the very limited case where the mutex being locked will ‘very soon’ be freed by a separate core, such that busy-waiting is faster than context switching. A spinlock basically blocks the current core and waits for the other one to let go of the mutex. It makes no sense to spinlock when you have only one core; you need a context switch in order for the other thread to do its work.

I already apologized for this, but I guess you didn’t read the comments. Yes, spinlocks are indeed useless on UP machines – see the first pair of comments on this post. I’ll update the article to mention this.

The reason spinlocks are worse than mutexes on single processor machines (e.g. Atom processors, ARM, or really any processor more than a few years old) is that while the spinlock is spinning, it has the only processor blocked, so nothing is happening; your CPU is basically livelocked. After a short while the thread’s time slice runs out and you get a context switch to the other thread, which can complete its operation. If you were using a mutex, the context switch would happen right away, wasting no time waiting on something that is impossible to happen.

    This is why spinlocks are rarely used outside of operating system kernels; it is almost never the case that you will get a real-life scenario as contrived as your above example that would benefit from it. If you replace all mutexes with spinlocks, they’ll usually just time out and context switch anyway. Even if they don’t, the time spent spinning might be better spent on another thread waiting to do work. And you don’t even know that the operating system will properly spread out your threads so that the thread spinning is on a different core than the one holding the lock.

People don’t use spinlocks because they’ve only recently become available in user-mode. There’s virtually no documentation on them. This is the reason for this post (and other posts I’ve written and will write).

    spinlocks are only useful *sometimes* on SMP, and never useful without an SMP processor. Linux can do this because it specifically controls which cores execute which of its own threads.

*sometimes*? You usually use mutexes to protect data structures, and you usually make sure that access to the data structure is fast. The better your data structure, the more useful spinlocks become. I wonder whether *sometimes* is a good answer to the question: do you keep your data structures optimized?

In the more general sense, I often wish people would have a lot less hubris in giving such general optimization advice. If there were a magic solution to making something faster, it would surely be in the kernel and in your compiler and libraries already. These guys have thought about this stuff a lot more than we have. Unfortunately, optimization always requires a lot of thought, analysis, and understanding. I don’t mean to discourage you; I understand that you have good intentions. Giving optimization advice about algorithms rather than abstraction layers is much more worthwhile.

I must apologize again. I didn’t explain the origins of this article. One day I was experimenting with spinlocks. I decided to prove that spinlocks are indeed faster than mutexes in this scenario, and I did. The results really excited me, so I wrote this post.

There’s one thing that I forgot to write, though. That is: use your freaking brain! No, spinlocks are not faster than mutexes, just as watermelon is not tastier than pumpkin. But you see, this article is merely a proof that watermelon is tastier than pumpkin when this is true. Spinlocks are faster than mutexes when they should be faster.

As for the abstraction/algorithms stuff, I think there is no us and them. It’s all about what you want. If you want your program to be faster, you think about how to make it faster.
    Spinlocks are not an abstraction. Spinlocks are an infrastructure that waits for you to get out of that box where incredibly smart guys write libraries and kernels, while modest and simple guys just do classes and other *simple* stuff. Spinlocks are just a tool. Not a tool for alpha programmers and guys whose name is Linus, but a tool for everybody. Thinking otherwise is plain wrong.

  11. Andre Goddard Rosa says:

    Hi, Alexander!

Passing by here to say that it’s clear from your article that spinlocks are not a silver bullet, but they can sometimes give reasonable performance improvements depending on your workload and on how the competing threads are set up regarding CPU affinity. And of course the size (time to release the lock) of the critical section is critical for this.

    Thanks!

  12. Jimmy says:

    Hi, Alexander!
I don’t know much about Linux system programming, but I made some modifications to your code for a more precise test of multi-thread time consumption, using a mutex and a spinlock respectively. I suppose the STL would consume a lot of CPU time because of its encapsulation. I also counted the number of loop iterations each thread completed under the lock. Here is my test result:

    liruqi@Apollo:~/Code$ gcc -Wall -pthread dthread.c -o dthread
    liruqi@Apollo:~/Code$ ./dthread
    Consumer TID 28072
    Consumer TID 28073
    Result – 1.72349
    pid: 28073 => loops: 1371800
    pid: 28072 => loops: 8628202

    liruqi@Apollo:~/Code$ gcc -Wall -pthread dthread.c -o dthread -DUSE_SPINLOCK
    liruqi@Apollo:~/Code$ ./dthread
    Consumer TID 28080
    Consumer TID 28081
    Result – 0.314979
    pid: 28081 => loops: 3204326
    pid: 28080 => loops: 6795676

From this data we can see that the spinlock version is at least 5 times faster.

    here is my code:
#include <stdio.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <errno.h>
    #include <sys/time.h>

    #define LOOPS 10000000

    int remains = LOOPS;
    int counter[32*1024];

    #ifdef USE_SPINLOCK
    pthread_spinlock_t spin;
    #define pthread_init() pthread_spin_init(&spin, 0)
    #define pthread_lock() pthread_spin_lock(&spin)
    #define pthread_unlock() pthread_spin_unlock(&spin)
    #define pthread_destroy() pthread_spin_destroy(&spin)
    #else
    pthread_mutex_t mutex;
    #define pthread_init() pthread_mutex_init(&mutex, 0)
    #define pthread_lock() pthread_mutex_lock(&mutex)
    #define pthread_unlock() pthread_mutex_unlock(&mutex)
    #define pthread_destroy() pthread_mutex_destroy(&mutex)
    #endif

    pid_t gettid() { return syscall( __NR_gettid ); }

    void *consumer(void *ptr)
    {
        unsigned int tid = gettid();
        printf("Consumer TID %u\n", tid);
        counter[tid] = 0;

        while (1)
        {
            pthread_lock();
            counter[tid] += 1;
            if (remains <= 0)
            {
                pthread_unlock();
                break;
            }
            remains--;
            pthread_unlock();
        }

        return NULL;
    }

    int main()
    {
        int i;
        pthread_t thr1, thr2;
        struct timeval tv1, tv2;

        pthread_init();

        gettimeofday(&tv1, NULL);

        pthread_create(&thr1, NULL, consumer, NULL);
        pthread_create(&thr2, NULL, consumer, NULL);

        pthread_join(thr1, NULL);
        pthread_join(thr2, NULL);

        gettimeofday(&tv2, NULL);

        if (tv1.tv_usec > tv2.tv_usec)
        {
            tv2.tv_sec--;
            tv2.tv_usec += 1000000;
        }

        printf("Result - %ld.%ld\n", tv2.tv_sec - tv1.tv_sec,
            tv2.tv_usec - tv1.tv_usec);
        for (i = 32*1024 - 1; i != 0; --i)
            if (counter[i])
                printf("pid: %d => loops: %d\n", i, counter[i]);

        pthread_destroy();

        return 0;
    }

  13. @Jimmy
Yes, indeed it is much faster on the right systems. Thanks for your code snippet. It is more elegant (although WordPress’s commenting system completely ruined it when you posted it).

  14. sam says:

According to my judgement, a spinlock performs well when a shared resource is mostly free (from the point of view of a thread).
    In other words: when the operations performed under the lock are so tiny that the concurrent threads’ locking overlap is near zero, expensive spins will also go towards zero. Then we have a performance gain because of the short reaction time of an active wait.
    I claim that this lock is reasonable on SMP machines in the situation mentioned above, when nThreads == nLogicalProcessors run in parallel during a scheduled time slice (no context switch shall occur without interaction) and can interleave access to a resource without being “scheduled out”, because a spinlock has no relation to the scheduler, unlike a mutex. (This should match Sandler’s performance test.)

But in the opposite case, when the lock time is long or varying, we have more overhead, consisting of CPU-time-wasting spins. Therefore a mutex performs better in this situation: the thread “slumbers” without causing extravagant overhead compared to a spinlock, but reacts more slowly (due to scheduling overhead).

For these reasons, I claim that software on non-SMP machines must use mutexes (spinlocks are conceptually not made for them).

How to implement SMP software:
    -1. Turn your monitor off
    0. Take pen and paper
    1. Draft a locking strategy on paper (don’t waste your time with questions like “mutex or spinlock”).
    For example:
    > You have a resource which is used simultaneously, but not necessarily at the same position: think about segmentation; split the whole thing up into independently lockable segments. You can formulate (in UML!?) abstract locking requests like: I, thread freddy, want access to this resource at this offset, length or position…
    > You have a resource which can be used by n threads simultaneously? Then you want a semaphore…
    2. Implement everything without spinlocks
    3. Test, stress-test and debug without spinlocks
    4. Does your software do what it needs to do? Really? Then go on
    5. Code profiling
    6. Replace mutexes with spinlocks where reasonable and profile again.

Greetings (my apologies if my English is decayed…)

  15. @sam
Although I totally agree with the first part of your comment, I totally disagree with the latter part. Thorough planning and designing well ahead is one thing that has proven itself multiple times. Writing the code with mutexes and then replacing them with spinlocks if needed is exactly the opposite.
    There are numerous books on multi-threaded programming, and each one of them stresses that locking and synchronization should be part of the initial design of a project.
    Whether to use a spinlock or a mutex often directly affects how you design your project.

  16. Thanu says:

    Hi Alex,

    Can you elaborate on your previous comment?

I do know that spinlocks and mutexes give differing performance for different kinds of workloads, but why is it not OK to simply swap one for the other at the end?

  17. @Thanu
There are various types of problems that may occur when switching from one to the other. One classical problem is called priority inversion. It occurs when you have two threads, one with low priority and one with high priority. The low priority thread holds the spinlock and gets preempted. The high priority thread kicks in and tries to acquire the spinlock, entering a busy loop waiting for it. Since the high priority thread consumes the CPU, the scheduler may not schedule the low priority thread as long as the high priority thread is busy – which is always.
    If the scheduler still decides to give some CPU time to the low priority thread, you will still see unusually high CPU consumption.
    Obviously none of this would be a problem with a mutex.

    Other problems that may occur are related to performance.

One sure way to avoid all of them is to plan your synchronization policy ahead of time.
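
    To make the scenario concrete, here is a rough sketch (mine, untested; it assumes Linux, requires root for SCHED_FIFO, and the two worker functions are hypothetical) of how such a pair of threads could be set up. Pinning both threads to one core makes the inversion likely:

    // Sketch: two real-time threads sharing one core -- the setup in
    // which a spinlock can cause priority inversion.
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    pthread_t spawn_fifo_thread(void *(*fn)(void *), int priority)
    {
        pthread_t thr;
        pthread_attr_t attr;
        struct sched_param param;
        cpu_set_t cpus;

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        param.sched_priority = priority;
        pthread_attr_setschedparam(&attr, &param);

        // Pin the thread to CPU 0 so both threads compete for one core.
        CPU_ZERO(&cpus);
        CPU_SET(0, &cpus);
        pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

        pthread_create(&thr, &attr, fn, NULL);
        pthread_attr_destroy(&attr);
        return thr;
    }

    // spawn_fifo_thread(low_prio_worker, 1);   /* takes the spinlock */
    // spawn_fifo_thread(high_prio_worker, 99); /* spins, starving it */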

  18. Rupesh KP says:

Can you give me a good example where only spinlocks can be used, and no mutexes?

  19. @Rupesh KP
In user-mode there’s no such thing – spinlocks make your code faster, but they can be substituted with other synchronization primitives.
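
    To illustrate the substitution, a spinlock-like primitive can even be approximated on top of a mutex. A sketch (mine; SPIN_TRIES is an arbitrary tuning knob, and glibc’s non-portable PTHREAD_MUTEX_ADAPTIVE_NP mutex type does something similar internally):

    #include <pthread.h>

    #define SPIN_TRIES 1000

    void spinnish_lock(pthread_mutex_t *m)
    {
        int i;

        // Busy-wait briefly, hoping the owner releases the lock soon...
        for (i = 0; i < SPIN_TRIES; i++)
            if (pthread_mutex_trylock(m) == 0)
                return;

        // ...then fall back to sleeping in the kernel like a plain mutex.
        pthread_mutex_lock(m);
    }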

  20. […] mutex vs spinlock « 15/6/2010 […]

  21. Nihs says:

    Hi Alexander,
    Nice post. This got me thinking.
From Jimmy’s post, the spinlock results were:
    pid: 28081 => loops: 3204326
    pid: 28080 => loops: 6795676

I would imagine both values should be similar, if not equal, assuming both cores have the same processing power. Or maybe one of the cores had other processes or threads running as well?

  22. @Nihs
Indeed, this is an interesting question. I don’t think you can run two processors (or cores) at different speeds on x86 – it just doesn’t make sense to have this ability.
    I think if you ran the code on a single-CPU machine, you’d get a result like this. However, I can’t tell you for sure.
    Perhaps Jimmy can try to explain the result himself, if he’s still subscribed to comments on this post. Jimmy, can you?

  23. Mark says:

    I suspect that the difference seen is more likely to be an artifact of the scheduler, or is just a visualization of other higher priority threads executing on one of the cores (e.g., hardware IRQs).

    A lot of Linux distributions don’t enable irq balancing out of the box, so CPU 0 will service nearly all of the hardware interrupts. This can significantly reduce the amount of user time that CPU 0 will execute.

  24. Alex Turner says:

I know this went cold months ago, but I only just got here.

Something which everyone seems to be missing is how spin locks work. On an SMP machine, the spin lock needs to issue a memory barrier to force synchronization of the lock value in memory, or else the lock can let more than one thread through.

Different machine architectures have very different approaches to building memory barriers. Also, the cost of reaching main memory will differ between machines which outwardly look like they have exactly the same architecture.

As more and more CPUs are brought online and schedulers become more clever (with thread groups etc.), one could imagine a situation where a mutex is the way to go for small numbers of CPUs, spin locks for large numbers, and mutexes again for very large numbers.

I suspect the only really safe solutions are either to tune the compilation to the machine, or to put the lock code in objects which all implement a particular interface. The actual lock implementation can then be hot-swapped at run time based on metrics.
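
    Something like this, perhaps (just a sketch of mine; the class names are made up):

    // Sketch: hide the lock implementation behind an interface so it can
    // be swapped at run time based on metrics.
    #include <pthread.h>

    class Lock
    {
    public:
        virtual ~Lock() {}
        virtual void acquire() = 0;
        virtual void release() = 0;
    };

    class MutexLock : public Lock
    {
        pthread_mutex_t m_;
    public:
        MutexLock()  { pthread_mutex_init(&m_, NULL); }
        ~MutexLock() { pthread_mutex_destroy(&m_); }
        void acquire() { pthread_mutex_lock(&m_); }
        void release() { pthread_mutex_unlock(&m_); }
    };

    class SpinLock : public Lock
    {
        pthread_spinlock_t s_;
    public:
        SpinLock()  { pthread_spin_init(&s_, PTHREAD_PROCESS_PRIVATE); }
        ~SpinLock() { pthread_spin_destroy(&s_); }
        void acquire() { pthread_spin_lock(&s_); }
        void release() { pthread_spin_unlock(&s_); }
    };

    // A factory could then pick the implementation from runtime metrics:
    // Lock *lock = metrics_favor_spinning() ? (Lock *)new SpinLock()
    //                                       : (Lock *)new MutexLock();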

    Best wishes – AJ

  25. tr00per says:

It seems the argument is invalidated if -O3 is given. Here are example runs on my machine:

    $ g++ -o threadTest_spin -DUSE_SPINLOCK -lpthread threadTest.cpp
    $ g++ -o threadTest_mutex -lpthread threadTest.cpp
    $ ./threadTest_mutex
    Thread test using mutex.
    Consumer TID 6838
    Consumer TID 6837
    Result – 2.638293
    $ ./threadTest_spin
    Thread test using spin lock.
    Consumer TID 6847
    Consumer TID 6848
    Result – 1.926722

    $ g++ -o threadTest_spin -DUSE_SPINLOCK -O3 -lpthread threadTest.cpp
    $ g++ -o threadTest_mutex -O3 -lpthread threadTest.cpp
    $ ./threadTest_mutex
    Thread test using mutex.
    Consumer TID 6821
    Consumer TID 6822
    Result – 1.589208
    $ ./threadTest_spin
    Thread test using spin lock.
    Consumer TID 6824
    Consumer TID 6825
    Result – 1.667864

    2.6.38-ARCH x86_64 Intel(R) Core(TM)2 CPU 4400 @ 2.00GHz, g++ 4.6.0 20110513

  26. Hasan says:

This gives a very good understanding of the spinlock versus the mutex. After reading the comments, my understanding is that spinlocks should not be used in user mode on a uni-processor system. But if the system is multi-core, can they be beneficial in practice, I mean when writing a full-fledged application? The example program clearly suggests that a spinlock is a better bet in user mode as well. Still, I have not found anybody using spinlocks in user programs; they seem best used in kernel modules. What do you say?

  27. Erik says:

    Thanks for the example! It compiled and worked great. Answered the exact question I’ve been struggling with. Keep up the good work :-)

  28. JT says:

    Hello,

    Good article Alex.

A spinlock will be faster than a mutex. A spinlock is a mechanism that gives a thread the ability to not get pre-empted (or blocked) by the kernel. The side effect of this is that a thread waiting on a spinlock is more responsive, at the cost of CPU (since it will be spinning). Since the thread is more responsive, it will execute its instructions in less wall clock time.

    A point to be noted is that a spinlock will not spin infinitely (the spin count is not documented), so in a worst case scenario a spinlock downgrades to a mutex.
    Replacing all the mutex locks with spinlocks will only improve performance –
    not degrade it.

    Somebody wrongly noted that a spinlock will lead to “priority inversion” scenarios. This is not true.

The argument that spinlocks are useless on single processor machines, though correct, should not be taken too seriously since there are not many single processor machines around anymore.

There are numerous cases where spinlocks are applicable.
    A simple example is a message processor app.
    E.g.: thread T1 reads from a socket and writes to a queue Q; thread T2 reads from Q and does some processing. In this case the lock on Q is very short-lived, and the threads T1 and T2 have to be very responsive.
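
    In code, the pattern could look roughly like this (a sketch only; the socket reading, initialization and error handling are elided):

    // Sketch of the message processor: T1 pushes messages, T2 pops them.
    // The critical section is a single push or pop, so the lock is held
    // very briefly -- the case where a spinlock can pay off.
    // pthread_spin_init(&qlock, PTHREAD_PROCESS_PRIVATE) must run first.
    #include <pthread.h>
    #include <queue>
    #include <string>

    std::queue<std::string> q;
    pthread_spinlock_t qlock;

    void enqueue(const std::string &msg)   // called by T1 after recv()
    {
        pthread_spin_lock(&qlock);
        q.push(msg);
        pthread_spin_unlock(&qlock);
    }

    bool dequeue(std::string &msg)         // polled by T2
    {
        bool ok = false;

        pthread_spin_lock(&qlock);
        if (!q.empty())
        {
            msg = q.front();
            q.pop();
            ok = true;
        }
        pthread_spin_unlock(&qlock);
        return ok;
    }

    Note that the std::string copies happen inside the critical section, so in a real application the messages should be cheap to copy or the lock hold time grows.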

    enjoy

  29. @tr00per
I think we will have to analyze what the -O3 optimization did in order to figure out why you got the numbers you got.

  30. @JT
    Thanks for backing me up and you are absolutely right.

  31. […] explore both spinlock and mutex. Although using mutex is usually safer and more portable, spinlock may be faster (careless micro-benchmarks tend to prefer spinlock actually). Nonetheless, let me emphasize again […]

  32. Juan Bolsa says:

I didn’t know the real workings of the spinlock; this is much better than a mutex. Thanks for sharing.

  33. @Juan Bolsa
    Well, it is not exactly better. It is different. Use it carefully.

  34. Val says:

    Alex,
Probably a better example of spinlock advantages would be reader-writer synchronization, where a reader quickly reads in small chunks and a writer rarely changes something. In this case the lock is almost always free, so taking the spinlock costs almost no time, while locking a mutex requires a system call anyway – that’s my guess. I can’t test it, I’m on Windows (sorry).
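
    Something like this, perhaps (untested, for the reason above):

    // Sketch: read-mostly data guarded by a spinlock. The reader's
    // critical section is a couple of instructions, so the lock is
    // almost always free and almost never actually spins.
    #include <pthread.h>

    pthread_spinlock_t lock;   // pthread_spin_init() at startup
    int shared_value;          // read often, written rarely

    int read_value(void)       // hot path
    {
        int v;

        pthread_spin_lock(&lock);
        v = shared_value;
        pthread_spin_unlock(&lock);
        return v;
    }

    void write_value(int v)    // cold path
    {
        pthread_spin_lock(&lock);
        shared_value = v;
        pthread_spin_unlock(&lock);
    }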

  35. Thomas K says:

Hi, nice article! I have a few follow-up questions:

1) How do you ensure that each thread runs on a different CPU core?
    2) Is it possible to set which core each created thread uses?
    3) Is there some kind of round-robin core assignment for each newly created thread? If so, where is this assignment made – somewhere in the Linux kernel sources?

    Best regards

[…] In a homogeneous multi-core environment, a spinlock is a good fit when the critical section is short, because it saves context-switch time. (On a single core the comparison is largely meaningless, since some systems implement spinlocks by disabling interrupts.) […]

[…] Performance seems to degrade when there are far more threads than cores. The following is from pthread mutex vs pthread spinlock […]

  38. apollo says:

    @Alexander : Thank you for this useful post.

@All Others : It seems to be a typical problem of the dotnet generation that people have completely lost the feeling for what’s really going on in today’s multi-core systems. Context switching and userspace-to-kernel switching time versus execution time inside the spinlock isn’t that hard to understand, and this ratio alone defines whether a spinlock is preferable to a conventional mutex. Any reply to this nice article that relates to single core systems is just a big fail.

  39. Shai Tahar says:

I’ve updated the code to set thread affinity, making sure the threads run on different cores:

void *consumer(void *ptr)
    {
        printf("Consumer TID %lu\n", (unsigned long)gettid());

        // Pin this thread to core 0 or 1, depending on its TID.
        // (Requires #define _GNU_SOURCE before the includes, and <sched.h>.)
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(gettid() % 2 ? 0 : 1, &cpuset);

        int s = pthread_setaffinity_np(pthread_self(),
                                       sizeof(cpu_set_t), &cpuset);
        if (s != 0)
            printf("pthread_setaffinity_np - %d ", s);

        /* ... rest of consumer() as in the original ... */

Without it, both threads were executed on the same core – in this case spinlocks are inferior to mutexes.

  40. manju says:

How do I run your spinlock and mutex code on a single-board ARM9? Please explain the steps.

@manju – Spinlocks are useless on single-CPU machines. Use a mutex.

  42. […] 同質多核心的環境下,Critical Section 所花費的時間如果不多,適合 Spinlock,因為我們可以減少 Context Switch 的時間。(單核心其實兩者的比較沒有太多意義,因為某些系統會透過中段的開關實作 Spinlock) […]

  43. […] with Mecki’s suggestion, this article pthread mutex vs pthread spinlock on Alexander Sandler’s blog, Alex on Linux shows how the spinlock & mutexes can be […]

  44. Zebedee Dillon says:

You are printing out the timing result wrongly. If you want to avoid using floating point, as it seems you do, then you need to print using the format %ld.%06ld. In particular, the mutex time 7.99794s should really be 7.099794s.
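
    That is, the print statement should read:

    printf("Result - %ld.%06ld\n", tv2.tv_sec - tv1.tv_sec,
        tv2.tv_usec - tv1.tv_usec);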
