A new kind of virtualization

There are plenty of virtualization technologies. Products such as VMware, VirtualBox and Xen allow one to run several virtual computers on a single physical computer.

I worked at a company called ScaleMP. ScaleMP’s technology, vSMP, turns multiple physical computers into one large computer.

Today I was looking for something different. I was looking for technology that turns a multi-core physical computer into a virtual computer with a single core, whose power is the combined power of all the physical CPUs. I.e. take two CPUs with four cores each, each core doing X flops, and turn them into a single CPU with a single core that does 8X flops.

In case you wonder why on earth someone would need such a program, here’s what I was thinking about. Imagine you have a single-threaded program. Let’s also assume that the program is difficult to change (or you only have its binary).

Being single-threaded implies a scalability problem. This is because the only thing you can do to improve the program’s performance is to make the CPU it runs on faster. But modern CPUs have reached their limit in terms of single-core processing power. To make CPUs faster, processor vendors add more cores – splitting the CPU in two, four, eight, etc. Yet they cannot make a single core significantly faster. Hence we’re stuck.

Such super-CPU virtualization technology could solve the problem, but it appears that no such technology exists. There is not a single firm doing anything that even remotely resembles it.

Then I thought that, actually, such a company cannot exist. It’s a combination of technological innovation and development cost that simply cannot coexist.

Think about it. The technology behind a virtualized CPU would have to split a single stream of execution into several streams. But this is exactly what modern CPUs do already. A processor that utilizes branch prediction tries to predict the most probable path of execution of a thread and speculatively executes several chunks of code simultaneously.

In case you’re wondering if a processor in your laptop computer does branch prediction, the answer is probably yes.

Because CPUs already use branch prediction, and perhaps other technologies that speed up code execution, making our super CPU do the job even faster would be a very difficult task. A task so difficult that it would probably require some of the best minds in the world to join up. And the best minds cost a lot of money.

On the other hand, the market for such a product is relatively small. There are not many programs out there that cannot be optimized to run on SMP, so the income from such a product wouldn’t be great. Eventually, such a company would go bankrupt.

Unless, of course, there are other possible uses for such technology.



  1. David Brown says:

    Actually, there are many types of problem that are inherently single-threaded. The most obvious case for the mass market is games (though I don’t know enough about games design to say why). Some parts of games software can be split into multiple threads, but the main thread is still the bottleneck, and an SMP processor can only get about 25% more speed than a single thread processor could. Many types of simulation and optimisation problems have the same limitations – because each step builds upon previous steps, you can’t parallelise the algorithms effectively.

    But as you say, it’s very difficult getting multiple processors to improve the performance of a single thread. One method is OpenMP, which is getting stronger support in modern compilers – it lets you write the program as a single thread, with multiple threads for things like loops. It’s easier to use than writing explicitly multi-threaded programs, but still complex. Intel has been doing a lot of work recently on making their compilers generate such multi-threaded loops automatically.

    Theoretically, it would be possible to do such loop parallelisation in the CPU hardware, as an extension to branch prediction and super-scalar execution. But it could only work practically on a small scale, using more execution units within a core rather than multiple cores, and even then I’m not sure the overheads would be worth it.

  2. Curtis Maloney says:

    Given the JIT technologies, like those developed by the MIT team later used by Transmeta, a slight rethink of your approach could nonetheless yield improvements.

    They found that by using various JIT techniques [at the time, quite cutting edge], while “translating” code to the same machine code (from memory, it was a HP PA-9000), they still achieved significant (up to 30%?) improvements due to, basically, “perfect” profiling feedback.

    Now, if you could take something like that, allowing you to duplicate and specialise functions, re-organise branches and hot code, and inline library calls… as well as identify parallelisable code [as some compilers do now, albeit typically at an intermediate stage, not machine code], you may find many cores combining to yield an overall gain in performance.

    Thinking about it, you could also include threads performing smarter cache warming and pre-fetches than the CPU core could be expected to manage [tag code traces, perhaps?].

