By: Curtis Maloney

Curtis Maloney — Tue, 25 Oct 2011 05:32:40 +0000

Given JIT technologies like those developed by the MIT team and later used by Transmeta, a slight rethink of your approach could yield improvements nonetheless.

They found that by using various JIT techniques [at the time, quite cutting edge] to “translate” code into the same machine code it started as (from memory, it was an HP PA-9000), they still achieved significant (up to 30%?) improvements due to, basically, “perfect” profiling feedback.

Now, if you could take something like that, allowing you to duplicate and specialise functions, reorganise branches and hot code, and inline library calls… as well as identify parallelisable code [as some compilers do now, albeit typically at an intermediate stage, not machine code], you may find many cores combining to yield an overall gain in performance.
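To give a feel for "duplicate and specialise functions", here is a minimal hand-written C sketch. It is only a stand-in for what a dynamic optimiser might emit after profiling, not how any particular system works; the names `power`, `power_2` and `power_dispatch`, and the assumption that an exponent of 2 turned out to be the hot case, are all invented for the example.

```c
/* Generic routine: raises base to a non-negative integer exponent. */
double power(double base, unsigned exponent)
{
    double result = 1.0;
    for (unsigned i = 0; i < exponent; i++) {
        result *= base;
    }
    return result;
}

/* Specialised copy for the (assumed) hot case exponent == 2:
   the loop and its branches disappear entirely. */
static double power_2(double base)
{
    return base * base;
}

/* Stand-in for the guard an optimiser would place in front of the
   specialised copy: take the fast path when the profiled assumption
   holds, otherwise fall back to the generic code. */
double power_dispatch(double base, unsigned exponent)
{
    if (exponent == 2) {
        return power_2(base);
    }
    return power(base, exponent);
}
```

Calling `power_dispatch(3.0, 2)` takes the specialised path, while any other exponent falls through to the generic routine, so callers see the same results either way.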

Thinking about it, you could also include threads performing smarter cache warming and pre-fetches than the CPU core could be expected to manage [tag code traces, perhaps?].

Anyway…

By: David Brown

David Brown — Mon, 25 May 2009 07:59:33 +0000

Actually, there are many types of problem that are inherently single-threaded. The most obvious case for the mass market is games (though I don’t know enough about games design to say why). Some parts of games software can be split into multiple threads, but the main thread is still the bottleneck, and an SMP processor can only get about 25% more speed than a single-threaded processor could. Many types of simulation and optimisation problems have the same limitations – because each step builds upon previous steps, you can’t parallelise the algorithms effectively.
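To make the "each step builds upon previous steps" point concrete, here is a small C sketch with made-up function names. In `iterate_map`, every iteration needs the value produced by the one before it, so the iterations cannot simply be handed out to different cores. In `scale_all`, each element is independent, so the work could be divided.

```c
#include <stddef.h>

/* Hypothetical sequential simulation: each new x is computed from the
   previous x (a logistic-map step), so step k+1 cannot start before
   step k has finished. */
double iterate_map(double x0, double r, int steps)
{
    double x = x0;
    for (int k = 0; k < steps; k++) {
        x = r * x * (1.0 - x);   /* loop-carried dependency on x */
    }
    return x;
}

/* By contrast, each element here is updated independently of the
   others, so the index range could be split across cores. */
void scale_all(double *values, size_t count, double factor)
{
    for (size_t i = 0; i < count; i++) {
        values[i] *= factor;
    }
}
```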

But as you say, it’s very difficult getting multiple processors to improve the performance of a single thread. One method is OpenMP, which is getting stronger support in modern compilers – it lets you write the program as a single thread, with multiple threads for things like loops. It’s easier to use than writing explicitly multi-threaded programs, but still complex. Intel has been doing a lot of work recently on making their compilers generate such multi-threaded loops automatically.
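As a rough illustration of the OpenMP style described above, here is a minimal C sketch. The function name `sum_of_squares` is made up for the example; the only OpenMP feature used is the standard `parallel for` directive with a `reduction` clause. The program is still written as ordinary single-threaded code, and the pragma marks the loop as one that may be split across threads.

```c
/* Hypothetical example function: sums i*i for i in [0, n). */
long long sum_of_squares(long long n)
{
    long long total = 0;

    /* The iterations do not depend on each other, so OpenMP may divide
       the range among threads. reduction(+:total) gives each thread a
       private partial sum and adds the partial sums at the end. */
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; i++) {
        total += i * i;
    }
    return total;
}
```

With GCC this would be built with `-fopenmp`; without OpenMP support the pragma is ignored and the loop runs on one thread with the same result.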

Theoretically, it would be possible to do such loop parallelisation in the CPU hardware, as an extension to branch prediction and superscalar execution. But it could only work practically on a small scale, using more execution units within a core rather than multiple cores, and even then I’m not sure the overheads would be worth it.

Comments on: A new kind of virtualization
