<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A new kind of virtualization</title>
	<atom:link href="http://www.alexonlinux.com/a-new-kind-of-virtualization/feed" rel="self" type="application/rss+xml" />
	<link>http://www.alexonlinux.com/a-new-kind-of-virtualization</link>
	<description></description>
	<lastBuildDate>Sun, 05 Feb 2012 21:17:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: Curtis Maloney</title>
		<link>http://www.alexonlinux.com/a-new-kind-of-virtualization/comment-page-1#comment-27603</link>
		<dc:creator>Curtis Maloney</dc:creator>
		<pubDate>Tue, 25 Oct 2011 05:32:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.alexonlinux.com/?p=1247#comment-27603</guid>
		<description>Given JIT technologies like those developed by the MIT team later used by Transmeta, a slight rethink of your approach could yield improvements nonetheless.

They found that by using various JIT techniques [at the time, quite cutting edge], even while &quot;translating&quot; to the same machine code (from memory, it was an HP PA-9000), they still achieved significant improvements (up to 30%?), basically due to &quot;perfect&quot; profiling feedback.

Now, if you could take something like that, allowing you to duplicate and specialise functions, reorganise branches and hot code, and inline library calls, as well as identify parallelisable code [as some compilers do now, albeit typically at an intermediate stage rather than on machine code], you may find many cores combining to yield an overall gain in performance.

Thinking about it, you could also include threads that perform smarter cache warming and prefetching than the CPU core itself could be expected to manage [tag code traces, perhaps?]

Anyway...</description>
		<content:encoded><![CDATA[<p>Given JIT technologies like those developed by the MIT team later used by Transmeta, a slight rethink of your approach could yield improvements nonetheless.</p>
<p>They found that by using various JIT techniques [at the time, quite cutting edge], even while &#8220;translating&#8221; to the same machine code (from memory, it was an HP PA-9000), they still achieved significant improvements (up to 30%?), basically due to &#8220;perfect&#8221; profiling feedback.</p>
<p>Now, if you could take something like that, allowing you to duplicate and specialise functions, reorganise branches and hot code, and inline library calls, as well as identify parallelisable code [as some compilers do now, albeit typically at an intermediate stage rather than on machine code], you may find many cores combining to yield an overall gain in performance.</p>
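<p>As a rough illustration of &#8220;duplicate and specialise functions&#8221; driven by profile data, here is a hand-written C sketch. It is our own and is not how Transmeta or any particular JIT works. A dispatcher counts how often it is called with n == 2. Once that case is hot, it routes it to a specialised clone behind a guard. All names and the threshold of 1000 are hypothetical.</p>
<pre>
```c
/* Hypothetical sketch of profile-driven function specialisation.
   power_generic, power_sq, profiled_power and HOT_THRESHOLD are
   illustrative names, not part of any real JIT. */

#define HOT_THRESHOLD 1000  /* assumed number of n == 2 calls before specialising */

/* Generic version: x raised to a non-negative integer power n. */
static double power_generic(double x, int n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= x;
    return r;
}

/* Specialised clone for the value the profile says is common (n == 2). */
static double power_sq(double x)
{
    return x * x;
}

/* Profile counter for calls with n == 2. */
static long seen_n2 = 0;

/* Dispatcher: counts n == 2 calls until that case is "hot", then sends it
   to the specialised clone behind a guard (n == 2); every other n falls
   back to the generic code. */
double profiled_power(double x, int n)
{
    if (n == 2) {
        if (seen_n2 >= HOT_THRESHOLD)
            return power_sq(x);   /* fast path once the profile has settled */
        ++seen_n2;                /* still collecting profile data */
    }
    return power_generic(x, n);
}
```
</pre>
<p>A real dynamic optimiser would work on machine code at run time rather than in source. The sketch only shows the guard-plus-fallback idea: a cheap check protects the specialised path, and anything unexpected takes the generic route.</p>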
<p>Thinking about it, you could also include threads that perform smarter cache warming and prefetching than the CPU core itself could be expected to manage [tag code traces, perhaps?]</p>
<p>Anyway&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Brown</title>
		<link>http://www.alexonlinux.com/a-new-kind-of-virtualization/comment-page-1#comment-22543</link>
		<dc:creator>David Brown</dc:creator>
		<pubDate>Mon, 25 May 2009 07:59:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.alexonlinux.com/?p=1247#comment-22543</guid>
		<description>Actually, there are many types of problem that are inherently single-threaded. The most obvious case for the mass market is games (though I don&#039;t know enough about games design to say why). Some parts of game software can be split into multiple threads, but the main thread is still the bottleneck, and a multi-core (SMP) processor can only get about 25% more speed than a single-core processor could. Many simulation and optimisation problems have the same limitation: because each step builds on the previous ones, you can&#039;t parallelise the algorithms effectively.

But as you say, it&#039;s very difficult to get multiple processors to improve the performance of a single thread. One method is OpenMP, which is gaining stronger support in modern compilers: it lets you write the program as a single thread and mark sections, such as loops, to be run across multiple threads. It&#039;s easier than writing explicitly multi-threaded programs, but still complex. Intel has recently been doing a lot of work on making its compilers generate such multi-threaded loops automatically.

Theoretically, it would be possible to do such loop parallelisation in the CPU hardware, as an extension of branch prediction and superscalar execution. But it could only work practically on a small scale, using more execution units within a core rather than multiple cores, and even then I&#039;m not sure the overheads would be worth it.</description>
		<content:encoded><![CDATA[<p>Actually, there are many types of problem that are inherently single-threaded. The most obvious case for the mass market is games (though I don&#8217;t know enough about games design to say why). Some parts of game software can be split into multiple threads, but the main thread is still the bottleneck, and a multi-core (SMP) processor can only get about 25% more speed than a single-core processor could. Many simulation and optimisation problems have the same limitation: because each step builds on the previous ones, you can&#8217;t parallelise the algorithms effectively.</p>
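<p>One standard way to quantify the main-thread bottleneck described above is Amdahl&#8217;s law. This is our addition; the comment does not mention it. Suppose a fraction p of the single-threaded run time can be parallelised perfectly across n processors and the rest stays serial. The speedup is then 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). A minimal C sketch (function names are hypothetical):</p>
<pre>
```c
/* Amdahl's law: speedup on n processors when a fraction p (0 to 1) of the
   single-threaded run time parallelises perfectly and (1 - p) stays serial. */
double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

/* Upper bound as n grows without limit: only the serial part remains.
   Assumes p is strictly below 1. */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```
</pre>
<p>Read that way, a ceiling of about 25% extra speed corresponds to roughly 20% of the run time being parallelisable. That is because amdahl_limit(0.2) is 1 / 0.8 = 1.25, however many cores are added.</p>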
<p>But as you say, it&#8217;s very difficult to get multiple processors to improve the performance of a single thread. One method is OpenMP, which is gaining stronger support in modern compilers: it lets you write the program as a single thread and mark sections, such as loops, to be run across multiple threads. It&#8217;s easier than writing explicitly multi-threaded programs, but still complex. Intel has recently been doing a lot of work on making its compilers generate such multi-threaded loops automatically.</p>
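<p>To make the OpenMP point concrete, here is a minimal C sketch (function names are hypothetical). The first loop has independent iterations, so a single <code>#pragma omp parallel for</code> line with a <code>reduction</code> clause lets the compiler split it across threads. The second loop is a toy recurrence, the logistic map, chosen only as an example. Every step needs the previous result, which is the kind of step-by-step dependency mentioned earlier.</p>
<pre>
```c
/* Sketch: which loops OpenMP's "parallel for" can split across threads.
   Compile with -fopenmp (GCC/Clang); without it the pragma is ignored
   and the code simply runs on one thread with the same results. */

/* Independent iterations: each a[i] is used on its own, so the iterations
   can be divided among threads. The reduction clause gives each thread a
   private partial sum and combines the partial sums at the end. */
long sum_of_squares(const int *a, long n)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += (long)a[i] * a[i];
    return total;
}

/* Loop-carried dependency: step i needs the x produced by step i - 1,
   so the iterations cannot be handed out to threads. Adding the same
   pragma here would create a data race on x and give unreliable results. */
double iterate_logistic(double x, double r, long steps)
{
    for (long i = 0; i < steps; ++i)
        x = r * x * (1.0 - x);
    return x;
}
```
</pre>
<p>Built with <code>-fopenmp</code> (GCC or Clang), the first loop runs on several threads. Built without it, the pragma is ignored and both functions run serially with the same results.</p>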
<p>Theoretically, it would be possible to do such loop parallelisation in the CPU hardware, as an extension of branch prediction and superscalar execution. But it could only work practically on a small scale, using more execution units within a core rather than multiple cores, and even then I&#8217;m not sure the overheads would be worth it.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

