<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
					xmlns:content="http://purl.org/rss/1.0/modules/content/"
					xmlns:wfw="http://wellformedweb.org/CommentAPI/"
					xmlns:atom="http://www.w3.org/2005/Atom"
				  >
<channel>
<atom:link rel="self"  type="application/rss+xml"  href="http://rulinux.net/rss_from_sect_4_subsect_10_thread_42979"  />
<title>rulinux.net - Forum - Talks - Captain Obvious: all CPUs are different, after all!</title>
<link>http://rulinux.net/</link>
<description><![CDATA[A portal about GNU/Linux and more]]></description>
<image><title>rulinux.net - Forum - Talks - Captain Obvious: all CPUs are different, after all!</title>
<link>http://rulinux.net/</link>
<url>http://rulinux.net/rss_icon.png</url>
</image>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221151</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221151</guid>
<pubDate>Tue, 05 Sep 2017 11:17:49 +0300</pubDate>
<description><![CDATA[<p><fieldset><legend>bash</legend><div class="highlight bash"><br />
<br />
Installed:<br />
&nbsp; clang.x86_64 4.0.1-4.fc26<br />
<br />
Complete<span class="sy0">!</span><br />
<span class="br0">&#91;</span>root<span class="sy0">@</span>localhost ~<span class="br0">&#93;</span><span class="co0"># exit</span><br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ <span class="kw2">grep</span> <span class="st_h">'model na'</span> <span class="sy0">/</span>proc<span class="sy0">/</span>cpuinfo <br />
model name&nbsp; &nbsp; &nbsp; : Intel<span class="br0">&#40;</span>R<span class="br0">&#41;</span> Core<span class="br0">&#40;</span>TM<span class="br0">&#41;</span> i5-3337U CPU <span class="sy0">@</span> 1.80GHz<br />
model name&nbsp; &nbsp; &nbsp; : Intel<span class="br0">&#40;</span>R<span class="br0">&#41;</span> Core<span class="br0">&#40;</span>TM<span class="br0">&#41;</span> i5-3337U CPU <span class="sy0">@</span> 1.80GHz<br />
model name&nbsp; &nbsp; &nbsp; : Intel<span class="br0">&#40;</span>R<span class="br0">&#41;</span> Core<span class="br0">&#40;</span>TM<span class="br0">&#41;</span> i5-3337U CPU <span class="sy0">@</span> 1.80GHz<br />
model name&nbsp; &nbsp; &nbsp; : Intel<span class="br0">&#40;</span>R<span class="br0">&#41;</span> Core<span class="br0">&#40;</span>TM<span class="br0">&#41;</span> i5-3337U CPU <span class="sy0">@</span> 1.80GHz<br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ <span class="kw2">gcc</span> <span class="re5">--version</span><br />
<span class="kw2">gcc</span> <span class="br0">&#40;</span>GCC<span class="br0">&#41;</span> 7.1.1 20170622 <span class="br0">&#40;</span>Red Hat 7.1.1-3<span class="br0">&#41;</span><br />
Copyright <span class="br0">&#40;</span>C<span class="br0">&#41;</span> 2017 Free Software Foundation, Inc.<br />
This is free software; see the source for copying conditions. There is NO<br />
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ clang <span class="re5">--version</span><br />
clang version 4.0.1 <span class="br0">&#40;</span>tags<span class="sy0">/</span>RELEASE_401<span class="sy0">/</span>final<span class="br0">&#41;</span><br />
Target: x86_64-unknown-linux-gnu<br />
Thread model: posix<br />
InstalledDir: <span class="sy0">/</span>usr<span class="sy0">/</span>bin<br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ <span class="kw2">gcc</span> <span class="re5">-O2</span> <span class="re5">-msse2</span> x.c <span class="re5">-o</span> x<br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ .<span class="sy0">/</span>x <br />
memset: 12.233878<br />
fast_zero1: 14.054232<br />
fast_zero2: 25.234491<br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ clang <span class="re5">-O2</span> <span class="re5">-msse2</span> x.c <span class="re5">-o</span> x_clang<br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ .<span class="sy0">/</span>x_clang <br />
memset: <span class="nu0">11.982776</span><br />
fast_zero1: <span class="nu0">14.160926</span><br />
fast_zero2: <span class="nu0">11.924667</span><br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ .<span class="sy0">/</span>x_clang <br />
memset: <span class="nu0">13.354430</span><br />
fast_zero1: <span class="nu0">14.112488</span><br />
fast_zero2: <span class="nu0">13.005126</span><br />
<span class="br0">&#91;</span>vilfred<span class="sy0">@</span>localhost devel<span class="br0">&#93;</span>$ <br />
<br />
&nbsp;</div></fieldset></p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221150</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221150</guid>
<pubDate>Tue, 05 Sep 2017 04:59:31 +0300</pubDate>
<description><![CDATA[<p>There you go, same as on the Atom. The stream variant even turned out to be faster. If only there were an expert around to explain this.</p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221148</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221148</guid>
<pubDate>Mon, 04 Sep 2017 21:25:52 +0300</pubDate>
<description><![CDATA[<p>&gt; results for other architectures are welcome
<br><br>
Need a Phenom?
<br>
<fieldset><legend>bash</legend><div class="highlight bash"><br />
$ <span class="kw2">grep</span> <span class="st_h">'model na'</span> <span class="sy0">/</span>proc<span class="sy0">/</span>cpuinfo <br />
model name &nbsp; &nbsp; &nbsp;: AMD Phenom<span class="br0">&#40;</span>tm<span class="br0">&#41;</span> II X4 945 Processor<br />
model name &nbsp; &nbsp; &nbsp;: AMD Phenom<span class="br0">&#40;</span>tm<span class="br0">&#41;</span> II X4 945 Processor<br />
model name &nbsp; &nbsp; &nbsp;: AMD Phenom<span class="br0">&#40;</span>tm<span class="br0">&#41;</span> II X4 945 Processor<br />
model name &nbsp; &nbsp; &nbsp;: AMD Phenom<span class="br0">&#40;</span>tm<span class="br0">&#41;</span> II X4 945 Processor<br />
<br />
$ <span class="kw2">gcc</span> <span class="re5">--version</span><br />
<span class="kw2">gcc</span> <span class="br0">&#40;</span>GCC<span class="br0">&#41;</span> 7.1.1 20170622 <span class="br0">&#40;</span>Red Hat 7.1.1-3<span class="br0">&#41;</span><br />
<br />
$ clang <span class="re5">--version</span><br />
clang version 4.0.0 <span class="br0">&#40;</span>tags<span class="sy0">/</span>RELEASE_400<span class="sy0">/</span>final<span class="br0">&#41;</span><br />
Target: x86_64-unknown-linux-gnu<br />
Thread model: posix<br />
<br />
<br />
$ <span class="kw2">gcc</span> <span class="re5">-O2</span> <span class="re5">-msse2</span> msse2_test.c <span class="re5">-o</span> msse2_test<br />
<br />
$ .<span class="sy0">/</span>msse2_test <br />
memset: 18.638798<br />
fast_zero1: 16.737441<br />
fast_zero2: 18.569926<br />
<br />
$ clang <span class="re5">-O2</span> <span class="re5">-msse2</span> msse2_test.c <span class="re5">-o</span> msse2_test_clang<br />
<br />
$ .<span class="sy0">/</span>msse2_test_clang<br />
memset: <span class="nu0">19.404087</span><br />
fast_zero1: <span class="nu0">16.751405</span><br />
fast_zero2: <span class="nu0">19.114861</span><br />
<br />
$ .<span class="sy0">/</span>msse2_test_clang<br />
memset: <span class="nu0">18.794867</span><br />
fast_zero1: <span class="nu0">16.618179</span><br />
fast_zero2: <span class="nu0">18.869342</span><br />
<br />
&nbsp;</div></fieldset></p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221144</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221144</guid>
<pubDate>Mon, 04 Sep 2017 11:42:03 +0300</pubDate>
<description><![CDATA[<p>Already ran it. The differences are negligible.</p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221143</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221143</guid>
<pubDate>Mon, 04 Sep 2017 09:28:44 +0300</pubDate>
<description><![CDATA[<p>It won't, 100% sure.</p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221141</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221141</guid>
<pubDate>Mon, 04 Sep 2017 06:39:36 +0300</pubDate>
<description><![CDATA[<p>It'll be the same crap with clang. I guarantee it.</p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221140</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221140</guid>
<pubDate>Mon, 04 Sep 2017 03:34:02 +0300</pubDate>
<description><![CDATA[<p>Eh, my guess is that gcc built some kind of garbage.</p>]]></description>
</item>
<item>
<title>Re: Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221137</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221137</guid>
<pubDate>Sun, 03 Sep 2017 19:11:49 +0300</pubDate>
<description><![CDATA[<p>Intel G3220, compiled with gcc (gcc -O2 -msse2 main.c). <br><br> memset: 6.919018<br><br>fast_zero1: 7.359665<br><br>fast_zero2: 12.774364 </p>]]></description>
</item>
<item>
<title>Captain Obvious: all CPUs are different, after all!</title>
<link>https://rulinux.net/message.php?newsid=42979&amp;page=1#221134</link>
<guid>https://rulinux.net/message.php?newsid=42979&amp;page=1#221134</guid>
<pubDate>Sun, 03 Sep 2017 17:07:02 +0300</pubDate>
<description><![CDATA[<p>I wanted to write myself a function that clears the screen between rendered frames. I figured a plain memset wasn't enough, since I don't want it polluting my cache. Here's what I came up with:<br><br><fieldset><legend>c</legend><div class="highlight c"><br />
<span class="co2">#include &lt;emmintrin.h&gt;</span><br />
<span class="co2">#include &lt;sys/time.h&gt;</span><br />
<span class="co2">#include &lt;stdlib.h&gt;</span><br />
<span class="co2">#include &lt;stdio.h&gt;</span><br />
<span class="co2">#include &lt;string.h&gt;</span><br />
<span class="co2">#include &lt;stdint.h&gt;</span><br />
<br />
<span class="co2">#define N 100000</span><br />
<span class="co2">#define M 20000</span><br />
<br />
<span class="kw4">typedef</span> uint32_t square<span class="br0">&#91;</span>16<span class="br0">&#93;</span> __attribute__<span class="br0">&#40;</span><span class="br0">&#40;</span>aligned<span class="br0">&#40;</span>16<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
<span class="kw4">double</span> gettime <span class="br0">&#40;</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">struct</span> timeval tv<span class="sy0">;</span><br />
&nbsp; &nbsp; gettimeofday <span class="br0">&#40;</span><span class="sy0">&amp;</span>tv<span class="sy0">,</span> NULL<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#40;</span><span class="kw4">double</span><span class="br0">&#41;</span>tv.<span class="me1">tv_sec</span> <span class="sy0">+</span> <span class="br0">&#40;</span>0.000001 <span class="sy0">*</span> <span class="br0">&#40;</span><span class="kw4">double</span><span class="br0">&#41;</span>tv.<span class="me1">tv_usec</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> fast_zero1 <span class="br0">&#40;</span>square <span class="sy0">*</span>ptr<span class="sy0">,</span> size_t len<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; size_t i<span class="sy0">;</span><br />
&nbsp; &nbsp; __m128i zero <span class="sy0">=</span> _mm_setzero_si128 <span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>i<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> i<span class="sy0">&lt;</span>len<span class="sy0">;</span> i<span class="sy0">++</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_stream_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>0<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_stream_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>4<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> &nbsp;zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_stream_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>8<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> &nbsp;zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_stream_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>12<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> fast_zero2 <span class="br0">&#40;</span>square <span class="sy0">*</span>ptr<span class="sy0">,</span> size_t len<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; size_t i<span class="sy0">;</span><br />
&nbsp; &nbsp; __m128i zero <span class="sy0">=</span> _mm_setzero_si128 <span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>i<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> i<span class="sy0">&lt;</span>len<span class="sy0">;</span> i<span class="sy0">++</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_store_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>0<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_store_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>4<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> &nbsp;zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_store_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>8<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> &nbsp;zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _mm_store_si128 <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><span class="sy0">&amp;</span><span class="br0">&#40;</span>ptr<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#91;</span>12<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">,</span> zero<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">int</span> main <span class="br0">&#40;</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; square <span class="sy0">*</span>space <span class="sy0">=</span> aligned_alloc <span class="br0">&#40;</span>16<span class="sy0">,</span> N<span class="sy0">*</span><span class="kw4">sizeof</span><span class="br0">&#40;</span>square<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">int</span> i<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">double</span> time<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; time <span class="sy0">=</span> gettime<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>i<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> i<span class="sy0">&lt;</span>M<span class="sy0">;</span> i<span class="sy0">++</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; memset <span class="br0">&#40;</span>space<span class="sy0">,</span> 0<span class="sy0">,</span> N<span class="sy0">*</span><span class="kw4">sizeof</span><span class="br0">&#40;</span>square<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; time <span class="sy0">=</span> gettime<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">-</span> time<span class="sy0">;</span><br />
&nbsp; &nbsp; <a href="http://www.opengroup.org/onlinepubs/009695399/functions/printf.html"><span class="kw3">printf</span></a> <span class="br0">&#40;</span><span class="st0">&quot;memset: %f<span class="es1">\n</span>&quot;</span><span class="sy0">,</span> time<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; time <span class="sy0">=</span> gettime<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>i<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> i<span class="sy0">&lt;</span>M<span class="sy0">;</span> i<span class="sy0">++</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; fast_zero1 <span class="br0">&#40;</span>space<span class="sy0">,</span> N<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; time <span class="sy0">=</span> gettime<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">-</span> time<span class="sy0">;</span><br />
&nbsp; &nbsp; <a href="http://www.opengroup.org/onlinepubs/009695399/functions/printf.html"><span class="kw3">printf</span></a> <span class="br0">&#40;</span><span class="st0">&quot;fast_zero1: %f<span class="es1">\n</span>&quot;</span><span class="sy0">,</span> time<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; time <span class="sy0">=</span> gettime<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>i<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> i<span class="sy0">&lt;</span>M<span class="sy0">;</span> i<span class="sy0">++</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; fast_zero2 <span class="br0">&#40;</span>space<span class="sy0">,</span> N<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; time <span class="sy0">=</span> gettime<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">-</span> time<span class="sy0">;</span><br />
&nbsp; &nbsp; <a href="http://www.opengroup.org/onlinepubs/009695399/functions/printf.html"><span class="kw3">printf</span></a> <span class="br0">&#40;</span><span class="st0">&quot;fast_zero2: %f<span class="es1">\n</span>&quot;</span><span class="sy0">,</span> time<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; free <span class="br0">&#40;</span>space<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
&nbsp;</div></fieldset><br><br>I want to use instructions that write past the cache (movntdq) and group them into 64-byte writes, so that so-called write combining kicks in and the speed is no worse than with plain movdqa.<br><br>I think I did everything right. We compile on 3 machines with the same clang 4.0.0 and the -O2 flag.<br><br>And what do we see on the different architectures?<br><br>AMD Bulldozer: <fieldset><legend></legend><div class="highlight c"><br />
memset<span class="sy0">:</span> <span class="nu16">12.772463</span><br />
fast_zero1<span class="sy0">:</span> <span class="nu16">12.993790</span><br />
fast_zero2<span class="sy0">:</span> <span class="nu16">12.628031</span><br />
&nbsp;</div></fieldset> More or less the expected result.<br><br>AMD Piledriver: <fieldset><legend></legend><div class="highlight c"><br />
memset<span class="sy0">:</span> <span class="nu16">12.888442</span><br />
fast_zero1<span class="sy0">:</span> <span class="nu16">14.814509</span><br />
fast_zero2<span class="sy0">:</span> <span class="nu16">12.834234</span><br />
&nbsp;</div></fieldset> The question is: why is the movntdq variant so slow here?<br><br>And the juiciest part: Intel Atom Pineview <fieldset><legend></legend><div class="highlight c"><br />
memset<span class="sy0">:</span> <span class="nu16">44.080462</span><br />
fast_zero1<span class="sy0">:</span> <span class="nu16">21.228028</span><br />
fast_zero2<span class="sy0">:</span> <span class="nu16">44.028512</span><br />
&nbsp;</div></fieldset> The cache-bypassing variant is, SUDDENLY, twice as fast. Can anyone explain what I'm doing wrong? Why are things so bad on Piledriver? Also, results for other architectures are welcome; just use the same compiler if possible (or at least say which alternative you used).<br><br>Also, I'm buying a Ryzen soon and will test it too.</p>]]></description>
</item>
</channel>
</rss>