Message-ID: <20060427204849.GA17917@openwall.com>
Date: Fri, 28 Apr 2006 00:48:49 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Performance tuning

Sebastian,

You're so religious (this time as it relates to AMD vs. Intel) that
you've missed my point. ;-)

I wrote, speaking of the possibility to take advantage of the
availability of 16 (and not just 8) SSE registers on x86-64:

> > The extra registers are indeed very helpful, but the
> > slowdown with the move from MMX to SSE on AMD processors is bad enough
> > that the extra registers, if used to reduce the instruction count and/or
> > to avoid dependencies, would barely compensate for it (of course, this
> > is just my guesstimate).
> >
> > Perhaps this is worth doing for EM64T and for future AMD processors.

I expect _no_ significant speedup (if at all) from the use of SSE in
64-bit mode on _current_ AMD64 processors (those I've run benchmarks
on), compared to running MMX code on those processors.

It's roughly like this: SSE-bitslice-DES is 20% slower than
MMX-bitslice-DES on current AMD processors (both 32- and 64-bit);
16regsSSE-bitslice-DES might be 20% faster than 8regsSSE-bitslice-DES -
so we arrive at the same performance that we already have with MMX.
Of course, that's just a guesstimate.

For _current_ EM64T processors, things are different. On Intel P4
processors, including those with EM64T, SSE-bitslice-DES is already
faster than MMX-bitslice-DES - and 16regsSSE-bitslice-DES might be
faster by another 20%.

Thus, the use of 16 SSE registers is likely beneficial for current
EM64T processors, maybe for future AMD64 processors, but likely not for
current AMD64 processors.

That's my way of thinking. No religion or politics involved.

(In the above analysis, I considered the trivial conversion from MMX to
SSE only. It is possible that certain optimizations would make the SSE
code faster on AMD and/or Intel processors. But this is currently an
unknown for processors of either vendor.)

On Thu, Apr 27, 2006 at 09:49:16PM +0200, sebastian.rother@...erlin.de wrote:
> For AMD Motherboards there`s a CO-Processor avaiable wich is
> compatible to the AMD-Sockets and wich is more powerfull then a FPGA.
> I don`t know the Company anymore but they produce programmable CPUs wich
> can be assembled at a f.e. dual CPU Mainboard (one AMD-CPU, one
> CO-Processor).
> These CPUs are programmable but they`re NOT limited by the PCI-Bus (like
> FPGA-based Cards via PCI). So you could speed up some stuff a lot using
> those Co-Processors... :)

Please post specific references.

Thanks,

--
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments
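
The guesstimate for current AMD processors in the message can be checked with a quick calculation. This is only a sketch of the arithmetic: the 20% figures are the rough estimates quoted in the message, not measurements, and the helper name relative_speed is hypothetical. Since "20% slower" can be read either as 0.8x the throughput or as 1.2x the running time, both readings are shown.

```python
def relative_speed(sse_vs_mmx: float, regs16_vs_regs8: float) -> float:
    """Throughput of 16-register SSE code relative to MMX (1.0 = same speed).

    Hypothetical helper: it simply multiplies the two estimated ratios.
    """
    return sse_vs_mmx * regs16_vs_regs8


# Reading 1: "20% slower" means SSE runs at 0.8x the MMX throughput.
amd_throughput_reading = relative_speed(0.8, 1.2)

# Reading 2: "20% slower" means SSE takes 1.2x the time (1/1.2 throughput).
amd_time_reading = relative_speed(1 / 1.2, 1.2)

print(f"{amd_throughput_reading:.2f} {amd_time_reading:.2f}")
```

Under either reading the result stays within a few percent of 1.0, which matches the message's conclusion that 16-register SSE code would land at roughly the performance already available with MMX on those processors.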