Message-ID: <2296.84.188.252.232.1146177354.squirrel@www.jpberlin.de>
Date: Fri, 28 Apr 2006 00:35:54 +0200 (CEST)
From: sebastian.rother@...erlin.de
To: john-users@...ts.openwall.com
Subject: Re: Performance tuning

> It's not registers which "perform". There are x86/MMX or x86-64/SSE
> instructions which are translated into one or more micro-ops. Some of
> those micro-ops may have latencies of greater than 1 cycle. Both
> micro-op counts and their latencies might differ for micro-ops generated
> for x86/MMX vs. x86-64/SSE. That's the theory - to answer your question
> ("how can it be true").

Interesting :)

> However, I've based my brief analysis primarily on the actual benchmarks
> I had performed. According to those benchmarks, MMX bitwise ops deliver
> better performance per-bit than SSE ones do, despite SSE registers being
> twice wider, on Pentium 3 and on AMD processors - but SSE is actually
> somewhat faster than MMX per-bit on Pentium 4 processors. In other
> words, SSE instructions perform more than twice slower than MMX ones do
> on P3 and AMD, but less than twice slower on P4. Of course, this may
> change with future processors of either or both vendors.

Wouldn't it be better to benchmark (during compilation), on the CPU
itself, which kind of implementation performs faster?

You may have benchmarked it on AMD CPUs, but also on AMD64-based CPUs?
The K7 and K8 families are not the same, and I believe you if you say it
does not perform that well on an AMD K7 CPU.
Please point out the CPU families you tested (also P4 != P4: which core?
And maybe also important: which stepping?).

>> Related to the Co-Processors:
>
> Sebastian, Frank - thank you for the links. I'll have a look a bit
> later and comment in here if appropriate.

No problem ;)

Kind regards,
Sebastian
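
To illustrate the build-time benchmark idea asked about above, here is a minimal sketch in portable C. It is not how John the Ripper selects its code paths. The names `buf_t`, `xor_narrow`, `xor_wide`, `time_candidate` and `pick_fastest` are hypothetical, and the 32-bit and 64-bit word loops are only stand-ins for real MMX and SSE routines. It assumes `clock()` resolution is good enough for the chosen number of rounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES 4096

/* One buffer viewed at two word widths (union access avoids aliasing issues). */
typedef union {
    uint32_t w32[BUF_BYTES / sizeof(uint32_t)];
    uint64_t w64[BUF_BYTES / sizeof(uint64_t)];
} buf_t;

typedef void (*xor_fn)(buf_t *dst, const buf_t *src);

/* Candidate 0: XOR in 32-bit words (stand-in for a narrower-register path). */
static void xor_narrow(buf_t *dst, const buf_t *src)
{
    for (size_t i = 0; i < BUF_BYTES / sizeof(uint32_t); i++)
        dst->w32[i] ^= src->w32[i];
}

/* Candidate 1: XOR in 64-bit words (stand-in for a wider-register path). */
static void xor_wide(buf_t *dst, const buf_t *src)
{
    for (size_t i = 0; i < BUF_BYTES / sizeof(uint64_t); i++)
        dst->w64[i] ^= src->w64[i];
}

/* Run one candidate `rounds` times and return the CPU time used, in clock() ticks. */
static clock_t time_candidate(xor_fn fn, unsigned rounds)
{
    static buf_t dst, src;
    volatile uint32_t sink;

    memset(&dst, 0x55, sizeof dst);
    memset(&src, 0xAA, sizeof src);

    clock_t start = clock();
    for (unsigned r = 0; r < rounds; r++)
        fn(&dst, &src);
    clock_t elapsed = clock() - start;

    sink = dst.w32[0];   /* read the result so the loop is not optimized away */
    (void)sink;
    return elapsed;
}

/* Return the index of the candidate with the lowest measured time (n must be >= 1). */
static size_t pick_fastest(const xor_fn *candidates, size_t n, unsigned rounds)
{
    size_t best = 0;
    clock_t best_time = time_candidate(candidates[0], rounds);

    for (size_t i = 1; i < n; i++) {
        clock_t t = time_candidate(candidates[i], rounds);
        if (t < best_time) {
            best_time = t;
            best = i;
        }
    }
    return best;
}
```

A build script could compile this with a small `main` that calls `pick_fastest` on the build machine and writes the winning index into a generated header. The result is only valid for the CPU the build ran on, which is exactly the family/core/stepping concern raised above.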