Message-ID: <20130627031135.GB15136@openwall.com>
Date: Thu, 27 Jun 2013 07:11:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com, "Sc00bz64@...oo.com" <sc00bz64@...oo.com>
Subject: Re: Anyone want to benchmark AVX2 code for bcrypt

On Wed, Jun 26, 2013 at 09:09:27AM -0700, Sc00bz64@...oo.com wrote:
> So not using AVX2 is faster.

Ouch.

One thing to check, though: is the s[] array in bcryptAVX2() 256-bit
aligned? It is possible that the stack is only 128-bit aligned. I'd try
aligning s[] in a more reliable manner, although my guess is that for
gather loads this won't matter.

> One reason might be that it runs out of L1 cache (needs more than 32.5 KiB
> but there's only 32 KiB of L1) and has to hit L2.

You could try interleaving two instances where each would use 128-bit
vectors with _mm_i32gather_epi32(), etc. This should help hide the
latencies on these loads, including those resulting from occasional L1
cache misses (when one of the two instances is stalled waiting for an L2
cache read, the other can typically proceed further).

Alexander
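The two suggestions in this message (explicit 256-bit alignment of s[], and two interleaved instances using 128-bit _mm_i32gather_epi32() loads) can be sketched in C roughly as follows. This is a minimal illustration, not the actual bcryptAVX2() code: the memory layout (four hashes per 128-bit vector, each hash owning its own 1024-word area holding the four 256-entry Blowfish S-boxes) and the names sbox4_t, bf_f, bf_f_x2 and is_aligned32 are hypothetical. Only the Blowfish F function itself, ((S0[a] + S1[b]) ^ S2[c]) + S3[d], is standard. The gather path is compiled only when __AVX2__ is defined (e.g. with -mavx2); otherwise a scalar fallback with the same interleaved structure is used.

```c
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

#define LANES 4          /* hashes per 128-bit vector (assumed) */
#define BOX_WORDS 1024   /* 4 Blowfish S-boxes x 256 words, per hash */

/* Hypothetical layout: lane i owns s[i*BOX_WORDS .. i*BOX_WORDS+1023],
   with S-box k starting at word offset k*256 inside that area.
   _Alignas(32) requests 256-bit alignment explicitly instead of relying
   on whatever alignment the stack happens to provide. */
typedef struct {
    _Alignas(32) uint32_t s[LANES * BOX_WORDS];
} sbox4_t;

/* Returns 1 if p sits on a 32-byte (256-bit) boundary. */
static int is_aligned32(const void *p)
{
    return ((uintptr_t)p & 31) == 0;
}

/* Standard Blowfish F function for one hash (scalar reference). */
static uint32_t bf_f(const uint32_t *s, uint32_t x)
{
    return ((s[x >> 24] + s[256 + ((x >> 16) & 0xff)])
            ^ s[512 + ((x >> 8) & 0xff)]) + s[768 + (x & 0xff)];
}

/* Computes F for two independent 4-lane instances, A and B. Their loads
   are issued alternately, so while one instance waits on a gather (or an
   L1 miss), the other instance's loads can still be in flight. */
static void bf_f_x2(const sbox4_t *a, const uint32_t xa[LANES], uint32_t ya[LANES],
                    const sbox4_t *b, const uint32_t xb[LANES], uint32_t yb[LANES])
{
#ifdef __AVX2__
    const __m128i ff = _mm_set1_epi32(0xff);
    /* Word offset of S-box k for each lane: lane*1024 + k*256. */
    const __m128i o0 = _mm_setr_epi32(0, 1024, 2048, 3072);
    const __m128i o1 = _mm_add_epi32(o0, _mm_set1_epi32(256));
    const __m128i o2 = _mm_add_epi32(o0, _mm_set1_epi32(512));
    const __m128i o3 = _mm_add_epi32(o0, _mm_set1_epi32(768));
    const int *sa = (const int *)a->s, *sb = (const int *)b->s;
    __m128i va = _mm_loadu_si128((const __m128i *)xa);
    __m128i vb = _mm_loadu_si128((const __m128i *)xb);

/* Index = S-box offset + selected byte of x. */
#define IDX(v, sh, o) _mm_add_epi32((o), _mm_and_si128(_mm_srli_epi32((v), (sh)), ff))
    __m128i a0 = _mm_i32gather_epi32(sa, IDX(va, 24, o0), 4);
    __m128i b0 = _mm_i32gather_epi32(sb, IDX(vb, 24, o0), 4);
    __m128i a1 = _mm_i32gather_epi32(sa, IDX(va, 16, o1), 4);
    __m128i b1 = _mm_i32gather_epi32(sb, IDX(vb, 16, o1), 4);
    __m128i a2 = _mm_i32gather_epi32(sa, IDX(va, 8, o2), 4);
    __m128i b2 = _mm_i32gather_epi32(sb, IDX(vb, 8, o2), 4);
    __m128i a3 = _mm_i32gather_epi32(sa, IDX(va, 0, o3), 4);
    __m128i b3 = _mm_i32gather_epi32(sb, IDX(vb, 0, o3), 4);
#undef IDX

    /* ((S0 + S1) ^ S2) + S3, lane-wise, wrapping mod 2^32. */
    __m128i ra = _mm_add_epi32(_mm_xor_si128(_mm_add_epi32(a0, a1), a2), a3);
    __m128i rb = _mm_add_epi32(_mm_xor_si128(_mm_add_epi32(b0, b1), b2), b3);
    _mm_storeu_si128((__m128i *)ya, ra);
    _mm_storeu_si128((__m128i *)yb, rb);
#else
    /* Portable fallback with the same A/B interleaving. */
    for (int i = 0; i < LANES; i++) {
        ya[i] = bf_f(a->s + i * BOX_WORDS, xa[i]);
        yb[i] = bf_f(b->s + i * BOX_WORDS, xb[i]);
    }
#endif
}
```

In a full bcrypt loop, each Blowfish round for A and B would be interleaved the same way, giving eight hashes in flight as with 256-bit vectors but as two independent dependency chains. Whether that beats the non-AVX2 code would have to be measured.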