Message-ID: <20110322030340.GA1475@openwall.com>
Date: Tue, 22 Mar 2011 06:03:40 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES on AVX

Hi,

So I tested the AVX code on Rembrandt's "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz" (quad-core with HT, 8 logical CPUs). Thanks, Rembrandt!

The code, previously tested only under Intel's Software Development Emulator, worked flawlessly. I tried compiling with gcc 4.4.5 that was installed on Rembrandt's Ubuntu, with a gcc 4.5.0 build I uploaded, and finally with a fresh build of gcc 4.7.0-20110319 (development snapshot). All worked fine.

The best performance is achieved with 128-bit AVX operations. 256-bit is slightly slower. SSE2 is slower yet. Here's some relevant info on Sandy Bridge vs. Bulldozer:

http://www.realworldtech.com/page.cfm?ArticleID=RWT091810191937&p=10

256-bit could have worked better, but from this info it appears that it shouldn't have provided much of an advantage over 128-bit as long as the instruction stream allows for parallel execution of two ops almost all of the time.

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     4819K c/s real, 4867K c/s virtual
Only one salt:  4080K c/s real, 4080K c/s virtual

Benchmarking: Traditional DES [256/256 BS AVX]... DONE
Many salts:     4627K c/s real, 4674K c/s virtual
Only one salt:  3930K c/s real, 3930K c/s virtual

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     4143K c/s real, 4185K c/s virtual
Only one salt:  3583K c/s real, 3583K c/s virtual

These are for a single thread, no OpenMP.

I've also tried weird combinations, such as AVX+MMX and many others. Of these, 128-bit AVX plus MMX plus 64-bit native (256-bit total achieved in this weird way) got very close to plain AVX speed. The others were substantially slower.

The *-xop builds failed as expected (they should work on future AMD CPUs), like:

Benchmarking: Traditional DES [128/256 BS XOP]...
Illegal instruction

Some other performance numbers to note (with latest gcc):

Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    14530 c/s real, 14676 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    936 c/s real, 936 c/s virtual

Benchmarking: dummy [N/A]... DONE
Raw:    136552K c/s real, 136552K c/s virtual

("dummy" is a new feature that will be in 1.7.7.)

LM hash with new DES key setup (yes, I ported that patch):

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    63195K c/s real, 63195K c/s virtual

OpenMP benchmarks:

-omp-des-4 (ported to the current code), SSE2 (reference):

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     15949K c/s real, 2003K c/s virtual
Only one salt:  7962K c/s real, 1001K c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    4752 c/s real, 598 c/s virtual

Another run:

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    4848 c/s real, 604 c/s virtual

Upgrade to AVX:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     19095K c/s real, 2401K c/s virtual
Only one salt:  8613K c/s real, 1091K c/s virtual

GOMP_SPINCOUNT=10000:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     19070K c/s real, 2420K c/s virtual
Only one salt:  9560K c/s real, 2051K c/s virtual

Another run:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     19243K c/s real, 2429K c/s virtual
Only one salt:  9682K c/s real, 2060K c/s virtual

OMP_NUM_THREADS=4:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     17817K c/s real, 4476K c/s virtual
Only one salt:  9270K c/s real, 2346K c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    3302 c/s real, 834 c/s virtual

-omp-des-7 (ported to the current code):

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     17448K c/s real, 2228K c/s virtual
Only one salt:  13577K c/s real, 1797K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    68861K c/s real, 9032K c/s virtual

GOMP_SPINCOUNT=10000:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     17006K c/s real, 2358K c/s virtual
Only one salt:  14404K c/s real, 2211K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    65126K c/s real, 19211K c/s virtual

OMP_NUM_THREADS=4:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     16108K c/s real, 4087K c/s virtual
Only one salt:  14258K c/s real, 3609K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    96436K c/s real, 24169K c/s virtual

OMP_NUM_THREADS=3:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     12681K c/s real, 4227K c/s virtual
Only one salt:  11403K c/s real, 3813K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    91029K c/s real, 30444K c/s virtual

OMP_NUM_THREADS=2:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     8945K c/s real, 4495K c/s virtual
Only one salt:  8208K c/s real, 4104K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    79233K c/s real, 39616K c/s virtual

That's all for now. AVX support itself will be in 1.7.7. The updated OpenMP patches will be released "against 1.7.7".

Alexander
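
For anyone who wants to compare the figures in this message, here is a minimal Python sketch that computes a few relative speeds from the "Many salts" real c/s numbers quoted here. It is not part of John the Ripper. The names (SINGLE_THREAD, OMP_DES_4, ratio, summarize) are made up for illustration, and the OpenMP runs without OMP_NUM_THREADS set are assumed to have used the default thread count on this 8-logical-CPU machine.

```python
# Traditional DES, "Many salts", real c/s in thousands, copied from this message.
# Single thread, no OpenMP:
SINGLE_THREAD = {
    "128/256 BS AVX": 4819,
    "256/256 BS AVX": 4627,
    "128/128 BS SSE2-16": 4143,
}

# -omp-des-4 runs with no OMP_NUM_THREADS override (assumed: default thread count):
OMP_DES_4 = {
    "128/256 BS AVX": 19095,
    "128/128 BS SSE2-16": 15949,
}


def ratio(a, b):
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)


def summarize():
    """Compute a few relative speeds from the figures above."""
    avx128 = SINGLE_THREAD["128/256 BS AVX"]
    sse2 = SINGLE_THREAD["128/128 BS SSE2-16"]
    return {
        # 128-bit AVX relative to SSE2, single thread
        "avx128_vs_sse2": ratio(avx128, sse2),
        # 128-bit AVX relative to 256-bit AVX, single thread
        "avx128_vs_avx256": ratio(avx128, SINGLE_THREAD["256/256 BS AVX"]),
        # OpenMP build relative to the single-thread build, same code path
        "omp_scaling_avx": ratio(OMP_DES_4["128/256 BS AVX"], avx128),
        "omp_scaling_sse2": ratio(OMP_DES_4["128/128 BS SSE2-16"], sse2),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}x")
```

The same approach extends to the "Only one salt" and LM DES figures by adding them to the dictionaries.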