Message-ID: <20120831230740.GE15594@openwall.com>
Date: Sat, 1 Sep 2012 03:07:40 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Questions about compiling for Optimal CPU Performance

On Wed, Aug 29, 2012 at 11:55:14AM -0400, Brad Tilley wrote:
> >> 1. -fopenmp
> >> 2. -fopenmp -msse2
[...]
> I find the former to work better than the latter on 32-bit systems.

This is puzzling.  Normally, a build on 32-bit x86 with -fopenmp but
without -msse2 "should" fail.  I've just tried with john-1.7.9-jumbo-6,
building it as linux-x86-sse2, and it failed with:

x86-sse.o: In function `DES_bs_crypt':
(.text+0x40): multiple definition of `DES_bs_crypt'
DES_bs_b.o:DES_bs_b.c:(.text+0x7e2c): first defined here

and so on, because in -jumbo we're adding -msse2 to CFLAGS, but not to
ASFLAGS.  (magnum - BTW, I think that's a minor bug.  Also, the addition
of -msse2 even for john.c is a bug.  That's a john-dev topic, though.)

Without -jumbo, when you don't specify -msse2 and build for 32-bit x86,
you should get a #warning from DES_bs_b.c telling you that you'll only
get the assembly code, not OpenMP, and suggesting that you add -msse2.
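To make that compile-time decision concrete, here is a simplified, hypothetical sketch (NOT the actual DES_bs_b.c code) of how a C file can choose its DES implementation and print such a #warning.  It relies only on GCC's predefined macros __i386__ (32-bit x86 target), __SSE2__ (defined when -msse2 or an SSE2-capable -march is in effect) and _OPENMP (defined by -fopenmp).  The function name des_bs_backend() and its return strings are made up for illustration.

```c
/*
 * Hypothetical, simplified sketch; NOT the real DES_bs_b.c logic.
 * GCC defines __i386__ for 32-bit x86 targets, __SSE2__ when SSE2
 * code generation is enabled (e.g. -msse2), and _OPENMP with -fopenmp.
 */
#if defined(__i386__) && defined(_OPENMP) && !defined(__SSE2__)
#warning "OpenMP without SSE2 on 32-bit x86: using assembly DES code, no OpenMP; add -msse2"
#endif

/* Report which (hypothetical) DES bitslice implementation this build uses. */
const char *des_bs_backend(void)
{
#if defined(__i386__)
#if defined(_OPENMP) && defined(__SSE2__)
	/* Thread-safe, compiler-generated SSE2 code, parallelized with OpenMP */
	return "C/SSE2 with OpenMP";
#else
	/* Hand-written assembly; OpenMP, even if requested, is not used for DES */
	return "x86 assembly, single thread";
#endif
#else
	/* Not 32-bit x86: outside the scope of this sketch */
	return "other platform";
#endif
}
```

Calling puts(des_bs_backend()) from a main() shows which path was compiled in.  Building for 32-bit x86 with OpenMP but without SSE2 (for example, gcc -m32 -march=i686 -fopenmp, assumed flags for illustration) should trigger the #warning and select the assembly path, while adding -msse2 selects the OpenMP path.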

Brad, why are you building with OpenMP on your Celeron?  Does it have
more than one logical CPU?  If not, then the assembly code for DES is
indeed faster than the thread-safe alternative that an OpenMP build
would use.  Even with two logical CPUs (but one core), the assembly code
is likely faster (using only one of the logical CPUs).  The performance
hit of going from assembly to compiler-generated SSE2 code on 32-bit x86
is just too great.

Alexander
