Message-ID: <CAC6_mQN+8L5MXub-4akTuCZ63KtVc7dW4P992p=3sv3T+X2Xrw@mail.gmail.com>
Date: Sat, 19 Nov 2011 22:38:30 -0500
From: Stephen Reese <rsreese@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: OpenMP not using all threads

On Sat, Nov 19, 2011 at 8:35 PM, Solar Designer <solar@...nwall.com> wrote:
> On Sat, Nov 19, 2011 at 07:55:50PM -0500, Stephen Reese wrote:
>> I had a feeling that the 32-bit architecture might be an issue as I
>> noticed that "OpenMP example" was only twice as fast (32-bit OpenMP)
>> instead of four times (64-bit OpenMP).
>> http://openwall.info/wiki/internal/gcc-local-build#OpenMP-example.
>> Though OpenMP example is four times as fast neither the CVS nor
>> stable/patch versions of John would provide the 4x speed-up I was
>> hoping for even on the 64-bit. Maybe XEN and the other respective
>> hosts across the multiple Linodes I am testing are causing roughly a
>> 45 - 60% slowdown from a bare-metal instance but not affecting the
>> "OpenMP Example".
>
> It appears that you simply have unstable system performance (changing
> over time as load from other VMs changes).
>
>> root@:~# time ./loop2
>> 615e5600
>> real    0m2.229s
>> user    0m2.226s
>> sys     0m0.002s
>> root@:~# time ./loop
>> 615e5600
>> real    0m0.333s
>> user    0m1.313s
>> sys     0m0.003s
>
> This would be a 7x speedup if it were for real, but notice how the user
> time decreased as well - indicating that load from other VMs probably
> halved between these two invocations.  You'll need many more invocations
> of your benchmarks to see the overall difference between the different
> builds despite the changing load.
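The averaging suggested above can be scripted; here is a rough sketch (the helper name is mine, not from the thread, and `./loop` stands in for whichever build is under test). It times a command many times and prints the average wall-clock milliseconds, so a single lucky or unlucky run under shifting VM load does not mislead you:

```shell
# avg_ms CMD RUNS -- run CMD RUNS times, print average wall time in ms.
# Assumes GNU date (%N nanoseconds) and a 64-bit shell arithmetic.
avg_ms() {
  cmd=$1; runs=$2; total=0
  for i in $(seq 1 "$runs"); do
    start=$(date +%s%N)        # nanoseconds since the epoch
    $cmd >/dev/null 2>&1
    end=$(date +%s%N)
    total=$(( total + (end - start) / 1000000 ))
  done
  echo $(( total / runs ))
}
# e.g. compare the two builds:  avg_ms ./loop 20  vs.  avg_ms ./loop2 20
```

Twenty or more runs per build should smooth out most of the neighbor-VM noise.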
>
>> What I am trying to achieve: I have 42 DES passwords and three
>> Linodes. The password list is currently split up so each host has 12
>> entries, and each is running in incremental mode. Is there a better
>> way, such as specifying a thread per instance on a single host?
>>
>> Is there a performance/time benefit in splitting up the password list
>> amongst multiple hosts or is one host going to achieve the same
>> results as the three?
>
> This depends on the hashes per salt ratio.  You didn't mention how many
> different salts you have.  Is it 42 hashes with 42 different salts?
>
> Anyhow, you may achieve a very slight increase in c/s rate (due to lower
> key setup overhead) by not splitting your 42 hashes (have all nodes load
> all 42), but instead splitting the candidate password space.  However,
> this improvement would probably be negated by slightly less optimal
> order in which candidate passwords would be tested then (e.g., you'd
> split by length: 0-6, 7, 8).  So continuing like you have started is
> fine.  3*12 is 36, not 42, though.
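Splitting the candidate space by length, as described above, can be sketched with per-slice incremental sections in john.conf (the section names here are mine; the file path follows a typical 1.7.x install):

```
[Incremental:All06]
File = $JOHN/all.chr
MinLen = 0
MaxLen = 6

[Incremental:All7]
File = $JOHN/all.chr
MinLen = 7
MaxLen = 7

[Incremental:All8]
File = $JOHN/all.chr
MinLen = 8
MaxLen = 8
```

Each node would then load all 42 hashes but run only its own slice, e.g. `./john --incremental=All06 passwd` on the first node.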
>
> Also note that OpenMP generally performs poorly when the system is
> under other load.  In your case, this other load comes from other VMs.
> Even a 10% load from other processes/VMs may result in a 50% slowdown of
> your task with OpenMP, unfortunately.  And it can be even worse than
> that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706 (yes, you may
> try the GOMP_SPINCOUNT workarounds from there).
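The workaround mentioned above is just an environment variable read by GNU libgomp; the value below is an arbitrary starting point to tune, not a figure from the thread:

```shell
# GOMP_SPINCOUNT caps how long idle OpenMP threads busy-spin before
# sleeping; lowering it can help when other VMs compete for the CPUs.
export GOMP_SPINCOUNT=10000
# then launch the OpenMP build from the same shell, e.g.: ./john passwd
```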
>
> As an alternative, you may try an MPI build of -jumbo, even across all
> three of your Linodes.
>
> Or as a simpler alternative, yes, you may choose to use many instances
> of non-OpenMP builds.  Then other load will have less of an effect, but
> the key setup overhead will increase.  The CVS version and the
> -fast-des-key-setup-3 patch (your choice) reduce the key setup overhead,
> though, making it almost negligible.  In 1.7.8 release, it's about 10%
> when cracking just one DES-based crypt(3) hash.  With the newer code or
> the patch, it reduces to about 3%.  You probably lose a lot more than
> that to OpenMP's unfriendliness to system load, so you'll improve things
> overall by going for separate processes.
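A minimal sketch of the separate-process approach (the binary path, mode names, and password file are placeholders, so the loop only prints the commands it would launch rather than running them):

```shell
# One non-OpenMP john process per length slice; --session keeps each
# instance's state files apart so they can run side by side.
for mode in All06 All7 All8; do
  echo "./john --incremental=$mode --session=$mode passwd"
done
```

Running the printed commands in the background (or in separate screen sessions) gives one independent process per slice.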
>
> Alexander
>

Alexander,

Thanks for the quick response. There are 42 hashes and 42 unique salts
(13/node). I am going to change this so each node loads all 42 hashes
and covers its own length range (1-6, 7, 8, using All.chr).

OpenMP has consistently been around 5000K c/s, but I tested your other
recommendation of running non-OpenMP builds, given the system-load woes
discussed earlier (GOMP_SPINCOUNT did not help). Four non-OpenMP
instances run at 2000K each and a fifth at ~1000K, sharing the same
john.pot and password file across multiple sessions. They seem to even
out after a bit, and a combined 9000K is great! This is what I was
looking for.

Thanks for your help!
