Message-ID: <20100630021337.GA12776@openwall.com>
Date: Wed, 30 Jun 2010 06:13:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES parallelization with OpenMP

New best benchmark (dual Xeon X5460 3.16 GHz, under some unrelated load):

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     20889K c/s real, 2607K c/s virtual
Only one salt:  5701K c/s real, 711814 c/s virtual

That's over 87% efficiency for the multi-salt case (I say "over"
considering that there was a bit of other load).

This is john-1.7.6-omp-des-4, already uploaded to:

http://openwall.info/wiki/john/patches

On Wed, Jun 30, 2010 at 04:42:26AM +0400, Solar Designer wrote:
> ... Changing DES_bs_mt from 8 to 96, I am getting a 1% to 2% slowdown on
> an otherwise idle system,

I was too quick to state that.  I forgot that higher DES_bs_mt may also
make it feasible to parallelize set_salt() and even cmp_all().  Taking
care of that and increasing DES_bs_mt further to 192, I reclaimed the
old speed and more on an almost idle system.  On the Core i7 920 2.67 GHz
system, I am now getting:

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     10174K c/s real, 1267K c/s virtual
Only one salt:  4841K c/s real, 602923 c/s virtual

That's 88% efficiency (10174K out of the 11500K c/s that 8 separate
processes achieve combined).
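
For readers who want to check the arithmetic: the efficiency figure is just
the combined OpenMP rate divided by the combined rate of the separate
processes.  A minimal sketch in C follows; the function name is made up for
illustration and is not part of John or the patch.

```c
#include <stdio.h>

/*
 * Parallel efficiency: the rate measured for one multi-threaded process
 * divided by the combined rate of N independent single-threaded processes.
 * Hypothetical helper, for illustration only.
 */
static double parallel_efficiency(double threaded_cps, double separate_cps)
{
    return threaded_cps / separate_cps;
}

/* Print the figure as a percentage, e.g. for the Core i7 numbers above. */
static void print_efficiency(double threaded_cps, double separate_cps)
{
    printf("efficiency: %.1f%%\n",
        100.0 * parallel_efficiency(threaded_cps, separate_cps));
}
```

Calling print_efficiency(10174e3, 11500e3) uses the two Core i7 rates quoted
in this message.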

To avoid wasting CPU time when an actual run is about to terminate -
when it has fewer than a full chunk of candidate passwords yet to test -
I also enhanced the "crypt bodies" to perform only the required number
of loop iterations.  With this, I am getting:

host!solar:~/john/john-1.7.6-omp-des/run$ ./john -e=double --salts=-2 ~/john/pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
simsim           (u2671-des)
[...]
ssssss           (u3087-des)
guesses: 14  time: 0:00:00:03  c/s: 9873K  trying: ajjgajjg - btslbtsl
guesses: 14  time: 0:00:00:09  c/s: 10019K  trying: btsmbtsm - debrdebr
guesses: 14  time: 0:00:00:15  c/s: 10053K  trying: eokyeoky - fyudfyud
woofwoof         (u1435-des)
guesses: 15  time: 0:00:01:02  c/s: 10055K  trying: wtaywtay - ydkdydkd
guesses: 15  time: 0:00:01:08  c/s: 10004K  trying: zntkzntk - zzzzzzzz

So 10M c/s on the Core i7 is achieved in practice.
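
The "only the required number of loop iterations" idea mentioned above can
be sketched roughly as follows.  This is an illustration under assumptions,
not the code from the patch: BS_DEPTH, MAX_BLOCKS, blocks_needed(),
crypt_block() and crypt_chunk() are all hypothetical names, and the value
192 merely echoes the DES_bs_mt setting without claiming the same meaning.

```c
#include <stddef.h>

#define BS_DEPTH 128    /* assumed: candidates per bitslice block ("128/128 BS") */
#define MAX_BLOCKS 192  /* hypothetical: blocks in one full chunk */

/* Number of blocks needed to cover `count` queued candidates (rounds up). */
static int blocks_needed(int count)
{
    return (count + BS_DEPTH - 1) / BS_DEPTH;
}

/* Stand-in for the real per-block work; here it only marks the block done. */
static void crypt_block(int block, int *done)
{
    done[block] = 1;
}

/*
 * Process one chunk.  The loop bound is derived from the actual number of
 * candidates rather than fixed at MAX_BLOCKS, so a short final chunk does
 * not spend CPU time on unused blocks.  Returns the number of blocks run.
 * Without OpenMP enabled the pragma is ignored and the loop runs serially.
 */
static int crypt_chunk(int count, int *done)
{
    int n = blocks_needed(count);
    int t;

    if (n > MAX_BLOCKS)
        n = MAX_BLOCKS;

#pragma omp parallel for
    for (t = 0; t < n; t++)
        crypt_block(t, done);

    return n;
}
```

With gcc this would be built with -fopenmp; each iteration touches only its
own block, so no locking is needed in this simplified form.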

On the dual Xeon, for which I included the new 20M benchmark at the
start of this message, an actual run now does:

host!solar:~/john$ ./john-omp-des-4 -e=double --salts=-2 pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
simsim           (u2671-des)
cloclo           (u2989-des)
mimi             (u3044-des)
aaaa             (u1638-des)
xxxx             (u845-des)
aaaaaa           (u156-des)
jamjam           (u2207-des)
booboo           (u171-des)
bebe             (u1731-des)
gigi             (u2082-des)
cccccc           (u982-des)
jojo             (u3027-des)
lulu             (u3034-des)
ssssss           (u3087-des)
guesses: 14  time: 0:00:00:01  c/s: 19487K  trying: ajjgajjg - btslbtsl
guesses: 14  time: 0:00:00:06  c/s: 20544K  trying: eokyeoky - fyudfyud
guesses: 14  time: 0:00:00:16  c/s: 18641K  trying: kdvwkdvw - lofblofb
guesses: 14  time: 0:00:00:27  c/s: 18626K  trying: snzgsnzg - tyiltyil
woofwoof         (u1435-des)
guesses: 15  time: 0:00:00:36  c/s: 18655K  trying: zntkzntk - zzzzzzzz

As you can see, it actually exceeds 20M at times, but then goes below
that because of the changing non-John load.

Any feedback?

Would anyone be willing to test this on other systems, with other
versions of gcc (it needs 4.2 or newer, but I only tested 4.5.0), etc.?

Alexander
