Message-ID: <4B998C46.2020103@bredband.net>
Date: Fri, 12 Mar 2010 01:35:18 +0100
From: "Magnum, P.I." <rawsmooth@...dband.net>
To: john-users@...ts.openwall.com
Subject: Re: Is JTR MPIrun can be optimized for more cores ?

RB wrote:
> 2010/3/10 Solar Designer <solar@...nwall.com>:
>>> Please be aware that the MPI patch by itself induces (as I recall)
>>> 10-15% overhead in a single-core run.
>>
>> Huh? For "incremental" mode, it should have no measurable overhead.
>
> This is what my experimentation showed. Whether it's the MPI
> initialization or something else, the difference between patched and
> non-patched was statistically significant on my Phenom x4. I'll
> repeat the tests to get more precise numbers, but it's why I made
> sure it was optional.

I did some testing. First an --inc=digits run against 1634 DES hashes
with the same salt, until completion. 129 of the hashes were cracked.

- john-jumbo2: 1m27.074s
- mpijohn on one core: 1m27.045s
- mpijohn on two cores: 1m7.025s

The lousy figure when using two cores is because one of the cores
completed its run after just 22 seconds! Not optimal. I tried some
longer jobs running alpha, but the problem remains: one job completes
long before the other. In real-world usage, incremental mode is not
supposed to complete, so this won't be much of a problem. On the
other hand, the problem will be much larger when using many more
cores.

Anyway, the tests show that MPI-john has no overhead in itself, just
as I expected. Running on one core, it performs just like vanilla
john. It's *only* a matter of how we split the jobs.

So I did another test, using my MPI patches that auto-split the
Markov range. In this mode the workload is always evenly split, and
this mode is *supposed* to run to completion (see the first sketch
below the sig). I ran -markov:250 against 15 LM hashes until
completion. 10 were cracked:

- john-jumbo2: 10m0.313s
- mpijohn on one core: 10m1.249s
- john-jumbo2 manually parallel: 5m13.690s
- mpijohn on two cores: 5m14.277s

This is less than 0.2% overhead. Actually, all MPI overhead occurs
before and after the jobs actually run. For a 100 times longer run,
the relative overhead *should* thus be 1/100 of the one seen here -
completely insignificant.

The tests were run on Linux/amd64, using MPICH2, on some Intel Core 2
laptop thingy with the CPU speed pegged to 1.6 GHz (due to some
problems with heat ^^ ).

FWIW: in wordlist and single modes, john-fullmpi will currently
leapfrog rules if used, and otherwise leapfrog words. I haven't yet
tested whether the latter would be better all the time. If so, and
when loading the wordlist to memory as introduced by the jumbo patch,
an MPI job should ideally load only its own share of words (see the
second sketch below the sig). It's not very sensible to load the full
134 MB "rockyou" wordlist in 32 copies to memory on a 32-core host.

cheers
magnum
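
P.S. Two rough sketches follow. This first one is just a minimal
illustration, not the actual mpijohn patch code, of how a contiguous
Markov candidate range could be split near-evenly across MPI ranks;
the total keyspace size below is a made-up number:

  /* Minimal sketch, NOT the real mpijohn code: split a candidate
   * range [0, total) evenly across MPI ranks, spreading the
   * remainder over the first ranks so no rank gets more than one
   * extra candidate. */
  #include <mpi.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      uint64_t total = 1000000000ULL; /* assumed keyspace size */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      uint64_t base = total / size;
      uint64_t rem  = total % size;
      uint64_t r    = (uint64_t)rank;
      uint64_t start = r * base + (r < rem ? r : rem);
      uint64_t end   = start + base + (r < rem ? 1 : 0);

      /* Each rank would generate and test candidates in
         [start, end) only. */
      printf("rank %d: candidates [%llu, %llu)\n", rank,
             (unsigned long long)start, (unsigned long long)end);

      MPI_Finalize();
      return 0;
  }

With this kind of split, every rank ends up within one candidate of
the same workload, which is why the two-core -markov times above come
out so close to half the one-core time.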
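The second sketch illustrates per-rank word leapfrogging, again just
an assumption about how it could be done rather than john-fullmpi's
actual implementation, with a made-up wordlist path:

  /* Minimal sketch, NOT the real john-fullmpi code: each rank reads
   * the shared wordlist but keeps only every size-th word, so a
   * 32-rank job holds roughly 1/32 of the file in memory instead of
   * 32 full copies. */
  #include <mpi.h>
  #include <stdio.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      char line[256];
      long n = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      FILE *f = fopen("wordlist.txt", "r"); /* assumed path */
      if (!f) { MPI_Finalize(); return 1; }

      while (fgets(line, sizeof(line), f)) {
          if (n++ % size != rank)
              continue; /* another rank's word, skip it */
          line[strcspn(line, "\r\n")] = '\0';
          /* ... buffer or test this rank's word here ... */
      }

      fclose(f);
      MPI_Finalize();
      return 0;
  }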