Message-ID: <CA+E3k930yq3d4RUG8opnqUzcKbFLiFmotTiAkj_jjjMrWf0ZoA@mail.gmail.com>
Date: Mon, 3 Jul 2017 07:12:12 -0800
From: Royce Williams <royce@...ho.org>
To: john-users@...ts.openwall.com
Cc: Denis Burykin <apingis@...nwall.net>
Subject: Re: bcrypt cracking on ZTEX 1.15y FPGA boards (bcrypt-ztex)

On Sun, Jun 25, 2017 at 9:07 AM, Solar Designer <solar@...nwall.com> wrote:
> We finally got the bcrypt-ztex format into bleeding-jumbo this week.

Pretty great work - thanks again to you and Denis and anyone else who
has been working on this.

> The speed is roughly ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
> overclocking, ~114k with overclocking. It should scale almost linearly
> with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
> on the same host). I can't easily measure the power consumption right
> now, but I estimate it's ~20W as both the board (with a large but slowly
> rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
> warm to the touch. These used to get much warmer in Bitcoin mining
> tests (known to be ~40W).

Here are some tests on my cluster, as recently described here:

http://www.openwall.com/lists/john-users/2017/06/30/1

I discovered today that I had a USB power problem with two boards,
which I have fixed. (I had read that these boards require steady power
on the USB side, even though they are independently powered.) They are
still a little finicky, but I can usually coax them into working now.

I now have two more boards for a total of 16, so adjust any
calculations accordingly.

> Denis' implementation works around our current synchronous crypt_all()
> API by buffering a large number of candidate passwords - many times
> larger than the number of cores. The current design has 124 bcrypt
> cores per chip, so 496 per board.
> My tests are with "TargetSetting = 5"
> (tuning for bcrypt cost 5) in the "[ZTEX:bcrypt]" section in john.conf,
> and this results in:
>
> 0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 63488

I wasn't paying a lot of attention to it at the time, but looking at
john.log, unless I've lost track of something, my value was:

0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 262140

... for values of both 5 and 6 for TargetSetting.

My first tests were with all 16 boards.

The first test used the default john.conf [ZTEX:bcrypt]
TargetSetting = 6 value, with john compiled with the
keys_per_crypt *= 2 tweak:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:54 0g/s 0p/s 1609Kc/s 1609KC/s loveaaaa..loveioia
0g 0:00:04:54 0g/s 0p/s 1611Kc/s 1611KC/s loveaaaa..loveioia
0g 0:00:09:53 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:11:56 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:12:18 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:19:32 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:22:06 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:24:21 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:27:30 0g/s 0p/s 1613Kc/s 1613KC/s loveaaaa..loveioia
0g 0:00:32:16 0.00% (ETA: 2030-12-18 00:34) 0g/s 491.5p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:43:20 0.00% (ETA: 2035-07-31 03:45) 0g/s 366.0p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:51:23 0g/s 308.6p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:57:27 0g/s 276.0p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:00:56 0g/s 260.3p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:13:00 0.00% (ETA: 2032-09-23 00:29) 0g/s 434.5p/s 1613Kc/s 1613KC/s lolaaatn..lolaiocn

That test ran at ~505W / 16 = ~31.6W per board, which includes the power for
the onboard fans. The power consumption actually jumps around quite a
bit between 495W and 515W, but 505W seemed about average.

The second test was with 16 boards, changing to TargetSetting = 5, and
still with keys_per_crypt *= 2:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 0g/s 0p/s 1625Kc/s 1625KC/s loveaaaa..loveaida
0g 0:00:02:02 0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:03:14 0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:08:22 0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:12:30 0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:17:57 0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:21:34 0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:24:27 0g/s 0p/s 1631Kc/s 1631KC/s loveaaaa..loveaida
0g 0:00:38:52 0.00% (ETA: 2031-03-22 14:56) 0g/s 482.2p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy
0g 0:00:41:28 0.00% (ETA: 2032-02-20 19:37) 0g/s 452.0p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy

For that test, I'd say that power was very slightly higher, maybe
averaging 510W, so ~31.9W per board. But this might be normal
variation.

So across the cluster, with known tweaks and settings without
overclocking, I'm getting 1.632Mc/s for 510W.

Next, here are single-board versions of both tests, using the same
board. (I did this by disconnecting the other boards. Is there a way to
tell john to only use a specific device?)

First, TargetSetting = 5, keys_per_crypt *= 2:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:14 0g/s 0p/s 106815c/s 106815C/s loveaaaa..loveaaoa
0g 0:00:03:12 0g/s 0p/s 107169c/s 107169C/s loveaaaa..loveaaoa
0g 0:00:05:44 0g/s 0p/s 107173c/s 107173C/s loveaaaa..loveaaoa
0g 0:00:06:51 0g/s 0p/s 107181c/s 107181C/s loveaaaa..loveaaoa
0g 0:00:10:36 0g/s 0p/s 107190c/s 107190C/s loveaaaa..loveaaoa
0g 0:00:15:34 0g/s 0p/s 107197c/s 107197C/s loveaaaa..loveaaoa
0g 0:00:20:13 0g/s 0p/s 107194c/s 107194C/s loveaaaa..loveaaoa
0g 0:00:24:07 0g/s 0p/s 107199c/s 107199C/s loveaaaa..loveaaoa

Then using TargetSetting at the default of 6, keys_per_crypt *= 2
(--progress-every, where have you been all my life?)

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 pw-fake-unix
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:01 0g/s 0p/s 102565c/s 102565C/s loveaaaa..loveomaa
0g 0:00:05:00 0g/s 0p/s 106748c/s 106748C/s loveaaaa..loveomaa
0g 0:00:10:00 0g/s 0p/s 106902c/s 106902C/s loveaaaa..loveomaa
0g 0:00:15:00 0g/s 0p/s 106952c/s 106952C/s loveaaaa..loveomaa
0g 0:00:20:00 0g/s 0p/s 106978c/s 106978C/s loveaaaa..loveomaa
0g 0:00:25:00 0g/s 0p/s 106991c/s 106991C/s loveaaaa..loveomaa
0g 0:00:30:00 0g/s 33.04p/s 107002c/s 107002C/s loveaaco..loveomco
0g 0:00:35:00 0g/s 28.32p/s 107010c/s 107010C/s loveaaco..loveomco
0g 0:00:40:00 0g/s 24.78p/s 107015c/s 107015C/s loveaaco..loveomco
0g 0:00:45:00 0g/s 22.02p/s 107019c/s 107019C/s loveaaco..loveomco
0g 0:00:50:00 0g/s 19.82p/s 107023c/s 107023C/s loveaaco..loveomco
0g 0:00:55:00 0g/s 18.02p/s 107027c/s 107027C/s loveaaco..loveomco
0g 0:01:00:00 0g/s 33.04p/s 107031c/s 107031C/s loveaavl..loveomvl

Then I enabled the full cluster again.

Here are all 16 boards again, with TargetSetting = 5, the
keys_per_crypt *= 2 tweak, and Frequency = 152. During this test, I was
also trying to coax a 17th board into usability. I include this test
anyway because there appears to have been a slight (temporary?) drop in
performance associated with the attempt to talk to that board (or it
might be a coincidence; I will test further to check this correlation):

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:03 0g/s 0p/s 1654Kc/s 1654KC/s loveaaaa..loveaida
0g 0:00:05:00 0g/s 0p/s 1655Kc/s 1655KC/s loveaaaa..loveaida
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
0g 0:00:08:19 0g/s 0p/s 1644Kc/s 1644KC/s loveaaaa..loveaida
0g 0:00:10:00 0g/s 0p/s 1645Kc/s 1645KC/s loveaaaa..loveaida
0g 0:00:15:00 0g/s 0p/s 1649Kc/s 1649KC/s loveaaaa..loveaida
0g 0:00:18:03 0g/s 0p/s 1650Kc/s 1650KC/s loveaaaa..loveaida
0g 0:00:20:00 0g/s 0p/s 1650Kc/s 1650KC/s loveaaaa..loveaida
0g 0:00:25:00 0g/s 0p/s 1651Kc/s 1651KC/s loveaaaa..loveaida
0g 0:00:30:00 0g/s 0p/s 1652Kc/s 1652KC/s loveaaaa..loveaida
0g 0:00:35:00 0g/s 0p/s 1652Kc/s 1652KC/s loveaaaa..loveaida
0g 0:00:40:00 0.00% (ETA: 2031-08-16 00:16) 0g/s 468.6p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy
0g 0:00:45:00 0.00% (ETA: 2033-05-21 14:48) 0g/s 416.5p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy
0g 0:00:50:00 0.00% (ETA: 2035-02-25 05:21) 0g/s 374.8p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy

And finally, a more focused example - all 16 boards, a single
artificial hash, with bcrypt work factor 12, with the same tweaks:

$ cat single-bf.hash
$2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yNbYSphoTp.
$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 0.00% (ETA: 2017-12-16 07:45) 0g/s 14299p/s 14299c/s 14299C/s loveisxm..lovehjfc
0g 0:00:05:00 0.00% (ETA: 2017-12-17 12:58) 0g/s 14422p/s 14422c/s 14422C/s laliawhy..lalidtdh
0g 0:00:10:01 0.00% (ETA: 2017-12-17 19:40) 0g/s 14417p/s 14417c/s 14417C/s bebeapeq..bebednqq
0g 0:00:15:00 0.01% (ETA: 2017-12-17 17:52) 0g/s 14417p/s 14417c/s 14417C/s lalluepc..lallqhbd
0g 0:00:20:00 0.01% (ETA: 2017-12-17 20:20) 0g/s 14414p/s 14414c/s 14414C/s pinaidtw..pinahzrz
0g 0:00:25:00 0.01% (ETA: 2017-12-17 18:51) 0g/s 14413p/s 14413c/s 14413C/s poleiswt..polehjjm
0g 0:00:30:00 0.01% (ETA: 2017-12-17 20:20) 0g/s 14412p/s 14412c/s 14412C/s locakkyv..locaeocf
0g 0:00:35:00 0.01% (ETA: 2017-12-17 21:23) 0g/s 14412p/s 14412c/s 14412C/s beednaol..beedbyas
0g 0:00:40:00 0.02% (ETA: 2017-12-17 20:20) 0g/s 14414p/s 14414c/s 14414C/s popenwkp..popebtuj
0g 0:00:45:00 0.02% (ETA: 2017-12-17 19:31) 0g/s 14416p/s 14416c/s 14416C/s luiznpnr..luizbnil
0g 0:00:50:00 0.02% (ETA: 2017-12-17 18:51) 0g/s 14417p/s 14417c/s 14417C/s boolnupg..boolbakp
0g 0:00:55:00 0.02% (ETA: 2017-12-17 19:40) 0g/s 14418p/s 14418c/s 14418C/s puthpzto..puthocln
0g 0:01:00:00 0.02% (ETA: 2017-12-17 19:06) 0g/s 14419p/s 14419c/s 14419C/s joespjwb..joesolvk
0g 0:01:01:00 0.03% (ETA: 2017-12-17 18:31) 0g/s 14419p/s 14419c/s 14419C/s johoiouh..johohbdu

This pulled about 560W from the wall.

I tried to compare this to john on my general-purpose GPU system (which
isn't working the way I expect it to, as it appears to only be using
one GPU.
Not sure what I'm doing wrong yet):

$ ./john --format=bcrypt-opencl --device=gpu --fork=6 -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 --max-run-time=3660 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-opencl [Blowfish OpenCL])
Node numbers 1-6 of 6 (fork)
Device 3: GeForce GTX 1080
Device 0: GeForce GTX 1080
Device 5: GeForce GTX 1080
Device 4: GeForce GTX 1080
Device 1: GeForce GTX 1080
Device 2: GeForce GTX 1080
[ptxas info elided]
Press 'q' or Ctrl-C to abort, almost any other key for status
1 0g 0:00:01:16 0.00% (ETA: 2037-12-19 05:32) 0g/s 53.38p/s 53.38c/s 53.38C/s GPU:34C lilluela..lilleoya

... but maybe all six GPUs might run at 53.38c/s x 6 = 320c/s?

I also compared GPU performance with hashcat. First, with max power
throttled down to 150W per card from the default of 180, which is how I
usually run:

$ hashcat -w 4 -a 3 -m 3200 single-bf.hash ?l?l?l?l?l?l?l
hashcat (v3.6.0-44-g21d10215+) starting...

OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: GeForce GTX 1080, 2028/8113 MB allocatable, 20MCU
* Device #2: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #3: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #4: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #5: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #6: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Applicable optimizers:
* Zero-Byte
* Single-Hash
* Single-Salt
* Brute-Force

Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger disabled.

[s]tatus [p]ause [r]esume [b]ypass [c]heckpoint [q]uit =>

Session..........: hashcat
Status...........: Running
Hash.Type........: bcrypt $2*$, Blowfish (Unix)
Hash.Target......: $2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yN...phoTp.
Time.Started.....: Sun Jul 2 20:51:46 2017 (9 mins, 31 secs)
Time.Estimated...: Thu Nov 2 04:23:40 2017 (122 days, 7 hours)
Guess.Mask.......: ?l?l?l?l?l?l?l [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....: 128 H/s (154.07ms)
Speed.Dev.#2.....: 125 H/s (157.26ms)
Speed.Dev.#3.....: 127 H/s (154.83ms)
Speed.Dev.#4.....: 127 H/s (154.09ms)
Speed.Dev.#5.....: 128 H/s (154.25ms)
Speed.Dev.#6.....: 126 H/s (155.12ms)
Speed.Dev.#*.....: 760 H/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 422400/8031810176 (0.01%)
Rejected.........: 0/422400 (0.00%)
Restore.Point....: 0/308915776 (0.00%)
Candidates.#1....: oarieri -> ombreri
Candidates.#2....: ovhteri -> oibzana
Candidates.#3....: osdyban -> ojkhana
Candidates.#4....: opwzana -> ozanana
Candidates.#5....: oufgeri -> ocwzana
Candidates.#6....: oxckier -> ohydana
HWMon.Dev.#1.....: Temp: 35c Fan:100% Util:100% Core:1911MHz Mem:4513MHz Bus:8
HWMon.Dev.#2.....: Temp: 35c Fan:100% Util:100% Core:1873MHz Mem:4513MHz Bus:4
HWMon.Dev.#3.....: Temp: 39c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:16
HWMon.Dev.#4.....: Temp: 35c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:4
HWMon.Dev.#5.....: Temp: 34c Fan:100% Util:100% Core:1911MHz Mem:4513MHz Bus:1
HWMon.Dev.#6.....: Temp: 33c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:1

Returning the GPUs to their default max power (180W) made no difference
at all for a single $12$ bcrypt hash. In both cases, the GPU system was
pulling 500W from the wall, and the GPUs hardly broke a sweat,
temperature-wise.

There may be ways to get more performance from hashcat for this hash
type and work factor, but that will take some research on my part.
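
As a rough cross-check of the hashcat estimate above, here is a minimal
Python sketch that derives the time to exhaust the mask from the
reported aggregate speed. It assumes the 760 H/s rate stays constant
for the whole run and that the keyspace is 26^7 (seven ?l positions),
which matches the Progress total of 8031810176. The constant and
variable names are illustrative only.

```python
# Rough time-to-exhaust estimate for the hashcat run above.
# Assumption: the aggregate speed stays at the reported 760 H/s.
ALPHABET_SIZE = 26   # ?l = lowercase letters
MASK_LENGTH = 7      # ?l?l?l?l?l?l?l
RATE_HPS = 760       # Speed.Dev.#* from the status screen

keyspace = ALPHABET_SIZE ** MASK_LENGTH
total_seconds = keyspace // RATE_HPS          # whole seconds are enough here
days, remainder = divmod(total_seconds, 86400)
hours = remainder // 3600

print(f"keyspace: {keyspace}")
print(f"time to exhaust: {days} days, {hours} hours")
```

This works out to about 122 days and 7 hours, in line with the
Time.Estimated field in the status screen.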

So if I'm reading this right, for single-hash bcrypt with work factor
12, just using my own hardware and techniques to compare, the best
performance available to me so far on FPGA (14419c/s) is about 19 times
as fast as the best performance I know how to get on my GPU system
(760H/s), at around the same power consumption:

FPGA: 14419c/s / 560W = ~25.75c/s/W
GPU: 760H/s / 500W = 1.52H/s/W

So for a focused, single-hash attack on a modern target using my own
gear, FPGA is ~17 times as efficient as GPU?

I will also do some testing without the keys_per_crypt *= 2 tweak, and
with different keys_per_crypt values, but I wanted to get this posted.

Royce
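
The per-watt comparison above can be reproduced with a short Python
sketch. The rates and wall-power figures are the ones reported in this
message; the function and variable names are made up for illustration,
and c/s and H/s are treated as the same unit, which seems reasonable
for a single hash but is an assumption.

```python
# Throughput-per-watt comparison using the figures reported above.
# Names are illustrative; c/s and H/s are assumed to be comparable.
FPGA_RATE = 14419   # c/s, 16 ZTEX boards, bcrypt work factor 12
FPGA_WATTS = 560    # W at the wall
GPU_RATE = 760      # H/s, six GTX 1080 cards under hashcat
GPU_WATTS = 500     # W at the wall


def per_watt(rate: float, watts: float) -> float:
    """Return candidates tested per second per watt."""
    return rate / watts


fpga_eff = per_watt(FPGA_RATE, FPGA_WATTS)
gpu_eff = per_watt(GPU_RATE, GPU_WATTS)
speed_ratio = FPGA_RATE / GPU_RATE      # raw speed advantage
eff_ratio = fpga_eff / gpu_eff          # advantage per watt

print(f"FPGA: {fpga_eff:.2f} c/s/W")
print(f"GPU:  {gpu_eff:.2f} H/s/W")
print(f"speed ratio: {speed_ratio:.1f}x, efficiency ratio: {eff_ratio:.1f}x")
```

The two ratios differ (about 19x in speed, about 17x per watt) only
because the FPGA cluster drew 560W against the GPU system's 500W.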