Message-ID: <20180723194135.GA14810@openwall.com>
Date: Mon, 23 Jul 2018 21:41:35 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: Denis Burykin <apingis@...nwall.net>
Subject: Re: sha512crypt & Drupal 7+ password cracking on FPGA

On Mon, Jul 23, 2018 at 08:40:48PM +0200, Jens Steube wrote:
> For sha512crypt I'm getting around 377kH/s on all four GPU. That
> translates to ~94300 per 90W.
> For Drupal7 I'm getting around 156kH/s on all four GPU. That translates
> to ~39000 per 90W.
>
> This is a weird result on the first look.

Why, it looks reasonable to me. Thanks for sharing it.

> If I understand your
> measurements correctly a single quad FPGA board is doing 54600H/s at 40W
> on sha512crypt and 16600H/s at 40W on Drupal7. If you scale this up to
> 90W, it's 122850H/s per sha512crypt and 37350H/s per Drupal7. That means
> from power consumption perspective it's 30% faster than the GPU for
> sha512crypt, but at the same time it's slower for Drupal7?

Right. Our sha512crypt and Drupal7 on FPGA run at basically the same speed in terms of the underlying SHA-512 hashes computed per second. As I mentioned, a specialized Drupal7 design could have been more efficient. It could drop support for unaligned access and maybe the soft CPUs as well, and the freed-up logic could hold perhaps up to 25% more SHA-512 cores. But we got Drupal7 almost for free here (on top of the sha512crypt design), so we're happy.

On GPU, you actually take advantage of Drupal7's relative simplicity, as you say:

> The reason
> here is the branches in the loop function in sha512crypt which is a
> special case. GPU's really don't like them.

Actually, when all passwords loaded on the GPU at once have the same length, I guess the branches don't hurt much. What hurts is the need to support unaligned accesses, and I guess you avoid this overhead in your Drupal7 kernel.
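The arithmetic in this exchange can be reproduced in a few lines. This is a minimal sketch with the following inputs:

- The H/s and wattage figures are the ones quoted above.
- 5000 is sha512crypt's default rounds count.
- The 2^14 Drupal7 iteration count (a `$S$C...` setting) is an assumption about the hashes that were benchmarked. It is not stated in this thread.
- The few setup hashes each format computes outside its main loop are ignored.

Under these assumptions, both FPGA formats come out at roughly 272-273 million SHA-512 hashes per second. That is consistent with "basically the same speed" above.

```python
# Figures quoted in the thread: hashes per second (H/s) and watts.
FPGA_WATTS = 40   # one quad-FPGA board, as measured
GPU_WATTS = 90    # per-GPU power used for the normalization
GPUS = 4

fpga = {"sha512crypt": 54600, "drupal7": 16600}         # per board at 40 W
gpu_total = {"sha512crypt": 377000, "drupal7": 156000}  # all four GPUs

# SHA-512 hashes per candidate password in each format's main loop.
# 5000 is sha512crypt's default rounds count; 2**14 for Drupal7 is an
# ASSUMPTION about the benchmarked hashes, not a figure from the thread.
ITERATIONS = {"sha512crypt": 5000, "drupal7": 2 ** 14}


def per_watts(rate, watts, target=GPU_WATTS):
    """Linearly scale an H/s figure measured at `watts` to `target` watts."""
    return rate * target / watts


def summary():
    """Per-format FPGA vs. GPU comparison at 90 W, plus FPGA SHA-512/s."""
    rows = {}
    for fmt in fpga:
        fpga_90 = per_watts(fpga[fmt], FPGA_WATTS)
        gpu_90 = gpu_total[fmt] / GPUS  # one GPU is taken as 90 W
        rows[fmt] = {
            "fpga_per_90W": fpga_90,
            "gpu_per_90W": gpu_90,
            "fpga_vs_gpu": fpga_90 / gpu_90,
            "fpga_sha512_per_s": fpga[fmt] * ITERATIONS[fmt],
        }
    return rows


if __name__ == "__main__":
    for fmt, r in summary().items():
        print(f"{fmt:12s} FPGA {r['fpga_per_90W']:9.0f}/90W  "
              f"GPU {r['gpu_per_90W']:8.0f}/90W  "
              f"ratio {r['fpga_vs_gpu']:.2f}  "
              f"SHA-512/s {r['fpga_sha512_per_s'] / 1e6:.0f}M")
```

The ratio column reproduces both observations: a ~30% per-watt advantage for sha512crypt and a slight deficit for Drupal7. The last column compares the two formats on the FPGA itself, independent of the GPU numbers.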
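The branch and alignment point can be made concrete with a minimal Python sketch of the two main loops, using `hashlib`.

- The sha512crypt loop follows step 21 of Ulrich Drepper's SHA-crypt specification. Here `c`, `p_seq` and `s_seq` stand for the digest and the derived P and S byte sequences produced by the earlier steps, which are omitted.
- The Drupal7 loop follows `_password_crypt()` in Drupal 7's `includes/password.inc`.
- The `*_layout` helpers are hypothetical illustrations, not code from either project. They report each round's message length and where the previous digest sits in it.

```python
import hashlib


def sha512crypt_loop(c, p_seq, s_seq, rounds=5000):
    """Main loop (step 21) of SHA-crypt; steps deriving c, p_seq, s_seq omitted."""
    for i in range(rounds):
        ctx = hashlib.sha512()
        ctx.update(p_seq if i & 1 else c)  # odd round: P first; even: digest first
        if i % 3:
            ctx.update(s_seq)              # S on rounds not divisible by 3
        if i % 7:
            ctx.update(p_seq)              # P on rounds not divisible by 7
        ctx.update(c if i & 1 else p_seq)  # odd round: digest last; even: P last
        c = ctx.digest()
    return c


def drupal7_loop(h, password, count):
    """Main loop of Drupal 7's _password_crypt(): h = SHA-512(h . password)."""
    for _ in range(count):
        h = hashlib.sha512(h + password).digest()
    return h


# Hypothetical helpers: (message length, byte offset of previous digest).
def sha512crypt_layout(i, plen, slen):
    middle = (slen if i % 3 else 0) + (plen if i % 7 else 0)
    length = 64 + plen + middle
    offset = plen + middle if i & 1 else 0
    return length, offset


def drupal7_layout(plen):
    return 64 + plen, 0  # always the same layout, digest at offset 0


if __name__ == "__main__":
    # 7-byte password, 16-byte salt; the pattern repeats every 42 rounds.
    print(sorted({sha512crypt_layout(i, 7, 16) for i in range(42)}))
    print(drupal7_layout(7))
```

Drupal7 hashes one fixed layout every round. sha512crypt cycles through eight layouts here, and with a 7-byte password the digest lands at offsets that are not multiples of 8 bytes (64-bit words). The branch pattern depends only on the round number `i`. So when every password in a batch has the same length, all lanes branch the same way, and the shifting byte offsets are what remain costly.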
> IOW, the GPU implementation
> for all *crypt algorithms is a bit below it's theoretical maximum. In
> Drupal7 (and PBKDF2 and most other KDF) there's no such branches in the
> loop thus the GPU can perform at full speed on all compute units.
>
> As you can see here the GPU of today are pretty close when it comes to
> power consumption to a FPGA board. I know that ztex boards are old now
> and that there's better solutions, but the same as with newer GPU, see
> alone the V100. I'm happy with the results.

Right. Spartan-6 was introduced in 2009(?) on a 45nm process, as a budget series (Virtex-6 were larger and faster). NVIDIA Pascal was introduced in 2016 on a 16nm process. So there's more room for improvement in switching from Spartan-6 to current UltraScale+ FPGAs (2016, 16nm) than in switching from Pascal to Volta (2017-2018, 12nm).

V100 is about twice as large as GTX 1080. VU9P as offered on AWS F1 is ~16x larger than our Spartan-6 LX150, so ~4x larger than one of our quad boards. It is also faster, so we'll get a higher clock rate. For example, I saw mentions of VU9P running Keccak at 700+ MHz, which is a power consumption stress-test that altcoin miners now use. And this isn't even the largest FPGA, though larger ones are apparently unrealistic to cool at full utilization.

The drawback is price. Thousands of those boards, tweaked for cryptocurrency mining (lower core voltage, etc.), were recently offered to altcoin miners at $3600 each and quickly sold out. The originals are called VCU1525 and the tweaked ones BCU1525. You might want to Google them and their reported altcoin mining speeds vs. GPUs. I haven't looked into this closely yet, but if people are buying these, there must be a significant advantage.

Thanks again,

Alexander