john-users - jtr 1.7.7 jumbo 1 opencl 02 patch

Message-ID: <1305052689.3864.26.camel@cthulhu.linuxasylum.net>
Date: Tue, 10 May 2011 20:38:09 +0200
From: Samuele Giovanni Tonon <samu@...uxasylum.net>
To: john-users <john-users@...ts.openwall.com>
Subject: jtr 1.7.7 jumbo 1 opencl 02 patch

Hello,
I've just uploaded OpenCL patch 02 to the wiki.
I'm posting to john-users to get more testers, but I'll also go into some
technical detail in case anyone is interested in looking at the code.

This patch should fix the segmentation fault in the NT format,
and it also brings some speed improvements.

The following formats are currently supported:
raw_sha1 ( raw-sha1-opencl )
raw_md5  ( raw-md5-opencl )
NT       ( nt_opencl )
NSLDAPS  ( ssha-opencl )

The code should work on both Nvidia and ATI GPUs. Unfortunately I wasn't
able to test on Nvidia cards, so please feel free to report any problems
to me.

ATI users: make sure ATISTREAMSDKROOT is set and points to the right
place. Nvidia users: make sure CL/cl.h is in /usr/include or
/usr/local/include.

Quick Readme:
If you are interested in tweaking, look for the _NUM_KEYS defines and
the local_work_size variable. These two control how many passwords are
tried at a time and how big the local work size given to the GPU is.
For raw-sha1 and NSLDAPS, NUM_KEYS is set in the .c code and passed as
an argument to the .cl kernel; this is a small convenience that helps
you test different values faster.
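
To make the relationship between the two tunables concrete, here is a
minimal sketch in plain C. It is not code from the patch: the names
EXAMPLE_NUM_KEYS, round_up_global_size and work_group_count are
hypothetical, and whether the patch rounds the key count this way is an
assumption. It only illustrates the OpenCL 1.x rule that the global work
size must be a multiple of the local work size when one is given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one of the patch's _NUM_KEYS defines. */
#define EXAMPLE_NUM_KEYS 1000000u

/* Round the number of candidate passwords up to the next multiple of
 * local_work_size. In OpenCL 1.x, clEnqueueNDRangeKernel rejects a
 * global work size that is not evenly divisible by the local one. */
static size_t round_up_global_size(size_t num_keys, size_t local_work_size)
{
    assert(local_work_size > 0);
    size_t rem = num_keys % local_work_size;
    return rem == 0 ? num_keys : num_keys + (local_work_size - rem);
}

/* Number of work-groups the GPU would be asked to schedule. */
static size_t work_group_count(size_t num_keys, size_t local_work_size)
{
    return round_up_global_size(num_keys, local_work_size) / local_work_size;
}
```

For example, 1000 keys with a local work size of 64 would be padded to
1024 work-items in 16 work-groups; the padding work-items have to be
ignored or fed dummy input by the host code.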

Changes:
* NSLDAPS: I moved the SHA update for the salt into the .cl code to
  save transfer bandwidth, which gives a minor performance gain
* NT_opencl has been separated from NT_fmt to make the code easier to
  understand
* cmp_all and cmp_one have been improved: if cmp_all is TRUE, I
  "download" all the remaining hashes into outbuffer2 once, so the
  following cmp_one calls just check local data instead of repeatedly
  calling clEnqueueReadBuffer for each hash and for each password.
  This gained me about 20% more performance
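
The cmp_all/cmp_one change can be sketched roughly as follows. This is a
simplified host-only simulation, not the patch's code: fake_read_buffer
stands in for a blocking clEnqueueReadBuffer call, the function
signatures are reduced compared to the real format interface, and all
names except cmp_all, cmp_one and outbuffer2 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_NUM_KEYS 4   /* hypothetical batch size */

/* Stand-ins for device memory: first word of each hash, and the
 * remaining part (the post's "outbuffer2"), one word per key here. */
static unsigned int device_word0[EX_NUM_KEYS] = {0x11, 0x22, 0x33, 0x44};
static unsigned int device_rest[EX_NUM_KEYS]  = {0xa1, 0xa2, 0xa3, 0xa4};

static unsigned int host_word0[EX_NUM_KEYS];
static unsigned int host_rest[EX_NUM_KEYS];
static int have_rest;   /* 1 once host_rest is valid for this batch */
static int transfers;   /* counts simulated device-to-host reads */

/* Simulates one blocking device-to-host buffer read. */
static void fake_read_buffer(unsigned int *dst, const unsigned int *src,
                             size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
    transfers++;
}

/* After hashing a batch, only the first words are fetched. */
static void crypt_all(void)
{
    fake_read_buffer(host_word0, device_word0, EX_NUM_KEYS);
    have_rest = 0;
}

/* Fetch the remaining words at most once per batch. */
static void ensure_rest(void)
{
    if (!have_rest) {
        fake_read_buffer(host_rest, device_rest, EX_NUM_KEYS);
        have_rest = 1;
    }
}

/* Cheap check on the first word; on a hit, download the rest once. */
static int cmp_all(unsigned int target_word0)
{
    for (size_t i = 0; i < EX_NUM_KEYS; i++) {
        if (host_word0[i] == target_word0) {
            ensure_rest();
            return 1;
        }
    }
    return 0;
}

/* Full check against local data only, no further transfers needed. */
static int cmp_one(const unsigned int target[2], size_t index)
{
    assert(index < EX_NUM_KEYS);
    ensure_rest();
    return host_word0[index] == target[0] && host_rest[index] == target[1];
}
```

In this sketch a batch costs two transfers at most, however many times
cmp_one is called, instead of one transfer per cmp_one call.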

I need benchmarkers willing to test the formats and send me speed
comparisons along with their GPU specs. Example:

Ati Radeon HD 6970 on linux 64 bit.

Benchmarking: Raw SHA-1 OpenCL [SHA-1]... 
Kernel path is : ./sha1_opencl_kernel.cl
OpenCL Platform: <<<ATI Stream>>> and device: <<<Cayman>>>
DONE
Many salts:     14155K c/s real, 15728K c/s virtual
Only one salt:  15702K c/s real, 15548K c/s virtual

Benchmarking: Raw MD5 [raw-md5-opencl]... 
Kernel path is : ./md5_opencl_kernel.cl
OpenCL Platform: <<<ATI Stream>>> and device: <<<Cayman>>>
DONE
Raw:    64368K c/s real, 63736K c/s virtual

Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... 
Kernel path is : ./ssha_opencl_kernel.cl
OpenCL Platform: <<<ATI Stream>>> and device: <<<Cayman>>>
DONE
Many salts:     35651K c/s real, 36011K c/s virtual
Only one salt:  23301K c/s real, 23301K c/s virtual

Benchmarking: NT MD4 [OpenCL 1.0]... 
Kernel path is : ./nt_opencl_kernel.cl
OpenCL Platform: <<<ATI Stream>>> and device: <<<Cayman>>>
DONE
Raw:    28550K c/s real, 28835K c/s virtual



TODO: 
* better handling of *NUM_KEYS and local_work_size 
* fix for nsldaps and single mode memory issue
* compare binary and hashes in the .cl code and return one partial hash
  (for the get_hash functions) plus an array of 0|1 flags indicating
  whether each hash matched the binary.
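
As a rough idea of what the last TODO item could look like, here is a
plain C simulation of the per-work-item logic. It is only a sketch of
one possible design, not planned or existing code: compare_item plays
the role of a kernel body indexed by a global id, compare_batch plays
the role of the GPU running it for every id, and all names and the
5-word (SHA-1 sized) hash layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define EX_WORDS 5   /* a SHA-1 digest is five 32-bit words */

/* What a single work-item would do: compare its hash with the target
 * binary, store a 0|1 flag, and store the first word as the partial
 * hash for the get_hash functions. */
static void compare_item(const unsigned int *hashes,
                         const unsigned int *binary,
                         unsigned int *partial,
                         unsigned char *matched,
                         size_t gid)
{
    const unsigned int *h = hashes + gid * EX_WORDS;
    unsigned char ok = 1;
    for (size_t w = 0; w < EX_WORDS; w++) {
        if (h[w] != binary[w])
            ok = 0;
    }
    partial[gid] = h[0];
    matched[gid] = ok;
}

/* Host-side stand-in for launching the kernel over count work-items. */
static void compare_batch(const unsigned int *hashes,
                          const unsigned int *binary,
                          unsigned int *partial,
                          unsigned char *matched,
                          size_t count)
{
    assert(hashes && binary && partial && matched);
    for (size_t gid = 0; gid < count; gid++)
        compare_item(hashes, binary, partial, matched, gid);
}
```

With this layout the host would only need to read back one word and one
flag byte per candidate instead of the full digest.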


Cheers
Samuele