Message-Id: <8039EE63-F47E-43FB-94E2-DF929735C338@m.patpro.net>
Date: Thu, 23 Mar 2017 21:04:39 +0100
From: Patrick Proniewski <p+password@...atpro.net>
To: john-users@...ts.openwall.com
Subject: GPU performance

Hello,

I'm very new to GPU cracking. I've only used hashcat a few times, on a Windows PC with an old Radeon. Now I have, at work, a dedicated Linux PC with an Nvidia GeForce GTX 1080. I've compiled john on Ubuntu 16.x LTS, following doc/INSTALL-UBUNTU.

I ran a simple benchmark comparing john and hashcat, and I'm quite surprised by the results:

> patpro@...cracker:~$ ./john/run/john -test -format=raw-sha1
> Benchmarking: Raw-SHA1 [SHA1 256/256 AVX2 8x]... DONE
> Raw: 26527K c/s real, 26527K c/s virtual
>
> patpro@...cracker:~$ ./john/run/john -test -format=raw-sha1-opencl
> Device 1: GeForce GTX 1080
> Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL]... Build log:
> ptxas info : 0 bytes gmem
> ptxas info : Compiling entry function 'sha1' for 'sm_61'
> ptxas info : Function properties for sha1
> ptxas . 64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 28 registers, 16388 bytes smem, 400 bytes cmem[0], 28 bytes cmem[2]
> DONE
> Raw: 87208K c/s real, 87208K c/s virtual

On one hand, I find it surprising that raw-sha1-opencl on the GTX 1080 is only 3.3 times faster than raw-sha1 on a 2.1 GHz Xeon (E5-2620 v4). On the other hand, hashcat got a very nice 8191.7 MH/s on the GPU:

> patpro@...cracker:~$ ./hashcat/hashcat64.bin -m 100 -b
> hashcat (v3.40) starting in benchmark mode...
>
> * Device #2: WARNING! Kernel exec timeout is not disabled, it might cause you errors of code CL_OUT_OF_RESOURCES
> See the wiki on how to disable it: https://hashcat.net/wiki/doku.php?id=timeout_patch
> nvmlDeviceSetPowerManagementLimit(): Insufficient Permissions
>
> OpenCL Platform #1: Intel(R) Corporation
> ========================================
> * Device #1: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, skipped
>
> OpenCL Platform #2: NVIDIA Corporation
> ======================================
> * Device #2: GeForce GTX 1080, 2027/8110 MB allocatable, 20MCU
>
> Hashtype: SHA1
>
> Speed.Dev.#2.....: 8191.7 MH/s (81.86ms)
>
> Started: Thu Mar 23 21:21:58 2017
> Stopped: Thu Mar 23 21:22:01 2017

I wonder whether hashcat's H/s is the same thing as john's c/s. I also wonder whether I did something wrong when compiling john with OpenCL that could explain the low 3.3x gain.

I'm not saying hashcat is better. I just want to understand the difference here (and I'm a big fan of john).

Any info is welcome!

Thanks.

john --list=build-info
Version: 1.8.0.9-jumbo-1-bleeding
Build: linux-gnu 64-bit AVX2-ac OMP
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./john/run/
Format interface version: 14
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 5.4.0
GNU libc version: 2.23 (loaded: 2.23)
OpenCL headers version: 2.0
Crypto library: OpenSSL
OpenSSL library version: 01000207f
OpenSSL 1.0.2g 1 Mar 2016
GMP library version: 6.1.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's

john --list=opencl-devices
Platform #0 name: Intel(R) OpenCL, version: OpenCL 2.0 LINUX
    Device #0 (0) name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
    Device vendor: Intel(R) Corporation
    Device type: CPU (LE)
    Device version: OpenCL 2.0 (Build 25)
    Driver version: 1.2.0.25
    Native vector widths: char 32, short 16, int 8, long 4
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory: 31.0 GB
    Global Memory Cache: 256.2 KB
    Local Memory: 32.0 KB (Global)
    Max memory alloc. size: 7.0 GB
    Max clock (MHz): 2100
    Profiling timer res.: 1 ns
    Max Work Group Size: 8192
    Parallel compute cores: 32
    Speed index: 537600

Platform #1 name: NVIDIA CUDA, version: OpenCL 1.2 CUDA 8.0.0
    Device #0 (1) name: GeForce GTX 1080
    Device vendor: NVIDIA Corporation
    Device type: GPU (LE)
    Device version: OpenCL 1.2 CUDA
    Driver version: 375.39 [recommended]
    Native vector widths: char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory: 7.0 GB
    Global Memory Cache: 320.3 KB
    Local Memory: 48.0 KB (Local)
    Max memory alloc. size: 1.0 GB
    Max clock (MHz): 1733
    Profiling timer res.: 1000 ns
    Max Work Group Size: 1024
    Parallel compute cores: 20
    CUDA cores: 2560 (20 x 128)
    Speed index: 4436480
    Warp size: 32
    Max. GPRs/work-group: 65536
    Compute capability: 6.1 (sm_61)
    Kernel exec. timeout: yes
    NVML id: 0
    PCI device topology: 03:00.0
    PCI lanes: 16/16
    Fan speed: 27%
    Temperature: 37°C
    Utilization: 0%
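
For reference, the 3.3x figure mentioned in the message can be reproduced from the quoted benchmark numbers with a few lines of Python. This is only a rough sketch: the variable names are illustrative, and the second ratio assumes that hashcat's H/s and john's c/s measure the same thing, which is exactly the open question in the message and is not confirmed here.

```python
# Figures copied from the benchmark output quoted in the message.
john_cpu_cs = 26527e3      # john Raw-SHA1 on the Xeon E5-2620 v4, in c/s
john_gpu_cs = 87208e3      # john Raw-SHA1-opencl on the GTX 1080, in c/s
hashcat_gpu_hs = 8191.7e6  # hashcat -m 100 on the GTX 1080, in H/s

# Speed-up of john's OpenCL format over its CPU format.
gpu_vs_cpu = john_gpu_cs / john_cpu_cs

# ASSUMPTION: H/s and c/s are treated as directly comparable units.
# The message asks whether that is true; this ratio is only meaningful if it is.
hashcat_vs_john_gpu = hashcat_gpu_hs / john_gpu_cs

print(f"john GPU vs john CPU:    {gpu_vs_cpu:.1f}x")
print(f"hashcat GPU vs john GPU: {hashcat_vs_john_gpu:.1f}x")
```

If the two units are in fact comparable, the gap between the two tools on the same GPU would be far larger than the gap between john's GPU and CPU formats, which is what prompts the question.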