Message-ID: <20091215140810.GA5064@openwall.com>
Date: Tue, 15 Dec 2009 17:08:10 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Bit slice DES for CUDA

On Thu, Dec 10, 2009 at 10:40:45PM +0200, Dennis Yurichev wrote:
> Has anybody attempted to port bitsliced DES routines to CUDA?
> I tried (got deseval.c from http://www.darkside.com.au/bitslice/ ),
> but it works very slowly, because the compiled routine consumes a lot
> of registers and they are spilled to local memory, which is very slow.

IIRC, deseval.c uses slightly older versions of Matthew's S-box
expressions than the final ones he released as sboxes.c and nonstd.c.
If you want to play with his C code, I suggest that you pick nonstd.c.

Also, you shouldn't be too concerned about the deseval() function
failing to fit everything into registers. Most of the processing time
should be spent in the individual S-boxes, not in this "wrapper"
function. Thus, you should focus on optimal register usage within the
S-boxes. In practice, 16 registers should be enough to implement
Matthew's S-box expressions, whereas 8 registers are not enough (a few
intermediate values have to be stored in "memory"). This can be seen by
comparing JtR's x86-64.S (16 SSE2 registers) with x86-sse.S (8 SSE2
registers on 32-bit x86).

Just my $0.02.

Alexander
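The point about register usage inside the S-boxes is easier to see with a concrete bitsliced function. The following is a minimal C sketch under stated assumptions: it is not Matthew's code and not a DES S-box. TOY_SBOX, toy_sbox_bs() and toy_sbox_check() are hypothetical names for a made-up 3-bit to 2-bit S-box. Each uint64_t carries one input bit from each of 64 independent blocks, so a table lookup turns into a few AND/OR/XOR operations, and the number of intermediates that are live at the same time is what has to fit in registers.

```c
#include <stdint.h>

/* Hypothetical toy 3-bit -> 2-bit S-box (NOT a DES S-box).
 * Index is (a << 2) | (b << 1) | c; the stored value is (y1 << 1) | y0. */
static const unsigned TOY_SBOX[8] = {0, 2, 2, 1, 3, 1, 3, 0};

/* Bitsliced form of TOY_SBOX: bit i of each argument belongs to block i,
 * so one call evaluates 64 S-box lookups at once. Only two intermediates
 * (t1, t2) are live here; real DES S-box expressions need many more,
 * which is what creates register pressure. */
static void toy_sbox_bs(uint64_t a, uint64_t b, uint64_t c,
                        uint64_t *y0, uint64_t *y1)
{
    uint64_t t1 = b & c;
    uint64_t t2 = a | b;
    *y0 = a ^ t1;   /* y0 = a XOR (b AND c) */
    *y1 = t2 ^ c;   /* y1 = (a OR b) XOR c  */
}

/* Packs 64 3-bit inputs into bit slices, evaluates the bitsliced S-box,
 * unpacks the results and compares each lane with the lookup table.
 * Returns the number of mismatching lanes (0 means they all agree). */
int toy_sbox_check(const unsigned in[64])
{
    uint64_t a = 0, b = 0, c = 0, y0, y1;
    int lane, mismatches = 0;

    /* Transpose: gather bit 2, bit 1 and bit 0 of every input. */
    for (lane = 0; lane < 64; lane++) {
        a |= (uint64_t)((in[lane] >> 2) & 1) << lane;
        b |= (uint64_t)((in[lane] >> 1) & 1) << lane;
        c |= (uint64_t)(in[lane] & 1) << lane;
    }

    toy_sbox_bs(a, b, c, &y0, &y1);

    /* Transpose back and compare with the table, lane by lane. */
    for (lane = 0; lane < 64; lane++) {
        unsigned out = (unsigned)((((y1 >> lane) & 1) << 1) |
                                  ((y0 >> lane) & 1));
        if (out != TOY_SBOX[in[lane] & 7])
            mismatches++;
    }
    return mismatches;
}
```

Calling toy_sbox_check() with any 64 inputs in the range 0..7 should return 0. The transposes are the kind of "wrapper" work that runs rarely relative to the S-box evaluations; in real bitslice DES the S-box circuits are far larger (dozens of gates each), so how many of their intermediates stay in registers dominates performance.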