Message-ID: <20051221023242.GA17029@openwall.com>
Date: Wed, 21 Dec 2005 05:32:42 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: john improvement suggestions

Hi Radim,

First of all, thank you for your suggestions - and thank you for posting
them in here rather than sending them to me privately. :-)

On Mon, Dec 19, 2005 at 09:48:30PM +0000, Radim Horak wrote:
> 1. Bugs and annoyances
> - I have passwords (Traditional DES) from some old linux box that are
> longer than 16 chars, i.e. consist of 3 hashes (crypt24?). John ignores
> such passwords completely. I have tested them by manually cutting them.
> The 3rd hash uses salt from the beginning of the 2nd hash, as the 2nd
> hash uses the salt from the beginning of the 1st hash. I cannot provide
> the hashes, nor do I have access to that old linux box.

Yes, this should be implemented, but I wanted to see some samples from
commercial Unices first:

http://article.gmane.org/gmane.comp.security.openwall.john.user/165

> - Is it possible to disable log

Currently, no (although you can symlink it to /dev/null). This is a
reasonable request. I've added it to the TODO for consideration for
post-1.7.

> to avoid very large log files (with certain rules)?

If you have a lot of rules (perhaps due to the rule preprocessor), you
would run into another problem first: John tries to validate the syntax
of all rules (after preprocessor expansion) at startup. If some of your
preprocessor commands would expand into millions of rules, John startup
would be slow. Maybe this syntax check should be made optional, or maybe
it should be limited to the first million rules.

> 2. Compilation
> - I have tested the MSVC compiler with john.
> (freely avail. here: http://msdn.microsoft.com/visualc/vctoolkit2003/)
> I compiled the encryption routines with these parameters:
> cl /c /TC /G7 /Ox /O2 LM_fmt.c
> and the rest (I guess more POSIX-dependent files) under Cygwin.
> The result was a notable improvement in Lanman DES cracking speed
> (around a 19% increase) on an AthlonXP CPU (compared to gcc -O3
> -march=athlon-xp). Other algorithms were performing at the same speed.

I really don't think that the performance improvement you've observed is
related to this compiler producing better code (I do not know whether
that is the case). LM_fmt.c is not really performance-critical. For LM
hashes on x86/MMX, the performance-critical code is primarily in
DES_bs.c and indeed in x86-mmx.S.

One possible guess is that the apparent improvement is due to the
changed code and data placement in the L1 instruction/trace and L1 data
caches. That's pure luck. The code could have become "slower" just as
easily. John does make an attempt to reduce this kind of uncertainty by
keeping all the frequently accessed data structures together (struct
DES_bs_all), and 19% sounds excessive for x86 (it would not surprise me
on Alpha with direct-mapped caches - I've seen much worse things happen
there).

Here's another guess: maybe you were not building for MMX at all? If
your benchmarks are for pure C code and you were recompiling DES_bs* as
well, then I'm not surprised. There can be this kind of a difference
between C compilers, especially on x86. But the important thing is that
it is irrelevant, since anyone who cares about performance would be
making use of the MMX code.

Can you provide the specific "john --test" outputs for both builds and
tell us your CPU clock rate (real and P4 rating)?

Overall, I do not see a need to support building John with MSVC. That
would complicate the code with more #ifdef's for no good reason.

> - At this point, I'd like to see some way to increase the benchmarking
> time to get more coherent/stable/exact figures.

As explained above, it's not only the short benchmark time which is
responsible for the differences. There will be differences between
different builds and between different invocations of John.
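Incidentally, once you have the "john --test" c/s figures from both
builds, the relative difference is simple arithmetic. A trivial sketch -
the numbers below are made-up placeholders, not measurements from either
build:

```python
def relative_change(baseline_cps: float, new_cps: float) -> float:
    """Percentage change of new_cps relative to baseline_cps."""
    return (new_cps - baseline_cps) / baseline_cps * 100.0

# Hypothetical real-time c/s figures from "john --test" for two builds:
gcc_cps = 4_000_000   # placeholder value, not a real measurement
msvc_cps = 4_760_000  # placeholder value, not a real measurement
print(f"{relative_change(gcc_cps, msvc_cps):+.1f}%")  # prints +19.0%
```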
Of the supported hash types, LM hashes will usually be impacted the
most, since they are very quick, resulting in frequent branches between
pieces of code which come from different object files. (Yes, a possible
optimization would be to combine pieces of code from inc.c, cracker.c,
LM_fmt.c, DES_bs.c, and DES_bs_b.c or x86-mmx.S into one source file.
But it would make the program hard to maintain.)

Anyway, the benchmarking time can be adjusted: it's BENCHMARK_TIME in
params.h. You may also want to patch the __CYGWIN32__ checks out of
bench.c, forcing it to display virtual (processor) time c/s rates on
Windows. (Normally, this is disabled since it does not work right on
Windows 9x.)

> 3. Optimizations
> - In bigcrypt and other split password hashes, the first part(s) of
> the password have a known length of 8 (7 in LM). When cracking such
> passwords, the password candidates (either from dictionary or
> incremental mode) that are shorter should be filtered out (skipped).

Yes, I had this idea, too, and it's been suggested a few times by
others. Unfortunately, I do not see a way to implement it without
introducing much code complexity and/or performance overhead.

For the saltless LM hashes, there's almost no extra cost in trying
shorter candidate passwords against all of the loaded hashes if there
remains at least one hash for which that is needed. So let's concentrate
on bigcrypt:

For each given salt, there may be hashes both of known plaintext lengths
(non-last parts of split encodings) and of unknown ones (non-split or
last parts of split encodings). There will have to be per-salt flags
indicating whether there are still any uncracked
unknown-plaintext-length hashes for that salt. There will need to be
quite some extra code to maintain those flags (it won't be performance
critical, though). There will also need to be per-hash flags indicating
whether the plaintext length for this hash is known.
Then, for salts with only known-plaintext-length hashes, some candidate
passwords will have to be skipped. However, those same candidate
passwords will need to be tried against other salts. So the number of
candidate passwords to try will be different for different salts.
Unfortunately, this is not in line with the way John does things
currently, nor with algorithms such as bitslice DES (which assume a
fixed number of keys per invocation). To overcome this difficulty, John
would need to maintain two separate sets of buffered keys (doubling the
number of buffered keys - while the L1 data cache size remains
constant!) and invoke the low-level crypto routines for the appropriate
sets of salts whenever either buffer fills up.

Does this subtle optimization for the rarely-used bigcrypt justify the
added complexity and the likely slight reduction in c/s rate? I am
really not sure. I'd rather spend my time on more obvious improvements.

(A similar approach - also maintaining two sets of buffered keys - could
help to bitslice the AFS password hashing, achieving a great speedup. If
those hashes were very common, I would do that. But they are not.)

> - Optimize regexp expansion order [a-z] according to some
> general/usage statistics.

That would cause much confusion. As Frank has suggested, you can always
do such optimizations on your own.

> - When there are psw. hashes cracked in a session, does john skip
> cracking them immediately?

Yes, it won't be wasting any time trying to crack whatever it has
already cracked.

> Or is there a session reload needed?

No, except when you have multiple instances of John sharing the same
john.pot (which is supported).

> 4. Other improvements
> - I use ad hoc "wordlist" rules a lot; could there be some kind of
> this functionality: -rules:wordlist_set_1 for selecting wordlist
> rules?

Yes, it's already on TODO for post-1.7. I'm not sure about the exact
syntax and semantics that I'd implement yet.
> - Is it possible to execute MMX and non-MMX instructions in x86 CPUs
> in parallel? If so, would it be possible to take advantage of it in
> john?

Yes - for some instruction pairs and, consequently, for some hash types.
On some CPUs, ALU+SSE, MMX+SSE, and even ALU+MMX+SSE could be
beneficial, too - and people have been doing that. There's definitely
room for improvement here. However, please note that the current MMX
code in John does use several of the regular non-MMX registers (for
array subscripts, etc.) - so there are not many registers available to
do other calculations. Also, the non-MMX instruction set and timings are
less suitable for bitslice DES.

> - The cygwin version of john can't access files (i.e. wordlists)
> larger than 2 GB; would you consider a more win32 native/friendly
> version of john? (mingw?)

This has nothing to do with Cygwin or Windows. John does not implement
LFS, so it won't support files larger than 2 GB on 32-bit systems (but
will happily work with larger files when built natively on 64-bit
systems). I _might_ implement LFS eventually, if it would still make
sense (that is, if people would be seriously using John on 32-bit
systems by that time).

> - I think that the complexity gap between wordlist mode (even with
> rules) and incremental modes is too big; there is an "undiscovered
> country" of various optimizations/complexity reductions etc., e.g.
> generating only "pronounceable" passwords

I disagree. There used to be an optional built-in filter for word-like
passwords in ancient versions of John, but when I improved "incremental"
mode for John 1.5+, I dropped that for good. John effectively stores
statistical information on trigraphs (and on arbitrary trigrams) in .chr
files. I don't think you can be generally smarter than that when
programming a filter manually. Besides, you might want to be trying the
less likely candidate passwords later on.
If you _really_ want to filter out non-pronounceable passwords, you can
do that with an external filter() that you would specify in john.conf,
then invoke John like:

	./john -i=alpha -e=filter_pr0n passwd

For a better c/s rate (but a highly non-optimal order of tries), you
could define an entire external mode that would produce only
pronounceable passwords. But I really do not recommend that you do that,
except in very few special cases - e.g., enforcing a particular password
policy or exploiting the properties of a password generator:

> (http://www.multicians.org/thvv/gpw.html)

Now, it could make sense to match a very particular password generator
like that with a John external mode. The "Strip" cracker currently in
the default john.conf (of 1.6.40) does something similar. The same could
be done to search the keyspaces used by the various pronounceable
password generators (those keyspaces are often not large enough).
Additionally, many of those password generators do not produce a uniform
distribution. So a combination of a .chr file generated from lots of
actual passwords produced by the particular generator and an external
filter() which would make John obey the rules of that generator might
work best.

> or the recent addition of an alphanumeric charset.

Actually, in most cases all.chr should be used. I believe most of the
uses of the new alnum.chr would be because of a lack of understanding of
how "incremental" mode works. But the demand for this was just too high,
so I got tired of explaining why this was not needed except in very few
cases. (I've been reminded that one of those cases is testing for
compliance with password policies that require alphanumeric plus one
shift character.)

> Another good thing to do would be to re-feed the cracked passwords

Yes, this is on TODO for post-1.7 - but no promises (there are way too
many items on TODO).

> with the single-mode rules.
As Frank has pointed out, some of the "single crack" mode rules may only
be used on word pairs.

> 5. Time-memory trade-off for Traditional DES
> - I know. I know that there are salts that make it less effective, but
> I think it's still worth it. (I assume 4096 times more memory is
> needed for the equivalent of non-salt performance, or 1024 times more
> memory for 1/16th of non-salt performance.
> Ophcrack (http://lasecwww.epfl.ch/~oechslin/projects/ophcrack/) uses
> 701 MB of tables for alphanumeric LM passwords, which is comparable to
> small-alphanumeric Traditional Unix DES, and 701 GB is quite
> acceptable storage space today, much more so tomorrow. Even a pure
> small-alphabetical table would be beneficial and much less demanding.)

Yes, this could be done, but I think that would be a separate program. I
could roll it into John the Ripper for "marketing" reasons, though. ;-)
(No, I am not planning to work on that in the near future. But I do not
rule out the possibility either.)

> 6. There was mentioned a non-public SSE-optimized version of john in
> this forum. Is there a chance we (the public) will ever get our hands
> on it?

Yes, definitely.

> If so, when?

I don't really know. This is not planned for the 1.7 release, but it
will likely appear in 1.8, or maybe in a "1.7 Pro". The current code is
suboptimal, and there's no proper SSE detection and fallback to MMX
(rather, the program would simply segfault if the CPU and/or OS lacked
SSE support). It would also run _slower_ than MMX on all CPUs except for
the Intel Pentium 4 (and its flavors, including P4 Celerons and P4
Xeons), so maybe those would need to be detected, too. Yes, I realize
that P4s are indeed widespread.

Thanks again,

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful? Please give your feedback here:
http://rate.affero.net/solar