Message-ID: <BLU0-SMTP464824265B10CBE8632A8D6FD3F0@phx.gbl>
Date: Tue, 17 Apr 2012 19:48:06 +0200
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-users@...ts.openwall.com
Subject: Re: Crowd-sourcing statistics and rules

On 04/16/2012 03:36 AM, Rich Rumble wrote:
> With all this talk of pattern matching/finding, could it also be time
> to look at updating JtR's rules and giving anonymous feed back on
> rules?

Yes, I think it may be time to evaluate whether john's default rule sets
for word list or single mode need to be adjusted.

> With the clients I audit, I don't see much variation...
> password incrementing and using the company name or products in the
> passwords are very evident.

This is to be expected if users are forced to change their passwords
regularly.
On some web sites where users don't have to change their passwords
frequently, users can also pick their own user name, which usually is
not the case in a business environment. That's why users of such sites
are much more likely to pick a password that is somehow based on their
user name.

> JtR has a lot of information about each
> cracking session in the log file that could be useful,

While useful, this information can be somewhat misleading.
The reason is that john buffers candidate passwords for performance
reasons. The buffer size depends on the hash algorithm and on compile
options. That's why the last rule mentioned prior to a "Cracked ..."
line does not necessarily indicate which rule really cracked this
password.
For larger word lists, the log file will in most cases report the
correct rule. For very small word lists and for single mode, the
reported rule will more frequently be wrong.
With the upcoming GPU support (and further increased buffer sizes), it
will become even more likely that the last rule reported in the log file
is not the rule which really cracked the password.

That's why just generating the statistics directly from the log file
will not provide results that are 100 percent correct.
To get correct results, some more effort is required. But maybe 100 %
correctness is not needed.

> John/Jumbo could be patched, but I bet a script could be used just as
> well to cut down the minutia, and create more succinct details:
> 0:00:01:19 - Rule #15: '-c )?a r l' accepted as ')?arl' (cracked 353)
> 0:00:01:30 - Rule #16: '-: <* !?A l p' accepted as '<*!?Alp' (cracked 8)
> 0:00:01:39 - Rule #17: '-c <* !?A c p' accepted as '<*!?Acp' (cracked 31)
> 0:00:01:47 - Rule #18: '-c <* c Q d' accepted as '<*cQd' (cracked 99)
> 0:00:01:56 - Rule #19: '-c >7 '7 /?u' accepted as '>7'7/?u' (cracked 0)
> 0:00:01:56 - Rule #20: '>4 '4 l' accepted as '>4'4l' (cracked 0)
> 0:00:02:06 - Rule #21: '-c <+ (?l c r' accepted as '<+(?lcr' (cracked 9)
> 0:00:02:15 - Rule #22: '-c <+ )?l l Tm' accepted as '<+)?llTm' (cracked 17)
> ....
> 0:09:30:07 - Trying length 7, fixed @1, character count 31 (cracked 446)
> 0:09:37:26 - Trying length 6, fixed @6, character count 47 (cracked 248)

As I explained above, a script might provide somewhat incorrect results,
but this could still be better than nothing.

When you interrupt the cracking session and restart it later, there is a
risk of introducing even more errors, e.g., because the word list file
has been changed, because the input files with password hashes have been
changed (so that you have either more or fewer hashes to crack), or
because the .rec file has been changed.
In fact, even if you don't change any of those files manually, the
results may be useless, because you had other sessions running in
parallel which cracked many of the passwords before you restarted your
session.

If you know what you are doing, you can of course avoid most of these
problems. But just collecting a lot of log files from a large number of
users and generating statistics on them might not work as one would
hope.
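
A script of the kind discussed above could look roughly like the
following minimal sketch. It counts the "Cracked ..." lines that follow
each rule (or incremental stage) line and reports them in the
"(cracked N)" style of the quoted excerpt. The exact shape of the log
lines is an assumption modelled on that excerpt, and the names
`summarize` and `format_summary` are hypothetical. Because of the
buffering described above, the counts are approximate: a crack can be
logged under a later rule than the one that produced the candidate.

```python
import re

# Assumed line shapes (modelled on the excerpt quoted in this thread, not
# verified against every john version):
#   0:00:01:19 - Rule #15: '-c )?a r l' accepted as ')?arl'
#   0:00:01:20 + Cracked someuser
STAGE_RE = re.compile(r"^(\S+) - (Rule #\d+: .*|Trying length .*)$")
CRACKED_RE = re.compile(r"^\S+ \S Cracked ")


def summarize(lines):
    """Return a list of [timestamp, stage_text, cracked_count] entries.

    Each "Cracked" line is attributed to the most recent stage line,
    which is only an approximation because john buffers candidates.
    """
    stages = []
    for line in lines:
        line = line.rstrip("\n")
        match = STAGE_RE.match(line)
        if match:
            # A new rule or incremental stage starts here.
            stages.append([match.group(1), match.group(2), 0])
        elif CRACKED_RE.match(line) and stages:
            # Attribute the crack to the stage logged most recently.
            stages[-1][2] += 1
    return stages


def format_summary(stages):
    """Render the entries as '<time> - <stage> (cracked N)' lines."""
    return [
        "{} - {} (cracked {})".format(stamp, text, count)
        for stamp, text, count in stages
    ]
```

To use it on a real session, one would pass an open log file to
`summarize` and print the lines returned by `format_summary`. It is
best applied to sessions that were neither interrupted nor run in
parallel with others, for the reasons given above.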

> I think finding out what rules are working for more me personally
> could save some time for others as well, it could be interesting to
> see if I re-run John on those same passes minus the most "successful"
> rules and compare...

To do a fair comparison, you'd of course have to start each test with an
empty pot file.
If you want to avoid repeating the session for slow hashes, you could
generate a new input file for --format=dummy and use this one for tests.
It should be much faster.
Currently, the dummy format is saltless, which means that for frequently
used passwords you still get just one line in john.pot.
But you could just put multiple lines with the same hash but different
user names into your input file, and then use ./john --show instead of
counting the lines in john.pot.

> perhaps for me using 0000-9999 get's me far more
> passes than the rule that does 19xx and 20xx date/years and I don't
> want the overlap.

IMO, you have to consider the ratio of
cracked passwords / password candidates
or even
cracked / ( candidates * salts ).

For fast saltless hashes it might be OK to just try all numbers from
0000-9999.
For salted hashes, you should consider that 19xx and 20xx will probably
be more likely than most other numbers (maybe except 1234, 1337, 2345,
2468, 1111, and a few others).
So, trying the years first will hopefully reduce the number of remaining
salts, and thus reduce the time you need for the other 4 digit numbers.

> There are a lot of variables here,
> some of the things I stated are moot if the wordlist is small or the
> hash is very very fast,

Even then you have to adjust your strategy.
When I started experimenting with the ca. 140 million raw-md5 hashes
published by KoreLogic, I realized that loading 10 million passwords and
checking which of these passwords have already been cracked can take
much more time than just trying a few rules on a word list.
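
The cracked / ( candidates * salts ) ratio and the effect of trying the
years first can be illustrated with a small sketch. All numbers used
with it here are made up for illustration, the function names are
hypothetical, and the model assumes that every cracked password removes
exactly one salt, which real hash sets need not satisfy.

```python
def yield_per_work(cracked, candidates, salts=1):
    """Return cracked / (candidates * salts); higher means better value."""
    work = candidates * salts
    if work <= 0:
        raise ValueError("candidates and salts must be positive")
    return cracked / work


def work_for_order(steps, salts):
    """Total hash computations when the steps run in the given order.

    Each step is a (candidates, salts_eliminated) pair. Every candidate
    has to be hashed once per salt that is still uncracked, so steps
    that eliminate many salts cheaply should run early.
    """
    total = 0
    remaining = salts
    for candidates, eliminated in steps:
        total += candidates * remaining
        remaining = max(remaining - eliminated, 0)
    return total


# Hypothetical figures: 1000 salts; the 200 candidates 19xx/20xx crack
# 60 passwords, the other 9800 four-digit numbers crack 40.
years = (200, 60)
others = (9800, 40)
years_first = work_for_order([years, others], salts=1000)
others_first = work_for_order([others, years], salts=1000)
```

With these invented figures, `years_first` comes out lower than
`others_first`, and `yield_per_work(60, 200, 1000)` is far higher than
`yield_per_work(40, 9800, 1000)`, which matches the argument for trying
the most likely candidates first on salted hashes.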

> but I'd be curious to grab more stats not only
> from the passwords themselves, but the whole session holds information
> we might all benefit from, even if it's not going to get you 5x more
> passwords, maybe it gets you the 10-20 really hard ones you've been
> going after.

Those 10-20 really hard passwords (or similar passwords using the same
pattern) might not exist in your next set of hashes.
That's why IMHO it is more reasonable to hope to get the more likely
passwords cracked faster. This will reduce the number of remaining
salts, resulting in a larger number of passwords that can be tried with
--incremental or --markov mode.
Then, you can try to detect patterns in the passwords you didn't crack
using rules, and adjust your strategy.

> In closing my very long winded email: “Statistics are like a bikini.
> What they reveal is suggestive, but what they conceal is vital.”

Thanks for this Aaron Levenstein quote. I didn't know it.

Frank