Message-ID: <A65D8A56F6F88A4CB9CF081E07C8777A173E7478@EX01.dtn.com>
Date: Fri, 10 Jul 2009 12:27:03 -0500
From: "Jim Fougeron" <Jim.Fougeron@....com>
To: <john-users@...ts.openwall.com>
Cc: <jfoug@....net>
Subject: Contributing significant changes to the jumbo patch (mostly performance improvements)

I originally posted this email to Solar off-list. He asked me to post it on-list and start a new thread, so I will simply repost the email to Solar in full, and then follow up with the patches I am working on, along with links to them. A couple of the patches are stepping on each other in a few places, so I am working to get these diff files to play nicely together.

Jim.

*** Original message ***

Alexander,

I have made several changes, some of them pretty significant, all aimed at performance. I would like to know how you want to proceed with some of these patches, since they might be big enough to warrant a new 1.7.3-jumbo-6. I was using a fast algorithm (raw-md5) and was getting only about 5% of the expected throughput (compared with --test speeds).

The changes are:

*** Enhanced the percentage display (from 43% to 43.12%, i.e. added 1/100ths of a percent). Not a performance improvement; it only affects markov and wordlist modes.

*** Expanded the fmt_methods structure (*** this is the main one I have questions about, see below ***) to have 5 binary_hash and 5 get_hash functions. This is a HUGE performance gain when the password/hash list is large. I was working with a 250k-element list (quickly shrunk to 100k). raw-md5 uses a single 'salt' list and hashes it, so jamming 250k entries into a 4k hash table is horrible for performance. I also made changes to param, to start using a larger hash table earlier on. After quite a bit of performance testing it now seems pretty decent up to about 2 million entries (but will again start to degrade after about 750k). The new table sizes are 64k and 1M entries.
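To make the bigger-table idea concrete, here is a minimal sketch. All names, sizes, and threshold values here are illustrative stand-ins, not the actual jumbo-patch code: each get_hash_N just masks more bits of the same 32-bit hash, and the loader walks from the largest table down, taking the first one whose load threshold the hash count reaches.

```c
#include <stdint.h>

/* Illustrative sketch only: names, sizes, and thresholds are hypothetical.
 * Masking more bits of the same hash value yields an index into a larger
 * table, so big password lists chain far less per bucket. */

#define PW_HASH_SIZE_3 0x10000   /* 64k buckets (the new 4th table)  */
#define PW_HASH_SIZE_4 0x100000  /* 1M buckets  (the new 5th table)  */

static unsigned int get_hash_3(uint32_t h) { return h & (PW_HASH_SIZE_3 - 1); }
static unsigned int get_hash_4(uint32_t h) { return h & (PW_HASH_SIZE_4 - 1); }

/* Loader side: pick the largest table justified by the number of loaded
 * hashes (hypothetical threshold values). */
static int pick_hash_level(unsigned int count)
{
    static const unsigned int thresholds[5] =
        { 0, 0x40, 0x1000, 0x10000, 0x20000 };
    int size;

    for (size = 4; size >= 0; size--)
        if (count >= thresholds[size])
            return size;
    return 0;
}
```

With thresholds like these, a 250k-entry list lands in the 1M-bucket table instead of the old 4k one, which is where the "5x or better" gain for fast formats comes from.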
Each table jump does 'slightly' reduce performance due to a larger working set and fewer memory cache hits. However, a cache miss is MUCH less costly than a whole comparison against a chained hash entry. This modification can make a huge improvement (5x or better) if the list is large and the algorithm is FAST. Slow algorithms really do not benefit from 'any' of these changes, only the really fast ones.

*** Added an 'option' to do in-memory file caching. This can benefit a lot of --rules based searches, by 25-45% (after all the other improvements), which is pretty drastic. Session save and --restore work just fine, the same as when reading directly from a file. The method I used 'might' not be the best in the end: I load the whole file into a single array, count the lines, allocate an array of int32s, then strtok the stored cache (strtok(NULL, "\r\n")), assigning each new line to the next int 'pointer'. This requires more memory (size of the file + 4 * number of lines). However, doing it this way easily allows things such as computing the file offset, and handling lines longer than LINE_BUFFER_SIZE (which I have increased from 0x400 to 0x2400). The code 'could' be written to simply work with a single file memory buffer (strtok'd on \r\n) and walk forward to the next line. That kind of code is workable, but might be slower (it would use less memory, though), and it would have to be a little more complex to handle situations with 1 EOL char, 2 EOL chars, and lines longer than LINE_BUFFER_SIZE. I think it 'could' be done; it is just not how I initially wrote it.

*** Added a command line switch --mem-file-size=#  The default is 5000000 (this is hard coded and trivial to change). Zero 'might' be a better default, as it would give exactly the same run-time characteristics as today (i.e. all files read from disk).
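A stripped-down sketch of that memory-file scheme (simplified, with hypothetical names; the real code would fill the buffer via fread() after checking the file size against --mem-file-size): one buffer holds the whole file, strtok() splits it on CR/LF in place, and an index array remembers where each line starts.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the in-memory wordlist cache described above. */
struct mem_file {
    char *buf;            /* whole file contents, '\0'-terminated */
    char **lines;         /* one pointer per line, into buf       */
    unsigned int nlines;
};

static int mem_file_init(struct mem_file *mf, char *buf)
{
    char *p;
    unsigned int n = 0, i = 0;

    mf->buf = buf;

    /* first pass: count newlines to size the index array */
    for (p = buf; *p; p++)
        if (*p == '\n') n++;
    if (p > buf && p[-1] != '\n') n++;   /* last line without EOL */

    mf->lines = malloc(n * sizeof(char *));
    if (!mf->lines) return -1;

    /* second pass: strtok treats any run of CR/LF as one separator, so it
     * handles both 1- and 2-char line endings (blank lines get skipped,
     * which is why nlines can come out <= n) */
    for (p = strtok(buf, "\r\n"); p; p = strtok(NULL, "\r\n"))
        mf->lines[i++] = p;
    mf->nlines = i;
    return 0;
}
```

Note how the file offset of any line falls out for free as mf->lines[k] - mf->buf, which is exactly why fix_state() becomes a pointer subtraction instead of an ftell() when a memory file is in use.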
But I thought a default of 5M would be pretty good (possibly behind a #define for targets like __DJGPP__ or others with a smaller memory footprint). When a wordfile is opened, if the size of the file is less than the --mem-file-size value, the file is loaded into the memory cache (note: the size of the pointer array to the lines is NOT counted toward this size). Also, if --save-memory is 1 or more, the --mem-file-size variable is forced to 0, so the file is read from disk.

*** One bugfix/workaround/side effect of the memory-file code: I do an initial fseek to the end of the file. If that returns a negative number, I abort john with a message saying the file is too big for the OS to read properly, as compiled. Again, not a performance improvement, but it gives john a way to inform the user that their input is outside valid parameters, allowing them to cut the file down.

*** The final performance improvement came from my initial modification to raw-md5 (I will get this patch in later), when I added SSE2. At that point I was only processing 4 (or 2 for MMX) password checks per 'run'. At the end of each run, cracker calls fix_state(), which for wordlist.c means a call to ftell(). That significantly slows things down; calling it a million times per second for a fast algorithm does not make sense. I initially made this change, and later changed my raw-md5 code to process 64 passwords per 'chunk'. However, I still have this code in, and it works fine. I added a command line option --fix-state-delay=# (default 0) and changed wordlist.c so that it only does the 'real' work every # times it is called. Thus --fix-state-delay=50 will improve performance if the _fmt is very fast and does not do enough work per batch. NOTE: this patch may or may not be needed, but for someone starting out with the formats, it can help.
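The throttling idea is tiny; a hypothetical sketch (names simplified, fake_ftell standing in for ftell() on the wordlist file) shows the shape of it: a counter skips the expensive call until it has been asked N times.

```c
/* Sketch of the --fix-state-delay mechanism; variable and function names
 * are illustrative, not the actual wordlist.c code. */

static long fix_state_delay = 0;   /* value of --fix-state-delay=#      */
static long fix_state_count = 0;   /* calls since the last real ftell() */
static long saved_pos = 0;         /* last recorded file position       */
static long ftell_calls = 0;       /* instrumentation for this sketch   */

/* stand-in for ftell(word_file) */
static long fake_ftell(void)
{
    ftell_calls++;
    return 12345;
}

static void fix_state(void)
{
    /* with a delay configured, only every (delay+1)-th call pays for the
     * real position query; with delay == 0 behavior is unchanged */
    if (fix_state_delay && ++fix_state_count <= fix_state_delay)
        return;
    fix_state_count = 0;
    saved_pos = fake_ftell();
}
```

With --fix-state-delay=10, 100 crypt batches cost only 9 position queries instead of 100, which matches the 3-5% gain still seen even after moving to 64-password chunks.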
Even after going to blocks of 64 for raw-md5, setting --fix-state-delay=10 still gave a 3-5% improvement in speed. NOTE2: if using a memory file, we do NOT delay state fixing; we do it every time, since it is simply a pointer subtracted from another pointer. NOTE3: this might be a good item to add to the fmt_main structure, to allow a format to set its 'own' delay for the ftell() calls, if need be.

*!*!*!*!* Ok, now for the main question. The changes to fmt_methods mean that all *_fmt.c files 'should' be reworked. However, after pulling out and dusting off my C programming language book (and C++ Programming v2), I do see that when a struct initializer does NOT provide enough values, the remaining members are supposed to be null. Now, I have made changes in ldr_init_hash so that it starts from element [4], but it will ONLY use a hash if both of the methods binary_hash[x] and get_hash[x] are non-null. This is the code:

static void ldr_init_hash(struct db_main *db)
{
    struct db_salt *current;
    int size;

    if ((current = db->salts))
    do {
        for (size = NUM_PW_HASHES - 1; size >= 0; size--)
            if (current->count >= password_hash_thresholds[size]) {
                if (db->format->methods.binary_hash[size] &&
                    db->format->methods.get_hash[size])
                    break;
            }
        ......

So the code will try to use the bigger hash tables (if there are enough elements), but ONLY if the code actually exists. Now, I have made 'real' changes to raw-md5, because I am using and have tested it. However, I have not spent the time to update the others (I tried for DES, and it failed self-test, so I backed the changes out). My main question: I have 2 NULLs added to each *_fmt.c file, and for the ones that have default_hash values for the 3 existing elements, I added defaults for the 4th and 5th. However, this touches a LOT of files without adding code to them (it is more for documentation).
The question I have (finally) is: do you want me to make patch changes with all of these NULL, NULL entries added, or simply keep the *_fmt.c files the same as they are and let default compiler initialization set the missing members to NULL? If the latter, I have no idea whether there are compilers out there which fail to properly pre-set them to null. If the 2 NULLs are added explicitly, there is no chance of compiler issues; however, it makes for a patch that touches the world for little gain.

To sum things up, how would you like me to proceed from here? I have maintained other open software before, and I know that large changes like this can be a pain in the arse. I can provide the stuff in whatever way is beneficial to you. But I am not sure this is the proper 'patch' to put out publicly; it might be better incorporated into an 'official' jumbo. NOTE: if you want, I 'could' build a jumbo-6 from jumbo-5 plus these changes. If that is acceptable to you, I can get it done within 24 hours or so (if the wife gives me time). I would also add MinGW32 (and M$VC) #defines and changes, so that MinGW32 would be a 'default' part of the jumbo. The VC build is not really standalone; it still requires MinGW to build some of the asm (and to do things like copy the proper x86-*.h to arch.h). But I simply use VC to help with my own debugging.

I will leave what 'type' of changes you want in your court, and wait for a reply from you. All in all, my raw-md5 SSE2, which was --testing at 12M/s, was getting 450K/s with 50000 passwords, and now it is getting closer to 4M/s. Still slower (some delay is in the rules and pre-rules), but much better than it was.

The SSE2 version of raw-md5 will be provided as its own patch. I also have questions about that one, since there are other patches floating around. I am not sure what patch level to target for it, as jumbo-5 seems to take the best of them but is not quite the same. It might be best to roll it into jumbo-6.

Jim.
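For what it is worth, the C standard does guarantee the behavior in question: when a brace-enclosed initializer supplies fewer values than there are members, the remainder are initialized implicitly as if they had static storage duration, i.e. pointers become NULL (C99 6.7.8p21; the same rule existed in C90). A minimal stand-alone demo, with a cut-down stand-in for fmt_methods (not the real structure), using automatic storage so the zeroing really comes from the partial-initializer rule:

```c
#include <stddef.h>

/* Cut-down, hypothetical stand-in for the fmt_methods hash-function slots. */
typedef int (*hash_fn)(void *binary);

struct methods_demo {
    hash_fn binary_hash[5];
    hash_fn get_hash[5];
};

static int bh0(void *b) { (void)b; return 0; }
static int bh1(void *b) { (void)b; return 1; }

static int count_null_slots(void)
{
    /* automatic storage: only binary_hash[0..1] are given explicitly, so
     * the remaining 3 binary_hash slots and all 5 get_hash slots must be
     * implicitly initialized to NULL per C99 6.7.8p21 */
    struct methods_demo m = { { bh0, bh1 } };
    int n = 0, i;

    for (i = 0; i < 5; i++) {
        if (m.binary_hash[i] == NULL) n++;
        if (m.get_hash[i] == NULL) n++;
    }
    return n;
}
```

So any conforming compiler will pre-set the missing slots to NULL; the explicit NULL, NULL entries would only guard against broken compilers, at the cost of a world-touching patch.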