Message-ID: <20180912114951.GA5022@openwall.com>
Date: Wed, 12 Sep 2018 13:49:51 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: good program for sorting large wordlists

On Wed, Sep 12, 2018 at 01:14:22PM +0200, JohnyKrekan wrote:
> Thanx for infos, after I have raised the memory sizes and the space for
> temp, the sort went well. I was sorting it to know how many duplicates (when
> ignoring the character case) are in the superwpa wordlist. The original
> file size was approx 10.7 gb, after sorting it was 7.05 gb, so 4 gb was
> taken by the same words with modified character case.

It's a case where you don't need to sort.  You could use:

./unique -v output.lst < input.lst

or e.g.:

tr 'A-Z' 'a-z' < input.lst | ./unique -v output.lst

Testing this on JtR's bundled password.lst:

$ tr 'A-Z' 'a-z' < password.lst | ./unique output.lst
Total lines read 3559 Unique lines written 3422

If you're interested in sizes in bytes as well, use "ls -l" or "wc -c" on
the two files.

For tiny wordlists like password.lst, "sort -u" is more convenient in
that it can output to a pipe, so you can do:

$ tr 'A-Z' 'a-z' < password.lst | sort -u | wc -l
3422

But for large wordlists "sort" may be slower, even with the "-S" and
"--parallel" options.

Alexander
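
For cases where the "unique" binary is not at hand, the same count can be approximated without sorting using standard tools. The following is only a rough sketch, not JtR's method: count_dupes is a hypothetical helper name, the output labels are made up for illustration, and awk keeps every distinct line in memory, so it is likely suitable only for wordlists that fit in RAM.

```shell
# count_dupes FILE
# Hypothetical helper (not part of JtR): reports how many lines of FILE
# are duplicates once ASCII letter case is ignored, without sorting.
count_dupes() {
    # Fold A-Z to a-z, then let awk count first occurrences of each line.
    # awk stores each distinct line in an in-memory array (assumption:
    # the distinct lines fit in RAM).
    tr 'A-Z' 'a-z' < "$1" |
        awk '!seen[$0]++ { u++ }
             END { printf "total=%d unique=%d duplicates=%d\n", NR, u, NR - u }'
}
```

Usage would be along the lines of "count_dupes input.lst". Like the tr commands above, this folds only ASCII letters, so non-ASCII characters that differ in case are still counted as distinct.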