Message-ID: <20130112151842.GA7635@openwall.com>
Date: Sat, 12 Jan 2013 19:18:42 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: parallel sort of text files under Windows

On Sat, Jan 12, 2013 at 03:29:11PM +0100, JohnyKrekan wrote:
> Hello, I would like to ask whether someone could recommend a sort
> program for sorting large text files that can use multiprocessor
> systems efficiently. I am using sort from the GNU coreutils, which
> is very good but cannot utilize more than one core. Any advice on a
> good parallel sort would be very useful.

In addition to RB's reply:

Are you already using the -S option to sort to let it use more RAM?
For example, "sort -uS 14G" works well on a 16 GB RAM machine (64-bit
system, of course).
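
For instance (the filenames here are placeholders, and LC_ALL=C is an
optional addition that speeds sorting up by using plain byte order):

  LC_ALL=C sort -uS 14G -o wordlist-sorted.txt wordlist.txt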

Do you really need the lines sorted rather than merely duplicates
removed?  JtR's own "unique" program might be faster.  You may try:
"./unique -mem=25 OUTPUT-FILENAME-HERE < INPUT-FILE" (or you may pipe
the output of "john" into it).  The -mem=25 will make it use 2 GB
of RAM (unfortunately, it can't use more than that currently).
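
For instance, a sketch of piping "john" into "unique" (the wordlist
name, the use of --rules, and the output filename are placeholders):

  ./john --wordlist=words.lst --rules --stdout | \
      ./unique -mem=25 candidates-unique.lst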

So if you're on a machine with e.g. only 4 GB RAM, the "unique" approach
is likely faster (since it's a simpler task).  If you're on a machine
with e.g. 16 GB RAM or more, the "sort" approach is likely faster (since
it can use more RAM).
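
To make the tradeoff concrete (sizes are illustrative only):

  # ~4 GB RAM: sort could only get ~3 GB, so unique's simpler pass wins
  ./unique -mem=25 out.lst < in.lst
  # ~16 GB RAM: sort can use 14G, far above unique's 2 GB cap
  LC_ALL=C sort -uS 14G -o out.lst in.lst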

Alexander
