john-users - Re: Chunk of work specification

Message-ID: <20181125133318.GA4428@openwall.com>
Date: Sun, 25 Nov 2018 14:33:18 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Chunk of work specification

Hi Vojtech,

On Sat, Nov 24, 2018 at 10:02:14AM +0100, Vojtěch Večeřa wrote:
> first I'd like to apologise for any mistakes with using the mailing list. It
> is my first experience with this kind of communication.

You did everything right.

magnum already replied to your questions.  I'll mostly reiterate the
same answers, with slight differences/additions:

> To the point: I'd like to make a wrapper around JtR which would allow me to
> control the starting position and the size of the chunk. The aim is to use 
> the wrapper in some distributed system. Now the question is, are there some
> options to do this? I'm in need of something similar to Hashcat's --skip and
> --limit. I know that JtR has --node parameter but it doesn't seem too
> intuitive nor precise when it comes to the exact sizes of chunks. I want to
> use some dynamic scheduling algorithms. So that I can control for how long 
> will the machine be in use. The algorithm will then create the chunk of such
> a size that given machine will compute it in given time, let's say a
> minute).

Yes, --node isn't "precise when it comes to the exact sizes of chunks",
so it requires some math and/or tuning for a specific attack, but other
than that it's usable for the purpose.  Here's an excerpt from a Perl
script I used in password cracking contests in 2012-2014:

	print "Starting slave $slave as #$slave_num for entry#$num\n";

	open(ENTRY, "> $workdir/$session" . '-entry-' . $num);
	print ENTRY 1; # running
	close(ENTRY);
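	$pid = fork();
	if ($pid == 0) {
		# Child: map 1-based chunk number $num onto a range of
		# $chunk consecutive virtual nodes out of $total
		my $start = ($num - 1) * $chunk + 1;
		my $end = $start + $chunk - 1;
		$end = $total if ($end > $total); # clamp the last chunk
		my $conn = myconnect($slave);
		upload_files($conn, $slave)
		    unless ($slave_has_files{$slave});
		# Tell the slave which node range to run
		print $conn "OPTS $opts --node=$start-$end/$total\n";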
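
For readers who don't speak Perl, the chunk-to-range arithmetic above can
be sketched in Python roughly as follows.  This is only an illustration:
the function names node_range and node_option are made up here and are
not part of the original script, and the numbers in the example are
arbitrary.

```python
def node_range(num, chunk, total):
    """Return (start, end) virtual node numbers for 1-based chunk `num`.

    `chunk` is how many virtual nodes go into one chunk of work and
    `total` is the total number of virtual nodes the attack is split into.
    """
    start = (num - 1) * chunk + 1
    # The last chunk may be shorter, so clamp it to `total`.
    end = min(start + chunk - 1, total)
    return start, end


def node_option(num, chunk, total):
    """Format the --node option for chunk `num` (hypothetical helper)."""
    start, end = node_range(num, chunk, total)
    return f"--node={start}-{end}/{total}"


if __name__ == "__main__":
    # 25 virtual nodes handed out 10 at a time: the third chunk is clamped.
    for num in (1, 2, 3):
        print(node_option(num, chunk=10, total=25))
```

A larger $total gives finer-grained chunks, and $chunk then controls how
much work one slave gets per assignment.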

Your example of "1 minute" is good when it corresponds to a large
enough number of candidate passwords, which is usually the case.
However, for very slow hashes with a lot of different salts loaded for
cracking, it might not be sufficient to achieve an optimal number of
candidate passwords to fully use the password hash implementation's
parallelism, sometimes not even on CPU.  Also, 1 minute per chunk per
node at 60+ nodes corresponds to multiple chunks starting/ending per
second on the master node.  For these reasons, I ended up using
significantly larger chunks.
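
A back-of-the-envelope sketch of both effects, in Python.  The helper
names and the example figures (1000 hash computations per second, 500
salts) are assumptions chosen for illustration, not measurements.  The
only facts relied upon are that each candidate password has to be hashed
once per distinct salt, and that every chunk has one start and one end
for the master to handle.

```python
def candidates_per_chunk(hashes_per_second, salts, chunk_seconds):
    """Candidate passwords one node gets through in one chunk.

    Each candidate is hashed once per distinct salt, so the candidate
    rate is the raw hash rate divided by the number of salts.
    """
    return hashes_per_second * chunk_seconds // salts


def master_events_per_second(nodes, chunk_seconds):
    """Chunk start/end events per second seen by the master node.

    Assumes every node is kept busy and each chunk produces one start
    event and one end event.
    """
    return 2 * nodes / chunk_seconds


if __name__ == "__main__":
    # A slow hash at 1000 hashes/s with 500 salts, 1-minute chunks.
    print(candidates_per_chunk(1000, 500, 60))
    # 60 nodes with 1-minute chunks.
    print(master_events_per_second(60, 60))
```

With these made-up figures a 1-minute chunk is only 120 candidate
passwords, which may be fewer than an implementation wants to process in
parallel at once, while the master handles about 2 chunk events per
second.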

> I guess I could abuse session file for specification of the starting point 

This is what people typically suggest doing, and I recommend against it.

> (Hashcat's --skip) but I don't see anything that could be used as Hashcat's
> --limit.

There's --max-run-time and --max-candidates, but I don't recommend this
approach.

> In JtR's context, the things I need (the start and end) are provided by
> the Markov mode, but then I lack at least specification of the mask.

This is just one cracking mode of many.  A Markov mode specific cracker
isn't very useful.

> So is there some way how to achieve what I need using John?

Yes, with --node.

> Or is it even 
> possible to add these tiny things to John? Is the generator build the way 
> that this would be possible implement? I could try to do it myself I just 
> need some info whether it is worth the time or whether I'd have to rewrite 
> half of the John to make it work.

Like magnum said, the generators are per cracking mode.

What we could reasonably improve without a rewrite is incremental
mode's node granularity.  Right now, it only assigns whole
numbers of so-called cracking order entries to nodes, skipping other
nodes' entries.  In a typical cracking task, there are only a few
thousand of those entries, and they correspond to different (generally
progressively larger, but not always) numbers of candidate passwords to
test.  An improvement would be to split the ranges of candidate
passwords within each entry instead.  Then the very limited number of
entries and their differing sizes wouldn't matter.
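
To illustrate the difference, here is a toy model in Python.  It is not
JtR's actual code: the round-robin dealing of whole entries is a
simplifying assumption standing in for "skipping other nodes' entries",
the entry sizes are made up, and all function names are hypothetical.

```python
def whole_entry_load(sizes, node, total):
    """Candidates for 1-based `node` when whole entries are dealt
    round-robin to `total` nodes (simplified model of the current
    behavior)."""
    return sum(size for i, size in enumerate(sizes)
               if i % total == node - 1)


def split_entry_range(size, node, total):
    """Half-open [lo, hi) slice of one entry's candidates for `node`.

    Integer arithmetic makes the slices contiguous, non-overlapping and
    at most one candidate apart in length.
    """
    return size * (node - 1) // total, size * node // total


def split_entry_load(sizes, node, total):
    """Candidates for `node` when every entry is split across nodes."""
    return sum(hi - lo
               for lo, hi in (split_entry_range(s, node, total)
                              for s in sizes))


if __name__ == "__main__":
    sizes = [10, 100, 1000, 10000, 100000]  # made-up entry sizes
    nodes = range(1, 5)
    print([whole_entry_load(sizes, n, 4) for n in nodes])
    print([split_entry_load(sizes, n, 4) for n in nodes])
```

In this toy case, dealing whole entries leaves one node with almost all
of the work, whereas splitting within each entry gives every node nearly
the same number of candidates regardless of how many entries there are.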

Alexander