Message-ID: <941F77A06D834C3CB423893D5C4BAFC5@ath64dual>
Date: Thu, 20 Aug 2009 18:00:45 -0500
From: "JimF" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: Re: Thoughts and questions on creation of a 'generic' MD5 hash set format (to handle 'all' of them)

**WARNING: if you are a 'normal' john user who is not interested in what john
does 'under the hood', you can probably stop reading now and save your time.
This email is a little long and wanders around quite a bit.

On Thursday, August 20, 2009 11:51 AM, Solar Designer wrote:
> Jim,
> You have some good thoughts here.  Thank you for posting this.

Thanks.

> On Thu, Aug 20, 2009 at 11:26:22AM -0500, JimF wrote:
>> However, I have been thinking about how best to feed john some of the 
>> many md5 hash types (families).  I propose something like this:
>>
>> Password 4turtles
>> Salts either ttzzz or i a   (i space a)
>>
>> uid:md5($p.$s5)ttzzzf879de3ea2c872243bf38ff482fecb7f     (pw=4turtles salt=ttzzz)
>
> You have probably seen my related proposal already, but just in case:
> http://www.openwall.com/lists/john-users/2009/01/21/5

I did read that a while ago.  I have had it in the back of my mind since 
then, and have recently done more direct 'thinking' about it.

**NOTE: this next part is written off the cuff and has not been fully
validated as the right way (i.e. I have not printed it, read it a few times,
slept on it, etc.), so there may be glaring problems.  At this time, I am
still just bouncing ideas around.

> I think that for decent performance there got to be some shortcut for
> the simple/typical cases like that.  Maybe the shortcut should be right
> in the input syntax, maybe it should be in having separate "formats" for
> simple vs. complicated hash types, or maybe it should be in the code
> (auto-detection of simple special cases with corresponding substitution
> of function pointers, etc. to avoid further overhead).

I believe this could be done by 'custom' coding the known simple/common
formats, such as:

md5($p)
md5(md5($p))
md5(md5($p)$s)
 ...

Then directly link the proper function pointers once the 'type' has been
determined (either supplied by the user or detected in the hash file).
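Here is a rough Python illustration of the 'pick the function once' idea. john itself is C, where this would be a function pointer. The table `KNOWN_FORMATS` and `select_hasher()` are hypothetical names. `md5(md5($p)$s)` is read as the inner digest in lowercase hex, followed directly by the salt.

```python
import hashlib
from typing import Callable, Optional


def _md5_hex(data: str) -> str:
    """Lowercase base-16 MD5 of a UTF-8 encoded string."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


# Hand-written fast paths for common signatures (hypothetical table).
# Each entry takes (password, salt) and returns the lowercase hex digest.
KNOWN_FORMATS: dict[str, Callable[[str, str], str]] = {
    "md5($p)": lambda p, s: _md5_hex(p),
    "md5(md5($p))": lambda p, s: _md5_hex(_md5_hex(p)),
    "md5(md5($p)$s)": lambda p, s: _md5_hex(_md5_hex(p) + s),
}


def select_hasher(signature: str) -> Optional[Callable[[str, str], str]]:
    """Choose the custom-coded function once, at load time.

    Returns None when there is no fast path; such signatures would be
    handed to a generic (slower) evaluator instead.
    """
    return KNOWN_FORMATS.get(signature)
```

After the hasher is chosen, the crack loop only calls it, so no per-candidate dispatch cost remains.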

Now, for as-yet 'unknown' types, I am pretty sure I can build a
parser/compiler that would load an array of function objects (pointers and
params) to process the data in the proper order, and get it working at a
reasonable clip.  I think it could run AT LEAST at 50% of the speed of the
best custom-coded version, and hopefully at 75-80%.  Initially, it will
simply use MD5 from OpenSSL, since that is the easiest way to get things
working correctly.  Then I will work on making it fast for the different
build configs.
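Below is a small Python sketch of such a parser/compiler. It is not john code. The 'array of function objects' is modeled as a flat list of stack-machine steps built once at load time. `compile_sig()`, `run()` and the step names are hypothetical. The grammar covers only `md5()`, `MD5()`, `md5b64()`, `$p`, `$s`, `$u` and the `.` operator, plus plain juxtaposition as in `md5(md5($p)$s)`. "Common" base-64 is assumed to be standard base64 of the raw digest.

```python
import base64
import hashlib
from typing import Callable

# Hash functions of the proposed language (base-64 variant is an assumption).
HASH_FUNCS: dict[str, Callable[[str], str]] = {
    "md5": lambda x: hashlib.md5(x.encode()).hexdigest(),
    "MD5": lambda x: hashlib.md5(x.encode()).hexdigest().upper(),
    "md5b64": lambda x: base64.b64encode(hashlib.md5(x.encode()).digest()).decode(),
}
VARS = ("$p", "$s", "$u")  # password, salt, user id


def compile_sig(sig: str) -> list[tuple[str, object]]:
    """Load time: turn e.g. md5(md5($p.$s).$p) into a flat list of steps
    for a small stack machine (postfix order)."""
    prog: list[tuple[str, object]] = []
    pos = 0

    def term() -> None:
        nonlocal pos
        for var in VARS:
            if sig.startswith(var, pos):
                prog.append(("push", var[1]))  # context key: 'p', 's' or 'u'
                pos += len(var)
                return
        for name in HASH_FUNCS:
            if sig.startswith(name + "(", pos):
                pos += len(name) + 1
                expr()
                if pos >= len(sig) or sig[pos] != ")":
                    raise ValueError(f"missing ')' at offset {pos}")
                pos += 1
                prog.append(("hash", name))
                return
        raise ValueError(f"unexpected input at offset {pos}: {sig[pos:]!r}")

    def expr() -> None:
        # One or more terms, joined by '.' or simply written side by side.
        nonlocal pos
        count = 0
        while True:
            term()
            count += 1
            if pos < len(sig) and sig[pos] == ".":
                pos += 1
            elif pos >= len(sig) or sig[pos] == ")":
                break
        if count > 1:
            prog.append(("concat", count))

    expr()
    if pos != len(sig):
        raise ValueError(f"trailing input at offset {pos}")
    return prog


def run(prog: list[tuple[str, object]], ctx: dict[str, str]) -> str:
    """Crack time: execute the precompiled steps for one candidate."""
    stack: list[str] = []
    for op, arg in prog:
        if op == "push":
            stack.append(ctx[arg])
        elif op == "hash":
            stack.append(HASH_FUNCS[arg](stack.pop()))
        else:  # "concat": join the top `arg` entries, oldest first
            parts = stack[-arg:]
            del stack[-arg:]
            stack.append("".join(parts))
    return stack.pop()
```

All parsing happens in `compile_sig()`. `run()` is a short loop over a precomputed list, which fits the plan of paying the overhead once at load time.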

But then, if someone gets some hashes that are built using
md5($s(md5(md5($p(md5($s.$p.$s)$s))))), which no one has seen at this time,
john would be able to handle that format in a 'reasonably' fast manner until
the format became popular enough for someone to build a custom set of
optimal functions for it.  I just need a little more time to get a better
view of how I could put this together, but I feel this would be a good way
to go.

This would be something of a change from 'normal' for john.  Right now, most
of john's formats are very specific.  phpass handles 'all' of its variants
(i.e. $H$, $P$, $H$9, $H$7, ... or any other valid 'loop count'), and I think
there is an MD5 format that handles multiple variants (like apache and
mdac ???), but most formats are very specific.  This would be a
general-purpose md5 suite, with certain 'formats' optimized, but able to
handle many arbitrary types.

I think most of the complex logic and 'overhead' will be at load time.
Once the program starts in crack() mode, it will fall right back into
processing just as it does today.

I guess time will tell, because I first need to finish some of my other
85%-completed things and then find time for this, lol.

The 'language' I can see is:

md5()
MD5()
md5b64()
Functions that take a string input.  Output is the md5 in lowercase base-16,
UPPERCASE base-16, or 'common' base-64, respectively.
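The three output encodings could look like this in Python, using the standard `hashlib` and `base64` modules. The meaning of 'common' base-64 is an assumption here: standard RFC 4648 base64 of the raw 16-byte digest, with '=' padding, which gives 24 characters. Applications that use a different alphabet or drop the padding would need a variant.

```python
import base64
import hashlib


def md5(s: str) -> str:
    """md5(): lowercase base-16 digest (32 characters)."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def MD5(s: str) -> str:
    """MD5(): UPPERCASE base-16 digest (32 characters)."""
    return md5(s).upper()


def md5b64(s: str) -> str:
    """md5b64(): base-64 of the raw 16-byte digest.

    Assumption: standard RFC 4648 alphabet with '=' padding (24 characters).
    """
    return base64.b64encode(hashlib.md5(s.encode("utf-8")).digest()).decode("ascii")
```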

Replacement params:

$p   passhash (lowercase base-16)
$P   passhash (UPPERCASE base-16)
$p64 passhash (base-64)

The passhash will be the last X bytes of the string following the matching
closing ')' of the language signature.

$s
Salt.  It is the first bytes following the closing ')' of the sig.
QUESTION: could there be multiple 'different' salts?  If so, and if we are to
support them, then we need some way to say salt 1, salt 2, and which bytes
make up each salt.  If we are NOT handling multiple salts, then I think a
simple $s is sufficient and will handle variable-length salts, since the
length of the salt can be computed from the length of the hash string.
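A small Python sketch of that splitting rule follows. It adds a few assumptions not stated above: the uid is everything before the first ':', there is a single $s, and the hash is 32 base-16 characters (a $p64 hash would be 24 base-64 characters instead). `split_line` is a hypothetical name.

```python
HEX_LEN = 32  # a base-16 MD5 digest is always 32 characters


def split_line(line: str) -> tuple[str, str, str, str]:
    """Split 'uid:SIG<salt><hash>' into (uid, sig, salt, hash).

    The signature ends at the ')' that closes its first '('. What follows
    is the salt plus a trailing 32-character hex hash, so the salt length
    is simply len(rest) - 32.
    """
    uid, _, field = line.partition(":")
    depth = 0
    for i, ch in enumerate(field):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced or missing signature")
    sig, rest = field[: i + 1], field[i + 1 :]
    if len(rest) < HEX_LEN:
        raise ValueError("hash too short")
    return uid, sig, rest[:-HEX_LEN], rest[-HEX_LEN:]
```

The salt is never scanned for parentheses, so a ')' inside a salt does not confuse the split.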

operator .
Appends 2 strings.  So if p=12345 and s=aazzz, $p.$s would be the string
12345aazzz.

So, md5(md5($p.$s.md5(md5($p.$s)).$p)) is perfectly valid.  If some package
used this, and the user had a password of GOOSE with a salt of xyzzy, then
uid:md5(md5($p.$s.md5(md5($p.$s)).$p))xyzzye9fd8e197adaf04ad191080be668a421
is a valid and 'usable' hash, since:
md5(GOOSExyzzy) = b8b6979216ac86ff2381c2989c51bf09
md5(b8b6979216ac86ff2381c2989c51bf09) = 8ec530dae739ef3368429bd6fd9b2c81
md5(GOOSExyzzy8ec530dae739ef3368429bd6fd9b2c81GOOSE) = e8d8350f26ae3606ced30f620fa3c1f6
md5(e8d8350f26ae3606ced30f620fa3c1f6) = e9fd8e197adaf04ad191080be668a421
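The four steps above can be re-checked with a short Python script. This is a sketch, not part of john. It evaluates the expression innermost first and prints each md5() input and output in the same order, so the digests can be compared against the ones listed. `trace` is a hypothetical helper name.

```python
import hashlib


def md5(x: str) -> str:
    """Lowercase base-16 MD5 of a UTF-8 string."""
    return hashlib.md5(x.encode("utf-8")).hexdigest()


def trace(p: str, s: str) -> list[tuple[str, str]]:
    """Evaluate md5(md5($p.$s.md5(md5($p.$s)).$p)) innermost first,
    recording the input and output of every md5() call."""
    steps: list[tuple[str, str]] = []

    def step(x: str) -> str:
        out = md5(x)
        steps.append((x, out))
        return out

    inner = step(step(p + s))        # md5(md5($p.$s))
    step(step(p + s + inner + p))    # md5(md5($p.$s.<inner>.$p))
    return steps


for arg, digest in trace("GOOSE", "xyzzy"):
    print(f"md5({arg}) = {digest}")
```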


*** Now some other 'possible' language parts:  ***

$u
User ID.  Some salts use this.  I am not sure we have this information
available, but it 'might' be very useful.  If we need it and can get it,
then $u should be there.

operator ^
String XOR.  I am not sure people would build a hash like this, but they
'might'.
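The exact semantics of ^ are left open here. One possible definition is purely hypothetical: XOR the UTF-8 bytes of the two operands position by position, repeating the shorter one. The result is returned as lowercase hex so it stays printable, since raw XOR output often is not. `str_xor` is an illustrative name, not an existing john function.

```python
def str_xor(a: str, b: str) -> str:
    """Hypothetical '^' operator: byte-wise XOR of a and b (UTF-8), cycling
    the shorter operand; the result is returned as lowercase hex."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    if not x or not y:
        # XOR with an empty string is treated as a no-op.
        return (x or y).hex()
    n = max(len(x), len(y))
    return bytes(x[i % len(x)] ^ y[i % len(y)] for i in range(n)).hex()
```

Other choices are equally plausible, such as truncating to the shorter operand or XORing raw 16-byte digests. Which one is right would depend on the hashes actually found in the wild.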

$"string"
Quoted string (the quotes 'could' be any characters, and would not be part
of the string added).
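Here is a sketch of reading such a literal. It assumes the character immediately after '$' is the opening delimiter and the next occurrence of the same character closes it, with no escaping. Letters and digits are rejected as delimiters because they would clash with $p, $s, $u and $p64; that restriction is an added assumption. `parse_quoted` is a hypothetical helper name.

```python
def parse_quoted(sig: str, pos: int) -> tuple[str, int]:
    """Read a $"string" literal that starts at sig[pos].

    Returns (value, offset just past the closing delimiter). The
    delimiters themselves are not part of the value.
    """
    if not sig.startswith("$", pos) or pos + 1 >= len(sig):
        raise ValueError("not a quoted literal")
    delim = sig[pos + 1]
    if delim.isalnum():
        # $p, $s, $u, $p64 ... would be ambiguous with letter/digit quotes.
        raise ValueError("delimiter must not be a letter or digit")
    end = sig.find(delim, pos + 2)
    if end == -1:
        raise ValueError("unterminated quoted string")
    return sig[pos + 2 : end], end + 1
```

Allowing any delimiter means a literal containing '"' can simply be written with another quote character, for example `$|say "hi"|`.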

operator ^^NUM
Power.  Thus md5(md5($p.$s))^^2048 would be phpass for $H$9's.  It might be
better (simpler) to do this with something like an md5pow() function instead
of the ^^ operator.  If using md5pow, the above would be
md5pow(md5($p.$s),2048) or md5pow2048(md5($p.$s)).
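Here is a minimal Python sketch of the md5pow() alternative. It assumes md5pow(x, N) means applying md5() (lowercase base-16) N times to x, so md5pow(x, 1) is just md5(x). For comparison, real phpass iterates on the raw 16-byte digest with the password appended each round (hash = md5(hash . password)). This plain hex-on-hex loop therefore only illustrates the operator and does not reproduce phpass hashes.

```python
import hashlib


def md5(x: str) -> str:
    """Lowercase base-16 MD5 of a UTF-8 string."""
    return hashlib.md5(x.encode("utf-8")).hexdigest()


def md5pow(x: str, count: int) -> str:
    """Hypothetical md5pow(x, N): md5(md5(...md5(x)...)) with N md5() calls,
    each round hashing the previous round's hex string."""
    if count < 1:
        raise ValueError("count must be at least 1")
    for _ in range(count):
        x = md5(x)
    return x
```

With this definition, `md5pow(md5(p + s), 2048)` is the Python spelling of md5pow(md5($p.$s),2048).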

There are probably OTHER language 'things' which could be added.  If people 
know of things which ARE being used, then please post.  If it is something 
which could fit into a VERY simple language structure like this, then now is 
the time to talk it out.

Jim. 

