Message-ID: <20150826031555.GA2082@openwall.com>
Date: Wed, 26 Aug 2015 06:15:56 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Anyone looked at the Ashley Madison data yet?

On Tue, Aug 25, 2015 at 04:27:37PM -0400, KZug wrote:
> Due to the extremely sloooow Bcrypt speed of the A.Madison dump, I don't see a point at each of us working in its own corner.
> If you are willing to participate, I can open a Dropbox folder and share it. (hashes recovered, etc.)
> By doing so, we don't have to reinvent the wheel each time, and can save few CPU cycles.

I suggest that, if you do this, you all start by defining your goals,
and this will affect whether and how you work together (or at all).

First, obviously you're doing this for research (and not e.g. for having
anyone's account anywhere compromised to any greater extent).  What kind
of research is that?  What would you like to find out, and why?

Is this to get as many passwords cracked as you can, and then state this
figure - e.g., "0.1% of the bcrypt hashes in the dump cracked in 7 days"
(totally arbitrary figures, but these feel realistic to me based on what
was said so far)?  So that e.g. academic publications on password
security have some figure to refer to for the case of very slow salted
hashes without a password policy and with related information available
for each account in a multi-million password hash dump (these are the
factors that I think are primarily determining the success rate).  If
so, you may accept contributions from about anyone.

Is this to create a "top N" list for this leak, to have one more of
those to refer to e.g. in academic publications on password security?
(I doubt the list would be of great other use.  We have many of those
already.  It might be usable to adjust the rankings in a cumulative
"top N" list across many leaks, though.)  If so, you need to define the
methodology first, or the resulting list would likely be badly skewed in
unknown ways.
You can't blindly accept arbitrary contributions (there are ways to make
partial use of those yet avoid biases, but this is not trivial so it
will likely go wrong).

For the "top N" work, you need to "shuf" the dump and choose specific
e.g. 100k lines from it (e.g. for intending to produce a top 100 list).
To make this even safer, "shuf" the 100k sub-list of hashes for each
potential contributor separately, and give each contributor only their
shuffled list.  This extra measure is in case of interrupted attacks, so
that with a large number of contributors the original 100k list is
attacked uniformly anyway.  (It wouldn't be fatal even if it's not,
though, since it's already shuffled.  However, if a particularly common
password is found closer to the start of the 100k list, it might appear
as even more common than it actually is if some attacks are
interrupted.)

Also, to avoid increasing the damage of this leak you should keep your
cracked password list such that it's not valuable to attackers.
Focusing e.g. on a 100k sub-list and creating a top 100 list from it
will likely make your cracked passwords less valuable for possible
misuse as compared to attacking the entire dump and maximizing the
number of hashes cracked through the accounts' related info (the GECOS
alikes).  You will probably produce way fewer cracked passwords when
processing only a relatively small sub-list.  For a number of reasons, I
do not seriously think that any of these cracked passwords would be used
for actual attacks, yet it's an angle you might want to consider.

Alexander

P.S. I don't intend to participate.  I am merely commenting on this.
And yes, please take the actual coordination, if any, off-list.
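
The sub-list selection and per-contributor shuffling described above for
the "top N" work could be sketched roughly as follows.  This is only an
illustration, not a procedure from the message: the function names, the
seed handling, and the use of Python's "random" module in place of the
"shuf" command are all assumptions.

```python
import random


def sample_sublist(hashes, k, seed):
    """Pick k distinct hashes uniformly at random (roughly `shuf -n k`).

    `seed` is a hypothetical parameter, used here only so that the
    sketch is reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(hashes, k)


def per_contributor_lists(sublist, contributors, seed):
    """Return one independently shuffled copy of `sublist` per contributor.

    Every contributor gets the same set of hashes, but in a different
    order, so that interrupted attacks still cover the sub-list roughly
    uniformly when taken across many contributors.
    """
    lists = {}
    for name in contributors:
        # A separate generator per contributor (assumed scheme) keeps
        # the orderings independent of one another.
        rng = random.Random(f"{seed}:{name}")
        shuffled = list(sublist)
        rng.shuffle(shuffled)
        lists[name] = shuffled
    return lists
```

Each contributor would then receive only their own entry from the
returned mapping.  For real use, the randomness source and how the seed
is kept would need more thought than this sketch gives them.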