|
Message-ID: <CAJ9ii1GG0QJe8i4vdSEdD5X0HMAM-gNn+X2bnWeZg65zVHS=aw@mail.gmail.com> Date: Wed, 12 Dec 2012 18:03:12 -0500 From: Matt Weir <cweir@...edu> To: crypt-dev@...ts.openwall.com Subject: Intentionally Increasing Collisions in Password Hashing Algorithms So this is a continuation of a rather long Twitter thread between Solar, @DefuseSec and me, (apparently Twitter is not IRC), talking about increasing collisions in certain password hashing algorithms as a defensive measure. As a quick disclaimer, this very well might be a horrible idea, (I know when people first hear about it that’s their initial reaction). That being said, it is something that I feel is at least worth looking into. Also i'd like to point people to @DefuseSec's blogpost on the subject: https://defuse.ca/blog/2012/12/increasing-collisions-in-password-hashing/ Quick Definitions: Offline Attack: The attacker has access to the password hash. They can make as many guesses as their resources allow them to. An example of this would be using John the Ripper/HashCat to crack BCrypt hashes. Online Attack: The attacker does not have access to the password hash. The defender can limit the number/rate of guesses the attacker can make. An example of this would be an attacker using THC Hydra to submit password guesses to a web login page. Background/Scope: Just to get this out of the way at the start. Increasing collisions is in fact a horrible idea if your goal is to defend against offline attacks. Under no circumstances should this be used for encryption applications. So taking a step back, there are certain fundamental truths when it comes to the current state of password security. Users will choose weak passwords; users will reuse passwords; and many sites will not implement memory hard or computationally hard password hashing algorithms. Now some users do choose strong passwords, many users avoid reusing passwords for important sites, and some sites do use strong hashing functions. But there’s enough examples of that not being the case I don’t think I have to spend too much time saying this is a major problem. If it wasn't a problem then tools like JtR and Hashcat would be useless and no-one would worry when LinkedIn’s password hashes were stolen ;p Now there’s certainly good tactics to combat this problem. Two-factor authentication, password vaults like Keepass, hashing algorithms like BCrypt and SCrypt, user training, etc. That being said, there’s reasons why we haven’t solved the password problem yet. Many of these techniques like password vaults are only slowly gaining user acceptance. On the other hand blaming users for picking weak passwords has gained high acceptance but that doesn't seem to be working very well ;p Strong hashing algorithms certainly can scale up to websites that support millions of users, (Twitter uses BCrypt). But going off Solar and Simon Marechal’s slides from the Password^12 conference when talking about phpass, “The portable hashes are very simple, which was key to phpass’ acceptance. More elaborate portable hashes would likely not be accepted; This may be something to try for a next generation phpass now that the foot is in the door”. My interpretation of that, (which is also based on my personal experiences), is there is going to be a lot of resistance to widely deploying computationally expensive hashing algorithms. Here's the link to their talk btw: http://www.openwall.com/presentations/Passwords12-The-Future-Of-Hashing/ This brings us to password hash collisions. Ideally we could just focus on making password hashes as expensive as possible to compute while training users to choose strong/unique passwords to make cracking cost prohibitive. Since there is a lot of pushback on this, I've been looking at defensive techniques that might have an asymmetric cost for an attacker vs. the defender. A good example of such a technique is password salts. They have almost no impact on the defender, but have a huge impact on an attacker who is targeting multiple users. Now one issue is that users have passwords for many websites where their account has little value to an attacker, but the users then reuse their passwords for sites that do contain valuable info. Attackers exploit this by breaking into low value sites, stealing the password hashes, (hopefully they are hashed…), cracking them and then attempting to use the stolen credentials against high value sites. Basically they are taking advantage of the fact that offline attacks against weak hashes are fairly cheap while online attacks can be comparatively quite expensive, (you have to deal with rate limiting, IP blacklists, captchas, etc). Now online attacks are certainly popular as well, (just look into all the SSH bruteforcing going on), but targeting password reuse is popular for a reason. So it would be nice if we could raise the cost of checking if a cracked password is valid on a different website, (basically adding an “online” component to an “offline” cracking attack). That’s why I’m looking into intentionally increasing collisions in password hashes for low value websites/accounts. If we can insert some level of uncertainty into the hash cracking process, (does this guess really match the plaintext password the user selected?), it can raise the cost for an attacker when it comes to attacking other sites. For example, now attackers will try a username+password combo along with a few variations, (such as incrementing numbers at the end of the password), but if that doesn't work they assume the user didn't reuse their password and they move on. The best description I've seen of this based on real world data is: https://www.guildwars2.com/en/news/mike-obrien-on-account-security/ With collisions, the attacker doesn't know if the password they cracked is the user’s real password. This means when they attempt to reuse it and fail the attacker has a couple of choices: 1) Proceed as normal, (depending on the probability of a collision, the attacker’s first match still has a good chance of being the user’s password). The advantage for a defender is, A) If the attacker did crack the user’s password then it would have been cracked regardless B) If it is a collision the attacker will fail to access the user’s account even if the user reused their password. 2) If the reused credentials don’t allow access, continue their cracking session until they find another match and then try the new credentials, (they can queue up possible passwords before checking them). Repeat until they find the correct password, or they run out of time, (they decide to stop). Not only is there the additional cost of checking collisions, but if a user did not reuse their password the attacker doesn't know that. Aka the attacker may waste time cracking/checking passwords long beyond when they would normally stop if there weren't collisions. One other advantage with password hash collisions is they provide deniability of the plaintext value. If someone’s password is embarrassing, such as “I hate my boss”, this provides them the ability to claim that their password was really something else if their password hash is disclosed. Now unlike salts, there are downsides to the defender for this approach. The first is that it increases a user’s vulnerability to online guessing attacks. AKA, an attacker’s guess might collide with the user’s password allowing the attacker to log in. That’s not a minor concern and I’ll come back to that later. The second problem is that for this approach to have any impact it will make it trivial for an attacker to find a collision when performing an offline attack. Basically if an attacker has access to the hash they also have access to that user’s account. That’s not something to discount and it was the subject of a lot of the Tweets between Solar and myself earlier ;p Before getting back to those trade-offs I’d like to describe some ideas about how to increase hash collisions in a what I hope is a controlled fashion. For example, there are many different ways to increase collisions in popular password hashing functions. The key then is to select a method that does not leak additional info about the plaintext password while at the same time allowing the defender to estimate the number of collisions that are likely to occur. This rules out increasing collisions by mangling the plaintext password before hashing. An unintended example of this approach can be seen in old DES based Crypt(3) hashes which truncate user passwords after 8 characters. While this certainly increases collisions, (for example ‘password’ and ‘password123’ would share the same hash), it also allows an attacker to target only the first 8 characters of the password instead of the entire password. While there are some advantages to this approach, (the attacker doesn't know whether to try ‘password’ or ‘password123’ when reusing the credentials), those advantages are more than outweighed by the fact that an attacker can learn about the beginning portion of a password without having to guess the entire password. These requirements also disqualifies modifying the internal operation of popular hashing algorithms, or building new collision prone hashing algorithms from scratch. Beyond the fact that such an approach is a bad idea in general, such changes or new hashing algorithms may unintentionally leak information about the plaintext password. In addition, it is unlikely without a detailed study of the new algorithms that the number of collisions can be accurately predicted. This is a longer way of saying that while it is theoretically possible a new collision prone hashing algorithm could be developed, the chances of it going catastrophically wrong are not trivial and there is no reason to do so in the first place. Therefore the approach I’m currently investigating is to create collisions by truncating the results of existing password hashing algorithms. First of all this approach does not leak any additional information about the underlying password in an offline attack. I’m working on a formal proof to show this but the colloquial example is if this approach did leak additional info, then attackers would already be doing it. Note, I’m specifically referencing offline attacks. In an online attack it does leak data since it can let the attacker know about collisions if they can guess them, (and as a side effect log in which might be a bigger problem…). Still the knowledge gained about the underlying plaintext password learned based on the collisions would probably be limited if a standard/modern hashing algorithm was used and the password salt was large and remained unknown to the attacker. As far as being able to estimate the collision frequency goes, this is admittedly a trickier prospect. This is because it is dependent on the underlying hash function being used. The closer in behavior the underlying hash function is to a random oracle, then the easier it is to calculate the frequency of collisions. While no hash function exists that is a perfect random oracle, several are close enough in practice that intelligent guesses might be able to be made as to the probability that random collisions will occur, (yes I just suggested we make the cow spherical…) There are several other advantages to increasing collisions by truncating existing hashes. First, it should cause no additional performance impact compared to existing implementations. In fact it might trivially make authentication attempts quicker as only the first n bits need be compared between the hash from a user’s password submission and the saved hash on disk, (hopefully that speedup isn't noticeable or you need to be using a stronger hash!) Second, this should limit any additional side channel leaks compared to the existing hashing setup as the added work, (the truncation), is done on a fixed length hash for a fixed length value for every single password hash, regardless of the underlying plaintext. Finally, and perhaps most importantly, it is easy to upgrade to this from existing deployments. You simply take whatever hashing algorithm is being used and truncate it to the desired length. Since this is not dependent on the underlying plaintext password, site administrators don’t have to deal with two sets of user hashes, one for users who have logged in since the upgrade, and one for users who haven’t, (note if you want to avoid collisions with a blacklist of banned passwords it does require some additional work though). Now one consideration is you probably don’t want a user’s strong password to collide with passwords that are commonly used in online attacks. Aka if a user’s password collided with ‘123456’ that could be a serious problem. One solution for this when the user creates a new password, (or logs in with an existing password and the plaintext is available), is to check the password’s hash against the hash for those blacklisted passwords. If a collision occurs, change the salt, (You are using a salt right!?), and run a check with the new salt. Repeat until there are no collisions with blacklisted passwords. Now this is more complicated if you are upgrading your hashing scheme or blacklists, and don’t have access to the user’s plaintext passwords. There’s a couple of ways to deal with this, but one option might simply be to “hash the hash” and make that part of the new hashing algorithm going forward, (or switch back once the user logs in again). It also helps if the same blacklist is used to prevent users from actually picking those banned passwords. Oh, also a unique long and random salt is required. If different sites share the same collisions it kind of negates the advantages of this approach ;p So this leaves a lot of interesting questions. For example where should this be used and on what accounts? Aka you might not want to use this on webmaster accounts while using it for normal users. Also you may not want to use it for accounts that contain sensitive info as the assumption that if an attacker has the hash then they have access to the user’s account might not hold true. Then there’s the basic question of how likely should collisions be? Plus it's still left to be determined if the advantages of this approach outweigh all the negatives even for unimportant accounts. …. But I’m way past my 140 character limit right now and to be perfectly honest I don’t have good answers for these questions yet so I’ll leave answering them to other people and future e-mails ;p
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.