Web Cryptography: Salted Hash and Other Tasty Dishes

by Lyle Mullican

30 Reader Comments

Back to the Article
  1. Your remarks on hashing passwords are incomplete.

    First, you should use a different salt value for each user. If you use the same salt each time, then an attacker can discover users who have the same password, which probably means that it’s a common one and they should concentrate on cracking that one.

    Second, you have to assume that an attacker who has compromised your system and has got your user database has also got your salt values, so they can use them to try a dictionary attack against your users’ passwords. To help to prevent this, you should use a hashing algorithm that has been deliberately designed to be slow and inefficient, such as bcrypt.

    Copy & paste the code below to embed this comment.
  2. I’m no expert, but doesn’t “This means that multiple inputs could theoretically produce the same result…” contradict the idea of collision resistance?

    Copy & paste the code below to embed this comment.
  3. No. Collision-resistance merely means that there’s no way of producing a collision that is simpler than just trying different inputs until by accident two hash to the same value.

    Since the output, the hash, is generally shorter than the input. (typically the input can be any length while the output is fixed-length), there *must* be collisions.

    Consider it with a theorethical (too small!) hash that is fixed-length at one hex-character.

    There’s only 16 possible hash-values, so you MUST get a collision if you hash 17 different strings, and likely you’ll get one much sooner.

    The same thing applies if the hash is larger, it’s just that you need to hash more stuff before you get a collision - and if the hash is large enough, then “more stuff” becomes “unreasonably much stuff”

    For example, if the hash is 128 bit, and truly collision-resistant, then you’d expect to have to try to hash aproximately 2^64 different strings before you’d get the first collision. (on the average, offcourse you could get lucky and get a collision after the first 2)

    Since 2^64 is equivalent to hashing a billion strings every second for slightly more than 200.000 years, this is unlikely to happen in practice. (and by the time computers are fast enough that it could be provoked, we’d need a bigger hash)

    Copy & paste the code below to embed this comment.
  4. It’s great that ALA decided to publish an article about security and password storage, because generally the awareness among web developers is rather low.

    However, the advice about salting is a little short and doesn’t really explain in-depth how to go about storing the salt. In that, I agree with commenter #1. But the real problem with this article is that it’s giving bad advice! There “have been”:http://www.guardian.co.uk/technology/2010/dec/29/gawker-hacking-gnosis-six-months “several”:http://nakedsecurity.sophos.com/2010/12/28/mozilla-accidentally-publishes-user-ids-and-passwords/ “incidents”:http://plentyoffish.wordpress.com/2011/01/31/plentyoffish-hacked/ in “recent news”:http://www.lightbluetouchpaper.org/2011/02/09/measuring-password-re-use-empirically/ of sites that were using simple hashing mechanisms to store passwords where people stole the databases and retrieved a lot of passwords.

    Nowadays hashes can be calculated at amazing speeds and there are “tools that make this very easy”:http://www.openwall.com/john/ to do. Combined with a dictionary-based attack, this allows attackers to retrieve a password rather quickly. There are “standardised functions for safely storing passwords like the UNIX crpyt() function”:http://codahale.com/how-to-safely-store-a-password/ that every web developer should know about.

    Why use crypt() and not roll your own? Well, crypt() has many advantages:

    * It is easy to make mistakes in rolling your own. Smart people have put a lot of thought into these functions. and they are more likely to be secure than whatever you might come up with, unless you’re an experienced security researcher.
    * Crypt() bindings are available in almost every programming language. This makes it a portable way to store passwords, and will allow you to easily share your database with programs written in different languages.
    * The crypt() format is forwards-compatible; you can upgrade to a stronger mechanism transparently and passwords encrypted with an older encryption algorithm continue to work. As people log in, you can rehash their password using the new mechanism.

    Copy & paste the code below to embed this comment.
  5. Thanks for the feedback. @drplokta, it is true that you can gain a little additional strength with per-user salts, but they have to be used in combination with a global salt that’s stored separately from the user data, or they’ll be vulnerable to dictionary attacks. I don’t think the additional complexity is worth the gain, which is minimal. If the attacker identifies repeated passwords, it doesn’t make the attack itself easier, it just allows him to access more accounts with the same work. If a salt value is kept away from the user database and remains uncompromised, a dictionary attack is impractical in any case.

    @Peter, we can quibble over algorithms but I think we can all agree that hashing is better than not hashing, and salted hashing is better than straight hashing. That said, the traditional implementation of crypt() has some serious problems. This is in fact precisely how Gawker was storing their passwords in the incident you linked to, and their database was compromised just as you described because cracking tools for crypt() output are widely available. Among other things, the traditional DES implementation of crypt() truncates passwords to 8 characters, making dictionary attacks much more feasible. The other incidents you cited did not mention whether the hashes were salted. I agree that “simple hashing mechanisms” can be a problem, which is precisely why salting is important. crypt() can be used with other schemes, including MD5. However, I still think it needs to be used in conjunction with a global salt value that’s not stored with the password. I would never advocate that someone “roll their own” algorithm. However, SHA-1 is a NIST standard and extremely robust. I think your characterization of “bad advice” is a bit strong.

    Copy & paste the code below to embed this comment.
  6. Lyle, thanks for your reply.

    The original DES implementation of crypt() is weak and broken indeed and it needs to be stressed that people shouldn’t rely on that algorithm anymore. However, all modern systems implement either OpenBSD’s blowfish-based algorithm or RedHat/Ulrich Drepper’s SHA-2-based algorithms, both of which are quite strong.

    I think avoiding crypt() just because an older implementation is broken is unneccessary. In fact, exactly because algorithms become weaker as time passes it’s a *good* idea to use crypt(), because of its modular design. Just upgrade the system’s crypt and voila, *all* your applications suddenly have stronger hashing capabilities, with complete backward compatibility for existing passwords.

    You mention that these password crackers are specifically for crypt() output. Indeed, John the Ripper is by default targeted at OS-specific algorithms, but it can be “easily modified to target custom salted SHA-1 hashes”:http://reusablesec.blogspot.com/2010/05/carderscc-analysis-of-password-cracking.html

    The above link shows that this person was able to get 2,900 guesses per second from their modified JtR on a plain salted SHA1.

    So, yes, JtR and other crackers are aimed specifically at standard operating system algorithms, but that just proves my point: these standard algorithms are constantly under close scrutiny. This means that their secure status is well-known and you can safely rely on using them. When they are broken it will be pretty widely known and new algorithms will be replaced, which you can easily upgrade to because modern crypt() is inherently modular.

    Upgrading your own system will be much harder unless you’ve given this serious thought. And once you’ve given it serious thought, you’ll probably end up with a similar design, so why not save you the hassle and just use crypt()?

    Finally, you advocate storing a *global* salt value, but that’s dangerous advice. If the database can be stolen, it’s very likely that your configuration files are compromised as well, and any global salt can easily be extracted from the configuration files.

    In fact, if you were to use *only* a global hash value, you can immediately see from your database when two people are using the same password (because the same salt is concatenated to the same password this will yield an identical hash value). Generating a custom “Rainbow Table” given a known salt is also pretty easily done, and the passwords in your database can be cracked in parallel. If the salt is custom per hash, like with crypt(), each password *must* be cracked separately.

    Copy & paste the code below to embed this comment.
  7. Peter,

    I don’t think storing a global salt value is more dangerous than relying *solely* on a salt that’s stored alongside the password. If the user database is exposed, the latter is exposed by definition.  But that’s not a given for the former. Databases may be compromised through SQL injection without access to the application code. I do agree that using both together creates a harder problem for an attacker, assuming they know all the salts.

    If an attacker has complete access to the entire application, and the application allows dictionary passwords to be used, then cracking them with a dictionary attack will *always be possible*. The best we can hope for is that it takes a bit longer—which is an argument for requiring at least a moderate level of complexity in passwords.

    Copy & paste the code below to embed this comment.
  8. “I think we can all agree that hashing is better than not hashing, and salted hashing is better than straight hashing.”

    Definitely agreed. That said, if the salt is unknown as the password is, that also makes hacking significantly more difficult. For example, simply salting a password with a user’s name is a fairly standard protocol, even if it’s not the best; at least it makes the salt unique per user.

    A lot of people advocate using random salts, which generally fall into 2 categories. First is those that are completely random, and thus must be stored along with the password. This approach is very difficult to beat, provided the method of salting isn’t known to the database, but only to the application (for example, if the attacker sees the salt in the database, he will probably try something like simply concatenating the random salt and the password first).

    Another option is to use a salt generated in such a way that it’s unique per user, but isn’t the user name, and isn’t stored in the database. One example I’ve seen involves creating a random variable seeded based on something (like, say, the length of the user name) that always returns the same value, and concatenating that with the user name.

    All of these, though, are predicated on the hacker not being able to access the application code. If he can, then it doesn’t matter how you hash and salt.

    Good article. I think a lot of people will find it useful.

    Copy & paste the code below to embed this comment.
  9. Lyle,

    I agree that storing a separate global salt is a good additional layer of security if you’re concerned about SQL injection *or* when the database server is reachable through the internet, since those two scenarios are about the only ways someone could get to the database without also having access to the webserver’s files.

    You mention that if this happens, we should slow down the attacker as much as possible. This ties in to the very first comment here by drplokta; systems like crypt() (or bcrypt as drplotka mentions) don’t just *salt* the passwords, they also *stretch* them. This effectively lengthens the password in such a way that cracking one password takes longer. You can do this through applying the hashing function for thousands of iterations or more. This could reduce the JtR example above from 2,300 attempts per second to *several seconds per attempt*, if you want! I’m sure you agree this is preferable to hashing just once.

    In fact, that’s what crypt() does. The modern implementations I mentioned in fact even allow you to tweak how many iterations it should use. This allows a system administrator to compensate for Moore’s law; as CPUs get faster, the administrator can just tune the number of iterations. The cool thing is that the modular syntax of crypt() hashes allow you to do this also without having to rehash existing passwords. This allows you to implement your own policy for expiring passwords hashed with older algorithms and upgrade to a more secure system at your own pace.

    Copy & paste the code below to embed this comment.
  10. Peter,

    Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.

    Copy & paste the code below to embed this comment.
  11. Lyle,

    Yep, that’s a classic tradeoff and everyone will make it differently, since different apps or businesses have different needs. The main point is that people should be *aware* that you can make this tradeoff, and that the software should have the accommodations to allow the user to make it.

    Copy & paste the code below to embed this comment.
  12. To get rid of all those rainbow tables.

    I just use a some functions

    I talked with some guys from Deloitte - they got me thinking a lot. It has nothing to do with the length of the password, but how many times you may try to login. Fx creditcard codes are normally on 4 numbers, but you only have 3 tries.

    So what I did was to make my login function show a captcha after 3 tries, and after 5 tries it will lock the username, and report multiple attacks to an administrator email.

    Usernames are twoway encrypted in my database, with a random function, where you will need the source code to get the username.

    Copy & paste the code below to embed this comment.
  13. @lsv20, limiting login attempts is good practice, but it doesn’t protect you from an offline attack if someone gets access to the database itself. They can take the data and crunch it independently of the Web application.

    Copy & paste the code below to embed this comment.
  14. @lsv20, Lyle is right; the techniques under discussion are about stopping attackers from obtaining passwords *off-line* from a compromised database containing the hashes, not about *on-line* attempts to bruteforce a successful login.

    Also, your approach, if PASSWORD is just the user’s plain password, is *just* as vulnerable as unsalted single-function hashing wrt rainbow tables: one only has to generate all hashes once for your set of functions. Rainbow tables are about *precomputation*. Your set of functions only make that precomputation a bit more expensive, but it still only needs to be done *once* for all deployments of your application or library if you don’t salt your hashes.

    Even if your password *is* salted, your homebrew scheme doesn’t buy you much additional security; the crypt()-based methods I mentioned run blowfish or SHA-2 (both more expensive than SHA-1) for literally *thousands* of iterations.  Your approach is computationally about as expensive as running SHA-1 for 4, maybe 5 iterations…

    Copy & paste the code below to embed this comment.
  15. Hi Lyle,

    The only thing I can see the article is missing is…Where can one go to learn more about how to implement these suggestions?

    Thanks, Jim

    Copy & paste the code below to embed this comment.
  16. Jim,
    That depends a great deal on your application environment. However, one excellent resource for security issues in general is the Security Now podcast at http://twit.tv/sn. Look through the archives for specific topics. Also, books by Bruce Schneier are excellent.

    Copy & paste the code below to embed this comment.
  17. Thank Lyle for the tips!

    Copy & paste the code below to embed this comment.
  18. Usually I like ALA articles a lot, but I felt the need to register to post a comment on this. The discussion of hashing here is dangerously incomplete.

    Most cryptographic hash methods are meant to be fast. That’s a good thing for most applications, but it’s terrible for password hashing, because people can generate rainbow tables relatively quickly. (_Especially_ now that Amazon EC2 Cluster GPU instances are available.)

    Using standard hash functions like SHA1 is bad security, and telling people that these hash functions are “good enough” is dangerous. If you want your passwords to be secure, you need to be using *bcrypt* or *scrypt*. They are specifically designed to be slow, and they do the work of salting for you.

    “Thomas Ptacek’s article”:http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html is still the best I’ve seen on password hashing. Please read it before giving people advice on secure password storage. If you are using SHA1, and your database is compromised, your users’ passwords are not secure.

    Copy & paste the code below to embed this comment.
  19. Thank you for writing on this subject. This is interesting I have not thought about it from your point of view.I have been looking for something like this and your blog helps me a lot to understand the topic better. Waiting for your next post.

    Copy & paste the code below to embed this comment.
  20. @samdk, thank you for your comments. This point has been raised in the discussion above and also in some other forums, where several people have felt that I should be recommending a slower hashing algorithm than SHA-1. It’s a valid point, but I think in practice, the gains are fairly small and there are also some drawbacks that the advocates of these algorithms don’t always consider. Here is some of my reasoning:

    I absolutely agree that slower is better when it comes to passwords. But slow does not mean impossible. Take the following example. Say we have a moderate user database of 50,000 records, and that an attacker can perform a SHA-1 hash in 1ms. Taking a dictionary of 1000 common passwords, he can generate a custom rainbow table in one second, compare to the database and probably recover a fair number of passwords unless the application enforced strict complexity requirements. This is assuming that the attacker has access to the application code and the salt, a point which I’ll address in a moment. Alternatively, say we’re using bcrypt with a cost factor that requires 100ms for the attacker to process. Using the same 1000-word dictionary, it will now take the attacker 58 days to run the entire list. More difficult, yes, but not infeasible. And that’s assuming that he has to process the list serially. Parallelizing the operation would make it considerably faster, and this is easy with modern GPUs and services like EC2, exactly as you pointed out.

    So the slower algorithms have a benefit, but they’re not going to render an attack impractical. Changing the numbers, of course, changes the outcome. A smaller user database will be run faster, and a larger one will take longer. A dictionary of 500 words might yield almost as many results as a larger one. And it’s worth noting that an attacker doesn’t have to exhaust the database to start using information maliciously. You could give bcrypt a higher cost factor, but you’re going to impact legitimate application performance if you take it very far.

    Now to drawbacks. These functions can be complex to implement, for various reasons. Often they rely on libraries installed at the OS level. For example, PHP’s crypt() function did not have an internal implementation prior to v5.3. In earlier versions, it would simply rely on the OS, and if your OS didn’t support Blowfish, PHP would just fall back on DES, which as we know has serious problems. There are a *lot* of servers running versions of PHP older than 5.3. And crypt()-based functions can be especially problematic on Windows, since they’re generally rooted in a *nix library. The MySQL ENCRYPT() function simply doesn’t work on Windows, for example, so portability can be a problem. Web developers are not always in control of the operating systems on which their apps depend, and adding complexity makes people less likely to do anything at all. I’d rather have them use an algorithm that’s broadly supported even if it’s faster.

    Finally, as I’ve pointed out before, these functions store the salts and algorithm identifier along with the passwords. That means that unless you *also* use an application-level salt, an attacker who accesses the database has all the information he needs to perform an attack. But a database may be compromised through SQL injection, or from a backup file, or because a management tool was compromised. It is far from given that an attacker who has the database owns the entire machine or even the application code. With a salt that’s stored separately at the application layer, the attacker who has only the database has zero chance of recovering any passwords. Not only does he not have the salt, he doesn’t know its length or how it’s combined with the passwords. Now we certainly shouldn’t be over-reliant on obscurity, but compartmentalizing the salt away from the result, and guarding it like you would guard a symmetric key, at least narrows the type of attack that’s required.

    You can certainly do both, of course, and use a global salt along with individual salts—those individual salts will definitely increase the attacker’s time requirement. In the end, though, any hash will be crackable by an attacker who has complete system access, as long as the application allows dictionary words to be used.

    Copy & paste the code below to embed this comment.
  21. Since the output, the hash, is generally shorter than the input. (typically the input can be any length while the output is fixed-length), there must be collisions.

    Copy & paste the code below to embed this comment.
  22. Your time estimates are off by orders of magnitude. GPUs can calculate hundreds of millions of SHA1 hashes in a second (http://www.win.tue.nl/cccc/sha-1-challenge.html), and powerful GPUs are available to anyone who wants them with an EC2 cluster GPU instance. That is, your 50,000,000 SHA1 iterations can be run in under a second. (Even a regular CPU core will take only a few seconds.)

    Certainly bcrypt/scrypt are not perfect, but giving your users days/weeks/months to change their passwords is a lot better than no time at all. If you can’t use bcrypt or scrypt for technical or other reasons, then running many (1000+) iterations of a hash function is another alternative.

    But really, the problem I have is not with the discussion here in the comments: it’s that none of this is discussed in the article, which is all most people are going to read.

    Copy & paste the code below to embed this comment.
  23. Sam, I said that the custom attacker could generate a custom rainbow table in one second. That was in *agreement* with your argument. My point is simply that the scenario where an attacker has complete system access is not the one I’m most concerned about defending against *in the context of web apps*, where database exposure seems to me most likely to occur through SQL injection. So I put a higher priority on separating the salt from the data than on having individual salts.

    The essay is intended to be fairly basic and I only wanted to mention tools that are accessible to most developers—not to describe a gold standard of password storage.

    Copy & paste the code below to embed this comment.
  24. Typo above: strike -custom- attacker

    Copy & paste the code below to embed this comment.
  25. Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.

    Copy & paste the code below to embed this comment.
  26. One of the problems I’m dealing with right now is that, while lots of authentication systems use SHA-1 out of the box, they all use a different default salt algorithm.  That makes it really hard to merge accounts or switch from one system to another without loosing everyone’s password.

    I’m not sure it’s true that “The SHA-1 algorithm… is currently the best available hash function that enjoys broad support in most popular programming languages.” To my knowledge, bcrypt is at least as secure,* and there are implementations in all the major programming languages.  Our Ubuntu (Debian) Linux servers have it pre-installed.

    But the real win with bcrypt is that the salt is stored with the hash in a standardized format.  Bcrypt generates a different hash every time it’s called for the same password, though any one of those hashes can validate that password.  So you can move your accounts to and from any bcrypt-capable system without worrying about how OpenLDAP (for instance) decided to salt the password.

    What it comes down to is, encryption is best left to the experts.  Deciding how to salt your hashes isn’t something web developers should have to think about.  SHA-1 doesn’t do that for you, bcrypt does.

    *I’m not cryptographer, but bcrypt has been around for a while, and I don’t know of any security holes.  It’s also newer than SHA-1, and has a number of properties designed to make it harder to crack.  In particular, you can set the minimum amount of processing time required to verify a hash, so that it’s plenty fast for authentication, but expensive to brute force attack.

    Copy & paste the code below to embed this comment.
  27. @dleppik, bcrypt has been discussed quite a bit in the comments above. Yes, it is a stronger algorithm in many ways. It does have some drawbacks that I’ve listed on the previous page, but for the most part they have more to do with implementation than cryptography. My sentence about SHA-1 has been a bit misunderstood, I think. The point was not that it’s the strongest hashing algorithm for password storage in general, but that it’s *broadly available*. I should have been clearer on that point. Bcrypt isn’t supported everywhere, and since hashing is irreversible, that’s a very important consideration to me. But really, the essay was intended to describe the *concept* of hashing and salting, not to provide a specific tutorial for how everyone should be implementing it. Different environments will have different constraints.

    Copy & paste the code below to embed this comment.
  28. This was an excellently written article, and covers a concern that I’ve tried time and again to communicate to other people. Hashing the raw password is not enough, and an additional check digit or check character is always a great addition as well. That way, the password is modified, the hash is concatenated to a fixed length with the position based on the user ID (not user name), and a check digit is added.

    Copy & paste the code below to embed this comment.
  29. Be careful to limit your expectations of what a salted hash actually does!

    It is fairly simple to extend a theoretical attack to prepend the salt onto a word to test, a small ammount of setup time is insignificant.  A salt only prevents a precomuted (rainbow) table being used, it doesn’t make it any more time consuming for an attacker to use a keyspace or dictionary attack on the cyphertext.

    On a 8600 GPU (fairly outdated) the search rate for SHA1 hashes is a little over 46M/s, varying slightly with the number of hashes to check against. A linux box with several more powerful dedicated GPU could possibly chew through a bullion a second. Most salting systems used in PHP applications offer little to no protection against these sort of attacks, sheer speed is enough to reverse most passwords.

    The term ‘custom rainbow table’ is an oxymoron, the point of a rainbow table is that it can be reused without additional calculation time. Generating tables would be a waste of time and power, a standard attack (finishing when a possible hash is found) would be far more efficient.

    I’ve always been interested to know how Facebook stores password fingerprints, has it been published anywhere?

    Copy & paste the code below to embed this comment.
  30. The fact that most users re-use passwords across multiple sites is the problem, as pointed out in the article.  But therefore what benefit is it if my site uses military level encryption if another site just stores plain passwords?

    My site’s security for that user (and probably most users) is still just as good as that plain password. It’s not a huge comfort to me that I was hacked because of someone else’s lax security.

    Hashing / encryption always sends us off into getting excited about the number of bits, or complexity or whatever - but distracts from all the human aspects where the real security risks are.

    Copy & paste the code below to embed this comment.
  31. Sorry, commenting is closed on this article.