Issue № 324

Web Cryptography: Salted Hash and Other Tasty Dishes

One of the most powerful security tools available to web developers is cryptography—essentially a process by which meaningful information is turned into random noise, unreadable except where specifically intended. Many governments tightly regulated cryptographic software until relatively recently. But cryptography is simply applied math, and it’s proved impossible to suppress. A web developer working on an underpowered netbook in his basement now has access to cryptosystems that major governments could only have dreamed of a few decades ago.

It’s tempting to think that a simple web application doesn’t require industrial-strength security because it’s not going to contain any information worth stealing, like credit card numbers. Unfortunately, people’s tendency to reuse passwords across many different systems means that a compromise of the weakest one can often give access to the rest. In 2009, a hacker famously gained access to a number of internal systems at Twitter by compromising the email account of a single administrative assistant.

Just as any artisan must know his tools and materials to achieve excellence, it’s important for web developers to understand what types of cryptography work in specific contexts, even if we don’t need to grasp the details of the math involved. Poorly applied cryptography provides no benefit and might even be dangerous by creating a false sense of security.

There are many different types of cryptosystems, but three broad categories commonly relate to web applications.

One-way functions

You can’t start with a sausage and work backwards to produce the raw ingredients that went into it. In the same way, a cryptographic hash is designed to take in data and mangle it, so that what comes out the other end is unrecognizable—irreversibly so. Described this way, it doesn’t sound like a particularly difficult or useful operation. But well-designed hashes have two properties that make them both complex and useful.

First, a small change in the input should make a drastic change in the output. In other words, changing one character in the data being hashed will change a lot more than one character of the output. Usually, hashes produce an output of fixed (and relatively short) length, regardless of the input size. This means that multiple inputs could theoretically produce the same result, making it impossible to know what the original data was if an attacker only has the hashed result to work with.

Second, a hash will always produce the same output when given the same input. The most obvious application for hashes is in the storage of passwords. A web application doesn’t really need to know a user’s password—it just needs to verify that the person requesting access knows the password. If we hash passwords the same way at the time of creation and login, we only have to store and compare the result of the hash operation to know that the originals are the same. Then, even if a user database is exposed, the attacker doesn’t have anything terribly useful to work with. At least, that’s the conventional wisdom.
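In code, that conventional scheme takes only a few lines. Here is a minimal sketch in Python (the function name is illustrative; SHA-1 matches the examples below):

import hashlib

def hash_password(password):
    # Store only the digest, never the password itself.
    return hashlib.sha1(password.encode('utf-8')).hexdigest()

# At signup: keep the hash in the user database.
stored_hash = hash_password('password')

# At login: hash the submitted password the same way and compare.
assert hash_password('password') == stored_hash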

In reality, it’s not quite that simple. First, not all hashing algorithms are created equal. The once-popular MD5 algorithm, for example, is now known to be cryptographically weak in general (although it’s still usable for passwords). Second, we know that a lot of people choose the same common passwords. This means that if an attacker knows the hash value of “123456” as produced by a few popular algorithms, he can easily recognize it in a database. Lists of pre-computed hashes are widely available and known in the security industry as rainbow tables.

To counter this weakness, a hash can be “salted.” Given the two properties of good hash algorithms described above, we can simply append a little data of our own to the user’s password and store the hash of that combined text rather than the password itself. This will create a completely different result that’s still easily verifiable.

Compare the following:

sha1('password')
=> 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

sha1('x5T1password')
=> e1f9530af9bde38db0eef386c4d22ec2ba10d2fe

In this example, adding four random characters to the beginning of the word changes 95% of the resulting output. The SHA-1 algorithm, developed by the US National Security Agency, is currently the best available hash function that enjoys broad support in most popular programming languages.
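In application code, salting is just concatenation before hashing. A minimal Python sketch (the salt length and storage details are illustrative):

import hashlib
import secrets

def salted_hash(password, salt):
    # Prepend the salt, as in the 'x5T1' example above.
    return hashlib.sha1((salt + password).encode('utf-8')).hexdigest()

salt = secrets.token_hex(4)  # a few random characters, stored for reuse
digest = salted_hash('password', salt)

# Verification recombines the same salt with the submitted password.
assert salted_hash('password', salt) == digest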

Symmetric encryption

Of course, one-way functions have a fairly narrow use. In many cases, information is encrypted only to ensure that it can’t be read outside its intended context. But within that context—an administrative console, for example—the encryption needs to be reversible.

However, the first question an application developer should always ask is: “Do I really need to collect and store this information?” Keeping data collection to an absolute minimum usually contributes to an improved user experience, simplifies development, and is naturally more secure. After all, data that doesn’t exist can’t be stolen or exploited.

But assuming that sensitive information is really crucial to an application’s function, we have to consider how to handle it securely. Reversible encryption falls into two categories. In the first, a single secret “key” is used both for scrambling and unscrambling the data. A key is somewhat like a password, but since keys are more likely to be used by programs than people, they can (and should) be longer and completely random.

With modern algorithms, symmetric encryption has the advantage of being extremely fast. The strong AES algorithm (also known as the Rijndael cipher) was specifically designed for speed and is well supported, with implementations in both database servers and application frameworks. Encrypting and decrypting data in MySQL, for example, is as simple as the following:

INSERT INTO people (pet_name)
  VALUES (AES_ENCRYPT('Schmoopie','my-secret-key'));

SELECT AES_DECRYPT(pet_name, 'my-secret-key') AS pet_name
  FROM people;

This doesn’t protect the information from exposure if a malicious user gains access to the web application, since it knows how to decrypt the data. It does, however, protect against accidental disclosure in other contexts, like backup files or an attack on the database itself.
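The same pattern works at the application layer. Here is a sketch using the third-party Python cryptography package (an assumption; any AES implementation would do), whose Fernet construction is an AES-based symmetric scheme:

from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()  # like 'my-secret-key', but long and random
f = Fernet(key)

token = f.encrypt(b'Schmoopie')  # store the ciphertext, not the plaintext
assert f.decrypt(token) == b'Schmoopie'  # decryption requires the same key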

Symmetric encryption works well when we only need to access the information in a single context. However, all of its strength lies in the secrecy of the key. This becomes a challenge when we want to move the data from one place to another. If we have to share the key, especially with multiple recipients, it’s no longer a secret.

Asymmetric encryption

To meet this need, asymmetric algorithms rely upon a pair of linked keys that are generated with specific properties. When one key encrypts a piece of information, only the corresponding key in the pair can decrypt it. This type of encryption is also called public-key cryptography because often (though not always) one key is made public while the other is kept private.

The math that makes this pairing possible is interesting, but what web developers need to understand is when to use it and what protection it provides. We most commonly encounter the technology in SSL (now called TLS) connections. A web server sends its public key to a web browser, which uses it to encrypt data that only the server can decrypt. It can also be used for sending encrypted e-mail.

Compared with symmetric functions, asymmetric ones are slow and require keys that are several times longer to be effective. In TLS connections, the browser and server only use public-key cryptography to establish a temporary symmetric key that they can use to encrypt subsequent communication.
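That hybrid handshake can be sketched with the same third-party cryptography package (illustrative only; real TLS adds certificates, signatures, and much more):

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The server's key pair; the public half can be shared freely.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# The slow asymmetric step protects only a short symmetric session key;
# the bulk of the traffic is then encrypted with that key.
session_key = b'temporary-symmetric-session-key'
ciphertext = public_key.encrypt(session_key, oaep)
assert private_key.decrypt(ciphertext, oaep) == session_key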

These functions fulfill a crucial role in the modern web experience, however, by allowing us to protect data in transit between an application and its users. The prevalence of open WiFi makes this a very real concern. On open WiFi networks, users broadcast everything they’re doing in a 360-degree radius for a considerable distance. Without encryption, that data can be easily observed by anyone with a laptop. Two high-profile incidents highlighted this risk in 2010. First, Google ran afoul of privacy authorities by accidentally collecting and storing unencrypted WiFi traffic with its Street View cars. What Google did accidentally, others can do on purpose. Later the same year, the Firesheep plugin made headlines by showing just how simple (and damaging) it can be to eavesdrop on an open network.

At minimum, web applications should require TLS connections when transmitting login information. Using them for all traffic is even better.

Understanding the risk

Given the way people use computers today, all of these technologies will only become more important to web developers. The past few years have shown that the risk is not academic. It’s not enough to say “my application doesn’t have a high enough profile to be a target,” because attacks are frequently automated, not targeted. At the same time, we have to understand and educate others about specific risks and how we use technology to address them. Without that education, clients may consider a site or application “secure” simply because it accepts HTTPS connections, not understanding why it’s unable to e-mail a clear-text password to them.

Security can never be absolute. But used properly, modern cryptography provides us with the tools to eliminate the biggest threats. It’s up to us to put them to use.

30 Reader Comments

  1. Your remarks on hashing passwords are incomplete.

    First, you should use a different salt value for each user. If you use the same salt each time, then an attacker can discover users who have the same password, which probably means that it’s a common one and they should concentrate on cracking that one.

    Second, you have to assume that an attacker who has compromised your system and has got your user database has also got your salt values, so they can use them to try a dictionary attack against your users’ passwords. To help to prevent this, you should use a hashing algorithm that has been deliberately designed to be slow and inefficient, such as bcrypt.
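    For example, a minimal sketch with the Python bcrypt package (assuming that library; bindings exist for most languages):

    import bcrypt  # pip install bcrypt

    # gensalt() produces a fresh random salt on every call, so identical
    # passwords yield different hashes; the salt is embedded in the result.
    stored = bcrypt.hashpw(b'password', bcrypt.gensalt())

    # checkpw() re-extracts the salt and repeats the deliberately slow hash.
    assert bcrypt.checkpw(b'password', stored)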

  2. I’m no expert, but doesn’t “This means that multiple inputs could theoretically produce the same result…” contradict the idea of collision resistance?

  3. No. Collision-resistance merely means that there’s no way of producing a collision that is simpler than just trying different inputs until by accident two hash to the same value.

    Since the output, the hash, is generally shorter than the input (typically the input can be any length, while the output is fixed-length), there *must* be collisions.

    Consider it with a theoretical (too small!) hash that is fixed-length at one hex character.

    There are only 16 possible hash values, so you MUST get a collision if you hash 17 different strings, and you’ll likely get one much sooner.

    The same thing applies if the hash is larger; you just need to hash more stuff before you get a collision. And if the hash is large enough, “more stuff” becomes “unreasonably much stuff.”

    For example, if the hash is 128 bits and truly collision-resistant, then you’d expect to have to hash approximately 2^64 different strings before you’d get the first collision. (On average, of course; you could get lucky and get a collision after the first two.)

    Since working through 2^64 hashes at a billion strings every second would take slightly more than 200,000 days (roughly 585 years), this is unlikely to happen in practice. (And by the time computers are fast enough that it could be provoked, we’d need a bigger hash.)
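    That estimate is easy to check with a few lines of Python:

    expected_tries = 2 ** 64               # birthday bound for a 128-bit hash
    per_second = 10 ** 9                   # a billion hashes per second
    seconds = expected_tries / per_second  # about 1.8e10 seconds
    print(seconds / 86400)                 # about 213,500 days
    print(seconds / (86400 * 365))         # about 585 years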

  4. It’s great that ALA decided to publish an article about security and password storage, because generally the awareness among web developers is rather low.

    However, the advice about salting is a little short and doesn’t really explain in-depth how to go about storing the salt. In that, I agree with commenter #1. But the real problem with this article is that it’s giving bad advice! There "have been":http://www.guardian.co.uk/technology/2010/dec/29/gawker-hacking-gnosis-six-months "several":http://nakedsecurity.sophos.com/2010/12/28/mozilla-accidentally-publishes-user-ids-and-passwords/ "incidents":http://plentyoffish.wordpress.com/2011/01/31/plentyoffish-hacked/ in "recent news":http://www.lightbluetouchpaper.org/2011/02/09/measuring-password-re-use-empirically/ of sites that were using simple hashing mechanisms to store passwords, where attackers stole the databases and recovered a lot of passwords.

    Nowadays hashes can be calculated at amazing speeds and there are "tools that make this very easy":http://www.openwall.com/john/ to do. Combined with a dictionary-based attack, this allows attackers to retrieve a password rather quickly. There are "standardised functions for safely storing passwords like the UNIX crypt() function":http://codahale.com/how-to-safely-store-a-password/ that every web developer should know about.

    Why use crypt() and not roll your own? Well, crypt() has many advantages (see the sketch after this list):

    * It is easy to make mistakes in rolling your own. Smart people have put a lot of thought into these functions, and they are more likely to be secure than whatever you might come up with, unless you’re an experienced security researcher.
    * Crypt() bindings are available in almost every programming language. This makes it a portable way to store passwords, and will allow you to easily share your database with programs written in different languages.
    * The crypt() format is forward-compatible; you can upgrade to a stronger mechanism transparently, and passwords hashed with an older algorithm continue to work. As people log in, you can rehash their passwords using the new mechanism.
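    Here is that sketch, using Python’s standard-library binding (Unix-only, and removed from the standard library in Python 3.13, so treat it as illustrative):

    import crypt

    # mksalt() generates a random salt plus an algorithm identifier;
    # METHOD_SHA512 selects the SHA-2-based scheme ('$6$...').
    stored = crypt.crypt('password', crypt.mksalt(crypt.METHOD_SHA512))

    # Passing the stored string back as the salt reuses the embedded salt
    # and algorithm, so equal output means the password matches.
    assert crypt.crypt('password', stored) == stored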

  5. Thanks for the feedback. @drplokta, it is true that you can gain a little additional strength with per-user salts, but they have to be used in combination with a global salt that’s stored separately from the user data, or they’ll be vulnerable to dictionary attacks. I don’t think the additional complexity is worth the gain, which is minimal. If the attacker identifies repeated passwords, it doesn’t make the attack itself easier, it just allows him to access more accounts with the same work. If a salt value is kept away from the user database and remains uncompromised, a dictionary attack is impractical in any case.

    @Peter, we can quibble over algorithms but I think we can all agree that hashing is better than not hashing, and salted hashing is better than straight hashing. That said, the traditional implementation of crypt() has some serious problems. This is in fact precisely how Gawker was storing their passwords in the incident you linked to, and their database was compromised just as you described because cracking tools for crypt() output are widely available. Among other things, the traditional DES implementation of crypt() truncates passwords to 8 characters, making dictionary attacks much more feasible. The other incidents you cited did not mention whether the hashes were salted. I agree that “simple hashing mechanisms” can be a problem, which is precisely why salting is important. crypt() can be used with other schemes, including MD5. However, I still think it needs to be used in conjunction with a global salt value that’s not stored with the password. I would never advocate that someone “roll their own” algorithm. However, SHA-1 is a NIST standard and extremely robust. I think your characterization of “bad advice” is a bit strong.

  6. Lyle, thanks for your reply.

    The original DES implementation of crypt() is weak and broken indeed and it needs to be stressed that people shouldn’t rely on that algorithm anymore. However, all modern systems implement either OpenBSD’s blowfish-based algorithm or RedHat/Ulrich Drepper’s SHA-2-based algorithms, both of which are quite strong.

    I think avoiding crypt() just because an older implementation is broken is unnecessary. In fact, exactly because algorithms become weaker as time passes, it’s a *good* idea to use crypt(), because of its modular design. Just upgrade the system’s crypt and voilà, *all* your applications suddenly have stronger hashing capabilities, with complete backward compatibility for existing passwords.

    You mention that these password crackers are specifically for crypt() output. Indeed, John the Ripper is by default targeted at OS-specific algorithms, but it can be “easily modified to target custom salted SHA-1 hashes”:http://reusablesec.blogspot.com/2010/05/carderscc-analysis-of-password-cracking.html

    The above link shows that this person was able to get 2,900 guesses per second from their modified JtR on a plain salted SHA1.

    So, yes, JtR and other crackers are aimed specifically at standard operating system algorithms, but that just proves my point: these standard algorithms are constantly under close scrutiny. This means that their security status is well-known and you can safely rely on using them. When they are broken, it will be pretty widely known and they will be replaced by new algorithms, which you can easily upgrade to because modern crypt() is inherently modular.

    Upgrading your own system will be much harder unless you’ve given this serious thought. And once you’ve given it serious thought, you’ll probably end up with a similar design, so why not save yourself the hassle and just use crypt()?

    Finally, you advocate storing a *global* salt value, but that’s dangerous advice. If the database can be stolen, it’s very likely that your configuration files are compromised as well, and any global salt can easily be extracted from the configuration files.

    In fact, if you were to use *only* a global salt value, you can immediately see from your database when two people are using the same password (because the same salt concatenated with the same password will yield an identical hash value). Generating a custom “Rainbow Table” given a known salt is also pretty easily done, and the passwords in your database can be cracked in parallel. If the salt is custom per hash, like with crypt(), each password *must* be cracked separately.

  7. Peter,

    I don’t think storing a global salt value is more dangerous than relying *solely* on a salt that’s stored alongside the password. If the user database is exposed, the latter is exposed by definition. But that’s not a given for the former. Databases may be compromised through SQL injection without access to the application code. I do agree that using both together creates a harder problem for an attacker, assuming they know all the salts.

    If an attacker has complete access to the entire application, and the application allows dictionary passwords to be used, then cracking them with a dictionary attack will *always be possible*. The best we can hope for is that it takes a bit longer — which is an argument for requiring at least a moderate level of complexity in passwords.

  8. “I think we can all agree that hashing is better than not hashing, and salted hashing is better than straight hashing.”

    Definitely agreed. That said, if the salt is unknown, as the password is, that also makes cracking significantly more difficult. For example, simply salting a password with the user’s name is a fairly standard protocol, even if it’s not the best; at least it makes the salt unique per user.

    A lot of people advocate using random salts, which generally fall into two categories. First are salts that are completely random, and thus must be stored along with the password. This approach is very difficult to beat, provided the method of combining salt and password is known only to the application and isn’t evident from the database (for example, if the attacker sees the salt in the database, he will probably first try something simple, like concatenating the salt and the password).

    Another option is to use a salt generated in such a way that it’s unique per user, but isn’t the user name and isn’t stored in the database. One example I’ve seen involves seeding a random generator with something stable (say, the length of the user name) so that it always returns the same value, and concatenating that with the user name.

    All of these, though, are predicated on the hacker not being able to access the application code. If he can, then it doesn’t matter how you hash and salt.

    Good article. I think a lot of people will find it useful.

  9. Lyle,

    I agree that storing a separate global salt is a good additional layer of security if you’re concerned about SQL injection *or* when the database server is reachable through the internet, since those two scenarios are about the only ways someone could get to the database without also having access to the webserver’s files.

    You mention that if this happens, we should slow down the attacker as much as possible. This ties in to the very first comment here by drplokta; systems like crypt() (or bcrypt, as drplokta mentions) don’t just *salt* the passwords, they also *stretch* them. This effectively lengthens the password in such a way that cracking one password takes longer. You can do this by applying the hashing function for thousands of iterations or more. This could reduce the JtR example above from 2,900 attempts per second to *several seconds per attempt*, if you want! I’m sure you agree this is preferable to hashing just once.

    In fact, that’s what crypt() does. The modern implementations I mentioned even allow you to tweak how many iterations to use. This allows a system administrator to compensate for Moore’s law: as CPUs get faster, the administrator can just tune the number of iterations. The cool thing is that the modular syntax of crypt() hashes allows you to do this without having to rehash existing passwords. You can then implement your own policy for expiring passwords hashed with older algorithms and upgrade to a more secure system at your own pace.
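    Python’s standard library offers the same stretching effect through PBKDF2 (a sketch; the iteration count is something you would tune, as described above):

    import hashlib, os

    salt = os.urandom(16)  # per-user random salt, stored with the hash

    # Every guess an attacker makes must repeat all 100,000 iterations,
    # slowing an offline dictionary attack by the same factor.
    digest = hashlib.pbkdf2_hmac('sha256', b'password', salt, 100000)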

  10. Peter,

    Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.

  11. Lyle,

    Yep, that’s a classic tradeoff and everyone will make it differently, since different apps or businesses have different needs. The main point is that people should be *aware* that you can make this tradeoff, and that the software should have the accommodations to allow the user to make it.

  12. To get rid of all those rainbow tables, I just chain a few functions:

    sha1(md5(md5(sha1(sha1(md5(PASSWORD))))))

    I talked with some guys from Deloitte, and they got me thinking a lot. It’s not just about the length of the password, but about how many times you may try to log in. For example, credit card codes are normally only 4 digits, but you only get 3 tries.

    So what I did was make my login function show a captcha after 3 tries; after 5 tries it will lock the username and report the repeated attempts to an administrator email address.

    Usernames are two-way encrypted in my database with a random function, so you would need the source code to recover the username.

  13. @lsv20, limiting login attempts is good practice, but it doesn’t protect you from an offline attack if someone gets access to the database itself. They can take the data and crunch it independently of the Web application.

    @lsv20, Lyle is right; the techniques under discussion are about stopping attackers from obtaining passwords *off-line* from a compromised database containing the hashes, not about *on-line* attempts to brute-force a successful login.

    Also, your approach, if PASSWORD is just the user’s plain password, is *just* as vulnerable to rainbow tables as unsalted single-function hashing: one only has to generate all hashes once for your set of functions. Rainbow tables are about *precomputation*. Your set of functions only makes that precomputation a bit more expensive, but it still only needs to be done *once* for all deployments of your application or library if you don’t salt your hashes.

    Even if your password *is* salted, your homebrew scheme doesn’t buy you much additional security; the crypt()-based methods I mentioned run blowfish or SHA-2 (both more expensive than SHA-1) for literally *thousands* of iterations. Your approach is computationally about as expensive as running SHA-1 for 4, maybe 5 iterations…

  15. Hi Lyle,

    The only thing I can see the article is missing is: where can one go to learn more about how to implement these suggestions?

    Thanks, Jim

  16. Jim,
    That depends a great deal on your application environment. However, one excellent resource for security issues in general is the Security Now podcast at http://twit.tv/sn. Look through the archives for specific topics. Also, books by Bruce Schneier are excellent.

  17. Usually I like ALA articles a lot, but I felt the need to register to post a comment on this. The discussion of hashing here is dangerously incomplete.

    Most cryptographic hash methods are meant to be fast. That’s a good thing for most applications, but it’s terrible for password hashing, because people can generate rainbow tables relatively quickly. (_Especially_ now that Amazon EC2 Cluster GPU instances are available.)

    Using standard hash functions like SHA1 is bad security, and telling people that these hash functions are “good enough” is dangerous. If you want your passwords to be secure, you need to be using *bcrypt* or *scrypt*. They are specifically designed to be slow, and they do the work of salting for you.

    “Thomas Ptacek’s article”:http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html is still the best I’ve seen on password hashing. Please read it before giving people advice on secure password storage. If you are using SHA1, and your database is compromised, your users’ passwords are not secure.

  18. Thank you for writing on this subject. This is interesting; I have not thought about it from your point of view. I have been looking for something like this, and your blog helps me a lot to understand the topic better. Waiting for your next post.

  19. @samdk, thank you for your comments. This point has been raised in the discussion above and also in some other forums, where several people have felt that I should be recommending a slower hashing algorithm than SHA-1. It’s a valid point, but I think in practice, the gains are fairly small and there are also some drawbacks that the advocates of these algorithms don’t always consider. Here is some of my reasoning:

    I absolutely agree that slower is better when it comes to passwords. But slow does not mean impossible. Take the following example. Say we have a moderate user database of 50,000 records, and that an attacker can perform a SHA-1 hash in 1ms. Taking a dictionary of 1000 common passwords, he can generate a custom rainbow table in one second, compare it to the database, and probably recover a fair number of passwords unless the application enforces strict complexity requirements. This is assuming that the attacker has access to the application code and the salt, a point which I’ll address in a moment. Alternatively, say we’re using bcrypt with a cost factor that requires 100ms for the attacker to process. Using the same 1000-word dictionary, it will now take the attacker 58 days to run the entire list. More difficult, yes, but not infeasible. And that’s assuming that he has to process the list serially. Parallelizing the operation would make it considerably faster, and this is easy with modern GPUs and services like EC2, exactly as you pointed out.

    So the slower algorithms have a benefit, but they’re not going to render an attack impractical. Changing the numbers, of course, changes the outcome. A smaller user database will be run faster, and a larger one will take longer. A dictionary of 500 words might yield almost as many results as a larger one. And it’s worth noting that an attacker doesn’t have to exhaust the database to start using information maliciously. You could give bcrypt a higher cost factor, but you’re going to impact legitimate application performance if you take it very far.

    Now to drawbacks. These functions can be complex to implement, for various reasons. Often they rely on libraries installed at the OS level. For example, PHP’s crypt() function did not have an internal implementation prior to v5.3. In earlier versions, it would simply rely on the OS, and if your OS didn’t support Blowfish, PHP would just fall back on DES, which as we know has serious problems. There are a *lot* of servers running versions of PHP older than 5.3. And crypt()-based functions can be especially problematic on Windows, since they’re generally rooted in a *nix library. The MySQL ENCRYPT() function simply doesn’t work on Windows, for example, so portability can be a problem. Web developers are not always in control of the operating systems on which their apps depend, and adding complexity makes people less likely to do anything at all. I’d rather have them use an algorithm that’s broadly supported even if it’s faster.

    Finally, as I’ve pointed out before, these functions store the salts and algorithm identifier along with the passwords. That means that unless you *also* use an application-level salt, an attacker who accesses the database has all the information he needs to perform an attack. But a database may be compromised through SQL injection, or from a backup file, or because a management tool was compromised. It is far from given that an attacker who has the database owns the entire machine or even the application code. With a salt that’s stored separately at the application layer, the attacker who has only the database has zero chance of recovering any passwords. Not only does he not have the salt, he doesn’t know its length or how it’s combined with the passwords. Now we certainly shouldn’t be over-reliant on obscurity, but compartmentalizing the salt away from the result, and guarding it like you would guard a symmetric key, at least narrows the type of attack that’s required.

    You can certainly do both, of course, and use a global salt along with individual salts — those individual salts will definitely increase the attacker’s time requirement. In the end, though, any hash will be crackable by an attacker who has complete system access, as long as the application allows dictionary words to be used.
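    One way to combine the two salts, sketched in Python (the names, iteration count, and combination scheme are illustrative):

    import hashlib, os

    # Application-level salt, kept in protected config, never in the database.
    PEPPER = b'application-level-secret'

    def hash_with_pepper(password, user_salt):
        # The per-user salt defeats precomputed tables and duplicate spotting;
        # the separately stored pepper means the database alone is not enough.
        return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                   user_salt + PEPPER, 100000)

    user_salt = os.urandom(16)  # stored alongside the resulting hash
    stored = hash_with_pepper('password', user_salt)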

  20. Your time estimates are off by orders of magnitude. GPUs can calculate hundreds of millions of SHA1 hashes in a second (http://www.win.tue.nl/cccc/sha-1-challenge.html), and powerful GPUs are available to anyone who wants them with an EC2 cluster GPU instance. That is, your 50,000,000 SHA1 iterations can be run in under a second. (Even a regular CPU core will take only a few seconds.)

    Certainly bcrypt/scrypt are not perfect, but giving your users days/weeks/months to change their passwords is a lot better than no time at all. If you can’t use bcrypt or scrypt for technical or other reasons, then running many (1000+) iterations of a hash function is another alternative.

    But really, the problem I have is not with the discussion here in the comments: it’s that none of this is discussed in the article, which is all most people are going to read.

  21. Sam, I said that the attacker could generate a custom rainbow table in one second. That was in *agreement* with your argument. My point is simply that the scenario where an attacker has complete system access is not the one I’m most concerned about defending against *in the context of web apps*, where database exposure seems to me most likely to occur through SQL injection. So I put a higher priority on separating the salt from the data than on having individual salts.

    The essay is intended to be fairly basic and I only wanted to mention tools that are accessible to most developers — not to describe a gold standard of password storage.

  22. One of the problems I’m dealing with right now is that, while lots of authentication systems use SHA-1 out of the box, they all use a different default salt algorithm. That makes it really hard to merge accounts or switch from one system to another without losing everyone’s passwords.

    I’m not sure it’s true that “The SHA-1 algorithm… is currently the best available hash function that enjoys broad support in most popular programming languages.” To my knowledge, bcrypt is at least as secure,* and there are implementations in all the major programming languages. Our Ubuntu (Debian) Linux servers have it pre-installed.

    But the real win with bcrypt is that the salt is stored with the hash in a standardized format. Bcrypt generates a different hash every time it’s called for the same password, though any one of those hashes can validate that password. So you can move your accounts to and from any bcrypt-capable system without worrying about how OpenLDAP (for instance) decided to salt the password.

    What it comes down to is, encryption is best left to the experts. Deciding how to salt your hashes isn’t something web developers should have to think about. SHA-1 doesn’t do that for you, bcrypt does.

    *I’m not a cryptographer, but bcrypt has been around for a while, and I don’t know of any security holes. It’s also newer than SHA-1, and has a number of properties designed to make it harder to crack. In particular, you can set the minimum amount of processing time required to verify a hash, so that it’s plenty fast for authentication, but expensive to attack by brute force.

  23. @dleppik, bcrypt has been discussed quite a bit in the comments above. Yes, it is a stronger algorithm in many ways. It does have some drawbacks that I’ve listed on the previous page, but for the most part they have more to do with implementation than cryptography. My sentence about SHA-1 has been a bit misunderstood, I think. The point was not that it’s the strongest hashing algorithm for password storage in general, but that it’s *broadly available*. I should have been clearer on that point. Bcrypt isn’t supported everywhere, and since hashing is irreversible, that’s a very important consideration to me. But really, the essay was intended to describe the *concept* of hashing and salting, not to provide a specific tutorial for how everyone should be implementing it. Different environments will have different constraints.

  24. This was an excellently written article, and covers a concern that I’ve tried time and again to communicate to other people. Hashing the raw password is not enough, and an additional check digit or check character is always a great addition as well. That way, the password is modified, the hash is concatenated to a fixed length with the position based on the user ID (not user name), and a check digit is added.

  25. Be careful to limit your expectations of what a salted hash actually does!

    It is fairly simple to extend a theoretical attack to prepend the salt onto a word to test; a small amount of setup time is insignificant. A salt only prevents a precomputed (rainbow) table from being used; it doesn’t make it any more time-consuming for an attacker to run a keyspace or dictionary attack on the ciphertext.

    On an 8600 GPU (fairly outdated) the search rate for SHA1 hashes is a little over 46M/s, varying slightly with the number of hashes to check against. A Linux box with several more powerful dedicated GPUs could possibly chew through a billion a second. Most salting systems used in PHP applications offer little to no protection against these sorts of attacks; sheer speed is enough to reverse most passwords.

    The term ‘custom rainbow table’ is an oxymoron, the point of a rainbow table is that it can be reused without additional calculation time. Generating tables would be a waste of time and power, a standard attack (finishing when a possible hash is found) would be far more efficient.

    I’ve always been interested to know how Facebook stores password fingerprints, has it been published anywhere?

  26. The fact that most users re-use passwords across multiple sites is the problem, as pointed out in the article. But then what benefit is there if my site uses military-level encryption while another site just stores plain passwords?

    My site’s security for that user (and probably most users) is still just as good as that plain password. It’s not a huge comfort to me that I was hacked because of someone else’s lax security.

    Hashing and encryption always send us off getting excited about the number of bits, or complexity, or whatever, but that distracts from all the human aspects, where the real security risks are.
