Yep, that’s a classic tradeoff and everyone will make it differently, since different apps or businesses have different needs. The main point is that people should be aware that you can make this tradeoff, and that the software should have the accommodations to allow the user to make it.
Copy & paste the code below to embed this comment.
lsv20
To get rid of all those rainbow tables.
I just use a some functions
sha1(md5(md5(sha1(sha1(md5(PASSWORD)))))
I talked with some guys from Deloitte – they got me thinking a lot. It has nothing to do with the length of the password, but how many times you may try to login. Fx creditcard codes are normally on 4 numbers, but you only have 3 tries.
So what I did was to make my login function show a captcha after 3 tries, and after 5 tries it will lock the username, and report multiple attacks to an administrator email.
Usernames are twoway encrypted in my database, with a random function, where you will need the source code to get the username.
@lsv20, limiting login attempts is good practice, but it doesn’t protect you from an offline attack if someone gets access to the database itself. They can take the data and crunch it independently of the Web application.
@lsv20, Lyle is right; the techniques under discussion are about stopping attackers from obtaining passwords off-line from a compromised database containing the hashes, not about on-line attempts to bruteforce a successful login.
Also, your approach, if PASSWORD is just the user’s plain password, is just as vulnerable as unsalted single-function hashing wrt rainbow tables: one only has to generate all hashes once for your set of functions. Rainbow tables are about precomputation. Your set of functions only make that precomputation a bit more expensive, but it still only needs to be done once for all deployments of your application or library if you don’t salt your hashes.
Even if your password is salted, your homebrew scheme doesn’t buy you much additional security; the crypt()-based methods I mentioned run blowfish or SHA-2 (both more expensive than SHA-1) for literally thousands of iterations. Your approach is computationally about as expensive as running SHA-1 for 4, maybe 5 iterations…
Jim,
That depends a great deal on your application environment. However, one excellent resource for security issues in general is the Security Now podcast at http://twit.tv/sn. Look through the archives for specific topics. Also, books by Bruce Schneier are excellent.
Usually I like ALA articles a lot, but I felt the need to register to post a comment on this. The discussion of hashing here is dangerously incomplete.
Most cryptographic hash methods are meant to be fast. That’s a good thing for most applications, but it’s terrible for password hashing, because people can generate rainbow tables relatively quickly. (Especially now that Amazon EC2 Cluster GPU instances are available.)
Using standard hash functions like SHA1 is bad security, and telling people that these hash functions are “good enough” is dangerous. If you want your passwords to be secure, you need to be using bcrypt or scrypt. They are specifically designed to be slow, and they do the work of salting for you.
“Thomas Ptacek’s article”:http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html is still the best I’ve seen on password hashing. Please read it before giving people advice on secure password storage. If you are using SHA1, and your database is compromised, your users’ passwords are not secure.
Thank you for writing on this subject. This is interesting I have not thought about it from your point of view.I have been looking for something like this and your blog helps me a lot to understand the topic better. Waiting for your next post.
@samdk, thank you for your comments. This point has been raised in the discussion above and also in some other forums, where several people have felt that I should be recommending a slower hashing algorithm than SHA-1. It’s a valid point, but I think in practice, the gains are fairly small and there are also some drawbacks that the advocates of these algorithms don’t always consider. Here is some of my reasoning:
I absolutely agree that slower is better when it comes to passwords. But slow does not mean impossible. Take the following example. Say we have a moderate user database of 50,000 records, and that an attacker can perform a SHA-1 hash in 1ms. Taking a dictionary of 1000 common passwords, he can generate a custom rainbow table in one second, compare to the database and probably recover a fair number of passwords unless the application enforced strict complexity requirements. This is assuming that the attacker has access to the application code and the salt, a point which I’ll address in a moment. Alternatively, say we’re using bcrypt with a cost factor that requires 100ms for the attacker to process. Using the same 1000-word dictionary, it will now take the attacker 58 days to run the entire list. More difficult, yes, but not infeasible. And that’s assuming that he has to process the list serially. Parallelizing the operation would make it considerably faster, and this is easy with modern GPUs and services like EC2, exactly as you pointed out.
So the slower algorithms have a benefit, but they’re not going to render an attack impractical. Changing the numbers, of course, changes the outcome. A smaller user database will be run faster, and a larger one will take longer. A dictionary of 500 words might yield almost as many results as a larger one. And it’s worth noting that an attacker doesn’t have to exhaust the database to start using information maliciously. You could give bcrypt a higher cost factor, but you’re going to impact legitimate application performance if you take it very far.
Now to drawbacks. These functions can be complex to implement, for various reasons. Often they rely on libraries installed at the OS level. For example, PHP’s crypt() function did not have an internal implementation prior to v5.3. In earlier versions, it would simply rely on the OS, and if your OS didn’t support Blowfish, PHP would just fall back on DES, which as we know has serious problems. There are a lot of servers running versions of PHP older than 5.3. And crypt()-based functions can be especially problematic on Windows, since they’re generally rooted in a *nix library. The MySQL ENCRYPT function simply doesn’t work on Windows, for example, so portability can be a problem. Web developers are not always in control of the operating systems on which their apps depend, and adding complexity makes people less likely to do anything at all. I’d rather have them use an algorithm that’s broadly supported even if it’s faster.
Finally, as I’ve pointed out before, these functions store the salts and algorithm identifier along with the passwords. That means that unless you also use an application-level salt, an attacker who accesses the database has all the information he needs to perform an attack. But a database may be compromised through SQL injection, or from a backup file, or because a management tool was compromised. It is far from given that an attacker who has the database owns the entire machine or even the application code. With a salt that’s stored separately at the application layer, the attacker who has only the database has zero chance of recovering any passwords. Not only does he not have the salt, he doesn’t know its length or how it’s combined with the passwords. Now we certainly shouldn’t be over-reliant on obscurity, but compartmentalizing the salt away from the result, and guarding it like you would guard a symmetric key, at least narrows the type of attack that’s required.
You can certainly do both, of course, and use a global salt along with individual salts—those individual salts will definitely increase the attacker’s time requirement. In the end, though, any hash will be crackable by an attacker who has complete system access, as long as the application allows dictionary words to be used.
Since the output, the hash, is generally shorter than the input. (typically the input can be any length while the output is fixed-length), there must be collisions.
Your time estimates are off by orders of magnitude. GPUs can calculate hundreds of millions of SHA1 hashes in a second (http://www.win.tue.nl/cccc/sha-1-challenge.html), and powerful GPUs are available to anyone who wants them with an EC2 cluster GPU instance. That is, your 50,000,000 SHA1 iterations can be run in under a second. (Even a regular CPU core will take only a few seconds.)
Certainly bcrypt/scrypt are not perfect, but giving your users days/weeks/months to change their passwords is a lot better than no time at all. If you can’t use bcrypt or scrypt for technical or other reasons, then running many (1000+) iterations of a hash function is another alternative.
But really, the problem I have is not with the discussion here in the comments: it’s that none of this is discussed in the article, which is all most people are going to read.
Sam, I said that the custom attacker could generate a custom rainbow table in one second. That was in agreement with your argument. My point is simply that the scenario where an attacker has complete system access is not the one I’m most concerned about defending against in the context of web apps, where database exposure seems to me most likely to occur through SQL injection. So I put a higher priority on separating the salt from the data than on having individual salts.
The essay is intended to be fairly basic and I only wanted to mention tools that are accessible to most developers—not to describe a gold standard of password storage.
Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.
One of the problems I’m dealing with right now is that, while lots of authentication systems use SHA-1 out of the box, they all use a different default salt algorithm. That makes it really hard to merge accounts or switch from one system to another without loosing everyone’s password.
I’m not sure it’s true that “The SHA-1 algorithm… is currently the best available hash function that enjoys broad support in most popular programming languages.” To my knowledge, bcrypt is at least as secure,* and there are implementations in all the major programming languages. Our Ubuntu (Debian) Linux servers have it pre-installed.
But the real win with bcrypt is that the salt is stored with the hash in a standardized format. Bcrypt generates a different hash every time it’s called for the same password, though any one of those hashes can validate that password. So you can move your accounts to and from any bcrypt-capable system without worrying about how OpenLDAP (for instance) decided to salt the password.
What it comes down to is, encryption is best left to the experts. Deciding how to salt your hashes isn’t something web developers should have to think about. SHA-1 doesn’t do that for you, bcrypt does.
*I’m not cryptographer, but bcrypt has been around for a while, and I don’t know of any security holes. It’s also newer than SHA-1, and has a number of properties designed to make it harder to crack. In particular, you can set the minimum amount of processing time required to verify a hash, so that it’s plenty fast for authentication, but expensive to brute force attack.
@dleppik, bcrypt has been discussed quite a bit in the comments above. Yes, it is a stronger algorithm in many ways. It does have some drawbacks that I’ve listed on the previous page, but for the most part they have more to do with implementation than cryptography. My sentence about SHA-1 has been a bit misunderstood, I think. The point was not that it’s the strongest hashing algorithm for password storage in general, but that it’s broadly available. I should have been clearer on that point. Bcrypt isn’t supported everywhere, and since hashing is irreversible, that’s a very important consideration to me. But really, the essay was intended to describe the concept of hashing and salting, not to provide a specific tutorial for how everyone should be implementing it. Different environments will have different constraints.
This was an excellently written article, and covers a concern that I’ve tried time and again to communicate to other people. Hashing the raw password is not enough, and an additional check digit or check character is always a great addition as well. That way, the password is modified, the hash is concatenated to a fixed length with the position based on the user ID (not user name), and a check digit is added.
Copy & paste the code below to embed this comment.
violaceous
Be careful to limit your expectations of what a salted hash actually does!
It is fairly simple to extend a theoretical attack to prepend the salt onto a word to test, a small ammount of setup time is insignificant. A salt only prevents a precomuted (rainbow) table being used, it doesn’t make it any more time consuming for an attacker to use a keyspace or dictionary attack on the cyphertext.
On a 8600 GPU (fairly outdated) the search rate for SHA1 hashes is a little over 46M/s, varying slightly with the number of hashes to check against. A linux box with several more powerful dedicated GPU could possibly chew through a bullion a second. Most salting systems used in PHP applications offer little to no protection against these sort of attacks, sheer speed is enough to reverse most passwords.
The term ‘custom rainbow table’ is an oxymoron, the point of a rainbow table is that it can be reused without additional calculation time. Generating tables would be a waste of time and power, a standard attack (finishing when a possible hash is found) would be far more efficient.
I’ve always been interested to know how Facebook stores password fingerprints, has it been published anywhere?
The fact that most users re-use passwords across multiple sites is the problem, as pointed out in the article. But therefore what benefit is it if my site uses military level encryption if another site just stores plain passwords?
My site’s security for that user (and probably most users) is still just as good as that plain password. It’s not a huge comfort to me that I was hacked because of someone else’s lax security.
Hashing / encryption always sends us off into getting excited about the number of bits, or complexity or whatever – but distracts from all the human aspects where the real security risks are.
30 Reader Comments
Back to the ArticlePeter Bex
Lyle,
Yep, that’s a classic tradeoff and everyone will make it differently, since different apps or businesses have different needs. The main point is that people should be aware that you can make this tradeoff, and that the software should have the accommodations to allow the user to make it.
lsv20
To get rid of all those rainbow tables.
I just use a some functions
sha1(md5(md5(sha1(sha1(md5(PASSWORD)))))
I talked with some guys from Deloitte – they got me thinking a lot. It has nothing to do with the length of the password, but how many times you may try to login. Fx creditcard codes are normally on 4 numbers, but you only have 3 tries.
So what I did was to make my login function show a captcha after 3 tries, and after 5 tries it will lock the username, and report multiple attacks to an administrator email.
Usernames are twoway encrypted in my database, with a random function, where you will need the source code to get the username.
Lyle Mullican
@lsv20, limiting login attempts is good practice, but it doesn’t protect you from an offline attack if someone gets access to the database itself. They can take the data and crunch it independently of the Web application.
Peter Bex
@lsv20, Lyle is right; the techniques under discussion are about stopping attackers from obtaining passwords off-line from a compromised database containing the hashes, not about on-line attempts to bruteforce a successful login.
Also, your approach, if PASSWORD is just the user’s plain password, is just as vulnerable as unsalted single-function hashing wrt rainbow tables: one only has to generate all hashes once for your set of functions. Rainbow tables are about precomputation. Your set of functions only make that precomputation a bit more expensive, but it still only needs to be done once for all deployments of your application or library if you don’t salt your hashes.
Even if your password is salted, your homebrew scheme doesn’t buy you much additional security; the crypt()-based methods I mentioned run blowfish or SHA-2 (both more expensive than SHA-1) for literally thousands of iterations. Your approach is computationally about as expensive as running SHA-1 for 4, maybe 5 iterations…
jcgallaher
Hi Lyle,
The only thing I can see the article is missing is…Where can one go to learn more about how to implement these suggestions?
Thanks, Jim
Lyle Mullican
Jim,
That depends a great deal on your application environment. However, one excellent resource for security issues in general is the Security Now podcast at http://twit.tv/sn. Look through the archives for specific topics. Also, books by Bruce Schneier are excellent.
jcgallaher
Thank Lyle for the tips!
samdk
Usually I like ALA articles a lot, but I felt the need to register to post a comment on this. The discussion of hashing here is dangerously incomplete.
Most cryptographic hash methods are meant to be fast. That’s a good thing for most applications, but it’s terrible for password hashing, because people can generate rainbow tables relatively quickly. (Especially now that Amazon EC2 Cluster GPU instances are available.)
Using standard hash functions like SHA1 is bad security, and telling people that these hash functions are “good enough” is dangerous. If you want your passwords to be secure, you need to be using bcrypt or scrypt. They are specifically designed to be slow, and they do the work of salting for you.
“Thomas Ptacek’s article”:http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html is still the best I’ve seen on password hashing. Please read it before giving people advice on secure password storage. If you are using SHA1, and your database is compromised, your users’ passwords are not secure.
pinkbanana
Thank you for writing on this subject. This is interesting I have not thought about it from your point of view.I have been looking for something like this and your blog helps me a lot to understand the topic better. Waiting for your next post.
Lyle Mullican
@samdk, thank you for your comments. This point has been raised in the discussion above and also in some other forums, where several people have felt that I should be recommending a slower hashing algorithm than SHA-1. It’s a valid point, but I think in practice, the gains are fairly small and there are also some drawbacks that the advocates of these algorithms don’t always consider. Here is some of my reasoning:
I absolutely agree that slower is better when it comes to passwords. But slow does not mean impossible. Take the following example. Say we have a moderate user database of 50,000 records, and that an attacker can perform a SHA-1 hash in 1ms. Taking a dictionary of 1000 common passwords, he can generate a custom rainbow table in one second, compare to the database and probably recover a fair number of passwords unless the application enforced strict complexity requirements. This is assuming that the attacker has access to the application code and the salt, a point which I’ll address in a moment. Alternatively, say we’re using bcrypt with a cost factor that requires 100ms for the attacker to process. Using the same 1000-word dictionary, it will now take the attacker 58 days to run the entire list. More difficult, yes, but not infeasible. And that’s assuming that he has to process the list serially. Parallelizing the operation would make it considerably faster, and this is easy with modern GPUs and services like EC2, exactly as you pointed out.
So the slower algorithms have a benefit, but they’re not going to render an attack impractical. Changing the numbers, of course, changes the outcome. A smaller user database will be run faster, and a larger one will take longer. A dictionary of 500 words might yield almost as many results as a larger one. And it’s worth noting that an attacker doesn’t have to exhaust the database to start using information maliciously. You could give bcrypt a higher cost factor, but you’re going to impact legitimate application performance if you take it very far.
Now to drawbacks. These functions can be complex to implement, for various reasons. Often they rely on libraries installed at the OS level. For example, PHP’s crypt() function did not have an internal implementation prior to v5.3. In earlier versions, it would simply rely on the OS, and if your OS didn’t support Blowfish, PHP would just fall back on DES, which as we know has serious problems. There are a lot of servers running versions of PHP older than 5.3. And crypt()-based functions can be especially problematic on Windows, since they’re generally rooted in a *nix library. The MySQL ENCRYPT function simply doesn’t work on Windows, for example, so portability can be a problem. Web developers are not always in control of the operating systems on which their apps depend, and adding complexity makes people less likely to do anything at all. I’d rather have them use an algorithm that’s broadly supported even if it’s faster.
Finally, as I’ve pointed out before, these functions store the salts and algorithm identifier along with the passwords. That means that unless you also use an application-level salt, an attacker who accesses the database has all the information he needs to perform an attack. But a database may be compromised through SQL injection, or from a backup file, or because a management tool was compromised. It is far from given that an attacker who has the database owns the entire machine or even the application code. With a salt that’s stored separately at the application layer, the attacker who has only the database has zero chance of recovering any passwords. Not only does he not have the salt, he doesn’t know its length or how it’s combined with the passwords. Now we certainly shouldn’t be over-reliant on obscurity, but compartmentalizing the salt away from the result, and guarding it like you would guard a symmetric key, at least narrows the type of attack that’s required.
You can certainly do both, of course, and use a global salt along with individual salts—those individual salts will definitely increase the attacker’s time requirement. In the end, though, any hash will be crackable by an attacker who has complete system access, as long as the application allows dictionary words to be used.
petergrece
Since the output, the hash, is generally shorter than the input. (typically the input can be any length while the output is fixed-length), there must be collisions.
samdk
Your time estimates are off by orders of magnitude. GPUs can calculate hundreds of millions of SHA1 hashes in a second (http://www.win.tue.nl/cccc/sha-1-challenge.html), and powerful GPUs are available to anyone who wants them with an EC2 cluster GPU instance. That is, your 50,000,000 SHA1 iterations can be run in under a second. (Even a regular CPU core will take only a few seconds.)
Certainly bcrypt/scrypt are not perfect, but giving your users days/weeks/months to change their passwords is a lot better than no time at all. If you can’t use bcrypt or scrypt for technical or other reasons, then running many (1000+) iterations of a hash function is another alternative.
But really, the problem I have is not with the discussion here in the comments: it’s that none of this is discussed in the article, which is all most people are going to read.
Lyle Mullican
Sam, I said that the custom attacker could generate a custom rainbow table in one second. That was in agreement with your argument. My point is simply that the scenario where an attacker has complete system access is not the one I’m most concerned about defending against in the context of web apps, where database exposure seems to me most likely to occur through SQL injection. So I put a higher priority on separating the salt from the data than on having individual salts.
The essay is intended to be fairly basic and I only wanted to mention tools that are accessible to most developers—not to describe a gold standard of password storage.
Lyle Mullican
Typo above: strike
customattackerbursaevdenevenakliye
Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.
dleppik
One of the problems I’m dealing with right now is that, while lots of authentication systems use SHA-1 out of the box, they all use a different default salt algorithm. That makes it really hard to merge accounts or switch from one system to another without loosing everyone’s password.
I’m not sure it’s true that “The SHA-1 algorithm… is currently the best available hash function that enjoys broad support in most popular programming languages.” To my knowledge, bcrypt is at least as secure,* and there are implementations in all the major programming languages. Our Ubuntu (Debian) Linux servers have it pre-installed.
But the real win with bcrypt is that the salt is stored with the hash in a standardized format. Bcrypt generates a different hash every time it’s called for the same password, though any one of those hashes can validate that password. So you can move your accounts to and from any bcrypt-capable system without worrying about how OpenLDAP (for instance) decided to salt the password.
What it comes down to is, encryption is best left to the experts. Deciding how to salt your hashes isn’t something web developers should have to think about. SHA-1 doesn’t do that for you, bcrypt does.
*I’m not cryptographer, but bcrypt has been around for a while, and I don’t know of any security holes. It’s also newer than SHA-1, and has a number of properties designed to make it harder to crack. In particular, you can set the minimum amount of processing time required to verify a hash, so that it’s plenty fast for authentication, but expensive to brute force attack.
Lyle Mullican
@dleppik, bcrypt has been discussed quite a bit in the comments above. Yes, it is a stronger algorithm in many ways. It does have some drawbacks that I’ve listed on the previous page, but for the most part they have more to do with implementation than cryptography. My sentence about SHA-1 has been a bit misunderstood, I think. The point was not that it’s the strongest hashing algorithm for password storage in general, but that it’s broadly available. I should have been clearer on that point. Bcrypt isn’t supported everywhere, and since hashing is irreversible, that’s a very important consideration to me. But really, the essay was intended to describe the concept of hashing and salting, not to provide a specific tutorial for how everyone should be implementing it. Different environments will have different constraints.
MrAdamLunde
This was an excellently written article, and covers a concern that I’ve tried time and again to communicate to other people. Hashing the raw password is not enough, and an additional check digit or check character is always a great addition as well. That way, the password is modified, the hash is concatenated to a fixed length with the position based on the user ID (not user name), and a check digit is added.
violaceous
Be careful to limit your expectations of what a salted hash actually does!
It is fairly simple to extend a theoretical attack to prepend the salt onto a word to test, a small ammount of setup time is insignificant. A salt only prevents a precomuted (rainbow) table being used, it doesn’t make it any more time consuming for an attacker to use a keyspace or dictionary attack on the cyphertext.
On a 8600 GPU (fairly outdated) the search rate for SHA1 hashes is a little over 46M/s, varying slightly with the number of hashes to check against. A linux box with several more powerful dedicated GPU could possibly chew through a bullion a second. Most salting systems used in PHP applications offer little to no protection against these sort of attacks, sheer speed is enough to reverse most passwords.
The term ‘custom rainbow table’ is an oxymoron, the point of a rainbow table is that it can be reused without additional calculation time. Generating tables would be a waste of time and power, a standard attack (finishing when a possible hash is found) would be far more efficient.
I’ve always been interested to know how Facebook stores password fingerprints, has it been published anywhere?
James Gaunt
The fact that most users re-use passwords across multiple sites is the problem, as pointed out in the article. But therefore what benefit is it if my site uses military level encryption if another site just stores plain passwords?
My site’s security for that user (and probably most users) is still just as good as that plain password. It’s not a huge comfort to me that I was hacked because of someone else’s lax security.
Hashing / encryption always sends us off into getting excited about the number of bits, or complexity or whatever – but distracts from all the human aspects where the real security risks are.