Since the output, the hash, is generally shorter than the input. (typically the input can be any length while the output is fixed-length), there must be collisions.
Your time estimates are off by orders of magnitude. GPUs can calculate hundreds of millions of SHA1 hashes in a second (http://www.win.tue.nl/cccc/sha-1-challenge.html), and powerful GPUs are available to anyone who wants them with an EC2 cluster GPU instance. That is, your 50,000,000 SHA1 iterations can be run in under a second. (Even a regular CPU core will take only a few seconds.)
Certainly bcrypt/scrypt are not perfect, but giving your users days/weeks/months to change their passwords is a lot better than no time at all. If you can’t use bcrypt or scrypt for technical or other reasons, then running many (1000+) iterations of a hash function is another alternative.
But really, the problem I have is not with the discussion here in the comments: it’s that none of this is discussed in the article, which is all most people are going to read.
Sam, I said that the custom attacker could generate a custom rainbow table in one second. That was in agreement with your argument. My point is simply that the scenario where an attacker has complete system access is not the one I’m most concerned about defending against in the context of web apps, where database exposure seems to me most likely to occur through SQL injection. So I put a higher priority on separating the salt from the data than on having individual salts.
The essay is intended to be fairly basic and I only wanted to mention tools that are accessible to most developers—not to describe a gold standard of password storage.
Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.
One of the problems I’m dealing with right now is that, while lots of authentication systems use SHA-1 out of the box, they all use a different default salt algorithm. That makes it really hard to merge accounts or switch from one system to another without loosing everyone’s password.
I’m not sure it’s true that “The SHA-1 algorithm… is currently the best available hash function that enjoys broad support in most popular programming languages.” To my knowledge, bcrypt is at least as secure,* and there are implementations in all the major programming languages. Our Ubuntu (Debian) Linux servers have it pre-installed.
But the real win with bcrypt is that the salt is stored with the hash in a standardized format. Bcrypt generates a different hash every time it’s called for the same password, though any one of those hashes can validate that password. So you can move your accounts to and from any bcrypt-capable system without worrying about how OpenLDAP (for instance) decided to salt the password.
What it comes down to is, encryption is best left to the experts. Deciding how to salt your hashes isn’t something web developers should have to think about. SHA-1 doesn’t do that for you, bcrypt does.
*I’m not cryptographer, but bcrypt has been around for a while, and I don’t know of any security holes. It’s also newer than SHA-1, and has a number of properties designed to make it harder to crack. In particular, you can set the minimum amount of processing time required to verify a hash, so that it’s plenty fast for authentication, but expensive to brute force attack.
@dleppik, bcrypt has been discussed quite a bit in the comments above. Yes, it is a stronger algorithm in many ways. It does have some drawbacks that I’ve listed on the previous page, but for the most part they have more to do with implementation than cryptography. My sentence about SHA-1 has been a bit misunderstood, I think. The point was not that it’s the strongest hashing algorithm for password storage in general, but that it’s broadly available. I should have been clearer on that point. Bcrypt isn’t supported everywhere, and since hashing is irreversible, that’s a very important consideration to me. But really, the essay was intended to describe the concept of hashing and salting, not to provide a specific tutorial for how everyone should be implementing it. Different environments will have different constraints.
This was an excellently written article, and covers a concern that I’ve tried time and again to communicate to other people. Hashing the raw password is not enough, and an additional check digit or check character is always a great addition as well. That way, the password is modified, the hash is concatenated to a fixed length with the position based on the user ID (not user name), and a check digit is added.
Copy & paste the code below to embed this comment.
violaceous
Be careful to limit your expectations of what a salted hash actually does!
It is fairly simple to extend a theoretical attack to prepend the salt onto a word to test, a small ammount of setup time is insignificant. A salt only prevents a precomuted (rainbow) table being used, it doesn’t make it any more time consuming for an attacker to use a keyspace or dictionary attack on the cyphertext.
On a 8600 GPU (fairly outdated) the search rate for SHA1 hashes is a little over 46M/s, varying slightly with the number of hashes to check against. A linux box with several more powerful dedicated GPU could possibly chew through a bullion a second. Most salting systems used in PHP applications offer little to no protection against these sort of attacks, sheer speed is enough to reverse most passwords.
The term ‘custom rainbow table’ is an oxymoron, the point of a rainbow table is that it can be reused without additional calculation time. Generating tables would be a waste of time and power, a standard attack (finishing when a possible hash is found) would be far more efficient.
I’ve always been interested to know how Facebook stores password fingerprints, has it been published anywhere?
The fact that most users re-use passwords across multiple sites is the problem, as pointed out in the article. But therefore what benefit is it if my site uses military level encryption if another site just stores plain passwords?
My site’s security for that user (and probably most users) is still just as good as that plain password. It’s not a huge comfort to me that I was hacked because of someone else’s lax security.
Hashing / encryption always sends us off into getting excited about the number of bits, or complexity or whatever – but distracts from all the human aspects where the real security risks are.
30 Reader Comments
Back to the Articlepetergrece
Since the output, the hash, is generally shorter than the input. (typically the input can be any length while the output is fixed-length), there must be collisions.
samdk
Your time estimates are off by orders of magnitude. GPUs can calculate hundreds of millions of SHA1 hashes in a second (http://www.win.tue.nl/cccc/sha-1-challenge.html), and powerful GPUs are available to anyone who wants them with an EC2 cluster GPU instance. That is, your 50,000,000 SHA1 iterations can be run in under a second. (Even a regular CPU core will take only a few seconds.)
Certainly bcrypt/scrypt are not perfect, but giving your users days/weeks/months to change their passwords is a lot better than no time at all. If you can’t use bcrypt or scrypt for technical or other reasons, then running many (1000+) iterations of a hash function is another alternative.
But really, the problem I have is not with the discussion here in the comments: it’s that none of this is discussed in the article, which is all most people are going to read.
Lyle Mullican
Sam, I said that the custom attacker could generate a custom rainbow table in one second. That was in agreement with your argument. My point is simply that the scenario where an attacker has complete system access is not the one I’m most concerned about defending against in the context of web apps, where database exposure seems to me most likely to occur through SQL injection. So I put a higher priority on separating the salt from the data than on having individual salts.
The essay is intended to be fairly basic and I only wanted to mention tools that are accessible to most developers—not to describe a gold standard of password storage.
Lyle Mullican
Typo above: strike
customattackerbursaevdenevenakliye
Yes, I do agree that this approach makes the attacker’s job harder, but as long as he has access to the entire application and all salts, it’s always a tractable problem. There’s also tension between security and usability. If you stretch the time required to perform the operation too much, then you impact the user experience at signup and login. You also increase the demand on server resources at scale, when many people are attempting to sign up or log in at the same time. On a small scale it’s less of an issue.
dleppik
One of the problems I’m dealing with right now is that, while lots of authentication systems use SHA-1 out of the box, they all use a different default salt algorithm. That makes it really hard to merge accounts or switch from one system to another without loosing everyone’s password.
I’m not sure it’s true that “The SHA-1 algorithm… is currently the best available hash function that enjoys broad support in most popular programming languages.” To my knowledge, bcrypt is at least as secure,* and there are implementations in all the major programming languages. Our Ubuntu (Debian) Linux servers have it pre-installed.
But the real win with bcrypt is that the salt is stored with the hash in a standardized format. Bcrypt generates a different hash every time it’s called for the same password, though any one of those hashes can validate that password. So you can move your accounts to and from any bcrypt-capable system without worrying about how OpenLDAP (for instance) decided to salt the password.
What it comes down to is, encryption is best left to the experts. Deciding how to salt your hashes isn’t something web developers should have to think about. SHA-1 doesn’t do that for you, bcrypt does.
*I’m not cryptographer, but bcrypt has been around for a while, and I don’t know of any security holes. It’s also newer than SHA-1, and has a number of properties designed to make it harder to crack. In particular, you can set the minimum amount of processing time required to verify a hash, so that it’s plenty fast for authentication, but expensive to brute force attack.
Lyle Mullican
@dleppik, bcrypt has been discussed quite a bit in the comments above. Yes, it is a stronger algorithm in many ways. It does have some drawbacks that I’ve listed on the previous page, but for the most part they have more to do with implementation than cryptography. My sentence about SHA-1 has been a bit misunderstood, I think. The point was not that it’s the strongest hashing algorithm for password storage in general, but that it’s broadly available. I should have been clearer on that point. Bcrypt isn’t supported everywhere, and since hashing is irreversible, that’s a very important consideration to me. But really, the essay was intended to describe the concept of hashing and salting, not to provide a specific tutorial for how everyone should be implementing it. Different environments will have different constraints.
MrAdamLunde
This was an excellently written article, and covers a concern that I’ve tried time and again to communicate to other people. Hashing the raw password is not enough, and an additional check digit or check character is always a great addition as well. That way, the password is modified, the hash is concatenated to a fixed length with the position based on the user ID (not user name), and a check digit is added.
violaceous
Be careful to limit your expectations of what a salted hash actually does!
It is fairly simple to extend a theoretical attack to prepend the salt onto a word to test, a small ammount of setup time is insignificant. A salt only prevents a precomuted (rainbow) table being used, it doesn’t make it any more time consuming for an attacker to use a keyspace or dictionary attack on the cyphertext.
On a 8600 GPU (fairly outdated) the search rate for SHA1 hashes is a little over 46M/s, varying slightly with the number of hashes to check against. A linux box with several more powerful dedicated GPU could possibly chew through a bullion a second. Most salting systems used in PHP applications offer little to no protection against these sort of attacks, sheer speed is enough to reverse most passwords.
The term ‘custom rainbow table’ is an oxymoron, the point of a rainbow table is that it can be reused without additional calculation time. Generating tables would be a waste of time and power, a standard attack (finishing when a possible hash is found) would be far more efficient.
I’ve always been interested to know how Facebook stores password fingerprints, has it been published anywhere?
James Gaunt
The fact that most users re-use passwords across multiple sites is the problem, as pointed out in the article. But therefore what benefit is it if my site uses military level encryption if another site just stores plain passwords?
My site’s security for that user (and probably most users) is still just as good as that plain password. It’s not a huge comfort to me that I was hacked because of someone else’s lax security.
Hashing / encryption always sends us off into getting excited about the number of bits, or complexity or whatever – but distracts from all the human aspects where the real security risks are.