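Safe Password Hashing

This section explains why hash functions should be used to protect passwords, and how to do so effectively.

Why should I hash passwords supplied by users of my application?
Why are common hash functions such as md5() and sha1() unsuitable for passwords?
How should I hash my passwords, if the common hash functions are not suitable?
What is a salt?
How do I store my salts?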

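Why should I hash passwords supplied by users of my application?

Password hashing is one of the most basic security considerations when designing any application that accepts passwords from users, and it is not optional. Without hashing, the stored passwords can be stolen as soon as the application's database is compromised. An attacker can then try the stolen usernames and passwords against other applications, which puts at risk every user who reuses the same password elsewhere.

By hashing passwords before storing them in the database, you prevent an attacker from reading the original passwords directly, while your application can still authenticate users: it hashes the password supplied at login in the same way and compares the result with the stored hash.

It is important to note that password hashing only protects passwords from being stolen directly from the database. It cannot stop malicious code injected into your application from intercepting the plain-text passwords as they are submitted.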

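Why are common hash functions such as md5() and sha1() unsuitable for passwords?

Hashing algorithms such as MD5, SHA1 and SHA256 are designed to be very fast and efficient. As computer hardware keeps improving, it has become easy for an attacker to "brute force" such a hash: trying candidate inputs until one produces the target hash value.

Because modern computers can "reverse" the output of these algorithms so quickly, many security professionals strongly advise against using them for password hashing.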

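How should I hash my passwords, if the common hash functions are not suitable?

When hashing passwords, the two most important considerations are computational expense and the salt. The more computationally expensive the hashing algorithm, the longer it takes to brute force its output.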
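In PHP's native API (password_hash(), PHP 5.5+), the computational expense of the bcrypt algorithm is controlled by its "cost" option. The sketch below is illustrative only: the password and the two cost values are arbitrary examples, and the measured times will differ from machine to machine.

```php
<?php
// Each +1 to "cost" doubles the work, because bcrypt runs 2^cost rounds of its
// expensive key setup. Higher cost = slower hashing = slower brute forcing.
$password = 'correct horse battery staple'; // example password, hypothetical

foreach ([10, 12] as $cost) {
    $start = microtime(true);
    $hash  = password_hash($password, PASSWORD_BCRYPT, ['cost' => $cost]);
    $spent = microtime(true) - $start;
    printf("cost %d: %.3f s  %s\n", $cost, $spent, $hash);
}
```

Going from cost 10 to 12 should take roughly four times as long; the same factor applies to an attacker testing guesses against the stored hash.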

PHP 5.5 provides a native password hashing API that safely handles both hashing and verifying passwords. A pure PHP compatibility library is available for PHP 5.3.7 and later.
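A minimal sketch of the native API's hash-then-verify cycle follows. The variable names and the sample password are hypothetical; PASSWORD_DEFAULT, password_hash(), password_verify() and password_needs_rehash() are the real API functions.

```php
<?php
// At registration: hash the plain-text password and store only the result.
$plain  = 'example-password';                      // hypothetical user input
$stored = password_hash($plain, PASSWORD_DEFAULT); // algorithm and salt chosen for you

// At login: check the submitted password against the stored hash.
var_dump(password_verify('example-password', $stored)); // bool(true)
var_dump(password_verify('wrong-password', $stored));   // bool(false)

// Optional upkeep: re-hash if the default algorithm or cost has changed since.
if (password_needs_rehash($stored, PASSWORD_DEFAULT)) {
    $stored = password_hash($plain, PASSWORD_DEFAULT);
}
```

Only $stored ever needs to go into the database; the plain-text password is never kept.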

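Another option is the crypt() function, which supports several hashing algorithms in PHP 5.3 and later. PHP ships its own native implementation of each supported algorithm, so the algorithm you choose is guaranteed to be available even if your operating system does not support it.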

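The suggested algorithm for hashing passwords is Blowfish, which is also the default used by the password hashing API. It is significantly more computationally expensive than MD5 or SHA1, while remaining scalable: its cost can be raised as hardware gets faster.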
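The following sketch calls crypt() directly with a Blowfish setting string. It assumes the "$2y$" prefix (Blowfish), a two-digit cost, and a 22-character salt drawn from the alphabet ./A-Za-z0-9; the helper name make_bcrypt_salt() is hypothetical, and random_int() requires PHP 7. password_hash() performs all of these steps for you.

```php
<?php
// Hypothetical helper: builds a Blowfish setting string for crypt(),
// e.g. "$2y$10$" followed by 22 random salt characters.
function make_bcrypt_salt($cost = 10)
{
    $alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $salt = '';
    for ($i = 0; $i < 22; $i++) {
        $salt .= $alphabet[random_int(0, 63)]; // cryptographically secure (PHP 7+)
    }
    return sprintf('$2y$%02d$%s', $cost, $salt);
}

$hash = crypt('example-password', make_bcrypt_salt(10));
echo $hash, "\n"; // 60 characters, beginning with "$2y$10$"
```

The result has the same format as a PASSWORD_BCRYPT hash from password_hash(), so password_verify() can check it as well.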

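If you use crypt() to verify passwords, take care to avoid timing attacks by using a constant-time string comparison, that is, a comparison whose running time does not depend on how much of the two strings match. PHP's == and === operators and strcmp() do not perform constant-time comparisons, but password_verify() does this for you. You are strongly encouraged to use the native password hashing API whenever possible.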
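If you do verify with crypt(), one way to get a constant-time comparison is hash_equals() (PHP 5.6+). In this sketch the function name verify_with_crypt() is hypothetical, and the stored hash is produced with password_hash() purely for convenience, since its bcrypt output is crypt()-compatible.

```php
<?php
// Stored hash, e.g. loaded from the database (created here for the demo).
$stored = password_hash('example-password', PASSWORD_BCRYPT);

// Hypothetical helper: re-hash the candidate with the stored string as the
// "salt" (crypt() reads the algorithm, cost and salt back out of it), then
// compare with hash_equals(), which does not short-circuit on the first mismatch.
function verify_with_crypt($password, $stored)
{
    $candidate = crypt($password, $stored);
    return hash_equals($stored, $candidate);
}

var_dump(verify_with_crypt('example-password', $stored)); // bool(true)
var_dump(password_verify('example-password', $stored));   // bool(true), the preferred call
```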

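What is a salt?

In cryptography, a salt is data added during the hashing process to eliminate the possibility of the output being looked up in a list of pre-calculated hashes and their inputs, known as a rainbow table.

In simpler terms, a salt is a small piece of additional data that makes hashes much harder to crack. Many online services offer extensive lists of pre-computed hashes together with their original inputs. Adding a salt ensures that the resulting hash cannot simply be looked up in such a list.

If no salt is provided, password_hash() generates a random one. This is simple and effective.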
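The effect of a random salt can be seen by hashing the same weak password twice with each approach. The password is an arbitrary example.

```php
<?php
// Without a salt, everyone who picks this password gets the same digest,
// so a single pre-computed table entry cracks all of them at once.
$password = 'password123'; // a common, weak password

$md5a = md5($password);
$md5b = md5($password);

// password_hash() draws a fresh random salt on every call.
$bcryptA = password_hash($password, PASSWORD_DEFAULT);
$bcryptB = password_hash($password, PASSWORD_DEFAULT);

var_dump($md5a === $md5b);       // bool(true): identical, lookup-table friendly
var_dump($bcryptA === $bcryptB); // bool(false): different salts, different hashes

// Both salted hashes still verify, because each carries its own salt.
var_dump(password_verify($password, $bcryptA), password_verify($password, $bcryptB));
```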

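How do I store my salts?

When using password_hash() or crypt(), the salt is returned as part of the generated hash. You can store this complete value directly in your database, because it contains all the information that password_verify() or crypt() needs to verify a password later.

The following diagram shows the structure of a value returned by crypt() or password_hash(). As you can see, the algorithm and the salt are embedded in the value, and they are used again when the password is verified.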

[Figure: components of a password_hash() or crypt() return value, in order: the selected algorithm, the algorithm options, the salt used, and the hashed password.]
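The parts shown in the figure can be pulled apart in code. This sketch assumes the bcrypt layout "$algorithm$cost$" followed by a 22-character salt and a 31-character hash; other algorithms use different layouts, so treat it as illustrative rather than a general parser.

```php
<?php
$hash = password_hash('example-password', PASSWORD_BCRYPT, ['cost' => 10]);

// "$2y$10$<salt><hash>" splits on "$" into: "", "2y", "10", "<salt><hash>".
list(, $algorithm, $cost, $saltAndHash) = explode('$', $hash);
$salt   = substr($saltAndHash, 0, 22); // the salt
$digest = substr($saltAndHash, 22);    // the hashed password

printf("algorithm: %s\ncost: %s\nsalt: %s\nhash: %s\n", $algorithm, $cost, $salt, $digest);

// password_get_info() reports the algorithm and options from the same string.
print_r(password_get_info($hash));
```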

User Contributed Notes:

[#1] alf dot henrik at ascdevel dot com [2014-03-12 19:58:44]

I feel I should comment on some of the claims posted as replies here.

For starters, speed IS an issue, with MD5 in particular and also with SHA1. I've written my own MD5 brute-force application just for the fun of it, and using only my CPU I can easily test about 200 million candidate hashes per second against a target hash. The main reason for this speed is that for most attempts you can skip 19 of the algorithm's 64 steps. That shortcut doesn't apply to longer input (more than 16 characters), but I'm sure there are ways around it.

If you search online you'll see people claiming to check billions of hashes per second using GPUs. I wouldn't be surprised if 100 billion per second were possible on a single computer these days, and it's only going to get worse. It would take a power-hungry monster with four dual high-end GPUs or something similar, but it's still possible.

Here's why 100 billion per second is an issue:
Assume most passwords are drawn from a set of 96 characters. An 8-character password then has 96^8 = 7.21389578984e+15 combinations.
At 100 billion per second, trying them all takes 7.21389578984e+15 / 10^11 ≈ 72,139 seconds, and 72,139 / 3600 ≈ 20 hours. Keep in mind that you'll need to add the combinations for 1-7 characters as well. 20 hours is not a lot if you want to target a single user.
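The estimate above can be re-checked with a few lines of PHP. The 96-character alphabet and the rate of 100 billion guesses per second are the commenter's assumptions, taken as given.

```php
<?php
$alphabet = 96;   // characters per position (commenter's assumption)
$rate     = 1e11; // guesses per second (commenter's assumption)

$eightChars = pow($alphabet, 8); // candidates of exactly 8 characters

// Also count every shorter length, 1 through 7.
$upToEight = 0;
for ($len = 1; $len <= 8; $len++) {
    $upToEight += pow($alphabet, $len);
}

printf("length 8 only: %.1f hours\n", $eightChars / $rate / 3600);
printf("lengths 1-8:   %.1f hours\n", $upToEight / $rate / 3600);
```

Adding the shorter lengths barely changes the answer: they sum to roughly 1/95 of the 8-character count, so the total stays close to 20 hours.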

So, in essence:
There's a reason why newer hash algorithms are specifically designed not to be easily implemented on GPUs.

Oh, and I can see there's someone mentioning MD5 and rainbow tables. If you read the numbers here, I hope you realize how incredibly stupid and useless rainbow tables have become in terms of MD5. Unless the input to MD5 is really huge, you're just not going to be able to compete with GPUs here. By the time a storage media is able to produce far beyond 3TB/s, the CPUs and GPUs will have reached much higher speeds.

As for SHA1, my belief is that it's about a third slower than MD5. I can't verify this myself, but that seems to be the case judging by the numbers published for MD5 and SHA1. The speed issue is basically the same here as well.

The moral here:
Please do as you're told. Don't ever use MD5 or SHA1 for hashing passwords again. We all know most people's passwords aren't going to be very long, and that's a major disadvantage. Adding long salts will certainly help, but unless you want to add a few hundred bytes of salt, there are going to be fast brute-force applications out there ready to reverse engineer your passwords or your users' passwords.

[#2] loveun1x at yahooc dot om [2013-11-08 19:41:21]

The section "Why are common hashing functions such as md5() and sha1() unsuitable for passwords?" is completely wrong. No one says MD5 and SHA-1 are unsuitable for password hashing because of their speed. (MD5 is broken for digital signatures, though not yet for password hashing, and that break has nothing to do with its speed.) A single round of either is unsuitable, but that is not a problem with the underlying algorithms themselves.

Always use a salt, preferably a large one (e.g. at least 16 characters). Always run the hash through multiple rounds. Most implementations today use tens or hundreds of thousands of rounds; for example, 7-Zip's encrypted format uses 262,144 rounds of SHA-256.
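The "large salt plus many rounds" advice in this note corresponds to a standard key-derivation function such as PBKDF2, which PHP exposes as hash_pbkdf2() (PHP 5.5+). In this sketch the iteration count simply mirrors the 7-Zip figure quoted above, random_bytes() requires PHP 7, and the "iterations$salt$hash" storage format is hypothetical.

```php
<?php
$password   = 'example-password'; // hypothetical user input
$salt       = random_bytes(16);   // 16 random bytes of salt
$iterations = 262144;             // illustrative round count

// PBKDF2-HMAC-SHA256, 32 raw output bytes.
$derived = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);

// Hypothetical record format: iterations, salt and hash joined by "$".
$record = implode('$', [$iterations, base64_encode($salt), base64_encode($derived)]);

// Verification: recompute with the stored parameters, compare in constant time.
list($storedIter, $storedSalt, $storedHash) = explode('$', $record);
$check = hash_pbkdf2('sha256', $password, base64_decode($storedSalt), (int) $storedIter, 32, true);
var_dump(hash_equals(base64_decode($storedHash), $check)); // bool(true)
```

The manual's recommendation of password_hash() still applies; this only shows what the commenter's suggestion looks like with a standard primitive instead of a hand-rolled loop.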

[#3] sgbeal at googlemail dot com [2012-09-19 13:21:38]

sha1 in conjunction with one or more salt values need not be as insecure as the above makes it out to be. e.g. the Fossil SCM creates an sha1 password hash based on a user's clear-text password combined with the user's name and a shared secret known only to the specific source repository for which the user is set up.

[#4] fluffy at beesbuzz dot biz [2012-06-11 23:33:49]

The security issue with simple hashing (md5 et al.) isn't really the speed so much as the fact that it's deterministic: two different people with the same password will have the same hash, so if one person's hash is brute-forced, the other's is too. This facilitates rainbow-table attacks. Simply slowing the hash down isn't a very useful tactic for improving security. It doesn't matter how slow and cumbersome your hash algorithm is: as soon as someone has a weak password that's in a dictionary, EVERYONE with that weak password is vulnerable.

Also, hash algorithms such as md5 are for the purpose of generating a digest and checking if two things are probably the same as each other; they are not intended to be impossible to generate a collision for.  Even if an underlying password itself requires a lot of brute forcing to determine, that doesn't mean it will be impossible to find some other bit pattern that generates the same hash in a trivial amount of time.

As such: please, please, PLEASE only use salted hashes for password storage.  There is no reason to implement your own salted hash mechanism, either, as crypt() already does an excellent job of this.