Tuesday, September 15, 2009

SHA1 Digest of an Empty String

The use of SHA1 digests of data as unique identifiers is the answer to everything.

I know that sounds like ridiculous hyperbole, but I'll explain why I think it's true in a longer post at some point. For now you'll have to take my word for it that using them as identifiers has simplified the inner workings of a big project tremendously. It's not just me, either: the revision control system Git uses SHA1 digests as the IDs for its commits (and for every other object it stores).
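To make the idea concrete, here is a minimal sketch in Python (my choice, since this post doesn't name a language) of a content-addressed store, where each blob's ID is simply the hex SHA1 digest of its bytes. `ContentStore` and `content_id` are hypothetical names for illustration only, not taken from any real project.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return the hex SHA1 digest of data, used as its identifier."""
    return hashlib.sha1(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by the SHA1 digest of each blob."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_id(data)
        # Identical data always maps to the same key, so storing it twice is harmless.
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = ContentStore()
first = store.put(b"hello")
second = store.put(b"hello")
print(first == second)  # True: same content, same ID
```

The appeal is that the ID is derived from the data itself, so two parties that store the same bytes independently end up with the same identifier without any coordination.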

Anyway... one issue that can arise is that you notice one specific digest occurring far more often than any other in what should be a uniformly distributed hash space. Yikes... what's going on?

That digest would be da39a3ee5e6b4b0d3255bfef95601890afd80709. Make a note of it!

It is the SHA1 digest of an empty string.
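You can confirm this with Python's standard `hashlib` module (Python is just my choice of language here). Hashing zero bytes of input gives exactly that digest, and an empty text string encoded to bytes hashes the same way, since it also encodes to zero bytes.

```python
import hashlib

# SHA1 of zero bytes of input.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()
print(EMPTY_SHA1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709

# An empty text string encodes to zero bytes, so it produces the same digest.
print(hashlib.sha1("".encode("utf-8")).hexdigest() == EMPTY_SHA1)  # True
```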

You might want to consider adding an explicit test for that digest somewhere in your code.

Ideally you want a direct test for an empty string being used as an ID, but checking for this digest is a useful catch-all test.
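Here is one way the two checks might look, as a sketch in Python under my own assumptions: `make_id`, `check_id` and `EmptyIdError` are hypothetical names, and raising an exception is just one possible policy (you might prefer to log a warning instead). `make_id` is the direct test, refusing to hash empty input at all; `check_id` is the catch-all, flagging an ID that was produced somewhere else from empty data.

```python
import hashlib

# Well-known SHA1 digest of zero bytes of input.
EMPTY_SHA1_HEX = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


class EmptyIdError(ValueError):
    """Hypothetical error raised when an ID would be derived from empty data."""


def make_id(data: bytes) -> str:
    # Direct test: refuse to derive an ID from empty input.
    if not data:
        raise EmptyIdError("refusing to derive an ID from empty data")
    return hashlib.sha1(data).hexdigest()


def check_id(digest: str) -> str:
    # Catch-all test: reject IDs that some other code path computed from
    # empty data. Compare case-insensitively, since hex may be upper case.
    if digest.lower() == EMPTY_SHA1_HEX:
        raise EmptyIdError("ID is the SHA1 digest of an empty string")
    return digest
```

The catch-all is useful at boundaries where IDs arrive already computed (from a database, a file, or another service), because there you never see the original data to test it directly.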

Once I noticed this digest being generated and realized what it represented, I could zero in on the root cause very quickly.