Tech Q&A for the non-tech product manager — Hash function

Duy KN
4 min read · Apr 23, 2022

I’m starting this series for my colleagues and friends who are non-tech product managers or product owners. It is a set of notes from my daily working experience, written from my own point of view. Please point out any mistakes in my writing or knowledge.

The first topic is about the hash function and how it is used in practice.

From my POV, a hash function is commonly used to produce the “fingerprint” of a document. A hash function has three important attributes:

  • Same output for the same input: a given document always produces exactly one hash value under a given hash function.
  • One-way: you cannot “translate” or “convert” the hash value back into the input document.
  • Almost unique: in almost all cases (say 99.9999%), two different input documents do not produce the same output value. This is not 100% guaranteed, though; when it does happen, it is called a “hash collision”.
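To make the three attributes more concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_hex` and `sha256_hex` are my own, chosen only for this illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = md5_hex("12345678")
second = md5_hex("12345678")   # same input again
changed = md5_hex("12345679")  # only the last character differs

print(first)
print("same input, same output:", first == second)
print("different input, same output:", first == changed)

# The output length depends on the hash function, not on the input size.
print(len(md5_hex("a")), len(md5_hex("a" * 10_000)))
print(len(sha256_hex("a")), len(sha256_hex("a" * 10_000)))
```

Note that a tiny change in the input gives a completely different fingerprint, and that the fingerprint has a fixed length no matter how big the document is.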

Some widely used hash functions are MD5 and the SHA family. Some websites offer tools to compute hashes: https://www.browserling.com/tools/all-hashes, https://www.md5hashgenerator.com/

Q: In practice, when is the hash function used in my product?

Actually, as a product manager or product owner, you rarely notice when and where a hash function is used.

IMO, the most visible use of a hash function is in how user passwords are stored in your system. For security reasons, user passwords should not be stored in plain text. For example, if my password is “12345678”, it must not be stored in the database as “12345678”. Anyone who gets access to the database could read it and use it to harm users.

Suppose we choose MD5 as the hash function; what is stored is now “25d55ad283aa400af464c76d713c07ad”. The system can still check whether the password a user enters is correct by applying the same hash function to it and comparing the results, while keeping the real password out of the database. So if a user asks you “what is my password?”, the answer is now “I don’t know, it is a secret”. (MD5 keeps the example simple; see the security note at the end of this post.)
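The login check described above could be sketched roughly like this. The `stored_hashes` dictionary, the username `duy`, and the function names are hypothetical stand-ins for a real user database. MD5 is used only to match the example in this post; plain MD5 is not considered safe for storing real passwords.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Hash a password with MD5 (illustration only, not for real systems)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical "database": it holds the hash, never the plain-text password.
stored_hashes = {"duy": hash_password("12345678")}


def check_login(username: str, password_attempt: str) -> bool:
    """Hash the attempt with the same function and compare with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest compares in constant time, which avoids leaking timing hints.
    return hmac.compare_digest(stored, hash_password(password_attempt))


print(stored_hashes["duy"])
print(check_login("duy", "12345678"))
print(check_login("duy", "87654321"))
```

The system never needs to know the original password: it only compares fingerprints.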

In the second use case, the hash value can be used as a “shield” to check whether a document has been altered without authorization. Any change in the document causes the hash value to change as well. This is also known as “data integrity”.

In practice, the tech team can use it in some situations:

  • Verify that data transferred between client and server is safe, i.e. it has not been changed in transit.
  • Check whether a file has been changed or corrupted. This can be found in some cloud storage services or P2P file services. Users should check that the checksum of the downloaded file matches the original checksum published on the website, to make sure they are not downloading a “hacked” file.
  • Trigger some action when data changes, for example detecting that content has been updated and now differs from the previous version.
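As a rough sketch of the file check in the list above, the code below computes a SHA-256 checksum and compares it with a published one. The function names are my own, and in-memory bytes stand in for a real downloaded file, so treat it as an illustration rather than a finished tool.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a file-like object chunk by chunk, so large files fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream, published_checksum: str) -> bool:
    """Return True if the content matches the checksum published by the source."""
    return sha256_of_stream(stream) == published_checksum.strip().lower()


# In-memory bytes stand in for the original file and the downloaded copies.
original = b"release notes v1.0"
published = sha256_of_stream(io.BytesIO(original))

print(is_intact(io.BytesIO(original), published))
print(is_intact(io.BytesIO(b"release notes v1.0 (tampered)"), published))
```

For a real file you would pass `open(path, "rb")` instead of `io.BytesIO(...)`.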

Q: Is the hash function also an “encryption” function?

Actually not. The hash function is not designed for encrypting data. As I said, you cannot “translate” or “convert” the hash value back to the input document.

Encryption is different: the original data can be retrieved if you use the right decryption method and the right key.

Q: So the hash function is a kind of checksum?

Yes and no.

Yes: basically, a checksum is a way to check whether data has been changed, so you can use a hash function as a checksum function.

However, IMO, for non-sensitive checksum purposes you can consider a simpler method, e.g. a parity check.
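To show how simple a parity check is, and also where it falls short, here is a small sketch of an even-parity bit. The function name and the sample message are my own choices for illustration.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: return 1 if the number of 1 bits in the data is odd, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


message = b"hello"
sent_parity = parity_bit(message)

# Simulate transmission errors by flipping bits in the first byte.
one_flip = bytes([message[0] ^ 0b00000001]) + message[1:]
two_flips = bytes([message[0] ^ 0b00000011]) + message[1:]

print("one flipped bit detected:", parity_bit(one_flip) != sent_parity)
print("two flipped bits detected:", parity_bit(two_flips) != sent_parity)
```

A single flipped bit changes the parity and is caught, but two flipped bits cancel each other out and go unnoticed. That is the trade-off: parity is very cheap, while a hash catches far more kinds of change.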

MD5 and SHA-1 are also widely used as checksum methods.

And no: as you have seen, a hash function can be used for other purposes, not only as a checksum.

Q: In short, in which situations should we consider using a hash value?

There are two:

  • When you need a unique value (a fingerprint) for some data
  • When you need to detect that something has changed

Q: Does 99.9999% mean that a collision can actually happen?

Actually, yes. Two documents can have the same hash value; technically we call it a “hash collision”. But you should not worry much about it.

The probability depends on which hash function you use. However, for accidental collisions it is small enough for us to treat as “impossible”.
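To get a feeling for how small “small enough” is, here is a back-of-the-envelope estimate based on the well-known birthday approximation. It assumes the hash behaves like a uniformly random function, so it only covers accidental collisions; collisions crafted on purpose are a different matter, and they have been demonstrated for MD5 and SHA-1. The function name and the figure of one billion documents are my own choices for illustration.

```python
import math


def accidental_collision_probability(num_documents: int, hash_bits: int) -> float:
    """Birthday approximation: p is about 1 - exp(-n(n-1)/2 / 2^bits)."""
    pairs = num_documents * (num_documents - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for extremely small x.
    return -math.expm1(-pairs / 2 ** hash_bits)


one_billion = 10 ** 9
for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    p = accidental_collision_probability(one_billion, bits)
    print(f"{name}: about {p:.2e} for one billion documents")
```

Even with a billion documents, the estimated chance of two of them accidentally sharing a hash is far smaller than the 0.0001% figure I used above.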

Q: What are the differences among hash functions?

As a product manager, you do not need to care much about it. MD5, SHA-1, and SHA-2 are commonly used.

For better collision resistance, SHA-2 is preferred.

Q: Any notes?

Yes, an important note about security. In theory, there is no way to revert a hash value to the original data, but in practice it is sometimes possible, especially for short or common inputs.

For example, we use MD5 to hash “1234”. The output hash value is always “81dc9bdb52d04dc20036dbd8313ed055”.

If we see “81dc9bdb52d04dc20036dbd8313ed055”, we can be almost certain the input is “1234”, because the hashes of common inputs can be looked up in precomputed tables. In practice, programmers use other techniques to mitigate this issue, such as salting.
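A minimal sketch of salting could look like the code below. It uses `hashlib.pbkdf2_hmac` from the Python standard library, which is one common option among several; the iteration count and the function names are illustrative assumptions, not a recommendation for your system.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative value; real systems tune this deliberately


def hash_with_salt(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, expected_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_with_salt(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Two users pick the same weak password but get different stored values.
salt_a, digest_a = hash_with_salt("1234")
salt_b, digest_b = hash_with_salt("1234")

print(digest_a.hex())
print(digest_b.hex())
print(verify("1234", salt_a, digest_a))
print(verify("12345", salt_a, digest_a))
```

Because each user gets a different random salt, the same password no longer produces the same stored value, so a precomputed table of common hashes is of little use. The salt is stored next to the digest; it does not need to be secret.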

You can explore this online tool, which tries to reverse MD5 hashes.

Feel free to contact me if you have any feedback or questions.
