You use hashing every day while browsing online, though you may not know it! Let’s explore how a cryptographic hash works in layman’s terms.
Cryptography includes all the tools and techniques that help to protect the data in transit and at rest. A hash function in cryptography is one of the big players, yet many people don’t know what it is or what it does.
Frankly, hashing is everywhere. For example, did you know that prominent websites frequently store your passwords as hashes? In fact, the fingerprint locks on our phones and laptops also utilize hashing technology!
So, let’s explore what a hash function in cryptography is and why it’s important. We’ll cover how hashing works, what a cryptographic hash function does (and doesn’t do), and how to make hashing more secure in terms of password storage.
What Is Hashing? The Definition of a Hash Function in Cryptography
If you buy a new phone and its shrink wrap is torn off or damaged, you can immediately tell that someone has opened, used, replaced, or damaged the phone. A cryptographic hash is much the same, but for data instead of a physical object. Hashing is like putting virtual shrink wrap on a piece of software, application, or data to inform users if it has been modified in any way.
But what is hashing? Hashing, or a hashing algorithm, is a one-way process that converts input data of any size into a fixed-length output. At the center of the process is where you’ll find the hash function. Basically, you can take either a short sentence or an entire stream of data, run it through a hash function, and wind up with a string of data of a specific length. It’s a way to obscure your original data to make it as challenging as possible to reverse engineer.
In a more technical sense, it’s a technique that uses a mathematical operation to reduce an arbitrary amount of input data (called a hash key) to a fixed-length string of bits in a way that’s impractical to reverse with modern computers. So, the definition of a hash function would be something that takes input data and uses it to create a fixed-length output value that’s effectively unique and virtually irreversible (for all practical intents and purposes).
The output values returned by a hash function are called by a few different names:
- Hash values,
- Hash codes, or
- Hash digests.
For every input, you get a hash output that is, for all practical purposes, unique. Once you create a hash, the only practical way to get the same exact hash is to input the same text. If you change even just one character, the hash value will change as well. We’ll talk more about that shortly.
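To make the one-character point concrete, here is a minimal sketch using Python’s standard `hashlib` module. The helper name `sha256_hex` and the sample strings are assumptions chosen for illustration, and SHA-256 is just one possible algorithm:

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("Sunshine")
again = sha256_hex("Sunshine")    # identical input
changed = sha256_hex("Sunshinf")  # only the last character differs

print(first == again)    # the same input gives the same hash
print(first == changed)  # a one-character change gives a different hash
print(len(first), len(changed))  # both digests have the same length
```

Running it should show that repeating the input reproduces the hash, while the altered input yields a digest of the same length that no longer matches.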
Hashing vs Encryption — Aren’t They the Same?
In a word? No. Hashing and encryption are two separate cryptographic processes. Encryption is something you can use to convert plaintext (readable) data into something indecipherable using algorithms and a key. However, you can decrypt that data by using either the same key (symmetric encryption) or a mathematically different but related key (asymmetric encryption).
A cryptographic hash function is different. Once you hash data, you can’t restore it to its original format because it’s a one-way process.
But what do hash functions look like and how do they work? Let’s address the first part of the question and then we can get to the other part a little later.
Examples of Cryptographic Hash Functions
Here’s a simplified illustration to show you what we mean:
The length of the output or hash depends on the hashing algorithm you use. Hash values can be 160 bits for SHA-1 hashes, or 256 bits, 384 bits, or 512 bits for the SHA-2 family of hashes. They’re typically displayed in hexadecimal characters. The input data’s quantity and size can be varied, but the output value always remains the same in terms of size.
For example, let’s consider the following hash inputs and outputs:
|Example Input Texts|Hash Values Using SHA-1|
|---|---|
|Hello! You are reading an article about the cryptographic hash function!|B26BACAB73C46D844CABEC26CE32B030FED1164F|
In this example, you can see that the hash value’s length remains the same whether the input value is just a small word or a complete sentence. (For example, a 160-bit hash value has 40 hexadecimal characters, whereas a 256-bit hash digest has 64 hex characters.) So, even if I hash one of the Harry Potter books — or the entire series of them — using the same algorithm, the hash values’ lengths would remain the same!
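The relationship between algorithm, bit length, and hex length can be checked with a short sketch. It assumes Python’s standard `hashlib` module; the helper `hex_length` and the sample inputs are hypothetical names used only for this illustration:

```python
import hashlib

def hex_length(algorithm: str, data: bytes) -> int:
    """Return how many hexadecimal characters the digest of `data` has."""
    return len(hashlib.new(algorithm, data).hexdigest())

short_input = b"Hi"
long_input = b"A much longer input, repeated many times. " * 10_000

for algorithm in ("sha1", "sha256", "sha384", "sha512"):
    short_len = hex_length(algorithm, short_input)
    long_len = hex_length(algorithm, long_input)
    # Each hexadecimal character encodes 4 bits of the digest.
    print(algorithm, short_len, long_len, short_len * 4)
```

For each algorithm, the short and long inputs should report the same number of hex characters, and multiplying by four gives the digest size in bits.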
Hash functions are used in several different ways. However, throughout this article, we’re going to focus mainly on a few of the ways that they’re useful:
- Ensuring data integrity,
- Creating and verifying digital signatures (which are encrypted hashes), and
- Facilitating secure password storage.
The Types of Cryptographic Hash Algorithms
There are many cryptographic hash algorithms out there that businesses and organizations use (although some have since been deprecated due to theoretical or practical vulnerabilities). Some of the most popular hashing algorithms include:
- The SHA family (SHA-1, SHA-2 [including SHA-256 and SHA-512], and SHA-3)
- The MD family (for example, MD5),
- NTLM, and
- LanMan (LM hash).
Now, not all of these are considered secure algorithms for every type of application or purpose. Some hash functions are fast, while others are slow. When it comes to using cryptographic hash functions for password hashing, for example, you’ll want to use a deliberately slow hash function rather than a fast one, because a slow function makes every guess more costly for an attacker.
Cryptographic Hash Properties
So, what properties make up a strong cryptographic hash function?
- Determinism — The same input should always produce the same hash value. And regardless of the size of the input or the key value, the output should always have the same fixed length.
- Computational Speed — The speed of a hash function is important and should vary based on how it’s being used. For example, in some cases, you need a fast hash function, whereas in others it’s better to use a slow hash function.
- Preimage Resistance — Hashes should be extremely impractical to reverse (i.e., the function should serve as a one-way function for all intents and purposes). The hash function should obscure the data so thoroughly that it would be improbable for someone to reverse engineer the hash to determine its original input value. Even one tiny change to the original input should result in an entirely different hash value.
Characteristics of a Hash Function in Cryptography
These are the two prominent qualities of cryptographic hash functions.
1) A Hash Function Is Practically Irreversible
Hashing is often considered a type of one-way function. That’s because it’s highly infeasible (technically possible, though) to reverse it because of the amount of time and computational resources that would be involved in doing so. That means you can’t figure out the original data based on the hash value without an impractical amount of resources at your disposal.
In other words, if the hash function is h, and the input value is x, the hash value will be h(x). If you have access to h(x) and know which hash function h was used, it’s (almost) impossible to figure out the value of x.
2) Hash Values Are Unique
No two different inputs should (ideally) generate the same hash value. When two different inputs do produce the same hash, it’s known as a collision. Because a hash function maps inputs of any size to a fixed-length output, collisions must exist in theory; what matters is that finding one should be computationally infeasible. If attackers can find collisions in practice, for example through what are known as birthday attacks, the algorithm isn’t safe to use. Collision resistance is something that improves the strength of your hash and helps to keep the data more secure. (Salting, which we’ll cover later, is a separate safeguard used for password hashes.)
So, if the hash function is h, and there are two different inputs x and y, the hash value h(x) should be different from h(y). Hence, h(x) ≠ h(y). What this means is that if you make the slightest change in the original data, its hash value changes. Hence, data tampering doesn’t go unnoticed.
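To see why digest length matters for collisions, consider a toy experiment. For a hash with n bits of output, a birthday-style search needs on the order of 2^(n/2) attempts, so a hash cut down to 16 bits collides after only a few hundred tries, while a full 256-bit digest would need roughly 2^128. The sketch below assumes a deliberately weakened hash (the first four hex characters of SHA-256); `truncated_hash` and `find_collision` are hypothetical helpers, not part of any real attack tool:

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, hex_chars: int = 4) -> str:
    """Toy hash: keep only the first `hex_chars` hex characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 4):
    """Hash successive inputs until two different inputs share a toy hash."""
    seen = {}  # maps each toy hash to the first message that produced it
    for i in count():
        message = f"message-{i}".encode("utf-8")
        digest = truncated_hash(message, hex_chars)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message

x, y, shared = find_collision()
print(x, y, shared)
# The full SHA-256 digests of x and y still differ; only the toy hash collides.
print(hashlib.sha256(x).hexdigest() != hashlib.sha256(y).hexdigest())
```

The same search against an untruncated SHA-256 digest would not finish in any practical amount of time, which is the point of collision resistance.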
How Does Hashing Work?
Now that we know what a hash function is in cryptography, let’s break down how it works.
First of all, the hashing algorithm divides the large input data into blocks of equal size. The algorithm then applies the hashing process to each data block separately.
Although each block is hashed individually, all of the blocks are interrelated. The hash value of the first data block is considered an input value and is added to the second data block. In the same way, the hashed output of the second block is lumped with the third block, and the combined input value is hashed again. And so on and so on, the cycle continues until you get the final hash output, which is the combined value of all the blocks that were involved.
That means if any block’s data is tampered with, its hash value changes. And because its hash value is fed as an input into the blocks that follow, all of the hash values alter. This is how even the smallest change in the input data is detectable as it changes the entire hash value.
In the graphic, the input value of data block-1 is (B1), and the hash value is h(B1). The next block 2’s input value B2 is combined with the previous hash value h(B1) to produce the hash value h(B2). This process of combining one block’s output value with the next block’s input value continues down the line through all of the blocks.
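The chaining idea can be sketched in a few lines of Python. This is a simplified toy, not how SHA-256 or any real algorithm is implemented internally: real designs use a dedicated compression function and pad the final block. The block size, the all-zero starting value, and the function names are assumptions made for the illustration, and SHA-256 merely stands in for the per-block hashing step:

```python
import hashlib

BLOCK_SIZE = 16  # bytes per block; an arbitrary choice for this toy example

def split_into_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the input into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chained_hash(data: bytes) -> str:
    """Hash each block together with the hash of everything before it."""
    state = b"\x00" * 32  # assumed fixed starting value
    for block in split_into_blocks(data):
        # The previous hash value is combined with the next block and rehashed.
        state = hashlib.sha256(state + block).digest()
    return state.hex()

original = b"Block one data. Block two data. Block three data."
tampered = b"Block ONE data. Block two data. Block three data."

print(chained_hash(original))
print(chained_hash(tampered))  # changing the first block alters the final hash
```

Because every block’s hash feeds into the next, the change in the first block carries through to the final output.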
3 Main Features of a Hash Function in Cryptography
In this next section, let’s explore what hashing does and doesn’t do in cryptography.
1) It Enables Users to Identify Whether Data Has Been Tampered With
Different inputs produce different hash values. So, if an attacker tries to modify, alter, or remove any part of the original input data (text data, software, application, email content, or even a media file), its hash value changes. When the recipient’s software recomputes the hash and compares it with the original, the mismatch is flagged. Users will know that a message’s content or a software application is not in the same condition as it was when sent or created by the original sender/developer.
Hence, if a hacker inserts malicious code into a software program, for example, the user gets a warning not to download or install it because it’s been altered. Likewise, if an attacker changes the content of an email to trick recipients into sharing their confidential information, transferring funds, or downloading a malicious attachment, users will know that the message was modified. Therefore, they should not take any actions suggested in the message.
2) A Hash Function Prevents Your Data from Being Reverse Engineered
Once you apply a hash function to data, you’re left with an incomprehensible output. So, even if an intruder manages to get their hands on the data’s hashed values through a leaky database or by stealing it through a cyber attack, they can’t easily interpret or guess the original (input) data.
Because the hash value can’t be reversed easily using modern resources, it’s improbable for hackers to decipher the hash value even if they know which hash function (algorithm) has been used to hash the data. It’s just infeasible due to the amount of resources and time such a process would require at scale. Hence, a cryptographic hash serves as a means of data protection while data is traveling or at rest.
3) You Can’t Retrieve the Data Because It Doesn’t Exist
Because hashing is non-reversible by nature, you can’t retrieve the original data from the hashed value. Now, this is a good thing when your intention is to keep hackers from accessing your plaintext data. But when you’re the one who needs to recover the data for some reason, a hash function in cryptography can be an issue.
For example, with regard to password storage, if you have hashed passwords before storing them, you can’t recover a password when a user forgets it. The only option you or your users have is to reset the password. Likewise, if you send just the hash value of a file, the recipient can use it to check the file’s integrity, but they can’t convert the hash value back into the file’s contents. For that, you need to send the file itself (encrypted, if it’s sensitive) along with its hash value.
Applications of Cryptographic Hash Functions
Everything above about the cryptographic hash function is theoretical. But what’s its practical utility? A hash function in cryptography is used to verify data integrity. Hashing protects data from leakage, makes it practical to compare large chunks of data, and detects data tampering, if any.
Some of the uses of hashing include:
- Digital signatures,
- Password storage,
- SSL/TLS certificates,
- Code signing certificates,
- Document signing certificates, and
- Email signing certificates.
When you have to compare a large piece of data or software, you can’t check every line of code and every word in it. But if you hash it, the large data is reduced to a small, fixed-length hash value, which you can check and compare a lot more easily.
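As a rough sketch of that comparison, the following Python example hashes two files in chunks, so that even very large files never need to fit in memory, and then compares the short digests instead of the contents. The file names, the chunk size, and the `file_sha256` helper are assumptions for the demonstration; the temporary files are created only so the example is self-contained:

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file piece by piece and return its SHA-256 digest in hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "original.bin")
    copy = os.path.join(folder, "copy.bin")
    payload = os.urandom(1_000_000)  # about 1 MB of sample data
    for path in (original, copy):
        with open(path, "wb") as handle:
            handle.write(payload)

    same = file_sha256(original) == file_sha256(copy)

    with open(copy, "ab") as handle:
        handle.write(b"!")  # append a single byte to the copy

    differs = file_sha256(original) != file_sha256(copy)

print(same, differs)
```

Comparing two 64-character strings stands in for comparing a million bytes, and the single appended byte is enough to make the digests disagree.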
How Hashing Works in Code Signing
Let’s take a few moments to understand how code signing certificates utilize the cryptographic hash function. Say, you’re a software publisher or developer who uses code signing certificates to digitally sign your downloadable software, scripts, applications, and executables. This certificate enables you to assure your users, clients, and their operating systems about your identity (i.e., that you’re you) and that your product is legitimate. It also uses a hash function that warns them if it’s been tampered with since you originally signed it.
Once you have a final version of your code ready to go, you can put the code signing certificate to work. The signing process hashes the entire software, and that hash gets encrypted with your private key, which creates the publisher or developer’s digital signature.
So, when a user downloads your software, their OS generates a hash value to see whether it matches the original hash value of your software. If it does, that’s great and means they can proceed safely with that knowledge in mind. But if someone tries to pull a sleight of hand and change your software or your digital signature, the hash value they generate will no longer match your original hash, and the user will be notified about the compromise.
This means that an unmodified hash value vouches for the integrity of your software. So, in this case, the cryptographic hash function ensures that no one can modify your software without someone noticing.
How Hashing Works in Password Storage
If you’re a business or organization that allows your users to store their passwords on your site, then this next part is especially important for you. When a user stores their password on your site (i.e., on your server), there’s a process that takes place that applies a hash function to their plaintext password (hash input). This creates a hash digest that your server stores within its password list or database.
There isn’t a list of your users’ original plaintext passwords anywhere on your server that your employees (or any cybercriminals) could get their hands on. The hashing process takes place within the server, and there’s no “original file” of plaintext data for them to exploit.
This is different from encryption, which involves the use of an encryption key to encrypt data and a decryption key that can decrypt it. Remember, with hashing, the goal is for the data to not be reverted to its original plaintext format (i.e., to only be a one-way function). With encryption, on the other hand, the goal is for the encrypted data to be decryptable with the right key (i.e., a two-way function).
However, that doesn’t necessarily mean that passwords are entirely secure (even when hashed). This is where something called salting comes into play — which we’ll talk about shortly. But first, let’s consider an example of how hashing works.
How Does Hashing Work? A Hypothetical Example
Alice is a vendor whose business supplies stationery to Bob’s office on credit. After a month, she sends Bob an invoice with an inventory list, billing amount, and her bank account details. She applies her digital signature to the document and hashes it before sending it to Bob. However, Todd, a hacker, intercepts the document while it’s in transit and replaces Alice’s bank account details with his.
Upon receiving the document, Bob’s computer calculates its hash value and notices that it’s different from the original hash value. Bob’s computer immediately notifies him that there’s something fishy about the document and that it’s not trustworthy.
Without a hashed document, Bob would have easily trusted the document’s content because he knew Alice and the transaction details in the document were legitimate. But because the hash values didn’t match, Bob became aware of the alteration. Now, he contacts Alice by phone and shares with her the information in the document he received. Alice confirms that her bank account is different than what is written in the document.
This is how a hashing function saves Alice and Bob from financial fraud. Now, imagine this scenario with your own business and how it could help to prevent you and your customers from becoming the victims of this type of cybercrime.
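The comparison step that Bob’s computer performs can be sketched as follows. The invoice text, account numbers, and function names are made up for the illustration. The sketch also assumes Bob already holds a trustworthy copy of the original hash; in the scenario above, that trust would come from Alice’s digital signature, which is not modeled here:

```python
import hashlib
import hmac

def document_digest(document: str) -> str:
    """Return the SHA-256 digest of a document's text in hex."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()

def is_unmodified(received_document: str, original_digest: str) -> bool:
    """Recompute the hash of what arrived and compare it with the original."""
    return hmac.compare_digest(document_digest(received_document), original_digest)

sent_invoice = "Invoice 001 | Stationery supplies | Pay to account 1111-2222"
original_digest = document_digest(sent_invoice)

# Todd swaps in his own account number while the invoice is in transit.
tampered_invoice = "Invoice 001 | Stationery supplies | Pay to account 9999-8888"

print(is_unmodified(sent_invoice, original_digest))      # untouched document
print(is_unmodified(tampered_invoice, original_digest))  # altered document
```

`hmac.compare_digest` is used for the comparison because it takes the same time whether or not the values match, which is a common precaution when comparing digests.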
What Is Salting & Why Do You Use It with Password Hash Functions?
Salting means adding randomly generated characters to the input values before hashing them. It’s a technique that’s used in password hashing. It makes the hashing values unique and more difficult to crack. But why does it matter?
Suppose Bob and Alice have the same password (“Sunshine”) for a social media site. The site is using SHA-2 to store the passwords. Because the input value is the same, their hash values are going to be the same: “8BB0CF6EB9B17D0F7D22B456F121257DC1254E1F01665370476383EA776DF414.”
Now, let’s suppose a hacker manages to discover Bob’s password (input value) using malware, brute force attacks, or by using other advanced hash cracking tools. They can bypass the authentication mechanism of all other accounts that have the same password “Sunshine.” They just need to see the table of hash values and find the user IDs having the same hash value in their password column.
This is where salting comes in handy. Here, some random alphanumeric characters are added to the input values. So, suppose the salt “ABC123” is added to Bob’s password, and “ABC567” is added to Alice’s password. When the system stores the password, it stores the hash value for the inputs “SunshineABC123” and “SunshineABC567”. Now, even if both the original passwords are the same, their hash values are different because of the salts that were added. So, even if the hacker has managed to steal Bob’s password, they can no longer tell from the table of hash values that Alice uses the same one.
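Here is a minimal sketch of salted password storage in Python. Note that it swaps in PBKDF2 (via the standard library’s `hashlib.pbkdf2_hmac`) rather than appending the salt to the password and applying plain SHA-2 as in the example above, because PBKDF2 is also deliberately slow. The iteration count, salt length, and function names are assumptions for the illustration, not recommendations:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this demo; tune for real use

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived hash); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes, unique per user
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the results."""
    _, candidate = hash_password(attempt, salt)
    return hmac.compare_digest(candidate, stored_hash)

bob_salt, bob_hash = hash_password("Sunshine")
alice_salt, alice_hash = hash_password("Sunshine")

print(bob_hash != alice_hash)                             # same password, different hashes
print(verify_password("Sunshine", bob_salt, bob_hash))    # correct password
print(verify_password("sunshine", bob_salt, bob_hash))    # wrong password
```

The salt is stored next to the hash, since the server needs it to re-hash each login attempt; it only has to be unique per user, not secret.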
It’s worth restating the big difference between encryption and hashing here. Encryption is a process that converts plaintext data into an incomprehensible format using a key, and you can use the same or another key to decrypt it. Hashing, on the other hand, uses a hash function to map your input data to a fixed-length output. This is something that you can’t restore because it essentially serves as a one-way process.
Hash Function Weaknesses
Just like other technologies and processes, hash functions in cryptography aren’t perfect, either. There are a few key issues that are worth mentioning.
- In the past, researchers demonstrated that popular algorithms like MD5 and SHA-1 could produce the same hash value for different data. Hence, their collision resistance was compromised.
- Attackers use precomputed lookup tables called “rainbow tables” to try to crack unsalted hash values. This is why salting before hashing is so crucial to secure password storage.
- There are some software services and hardware tools (called “hash cracking rigs”) that attackers, security researchers, or even government agencies use to crack hashed passwords.
- Brute force attacks, which hash one guess after another and compare the results, can crack hashed data when the original input is short or predictable.
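The idea behind precomputed tables can be illustrated with a small sketch. It uses a plain dictionary of precomputed hashes, which is simpler than a true rainbow table (those use chains of hashes to save space), and the word list, salt, and helper name are made up for the example:

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The attacker hashes a list of common passwords ahead of time.
common_passwords = ["123456", "password", "qwerty", "Sunshine"]
lookup_table = {sha256_hex(word): word for word in common_passwords}

# An unsalted hash from a leaked database can simply be looked up.
leaked_unsalted = sha256_hex("Sunshine")
print(lookup_table.get(leaked_unsalted))  # the original password is recovered

# With a salt added before hashing, the precomputed table no longer matches.
leaked_salted = sha256_hex("Sunshine" + "ABC123")
print(lookup_table.get(leaked_salted))    # nothing is found
```

The hash is never reversed here; the attacker only matches it against guesses hashed in advance, which is why a unique salt per user forces that work to be redone for every account.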
Wrapping Up on the Topic of a Hash Function in Cryptography
Hashing is, indeed, a very helpful cryptographic tool for verification (verifying digital signatures, file or data integrity, passwords, etc.) in information technology. Cryptographic hash functions vary in terms of functionalities and applications for specific purposes. And a big part of using hashing involves understanding which hashing algorithms to use (or avoid) in specific contexts.
While not perfect, cryptographic hash functions serve as great checksums and authentication mechanisms. They also serve as a method for securely storing passwords (when a salting technique is applied) in a way that’s just too impractical for cybercriminals to try to invert into something usable.