Hashing

05/05/2025

What is Hashing?

Hashing is a technique that transforms input data of any size into a fixed-size string of characters. This string, called the hash or digest, acts as a compact fingerprint of the original data. Cryptographic hash functions are designed to be one-way: it should be computationally infeasible to reverse a hash and recover the original input. This is a key distinction from encryption, which is two-way (encrypt/decrypt). Hash functions are also deterministic: the same input always produces the same output. Yet even the slightest change in the input, such as adding a space or changing a letter, produces a completely different output, a property known as the avalanche effect.

How Does It Work?

A hash function processes input data (like a string, file, or even an entire disk image) using a mathematical algorithm. This algorithm digests the input into a fixed-length output. For example, the SHA-256 (Secure Hash Algorithm 256-bit) always produces a 256-bit (or 64-character hexadecimal) hash no matter how large or small the input is.

Let's consider the string "hello world". When hashed using SHA-256, it yields:

b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

Now, let's change the input slightly to "Hello World!" (capital 'H' and 'W', plus an exclamation mark). The new SHA-256 hash becomes:

7f83b1657ff1fc53b92dc18148a1d65dfa135014e9e5b4cdbfbd9b5a7aadf30c

As you can see, even though the input changed by only a few characters, the hash value is completely different. This sensitivity makes hashes an effective way to detect any change to data.
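This comparison can be reproduced with a minimal Python sketch using the standard-library hashlib module. It hashes "hello world" and a one-character variant ("hello World") so the two digests can be compared side by side; the first should match the digest shown above for "hello world". The helper name sha256_hex is just for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = sha256_hex("hello world")
variant = sha256_hex("hello World")  # only the 'w' is capitalized

print(original)
print(variant)
print("Same digest?", original == variant)
```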

Properties of a Good Hash Function

  1. Deterministic – The same input always gives the same output.

  2. Fast to compute – It should return the hash of any input quickly (password hashing is the deliberate exception; see bcrypt, scrypt, and Argon2 below).

  3. Irreversible – It should be computationally infeasible to reverse the hash and retrieve the original input.

  4. Collision-resistant – It should be computationally infeasible to find two different inputs that produce the same hash.

  5. Avalanche effect – A small change in input results in a completely different hash.
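Three of these properties are easy to observe directly. The sketch below, again in Python with hashlib, checks determinism and fixed output length for inputs of very different sizes, and measures the avalanche effect by counting how many of the 256 output bits flip when one character of the input changes. Irreversibility and collision resistance cannot be demonstrated this way; they rest on the absence of known efficient attacks.

```python
import hashlib

def sha256_bytes(data: bytes) -> bytes:
    """Raw 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()

# Deterministic: hashing the same input twice gives the same digest
deterministic = sha256_bytes(b"hello world") == sha256_bytes(b"hello world")

# Fixed size: empty input, a short string, and 1 MB all give 32 bytes (256 bits)
inputs = [b"", b"hello world", b"x" * 1_000_000]
sizes = [len(sha256_bytes(data)) for data in inputs]

# Avalanche effect: count output bits that differ after a one-character change
a = int.from_bytes(sha256_bytes(b"hello world"), "big")
b = int.from_bytes(sha256_bytes(b"hello World"), "big")
flipped_bits = bin(a ^ b).count("1")

print("Deterministic:", deterministic)
print("Digest sizes (bytes):", sizes)
print(f"Bits flipped: {flipped_bits} of 256")
```

For a well-behaved hash, roughly half of the output bits are expected to flip, although the exact count varies from input to input.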


Real-Life Example: Password Hashing

When a user creates a password, a well-designed website does not store the actual password. Instead, it passes the password through a hash function and stores the resulting hash in its database. For instance, if the password is "password123", its SHA-256 hash would be:

ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

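When you later log in, the system hashes the password you provide and compares the result to the stored hash. If they match, access is granted. Even if an attacker gains access to the database, they see only the hashes, not the actual passwords. However, weak passwords hashed with a fast, unsalted function can still be recovered by guessing, which is why salting and slow password-hashing functions matter.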
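The store-then-compare flow can be sketched in a few lines of Python. This is a minimal illustration only: the in-memory user_db dictionary and the register/login function names are hypothetical, and plain unsalted SHA-256 is used purely to mirror the example above, not as a recommendation. hmac.compare_digest compares the two digests in constant time, so the check does not leak timing information.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> hex SHA-256 of the password
user_db: dict[str, str] = {}

def register(username: str, password: str) -> None:
    """Store only the hash of the password, never the password itself."""
    user_db[username] = hashlib.sha256(password.encode("utf-8")).hexdigest()

def login(username: str, password: str) -> bool:
    """Hash the supplied password and compare it with the stored hash."""
    stored = user_db.get(username)
    if stored is None:
        return False
    candidate = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex strings
    return hmac.compare_digest(candidate, stored)

register("alice", "password123")
print(user_db["alice"])
print(login("alice", "password123"))
print(login("alice", "password124"))
```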

To improve security further, salting is used. A random string (the salt) is added to the password before hashing, so each stored hash is unique even if two users have the same password. This also defends against precomputed attacks such as rainbow tables.

Other Applications of Hashing

  • Data Integrity Verification: Hashes are used in file verification. For example, when downloading software, you might see a "SHA-256 checksum" on the website. After downloading, you can hash the file and compare your result to the published hash. If they match, the file has not been corrupted or altered, provided the published checksum itself comes from a trusted source.

  • Digital Signatures: Hashing is integral to digital signatures. Before signing a document, it's first hashed, and only the hash is signed (not the full document), ensuring efficiency and integrity.

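  • Blockchain and Cryptocurrencies: In systems like Bitcoin, each block's header is hashed to produce the block's ID. Each header also contains the hash of the previous block, which links the blocks into a chain: altering any block changes its hash and breaks every link after it, making tampering evident.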

  • Hash Tables: In programming, hash functions are used in data structures like hash tables or dictionaries to quickly find a value associated with a given key.
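For the file-verification case, a checksum can be computed without loading the whole file into memory by feeding it to the hash in chunks. A minimal Python sketch: the file_sha256 and matches_published helpers and the 64 KiB chunk size are illustrative assumptions, a temporary file stands in for a real download, and in practice the published checksum would be copied from the software's website.

```python
import hashlib
import hmac
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())

# Demo: a temporary file stands in for the downloaded software
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name

digest = file_sha256(demo_path)
verified = matches_published(demo_path, digest.upper())
print(digest)
print("Matches published checksum:", verified)
os.remove(demo_path)
```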

Hashing is a core concept in computer science and cybersecurity. It enables systems to verify data integrity, securely store passwords, and much more. A good hash function is irreversible, consistent, fast, and resistant to collisions. Though it may seem simple on the surface, hashing underpins many technologies that we rely on every day.

🔐 What Is Salting?

Salting is the process of adding a unique, random string (the "salt") to the input (usually a password) before hashing it. This makes the hash output unique even if two users have the same password.

Without a salt, identical passwords always produce identical hashes — making them vulnerable to attacks like rainbow tables, which are precomputed lists of hashes for common passwords.

🔁 Example Without Salt

Let's say two users have the same password:

password123

Using SHA-256:

ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

Both users get the same hash, so an attacker just has to crack it once.

🧂 Example With Salt

Now add a unique salt to each password before hashing.

  • Salt 1: "a7B3f2"

  • Salt 2: "zX9@L1"

User 1 input:

a7B3f2password123

Hash:

f8dc1f5dbdfd30b9d3f24ad2d7029e70f68b8a5efc34eac285edb9d6d3e1fc6a

User 2 input:

zX9@L1password123

Hash:

3dc0a7edb3f90ec2ae12aa7c1e01847f2c9f218f1864d1cd3c4f3c9263c7c7f1

Now both hashes are different, even though the original passwords were the same.
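The two salted inputs above can be hashed with the same hashlib call. This minimal sketch prepends each salt string to the password exactly as in the example, so it can be used to check the digests listed above, and it confirms that one password with two different salts yields two different hashes. The fixed salt strings are used only for reproducibility; real salts should be random, as in the Python example below.

```python
import hashlib

password = "password123"
salts = ["a7B3f2", "zX9@L1"]  # the two example salts from the text

digests = []
for salt in salts:
    salted_input = salt + password  # salt prepended, e.g. "a7B3f2password123"
    digest = hashlib.sha256(salted_input.encode("utf-8")).hexdigest()
    digests.append(digest)
    print(f"{salted_input}: {digest}")

unsalted = hashlib.sha256(password.encode("utf-8")).hexdigest()
print("Unsalted:", unsalted)
print("Salted digests differ:", digests[0] != digests[1])
```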

🧑‍💻 Code Example in Python (with Salt) 

```python
import hashlib
import os

# Generate a random 16-byte salt (a new one for each user)
salt = os.urandom(16)

# Original password
password = "password123"

# Prepend the salt to the UTF-8 encoded password
salted_password = salt + password.encode("utf-8")

# Hash it using SHA-256 (fast; for real systems prefer bcrypt, scrypt, or Argon2)
hashed = hashlib.sha256(salted_password).hexdigest()

# Store both the salt and the hash in the database; here we just print them
print("Salt (hex):", salt.hex())
print("Hash:", hashed)
```


🔁 Verifying Passwords

When a user logs in:

  1. Retrieve the stored salt and hash from the database.

  2. Combine the same salt with the input password.

  3. Hash it again.

  4. Compare the result to the stored hash.
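These four steps map directly onto code. A minimal Python sketch, continuing the salt-plus-SHA-256 scheme from the example above; the hash_password and verify_password names are hypothetical, and, as the sections on bcrypt, scrypt, and Argon2 explain, a deliberately slow function is preferable to SHA-256 in production.

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, str]:
    """Create a new random salt and return (salt, hex digest of salt + password)."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: str) -> bool:
    """Steps 2-4: combine the stored salt with the input, rehash, and compare."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(candidate, stored_digest)

# Step 1 would normally fetch the salt and hash from the database
salt, stored = hash_password("password123")

print(verify_password("password123", salt, stored))
print(verify_password("wrong-guess", salt, stored))
```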

✅ Benefits of Salting

  • Prevents attackers from using precomputed hash databases (rainbow tables).

  • Ensures that even identical passwords have different hashes.

  • Forces attackers to crack each hash separately, because work done against one salt does not carry over to another (long, unique salts make this most effective).

🔐 Tip: Use Per-User Salts

Always generate a unique salt for each user. Do not use a fixed or global salt: identical passwords would still produce identical hashes, and an attacker could build a single precomputed table for that one salt.

🔄 Types of Hash Functions

1. MD5 (Message Digest Algorithm 5)

  • Output: 128 bits (32 hex characters)

  • Speed: Very fast

  • Security: Broken — vulnerable to collision attacks

  • Use Cases: Legacy checksums, file verification (non-secure)

  • Fun fact: You can find different files with the exact same MD5 hash.

2. SHA-1 (Secure Hash Algorithm 1)

  • Output: 160 bits (40 hex characters)

  • Speed: Fast

  • Security: Weak; Google and CWI Amsterdam demonstrated a practical collision (SHAttered) in 2017

  • Use Cases: Older digital certificates, version control (e.g., Git uses SHA-1 internally but plans to move away)

3. SHA-256 (part of SHA-2 family)

  • Output: 256 bits (64 hex characters)

  • Speed: Medium

  • Security: Strong for now

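  • Use Cases: File integrity checks, blockchain (Bitcoin), TLS certificates, digital signatures. For passwords it is too fast on its own, even with a salt; prefer bcrypt, scrypt, or Argon2.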

4. SHA-3 (Keccak)

  • Output: Flexible (SHA3-256, SHA3-512, etc.)

  • Speed: Slower than SHA-2 but more flexible

  • Security: Very strong; uses a newer sponge construction rather than the Merkle–Damgård design of SHA-1 and SHA-2

  • Use Cases: Cryptography, future-proof applications

5. BLAKE2 / BLAKE3

  • Output: Customizable (e.g., 256 bits)

  • Speed: Extremely fast (faster than SHA-2/SHA-3)

  • Security: Very strong

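  • Use Cases: Modern applications, checksums, digital signatures, and as a building block inside password hashes (Argon2 uses BLAKE2b internally); too fast to store passwords on its own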

  • Fun fact: BLAKE2 is faster and more secure than MD5 and SHA-1.

6. bcrypt

  • Output: 60-character string encoding the cost factor, a 128-bit salt, and a 184-bit hash in a Base64-like format

  • Speed: Intentionally slow

  • Security: Strong — includes built-in salting and key stretching

  • Use Cases: Password hashing

  • Bonus: Slowness is a feature — it thwarts brute-force attacks.

7. scrypt

  • Output: Configurable

  • Speed: Very slow, also memory-intensive

  • Security: Very strong — resists ASIC/GPU attacks

  • Use Cases: Password hashing and key derivation; also used as the proof-of-work function in cryptocurrencies like Litecoin

8. Argon2 (winner of the Password Hashing Competition, 2015)

  • Output: Configurable

  • Speed: Tunable (time, memory, and parallelism)

  • Security: State-of-the-art

  • Use Cases: Password hashing, key derivation

  • Variants:

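    • Argon2i: data-independent memory access, which resists side-channel attacks; suited to password hashing

    • Argon2d: data-dependent memory access, which maximizes resistance to GPU cracking but is more exposed to side-channel attacks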

    • Argon2id: hybrid of both (most recommended)
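Python's standard-library hashlib covers several of the functions above, which makes the listed output sizes easy to check and gives a taste of a slow, memory-hard password hash. A minimal sketch: hashlib.scrypt requires a Python build linked against OpenSSL 1.1 or newer, and the cost parameters (n=2**14, r=8, p=1) are illustrative choices, not a tuned recommendation. bcrypt and Argon2 are not in the standard library and need third-party packages, so they are not shown.

```python
import hashlib
import hmac
import os

# Output sizes of the fast hashes (digest_size is in bytes)
for name in ["md5", "sha1", "sha256", "sha3_256", "blake2b", "blake2s"]:
    h = hashlib.new(name)
    print(f"{name:>8}: {h.digest_size * 8} bits")

# scrypt: deliberately slow and memory-hard password hashing
def scrypt_hash(password: str, salt: bytes) -> bytes:
    # n = CPU/memory cost, r = block size, p = parallelism (illustrative values)
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

salt = os.urandom(16)
stored = scrypt_hash("password123", salt)
print("scrypt hash:", stored.hex())
print("Verified:", hmac.compare_digest(scrypt_hash("password123", salt), stored))
```

Note that blake2b defaults to 512 bits but accepts a digest_size argument, e.g. hashlib.blake2b(digest_size=32) for a 256-bit output.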

🚀 Creative Use Case Ideas

  • File de-duplication system? Use BLAKE3 for lightning-fast content comparison.

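  • Password manager backend? Use Argon2id with a high memory cost to derive the vault key from the master password. (The stored passwords themselves must be encrypted, not hashed, because they have to be recovered.)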

  • Lightweight device like IoT sensors? Consider BLAKE2 for speed/security balance.

  • Blockchain design experiment? SHA-256 is standard, but SHA-3 or BLAKE3 are modern alternatives.
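The de-duplication idea can be sketched with content hashes: files whose digests match are treated as duplicates. BLAKE3 is not in Python's standard library, so this minimal sketch swaps in the built-in hashlib.blake2b; the find_duplicates helper and the in-memory sample contents are hypothetical.

```python
import hashlib
from collections import defaultdict

def content_digest(data: bytes) -> str:
    """BLAKE2b digest with a 256-bit output (digest_size is in bytes)."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()

def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names by content digest; return groups with more than one file."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[content_digest(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical file contents held in memory for the demo
files = {
    "report.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}
print(find_duplicates(files))
```

In a real system the data would be read from disk in chunks; a cryptographic hash makes accidental collisions negligible, but a byte-by-byte comparison can still confirm a match before anything is deleted.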

Multiple Choice Questions (MCQs) on Hashing

🔐 Basic Hashing Concepts

1. What is the main purpose of a hash function?

A) To encrypt data for secure transmission
B) To generate random numbers
C) To map data of arbitrary size to a fixed size
D) To compress data for storage

Correct Answer: C
Explanation: A hash function converts data of arbitrary length into a fixed-length output.

2. Which of the following is not a desirable property of a cryptographic hash function?

A) Determinism
B) Collision resistance
C) Preimage resistance
D) Reversibility

Correct Answer: D
Explanation: Hash functions should be irreversible — you shouldn't be able to retrieve the original input from the hash.

3. What does "collision resistance" mean in hashing?

A) Two different inputs always produce the same output
B) It's hard to find two inputs that produce the same hash
C) Hashes are encrypted before storage
D) The hash function requires a secret key

Correct Answer: B
Explanation: Collision resistance means it's computationally difficult to find two different inputs with the same hash.

4. Which of the following is a non-cryptographic use of a hash function?

A) Password verification
B) Digital signatures
C) Hash tables for quick lookup
D) Blockchain integrity

Correct Answer: C
Explanation: Hash functions are used in data structures like hash tables for efficient lookups — not necessarily requiring cryptographic strength.

5. Which hash function is considered broken and should not be used for cryptographic purposes?

A) SHA-256
B) SHA-3
C) MD5
D) BLAKE3

Correct Answer: C
Explanation: MD5 is vulnerable to collision attacks and should be avoided in secure applications.

🔒 Hashing and Security

6. What is the role of a salt in password hashing?

A) To increase hash length
B) To prevent precomputed attacks like rainbow tables
C) To make the hash reversible
D) To improve encryption strength

Correct Answer: B
Explanation: A salt makes each hash unique, preventing attackers from using rainbow tables to reverse hashes.

7. Which of the following hashing algorithms is best suited for securely storing passwords?

A) SHA-1
B) bcrypt
C) MD5
D) CRC32

Correct Answer: B
Explanation: bcrypt is a slow, salted hash designed specifically for password security.

8. In which of the following scenarios is a fast hash function like SHA-256 not recommended?

A) File integrity verification
B) Password storage
C) Digital signatures
D) Blockchain mining

Correct Answer: B
Explanation: Fast hash functions make password cracking easier. Use slow hashes like bcrypt or Argon2 for password storage.

9. Which of the following hash algorithms was designed specifically to be memory-hard and slow to defend against brute-force attacks?

A) SHA-256
B) MD5
C) scrypt
D) HMAC

Correct Answer: C
Explanation: scrypt is memory-intensive and slows down brute-force and hardware-accelerated attacks.

10. What is a "preimage attack" in the context of hashing?

A) Finding the original input given its hash
B) Finding two inputs that hash to the same value
C) Modifying a hash without changing the input
D) Encrypting a message using a public key

Correct Answer: A
Explanation: A preimage attack attempts to find the original input from a known hash — something secure hash functions are designed to prevent.