Birthday attack

A birthday attack is a brute-force collision attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likelihood of collisions found between random attack attempts and a fixed degree of permutations (pigeonholes). Let ${\textstyle H}$ be the number of possible values of a hash function, with ${\textstyle H=2^{l}}$. With a birthday attack, it is possible to find a collision of a hash function with ${\textstyle 50\%}$ chance in about ${\textstyle {\sqrt {2^{l}}}=2^{l/2}}$ evaluations, where ${\textstyle l}$ is the bit length of the hash output,^[1]^[2] whereas the classical preimage resistance security with the same probability is ${\textstyle 2^{l-1}}$.^[2] There is a general (though disputed^[3]) result that quantum computers can perform birthday attacks, thus breaking collision resistance, in about ${\textstyle {\sqrt[{3}]{2^{l}}}=2^{l/3}}$ operations.^[4]

Although there are some digital signature vulnerabilities associated with the birthday attack, it cannot be used to break an encryption scheme any faster than a brute-force attack.^[5] (p. 36)

Understanding the problem

As an example, consider the scenario in which a teacher with a class of 30 students (n = 30) asks for everybody's birthday (for simplicity, ignore leap years) to determine whether any two students have the same birthday (corresponding to a hash collision as described further). Intuitively, this chance may seem small. Counter-intuitively, the probability that at least two students share a birthday, on whatever day, is around 70% (for n = 30), from the formula $1-{\frac {365!}{(365-n)!\cdot 365^{n}}}$.^[6]

If the teacher had picked a specific day (say, 16 September), then the chance that at least one student was born on that specific day is $1-(364/365)^{30}$ , about 7.9%.
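The two classroom figures above can be checked numerically. The following is a minimal sketch; the function names are illustrative and not taken from any source.

```python
from math import prod

def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    # Probability that all n birthdays are distinct:
    # (365/365) * (364/365) * ... * ((365 - n + 1)/365)
    all_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - all_distinct

def specific_day_probability(n: int, days: int = 365) -> float:
    """Probability that at least one of n people was born on one fixed day."""
    return 1.0 - ((days - 1) / days) ** n

print(f"{shared_birthday_probability(30):.3f}")  # about 0.706
print(f"{specific_day_probability(30):.3f}")     # about 0.079
```

The gap between the two results is the core of the attack: matching any pair is far easier than matching one fixed target.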

In a birthday attack, the attacker prepares many different variants of benign and malicious contracts, each having a SHA-256 hash. In the accompanying figure, the pair found is indicated in green; note that finding a pair of benign contracts (blue) or a pair of malicious contracts (red) is useless. After the victim accepts the benign contract, the attacker substitutes it with the malicious one and claims the victim signed it, as proven by the digital signature.

Relation to the balls into bins problem

The birthday attack can be modelled as a variation of the balls into bins problem, where balls (hash function inputs) are randomly placed into bins (hash function outputs). A hash collision occurs when at least two balls are placed into the same bin.

Mathematics

Given a function $f$ , the goal of the attack is to find two different inputs $x_{1},x_{2}$ such that $f(x_{1})=f(x_{2})$ . Such a pair $x_{1},x_{2}$ is called a collision. The method used to find a collision is simply to evaluate the function $f$ for different input values that may be chosen randomly or pseudorandomly until the same result is found more than once. Because of the birthday problem, this method can be rather efficient. Specifically, if a function $f(x)$ yields any of $H$ different outputs with equal probability and $H$ is sufficiently large, then we expect to obtain a pair of different arguments $x_{1}$ and $x_{2}$ with $f(x_{1})=f(x_{2})$ after evaluating the function for about $1.25{\sqrt {H}}$ different arguments on average.
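The search procedure described above can be sketched as follows. This is an illustration under stated assumptions: SHA-256 truncated to a small number of bits stands in for a function with $H=2^{24}$ outputs, the inputs are simply the decimal strings of a counter, and the helper names are hypothetical.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256, standing in for a hash with H = 2**bits outputs."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 24):
    """Evaluate the hash on distinct inputs until some output appears twice."""
    seen = {}  # hash value -> input that produced it
    for i in count():
        x = str(i).encode()
        h = truncated_hash(x, bits)
        if h in seen:
            # Colliding pair plus the number of evaluations performed
            return seen[h], x, i + 1
        seen[h] = x

x1, x2, evaluations = find_collision(24)
print(x1, x2, evaluations)
```

For 24 bits, $1.25{\sqrt {H}}$ is roughly 5,000, so the loop typically stops after a few thousand evaluations rather than millions. The dictionary stores every value seen, so memory grows with the number of evaluations.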

We consider the following experiment. From a set of H values we choose n values uniformly at random thereby allowing repetitions. Let p(n; H) be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as

$p(n;H)\approx 1-e^{-n(n-1)/(2H)}\approx 1-e^{-n^{2}/(2H)}$,^[7]

where $n$ is the number of chosen values (inputs) and $H$ is the number of possible outcomes (possible hash outputs).
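To get a feel for how close the approximations are, the sketch below compares them with the exact product formula. The function names are illustrative assumptions.

```python
from math import exp, prod

def collision_probability_exact(n: int, H: int) -> float:
    """Exact p(n; H): one minus the probability that all n draws are distinct."""
    return 1.0 - prod((H - k) / H for k in range(n))

def collision_probability_approx(n: int, H: int) -> float:
    """First approximation: 1 - exp(-n(n-1)/(2H))."""
    return 1.0 - exp(-n * (n - 1) / (2 * H))

def collision_probability_coarse(n: int, H: int) -> float:
    """Second approximation: 1 - exp(-n^2/(2H))."""
    return 1.0 - exp(-n * n / (2 * H))

for n, H in [(30, 365), (300, 2**16)]:
    print(n, H,
          round(collision_probability_exact(n, H), 4),
          round(collision_probability_approx(n, H), 4),
          round(collision_probability_coarse(n, H), 4))
```

The agreement improves as $H$ grows relative to $n$, which is the regime relevant to hash functions.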

Let $n(p;H)$ be the smallest number of values we have to choose, such that the probability of finding a collision is at least $p$. By inverting the expression above, we find the following approximation

$n(p;H)\approx {\sqrt {2H\ln {\frac {1}{1-p}}}}$

and assigning a 0.5 probability of collision we arrive at

$n(0.5;H)\approx 1.1774{\sqrt {H}}$
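A minimal sketch of the inverted formula follows. The name `hashes_needed` is an assumption made for illustration; the logarithm is written as `-log1p(-p)`, which equals $\ln {\frac {1}{1-p}}$ and stays accurate for small $p$.

```python
from math import log1p, sqrt

def hashes_needed(p: float, H: int) -> float:
    """Approximate n(p; H) = sqrt(2 H ln(1/(1-p)))."""
    # -log1p(-p) is ln(1/(1-p)), evaluated without cancellation for tiny p
    return sqrt(2 * H * -log1p(-p))

H = 2**64
print(hashes_needed(0.5, H) / sqrt(H))  # about 1.1774
print(hashes_needed(0.5, 2**16))        # about 301
```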

Let $Q(H)$ be the expected number of values we have to choose before finding the first collision. This number can be approximated by

$Q(H)\approx {\sqrt {{\frac {\pi }{2}}H}}$
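The approximation for $Q(H)$ can be compared against a small Monte Carlo experiment. This is a sketch under assumptions: the draw that completes the collision is counted, the seed and trial count are arbitrary, and the function names are illustrative.

```python
import random
from math import pi, sqrt

def draws_until_collision(H: int, rng: random.Random) -> int:
    """Draw uniformly from H values until one repeats; return the number of draws."""
    seen = set()
    draws = 0
    while True:
        value = rng.randrange(H)
        draws += 1
        if value in seen:
            return draws
        seen.add(value)

def estimate_Q(H: int, trials: int = 2000, seed: int = 0) -> float:
    """Average number of draws to the first collision over many trials."""
    rng = random.Random(seed)
    return sum(draws_until_collision(H, rng) for _ in range(trials)) / trials

H = 10_000
print(estimate_Q(H), sqrt(pi / 2 * H))  # both should be near 125
print(sqrt(pi / 2 * 2**64))             # about 5.38e9 for a 64-bit hash
```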

As an example, if a 64-bit hash is used, there are approximately 1.8×10¹⁹ different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts (5.38×10⁹) to generate a collision using brute force.^[8] This value is called the birthday bound^[9] and it can be approximated as $2^{l/2}$, where $l$ is the bit length of the hash output, so that $H=2^{l}$.^[10] Other examples are as follows:

Desired probability of random collision p (entries to 2 s.f.)
Bits	Possible outputs (H)	10⁻¹⁸	10⁻¹⁵	10⁻¹²	10⁻⁹	10⁻⁶	0.1%	1%	25%	50%	75%
16	2¹⁶ (~6.5×10⁴)	<2	<2	<2	<2	<2	11	36	190	300	430
32	2³² (~4.3×10⁹)	<2	<2	<2	3	93	2900	9300	50,000	77,000	110,000
64	2⁶⁴ (~1.8×10¹⁹)	6	190	6100	190,000	6,100,000	1.9×10⁸	6.1×10⁸	3.3×10⁹	5.1×10⁹	7.2×10⁹
96	2⁹⁶ (~7.9×10²⁸)	4.0×10⁵	1.3×10⁷	4.0×10⁸	1.3×10¹⁰	4.0×10¹¹	1.3×10¹³	4.0×10¹³	2.1×10¹⁴	3.3×10¹⁴	4.7×10¹⁴
128	2¹²⁸ (~3.4×10³⁸)	2.6×10¹⁰	8.2×10¹¹	2.6×10¹³	8.2×10¹⁴	2.6×10¹⁶	8.3×10¹⁷	2.6×10¹⁸	1.4×10¹⁹	2.2×10¹⁹	3.1×10¹⁹
192	2¹⁹² (~6.3×10⁵⁷)	1.1×10²⁰	3.5×10²¹	1.1×10²³	3.5×10²⁴	1.1×10²⁶	3.5×10²⁷	1.1×10²⁸	6.0×10²⁸	9.3×10²⁸	1.3×10²⁹
256	2²⁵⁶ (~1.2×10⁷⁷)	4.8×10²⁹	1.5×10³¹	4.8×10³²	1.5×10³⁴	4.8×10³⁵	1.5×10³⁷	4.8×10³⁷	2.6×10³⁸	4.0×10³⁸	5.7×10³⁸
384	2³⁸⁴ (~3.9×10¹¹⁵)	8.9×10⁴⁸	2.8×10⁵⁰	8.9×10⁵¹	2.8×10⁵³	8.9×10⁵⁴	2.8×10⁵⁶	8.9×10⁵⁶	4.8×10⁵⁷	7.4×10⁵⁷	1.0×10⁵⁸
512	2⁵¹² (~1.3×10¹⁵⁴)	1.6×10⁶⁸	5.2×10⁶⁹	1.6×10⁷¹	5.2×10⁷²	1.6×10⁷⁴	5.2×10⁷⁵	1.6×10⁷⁶	8.8×10⁷⁶	1.4×10⁷⁷	1.9×10⁷⁷

The table shows the number of hashes n(p) needed to achieve the given probability of success, assuming all hashes are equally likely. For comparison, 10⁻¹⁸ to 10⁻¹⁵ is the uncorrectable bit error rate of a typical hard disk.^[11] In theory, MD5 hashes or UUIDs, being roughly 128 bits, should stay within that range until about 820 billion documents, even though their possible outputs are many more.

It is easy to see that if the outputs of the function are distributed unevenly, then a collision could be found even faster. The notion of 'balance' of a hash function quantifies the resistance of the function to birthday attacks (exploiting uneven key distribution). However, determining the balance of a hash function will typically require all possible inputs to be calculated and thus is infeasible for popular hash functions such as the MD and SHA families.^[12]

The subexpression $\ln {\frac {1}{1-p}}$ in the equation for $n(p;H)$ is not computed accurately for small $p$ when directly translated into common programming languages as log(1/(1-p)) due to loss of significance. When log1p is available (as it is in C99), for example, the equivalent expression -log1p(-p) should be used instead.^[13] If this is not done, the first column of the above table is computed as zero, and several items in the second column do not have even one correct significant digit.
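The loss of significance can be reproduced in a few lines. The sketch below uses Python's `math.log1p` in place of the C99 function; both function names are illustrative assumptions, and double-precision floats are assumed.

```python
from math import log, log1p, sqrt

def hashes_needed_naive(p: float, H: int) -> float:
    """Direct translation: log(1/(1-p)) loses precision when p is tiny."""
    return sqrt(2 * H * log(1 / (1 - p)))

def hashes_needed_stable(p: float, H: int) -> float:
    """Same quantity written with -log1p(-p), accurate for tiny p."""
    return sqrt(2 * H * -log1p(-p))

H = 2**128
for p in (1e-18, 1e-15):
    print(p, hashes_needed_naive(p, H), hashes_needed_stable(p, H))
```

With p = 10⁻¹⁸ the naive version returns zero, because 1 - p rounds to exactly 1.0 in double precision, while the stable version gives about 2.6×10¹⁰, matching the 128-bit row of the table.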

Simple approximation

A good rule of thumb which can be used for mental calculation is the relation

$p(n)\approx {n^{2} \over 2H}$

which can also be written as

$H\approx {n^{2} \over 2p(n)}$

or

$n\approx {\sqrt {2H\times p(n)}}$.

This works well for probabilities less than or equal to 0.5.

This approximation scheme is especially easy to use when working with exponents. For instance, suppose we are building 32-bit hashes ($H=2^{32}$) and want the chance of a collision to be at most one in a million ($p\approx 2^{-20}$). How many documents could we have at the most?

$n\approx {\sqrt {2\times 2^{32}\times 2^{-20}}}={\sqrt {2^{1+32-20}}}={\sqrt {2^{13}}}=2^{6.5}\approx 90.5$

which is close to the correct answer of 93. Most of the difference comes from rounding one in a million to $2^{-20}$.
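The mental calculation can be cross-checked in code. This is a small sketch with illustrative function names, comparing the rule of thumb at $p=2^{-20}$ with the logarithmic formula at exactly one in a million.

```python
from math import log1p, sqrt

def rule_of_thumb(p: float, H: int) -> float:
    """n is roughly sqrt(2 H p), the simple approximation."""
    return sqrt(2 * H * p)

def more_accurate(p: float, H: int) -> float:
    """n is roughly sqrt(2 H ln(1/(1-p))), the approximation behind the table."""
    return sqrt(2 * H * -log1p(-p))

H = 2**32
print(rule_of_thumb(2**-20, H))  # 2**6.5, about 90.5
print(more_accurate(1e-6, H))    # about 92.7, which rounds to 93
```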

Digital signature susceptibility

Digital signatures can be susceptible to a birthday attack or, more precisely, a chosen-prefix collision attack. A message $m$ is typically signed by first computing $f(m)$, where $f$ is a cryptographic hash function, and then using some secret key to sign $f(m)$. Suppose Mallory wants to trick Bob into signing a fraudulent contract. Mallory prepares a fair contract $m$ and a fraudulent one $m'$. She then finds a number of positions where $m$ can be changed without changing the meaning, such as inserting commas, empty lines, one versus two spaces after a sentence, replacing synonyms, etc. By combining these changes, she can create a huge number of variations on $m$ which are all fair contracts.

In a similar manner, Mallory also creates a huge number of variations on the fraudulent contract $m'$ . She then applies the hash function to all these variations until she finds a version of the fair contract and a version of the fraudulent contract which have the same hash value, $f(m)=f(m')$ . She presents the fair version to Bob for signing. After Bob has signed, Mallory takes the signature and attaches it to the fraudulent contract. This signature then "proves" that Bob signed the fraudulent contract.
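Mallory's search can be sketched on a toy scale. Everything here is an assumption made for illustration: SHA-256 is truncated to 16 bits so that a match is found quickly, the only "meaning-preserving change" is one versus two spaces between words, and the contract texts and function names are hypothetical.

```python
import hashlib
from itertools import product

def short_hash(text: str, bits: int = 16) -> int:
    """First `bits` bits of SHA-256; a deliberately weak stand-in for f."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def variants(words):
    """Yield every version of a sentence with one or two spaces in each gap."""
    for gaps in product((" ", "  "), repeat=len(words) - 1):
        yield "".join(w + g for w, g in zip(words, gaps + ("",)))

def find_cross_collision(fair_words, fraud_words, bits: int = 16):
    """Return (fair, fraudulent) texts with equal short hashes, or None."""
    fair_by_hash = {}
    for text in variants(fair_words):
        fair_by_hash.setdefault(short_hash(text, bits), text)
    # Only fair-versus-fraudulent matches are useful to the attacker
    for text in variants(fraud_words):
        match = fair_by_hash.get(short_hash(text, bits))
        if match is not None:
            return match, text
    return None

fair = "Bob agrees to pay Mallory one hundred dollars for the used bicycle today".split()
fraud = "Bob agrees to pay Mallory nine thousand dollars for the used bicycle today".split()
print(find_cross_collision(fair, fraud))
```

Each 13-word sentence has 12 gaps and therefore 4,096 variants, far more than needed against a 16-bit hash. Against a full-length hash the same approach would require an infeasible number of variants.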

The probabilities differ slightly from the original birthday problem, as Mallory gains nothing by finding two fair or two fraudulent contracts with the same hash. Mallory's strategy is to generate pairs of one fair and one fraudulent contract. For a given hash function, $2^{l}$ is the number of possible hashes, where $l$ is the bit length of the hash output. The birthday problem equations therefore do not apply exactly. For a 50% chance of a collision, Mallory would need to generate approximately $2^{(l/2)+1}$ hashes, which is twice the number required for a simple collision under the classical birthday problem.

To avoid this attack, the output length of the hash function used for a signature scheme can be chosen large enough so that the birthday attack becomes computationally infeasible, i.e. about twice as many bits as are needed to prevent an ordinary brute-force attack.

Besides using a larger bit length, the signer (Bob) can protect himself by making some random, inoffensive changes to the document before signing it, and by keeping a copy of the contract he signed in his own possession, so that he can at least demonstrate in court that his signature matches that contract, not just the fraudulent one.

Pollard's rho algorithm for logarithms is an example of an algorithm that uses a birthday attack to compute discrete logarithms.

Reverse attack

The same fraud is possible if the signer is Mallory, not Bob. Bob could suggest a contract to Mallory for a signature. Mallory could find an inoffensively modified version of this fair contract that has the same signature as a fraudulent contract, and then provide the modified fair contract and signature to Bob. Later, Mallory could produce the fraudulent copy. If Bob doesn't have the inoffensively modified version of the contract (perhaps only finding his original proposal), Mallory's fraud is perfect. If Bob does have it, Mallory can at least claim that it is Bob who is the fraudster.

Notes

[1] "Avoiding collisions, Cryptographic hash functions" (PDF). Foundations of Cryptography, Computer Science Department, Wellesley College.

[2] doi:10.6028/nist.sp.800-107r1.

[3] Daniel J. Bernstein. "Cost analysis of hash collisions : Will quantum computers make SHARCS obsolete?" (PDF). Cr.yp.to. Retrieved 29 October 2017.

[4] S2CID 118940551.

[5] doi:10.17487/RFC4949. RFC 4949. Informational.

[6] "Birthday Problem". Brilliant.org. Retrieved 28 July 2023.

[7] Bellare, Mihir; Rogaway, Phillip (2005). "The Birthday Problem". Introduction to Modern Cryptography (PDF). pp. 273–274. Retrieved 2023-03-31.

[8] ISBN 978-3-540-46885-1.

[9] See upper and lower bounds.

[10] Jacques Patarin, Audrey Montreuil (2005). "Benes and Butterfly schemes revisited" (PostScript, PDF). Université de Versailles. Retrieved 2007-03-15.

[11] arXiv:cs/0701166.

[12] "CiteSeerX". Archived from the original on 2008-02-23. Retrieved 2006-05-02.

[13] "Compute log(1+x) accurately for small values of x". Mathworks.com. Retrieved 29 October 2017.

References

EUROCRYPT 2004: pp. 401–418

Applied Cryptography, 2nd ed. by Bruce Schneier

External links

"What is a digital signature and what is authentication?" from RSA Security's crypto FAQ.

"Birthday Attack" X5 Networks Crypto FAQs

