Version 1.5

Version 0.99.1 - please mail me with comments, corrections etc.

The datablock of 64 bits is split up in 4 16-bit parts, labeled X1, X2, X3 and X4. These are fed into IDEA. Add stand for addition modulo 2^16, Mul for Multiplication modulo 2^16+1 (and the all-zero subblock is treated as representing 2^16), and Xor for a bitwise Exclusive Or. From the 128 bit key 52 16-bit subkeys are generated, which are used in the rounds. The subkeys are labeled Zx_y, where y is the round where the key is used, and x the number of the subkey within the round. A detailed explanation of how the subkeys are derived can be found below the schematic diagram.

One round of IDEA looks like this :

X1 X2 X3 X4 | | | | V V V V Z1_1->Mul Z2_1->Add Z3_1->Add Z4_1->Mul | | | | | | | | +-------------|------>Xor<------------+ | | | | | | | | | | | | +--------|--->Xor<------|------------+ | | | | | | | | | | | | | | V V | | | | Z5_1->Mul-->Add | | | | | | | | | | | | | | | | V V | | | | Add<--Mul<-Z6_1 | | | | | | | | V | | | V | Xor<-----------|--------|-----+------>Xor | | V | | V | Xor<------+--------------|---------->Xor | | | | | +-----------\/----------+ | | ____________/\___________ | | | | | V V V V

And seven more of these rounds. and after those rounds the output transformation :

| | | | | +-----------\/----------+ | | ____________/\___________ | | | | | V V V V Z1_9->Mul Z2_9->Add Z3_9->Add Z4_9->Mul | | | | Y1 Y2 Y3 Y4

The 128 bit key is partitioned into 8 subblocks, that are directly used for the first 8 subkeys. (Keys Z1_1...Z1_6 Z2_1 Z2_2 ). The master key is then cyclicly rotated 25 positions to the left, and again partitioned into 8 blocks, which are used for the next 8 subkeys. This procedure is repeated until all 52 subkeys are generated.

Decryption with IDEA works exactly the same, only the subkeys are different. The subkeys for decryption are derived as follows from the encryption subkeys :

round 1 Z1_9^-1 -Z2_9 -Z3_9 Z4_9^-1 Z5_8 Z6_8 round 2 Z1_8^-1 -Z3_8 -Z2_8 Z4_8^-1 Z5_7 Z6_7 round 3 Z1_7^-1 -Z3_7 -Z2_7 Z4_7^-1 Z5_6 Z6_6 ... round 8 Z1_2^-1 -Z3_2 -Z2_2 Z4_2^-1 Z5_1 Z6_1 output Z1_1^-1 -Z2_1 -Z3_1 Z4_1^-1

The Zn_m denote the same subkeys used in the encryption. The ^-1 means the multiplicative inverse (modulo 2^16+1), and the - is simply the negative of a subkey.

There are 60x60x24x365 = 3.15x10^7 seconds in a year.

With 10^38 keys, it would take this hypothetical computer 3x10^38/(10^9 x 3x10^7) = 10^22 years to find one key.

If every person on earth had one such computer, it would 'only' take 10^22 / 5x10^9 = 2x10^12 years. This is hundreds of times longer than the age of the earth (4.5x10^9 years). So we can conclude that IDEA is quite safe from a brute-force attack.

- Bruce Schneier,
*Applied Cryptography 2nd edition*John Wiley & Sons - Chapter three of Xuejia Lai's Ph.D. thesis, describing IDEA. Available as a postscript file on the Internet.
- Weak keys for IDEA, paper by Joan Daemen, René Govaerts and Joos Vandewalle. Available as a postscript file on the Internet.

With public-key cryptography the whole situation is different. Here there are two keys, one public key and one private or secret key. The public key can be used to encrypt messages, but once the messages are encrypted with the public key, they can only be decrypted with the secret key.

Thus the public key can be transmitted in any manner, since it doesn't have to be secret. The only thing that must be kept secret is the private key, and that is easier since this key never has to be transmitted to anyone.

Public-key cryptography was invented in 1976 by Whitfield Diffie and Martin Hellman , and independently by Ralph Merkle. The basis of any public-key algorithm is a mathematical function that can easily be calculated, but that is very hard to reverse without some extra knowledge. The RSA public-key algorithm was invented in 1978 by Rivest, Shamir and Adleman, and thus derives its name from their initials. RSA relies on the difficulty of factoring a large number into into its prime components. Remember, a prime is a number that can only divided by 1 and by itself. If a number is not prime, but composite, it can be uniquely factored. For example, 55=5x11. However, when the number to be factored is not 55, but a few hundred digits long, this is a very hard problem.

`C = M^e mod n`

.

C is now the encrypted message.

If we wish to decode the message, we calculate :

`M = C^d mod n`

.

**Example:**

p=5, q =11, n=55, (p-1)(q-1) = 40. e must be between 3 and 40, and no common
factors with 40.
We choose e = 7.
`de = 1 mod (p-1)(q-1)`

.
d = 23, because 23*7 = 161 = 1 mod 40

**Encryption:**
`M = 2 => C = 2^7 mod 55 = 18`

**Decryption:** ```
M = C^d = 18 ^ 23 mod 55
= 18 ^1 * 18^2 * 18^4* 18^16 mod 55
```

doing the modulo-operation on each of the numbers

` = 18 * 49 * 36 * 26 mod 55 = 2`

Our public key consists of the numbers (e, n) and our secret key is (d, n).

The trick is that given p and q, d can easily be calculated, but without knowing p and q, d cannot be found.

The various operations that have to be done (modular exponentiation, modular inverses etc) can be done quite efficiently.

The Prime Number Theorem states that there are on the order of N/lnN primes below the number N. With N 2^256, there are 6.5*10^74 primes below 2^256 (1.1*10^77). If we wish to know the number of primes between 2^256 and 2^255, we have to subtract the number of primes below 2^255 . There are about 3.2*10^74 primes below 2^255. And thus, there are about 3.3*10^74 primes of exactly 256 bits. As the number of particles in the Universe if estimated at around 10^80, it is obvious that running out of primes, or listing all the primes is totally out of the question even for 512 bit keys.

Finding such primes is not a very hard problem. This is a good
thing, because if it was using PGP would be impossible ! To find such a
prime, we just generate a random number of appropriate size, and test if
it is a prime. If it isn't we try again until we succeed. There are
several very fast algorithms that can test if a number is prime very
quickly. The fastest ones are probabilistic. That is, they will never
declare a prime number composite, but they will with a certain chance
declare a composite number prime. But if we test the same number again
and again against different bases, the chance that it is composite and
yet not found by any of these test gets very small. There *do* exist
exact primality tests (that prove for certain if a number is prime), but
they take a lot more work. PGP first does a trial division with all primes
up to 8191, and then uses the Fermat-test with bases 2, 3, 5, 7.

It has not been *proven* that breaking RSA is as hard as factoring n.
Therefore, it just might be that there is some method for breaking RSA
that does not require the factorization of a large number. And
second, factoring itself has not been *proven* to be a truly hard
problem. I use "hard problem" here in the mathematical sense. If time
required to solve a problem grows as a linear function of the input, or
as a polynomial function, a problem is considered tractable. For
example, counting n objects takes in principle just n operations. But
mean the workload rises as an exponential function of the number of
inputs (like 2^n, or 10^n or worse) than the problem is considered
intractable, because exponential functions grow so fast.

In the case of PGP we are dealing with general numbers. It is believed that with a large effort, 512 bit keys can be factored now. The "large effort" should be seen in the order of several thousand workstations for many months .

One of the most frequently asked questions is "how long should my key be", or "how long would it take to break a <n> bit key ". Unfortunately, it is very hard to make even moderately accurate predictions about the time needed to factor a number that is much larger than those that have been factored up till now. And for predicting in the far future, we also have to take advances in computer power into account.

Just for those who wish to plug in numbers, here is the
heuristic runtime formula (not proven to be correct) for the
General Number Field Sieve:
```
exp[(1.932+o(1))*(log n)^1/3* loglog(n)^(2/3)]
```

Log is the natural logarithm, and `exp(x) = e^x = 2.718..^x`

Here's a table from *Applied Cryptography*, referenced with
an unpublished paper (as of Feb. 1995) by Andrew Odlyzko
"Progress in Integer Factorization and Discrete Logarithms"

Mips years required to factor a number with the GNFS:

Bits | Mips-years |
---|---|

512 | 30,000 |

768 | 2*10^8 |

1024 | 3*10^11 |

1280 | 1*10^14 |

1536 | 3*10^16 |

2048 | 3*10^20 |

In another table, making generous assumptions about the increase in computer power and assuming that factoring algorithms will be as fast the Special Number Field Sieve (which today can only be used on special numbers), Bruce Schneier gives the following keylenghts against various opponents :

Year | vs Individual | vs Corporation | vs Government |
---|---|---|---|

1995 | 768 | 1280 | 1536 |

2000 | 1024 | 1280 | 1536 |

2005 | 1280 | 1536 | 2048 |

2010 | 1280 | 1536 | 2048 |

2015 | 1536 | 2048 | 2048 |

Please note that there is lots of assumptions and handwaving in these numbers. Very few of the factorizations experts dare to make predictions beyond a few years.

Also keep in mind that there are, especially for Governments,
other methods for obtaining your keys than spending billions
of dollars and waiting one or more years for the factorization
of your key. Burglary and electronic snooping (Tempest) are
much more effective and *much* cheaper.

This attack is of little consequence for PGP, because anybody who can measure the time it takes your CPU to process a message with this accuracy, can also just copy the key from memory. The attack is no more dangerous than a virus or trojan designed to intercept your passphrase and secret key.

- Bruce Schneier,
*Applied Cryptography 2nd edition*. - Paul Kocher: timing attack (http://www.cryptography.com/)

More information about factoring and the two main algorithms used for large numbers:

- Factoring Integers Using The Web And The Number Field Sieve - Arjen K. Lenstra: ftp://ftp.bellcore.com/pub/lenstra/nfs.ps.Z
- The Magic Words Are Squeamish Ossifrage

Extended Abstract. Derek Atkins, Michael Graff, Arjen K. Lenstra, Paul C. Leyland. (net-wide factoring of 129 digit/426 bit number) - Factoring by Email - Arjen K. Lenstra, Mark S. Manasse: ftp://ftp.ox.ac.uk/pub/math/rsa129/fact_by_email.Z
- Multiple Polynomial Quadratic Sieve (mpqs) sans_math handwaving explanation of the mpqs without the math - Paul Leyland ftp://ftp.ox.ac.uk/pub/math/rsa129/mpqs_sans_math.Z
- Factoring integers with large prime variations of the quadratic sieve - H.J.J. Boender and H.J.J. te Riele: ftp://ftp.cwi.nl/pub/CWIreports/NW/NM-R9513.ps.Z
- An implementation of the number field sieve - R.M. Huizing: ftp://ftp.cwi.nl/pub/CWIreports/NW//NM-R9511.ps.Z. Also: http://www.cwi.nl/cwi/publications/index.html
- The Block Lanczos algorithm used to solve the HUGE
matrices that arise when factoring such numbers:
A block Lanczos algorithm for finding dependencies over GF(2) - Peter L. Montgomery:

A simple checksum, that adds up all the all the bytes in a file and then truncates the result to last n digits, or a CRC (Cyclical Redundancy Check) are examples of hashes, but they are not one-way. In order to be one-way, the functions must also have the following properties :

- Given M, it must be easy to compute h.
- Given h, it must be hard to compute an M such that h=H(M)
- Given M, it must be hard to find a M' such that H(M) = H(M')

In some applications, one-wayness as defined above is not enough. The hash function must also be collision-resistant. That is, it must be hard to find two random M and M' that have the same hash-value.

In order to satisfy all the requirement, a hash value must have a certain length. It is obvious that as soon as we have documents that are longer than the output of the hash function, collisions must occur. This is called the pigeon-hole principle. If our hash function can output 4 distinct values, and we hash 5 documents, at least two documents will have the same hash value.

A related problem is posed by the 'birthday-attack'. This method is called so because it based on the often unknown fact, that if there are 23 people in the room, there's a 50% chance that two of them will have the same birthday.

This follows from the following calculation.

The chance that two people do not share their birthdays, is 364/365 The chance that three people ... is 364/365 * 363/365.

And so on, until we are certain that among 366 people at least two will the same birthday. But with 364/365 * 363/365 * ... * 343/365 ~= 0.5, we already have a 50/50 chance of two people NOT having the same birthday, and thus also a 50/50 chance of them sharing it.

It's the same with hashes. If we have a hash function with a length of n bits, we have a good chance of finding two messages with the same hash value if we try hashing 2^0.5*n random documents.

First 4 variables are initialized :

A=0x01234567 B=0x89abcdef C=0xfedcab98 D=0x76543210

These four variables are now copied into the variables a, b, c, d and the main loop begins. This loop is run as long as there are message blocks.

+-------------------------+ | Message Block | +-------------------------+ / | | \ / | | \ / | | \ / | | \ / | | \ / | | \ +-------+ +-------+ +-------+ +-------+ A-+---| round |--->| round |--->| round |--->| round |->Add-------------A B-|+--| |--->| |--->| |--->| |---|->Add---------B C-||+-| 1 |--->| 2 |--->| 3 |--->| 4 |---|---|->Add-----C D-|||+| |--->| |--->| |--->| |---|---|---|->Add-D ||||+-------+ +-------+ +-------+ +-------+ | | | | +|||---------------------------------------------------+ | | | +||-------------------------------------------------------+ | | +|-----------------------------------------------------------+ | +---------------------------------------------------------------+

The final MD5-hash is are the A, B, C, D after the last message block is processed.

In the rounds there are 4 nonlinear functions :

F(X, Y, Z) = (X And Y) Or ( (Not X) And Z) G(X, Y, Z) = (X And Z) Or ( Y And (Not Z)) H(X, Y, Z) = X Xor Y Xor Z I(X, Y, Z) = Y Xor (X Or (Not Z))

One round looks like this :

+------------------------------------------------------------------+ | | | +----+ | | | | | +->| a |-------------------------+ | | | | | +----+ | M_j t_i | | | | | | | | b |+------------------------|-----|-----|-----------------+ | | | \ | | | | | +----+ \ __________ | | | | | | | +->/Nonlinear \ V V | V | | c |----->| Function |---->Add-->Add-->Add-->Rotateleft->Add-+ | | +->\__________/ +----+ / | | / | d |+ | | +----+

Now if we write this as :

FF(a,b,c,d,M_j,s,t_i) denotes a = b+((a+F(b,c,d) + M_j + t_i <<<s) GG(a,b,c,d,M_j,s,t_i) denotes a = b+((a+G(b,c,d) + M_j + t_i <<<s) HH(a,b,c,d,M_j,s,t_i) denotes a = b+((a+H(b,c,d) + M_j + t_i <<<s) II(a,b,c,d,M_j,s,t_i) denotes a = b+((a+I(b,c,d) + M_j + t_i <<<s)

(<<<s is a rotating the bits to left over s positions)

/* Round 1 */ FF (a, b, c, d, M_0, 7, 0xd76aa478); /* 1 */ FF (d, a, b, c, M_1, 12, 0xe8c7b756); /* 2 */ FF (c, d, a, b, M_2, 17, 0x242070db); /* 3 */ FF (b, c, d, a, M_3, 22, 0xc1bdceee); /* 4 */ FF (a, b, c, d, M_4, 7, 0xf57c0faf); /* 5 */ FF (d, a, b, c, M_5, 12, 0x4787c62a); /* 6 */ FF (c, d, a, b, M_6, 17, 0xa8304613); /* 7 */ FF (b, c, d, a, M_7, 22, 0xfd469501); /* 8 */ FF (a, b, c, d, M_8, 7, 0x698098d8); /* 9 */ FF (d, a, b, c, M_9, 12, 0x8b44f7af); /* 10 */ FF (c, d, a, b, M_10,17, 0xffff5bb1); /* 11 */ FF (b, c, d, a, M_11,22, 0x895cd7be); /* 12 */ FF (a, b, c, d, M_12, 7, 0x6b901122); /* 13 */ FF (d, a, b, c, M_13,12, 0xfd987193); /* 14 */ FF (c, d, a, b, M_14,17, 0xa679438e); /* 15 */ FF (b, c, d, a, M_15,22, 0x49b40821); /* 16 */ /* Round 2 */ GG (a, b, c, d, M_1, 5, 0xf61e2562); /* 17 */ GG (d, a, b, c, M_6, 9, 0xc040b340); /* 18 */ GG (c, d, a, b, M_11,14, 0x265e5a51); /* 19 */ GG (b, c, d, a, M_0, 20, 0xe9b6c7aa); /* 20 */ GG (a, b, c, d, M_5, 5, 0xd62f105d); /* 21 */ GG (d, a, b, c, M_10, 9, 0x2441453); /* 22 */ GG (c, d, a, b, M_15,14, 0xd8a1e681); /* 23 */ GG (b, c, d, a, M_4, 20, 0xe7d3fbc8); /* 24 */ GG (a, b, c, d, M_9, 5, 0x21e1cde6); /* 25 */ GG (d, a, b, c, M_14, 9, 0xc33707d6); /* 26 */ GG (c, d, a, b, M_3, 14, 0xf4d50d87); /* 27 */ GG (b, c, d, a, M_8, 20, 0x455a14ed); /* 28 */ GG (a, b, c, d, M_13, 5, 0xa9e3e905); /* 29 */ GG (d, a, b, c, M_2, 9, 0xfcefa3f8); /* 30 */ GG (c, d, a, b, M_7, 14, 0x676f02d9); /* 31 */ GG (b, c, d, a, M_12,20, 0x8d2a4c8a); /* 32 */ /* Round 3 */ HH (a, b, c, d, M_5, 4, 0xfffa3942); /* 33 */ HH (d, a, b, c, M_8, 11, 0x8771f681); /* 34 */ HH (c, d, a, b, M_11,16, 0x6d9d6122); /* 35 */ HH (b, c, d, a, M_14,23, 0xfde5380c); /* 36 */ HH (a, b, c, d, M_1, 4, 0xa4beea44); /* 37 */ HH (d, a, b, c, M_4, 11, 0x4bdecfa9); /* 38 */ HH (c, d, a, b, M_7, 16, 0xf6bb4b60); /* 39 */ HH (b, c, d, a, M_10,23, 0xbebfbc70); /* 40 */ HH (a, b, c, d, M_13, 4, 0x289b7ec6); /* 41 */ HH (d, a, b, c, M_0, 11, 0xeaa127fa); /* 42 */ HH (c, d, a, b, M_3, 16, 0xd4ef3085); /* 43 */ HH (b, c, d, a, M_6, 23, 0x4881d05); /* 44 */ HH (a, b, c, d, M_9, 4, 0xd9d4d039); /* 45 */ HH (d, a, b, c, M_12,11, 0xe6db99e5); /* 46 */ HH (c, d, a, b, M_15,16, 0x1fa27cf8); /* 47 */ HH (b, c, d, a, M_2, 23, 0xc4ac5665); /* 48 */ /* Round 4 */ II (a, b, c, d, M_0, 6, 0xf4292244); /* 49 */ II (d, a, b, c, M_7, 10, 0x432aff97); /* 50 */ II (c, d, a, b, M_14,15, 0xab9423a7); /* 51 */ II (b, c, d, a, M_5, 21, 0xfc93a039); /* 52 */ II (a, b, c, d, M_12, 6, 0x655b59c3); /* 53 */ II (d, a, b, c, M_3, 10, 0x8f0ccc92); /* 54 */ II (c, d, a, b, M_10,15, 0xffeff47d); /* 55 */ II (b, c, d, a, M_1, 21, 0x85845dd1); /* 56 */ II (a, b, c, d, M_8, 6, 0x6fa87e4f); /* 57 */ II (d, a, b, c, M_15,10, 0xfe2ce6e0); /* 58 */ II (c, d, a, b, M_6, 15, 0xa3014314); /* 59 */ II (b, c, d, a, M_13,21, 0x4e0811a1); /* 60 */ II (a, b, c, d, M_4, 6, 0xf7537e82); /* 61 */ II (d, a, b, c, M_11,10, 0xbd3af235); /* 62 */ II (c, d, a, b, M_2, 15, 0x2ad7d2bb); /* 63 */ II (b, c, d, a, M_9, 21, 0xeb86d391); /* 64 */

The constant t_i are the integer parts of 2^32* abs(sin(i)), with i in radians.

Finally, as the last message block is processed, the numbers a, b, c, d form the MD5 hash for the file that has been processed.

Are you confused now? Well, at least the bits in the message are, and that's what it was all about.

- The reference implementation of MD5 can be found in RFC 1321.
*Applied Cryptography 2nd edition*, by Bruce Schneier

[ Table of Contents | About this FAQ | Glossary ]

Comments, additions and suggestions can be sent to <faq-admin@mail.pgp.net>.

This FAQ was generated by Orb v1.3 for OS/2.