Classical Ciphers
Created:
20170526
Updated:
20231118
All modern ciphers are based on some kind of substitution operation.
A block of bits is substituted with another block of bits according to a given table or algorithm.
Monoalphabetic cipher: the same element in the plaintext alphabet is always encrypted to the same element in the ciphertext alphabet.
Polyalphabetic cipher: the same element in the plaintext alphabet may be encrypted to different elements in the ciphertext alphabet.
Polyalphabetic ciphers algorithms typically depend on a key that is cyclically used to encrypt the single letters. Thus are equivalent to a set of monoalphabetic ciphers that are alternatively used.
Note that given a key with finite length, at some point the same letters are encoded again to the same ciphertext (we loop through the key), thus in practice polyalphabetic ciphers are equivalent to a monoalphabetic block cipher with block length equal to the key length.
The distinction between a monoalphabetic and a polyalphabetic cipher thus technically depends on the choice of what is the alphabet we work with, if each block of a polyalphabetic cipher is interpreted as an element of a bigger alphabet then a polyalphabetic cipher is still a monoalphabetic cipher.
Strictly speaking, a real polyalphabetic cipher has no cycles e.g. a stream cipher using a TRNG for the keystream.
Conventions
Each cipher is described by a tuple (P, C, K, E, D)
with:
P
all the possible plaintextsC
all the possible ciphertextsK
all the possible keys (aka keyspace)E
encryption functionD
decryption function
We assume to work with the alphabet A = {a,..,z}
. Where more convenient the
alphabet is interpreted as Z_26
by assigning each letter to the corresponding
position within the English alphabet (e.g. a→0
, .., z→25
).
The set A*
is the set of strings composed by elements of A
.
In the following algorithms:
P = C = A*
K
,E
,D
are dependent on the specific cipher
Substitution Cipher
The substitution is not driven by any particular algorithm but by a permutation
table pi
.
E_pi[p] = pi(p) = c
D_pi[c] = pi^1(c) = p
The keyspace consists of all the possible tables we can construct, or in other words all the permutations of the alphabet symbols.
K = A! = 26! ≥ 4·10^26
The key is one of all the possible permutations.
In a general substitution cipher the substitution rule may not be describable algorithmically, thus to specify the key we have to explicitly enumerate all the substitutions. This is impractical even with small alphabets.
For example if our alphabet consists of bitstrings of length 64 bits, then the key size is 64⋅2^64 bits, i.e. the table of all the substitutions that may be applied during the encryption procedure (the codebook).
Instead of allowing arbitrary plaintextciphertext association we may derive the mapping via some kind of algorithm.
The key size is thus reduced to some kind of secret input that is required by the algorithm definition, that is short enough to be practical for sharing.
Unfortunately key size reduction also comes with keyspace reduction, the size will be bound to the number of associations that are possible via such an algorithm.
Substitution ciphers driven by very simple algorithms are for example shift cipher, affine, atbash and Vigenere.
Attack
Frequency analysis. Every language has a characteristic frequency for the letters (e.g. in English the letter ’e’ has an approximate frequency of 12%).
Simple substitution ciphers leaves the letters frequencies intact and thus allow to attack the cipher via a trivial frequency analysis.
As an extension we can try to conceal the frequencies by considering group of letters of fixed length (blocks) such as digrams (couples of two letters). Some blocks have higher frequency (e.g. ’th’ in English) anyway.
Potential solutions for frequency analysis:
 Blocks substitution ciphers: instead of replacing single letters we work on blocks of m letters. Thus flattening the frequencies of the blocks.
 Polyalphabetic ciphers: use more than one monoalphabetic cipher by rotating their usage (e.g. Vigenere and Enigma).
As said before, note that a polyaphabetic cipher with a cycle is equivalent to a monoalphabetic block cipher. Working with blocks is just like working with a larger alphabet.
For example if the block length is 3
there are 26^3
elements in the alphabet
(equal to the key length) and if the cipher is a pure substitution cipher then
the keyspace size of (26^3)!
elements.
Polyalphabetic ciphers are thus preferred since the key is very compact, and we still have a fairly big keyspace.
For example, with a polyalphabetic shift cipher, if the block length is 3
then
we have 26^3
possible keys and the key length is 3
.
Shift Cipher
The key is defined as an integer k ∈ Z_26
.
We shift each plaintext letter by k
positions with respect to their position
in the alphabet.
E_k[p] = (p + k) mod 26 = c
D_k[p] = (p  k) mod 26 = p
The same shift quantity is applied for each word in the input string.
For example, shift every character in the plaintext by k = 3
positions:
a → d, b → e, ... , v → b, z → c
When k = 3
the cipher is also known as the Caesar cipher.
Attack
The keyspace is trivially small (26), thus can be easily brute forced.
Vigenere Cipher
A polyalphabetic cipher composed of several shift ciphers.
The keyspace is A^m
, where m
is the key length.
The scheme can be easily visualized by writing the key and the plaintext one above the other, the key is repeated as required till the end of the plaintext.
The encryption/decryption functions for the ith element of the plaintext/ ciphertext:
E_k[p_i] = [p_i + k_(i mod m)] mod A = c_i
D_k[c_i] = [c_i  k_(i mod m)] mod A = p_i
In practice the shift cipher can be reduced to a Vigenere cipher with m=1.
Attack
The cipher is trivially vulnerable to a known plaintext attack:
k_(i mod m) = (c_i  p_i) mod A
The cipher is also vulnerable to statistical analysis. In this case the vulnerability stems from the key repetition. The shortest is the key the more it is vulnerable.
Note that is m
is equal to the plaintext length then this cipher is equivalent
to the onetime pad, a well known unconditionally secure cipher.
Kasiski’s Test (~1863).
When there are sequences that repeat in the plaintext, is possible that this repetition is propagated to the ciphertext. This happens if they are at a distance that is a multiple of the key length and thus end up being encrypted using the same elements of the key (i.e. the same monoalphabetic cipher).
Finding a repetition in the ciphertext can thus suggest that the distance between the repeated sequence is equal to a multiple of the key length.
Hint: consider only repetitions of 3+ letters.
Steps:
 Compute the distances
d1
, ..,dz
between all the repetitions.  The key length
m
may divided1
, ..,dz
and dividesgcd(d1,..,dz)
.
Once that the key length has been guessed a frequency analysis attack can be
carried on the subsets encrypted with the same k_i
(the same attack used for
the trivial shift cipher).
Each of these subsets is a simple shift cipher.
C1 = { ci such that ci = pi + k1 }
...
Cm = { ci such that ci = pi + km }
May happen to find some bogus di
values, in this case proceed retry or use the
Friedman Test.
Friedman Test (~1920)
Index of Coincidence
Given a vector of characters x = (x1,..,xn)
in A*
then Ic(x)
is the
probability to extract two elements from x
, without reinsertion, with the
same value.
Example:
Ic(a, a, a) = P(a)·P(aa) = 1
Ic(a, b, a) = P(a)·P(aa) + P(b)·P(bb) = 2/3·1/2 + 1/3·0 = 2/6 = 1/3
Given the vector x
, we define the frequency f(i)
to be the number of
occurrences of the ith character of the alphabet in x
(e.g. A = {a, b}
,
x = (a, b, a)
→ f(0) = 2
, f(1) = 1
).
We also define the probability to extract the ith character of the alphabet
from x
as P(i) = f(i)/N
, with N = len(x)
.
Pr(0) = f(0)/N Pr(00) ≈ (f(0)1)/(N1)
... ...
Pr(25) = f(25)/N Pr(2525) ≈ (f(25)1)/(N1)
When both f(i)
and N
are big, then (f(i)1)/(N1) ≈ f(i)/N
, thus:
Ic(x) ≈ Pr(0)^2 + ... + Pr(25)^2 = ∑ Pr(i)^2
The Ic
measures some redundancy value for a sequence of characters.
Three interesting cases for x
.

x
is a (long enough) English text.Pr(i)
follows the well known probability values of English lettersIc(x) ≈ 0.065

x
is obtained from a monoalphabetic substitution cipher Individual probabilities are permuted, but the overall result is unchanged.
Ic(x)
is the same as for the cleartext (≈ 0.065
).

x
is a uniformly distributed random sequencePr(i) = 1/26
, for everyi
Ic(x) = 26·[(1/26)^2] = 1/26 ≈ 0.038
In short:
 high value → no randomness →
Ic(x) ≈ 0.065
 low value → max randomness →
Ic(x) ≈ 0.038
Key Length disclosure
Given the ciphertext:
y = (y1, ..., yn)
We gain some key length candidates (e.g. via the Kasiski test or exhaustive search).
Dispose the ciphertext in a matrix of m
rows, with m
a candidate.
The ciphertext is then divided in the columns of the matrix, i.e. the first column contains the first m characters of the ciphertext:
⌈ y1 y_m+1 ... ⌉ ⌈ R1 ⌉
 y2 y_m+2 ...  =  R2 
 ... ... ...   .. 
⌊ ym y_2m ... ⌋ ⌊ Rm ⌋
Compute the Ic
for each row Ri
.
 If the key length
m
is correct thenRi
is a sequence which elements are all encrypted with the same key value, thusIc(Ri)
should be high (close to0.065
).  If the key length is incorrect then
Ic(Ri)
will be low (close to0.038
).
TODO: can’t we use the entropy of x
in place of the index of coincidence?
Key disclosure
Once that the key length m
is disclosed, we proceed determining the single
letters of the key k = (k1,..,km)
.
[ R1 ] → encrypted with k1
N1 = len(R1)
p(i) = prob for the ith char when using English language
How encryption is done using k1
on the single plaintext characters:
Plaintext  English prob. P(i)  Ciphertext  Cipher frequencies F(k1) 

0  p(0)  k1+0  f(k1+0)/N1 
1  p(1)  k1+1  f(k1+1)/N1 
…  …  …  … 
25  p(25)  k1+25  f(k1+25)/N1 
As the value of k1
we need to choose the value (one out of the 26 alphabet
values) that better approximates the English typical frequencies.
In other words k1
is equal to j ∈ Z_26
if the distance between F(j)
and
P
vectors F(j)  P
is minimal.
An equivalent technique often reported in literature is to get j
that
maximize the scalar product between F(j)
and P
.
The procedure is repeated for every row Ri
to gain different parts of the key.
Note that this is exactly the same attack used for the shift encryption (only
more structured), with shift encryption m = 1
, i.e. we have a single row R
containing the full ciphertext.
As the opposite case, also note that if the key length is equal to the length
of the plain text this cipher is equivalent to the onetime pad. In this case
each row Ri
has just one element, thus we have no material to analyze the
frequencies.
Affine Cipher
A substitution cipher where substitution algorithm requires multiplication and addition.
The key is defined as a couple of integers (a, b)
in Z_26
.
E_(a,b)[p] = a·p + b mod 26 = c
D_(a,b)[c] = (c  b)·a^1 mod 26 = p
Note that a
is required to be invertible modulo A = 26
, thus is required
that gcd(a, 26) = 1
. Because 26 = 13·2
then a
can’t be even or 13.
Attacks
The affine cipher defined over A = Z_26
has only 26·12 = 312
possible keys.
A brute force attack is trivial over such a small set.
Frequency analysis can also be used like any other substitution cipher.
The cipher is vulnerable to a known plaintext attack when two couples (p1,c1)
and (p2,c2)
are known:
c1 = a·p1 + b
c2 = a·p2 + b
→ c1  c2 = a·(p1  p2)
→ a = (c1c2)/(p1p2)
→ b = c1  a·p1
The only requirement is that (p1p2)
is invertible modulo A
.
Transposition Cipher
Scrambles the position of the plaintext characters without changing the characters themselves.
Given a plaintext of length N
and assuming a relative frequency of the ith
alphabet character as fi
then the number of possible ciphertexts are N!/∏fi!
for each i
.
For example, the plaintext “santarealfun” can be encrypted in 12!/(3!·2!) = 39,916,800
different ways, One of these is “satanfuneral”.
By dividing and processing the plaintext in blocks of length m
, encryption can
be formalized by multiplying each block by a transposition matrix
Both encryption and decryption are similar to the Hill cipher, described next.
Attacks
Since transposition does not affect the frequency of individual symbols, simple transposition can be easily detected by doing a frequency count. If the ciphertext exhibits a frequency distribution very similar to plaintext, it is most likely a transposition.
There are several methods for attacking the cipher. These include:
 Knownplaintext attack: see Hill cipher
 Using known or guessed parts of the plaintext to assist in reverseengineering.
To decipher the encrypted message an attacker could try to guess possible words with the characters found in the ciphertext.
In general, transposition methods are vulnerable to anagramming, sliding pieces of ciphertext around, then looking for sections that look like anagrams of words, and solving the anagrams. Once such anagrams have been found, they reveal information about the transposition pattern, and can consequently be extended.
Hill Cipher
A block cipher more explicitly derived from linear algebra.
With a block size m
, if we interpret the cipher as a monoalphabetic cipher,
the alphabet can be also viewed as Z_26^m
.
Each plaintext and ciphertext block is represented as a m×1
column vector:
p = [ p1 ... pm ]'
c = [ c1 ... cm ]'
The block transformation is driven by a key K
, a m×m
square matrix:
⌈ k11 k12 ... k1m ⌉
K =  k21 k22 ... k2m 
 ... 
⌊ km1 km2 ... kmm ⌋
Encryption and decryption functions are defined as:
E_K[p] = K·p mod 26 = c
D_K[p] = K^1·c mod 26 = p
With modular operation applied to the result of row by column product.
The transformation is a matrixvector product:
c1 = k11·p1 + k12·p2 + ... + k1m·pm (mod 26)
...
cm = km1·p1 + km1·p2 + ... + kmm·pm (mod 26)
Diffusion principle: a single ciphertext character depends on all the plaintext characters within the same block.
Diffusion principle is quite effective against frequency analysis attacks by reducing the redundancy of the single letters.
Note that decryption requires the key matrix to be invertible modulo A
.
When operating over real numbers a matrix is invertible if the columns are
linearly independent (i.e. we cannot express one column as a linear combination
of the others). In other, equivalent, terms a matrix K
is invertible if and
only if det(K) ≠ 0
.
In modular arithmetic the inverse of a matrix exists if and only if the
determinant is coprime with the alphabet size or in other words if and only if
det(K)
is invertible modulo A
.
K is invertible modulo A iff gcd(det(K), A) = 1
The determinant is typically computed via the Cramer’s formula.
The inverse matrix B = K^1
elements are computed as:
b_ij = (1)^(i+j) · det(K*_ji) · det(K)^1 mod A
Where:
K*_ji
: is the matrixK
minor obtained by removing fromK
the rowj
and columni
.det(K)^1
is the inverse of the determinant moduloA
. The inverse exists whengcd(det K, A) = 1
.
Attack
Because of the diffusion principle, the cipher is fairly resistant to ciphertext only attacks, but easily fails with a knownplaintext attack.
Assume that the attacker knows m
couples of plaintext/ciphertext blocks of
length m
encrypted using the same key K
(and thus knows m
as well).
(pi,ci) → ci = K·pi mod A
We merge the ci
and pi
column vectors to obtain two m×m
matrix
[c1  c2  ...  cm ] = K·[p1  p2  ...  pm ]
C = K·P
Next we can use one of the wellknown matrix resolution methods (e.g. Gaussian
elimination method, LRU, …) to compute P^1
.
If P
is not invertible modulo A
, then we should try with a different set
of (pi,ci)
.
Follows that the first thing the attacker should do is to check if P
is
invertible by computing the determinant.
Wrapping Up
Reduction to a linear transformation
In practice, excluded the pure nonalgorithmic substitution cipher, all the classical ciphers viewed so far can be expressed using exactly the same linear transformation:
E[p] = A·p + b = c
D[c] = A^1·(c  b) = p
With A
a matrix providing diffusion and b
a vector providing polyalphabetic
shift encryption.
 Shift cipher:
A
is a1⨯1
identity matrix andb
a vector of length 1.  Vigenere cipher:
A
is am⨯m
identity matrix andb
am⨯1
vector.  Affine cipher:
A
is a1⨯1
invertible matrix andb
a vector of length 1.  Transposition cipher:
A
is am⨯m
transposition matrix andb
is a zero vector.  Hill cipher:
A
is am⨯m
invertible matrix andb
is am⨯1
zero vector.
We can obviously extend the Hill cipher by using an arbitrary m⨯1
vector b
.
Lesson Learned
In general, once the key has been fixed, we obtain a function from plaintext to ciphertext. This function should NOT be a linear function.
If the cipher is based on a linear function then we can always apply some simple technique to break it (or weaken it if it has some a linear component).
For example, Lucifer (an ancestor of DES) was having a linear component.
Modern standards require a cipher should resist to knownplaintext attacks.
If we combine an operation providing diffusion (as Hill) with some elements of confusion (as a nonlinear substitution) we can obtain a strong cipher.
This naturally brings us to the more modern SubstitutionPermutationNetwork (SPN) design for ciphers (e.g. Feistel).
More on polyalphabetic ciphers
We consider each block of a polyalphabetic cipher as a single word of a substitution cipher where the substitution is driven by some sort of algorithm.
Doing such substitution algorithmically we are not using the entire
possibilities for the mapping between the plaintext and ciphertext
(K < A!
).
For example, with Vigenere cipher with blocks of 64 bits (keylen = 64) we have a
keyspace of 2^64
and not (2^64)!
as for a substitution cipher derived from a
permutation table
When using a particular key then such a key is reused over the whole plaintext set. With a polyalphabetic cipher is not possible to have a mapping where we apply to two different blocks two different keys. Well, to be fair is possible to extend the block cipher to use different keys a la Vigenere cipher but we’ve just moved the problem forward since this is just equivalent to an algorithmic substitution cipher with a bigger alphabet.
References
 cry affine cipher
 Feistel ciphers