cipher-detective-ai / docs /cryptanalysis-cheatsheet.md
Paul Clark

Classical Cryptanalysis Cheat Sheet

These are the signals Cipher Detective AI surfaces in Explain Mode and uses inside the heuristic baseline. None of them are proofs; they are clues.

Frequency analysis

English text has a very uneven letter distribution. E T A O I N S H R cover roughly 70% of letters in normal prose. A monoalphabetic substitution preserves that shape but relabels the letters — so the histogram still looks "spiky", just with different letters on top.
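
A minimal sketch of the histogram behind this signal. The function name `letter_frequencies` is illustrative and is not taken from the Cipher Detective AI codebase; it simply counts A–Z letters and ignores everything else.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, relative frequency) pairs, most common first."""
    # Keep only A-Z after upper-casing; digits, spaces and punctuation are ignored.
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    return [(letter, n / total) for letter, n in counts.most_common()]
```

Because a monoalphabetic substitution only relabels letters, the sorted frequency values of `"HELLO"` and its ROT13 image `"URYYB"` are identical; only the letters attached to them differ.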

Index of coincidence (IoC)

$$\mathrm{IoC} = \frac{\sum_i n_i (n_i - 1)}{N (N-1)}$$

Rough reference values:

| Text type | Typical IoC |
| --- | --- |
| English plaintext | ~0.066 |
| Monoalphabetic substitution | ~0.066 |
| Random uniform letters | ~0.038 |
| Vigenère with long key | ~0.040–0.045 |

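If the IoC is close to the English value, the text is likely plaintext or the output of a Caesar, Atbash, transposition, or other monoalphabetic substitution cipher, since all of these preserve the multiset of letter counts. If it drops toward the random value, suspect a polyalphabetic cipher like Vigenère.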
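
The IoC formula above translates almost directly into code. This is a sketch under the assumption that only the letters A–Z count toward $N$; the name `index_of_coincidence` is illustrative.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IoC = sum_i n_i (n_i - 1) / (N (N - 1)) over the letters A-Z."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        # The formula is undefined for fewer than two letters.
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
```

The extremes are easy to check by hand: a text made of one repeated letter scores 1.0, and a text in which no letter repeats scores 0.0.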

Shannon entropy

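Entropy of the letter-frequency distribution, in bits per letter. English prose sits around 4.1–4.2 bits/letter, and the maximum for 26 letters is $\log_2 26 \approx 4.70$. Values approaching that maximum suggest flatter, more "mixed" output such as Vigenère. Transposition and monoalphabetic substitution leave this value unchanged because they only rearrange or relabel letters, and short samples give noisy estimates.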
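
A minimal sketch of the entropy computation, assuming the empirical A–Z frequencies are used as the probability estimates; `letter_entropy` is an illustrative name.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy of the empirical A-Z distribution, in bits per letter."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n == 0:
        return 0.0
    counts = Counter(letters)
    # H = -sum p_i * log2(p_i), with p_i estimated as n_i / N.
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

Two equally frequent letters give exactly 1 bit, and four equally frequent letters give 2 bits, which makes quick sanity checks easy.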

Chi-squared vs English

Sum of $(O - E)^2 / E$ comparing observed letter counts to English expectations. Lower is more English-like. Useful for ranking Caesar / Affine candidates.
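
As a sketch, the statistic can be computed against a table of English letter frequencies. The percentages below are commonly quoted approximate values and are an assumption of this example; the project may use a different table.

```python
import string
from collections import Counter

# Approximate English letter frequencies in percent, A..Z (assumed reference table).
ENGLISH_FREQ = dict(zip(string.ascii_uppercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))


def chi_squared(text: str) -> float:
    """Sum of (O - E)^2 / E over A-Z; lower means more English-like."""
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")  # nothing to score
    observed = Counter(letters)
    total = 0.0
    for letter, percent in ENGLISH_FREQ.items():
        expected = n * percent / 100
        total += (observed[letter] - expected) ** 2 / expected
    return total
```

Rare letters dominate the score: a single unexpected Z or Q in a short text adds far more than a modest surplus of E or T, which is why wrong Caesar or Affine keys tend to score very badly.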

Caesar / ROT brute force

There are only 26 possible shifts (25 besides the identity). Try all of them, score each candidate with a dictionary check or the chi-squared statistic, and the answer usually pops out.
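
The brute force can be sketched as follows, using chi-squared as the score. The frequency table holds commonly quoted approximate values, and the helper names `shift_text` and `crack_caesar` are illustrative rather than the project's actual API.

```python
import string
from collections import Counter

# Approximate English letter frequencies in percent, A..Z (assumed reference table).
ENGLISH_FREQ = dict(zip(string.ascii_uppercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))


def chi_squared(text: str) -> float:
    """Sum of (O - E)^2 / E over A-Z; lower means more English-like."""
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")
    observed = Counter(letters)
    return sum((observed[l] - n * p / 100) ** 2 / (n * p / 100)
               for l, p in ENGLISH_FREQ.items())


def shift_text(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions, preserving case and punctuation."""
    out = []
    for c in text:
        if c.isascii() and c.isalpha():
            base = ord("A") if c.isupper() else ord("a")
            out.append(chr((ord(c) - base + shift) % 26 + base))
        else:
            out.append(c)
    return "".join(out)


def crack_caesar(ciphertext: str) -> tuple[int, str]:
    """Try all 26 shifts and return (shift, plaintext) with the lowest chi-squared."""
    best = min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)
```

For example, `crack_caesar(shift_text("Defend the east wall of the castle at dawn and wait for the signal", 3))` recovers shift 3 and the original sentence. Very short ciphertexts can fool the score, so it is worth inspecting the top few candidates rather than only the best one.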

Atbash check

Atbash is self-inverse. Decrypting once with A↔Z, B↔Y, ... is a one-line test that costs nothing.
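
The one-line test can be sketched with a translation table. The names `ATBASH` and `atbash` are illustrative.

```python
import string

# Map A<->Z, B<->Y, ... for both cases; all other characters pass through unchanged.
ATBASH = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase,
    string.ascii_uppercase[::-1] + string.ascii_lowercase[::-1],
)


def atbash(text: str) -> str:
    """Apply Atbash; the same call both encrypts and decrypts."""
    return text.translate(ATBASH)
```

Here `atbash("Hello")` gives `"Svool"`, and applying `atbash` twice returns the input, which is what self-inverse means in practice.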

Kasiski / Friedman (Vigenère key length)

  • Kasiski: repeated trigrams in the ciphertext often appear at distances that are multiples of the key length. The GCD of those distances suggests the key length.
  • Friedman: estimate key length from the IoC. Cipher Detective AI uses a combined "Vigenère indicator" score derived from these.

Transposition tells

Letter frequencies look English (because letters are only rearranged), but common bigrams like TH, HE, IN are unusually rare. Rail-fence and columnar transposition both produce this signature.
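
One way to quantify this tell, as a sketch: measure how often adjacent letter pairs fall in a small set of common English bigrams. The set below contains only the three bigrams named above, and `common_bigram_rate` is an illustrative name; a real detector would likely use a larger bigram table.

```python
COMMON_BIGRAMS = {"TH", "HE", "IN"}


def common_bigram_rate(text: str) -> float:
    """Fraction of adjacent A-Z letter pairs that are common English bigrams."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    if len(letters) < 2:
        return 0.0
    hits = sum(1 for i in range(len(letters) - 1)
               if letters[i:i + 2] in COMMON_BIGRAMS)
    return hits / (len(letters) - 1)
```

Comparing this rate for a suspect text against what is typical for English prose, while the single-letter histogram still looks English, is the signature described above. Spaces are removed before pairing, so bigrams that span word boundaries are counted too.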

Affine

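The Affine cipher encrypts each letter as $E(x) = (a x + b) \bmod 26$ with $\gcd(a, 26) = 1$, so that $a$ has a modular inverse and decryption is possible. Only 12 valid $a$ values × 26 $b$ values = 312 keys exist. That is brute-forceable; each candidate is scored against English.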
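
The 312-key search can be sketched as below, scoring with chi-squared. The frequency table holds commonly quoted approximate values, the function names are illustrative, and `pow(a, -1, 26)` for the modular inverse assumes Python 3.8 or newer.

```python
import string
from collections import Counter
from math import gcd

# Approximate English letter frequencies in percent, A..Z (assumed reference table).
ENGLISH_FREQ = dict(zip(string.ascii_uppercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))

# The 12 multipliers that are coprime to 26.
VALID_A = [a for a in range(1, 26) if gcd(a, 26) == 1]


def chi_squared(text: str) -> float:
    """Sum of (O - E)^2 / E over A-Z; lower means more English-like."""
    letters = [c for c in text if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")
    observed = Counter(letters)
    return sum((observed[l] - n * p / 100) ** 2 / (n * p / 100)
               for l, p in ENGLISH_FREQ.items())


def affine_encrypt(text: str, a: int, b: int) -> str:
    """E(x) = (a*x + b) mod 26 on upper-cased letters; other characters pass through."""
    return "".join(chr((a * (ord(c) - 65) + b) % 26 + 65) if "A" <= c <= "Z" else c
                   for c in text.upper())


def affine_decrypt(text: str, a: int, b: int) -> str:
    """D(y) = a^-1 * (y - b) mod 26, the inverse of affine_encrypt."""
    a_inv = pow(a, -1, 26)
    return "".join(chr(a_inv * (ord(c) - 65 - b) % 26 + 65) if "A" <= c <= "Z" else c
                   for c in text.upper())


def crack_affine(ciphertext: str) -> tuple[int, int, str]:
    """Try all 312 keys and return (a, b, plaintext) with the lowest chi-squared."""
    candidates = ((a, b, affine_decrypt(ciphertext, a, b))
                  for a in VALID_A for b in range(26))
    return min(candidates, key=lambda cand: chi_squared(cand[2]))
```

Encrypting `"DEFEND THE EAST WALL OF THE CASTLE AT DAWN AND WAIT FOR THE SIGNAL"` with `a=5, b=8` and passing the result to `crack_affine` returns that key and the original sentence. As with Caesar, short inputs are less reliable, so ranking and showing several candidates is safer than trusting the single best score.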

Substitution

If frequency analysis matches English shape but Caesar / Atbash fail, suspect a general monoalphabetic substitution. Solving it cleanly needs interactive hill-climbing over an English language model — outside the scope of this demo.

Reality check

These signals work because classical ciphers leak structure. Modern symmetric encryption (AES-GCM, ChaCha20-Poly1305) and modern public-key cryptography do not leak any of this information; none of these techniques apply to them.