
Paul Clark

Add project polish: LICENSE, CONTRIBUTING, SECURITY, CITATION, CHANGELOG, Makefile, pyproject, docs/

56db95a 15 days ago

preview code

raw

history blame contribute delete

3.11 kB

# Classical Cryptanalysis Cheat Sheet

These are the signals Cipher Detective AI surfaces in Explain Mode and uses
inside the heuristic baseline. None of them are proofs; they are clues.

## Frequency analysis

English text has a very uneven letter distribution: `E T A O I N S H R` together
cover roughly 70% of letters in normal prose. A monoalphabetic substitution
preserves that shape but relabels the letters, so the histogram still looks
"spiky", just with different letters on top.
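A minimal sketch of the histogram check in Python. `letter_histogram` and `top_share` are illustrative names, not part of Cipher Detective AI. The sketch also assumes that only the ASCII letters A-Z are counted.

```python
from collections import Counter


def letter_histogram(text: str) -> list[tuple[str, int]]:
    """Count A-Z letters (case-insensitive), most common first."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    return Counter(letters).most_common()


def top_share(text: str, k: int = 9) -> float:
    """Fraction of all letters covered by the k most common letters."""
    hist = letter_histogram(text)
    total = sum(n for _, n in hist)
    return sum(n for _, n in hist[:k]) / total if total else 0.0
```

For a long sample, `top_share` should land near 0.7 for both English and monoalphabetic ciphertext. Only the identity of the top letters changes, not their combined share.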

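## Index of coincidence (IoC)

$$
\mathrm{IoC} = \frac{\sum_i n_i (n_i - 1)}{N (N-1)}
$$

where $n_i$ is the count of the $i$-th letter and $N$ is the total number of
letters. The IoC is the probability that two letters drawn at random (without
replacement) from the text are identical.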

Rough reference values:

| Text type                   | Typical IoC  |
|-----------------------------|--------------|
| English plaintext           | ~0.066       |
| Monoalphabetic substitution | ~0.066       |
| Random uniform letters      | ~0.038       |
| Vigenère with long key      | ~0.040–0.045 |

If the IoC is close to English, the text is likely **plaintext, Caesar,
Atbash, transposition, or substitution**. If it drops toward random, suspect a
polyalphabetic cipher like Vigenère.
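The IoC formula translates almost directly into Python. The 0.052 cut-off in `ioc_hint` is an assumption: it is simply the midpoint between the English (~0.066) and random (~0.038) reference values, not a threshold taken from Cipher Detective AI. Both function names are hypothetical.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IoC = sum n_i (n_i - 1) / (N (N - 1)) over the letters A-Z."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # undefined for fewer than two letters
    counts = Counter(letters).values()
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def ioc_hint(ioc: float, cutoff: float = 0.052) -> str:
    """Rough bucket; the default cutoff is an assumed midpoint of ~0.066 and ~0.038."""
    return "monoalphabetic-like" if ioc > cutoff else "polyalphabetic-like"
```

Short texts make the IoC noisy, so a single reading near the cut-off says little either way.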

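## Shannon entropy

Entropy of the letter frequencies, $H = -\sum_i p_i \log_2 p_i$ with $p_i$ the
relative frequency of the $i$-th letter, in bits per letter. English prose sits
around 4.1–4.2 bits/letter; perfectly uniform letters reach the maximum,
$\log_2 26 \approx 4.70$. Higher values suggest flatter, more "mixed" output
(e.g., Vigenère). Transposition and monoalphabetic substitution leave $H$
unchanged, because they only rearrange or relabel letters, and very short
samples give noisy estimates.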
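A minimal Python sketch of the entropy signal. It assumes that only A-Z letters are counted, and `letter_entropy` is an illustrative name.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy H = -sum p_i log2 p_i of the A-Z letter frequencies, in bits/letter."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(letters).values())
```

A text that uses every letter exactly once gives $\log_2 26$, the upper bound for a 26-letter alphabet.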

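## Chi-squared vs English

$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, comparing the observed letter counts
$O_i$ to the counts $E_i = N q_i$ expected from English, where $N$ is the number
of letters and $q_i$ is the English frequency of letter $i$. Lower is more
English-like. Useful for ranking Caesar / Affine candidates.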
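A sketch of the chi-squared score in Python. `ENGLISH_PCT` holds commonly cited approximate English letter frequencies (percent, A to Z). The table Cipher Detective AI actually uses is not shown in this document, so treat these numbers as an assumption.

```python
from collections import Counter

# Approximate English letter frequencies (%) for A..Z; assumed textbook values.
ENGLISH_PCT = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
               6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]


def chi_squared_english(text: str) -> float:
    """Sum of (O - E)^2 / E over A-Z, where E = N * (English frequency of the letter)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n == 0:
        return float("inf")  # nothing to score
    counts = Counter(letters)
    total_pct = sum(ENGLISH_PCT)  # normalise so the expected counts sum to N
    score = 0.0
    for i, pct in enumerate(ENGLISH_PCT):
        expected = n * pct / total_pct
        observed = counts.get(chr(ord("A") + i), 0)
        score += (observed - expected) ** 2 / expected
    return score
```

Rare letters such as `Q` and `Z` have tiny expected counts. A single occurrence of one therefore dominates the score on short texts, which is one reason the score is best used for ranking candidates rather than as an absolute test.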

## Caesar / ROT brute force

Only 26 shifts (including the trivial shift of 0). Try all of them, score each
candidate with an English dictionary or chi-squared, and the answer usually
pops out.
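A self-contained sketch of the brute force, scoring every shift with chi-squared. The frequency table and function names are assumptions for illustration, not the project's code.

```python
from collections import Counter

# Approximate English letter frequencies (%) for A..Z; assumed textbook values.
ENGLISH_PCT = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
               6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]


def shift_text(text: str, k: int) -> str:
    """Shift each A-Z letter back by k positions (Caesar decryption); other characters pass through."""
    out = []
    for c in text.upper():
        if "A" <= c <= "Z":
            out.append(chr((ord(c) - ord("A") - k) % 26 + ord("A")))
        else:
            out.append(c)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Chi-squared of uppercase A-Z counts against ENGLISH_PCT (lower = more English-like)."""
    counts = Counter(c for c in text if "A" <= c <= "Z")
    n = sum(counts.values())
    if n == 0:
        return float("inf")
    total = sum(ENGLISH_PCT)
    score = 0.0
    for i, pct in enumerate(ENGLISH_PCT):
        expected = n * pct / total
        score += (counts.get(chr(65 + i), 0) - expected) ** 2 / expected
    return score


def caesar_candidates(ciphertext: str) -> list[tuple[float, int, str]]:
    """Decrypt with all 26 shifts and return (score, shift, plaintext), best first."""
    results = []
    for k in range(26):
        candidate = shift_text(ciphertext, k)
        results.append((chi_squared(candidate), k, candidate))
    return sorted(results)
```

The first tuple is the best guess. For very short ciphertexts it is worth eyeballing the top few candidates, since letter statistics are unreliable on a handful of letters.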

## Atbash check

Atbash is self-inverse. Decrypting once with `A↔Z, B↔Y, ...` is a one-line test
that costs nothing.
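In Python, the one-line test can be a translation table built with `str.maketrans`. The sketch below preserves case and passes non-letters through; that is a design choice, not necessarily how Cipher Detective AI handles them.

```python
import string

# Map A-Z to Z-A and a-z to z-a; all other characters are left unchanged.
_ATBASH = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase,
    string.ascii_uppercase[::-1] + string.ascii_lowercase[::-1],
)


def atbash(text: str) -> str:
    """Apply Atbash; because the mapping is self-inverse, this both encrypts and decrypts."""
    return text.translate(_ATBASH)
```

For example, `atbash("HELLO")` gives `"SVOOL"`, and applying it again returns `"HELLO"`.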

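## Kasiski / Friedman (Vigenère key length)

- Kasiski: repeated trigrams (or longer repeats) in the ciphertext often appear
  at distances that are multiples of the key length. The GCD of those distances,
  or the factor most of them share when a few repeats are coincidental, suggests
  the key length.
- Friedman: estimate key length from the IoC, roughly
  $L \approx (0.066 - 0.038) / (\mathrm{IoC} - 0.038)$ using the English and
  random reference values. Cipher Detective AI uses a combined "Vigenère
  indicator" score derived from these.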
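A sketch of both estimators in Python, assuming the simplified Friedman formula with the reference values 0.066 (English) and 0.038 (random). It does not reproduce the project's combined "Vigenère indicator" score; all function names here are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce
from math import gcd


def kasiski_distances(ciphertext: str, n: int = 3) -> list[int]:
    """Distances between consecutive occurrences of every repeated n-gram (A-Z only)."""
    text = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    distances = []
    for pos in positions.values():
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances


def distances_gcd(distances: list[int]) -> int:
    """GCD of all distances (0 if there are none)."""
    return reduce(gcd, distances, 0)


def likely_key_lengths(distances: list[int], max_len: int = 20) -> list[tuple[int, int]]:
    """How many distances each candidate length 2..max_len divides, most common first."""
    counts = Counter(k for d in distances for k in range(2, max_len + 1) if d % k == 0)
    return counts.most_common()


def friedman_key_length(ioc: float, kappa_english: float = 0.066,
                        kappa_random: float = 0.038) -> float:
    """Simplified Friedman estimate L = (kappa_english - kappa_random) / (IoC - kappa_random)."""
    if ioc <= kappa_random:
        return float("inf")  # as flat as random text: the estimate blows up
    return (kappa_english - kappa_random) / (ioc - kappa_random)
```

Counting factors with `likely_key_lengths` is more forgiving than the plain GCD. A single coincidental repeat can drag the GCD down to 1, while the true key length usually still divides most distances.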

## Transposition tells

Letter frequencies look English (because letters are only rearranged), but
common bigrams like `TH`, `HE`, `IN` are unusually rare. Rail-fence and
columnar transposition both produce this signature.
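A small sketch of the bigram tell, with a rail-fence encryptor to produce test ciphertext. `COMMON_BIGRAMS` is an assumed short list of frequent English bigrams; a real detector would compare against a full bigram table.

```python
# A few very common English bigrams (an assumption; real tools use full bigram tables).
COMMON_BIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND"}


def common_bigram_rate(text: str) -> float:
    """Fraction of adjacent letter pairs (A-Z only, spaces dropped) that are common English bigrams."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    pairs = [letters[i:i + 2] for i in range(len(letters) - 1)]
    if not pairs:
        return 0.0
    return sum(p in COMMON_BIGRAMS for p in pairs) / len(pairs)


def rail_fence(text: str, rails: int) -> str:
    """Rail-fence encryption: write characters in a zigzag over `rails` rows, then read row by row."""
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for c in text:
        rows[row].append(c)
        if rails > 1:
            if row == 0:
                step = 1
            elif row == rails - 1:
                step = -1
            row += step
    return "".join("".join(r) for r in rows)
```

The rail-fence output has exactly the same letters as its input (so the histogram and IoC are unchanged), but `common_bigram_rate` typically drops sharply.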

## Affine

Affine: $E(x) = (a x + b) \bmod 26$ with $\gcd(a, 26) = 1$, so that $a$ has an
inverse mod 26 and $D(y) = a^{-1}(y - b) \bmod 26$ undoes it. Only 12 valid `a`
values × 26 `b` values = 312 keys. Brute-forceable; each candidate is scored
against English.
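A sketch of the 312-key brute force in Python. It uses `pow(a, -1, 26)` for the modular inverse, which needs Python 3.8 or later. It also assumes chi-squared scoring against an approximate English frequency table; the function names and frequency values are illustrative, not the project's.

```python
from collections import Counter
from math import gcd

# Approximate English letter frequencies (%) for A..Z; assumed textbook values.
ENGLISH_PCT = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
               6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]

VALID_A = [a for a in range(26) if gcd(a, 26) == 1]  # the 12 invertible multipliers


def affine_encrypt(text: str, a: int, b: int) -> str:
    """E(x) = (a x + b) mod 26 on A-Z; other characters pass through."""
    return "".join(
        chr((a * (ord(c) - 65) + b) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text.upper()
    )


def affine_decrypt(text: str, a: int, b: int) -> str:
    """D(y) = a^-1 (y - b) mod 26 on A-Z; other characters pass through."""
    a_inv = pow(a, -1, 26)  # modular inverse (Python 3.8+)
    return "".join(
        chr(a_inv * (ord(c) - 65 - b) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text.upper()
    )


def chi_squared(text: str) -> float:
    """Chi-squared of uppercase A-Z counts against ENGLISH_PCT (lower = more English-like)."""
    counts = Counter(c for c in text if "A" <= c <= "Z")
    n = sum(counts.values())
    if n == 0:
        return float("inf")
    total = sum(ENGLISH_PCT)
    return sum((counts.get(chr(65 + i), 0) - n * p / total) ** 2 / (n * p / total)
               for i, p in enumerate(ENGLISH_PCT))


def affine_candidates(ciphertext: str) -> list[tuple[float, int, int, str]]:
    """Try all 12 x 26 keys; return (score, a, b, plaintext), best first."""
    results = []
    for a in VALID_A:
        for b in range(26):
            plain = affine_decrypt(ciphertext, a, b)
            results.append((chi_squared(plain), a, b, plain))
    return sorted(results)
```

With `a = 5, b = 8`, `affine_encrypt("AFFINE CIPHER", 5, 8)` gives `"IHHWVC SWFRCP"`. The key `a = 1` reduces to a Caesar shift, so Caesar ciphertexts also show up among the affine candidates.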

## Substitution

If frequency analysis matches English shape but Caesar / Atbash fail, suspect a
general monoalphabetic substitution. With $26! \approx 4 \times 10^{26}$ keys it
cannot be brute-forced; solving it cleanly needs iterative hill-climbing over an
English language model, which is outside the scope of this demo.
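Hill-climbing itself is out of scope here, but its usual starting point is easy to sketch. The idea is to map ciphertext letters to English letters by frequency rank. The sketch assumes the common textbook ordering in `ENGLISH_ORDER`, and both function names are hypothetical. On its own, such a key rarely decrypts cleanly; it only gives a search a reasonable place to begin.

```python
from collections import Counter

# English letters from most to least frequent (a common textbook ordering, assumed here).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def frequency_rank_guess(ciphertext: str) -> dict[str, str]:
    """Map the i-th most common ciphertext letter to the i-th most common English letter."""
    counts = Counter(c for c in ciphertext.upper() if "A" <= c <= "Z")
    ranked = [c for c, _ in counts.most_common()]
    # Letters that never occur go last, in alphabetical order, so the key covers all 26.
    ranked += [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in counts]
    return dict(zip(ranked, ENGLISH_ORDER))


def apply_key(ciphertext: str, key: dict[str, str]) -> str:
    """Substitute every mapped letter; characters outside the key pass through."""
    return "".join(key.get(c, c) for c in ciphertext.upper())
```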

## Reality check

These signals work because classical ciphers leak structure. Modern symmetric
encryption (AES-GCM, ChaCha20-Poly1305) and modern public-key cryptography do
not leak any of this information; none of these techniques apply to them.