--- license: mit language: en library_name: pytorch tags: - cryptography - elliptic-curve - secp256k1 - bitcoin - negative-result - research pipeline_tag: tabular-classification --- # secp256k1 Parity Prediction — Negative-Result Study **TL;DR — This is a negative-result artifact.** Five model families were trained to predict the parity of the secp256k1 private key `k` from features of the public-key point `(X, Y)`. All converged to **exactly 50% accuracy** on held-out k's, consistent with the long-held cryptographic belief that even one bit of the discrete logarithm is as hard to recover as the whole key. > Do **not** use these models for any cryptographic application. They cannot predict the parity of `k`, by construction. ## Task Given a secp256k1 public-key point `Q = k·G = (X, Y)`, predict `k mod 2` (whether the underlying private scalar is odd or even). This is the "least-significant bit" of the discrete logarithm. ## Theoretical context (carefully stated) There is a body of work showing that individual bits of the discrete logarithm are "hardcore" — that predicting any single bit with non-negligible advantage implies efficiently solving the full discrete log: - **Blum & Micali (1984)** — introduced the framework of hardcore predicates and proved the MSB ("which half") of DL in Z*_p is hardcore. - **Long & Wigderson (1988)** — extended this to O(log log p) bits. - **Boneh & Venkatesan (1996)** — hardness of the most significant bits of Diffie–Hellman shared secrets. - **Håstad & Näslund (≈2004, J. ACM)** — *"The Security of all RSA and Discrete Log Bits"*. Proved every individual bit of the discrete log in Z*_p (and of the RSA function) is hardcore. These foundational results are for the **multiplicative group** Z*_p (and RSA), **not literally for elliptic curves**. The analogous statement is widely *accepted* and *expected* to hold for ECDLP on curves like secp256k1, and follows from the same proof techniques applied to the ECDLP setting, but a single canonical "ECDLP bit hardness" reference of the same standing as Håstad–Näslund is harder to point to cleanly. In practice this is the universally held view in the cryptographic community. The empirical result here is consistent with that belief: no architecture we tried beats 50%. ## Data - **Range:** `k = 1 … 1,000,000` (sequential) - **Generator:** the standard Bitcoin generator `G` - **Splits:** sequential by `k` — train 70% (k=1..700K), val 15% (k=700K..850K), holdout 15% (k=850K..1M). - **Inputs:** 44 engineered features of `X` and `Y` (decimal digit statistics, mod residues, bit-length, popcount, comparisons). Target column `k_state` and traceability column `k` are excluded from inputs. See `feature_cols.json` for the full input schema. ## Models | Model | Architecture | Params | Holdout acc | Holdout AUC | Train time | |---|---|---|---|---|---| | Logistic Regression | sklearn baseline | tiny | 0.4982 | 0.4983 | ~3 s | | XGBoost | 400 trees, depth 6 | — | 0.5033 | 0.5036 | ~3 s | | LightGBM | 400 trees, 63 leaves | — | 0.5002 | 0.4998 | ~11 s | | MLP | 4 layers, 512–512–256 | ~1M | 0.4997 | 0.5009 | ~8 s | | **BitTransformer** | 4-layer encoder, d=128, over raw 512-bit (X\|Y) | 859K | **0.5000** | **0.4998** | ~2.8 h | | Permutation sanity (XGBoost on shuffled labels) | — | — | 0.4994 | 0.4994 | — | All accuracies are statistically indistinguishable from 0.5. The BitTransformer specifically tested the hypothesis that bit-level nested XOR/AND patterns might leak the parity — they don't. ## Repository structure ``` . ├── README.md # this file ├── metrics.json # full per-model metrics ├── feature_cols.json # input feature schema ├── data/ │ └── features_sample_1k.parquet # 1K-row sample of training data ├── models/ │ ├── xgb.json # XGBoost │ ├── lgbm.txt # LightGBM │ ├── mlp.pt # PyTorch MLP state_dict │ └── bit_xformer.pt # PyTorch BitTransformer state_dict └── scripts/ ├── gen.py # generates feature parquet from k range ├── train.py # trains logreg / XGBoost / LightGBM / MLP ├── bit_transformer.py # trains the BitTransformer ├── predict.py # single-point inference ├── batch_eval.py # batch evaluation on fresh k range └── build_html.py # generates a result-table HTML ``` ## Reproducing ```bash # 1. install deps pip install numpy pandas pyarrow scikit-learn xgboost lightgbm torch huggingface_hub # 2. generate features (k=1..1M) python scripts/gen.py 1000000 # 3. train the fast models python scripts/train.py # 4. train the bit-transformer (takes ~3 h on an NVIDIA L4) python scripts/bit_transformer.py ``` Generation is single-threaded Python big-int arithmetic and is the wall-clock bottleneck (~5 min for 1M points). Training the fast models is seconds; the BitTransformer is hours due to sequence length 513 with 4 attention layers. ## Loading the models ```python # XGBoost import xgboost as xgb m = xgb.XGBClassifier(); m.load_model("models/xgb.json") # LightGBM import lightgbm as lgb m = lgb.Booster(model_file="models/lgbm.txt") # MLP import torch, torch.nn as nn D = 44 mlp = nn.Sequential(nn.Linear(D,512), nn.ReLU(), nn.Linear(512,512), nn.ReLU(), nn.Linear(512,256), nn.ReLU(), nn.Linear(256,1)) mlp.load_state_dict(torch.load("models/mlp.pt")) # BitTransformer class BitTransformer(nn.Module): def __init__(self, seq_len=512, d=128, nhead=4, nlayers=4): super().__init__() self.tok = nn.Embedding(2, d) self.pos = nn.Parameter(torch.randn(1, seq_len, d) * 0.02) self.cls = nn.Parameter(torch.randn(1, 1, d) * 0.02) enc = nn.TransformerEncoderLayer(d_model=d, nhead=nhead, dim_feedforward=4*d, batch_first=True, activation="gelu", norm_first=True) self.enc = nn.TransformerEncoder(enc, num_layers=nlayers) self.head = nn.Linear(d, 1) def forward(self, x_bits): h = self.tok(x_bits) + self.pos cls = self.cls.expand(h.size(0), -1, -1) h = torch.cat([cls, h], dim=1) h = self.enc(h) return self.head(h[:, 0, :]).squeeze(1) bx = BitTransformer() bx.load_state_dict(torch.load("models/bit_xformer.pt")) ``` ## Intended use - **Research:** confirming empirically that off-the-shelf ML cannot recover the LSB of a secp256k1 private key from its public-key coordinates. - **Education:** demonstrating how features that "look informative" (digit sums, mod residues, popcount) carry zero signal about cryptographic targets. - **Negative control / sanity check** for related ML-cryptanalysis projects. ## Out-of-scope use - Any actual cryptographic key-recovery attempt. These models cannot do it; the underlying theory says no efficient model can. - Any conclusion about non-curve-based ciphers or curves other than secp256k1. ## Hardware & cost - Trained on a single GCP `g2-standard-8` instance with 1× NVIDIA L4 GPU. - Total wall-clock: ~3 hours (mostly the BitTransformer). - Total cost: ~$5 USD. ## License MIT. ## Citation If you reference this in research, please cite by the repo URL.