secp256k1 Parity Prediction β Negative-Result Study
TL;DR β This is a negative-result artifact. Five model families were trained to predict the parity of the secp256k1 private key k from features of the public-key point (X, Y). All converged to exactly 50% accuracy on held-out k's, consistent with the long-held cryptographic belief that even one bit of the discrete logarithm is as hard to recover as the whole key.
Do not use these models for any cryptographic application. They cannot predict the parity of
k, by construction.
Task
Given a secp256k1 public-key point Q = kΒ·G = (X, Y), predict k mod 2 (whether the underlying private scalar is odd or even).
This is the "least-significant bit" of the discrete logarithm.
Theoretical context (carefully stated)
There is a body of work showing that individual bits of the discrete logarithm are "hardcore" β that predicting any single bit with non-negligible advantage implies efficiently solving the full discrete log:
- Blum & Micali (1984) β introduced the framework of hardcore predicates and proved the MSB ("which half") of DL in Z*_p is hardcore.
- Long & Wigderson (1988) β extended this to O(log log p) bits.
- Boneh & Venkatesan (1996) β hardness of the most significant bits of DiffieβHellman shared secrets.
- HΓ₯stad & NΓ€slund (β2004, J. ACM) β "The Security of all RSA and Discrete Log Bits". Proved every individual bit of the discrete log in Z*_p (and of the RSA function) is hardcore.
These foundational results are for the multiplicative group Z*_p (and RSA), not literally for elliptic curves. The analogous statement is widely accepted and expected to hold for ECDLP on curves like secp256k1, and follows from the same proof techniques applied to the ECDLP setting, but a single canonical "ECDLP bit hardness" reference of the same standing as HΓ₯stadβNΓ€slund is harder to point to cleanly. In practice this is the universally held view in the cryptographic community.
The empirical result here is consistent with that belief: no architecture we tried beats 50%.
Data
- Range:
k = 1 β¦ 1,000,000(sequential) - Generator: the standard Bitcoin generator
G - Splits: sequential by
kβ train 70% (k=1..700K), val 15% (k=700K..850K), holdout 15% (k=850K..1M). - Inputs: 44 engineered features of
XandY(decimal digit statistics, mod residues, bit-length, popcount, comparisons). Target columnk_stateand traceability columnkare excluded from inputs.
See feature_cols.json for the full input schema.
Models
| Model | Architecture | Params | Holdout acc | Holdout AUC | Train time |
|---|---|---|---|---|---|
| Logistic Regression | sklearn baseline | tiny | 0.4982 | 0.4983 | ~3 s |
| XGBoost | 400 trees, depth 6 | β | 0.5033 | 0.5036 | ~3 s |
| LightGBM | 400 trees, 63 leaves | β | 0.5002 | 0.4998 | ~11 s |
| MLP | 4 layers, 512β512β256 | ~1M | 0.4997 | 0.5009 | ~8 s |
| BitTransformer | 4-layer encoder, d=128, over raw 512-bit (X|Y) | 859K | 0.5000 | 0.4998 | ~2.8 h |
| Permutation sanity (XGBoost on shuffled labels) | β | β | 0.4994 | 0.4994 | β |
All accuracies are statistically indistinguishable from 0.5. The BitTransformer specifically tested the hypothesis that bit-level nested XOR/AND patterns might leak the parity β they don't.
Repository structure
.
βββ README.md # this file
βββ metrics.json # full per-model metrics
βββ feature_cols.json # input feature schema
βββ data/
β βββ features_sample_1k.parquet # 1K-row sample of training data
βββ models/
β βββ xgb.json # XGBoost
β βββ lgbm.txt # LightGBM
β βββ mlp.pt # PyTorch MLP state_dict
β βββ bit_xformer.pt # PyTorch BitTransformer state_dict
βββ scripts/
βββ gen.py # generates feature parquet from k range
βββ train.py # trains logreg / XGBoost / LightGBM / MLP
βββ bit_transformer.py # trains the BitTransformer
βββ predict.py # single-point inference
βββ batch_eval.py # batch evaluation on fresh k range
βββ build_html.py # generates a result-table HTML
Reproducing
# 1. install deps
pip install numpy pandas pyarrow scikit-learn xgboost lightgbm torch huggingface_hub
# 2. generate features (k=1..1M)
python scripts/gen.py 1000000
# 3. train the fast models
python scripts/train.py
# 4. train the bit-transformer (takes ~3 h on an NVIDIA L4)
python scripts/bit_transformer.py
Generation is single-threaded Python big-int arithmetic and is the wall-clock bottleneck (~5 min for 1M points). Training the fast models is seconds; the BitTransformer is hours due to sequence length 513 with 4 attention layers.
Loading the models
# XGBoost
import xgboost as xgb
m = xgb.XGBClassifier(); m.load_model("models/xgb.json")
# LightGBM
import lightgbm as lgb
m = lgb.Booster(model_file="models/lgbm.txt")
# MLP
import torch, torch.nn as nn
D = 44
mlp = nn.Sequential(nn.Linear(D,512), nn.ReLU(),
nn.Linear(512,512), nn.ReLU(),
nn.Linear(512,256), nn.ReLU(),
nn.Linear(256,1))
mlp.load_state_dict(torch.load("models/mlp.pt"))
# BitTransformer
class BitTransformer(nn.Module):
def __init__(self, seq_len=512, d=128, nhead=4, nlayers=4):
super().__init__()
self.tok = nn.Embedding(2, d)
self.pos = nn.Parameter(torch.randn(1, seq_len, d) * 0.02)
self.cls = nn.Parameter(torch.randn(1, 1, d) * 0.02)
enc = nn.TransformerEncoderLayer(d_model=d, nhead=nhead, dim_feedforward=4*d,
batch_first=True, activation="gelu", norm_first=True)
self.enc = nn.TransformerEncoder(enc, num_layers=nlayers)
self.head = nn.Linear(d, 1)
def forward(self, x_bits):
h = self.tok(x_bits) + self.pos
cls = self.cls.expand(h.size(0), -1, -1)
h = torch.cat([cls, h], dim=1)
h = self.enc(h)
return self.head(h[:, 0, :]).squeeze(1)
bx = BitTransformer()
bx.load_state_dict(torch.load("models/bit_xformer.pt"))
Intended use
- Research: confirming empirically that off-the-shelf ML cannot recover the LSB of a secp256k1 private key from its public-key coordinates.
- Education: demonstrating how features that "look informative" (digit sums, mod residues, popcount) carry zero signal about cryptographic targets.
- Negative control / sanity check for related ML-cryptanalysis projects.
Out-of-scope use
- Any actual cryptographic key-recovery attempt. These models cannot do it; the underlying theory says no efficient model can.
- Any conclusion about non-curve-based ciphers or curves other than secp256k1.
Hardware & cost
- Trained on a single GCP
g2-standard-8instance with 1Γ NVIDIA L4 GPU. - Total wall-clock: ~3 hours (mostly the BitTransformer).
- Total cost: ~$5 USD.
License
MIT.
Citation
If you reference this in research, please cite by the repo URL.