File size: 7,442 Bytes
6b93c3b 30219a6 6b93c3b 30219a6 6b93c3b 30219a6 6b93c3b | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 | ---
license: mit
language: en
library_name: pytorch
tags:
- cryptography
- elliptic-curve
- secp256k1
- bitcoin
- negative-result
- research
pipeline_tag: tabular-classification
---
# secp256k1 Parity Prediction β Negative-Result Study
**TL;DR β This is a negative-result artifact.** Five model families were trained to predict the parity of the secp256k1 private key `k` from features of the public-key point `(X, Y)`. All converged to **exactly 50% accuracy** on held-out k's, consistent with the long-held cryptographic belief that even one bit of the discrete logarithm is as hard to recover as the whole key.
> Do **not** use these models for any cryptographic application. They cannot predict the parity of `k`, by construction.
## Task
Given a secp256k1 public-key point `Q = kΒ·G = (X, Y)`, predict `k mod 2` (whether the underlying private scalar is odd or even).
This is the "least-significant bit" of the discrete logarithm.
## Theoretical context (carefully stated)
There is a body of work showing that individual bits of the discrete logarithm are "hardcore" β that predicting any single bit with non-negligible advantage implies efficiently solving the full discrete log:
- **Blum & Micali (1984)** β introduced the framework of hardcore predicates and proved the MSB ("which half") of DL in Z*_p is hardcore.
- **Long & Wigderson (1988)** β extended this to O(log log p) bits.
- **Boneh & Venkatesan (1996)** β hardness of the most significant bits of DiffieβHellman shared secrets.
- **HΓ₯stad & NΓ€slund (β2004, J. ACM)** β *"The Security of all RSA and Discrete Log Bits"*. Proved every individual bit of the discrete log in Z*_p (and of the RSA function) is hardcore.
These foundational results are for the **multiplicative group** Z*_p (and RSA), **not literally for elliptic curves**. The analogous statement is widely *accepted* and *expected* to hold for ECDLP on curves like secp256k1, and follows from the same proof techniques applied to the ECDLP setting, but a single canonical "ECDLP bit hardness" reference of the same standing as HΓ₯stadβNΓ€slund is harder to point to cleanly. In practice this is the universally held view in the cryptographic community.
The empirical result here is consistent with that belief: no architecture we tried beats 50%.
## Data
- **Range:** `k = 1 β¦ 1,000,000` (sequential)
- **Generator:** the standard Bitcoin generator `G`
- **Splits:** sequential by `k` β train 70% (k=1..700K), val 15% (k=700K..850K), holdout 15% (k=850K..1M).
- **Inputs:** 44 engineered features of `X` and `Y` (decimal digit statistics, mod residues, bit-length, popcount, comparisons). Target column `k_state` and traceability column `k` are excluded from inputs.
See `feature_cols.json` for the full input schema.
## Models
| Model | Architecture | Params | Holdout acc | Holdout AUC | Train time |
|---|---|---|---|---|---|
| Logistic Regression | sklearn baseline | tiny | 0.4982 | 0.4983 | ~3 s |
| XGBoost | 400 trees, depth 6 | β | 0.5033 | 0.5036 | ~3 s |
| LightGBM | 400 trees, 63 leaves | β | 0.5002 | 0.4998 | ~11 s |
| MLP | 4 layers, 512β512β256 | ~1M | 0.4997 | 0.5009 | ~8 s |
| **BitTransformer** | 4-layer encoder, d=128, over raw 512-bit (X\|Y) | 859K | **0.5000** | **0.4998** | ~2.8 h |
| Permutation sanity (XGBoost on shuffled labels) | β | β | 0.4994 | 0.4994 | β |
All accuracies are statistically indistinguishable from 0.5. The BitTransformer specifically tested the hypothesis that bit-level nested XOR/AND patterns might leak the parity β they don't.
## Repository structure
```
.
βββ README.md # this file
βββ metrics.json # full per-model metrics
βββ feature_cols.json # input feature schema
βββ data/
β βββ features_sample_1k.parquet # 1K-row sample of training data
βββ models/
β βββ xgb.json # XGBoost
β βββ lgbm.txt # LightGBM
β βββ mlp.pt # PyTorch MLP state_dict
β βββ bit_xformer.pt # PyTorch BitTransformer state_dict
βββ scripts/
βββ gen.py # generates feature parquet from k range
βββ train.py # trains logreg / XGBoost / LightGBM / MLP
βββ bit_transformer.py # trains the BitTransformer
βββ predict.py # single-point inference
βββ batch_eval.py # batch evaluation on fresh k range
βββ build_html.py # generates a result-table HTML
```
## Reproducing
```bash
# 1. install deps
pip install numpy pandas pyarrow scikit-learn xgboost lightgbm torch huggingface_hub
# 2. generate features (k=1..1M)
python scripts/gen.py 1000000
# 3. train the fast models
python scripts/train.py
# 4. train the bit-transformer (takes ~3 h on an NVIDIA L4)
python scripts/bit_transformer.py
```
Generation is single-threaded Python big-int arithmetic and is the wall-clock bottleneck (~5 min for 1M points). Training the fast models is seconds; the BitTransformer is hours due to sequence length 513 with 4 attention layers.
## Loading the models
```python
# XGBoost
import xgboost as xgb
m = xgb.XGBClassifier(); m.load_model("models/xgb.json")
# LightGBM
import lightgbm as lgb
m = lgb.Booster(model_file="models/lgbm.txt")
# MLP
import torch, torch.nn as nn
D = 44
mlp = nn.Sequential(nn.Linear(D,512), nn.ReLU(),
nn.Linear(512,512), nn.ReLU(),
nn.Linear(512,256), nn.ReLU(),
nn.Linear(256,1))
mlp.load_state_dict(torch.load("models/mlp.pt"))
# BitTransformer
class BitTransformer(nn.Module):
def __init__(self, seq_len=512, d=128, nhead=4, nlayers=4):
super().__init__()
self.tok = nn.Embedding(2, d)
self.pos = nn.Parameter(torch.randn(1, seq_len, d) * 0.02)
self.cls = nn.Parameter(torch.randn(1, 1, d) * 0.02)
enc = nn.TransformerEncoderLayer(d_model=d, nhead=nhead, dim_feedforward=4*d,
batch_first=True, activation="gelu", norm_first=True)
self.enc = nn.TransformerEncoder(enc, num_layers=nlayers)
self.head = nn.Linear(d, 1)
def forward(self, x_bits):
h = self.tok(x_bits) + self.pos
cls = self.cls.expand(h.size(0), -1, -1)
h = torch.cat([cls, h], dim=1)
h = self.enc(h)
return self.head(h[:, 0, :]).squeeze(1)
bx = BitTransformer()
bx.load_state_dict(torch.load("models/bit_xformer.pt"))
```
## Intended use
- **Research:** confirming empirically that off-the-shelf ML cannot recover the LSB of a secp256k1 private key from its public-key coordinates.
- **Education:** demonstrating how features that "look informative" (digit sums, mod residues, popcount) carry zero signal about cryptographic targets.
- **Negative control / sanity check** for related ML-cryptanalysis projects.
## Out-of-scope use
- Any actual cryptographic key-recovery attempt. These models cannot do it; the underlying theory says no efficient model can.
- Any conclusion about non-curve-based ciphers or curves other than secp256k1.
## Hardware & cost
- Trained on a single GCP `g2-standard-8` instance with 1Γ NVIDIA L4 GPU.
- Total wall-clock: ~3 hours (mostly the BitTransformer).
- Total cost: ~$5 USD.
## License
MIT.
## Citation
If you reference this in research, please cite by the repo URL.
|