Correct theoretical-foundations section: distinguish multiplicative-group DL bit-hardness (Hastad-Naslund, J.ACM ~2004) from ECDLP, which inherits the result by analogy but is not literally that paper.

30219a6 verified 7 days ago

preview code

raw

history blame contribute delete

7.44 kB

	---
	license: mit
	language: en
	library_name: pytorch
	tags:
	- cryptography
	- elliptic-curve
	- secp256k1
	- bitcoin
	- negative-result
	- research
	pipeline_tag: tabular-classification
	---

	# secp256k1 Parity Prediction — Negative-Result Study

	TL;DR — This is a negative-result artifact. Five model families were trained to predict the parity of the secp256k1 private key `k` from features of the public-key point `(X, Y)`. All converged to exactly 50% accuracy on held-out k's, consistent with the long-held cryptographic belief that even one bit of the discrete logarithm is as hard to recover as the whole key.

	> Do not use these models for any cryptographic application. They cannot predict the parity of `k`, by construction.

	## Task

	Given a secp256k1 public-key point `Q = k·G = (X, Y)`, predict `k mod 2` (whether the underlying private scalar is odd or even).

	This is the "least-significant bit" of the discrete logarithm.

	## Theoretical context (carefully stated)

	There is a body of work showing that individual bits of the discrete logarithm are "hardcore" — that predicting any single bit with non-negligible advantage implies efficiently solving the full discrete log:

	- Blum & Micali (1984) — introduced the framework of hardcore predicates and proved the MSB ("which half") of DL in Z*_p is hardcore.
	- Long & Wigderson (1988) — extended this to O(log log p) bits.
	- Boneh & Venkatesan (1996) — hardness of the most significant bits of Diffie–Hellman shared secrets.
	- Håstad & Näslund (≈2004, J. ACM) — "The Security of all RSA and Discrete Log Bits". Proved every individual bit of the discrete log in Z*_p (and of the RSA function) is hardcore.

	These foundational results are for the multiplicative group Z_p (and RSA), not literally for elliptic curves. The analogous statement is widely accepted* and expected to hold for ECDLP on curves like secp256k1, and follows from the same proof techniques applied to the ECDLP setting, but a single canonical "ECDLP bit hardness" reference of the same standing as Håstad–Näslund is harder to point to cleanly. In practice this is the universally held view in the cryptographic community.

	The empirical result here is consistent with that belief: no architecture we tried beats 50%.

	## Data

	- Range: `k = 1 … 1,000,000` (sequential)
	- Generator: the standard Bitcoin generator `G`
	- Splits: sequential by `k` — train 70% (k=1..700K), val 15% (k=700K..850K), holdout 15% (k=850K..1M).
	- Inputs: 44 engineered features of `X` and `Y` (decimal digit statistics, mod residues, bit-length, popcount, comparisons). Target column `k_state` and traceability column `k` are excluded from inputs.

	See `feature_cols.json` for the full input schema.

	## Models

	\| Model \| Architecture \| Params \| Holdout acc \| Holdout AUC \| Train time \|
	\|---\|---\|---\|---\|---\|---\|
	\| Logistic Regression \| sklearn baseline \| tiny \| 0.4982 \| 0.4983 \| ~3 s \|
	\| XGBoost \| 400 trees, depth 6 \| — \| 0.5033 \| 0.5036 \| ~3 s \|
	\| LightGBM \| 400 trees, 63 leaves \| — \| 0.5002 \| 0.4998 \| ~11 s \|
	\| MLP \| 4 layers, 512–512–256 \| ~1M \| 0.4997 \| 0.5009 \| ~8 s \|
	\| BitTransformer \| 4-layer encoder, d=128, over raw 512-bit (X\\|Y) \| 859K \| 0.5000 \| 0.4998 \| ~2.8 h \|
	\| Permutation sanity (XGBoost on shuffled labels) \| — \| — \| 0.4994 \| 0.4994 \| — \|

	All accuracies are statistically indistinguishable from 0.5. The BitTransformer specifically tested the hypothesis that bit-level nested XOR/AND patterns might leak the parity — they don't.

	## Repository structure

	```
	.
	├── README.md # this file
	├── metrics.json # full per-model metrics
	├── feature_cols.json # input feature schema
	├── data/
	│ └── features_sample_1k.parquet # 1K-row sample of training data
	├── models/
	│ ├── xgb.json # XGBoost
	│ ├── lgbm.txt # LightGBM
	│ ├── mlp.pt # PyTorch MLP state_dict
	│ └── bit_xformer.pt # PyTorch BitTransformer state_dict
	└── scripts/
	├── gen.py # generates feature parquet from k range
	├── train.py # trains logreg / XGBoost / LightGBM / MLP
	├── bit_transformer.py # trains the BitTransformer
	├── predict.py # single-point inference
	├── batch_eval.py # batch evaluation on fresh k range
	└── build_html.py # generates a result-table HTML
	```

	## Reproducing

	```bash
	# 1. install deps
	pip install numpy pandas pyarrow scikit-learn xgboost lightgbm torch huggingface_hub

	# 2. generate features (k=1..1M)
	python scripts/gen.py 1000000

	# 3. train the fast models
	python scripts/train.py

	# 4. train the bit-transformer (takes ~3 h on an NVIDIA L4)
	python scripts/bit_transformer.py
	```

	Generation is single-threaded Python big-int arithmetic and is the wall-clock bottleneck (~5 min for 1M points). Training the fast models is seconds; the BitTransformer is hours due to sequence length 513 with 4 attention layers.

	## Loading the models

	```python
	# XGBoost
	import xgboost as xgb
	m = xgb.XGBClassifier(); m.load_model("models/xgb.json")

	# LightGBM
	import lightgbm as lgb
	m = lgb.Booster(model_file="models/lgbm.txt")

	# MLP
	import torch, torch.nn as nn
	D = 44
	mlp = nn.Sequential(nn.Linear(D,512), nn.ReLU(),
	nn.Linear(512,512), nn.ReLU(),
	nn.Linear(512,256), nn.ReLU(),
	nn.Linear(256,1))
	mlp.load_state_dict(torch.load("models/mlp.pt"))

	# BitTransformer
	class BitTransformer(nn.Module):
	def __init__(self, seq_len=512, d=128, nhead=4, nlayers=4):
	super().__init__()
	self.tok = nn.Embedding(2, d)
	self.pos = nn.Parameter(torch.randn(1, seq_len, d) * 0.02)
	self.cls = nn.Parameter(torch.randn(1, 1, d) * 0.02)
	enc = nn.TransformerEncoderLayer(d_model=d, nhead=nhead, dim_feedforward=4*d,
	batch_first=True, activation="gelu", norm_first=True)
	self.enc = nn.TransformerEncoder(enc, num_layers=nlayers)
	self.head = nn.Linear(d, 1)
	def forward(self, x_bits):
	h = self.tok(x_bits) + self.pos
	cls = self.cls.expand(h.size(0), -1, -1)
	h = torch.cat([cls, h], dim=1)
	h = self.enc(h)
	return self.head(h[:, 0, :]).squeeze(1)
	bx = BitTransformer()
	bx.load_state_dict(torch.load("models/bit_xformer.pt"))
	```

	## Intended use

	- Research: confirming empirically that off-the-shelf ML cannot recover the LSB of a secp256k1 private key from its public-key coordinates.
	- Education: demonstrating how features that "look informative" (digit sums, mod residues, popcount) carry zero signal about cryptographic targets.
	- Negative control / sanity check for related ML-cryptanalysis projects.

	## Out-of-scope use

	- Any actual cryptographic key-recovery attempt. These models cannot do it; the underlying theory says no efficient model can.
	- Any conclusion about non-curve-based ciphers or curves other than secp256k1.

	## Hardware & cost

	- Trained on a single GCP `g2-standard-8` instance with 1× NVIDIA L4 GPU.
	- Total wall-clock: ~3 hours (mostly the BitTransformer).
	- Total cost: ~$5 USD.

	## License

	MIT.

	## Citation

	If you reference this in research, please cite by the repo URL.