metadata
title: Bitcoin Abuse Scoring (GAT / GATv2)
emoji: 🧭
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

Bitcoin Abuse Scoring (GAT / GATv2): Hugging Face Space

This Space builds an ego-subgraph around a given Bitcoin transaction hash (k steps backward and forward), then runs two pretrained GNN models (a GAT baseline and an enhanced GATv2), both trained on the Elliptic dataset, to score how likely the center transaction is to be abusive.

✅ Features

  • Data sources (public JSON APIs, no scraping): mempool.space / blockstream.info (Esplora), fallback to Blockchair (optional key).
  • Ego-subgraph expansion k ∈ {1,2,3} (both parents & children).
  • Graph safeguards: MAX_NODES & MAX_EDGES to avoid explosion.
  • Node features: degree stats, value sums/logs, counts, ratio, distance-to-center, block height.
  • Features are standardized on the fly. If your model was trained with different features or a different scaler, keep USE_FEATURE_ADAPTER=true (the default): it inserts a Linear projection from the computed features to the model's expected input dimension (165 by default).
  • Two models are loaded from the Hugging Face Hub, each with its own decision threshold (read from thresholds.json, falling back to 0.5).
  • Rate limit: 20 requests/min globally (sliding window).
  • Visualizations: ego-graph (pyvis HTML) & histogram of scores per model.
  • CPU-only deployment on Spaces.
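
The expansion and safeguard bullets above can be sketched as a breadth-first search that stops adding to the graph once either cap is reached. This is a minimal sketch under stated assumptions, not the app's actual code: `expand_ego`, `get_parents`, and `get_children` are hypothetical names, and the neighbor lookups are injected as callables so the example runs without network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set, Tuple


def expand_ego(
    center: str,
    k: int,
    get_parents: Callable[[str], Iterable[str]],
    get_children: Callable[[str], Iterable[str]],
    max_nodes: int = 5000,
    max_edges: int = 15000,
) -> Tuple[Dict[str, int], Set[Tuple[str, str]], bool]:
    """Return (distance-to-center per node, directed edges, truncated flag)."""
    dist: Dict[str, int] = {center: 0}
    edges: Set[Tuple[str, str]] = set()
    queue = deque([center])
    truncated = False
    while queue:
        tx = queue.popleft()
        if dist[tx] >= k:
            continue  # never expand beyond k hops from the center
        # Edges follow the flow of funds: parent -> tx -> child.
        links = [(p, tx, p) for p in get_parents(tx)]
        links += [(tx, c, c) for c in get_children(tx)]
        for src, dst, neighbor in links:
            if (src, dst) in edges:
                continue
            is_new = neighbor not in dist
            if len(edges) >= max_edges or (is_new and len(dist) >= max_nodes):
                truncated = True  # safeguard hit: keep what was collected so far
                continue
            edges.add((src, dst))
            if is_new:
                dist[neighbor] = dist[tx] + 1
                queue.append(neighbor)
    return dist, edges, truncated
```

The returned distances double as the distance-to-center node feature, and the `truncated` flag gives the caller something to log when a safeguard stops the expansion.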

🔧 Configuration

Set these Environment Variables (Space → Settings → Variables):

HF_GAT_BASELINE_REPO=org/name_gat_baseline
HF_GATV2_REPO=org/name_gatv2

# (Optional overrides)
IN_CHANNELS=165
HIDDEN_CHANNELS=128
HEADS=8
NUM_BLOCKS=2
DROPOUT=0.5

DATA_PROVIDER=mempool    # mempool | blockstream | blockchair
HTTP_TIMEOUT=10
HTTP_RETRIES=2
MAX_NODES=5000
MAX_EDGES=15000
USE_FEATURE_ADAPTER=true
DEFAULT_THRESHOLD=0.5
QUEUE_CONCURRENCY=2
BLOCKCHAIR_API_KEY=
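
One way these variables might be read at startup is shown below. It is a sketch only: the `Settings` dataclass and `load_settings` function are hypothetical names, and it assumes the values listed above are also the defaults when a variable is unset or empty (the README confirms this only for IN_CHANNELS, USE_FEATURE_ADAPTER, and DEFAULT_THRESHOLD).

```python
import os
from dataclasses import dataclass
from typing import Mapping

PROVIDERS = {"mempool", "blockstream", "blockchair"}
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    in_channels: int
    hidden_channels: int
    heads: int
    num_blocks: int
    dropout: float
    data_provider: str
    http_timeout: float
    http_retries: int
    max_nodes: int
    max_edges: int
    use_feature_adapter: bool
    default_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    def get(name: str, default: str) -> str:
        # Treat unset and empty variables the same way.
        value = env.get(name, "").strip()
        return value if value else default

    provider = get("DATA_PROVIDER", "mempool").lower()
    if provider not in PROVIDERS:
        raise ValueError(f"unknown DATA_PROVIDER: {provider!r}")
    return Settings(
        in_channels=int(get("IN_CHANNELS", "165")),
        hidden_channels=int(get("HIDDEN_CHANNELS", "128")),
        heads=int(get("HEADS", "8")),
        num_blocks=int(get("NUM_BLOCKS", "2")),
        dropout=float(get("DROPOUT", "0.5")),
        data_provider=provider,
        http_timeout=float(get("HTTP_TIMEOUT", "10")),
        http_retries=int(get("HTTP_RETRIES", "2")),
        max_nodes=int(get("MAX_NODES", "5000")),
        max_edges=int(get("MAX_EDGES", "15000")),
        use_feature_adapter=get("USE_FEATURE_ADAPTER", "true").lower() in TRUTHY,
        default_threshold=float(get("DEFAULT_THRESHOLD", "0.5")),
    )
```

Parsing everything once into an immutable object surfaces a malformed value at startup rather than in the middle of a request.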

Each model repo should contain:

  • model.pt: PyTorch Geometric weights.
  • (optional) thresholds.json with a key like {"threshold": 0.42}.
  • (optional) scaler.joblib if you want to reuse the training scaler.
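
The optional threshold file could be handled along the following lines. This is an illustrative sketch, assuming the repo has already been downloaded to a local directory; `load_threshold` is a hypothetical helper, and treating a malformed or out-of-range value as "missing" is a design choice of this example, not documented app behavior.

```python
import json
from pathlib import Path
from typing import Union


def load_threshold(repo_dir: Union[str, Path], default: float = 0.5) -> float:
    """Read {"threshold": <float>} from thresholds.json, else return default."""
    path = Path(repo_dir) / "thresholds.json"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        value = float(data["threshold"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, invalid JSON, missing key, or non-numeric value.
        return default
    # A decision threshold on a probability must lie in [0, 1]; NaN fails this test too.
    return value if 0.0 <= value <= 1.0 else default
```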

📦 API Usage in App

  • GET /api/tx/{txid} and GET /api/tx/{txid}/outspends (Esplora).
  • GET /bitcoin/dashboards/transaction/{txid} (Blockchair).

All calls have timeouts & retries and use a small in-memory cache.
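
A request helper with a timeout, bounded retries, and an in-memory cache might look like the sketch below. It uses only the standard library; `fetch_json`, the exponential backoff, and the unbounded dict cache are assumptions made for illustration and may differ from what app.py does. The transport is injectable so the helper can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

_CACHE: Dict[str, Any] = {}  # url -> decoded JSON (no eviction in this sketch)


def _http_get(url: str, timeout: float) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_json(
    url: str,
    timeout: float = 10.0,
    retries: int = 2,
    backoff: float = 0.5,
    get: Callable[[str, float], bytes] = _http_get,
) -> Any:
    """GET a JSON document, retrying on network or decoding errors."""
    if url in _CACHE:
        return _CACHE[url]
    last_error = None
    for attempt in range(retries + 1):
        try:
            data = json.loads(get(url, timeout))
            _CACHE[url] = data
            return data
        except (OSError, ValueError) as exc:  # URLError and timeouts are OSError
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"GET {url} failed after {retries + 1} attempts") from last_error
```

With the Esplora paths listed above, a call would take the form `fetch_json(f"https://mempool.space/api/tx/{txid}")`; the base URL here is an assumption based on the providers named in the Features section.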

🚦 Rate Limiting

A global limit of 20 requests/min applies across the whole app (sliding window). Requests over the limit are rejected with the message "Rate limit exceeded (20 req/min)".
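
A sliding-window limiter of this kind can be sketched with a deque of timestamps. The class below is illustrative only: `SlidingWindowLimiter` is a hypothetical name, and the injectable clock exists purely so the behavior can be checked without waiting a real minute.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds interval."""

    def __init__(
        self,
        max_requests: int = 20,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits: Deque[float] = deque()
        self._lock = threading.Lock()  # one limiter is shared by all requests

    def allow(self) -> bool:
        now = self._clock()
        with self._lock:
            # Drop timestamps that have slid out of the window.
            while self._hits and now - self._hits[0] >= self._window:
                self._hits.popleft()
            if len(self._hits) >= self._max:
                return False
            self._hits.append(now)
            return True
```

A request handler would call `allow()` first and return the rate-limit message when it yields `False`.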

🧪 Acceptance Criteria

  • Enter a valid tx hash & k=2 → ego-graph is built, both models run, and the app displays:
    • probability, threshold, label for GAT and GATv2,
    • counts of nodes/edges and notes (e.g., FeatureAdapter used).
  • Ego-graph renders with center highlighted; tooltips show txid and score.
  • If the first provider fails, the app falls back.
  • If graph exceeds safeguards, the app stops expansion and warns in logs (but still infers with what it has).
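
The provider fallback criterion can be illustrated with a small helper that tries each provider in order and returns the first successful result. This is a sketch under assumptions: `fetch_with_fallback` is a hypothetical name, and each provider is represented by a plain callable rather than a real API client.

```python
from typing import Any, Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], Any]]


def fetch_with_fallback(txid: str, providers: Sequence[Provider]) -> Tuple[str, Any]:
    """Return (provider name, result) from the first provider that succeeds."""
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return name, fetch(txid)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the data lets the app report which source actually served the request.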

⚠️ Notes

  • Domain shift: Features from on-chain crawls can differ from Elliptic; use the adapter and consider fine-tuning for production.
  • Public APIs have their own rate limits. This app is conservative with requests, but heavy usage may still hit external limits.
  • Input is validated to be a 64-hex txid. No arbitrary URLs are accepted.
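
The txid check in the last note can be expressed with a single anchored regular expression. The helper name `normalize_txid` and the lowercasing step are assumptions of this sketch, not confirmed details of the app.

```python
import re

_TXID_RE = re.compile(r"[0-9a-fA-F]{64}")


def normalize_txid(raw: str) -> str:
    """Return the lowercase txid, or raise ValueError if it is not 64 hex chars."""
    txid = raw.strip()
    # fullmatch rejects anything with extra characters, including URLs and paths.
    if not _TXID_RE.fullmatch(txid):
        raise ValueError("expected a 64-character hexadecimal txid")
    return txid.lower()
```

Because only the validated hash is ever interpolated into a provider URL, user input cannot redirect requests to an arbitrary host.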