Build error

Upload folder using huggingface_hub

Files changed:
- .gitignore +1 -0
- README.md +72 -13
- app.py +137 -0
- config.py +41 -0
- explorers.py +189 -0
- features.py +90 -0
- graph_builder.py +132 -0
- inference.py +109 -0
- models.py +78 -0
- rate_limit.py +34 -0
- requirements.txt +28 -0
- viz.py +44 -0
.gitignore
ADDED

```
.venv/**
```
README.md
CHANGED

# Bitcoin Abuse Scoring (GAT / GATv2) — Hugging Face Space

This Space builds an **ego-subgraph** from a given Bitcoin transaction hash (`k` steps backward & forward), then runs **two pretrained GNN models** (GAT baseline & GATv2 enhanced) trained on **Elliptic** to score whether the center transaction is *abuse*.

## ✅ Features

- Data sources (public JSON APIs, no scraping): `mempool.space` / `blockstream.info` (Esplora), fallback to `Blockchair` (optional key).
- Ego-subgraph expansion **k ∈ {1,2,3}** (both parents & children).
- Graph safeguards: `MAX_NODES` & `MAX_EDGES` to avoid explosion.
- Node features: degree stats, value sums/logs, counts, ratio, distance-to-center, block height.
- Standardized features (on-the-fly). If your model used different features/scaler, set `USE_FEATURE_ADAPTER=true` (default) — it inserts a `Linear` projection to the expected input dimension (165 by default).
- Two models are loaded from **Hugging Face Hub** with thresholds (via `thresholds.json` or fallback `0.5`).
- **Rate limit**: 20 requests/min globally (sliding window).
- Visualizations: **ego-graph (pyvis HTML)** & **histogram of scores** per model.
- CPU-only deployment on Spaces.

## 🔧 Configuration

Set these **Environment Variables** (Space → Settings → Variables):

```
HF_GAT_BASELINE_REPO=org/name_gat_baseline
HF_GATV2_REPO=org/name_gatv2

# (Optional overrides)
IN_CHANNELS=165
HIDDEN_CHANNELS=128
HEADS=8
NUM_BLOCKS=2
DROPOUT=0.5

DATA_PROVIDER=mempool   # mempool | blockstream | blockchair
HTTP_TIMEOUT=10
HTTP_RETRIES=2
MAX_NODES=5000
MAX_EDGES=15000
USE_FEATURE_ADAPTER=true
DEFAULT_THRESHOLD=0.5
QUEUE_CONCURRENCY=2
BLOCKCHAIR_API_KEY=
```

Each model repo should contain:

- `model.pt` — PyTorch Geometric weights.
- (optional) `thresholds.json` with a key like `{"threshold": 0.42}`.
- (optional) `scaler.joblib` if you want to reuse the training scaler.

## 📦 API Usage in App

- `GET /api/tx/{txid}` and `GET /api/tx/{txid}/outspends` (Esplora).
- `GET /bitcoin/dashboards/transaction/{txid}` (Blockchair).

All calls have **timeouts & retries** and use a small **in-memory cache**.

## 🚦 Rate Limiting

Global limit `20 req/min` across the app (sliding window). Exceeding it returns `Rate limit exceeded (20 req/min)`.

## 🧪 Acceptance Criteria

- Enter a valid tx hash & `k=2` → ego-graph is built, both models run, and the app displays:
  - `probability`, `threshold`, `label` for **GAT** and **GATv2**,
  - counts of nodes/edges and notes (e.g., *FeatureAdapter used*).
- Ego-graph renders with center highlighted; tooltips show txid and score.
- If the first provider fails, the app falls back to the next one.
- If the graph exceeds safeguards, the app stops expansion and warns in logs (but still infers with what it has).

## ⚠️ Notes

- **Domain shift**: Features from on-chain crawls can differ from Elliptic; use the adapter and consider fine-tuning for production.
- Public APIs have their own rate limits — this app is conservative with requests, but heavy usage may still hit external limits.
- Input is validated to be a 64-hex txid. No arbitrary URLs are accepted.
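As a sketch of the Esplora transaction shape the app consumes (field names follow the `/api/tx/{txid}` schema listed above; the sample values and addresses are made up, not a real transaction):

```python
# Minimal illustration of the Esplora /api/tx/{txid} payload the app parses.
# Only the field names follow the Esplora schema; values are illustrative.
sample_tx = {
    "txid": "ab" * 32,
    "vin": [
        {"txid": "cd" * 32, "vout": 0,
         "prevout": {"value": 150_000, "scriptpubkey_address": "bc1qexample"}},
    ],
    "vout": [
        {"value": 90_000, "scriptpubkey_address": "bc1qdest1"},
        {"value": 55_000, "scriptpubkey_address": "bc1qdest2"},
    ],
    "status": {"block_height": 800_000, "block_time": 1_690_000_000},
}

def fee_sats(tx: dict) -> int:
    """Fee = sum of input prevout values minus sum of output values (satoshis)."""
    in_sum = sum(v["prevout"]["value"] for v in tx["vin"])
    out_sum = sum(o["value"] for o in tx["vout"])
    return in_sum - out_sum

print(fee_sats(sample_tx))  # 5000
```

The `prevout` sub-object is why input values are available without a second lookup: Esplora inlines the spent output into each `vin` entry.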
app.py
ADDED

```python
import re
import traceback

import gradio as gr
import numpy as np
import pandas as pd
import torch
from torch_geometric.data import Data
from torch_geometric.utils import coalesce

from config import AppConfig
from rate_limit import GlobalRateLimiter, RateLimitExceeded
from graph_builder import expand_ego
from features import build_features, scale_features
from inference import load_models, run_for_both_models
from viz import render_ego_html, histogram_scores

CFG = AppConfig()
RATE = GlobalRateLimiter(CFG.MAX_CALLS_PER_MIN, CFG.WINDOW_SECONDS)

_models_cache = None


def _is_valid_txid(tx: str) -> bool:
    return bool(re.fullmatch(r"[0-9a-fA-F]{64}", tx or ""))


def _load_models_once():
    global _models_cache
    if _models_cache is None:
        _models_cache = load_models(CFG)
    return _models_cache


@RATE.enforce()
def handle_run(tx_hash: str, k: int, provider: str):
    # Every return must carry one value per wired output, in order:
    # (table, html_gat, html_gatv2, logs, hist_gat, hist_gatv2).
    logs = []
    try:
        if not _is_valid_txid(tx_hash):
            return None, None, None, "❌ Invalid txid. Please enter a 64-hex transaction hash.", None, None

        k = int(k)
        if k < 1 or k > 3:
            return None, None, None, "❌ k must be in {1,2,3}.", None, None

        logs.append(f"Fetching ego-subgraph for {tx_hash} with k={k} via {provider}…")
        nodes, edges, center_idx, node_meta, gb_logs = expand_ego(tx_hash, k, provider, CFG)
        logs.extend(gb_logs or [])
        if center_idx < 0 or len(nodes) == 0:
            logs.append("❌ Failed to build subgraph (see logs).")
            return None, None, None, "\n".join(logs), None, None

        if len(edges) == 0:
            logs.append("⚠️ No edges in ego-graph; proceeding with single-node graph.")

        # Build features
        X, feat_names = build_features(nodes, edges, center_idx, node_meta)
        Xs, scaler_used, scale_note = scale_features(X, scaler=None)  # a scaler from the model repo can be injected here
        logs.append(scale_note)

        # PyG Data
        if len(edges) > 0:
            edge_index = torch.tensor(np.array(edges).T, dtype=torch.long)  # shape [2, E]
        else:
            edge_index = torch.empty((2, 0), dtype=torch.long)
        edge_index = coalesce(edge_index)

        data = Data(
            x=torch.tensor(Xs, dtype=torch.float32),
            edge_index=edge_index,
        )

        bundles = _load_models_once()
        results = run_for_both_models(bundles, data, center_idx, CFG)

        # Compose output table
        records = []
        for name, probs, thr, label, note in results:
            records.append({
                "tx_hash": tx_hash,
                "model_name": name,
                "probability": float(probs[center_idx]),
                "threshold": float(thr),
                "pred_label": int(label),
                "k_used": int(k),
                "num_nodes": int(len(nodes)),
                "num_edges": int(len(edges)),
                "note": note,
            })
        df = pd.DataFrame(records)

        # Visuals (two HTML ego-graphs)
        html_gat = render_ego_html(nodes, edges, center_idx, scores=results[0][1])
        html_gatv2 = render_ego_html(nodes, edges, center_idx, scores=results[1][1])

        # Histogram of scores for the subgraph
        fig_hist_gat = histogram_scores(results[0][1], title="Scores (GAT)")
        fig_hist_v2 = histogram_scores(results[1][1], title="Scores (GATv2)")

        return df, html_gat, html_gatv2, "\n".join(logs), fig_hist_gat, fig_hist_v2

    except RateLimitExceeded as e:
        return None, None, None, f"❌ {e}", None, None
    except Exception as e:
        tb = traceback.format_exc()
        return None, None, None, f"❌ Error: {e}\n\n{tb}", None, None


with gr.Blocks(fill_height=True, theme=gr.themes.Soft()) as app:
    gr.Markdown("## 🧭 Bitcoin Abuse Scoring (GAT / GATv2)\nEnter a transaction hash and k (1–3). The app builds an ego-subgraph from on-chain data and returns model scores.")
    with gr.Row():
        tx_in = gr.Textbox(label="Transaction Hash (64-hex)", placeholder="e.g., 4d3c... (64 hex)")
        k_in = gr.Slider(1, 3, value=2, step=1, label="k (steps before/after)")
        provider_in = gr.Dropdown(choices=["mempool", "blockstream", "blockchair"], value="mempool", label="Data Source")
    run_btn = gr.Button("Run", variant="primary")

    with gr.Row():
        out_table = gr.Dataframe(label="Results (GAT vs GATv2)", interactive=False)
    with gr.Tabs():
        with gr.Tab("Ego-graph (GAT)"):
            out_html_gat = gr.HTML()
            out_hist_gat = gr.Plot(label="Score histogram (GAT)")
        with gr.Tab("Ego-graph (GATv2)"):
            out_html_gatv2 = gr.HTML()
            out_hist_gatv2 = gr.Plot(label="Score histogram (GATv2)")
    out_logs = gr.Textbox(label="Logs", lines=8)

    run_btn.click(
        handle_run,
        inputs=[tx_in, k_in, provider_in],
        outputs=[out_table, out_html_gat, out_html_gatv2, out_logs, out_hist_gat, out_hist_gatv2],
    )

# Note: `concurrency_count` is the Gradio 3.x parameter name; Gradio 4
# renamed it to `default_concurrency_limit`.
app.queue(concurrency_count=CFG.QUEUE_CONCURRENCY, max_size=32)

if __name__ == "__main__":
    app.launch(server_name="0.0.0.0", server_port=7860)
```
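`rate_limit.py` is listed in this commit (+34 lines) but its diff is not shown above. Based only on how app.py uses it — `GlobalRateLimiter(max_calls, window_seconds)`, an `.enforce()` decorator factory, and a `RateLimitExceeded` exception — a sliding-window implementation could look like the following. This is a sketch inferred from the call sites, not the actual file; the injectable `clock` parameter is an addition for testability:

```python
import time
from collections import deque
from functools import wraps

class RateLimitExceeded(Exception):
    pass

class GlobalRateLimiter:
    """Global sliding-window limiter: at most max_calls per window_seconds."""

    def __init__(self, max_calls: int, window_seconds: int, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock      # injectable for testing (hypothetical extra arg)
        self.calls = deque()    # timestamps of calls still inside the window

    def check(self):
        now = self.clock()
        # Evict timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            raise RateLimitExceeded(f"Rate limit exceeded ({self.max_calls} req/min)")
        self.calls.append(now)

    def enforce(self):
        def deco(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                self.check()
                return fn(*args, **kwargs)
            return wrapper
        return deco
```

With `max_calls=20` and `window_seconds=60` this reproduces the README's "20 req/min (sliding window)" behavior: the 21st call within any rolling 60-second span raises `RateLimitExceeded`.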
config.py
ADDED

```python
import os
from dataclasses import dataclass

@dataclass
class AppConfig:
    # --- Hugging Face model repos (set via environment variables on Spaces) ---
    HF_GAT_BASELINE_REPO: str = os.getenv("HF_GAT_BASELINE_REPO", "org/name_gat_baseline")
    HF_GATV2_REPO: str = os.getenv("HF_GATV2_REPO", "org/name_gatv2")

    # Expected input dim of Elliptic-trained models (given by user)
    IN_CHANNELS: int = int(os.getenv("IN_CHANNELS", "165"))
    HIDDEN_CHANNELS: int = int(os.getenv("HIDDEN_CHANNELS", "128"))
    HEADS: int = int(os.getenv("HEADS", "8"))
    NUM_BLOCKS: int = int(os.getenv("NUM_BLOCKS", "2"))
    DROPOUT: float = float(os.getenv("DROPOUT", "0.5"))

    # Data providers
    DATA_PROVIDER: str = os.getenv("DATA_PROVIDER", "mempool")  # mempool | blockstream | blockchair
    HTTP_TIMEOUT: int = int(os.getenv("HTTP_TIMEOUT", "10"))
    HTTP_RETRIES: int = int(os.getenv("HTTP_RETRIES", "2"))

    # Graph limits (safeguard)
    MAX_NODES: int = int(os.getenv("MAX_NODES", "5000"))
    MAX_EDGES: int = int(os.getenv("MAX_EDGES", "15000"))

    # Feature handling
    USE_FEATURE_ADAPTER: bool = os.getenv("USE_FEATURE_ADAPTER", "true").lower() == "true"
    MAKE_UNDIRECTED: bool = os.getenv("MAKE_UNDIRECTED", "false").lower() == "true"

    # Threshold fallback
    DEFAULT_THRESHOLD: float = float(os.getenv("DEFAULT_THRESHOLD", "0.5"))

    # Rate limit
    MAX_CALLS_PER_MIN: int = int(os.getenv("MAX_CALLS_PER_MIN", "20"))
    WINDOW_SECONDS: int = int(os.getenv("WINDOW_SECONDS", "60"))

    # Queue config
    QUEUE_CONCURRENCY: int = int(os.getenv("QUEUE_CONCURRENCY", "2"))

    # Blockchair API key (optional)
    BLOCKCHAIR_API_KEY: str = os.getenv("BLOCKCHAIR_API_KEY", "").strip()
```
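Every override above arrives as a string, so booleans need the explicit `.lower() == "true"` comparison — `bool("false")` is `True` in Python. A minimal demonstration of the same parsing pattern (the two environment variables set here are simulated, as they would be in Space Settings → Variables):

```python
import os

# Simulate Space variables (normally set in Settings → Variables).
os.environ["MAX_NODES"] = "1000"
os.environ["USE_FEATURE_ADAPTER"] = "False"

# Same parsing pattern as AppConfig's dataclass field defaults.
max_nodes = int(os.getenv("MAX_NODES", "5000"))
use_adapter = os.getenv("USE_FEATURE_ADAPTER", "true").lower() == "true"

print(max_nodes, use_adapter)  # 1000 False

# Pitfall the .lower() comparison avoids: any non-empty string is truthy.
print(bool("false"))  # True
```

One consequence of reading the environment in dataclass *field defaults* is that the values are captured when `config.py` is first imported, so the variables must be set before that import happens — on Spaces this holds naturally, since the platform injects them before the process starts.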
explorers.py
ADDED

```python
from typing import Dict, Any, List, Optional

import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from cachetools import TTLCache

from config import AppConfig

USER_AGENT = "HF-Space-BTC-Abuse-GNN/1.0 (+https://huggingface.co/spaces)"


class ExplorerError(Exception):
    pass


def _req_json(url: str, timeout: int, retries: int = 2) -> Any:
    @retry(stop=stop_after_attempt(retries), wait=wait_exponential(min=0.5, max=4),
           retry=retry_if_exception_type((requests.Timeout, requests.ConnectionError)))
    def _do():
        r = requests.get(url, timeout=timeout, headers={"User-Agent": USER_AGENT})
        if r.status_code != 200:
            raise ExplorerError(f"HTTP {r.status_code} for {url}")
        return r.json()
    return _do()


def _satoshis_to_btc(v: Optional[int]) -> float:
    try:
        return float(v) / 1e8 if v is not None else 0.0
    except Exception:
        return 0.0


def _normalize_tx_esplora(j: Dict[str, Any]) -> Dict[str, Any]:
    # https://mempool.space/api/tx/{txid}
    vin = j.get("vin", [])
    vout = j.get("vout", [])
    status = j.get("status", {}) or {}
    bh = status.get("block_height")
    bt = status.get("block_time")
    vin_list = []
    for e in vin:
        p = e.get("prevout") or {}
        vin_list.append({
            "txid": e.get("txid"),
            "vout": e.get("vout"),
            "prevout_value": p.get("value"),
            "prevout_address": p.get("scriptpubkey_address") or None,
        })
    vout_list = []
    for idx, e in enumerate(vout):
        vout_list.append({
            "n": idx,
            "value": e.get("value"),
            "address": e.get("scriptpubkey_address") or None,
        })
    return {
        "txid": j.get("txid") or j.get("hash"),
        "vin": vin_list,
        "vout": vout_list,
        "block_height": bh,
        "block_time": bt,
    }


def _normalize_outspends_esplora(j: Any) -> List[Optional[str]]:
    # Returns a list aligned to outputs: the spending txid if spent, else None.
    res = []
    if isinstance(j, list):
        for e in j:
            if isinstance(e, dict) and e.get("spent"):
                res.append(e.get("txid"))
            else:
                res.append(None)
    return res


class BaseExplorer:
    def __init__(self, cfg: AppConfig):
        self.cfg = cfg
        self.cache_tx = TTLCache(maxsize=10000, ttl=300)
        self.cache_out = TTLCache(maxsize=10000, ttl=300)

    def get_tx(self, txid: str) -> Dict[str, Any]:
        raise NotImplementedError

    def get_outspends(self, txid: str) -> List[Optional[str]]:
        raise NotImplementedError


class MempoolSpaceClient(BaseExplorer):
    def __init__(self, cfg: AppConfig, base: str = "https://mempool.space"):
        super().__init__(cfg)
        self.base = base.rstrip("/")

    def get_tx(self, txid: str) -> Dict[str, Any]:
        if txid in self.cache_tx:
            return self.cache_tx[txid]
        url = f"{self.base}/api/tx/{txid}"
        j = _req_json(url, timeout=self.cfg.HTTP_TIMEOUT, retries=self.cfg.HTTP_RETRIES)
        tx = _normalize_tx_esplora(j)
        self.cache_tx[txid] = tx
        return tx

    def get_outspends(self, txid: str) -> List[Optional[str]]:
        if txid in self.cache_out:
            return self.cache_out[txid]
        url = f"{self.base}/api/tx/{txid}/outspends"
        j = _req_json(url, timeout=self.cfg.HTTP_TIMEOUT, retries=self.cfg.HTTP_RETRIES)
        out = _normalize_outspends_esplora(j)
        self.cache_out[txid] = out
        return out


class BlockstreamClient(MempoolSpaceClient):
    def __init__(self, cfg: AppConfig):
        super().__init__(cfg, base="https://blockstream.info")


class BlockchairClient(BaseExplorer):
    def __init__(self, cfg: AppConfig):
        super().__init__(cfg)
        self.base = "https://api.blockchair.com/bitcoin"

    def get_tx(self, txid: str) -> Dict[str, Any]:
        if txid in self.cache_tx:
            return self.cache_tx[txid]
        url = f"{self.base}/dashboards/transaction/{txid}"
        if self.cfg.BLOCKCHAIR_API_KEY:
            url += f"?key={self.cfg.BLOCKCHAIR_API_KEY}"
        j = _req_json(url, timeout=self.cfg.HTTP_TIMEOUT, retries=self.cfg.HTTP_RETRIES)
        data = j.get("data", {}).get(txid, {})
        tx = data.get("transaction", {})
        inputs = data.get("inputs", [])
        outputs = data.get("outputs", [])
        vin_list = [{
            "txid": i.get("spending_transaction_hash") or i.get("recipient_transaction_hash"),
            "vout": i.get("spending_index"),
            "prevout_value": i.get("value"),
            "prevout_address": i.get("recipient"),
        } for i in inputs]
        vout_list = [{
            "n": o.get("index"),
            "value": o.get("value"),
            "address": o.get("recipient"),
        } for o in outputs]
        out = {
            "txid": txid,
            "vin": vin_list,
            "vout": vout_list,
            "block_height": tx.get("block_id"),
            "block_time": tx.get("time"),
        }
        self.cache_tx[txid] = out
        return out

    def get_outspends(self, txid: str) -> List[Optional[str]]:
        # Blockchair exposes outputs with 'spent_by_transaction_hash'.
        if txid in self.cache_out:
            return self.cache_out[txid]
        url = f"{self.base}/dashboards/transaction/{txid}"
        if self.cfg.BLOCKCHAIR_API_KEY:
            url += f"?key={self.cfg.BLOCKCHAIR_API_KEY}"
        j = _req_json(url, timeout=self.cfg.HTTP_TIMEOUT, retries=self.cfg.HTTP_RETRIES)
        outputs = j.get("data", {}).get(txid, {}).get("outputs", [])
        res = [o.get("spent_by_transaction_hash") for o in outputs]
        self.cache_out[txid] = res
        return res


def new_client(cfg: AppConfig, primary: str) -> List[BaseExplorer]:
    # Primary provider first, then fallbacks.
    primary = (primary or cfg.DATA_PROVIDER).lower()
    if primary in ("mempool", "mempool.space"):
        chain = [MempoolSpaceClient(cfg), BlockstreamClient(cfg), BlockchairClient(cfg)]
    elif primary in ("blockstream", "blockstream.info"):
        chain = [BlockstreamClient(cfg), MempoolSpaceClient(cfg), BlockchairClient(cfg)]
    elif primary == "blockchair":
        chain = [BlockchairClient(cfg), MempoolSpaceClient(cfg), BlockstreamClient(cfg)]
    else:
        chain = [MempoolSpaceClient(cfg), BlockstreamClient(cfg), BlockchairClient(cfg)]
    return chain


def fetch_with_fallback(txid: str, cfg: AppConfig, source: str):
    errors = []
    for c in new_client(cfg, source):
        try:
            tx = c.get_tx(txid)
            outspends = c.get_outspends(txid)
            if tx and outspends is not None:
                return c, tx, outspends, None
        except Exception as e:
            errors.append(f"{c.__class__.__name__}: {e}")
            continue
    return None, None, None, errors
```
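The chain built by `new_client` plus `fetch_with_fallback` is a plain ordered-failover pattern: try each client in turn, collect the error messages, and return the first success. The same logic, stripped down to stand alone (with dummy clients in place of the HTTP-backed ones):

```python
class FlakyClient:
    """Stand-in for an explorer client; fails on demand."""
    def __init__(self, name: str, fail: bool):
        self.name, self.fail = name, fail

    def get_tx(self, txid: str) -> dict:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return {"txid": txid, "source": self.name}

def fetch_first_success(txid: str, chain):
    # Mirror of fetch_with_fallback: first client that answers wins;
    # failures are accumulated as human-readable log entries.
    errors = []
    for c in chain:
        try:
            return c.get_tx(txid), errors
        except Exception as e:
            errors.append(f"{c.name}: {e}")
    return None, errors

tx, errs = fetch_first_success("ab" * 32, [
    FlakyClient("mempool", fail=True),
    FlakyClient("blockstream", fail=False),
    FlakyClient("blockchair", fail=False),
])
print(tx["source"], errs)  # blockstream ['mempool: mempool unavailable']
```

Note that each call to `new_client` constructs fresh clients (and thus fresh `TTLCache`s); because app.py routes everything through `expand_ego`, which calls `fetch_with_fallback` per transaction, the caches are per-request rather than process-wide.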
features.py
ADDED

```python
from typing import Dict, Any, List, Tuple
from collections import deque

import numpy as np
from sklearn.preprocessing import StandardScaler


def _sum_inputs_btc(tx: Dict[str, Any]) -> float:
    s = 0.0
    for v in tx.get("vin", []):
        s += float(v.get("prevout_value") or 0.0) / 1e8
    return s


def _sum_outputs_btc(tx: Dict[str, Any]) -> float:
    s = 0.0
    for o in tx.get("vout", []):
        s += float(o.get("value") or 0.0) / 1e8
    return s


def _compute_distances(n: int, edges: List[Tuple[int, int]], center: int) -> np.ndarray:
    # Undirected BFS distance from the center; -1 marks unreachable nodes.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = np.full(n, fill_value=-1, dtype=np.int32)
    q = deque([center])
    dist[center] = 0
    while q:
        u = q.popleft()
        for nb in adj[u]:
            if dist[nb] == -1:
                dist[nb] = dist[u] + 1
                q.append(nb)
    return dist


def build_features(nodes: List[str], edges: List[Tuple[int, int]], center_idx: int,
                   node_meta: Dict[str, Dict[str, Any]]):
    n = len(nodes)
    # Degrees
    out_deg = np.zeros(n, dtype=np.float32)
    in_deg = np.zeros(n, dtype=np.float32)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    deg = in_deg + out_deg
    ratio_in_out = in_deg / (out_deg + 1e-6)

    # Sums & counts from metadata
    sum_in_btc = np.zeros(n, dtype=np.float32)
    sum_out_btc = np.zeros(n, dtype=np.float32)
    n_inputs = np.zeros(n, dtype=np.float32)
    n_outputs = np.zeros(n, dtype=np.float32)
    block_height = np.zeros(n, dtype=np.float32)

    for idx, txid in enumerate(nodes):
        meta = node_meta.get(txid) or {}
        n_inputs[idx] = float(len(meta.get("vin", []) or []))
        n_outputs[idx] = float(len(meta.get("vout", []) or []))
        sum_in_btc[idx] = _sum_inputs_btc(meta)
        sum_out_btc[idx] = _sum_outputs_btc(meta)
        bh = meta.get("block_height")
        block_height[idx] = float(bh) if bh is not None else 0.0

    log_sum_in = np.log1p(sum_in_btc)
    log_sum_out = np.log1p(sum_out_btc)
    distance = _compute_distances(n, edges, center_idx)

    feats = np.stack([
        in_deg, out_deg, deg, ratio_in_out,
        n_inputs, n_outputs,
        sum_in_btc, sum_out_btc,
        log_sum_in, log_sum_out,
        distance.astype(np.float32),
        block_height,
    ], axis=1)

    feature_names = [
        "in_degree", "out_degree", "degree", "ratio_in_out",
        "n_inputs", "n_outputs",
        "sum_in_btc", "sum_out_btc",
        "log_sum_in", "log_sum_out",
        "distance", "block_height",
    ]
    return feats, feature_names


def scale_features(X: np.ndarray, scaler=None):
    if scaler is None:
        scaler = StandardScaler()
        Xs = scaler.fit_transform(X)
        note = "Fitted new StandardScaler on ego-subgraph (domain shift vs Elliptic)."
    else:
        Xs = scaler.transform(X)
        note = "Used provided scaler from model repo."
    return Xs.astype("float32"), scaler, note
```
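`_compute_distances` treats the directed edges as undirected for the hop count and leaves -1 on nodes unreachable from the center, which then flows into the feature matrix as -1.0. A self-contained check of that behavior on a tiny graph (pure-Python lists in place of the NumPy array, same algorithm):

```python
from collections import deque

def bfs_distances(n: int, edges, center: int):
    # Undirected BFS hop-count from `center`; -1 marks unreachable nodes,
    # mirroring _compute_distances in features.py.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * n
    dist[center] = 0
    q = deque([center])
    while q:
        u = q.popleft()
        for nb in adj[u]:
            if dist[nb] == -1:
                dist[nb] = dist[u] + 1
                q.append(nb)
    return dist

# Chain 0 -> 1 -> 2 plus an isolated node 3; center at node 0.
print(bfs_distances(4, [(0, 1), (1, 2)], 0))  # [0, 1, 2, -1]
```

In practice an ego-graph built by `expand_ego` is connected through the center, so -1 should not occur there; the sentinel mainly matters if the function is reused on arbitrary edge lists.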
graph_builder.py
ADDED
|
@@ -0,0 +1,132 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from typing import Dict, Any, List, Tuple, Set
|
| 2 |
+
from collections import deque, defaultdict
|
| 3 |
+
|
| 4 |
+
from explorers import fetch_with_fallback
|
| 5 |
+
from config import AppConfig
|
| 6 |
+
|
| 7 |
+
def expand_ego(txid: str, k: int, source: str, cfg: AppConfig):
|
| 8 |
+
"""
|
| 9 |
+
Expand ego-subgraph up to k steps backward (parents) and forward (children).
|
| 10 |
+
Returns:
|
| 11 |
+
nodes: List[str] txids
|
| 12 |
+
edges: List[Tuple[int,int]] parent->child indices
|
| 13 |
+
center_idx: int
|
| 14 |
+
node_meta: Dict[txid, dict]
|
| 15 |
+
logs: List[str]
|
| 16 |
+
"""
|
| 17 |
+
logs = []
|
| 18 |
+
client, tx0, out0, errs = fetch_with_fallback(txid, cfg, source)
|
| 19 |
+
if client is None:
|
| 20 |
+
return [], [], -1, {}, ["All providers failed", *(errs or [])]
|
| 21 |
+
|
| 22 |
+
nodes: List[str] = []
|
| 23 |
+
idx_map: Dict[str, int] = {}
|
| 24 |
+
edges: List[Tuple[int,int]] = []
|
| 25 |
+
node_meta: Dict[str, Dict[str, Any]] = {}
|
| 26 |
+
|
| 27 |
+
def add_node(tid: str, meta: Dict[str, Any]):
|
| 28 |
+
if tid in idx_map:
|
| 29 |
+
return idx_map[tid]
|
| 30 |
+
if len(nodes) >= cfg.MAX_NODES:
|
| 31 |
+
return None
|
| 32 |
+
idx = len(nodes)
|
| 33 |
+
nodes.append(tid)
|
| 34 |
+
idx_map[tid] = idx
|
| 35 |
+
node_meta[tid] = meta
|
| 36 |
+
return idx
|
| 37 |
+
|
| 38 |
+
def ensure_tx(tid: str):
|
| 39 |
+
c, tj, outsp, _ = fetch_with_fallback(tid, cfg, source)
|
| 40 |
+
if tj is None:
|
| 41 |
+
return None, None
|
| 42 |
+
return tj, outsp
|
| 43 |
+
|
| 44 |
+
# seed
|
| 45 |
+
center_idx = add_node(txid, tx0)
|
| 46 |
+
if center_idx is None:
|
| 47 |
+
return [], [], -1, {}, ["MAX_NODES reached at seed"]
|
| 48 |
+
frontier_par = deque([(txid, 0)])
|
| 49 |
+
frontier_ch = deque([(txid, 0)])
|
| 50 |
+
|
| 51 |
+
# BFS backward (parents)
|
| 52 |
+
while frontier_par:
|
| 53 |
+
cur, depth = frontier_par.popleft()
|
| 54 |
+
if depth >= k:
|
| 55 |
+
continue
|
| 56 |
+
        tj = node_meta.get(cur)
        if tj is None:
            t, o = ensure_tx(cur)
            if t is None:
                continue
            node_meta[cur] = t
            tj = t
        for vi in tj.get("vin", []):
            ptx = vi.get("txid")
            if not ptx:
                continue
            if ptx not in idx_map:
                if len(nodes) >= cfg.MAX_NODES:
                    logs.append("MAX_NODES reached during backward expansion")
                    break
                ptj, pout = ensure_tx(ptx)
                if ptj is None:
                    continue
                pidx = add_node(ptx, ptj)
                if pidx is None:
                    continue
                frontier_par.append((ptx, depth + 1))
            else:
                pidx = idx_map[ptx]
            # edge parent -> child
            cidx = idx_map.get(cur)
            if cidx is not None:
                edges.append((pidx, cidx))
            if cfg.MAX_EDGES and len(edges) >= cfg.MAX_EDGES:
                logs.append("MAX_EDGES reached; stopping further edge additions")
                break

    # BFS forward (children)
    while frontier_ch:
        cur, depth = frontier_ch.popleft()
        if depth >= k:
            continue
        tj = node_meta.get(cur)
        if tj is None:
            t, o = ensure_tx(cur)
            if t is None:
                continue
            node_meta[cur] = t
            tj = t
        outsp = None
        try:
            _, _, outsp, _ = fetch_with_fallback(cur, cfg, source)
        except Exception:
            outsp = None
        if outsp is None:
            continue
        for child_tx in outsp:
            if not child_tx:
                continue
            if child_tx not in idx_map:
                if len(nodes) >= cfg.MAX_NODES:
                    logs.append("MAX_NODES reached during forward expansion")
                    break
                ctj, cout = ensure_tx(child_tx)
                if ctj is None:
                    continue
                cidx = add_node(child_tx, ctj)
                if cidx is None:
                    continue
                frontier_ch.append((child_tx, depth + 1))
            else:
                cidx = idx_map[child_tx]
            pidx = idx_map.get(cur)
            if pidx is not None:
                edges.append((pidx, cidx))
            if cfg.MAX_EDGES and len(edges) >= cfg.MAX_EDGES:
                logs.append("MAX_EDGES reached; stopping further edge additions")
                break

    # deduplicate edges
    edges = list(set(edges))
    return nodes, edges, center_idx, node_meta, logs
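The builder returns `nodes` (txids), `edges` (index pairs into `nodes`), and `center_idx`, which downstream code packs into a PyG-style `edge_index`. A minimal sketch of that packing in plain Python — the `tensorize_edges` helper name is illustrative, not part of the repo:

```python
from typing import List, Tuple

def tensorize_edges(edges: List[Tuple[int, int]]) -> List[List[int]]:
    # Deduplicate (as graph_builder does with list(set(edges))) and split
    # pairs into the 2 x E source/target layout expected for edge_index.
    uniq = sorted(set(edges))
    src = [u for u, _ in uniq]
    dst = [v for _, v in uniq]
    return [src, dst]

edge_index = tensorize_edges([(0, 1), (0, 1), (1, 2)])
# duplicate (0, 1) collapses to a single edge: [[0, 1], [1, 2]]
```

Wrapping the result in `torch.tensor(edge_index, dtype=torch.long)` gives the `edge_index` consumed by the models.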
inference.py
ADDED
@@ -0,0 +1,109 @@
import json
import os
from typing import Dict, Any, Tuple, Optional

import torch
from huggingface_hub import snapshot_download
from torch_geometric.data import Data
from torch_geometric.utils import to_undirected

from config import AppConfig
from models import GATBaseline, GATv2Enhanced, AdapterWrapper

def _load_threshold(model_dir: str, default_thr: float) -> float:
    for name in ["thresholds.json", "threshold.json", "config.json"]:
        p = os.path.join(model_dir, name)
        if os.path.exists(p):
            try:
                d = json.load(open(p, "r"))
                for k in ["threshold", "default_threshold", "thr", "best_f1", "best_j"]:
                    if k in d and isinstance(d[k], (int, float)):
                        return float(d[k])
            except Exception:
                continue
    return default_thr

def _load_scaler(model_dir: str):
    # Optional scaler joblib/pkl
    for name in ["scaler.joblib", "scaler.pkl", "elliptic_scaler.joblib", "elliptic_scaler.pkl"]:
        p = os.path.join(model_dir, name)
        if os.path.exists(p):
            try:
                import joblib
                return joblib.load(p)
            except Exception:
                pass
    return None

def load_models(cfg: AppConfig):
    # Download both repos
    dir_gat = snapshot_download(cfg.HF_GAT_BASELINE_REPO, local_dir_use_symlinks=False)
    dir_gatv2 = snapshot_download(cfg.HF_GATV2_REPO, local_dir_use_symlinks=False)

    # Model files
    ckpt_gat = os.path.join(dir_gat, "model.pt")
    ckpt_gatv2 = os.path.join(dir_gatv2, "model.pt")
    if not os.path.exists(ckpt_gat):
        raise FileNotFoundError(f"Missing model.pt in {dir_gat}")
    if not os.path.exists(ckpt_gatv2):
        raise FileNotFoundError(f"Missing model.pt in {dir_gatv2}")

    # Build cores (expected input dim from training)
    core_gat = GATBaseline(cfg.IN_CHANNELS, cfg.HIDDEN_CHANNELS, cfg.HEADS, cfg.NUM_BLOCKS, cfg.DROPOUT)
    core_gatv2 = GATv2Enhanced(cfg.IN_CHANNELS, cfg.HIDDEN_CHANNELS, cfg.HEADS, cfg.NUM_BLOCKS, cfg.DROPOUT)

    state_gat = torch.load(ckpt_gat, map_location="cpu")
    state_gatv2 = torch.load(ckpt_gatv2, map_location="cpu")

    # strict load for cores
    core_gat.load_state_dict(state_gat, strict=True)
    core_gatv2.load_state_dict(state_gatv2, strict=True)

    thr_gat = _load_threshold(dir_gat, cfg.DEFAULT_THRESHOLD)
    thr_gatv2 = _load_threshold(dir_gatv2, cfg.DEFAULT_THRESHOLD)

    scaler_gat = _load_scaler(dir_gat)
    scaler_gatv2 = _load_scaler(dir_gatv2)

    return {
        "gat": {"core": core_gat.eval(), "threshold": thr_gat, "scaler": scaler_gat, "repo_dir": dir_gat},
        "gatv2": {"core": core_gatv2.eval(), "threshold": thr_gatv2, "scaler": scaler_gatv2, "repo_dir": dir_gatv2},
    }

@torch.no_grad()
def predict(model, data: Data):
    logits = model(data.x, data.edge_index)
    probs = torch.sigmoid(logits).cpu().numpy()
    return probs

def adapt_and_predict(bundle: Dict[str, Any], in_dim_new: int, data: Data, cfg: AppConfig):
    core = bundle["core"]
    if in_dim_new != cfg.IN_CHANNELS and cfg.USE_FEATURE_ADAPTER:
        model = AdapterWrapper(in_dim_new, cfg.IN_CHANNELS, core).eval()
        note = f"FeatureAdapter used (new_dim={in_dim_new} → expected={cfg.IN_CHANNELS})."
    elif in_dim_new != cfg.IN_CHANNELS:
        # attempt to run without adapter (not recommended)
        model = core.eval()
        note = f"Dimension mismatch (new_dim={in_dim_new}, expected={cfg.IN_CHANNELS}). Proceeding without adapter (may fail)."
    else:
        model = core.eval()
        note = "Input dim matches."
    probs = predict(model, data)
    return probs, note

def run_for_both_models(bundles, data: Data, center_idx: int, cfg: AppConfig):
    in_dim_new = data.x.shape[1]

    probs_g, note_g = adapt_and_predict(bundles["gat"], in_dim_new, data, cfg)
    thr_g = float(bundles["gat"]["threshold"])
    label_g = int(probs_g[center_idx] >= thr_g)

    probs_v2, note_v2 = adapt_and_predict(bundles["gatv2"], in_dim_new, data, cfg)
    thr_v2 = float(bundles["gatv2"]["threshold"])
    label_v2 = int(probs_v2[center_idx] >= thr_v2)

    return [
        ("GAT", probs_g, thr_g, label_g, note_g),
        ("GATv2", probs_v2, thr_v2, label_v2, note_v2),
    ]
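`_load_threshold` scans `thresholds.json`, `threshold.json`, then `config.json` for the first numeric value among the keys `threshold`, `default_threshold`, `thr`, `best_f1`, `best_j`, falling back to the default. A self-contained, stdlib-only sketch of that lookup — `read_threshold` is an illustrative name, not the repo's function:

```python
import json
import os
import tempfile

KEYS = ["threshold", "default_threshold", "thr", "best_f1", "best_j"]

def read_threshold(model_dir: str, default_thr: float = 0.5) -> float:
    # Same file and key priority order as inference._load_threshold.
    for name in ["thresholds.json", "threshold.json", "config.json"]:
        p = os.path.join(model_dir, name)
        if not os.path.exists(p):
            continue
        try:
            d = json.load(open(p, "r"))
        except Exception:
            continue
        for k in KEYS:
            if isinstance(d.get(k), (int, float)):
                return float(d[k])
    return default_thr

with tempfile.TemporaryDirectory() as tmp:
    json.dump({"threshold": 0.85}, open(os.path.join(tmp, "thresholds.json"), "w"))
    assert read_threshold(tmp) == 0.85  # reads the stored threshold
assert read_threshold("/nonexistent_model_dir") == 0.5  # falls back to default
```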
models.py
ADDED
@@ -0,0 +1,78 @@
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, GATv2Conv, BatchNorm

class ResidualGATBlock(nn.Module):
    def __init__(self, in_channels, hidden_channels, heads=8, dropout=0.5, v2=False):
        super().__init__()
        Conv = GATv2Conv if v2 else GATConv
        self.conv = Conv(in_channels, hidden_channels, heads=heads, dropout=dropout)
        self.bn = BatchNorm(hidden_channels * heads)
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        self.res_proj = None
        out_dim = hidden_channels * heads
        if in_channels != out_dim:
            self.res_proj = nn.Linear(in_channels, out_dim)

    def forward(self, x, edge_index):
        identity = x
        out = self.conv(x, edge_index)
        out = self.bn(out)
        out = self.act(out)
        out = self.dropout(out)
        if self.res_proj is not None:
            identity = self.res_proj(identity)
        return out + identity

class GATBaseline(nn.Module):
    def __init__(self, in_channels, hidden_channels=128, heads=8, num_blocks=2, dropout=0.5):
        super().__init__()
        layers = []
        c_in = in_channels
        for _ in range(num_blocks):
            layers.append(ResidualGATBlock(c_in, hidden_channels, heads=heads, dropout=dropout, v2=False))
            c_in = hidden_channels * heads
        self.blocks = nn.ModuleList(layers)
        self.dropout = nn.Dropout(dropout)
        self.out_conv = GATConv(c_in, 1, heads=1, concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        for block in self.blocks:
            x = block(x, edge_index)
        x = self.dropout(x)
        out = self.out_conv(x, edge_index)
        return out.view(-1)

class GATv2Enhanced(nn.Module):
    def __init__(self, in_channels, hidden_channels=128, heads=8, num_blocks=2, dropout=0.5):
        super().__init__()
        layers = []
        c_in = in_channels
        for _ in range(num_blocks):
            layers.append(ResidualGATBlock(c_in, hidden_channels, heads=heads, dropout=dropout, v2=True))
            c_in = hidden_channels * heads
        self.blocks = nn.ModuleList(layers)
        self.dropout = nn.Dropout(dropout)
        self.out_conv = GATv2Conv(c_in, 1, heads=1, concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        for block in self.blocks:
            x = block(x, edge_index)
        x = self.dropout(x)
        out = self.out_conv(x, edge_index)
        return out.view(-1)

class AdapterWrapper(nn.Module):
    def __init__(self, in_dim_new, expected_in_dim, core_model):
        super().__init__()
        if in_dim_new != expected_in_dim:
            self.adapter = nn.Linear(in_dim_new, expected_in_dim, bias=True)
        else:
            self.adapter = None
        self.core = core_model

    def forward(self, x, edge_index):
        if self.adapter is not None:
            x = self.adapter(x)
        return self.core(x, edge_index)
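Each `ResidualGATBlock` widens its input to `hidden_channels * heads` (the heads are concatenated), so with the defaults (`hidden_channels=128`, `heads=8`) every block outputs 1024 features, and the residual path needs a `Linear` projection only when the input width differs. A quick, dependency-free check of the dimension flow — `block_dims` is an illustrative helper mirroring the loop in `GATBaseline.__init__`:

```python
def block_dims(in_channels: int, hidden_channels: int = 128, heads: int = 8, num_blocks: int = 2):
    # Each block maps c_in -> hidden_channels * heads; a residual
    # projection is needed whenever c_in differs from that output width.
    dims, c_in = [], in_channels
    for _ in range(num_blocks):
        out = hidden_channels * heads
        dims.append((c_in, out, c_in != out))  # (in, out, needs_res_proj)
        c_in = out
    return dims

dims = block_dims(165)
# block 0: 165 -> 1024 (projection needed); block 1: 1024 -> 1024 (identity residual)
```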
rate_limit.py
ADDED
@@ -0,0 +1,34 @@
import time
import threading
from collections import deque
from functools import wraps

class RateLimitExceeded(Exception):
    pass

class GlobalRateLimiter:
    def __init__(self, max_calls: int, window_seconds: int):
        self.max_calls = max_calls
        self.window = window_seconds
        self._lock = threading.Lock()
        self._events = deque()

    def allow(self) -> bool:
        now = time.time()
        with self._lock:
            while self._events and now - self._events[0] > self.window:
                self._events.popleft()
            if len(self._events) < self.max_calls:
                self._events.append(now)
                return True
            return False

    def enforce(self, func=None):
        def decorator(f):
            @wraps(f)
            def wrapper(*args, **kwargs):
                if not self.allow():
                    raise RateLimitExceeded(f"Rate limit exceeded ({self.max_calls} req/{self.window}s)")
                return f(*args, **kwargs)
            return wrapper
        return decorator if func is None else decorator(func)
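`GlobalRateLimiter` keeps a deque of call timestamps and admits a call only while fewer than `max_calls` timestamps fall inside the sliding window. An inlined, minimal copy to illustrate the behavior (locking stripped for brevity; the real class is thread-safe):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    # Simplified, single-threaded version of GlobalRateLimiter.allow().
    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self._events = deque()

    def allow(self) -> bool:
        now = time.time()
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] > self.window:
            self._events.popleft()
        if len(self._events) < self.max_calls:
            self._events.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60)
results = [limiter.allow() for _ in range(3)]
# first two calls pass, the third is rejected: [True, True, False]
```

In the app, `enforce` wraps handlers so a rejected call raises `RateLimitExceeded` instead of returning `False`.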
requirements.txt
ADDED
@@ -0,0 +1,28 @@
# PyTorch CPU-only build (3.13 compatible)
torch==2.6.0
torchvision==0.21.0
torchaudio==2.6.0
-f https://download.pytorch.org/whl/cpu

# PyTorch Geometric (compatible with torch 2.6.0 CPU)
pyg_lib==0.4.0
torch_scatter==2.1.2
torch_sparse==0.6.18
torch_cluster==1.6.3
torch_spline_conv==1.2.2
torch_geometric==2.6.0

# Core app & utilities
gradio>=5.0.0,<5.2.0
huggingface_hub>=0.26.2
requests>=2.32.3
tenacity>=9.0.0
diskcache>=5.6.3
cachetools>=5.5.0
pyvis>=0.3.3
networkx>=3.4.2
pandas>=2.2.3
numpy>=2.1.3
scikit-learn>=1.6.0
matplotlib>=3.9.2
tqdm>=4.67.1
viz.py
ADDED
@@ -0,0 +1,44 @@
from typing import List, Tuple, Optional
from pyvis.network import Network
import numpy as np
import io, base64
import matplotlib.pyplot as plt

def _color_from_score(p: float) -> str:
    # blue (0) -> gray (0.5) -> red (1)
    p = float(np.clip(p, 0.0, 1.0))
    if p < 0.5:
        # interpolate blue (#377eb8) to gray (#bbbbbb)
        t = p / 0.5
        c0 = (0x37, 0x7e, 0xb8)
        c1 = (0xbb, 0xbb, 0xbb)
    else:
        # interpolate gray to red (#e41a1c)
        t = (p - 0.5) / 0.5
        c0 = (0xbb, 0xbb, 0xbb)
        c1 = (0xe4, 0x1a, 0x1c)
    r = int((1 - t) * c0[0] + t * c1[0])
    g = int((1 - t) * c0[1] + t * c1[1])
    b = int((1 - t) * c0[2] + t * c1[2])
    return f"#{r:02x}{g:02x}{b:02x}"

def render_ego_html(nodes: List[str], edges: List[Tuple[int, int]], center_idx: int, scores: Optional[np.ndarray] = None) -> str:
    net = Network(height="600px", width="100%", notebook=False, directed=True)
    for i, txid in enumerate(nodes):
        color = _color_from_score(scores[i]) if scores is not None else "#bbbbbb"
        size = 20 if i == center_idx else 8
        title = f"{txid}"
        if scores is not None:
            title += f"<br/>score={scores[i]:.4f}"
        net.add_node(i, label=txid[:10] + "…", title=title, color=color, size=size)
    for (u, v) in edges:
        net.add_edge(u, v, arrows="to")
    net.toggle_physics(True)
    return net.generate_html()

def histogram_scores(scores, title="Score distribution"):
    fig = plt.figure(figsize=(6, 4))
    plt.hist(scores, bins=40)
    plt.xlabel("Score")
    plt.ylabel("Count")
    plt.title(title)
    plt.tight_layout()
    return fig
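`_color_from_score` maps a probability onto a blue → gray → red gradient with a breakpoint at 0.5. A numpy-free sketch of the same interpolation (endpoint colors taken from viz.py; `color_from_score` is an illustrative standalone copy):

```python
def color_from_score(p: float) -> str:
    # Clamp to [0, 1], then interpolate blue -> gray for p < 0.5
    # and gray -> red for p >= 0.5 (same endpoints as viz.py).
    p = min(max(float(p), 0.0), 1.0)
    if p < 0.5:
        t, c0, c1 = p / 0.5, (0x37, 0x7E, 0xB8), (0xBB, 0xBB, 0xBB)
    else:
        t, c0, c1 = (p - 0.5) / 0.5, (0xBB, 0xBB, 0xBB), (0xE4, 0x1A, 0x1C)
    r, g, b = (int((1 - t) * a + t * b_) for a, b_ in zip(c0, c1))
    return f"#{r:02x}{g:02x}{b:02x}"

assert color_from_score(0.0) == "#377eb8"  # pure blue
assert color_from_score(0.5) == "#bbbbbb"  # neutral gray
assert color_from_score(1.0) == "#e41a1c"  # pure red
```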