| title: GraphStrike | |
| emoji: 🕵️ | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| license: mit | |
| tags: | |
| - reinforcement-learning | |
| - social-network | |
| - fraud-detection | |
| - openenv | |
| - llm-agent | |
| <br> | |
| <p align="center"> | |
| <img src="images/logo.png" width="600"/> | |
| </p> | |
| <br> | |
| <p align="center"> | |
| <img src="https://img.shields.io/badge/Hugging%20Face-FFD21E?style=for-the-badge&logo=huggingface&logoColor=black"/> | |
| <img src="https://img.shields.io/badge/HF%20Spaces-FFBF00?style=for-the-badge&logo=huggingface&logoColor=black"/> | |
| <img src="https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white"/> | |
| <img src="https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white"/> | |
| <img src="https://img.shields.io/badge/Gradio-F97316?style=for-the-badge&logo=gradio&logoColor=white"/> | |
| <img src="https://img.shields.io/badge/OpenEnv-4B5563?style=for-the-badge&logo=envato&logoColor=white"/> | |
| <img src="https://img.shields.io/badge/Amazon%20Bedrock-FF9900?style=for-the-badge&logo=amazonaws&logoColor=white"/> | |
| </p> | |
| <br> | |
| <h1 align="center"> | |
| </h1> | |
| <p align="center"> | |
| An OpenEnv-compatible reinforcement learning environment where an LLM agent must identify all 10 members of a coordinated fake account network hidden inside a synthetic social network. The agent learns via Reflexion and a dynamic hybrid rule/LLM policy, not via gradient updates or fine-tuning. | |
| <br /> | |
| </p> | |
| <br> | |
| <br> | |
| ### *Deployed Endpoint Verification* | |
| The live environment at [huggingface.co/spaces/Pandago/graphstrike](https://huggingface.co/spaces/Pandago/graphstrike) | |
| responds to all standard OpenEnv endpoints: | |
| ```bash | |
| # Health check | |
| curl https://pandago-graphstrike.hf.space/health | |
| # → {"status": "healthy"} | |
| # Task discovery | |
| curl https://pandago-graphstrike.hf.space/tasks | |
| # → {"tasks": ["easy","medium","hard"], "action_schema": {...}, "score_range": [0.0, 1.0]} | |
| # Baseline (deterministic, reproducible) | |
| curl -X POST https://pandago-graphstrike.hf.space/baseline | |
| # → {"scores": {"easy": 0.91, "medium": 0.906, "hard": 0.9038}, "agent": "rule_based"} | |
| ``` | |
| --- | |
| <br> | |
*We evaluate GraphStrike's hybrid rule/LLM policy across multiple frontier models to measure how well each model handles the investigation task. All runs use
the same inference pipeline (`inference.py`) with identical system prompts and structured logging. Each model ran (1) seed 0 on all 3 tasks and
(2) seeds 0–2 on all 3 tasks for variance measurement.*
| <br> | |
| **Seed=0 scores (single episode per task):** | |
| <p align="center"> | |
| <img src="images/table1.png" alt="Model Performance Table" width="1600"/> | |
| </p> | |
| <br> | |
| **3-seed variance scores (mean across seeds 0, 1, 2):** | |
| <p align="center"> | |
| <img src="images/table2.png" alt="Model Performance Table" width="1600"/> | |
| </p> | |
| <br> | |
| **Rule-Based Baseline (no LLM, deterministic)** | |
| <p align="center"> | |
| <img src="images/table3.png" alt="Model Performance Table" width="1600"/> | |
| </p> | |
| <br> | |
| --- | |
| **The task:** A social network contains fake accounts organised into a | |
| single coordinated network of 10. The network behaves in a coordinated way — same posting hour, | |
| same IP subnet, stolen celebrity photos, copy-paste bios. The agent must find | |
| all 10 by navigating a limited step budget, inspecting accounts, and flagging suspects. | |
**What makes this non-trivial:** The network is large (50–1000 accounts depending on difficulty). Fake accounts are mixed with innocent high-signal "decoy" accounts. In hard mode, the fake accounts actively evade (dropping intra-gang follows, renaming profiles) while the agent is mid-investigation. The agent cannot see the full network upfront: it must explore via INSPECT and INVESTIGATE_NETWORK actions, spending steps to reveal information.
**What makes the learning novel:** The inference LLM (via AWS Bedrock) cannot be fine-tuned; it is a black-box API. The agent learns via Reflexion, i.e., post-episode lessons are written back into memory and injected into every future prompt. A dynamic hybrid policy (α-weighted) blends the LLM with a deterministic rule engine, with the blend weight α updating based on recent win rate. Rules dominate early; the LLM takes over as it proves itself.
| --- | |
| ## Detection Signals | |
| Detection operates entirely on numeric metadata — no content processing. Three signal categories, computed at different points: | |
| **Node signals** (pre-computed by content pipeline, static per account): | |
| | Feature | Fake range | Real range | Notes | | |
| |---|---|---|---| | |
| | `photo_reuse_score` | 0.30–0.95 | 0.00–0.15 | pHash fingerprint match against celebrity photo DB | | |
| | `bio_template_score` | 0.20–0.90 | 0.00–0.12 | Cosine sim to known fake bio templates | | |
| | `comment_repeat_score` | 0.60–0.90 | 0.00–0.08 | Copy-paste spam fraction across accounts | | |
| **Behavioral signals** (static, from account metadata): | |
| | Feature | Fake pattern | | |
| |---|---| | |
| | `avg_post_hour` | All 10 gang members cluster within ±0.5 hours — coordinated scheduling | | |
| | `account_age_days` | Created within the same week (`base_age ± 7 days`) | | |
| | `shared_ip_count` | 9 — all 10 share one IP subnet (`ip_gang_{seed}`) | | |
| **Graph signals** (dynamic — computed at INSPECT time, shift as investigation progresses): | |
| | Feature | Why it matters | | |
| |---|---| | |
| | `mutual_follow_rate` | Gang members mutually follow each other at 0.6–0.9 density; legitimate hubs don't follow back | | |
| | `flagged_neighbor_count` | Grows as more gang members are flagged — the cascade signal | | |
| | `post_hour_cluster_score` | Alignment to mean posting hour of currently-flagged accounts (wrap-around aware) | | |
| | `suspicious_mutual_ratio` | Used to compute hub legitimacy — protects celebrities from false positives | | |
| Graph signals are the most powerful: once one gang member is flagged, `flagged_neighbor_count` rises for all connected members, compounding with each subsequent flag. | |
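
The "wrap-around aware" alignment used by `post_hour_cluster_score` can be illustrated with a small sketch. The source does not give the actual formula, so the linear falloff and the `window` parameter below are assumptions made purely for illustration; only the idea of measuring distance on a 24-hour circle comes from the table above.

```python
import math

def circular_hour_distance(a: float, b: float) -> float:
    """Shortest distance between two hours on a 24-hour clock."""
    diff = abs(a - b) % 24.0
    return min(diff, 24.0 - diff)

def circular_mean_hour(hours: list[float]) -> float:
    """Mean hour on a 24-hour circle, computed by averaging unit vectors."""
    angles = [h / 24.0 * 2.0 * math.pi for h in hours]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    return (math.atan2(y, x) / (2.0 * math.pi) * 24.0) % 24.0

def post_hour_cluster_score(hour: float, flagged_hours: list[float],
                            window: float = 3.0) -> float:
    """Hypothetical score: 1.0 at the flagged mean hour, falling linearly
    to 0.0 at `window` hours away. The falloff shape is an assumption."""
    if not flagged_hours:
        return 0.0  # nothing flagged yet, so there is no cluster to align with
    dist = circular_hour_distance(hour, circular_mean_hour(flagged_hours))
    return max(0.0, 1.0 - dist / window)
```

With flagged accounts posting at 23:00 and 01:00, the circular mean is midnight, so an account posting at 00:00 scores close to 1.0, whereas a plain arithmetic mean (12:00) would place it at the opposite side of the clock.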
| --- | |
| ## Synthetic Network Composition | |
| 150 episodes pre-generated deterministically (50 per task). Each episode is a JSON file (`episodes/{task}_{seed:03d}.json`). | |
| | Task | Accounts | Gang | Decoys | Max steps | Evasion | | |
| |---|---|---|---|---|---| | |
| | easy | 50 | 10 | 0 | 30 | None | | |
| | medium | 200 | 10 | 20 | 50 | Step 20 (once) | | |
| | hard | 1000 | 10 | 50 | 80 | Steps 15/30/45/60 | | |
| - **Gang:** Dense intra-follow graph (density 0.60–0.80), same IP subnet, tightly clustered post hours (std 0.5/1.5/2.5 by task). | |
| - **Decoys** (medium/hard only): Real accounts with elevated `photo_reuse` and `bio_template` scores (0.20–0.40). They score as suspicious but are not gang members — they penalise reckless flagging. | |
| - **Celebrities** (2 per episode): 100k–5M followers, near-zero fake scores. Hub legitimacy formula protects them. | |
| - **Zero-edge isolates** (2 per episode): `follower_count=0`, no edges. Test whether the agent wastes steps on disconnected nodes. | |
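
As a quick illustration of the file naming scheme, the sketch below builds the path for each pre-generated episode. The `TASKS` dictionary simply restates the table above; the helper name `episode_path` and the assumption that seeds run from 0 to 49 are illustrative and not taken from the repository.

```python
# Composition table restated as data; the dict layout is illustrative only.
TASKS = {
    "easy":   {"accounts": 50,   "gang": 10, "decoys": 0,  "max_steps": 30},
    "medium": {"accounts": 200,  "gang": 10, "decoys": 20, "max_steps": 50},
    "hard":   {"accounts": 1000, "gang": 10, "decoys": 50, "max_steps": 80},
}
SEEDS_PER_TASK = 50  # assumption: seeds are numbered 0..49

def episode_path(task: str, seed: int) -> str:
    """Path following the documented pattern episodes/{task}_{seed:03d}.json."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"episodes/{task}_{seed:03d}.json"

# Enumerate every episode file name implied by the pattern.
all_paths = [episode_path(t, s) for t in TASKS for s in range(SEEDS_PER_TASK)]
```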
| --- | |
| ## Actions | |
| | Action | Cost | Effect | | |
| |---|---|---| | |
| | `inspect` | 1 step | Reveals full `AccountProfile` (all 22 features), adds neighbors to visible set | | |
| | `investigate_network` | 2 steps | Bidirectional 2-hop expansion — reveals account IDs only (no profiles); re-cascades SUSPECT | | |
| | `flag` | 0 steps | Marks account CONFIRMED_FAKE; dual cascade: follow-graph + IP cluster | | |
| | `unflag` | 0 steps | Clears CONFIRMED_FAKE status | | |
| | `submit` | 0 steps | Ends episode, triggers scoring | | |
| **Dual SUSPECT cascade on FLAG:** | |
| 1. *Follow-graph:* Every visible account that the flagged account follows → SUSPECT (high precision: gang follow density 0.70+). | |
| 2. *IP cluster:* Every visible account sharing the same `ip_cluster_id` → SUSPECT (zero false positives: real accounts each have a unique IP; gang shares `ip_gang_{seed}`). | |
| Both mechanisms surface in `obs.suspect_ids` — the agent's highest-priority INSPECT targets. | |
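
A minimal sketch of the dual cascade follows, assuming the follow graph and IP clusters are available as plain dictionaries. The function name, argument names, and data layout are hypothetical; the real logic lives in `server/environment.py` and may differ.

```python
def suspects_after_flag(flagged_id, follows, ip_cluster, visible, confirmed):
    """Return the account IDs that would become SUSPECT when `flagged_id` is flagged.

    follows:    dict mapping account ID -> set of IDs it follows (assumed layout)
    ip_cluster: dict mapping account ID -> ip_cluster_id (assumed layout)
    visible:    set of IDs the agent has revealed so far
    confirmed:  set of IDs already marked CONFIRMED_FAKE
    """
    # 1. Follow-graph cascade: visible accounts the flagged account follows.
    by_follow = {a for a in follows.get(flagged_id, set()) if a in visible}
    # 2. IP-cluster cascade: visible accounts sharing the same ip_cluster_id.
    cluster = ip_cluster.get(flagged_id)
    by_ip = {a for a in visible
             if cluster is not None and ip_cluster.get(a) == cluster}
    # Already-confirmed accounts and the flagged account itself are excluded.
    return (by_follow | by_ip) - set(confirmed) - {flagged_id}

# Toy example with made-up IDs: g1..g3 share the gang subnet, "hidden" is not visible.
follows = {"g1": {"g2", "celeb", "hidden"}}
ip_cluster = {"g1": "ip_gang_0", "g2": "ip_gang_0", "g3": "ip_gang_0",
              "celeb": "ip_celeb", "r1": "ip_r1"}
visible = {"g1", "g2", "g3", "celeb", "r1"}
new_suspects = suspects_after_flag("g1", follows, ip_cluster, visible, confirmed=set())
```

In this toy example `g3` is reached only through the IP cluster, while `celeb` is reached only through the follow graph, which shows why the follow cascade is described as high precision rather than free of false positives.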
| --- | |
| ## Risk Scoring (`server/scoring.py`) | |
| All functions are stateless, called inside `_build_profile()` at INSPECT time and on re-profiling after each FLAG. | |
| ``` | |
| node_risk = 0.60 × photo_reuse + 0.40 × bio_template | |
| age_norm = min(1.0, account_age_days / 365) | |
| behavior_risk = 0.55 × (1 − age_norm) + 0.45 × post_hour_cluster_score | |
| flagged_ratio = flagged_neighbor_count / max(inspected_neighbor_count, 1) | |
| graph_risk = 0.45 × flagged_ratio + 0.35 × mutual_follow_rate + 0.20 × avg_neighbor_photo_reuse | |
hub_legitimacy = 0.45 × log(1+followers)/log(1+1M)
               + 0.25 × (1 − follow_ratio_norm)
               + 0.20 × age_norm
               + 0.10 × (1 − suspicious_mutual_ratio)
| fake_risk = clip(0.30×node_risk + 0.25×behavior_risk + 0.45×graph_risk − 0.25×hub_legitimacy, 0, 1) | |
| ``` | |
| **Weight rationale:** Graph risk (0.45) is dominant — structural signals are hardest to fake and compound across the investigation. Hub legitimacy is subtractive — a celebrity with 5M followers produces `hub_legitimacy ≈ 1.0`, making their fake_risk near zero even if gang members follow them. | |
| **Classification thresholds:** | |
| - `fake_risk < 0.35` → normal | |
| - `0.35 ≤ fake_risk < 0.60` → suspect | |
| - `fake_risk ≥ 0.60` → confirmed_fake (formula-level; explicit FLAG overrides) | |
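
The formulas and thresholds above translate directly into Python. The sketch below is a transcription for illustration, not the contents of `server/scoring.py`; the dictionary keys and the two sample profiles are invented, and the logarithm base does not matter because only a ratio of logs is used.

```python
import math

def clip(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))

def fake_risk(p: dict) -> float:
    """Combine node, behavior, graph, and hub terms as in the formulas above."""
    node_risk = 0.60 * p["photo_reuse"] + 0.40 * p["bio_template"]
    age_norm = min(1.0, p["account_age_days"] / 365)
    behavior_risk = 0.55 * (1 - age_norm) + 0.45 * p["post_hour_cluster_score"]
    flagged_ratio = p["flagged_neighbor_count"] / max(p["inspected_neighbor_count"], 1)
    graph_risk = (0.45 * flagged_ratio
                  + 0.35 * p["mutual_follow_rate"]
                  + 0.20 * p["avg_neighbor_photo_reuse"])
    hub_legitimacy = (0.45 * math.log1p(p["followers"]) / math.log1p(1_000_000)
                      + 0.25 * (1 - p["follow_ratio_norm"])
                      + 0.20 * age_norm
                      + 0.10 * (1 - p["suspicious_mutual_ratio"]))
    return clip(0.30 * node_risk + 0.25 * behavior_risk
                + 0.45 * graph_risk - 0.25 * hub_legitimacy)

def classify(risk: float) -> str:
    """Apply the documented thresholds (an explicit FLAG would override this)."""
    if risk < 0.35:
        return "normal"
    if risk < 0.60:
        return "suspect"
    return "confirmed_fake"

# Invented sample profiles, chosen only to exercise both ends of the scale.
gang_like = dict(photo_reuse=0.8, bio_template=0.6, account_age_days=30,
                 post_hour_cluster_score=0.9, flagged_neighbor_count=3,
                 inspected_neighbor_count=4, mutual_follow_rate=0.7,
                 avg_neighbor_photo_reuse=0.6, followers=200,
                 follow_ratio_norm=0.8, suspicious_mutual_ratio=0.7)
celebrity_like = dict(photo_reuse=0.02, bio_template=0.03, account_age_days=3000,
                      post_hour_cluster_score=0.1, flagged_neighbor_count=0,
                      inspected_neighbor_count=5, mutual_follow_rate=0.0,
                      avg_neighbor_photo_reuse=0.3, followers=5_000_000,
                      follow_ratio_norm=0.0, suspicious_mutual_ratio=0.0)
```

The gang-like profile evaluates to roughly 0.69 (`confirmed_fake`), while the celebrity-like profile is clipped to 0.0 because its hub legitimacy term exceeds the combined risk terms.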
| **Grader score** (normalised [0.0, 1.0], returned by `/grader`): | |
| ``` | |
| recall = tp / 10 | |
| precision = tp / max(tp + fp, 1) | |
| efficiency = max(0, (max_steps − steps_used) / max_steps) | |
if recall ≥ 0.8 AND precision ≥ 0.7:
    score = 0.55 + 0.20×recall + 0.15×precision + 0.10×efficiency
else:
    score = 0.30×recall + 0.10×precision
| ``` | |
Maximum 1.0 (all 10 found, zero false positives, zero steps used). Win threshold ≈ 0.815, which is the first-branch score at recall 0.8, precision 0.7, and zero efficiency (0.55 + 0.16 + 0.105).
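
The grader formula can be checked with a short, self-contained function. This is a sketch transcribed from the pseudocode above, with a hypothetical function name and signature; it is not the code behind `/grader`.

```python
def grader_score(tp: int, fp: int, steps_used: int, max_steps: int,
                 gang_size: int = 10) -> float:
    """Score in [0.0, 1.0] from true positives, false positives, and steps used."""
    recall = tp / gang_size
    precision = tp / max(tp + fp, 1)
    efficiency = max(0.0, (max_steps - steps_used) / max_steps)
    if recall >= 0.8 and precision >= 0.7:
        # Winning branch: fixed bonus plus weighted recall, precision, efficiency.
        return 0.55 + 0.20 * recall + 0.15 * precision + 0.10 * efficiency
    # Losing branch: no bonus and no credit for efficiency.
    return 0.30 * recall + 0.10 * precision

perfect = grader_score(tp=10, fp=0, steps_used=0, max_steps=30)          # upper bound
clean_but_slow = grader_score(tp=10, fp=0, steps_used=27, max_steps=30)  # 10% efficiency
near_miss = grader_score(tp=7, fp=0, steps_used=30, max_steps=30)        # recall below 0.8
```

The gap between the branches is large by design: finding 7 of 10 with perfect precision scores 0.31, while a clean run that finds all 10 using 27 of 30 steps scores 0.91.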
| --- | |
| ## Hybrid Policy (`agent/hybrid_policy.py`) | |
| The agent blends a deterministic rule engine with Qwen3-Next-80B (via AWS Bedrock) using a per-task trust weight α. | |
| **Alpha update** (per episode, after win/loss recorded): | |
| ``` | |
| reflection_factor = min(1.0, n_reflections / 4.0) | |
| raw = 0.20 + reflection_factor × (0.80 × recent_win_rate + 0.12) | |
| alpha = clamp(raw, 0.20, task_cap) | |
| ``` | |
| | Task | α cap | Rationale | | |
| |---|---|---| | |
| | easy | 0.50 | Rule engine alone hits ~91% — LLM assists, doesn't override | | |
| | medium | 0.70 | Decoys require LLM judgment, but cascade must stay | | |
| | hard | 0.85 | LLM needs latitude for evasion adaptation | | |
`reflection_factor` gates α: it scales the win-rate term linearly and reaches 1.0 only after 4 post-episode lessons, so with no lessons α stays at the 0.20 floor regardless of raw win rate.
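
A minimal sketch of the alpha update, transcribed from the formula and cap table above. The function name and the `TASK_CAPS` constant are illustrative; the actual implementation in `agent/hybrid_policy.py` may be organised differently.

```python
TASK_CAPS = {"easy": 0.50, "medium": 0.70, "hard": 0.85}  # from the table above
ALPHA_FLOOR = 0.20

def update_alpha(task: str, n_reflections: int, recent_win_rate: float) -> float:
    """Recompute the LLM trust weight for one task after an episode."""
    reflection_factor = min(1.0, n_reflections / 4.0)
    raw = ALPHA_FLOOR + reflection_factor * (0.80 * recent_win_rate + 0.12)
    # Clamp between the global floor and the per-task cap.
    return max(ALPHA_FLOOR, min(TASK_CAPS[task], raw))

no_lessons = update_alpha("hard", n_reflections=0, recent_win_rate=1.0)
half_gated = update_alpha("hard", n_reflections=2, recent_win_rate=0.5)
saturated = update_alpha("hard", n_reflections=4, recent_win_rate=1.0)
```

With zero lessons a perfect win rate still yields the 0.20 floor; with two lessons and a 50% win rate the result is 0.46; with four lessons and a perfect win rate the raw value (1.12) is clamped to the hard cap of 0.85.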
| **Blending decision:** | |
| ```python | |
| rule_action, rule_conf = get_rule_action(obs) # deterministic, with confidence score | |
| llm_action, _ = get_action(obs, ...) # Qwen3 via Bedrock | |
| if rule_action == llm_action: final = llm_action # agree | |
| elif rule_conf >= alpha: final = rule_action # rule overrides | |
| else: final = llm_action # LLM trusted | |
| ``` | |
| Rule confidences: SUBMIT-forced=1.00, INSPECT-suspect=0.95, FLAG-high-risk=0.95, FLAG-threshold=0.70+, INSPECT-explore=0.30. At `α=0.50` (easy cap), safety decisions (suspects, forced submit) always override; exploration goes to the LLM. | |
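
To see which rule decisions win a disagreement at each α cap, the comparison can be tabulated. The sketch below uses the confidences listed above; the dictionary keys are invented labels, and `flag_threshold` is set to its stated lower bound of 0.70 as an assumption, since the source gives it only as "0.70+".

```python
# Rule confidences from the text; key names are illustrative labels.
RULE_CONFIDENCE = {
    "submit_forced": 1.00,
    "inspect_suspect": 0.95,
    "flag_high_risk": 0.95,
    "flag_threshold": 0.70,   # assumption: lower bound of the stated "0.70+"
    "inspect_explore": 0.30,
}

def resolve(rule_action: str, rule_conf: float, llm_action: str, alpha: float) -> str:
    """Same decision order as the blending snippet: agree, rule overrides, else LLM."""
    if rule_action == llm_action:
        return llm_action
    if rule_conf >= alpha:
        return rule_action
    return llm_action

def overriding_rules(alpha: float) -> list[str]:
    """Rule decisions that beat a disagreeing LLM at the given alpha."""
    return sorted(name for name, conf in RULE_CONFIDENCE.items() if conf >= alpha)

at_floor = overriding_rules(0.20)      # every rule wins, including exploration
at_easy_cap = overriding_rules(0.50)   # exploration goes to the LLM
at_hard_cap = overriding_rules(0.85)   # threshold-level flags also go to the LLM
```

Under these assumed values, the rule engine decides everything at the 0.20 floor, loses only exploration at the easy cap, and additionally cedes borderline flags to the LLM at the hard cap.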
| **Reflexion learning:** After each episode, Qwen3 generates a 2–3 sentence lesson from the action log and outcome. Lessons are stored in `memory/reflections_{task}.jsonl` and injected into every future prompt (last 4 lessons + best winning trajectory as few-shot example). Memory persists across container restarts via Docker volume. | |
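
The lesson store can be pictured as an append-only JSONL file per task. The sketch below is an assumption about how such a store might look: the record fields (`lesson`, `won`) and both function names are hypothetical, and only the file name pattern and the "last 4 lessons" rule come from the description above.

```python
import json
import tempfile
from pathlib import Path

def append_reflection(memory_dir: str, task: str, lesson: str, won: bool) -> None:
    """Append one post-episode lesson to memory/reflections_{task}.jsonl."""
    path = Path(memory_dir) / f"reflections_{task}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"lesson": lesson, "won": won}) + "\n")

def load_recent_lessons(memory_dir: str, task: str, k: int = 4) -> list[str]:
    """Return the last k lessons for a task, oldest first."""
    path = Path(memory_dir) / f"reflections_{task}.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r["lesson"] for r in records[-k:]]

# Usage with a temporary directory standing in for the Docker volume.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(6):
        append_reflection(tmp, "hard", f"lesson {i}", won=(i % 2 == 0))
    lessons = load_recent_lessons(tmp, "hard")
    missing = load_recent_lessons(tmp, "easy")
```

Appending one JSON object per line keeps writes cheap and makes the file safe to truncate or tail, which suits a store that only ever needs its most recent entries.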
| --- | |
| ## API Reference | |
| | Endpoint | Method | Description | | |
| |---|---|---| | |
| | `/health` | GET | `{"status": "healthy"}` | | |
| | `/tasks` | GET | Task list + `action_schema` + `score_range: [0.0, 1.0]` | | |
| | `/reset` | POST | `{task, seed}` → initial observation | | |
| | `/step` | POST | `{action_type, account_id?}` → updated observation | | |
| | `/state` | GET | Episode metadata (step count, task, score, evasion count) | | |
| | `/grader` | GET | Normalised [0.0, 1.0] score after SUBMIT (400 if not done) | | |
| | `/baseline` | POST | Runs rule-based agent on all 3 tasks, seed=0 | | |
| | `/metadata` | GET | OpenEnv metadata block | | |
| | `/schema` | GET | Full JSON schema for actions and observations | | |
| | `/mcp` | POST | JSON-RPC 2.0 tool discovery (Model Context Protocol) | | |
| Live: `https://pandago-graphstrike.hf.space` | |
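
A minimal client loop for the `/reset` and `/step` endpoints might look like the sketch below, using only the standard library. The request bodies follow the table above (`{task, seed}` and `{action_type, account_id?}`); the helper names, the `choose_action` callback, and the shape of the returned observation are assumptions, so the response is passed through as raw JSON.

```python
import json
import urllib.request

BASE_URL = "https://pandago-graphstrike.hf.space"

def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the environment and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

def run_episode(task: str, seed: int, choose_action, post=post_json,
                max_calls: int = 200) -> dict:
    """Reset, then step until the policy submits (or max_calls is reached).

    choose_action: callable taking the latest observation and returning a dict
                   such as {"action_type": "inspect", "account_id": ...}.
    post:          transport function, injectable so the loop can be tested offline.
    """
    obs = post("/reset", {"task": task, "seed": seed})
    for _ in range(max_calls):
        action = choose_action(obs)
        obs = post("/step", action)
        if action["action_type"] == "submit":
            break
    return obs
```

Keeping the transport injectable means the same loop can run against the live Space, a local container, or a stub during tests; after a `submit`, the score can be fetched separately from `/grader`.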
| --- | |
| ## File Structure | |
| ``` | |
| server/ | |
| app.py — FastAPI + Gradio UI (gr.mount_gradio_app) | |
| environment.py — Episode lifecycle, action mechanics, cascade logic | |
| generator.py — Deterministic episode generation (150 JSON files) | |
| scoring.py — Stateless risk formula functions | |
| models.py — Pydantic models: AccountProfile, FakeGangObservation, ActionType | |
| agent/ | |
| policy.py — Qwen3 prompt construction + action parsing | |
| hybrid_policy.py — Alpha blending, rule engine with confidence scores | |
| reflection.py — Post-episode lesson generation | |
| memory.py — JSONL persistence for reflections, trajectories, alpha | |
| inference.py — Submission entrypoint: [START]/[STEP]/[END] structured logs, OpenAI client | |
| validate.py — 24-point pre-submission validator (local + HTTP) | |
| train.py — Full training loop with curriculum | |
| episodes/ — 150 pre-generated JSON episode files (baked into Docker image) | |
| memory/ — Docker volume: reflections, win history, alpha values | |
| ``` | |
| --- | |
| ## Baseline Scores | |
| | Task | Seed=0 | Win rate (50 seeds) | Mean (50 seeds) | | |
| |---|---|---|---| | |
| | easy | 0.910 | 100% | ~0.91 | | |
| | medium | 0.906 | 84% | ~0.77 | | |
| | hard | 0.9038 | 52% | ~0.47 | | |
| The rule-based baseline (no LLM) is competitive on easy/medium. Hard is the real differentiator — evasion events drop intra-gang edges mid-investigation, destroying graph signals. Frontier LLM agents with accumulated reflections adapt; the rule engine degrades. | |
| --- | |
| *Built by team computeXor* | |