title: GraphStrike
emoji: 🕵️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
tags:
  - reinforcement-learning
  - social-network
  - fraud-detection
  - openenv
  - llm-agent

An OpenEnv-compatible reinforcement learning environment where an LLM agent must identify all 10 members of a coordinated fake account network hidden inside a synthetic social network. The agent learns via Reflexion and a dynamic hybrid rule/LLM policy, not via gradient updates or fine-tuning.

Deployed Endpoint Verification

The live environment at huggingface.co/spaces/Pandago/graphstrike responds to all standard OpenEnv endpoints:

# Health check
curl https://pandago-graphstrike.hf.space/health
# → {"status": "healthy"}

# Task discovery
curl https://pandago-graphstrike.hf.space/tasks
# → {"tasks": ["easy","medium","hard"], "action_schema": {...}, "score_range": [0.0, 1.0]}

# Baseline (deterministic, reproducible)
curl -X POST https://pandago-graphstrike.hf.space/baseline
# → {"scores": {"easy": 0.91, "medium": 0.906, "hard": 0.9038}, "agent": "rule_based"}

We evaluate GraphStrike's hybrid rule/LLM policy across multiple frontier models to measure how well each model handles the investigation task. All runs use the same inference pipeline (inference.py) with identical system prompts and structured logging. Each model ran: (1) seed=0 on all 3 tasks, and (2) seeds 0-2 on all 3 tasks for variance measurement.

Seed=0 scores (single episode per task):

Model Performance Table

3-seed variance scores (mean across seeds 0, 1, 2):

Model Performance Table

Rule-Based Baseline (no LLM, deterministic)

Model Performance Table

The task: A social network contains fake accounts organised into a single coordinated network of 10. The network behaves in a coordinated way — same posting hour, same IP subnet, stolen celebrity photos, copy-paste bios. The agent must find all 10 by navigating a limited step budget, inspecting accounts, and flagging suspects.

What makes this non-trivial: The network is large (50–1000 accounts depending on difficulty). Fake accounts are mixed with innocent high-signal "decoy" accounts. In hard mode, the fake accounts actively evade detection while the agent is mid-investigation, dropping intra-gang follows and renaming profiles. The agent cannot see the full network upfront: it must explore via INSPECT and INVESTIGATE_NETWORK actions, spending steps to reveal information.

What makes the learning novel: The inference LLM (via AWS Bedrock) cannot be fine-tuned; it is a black-box API. The agent learns via Reflexion: post-episode lessons are written back into memory and injected into every future prompt. A dynamic hybrid policy (α-weighted) blends the LLM with a deterministic rule engine, with the blend weight α updating based on recent win rate. Rules dominate early; the LLM takes over as it proves itself.

Detection Signals

Detection operates entirely on numeric metadata — no content processing. Three signal categories, computed at different points:

Node signals (pre-computed by content pipeline, static per account):

| Feature | Fake range | Real range | Notes |
|---|---|---|---|
| `photo_reuse_score` | 0.30–0.95 | 0.00–0.15 | pHash fingerprint match against celebrity photo DB |
| `bio_template_score` | 0.20–0.90 | 0.00–0.12 | Cosine similarity to known fake bio templates |
| `comment_repeat_score` | 0.60–0.90 | 0.00–0.08 | Copy-paste spam fraction across accounts |

Behavioral signals (static, from account metadata):

| Feature | Fake pattern |
|---|---|
| `avg_post_hour` | All 10 gang members cluster within ±0.5 hours (coordinated scheduling) |
| `account_age_days` | Created within the same week (`base_age ± 7 days`) |
| `shared_ip_count` | 9, since all 10 share one IP subnet (`ip_gang_{seed}`) |

Graph signals (dynamic — computed at INSPECT time, shift as investigation progresses):

| Feature | Why it matters |
|---|---|
| `mutual_follow_rate` | Gang members mutually follow each other at 0.6–0.9 density; legitimate hubs don't follow back |
| `flagged_neighbor_count` | Grows as more gang members are flagged (the cascade signal) |
| `post_hour_cluster_score` | Alignment to mean posting hour of currently-flagged accounts (wrap-around aware) |
| `suspicious_mutual_ratio` | Used to compute hub legitimacy; protects celebrities from false positives |

Graph signals are the most powerful: once one gang member is flagged, flagged_neighbor_count rises for all connected members, compounding with each subsequent flag.
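The "wrap-around aware" part of `post_hour_cluster_score` can be illustrated with a circular mean over a 24-hour clock, so that 23:30 and 00:30 average to midnight rather than noon. This is a minimal sketch: the linear falloff and the `tolerance` parameter are assumptions for illustration, not the formula used in `server/scoring.py`.

```python
import math


def circular_mean_hour(hours):
    """Mean of clock hours on a 24-hour circle (23.5 and 0.5 average to midnight)."""
    angles = [2 * math.pi * h / 24.0 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return (math.atan2(s, c) * 24.0 / (2 * math.pi)) % 24.0


def circular_hour_distance(a, b):
    """Shortest distance between two clock hours, in the range [0, 12]."""
    d = abs(a - b) % 24.0
    return min(d, 24.0 - d)


def post_hour_cluster_score(avg_post_hour, flagged_hours, tolerance=3.0):
    """Hypothetical alignment score in [0, 1].

    1.0 when the account posts exactly at the flagged mean hour, falling
    linearly to 0.0 at `tolerance` hours away. Both the falloff shape and
    the tolerance value are assumptions, not the project's actual formula.
    """
    if not flagged_hours:
        return 0.0  # nothing flagged yet, so there is no cluster to align with
    mean = circular_mean_hour(flagged_hours)
    return max(0.0, 1.0 - circular_hour_distance(avg_post_hour, mean) / tolerance)
```

A plain arithmetic mean of 23.5 and 0.5 would give 12.0, which is why the distance and the mean are both computed on the circle.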

Synthetic Network Composition

150 episodes pre-generated deterministically (50 per task). Each episode is a JSON file (episodes/{task}_{seed:03d}.json).

| Task | Accounts | Gang | Decoys | Max steps | Evasion |
|---|---|---|---|---|---|
| easy | 50 | 10 | 0 | 30 | None |
| medium | 200 | 10 | 20 | 50 | Step 20 (once) |
| hard | 1000 | 10 | 50 | 80 | Steps 15/30/45/60 |

- Gang: Dense intra-follow graph (density 0.60–0.80), same IP subnet, tightly clustered post hours (std 0.5/1.5/2.5 by task).
- Decoys (medium/hard only): Real accounts with elevated photo_reuse and bio_template scores (0.20–0.40). They score as suspicious but are not gang members, so they penalise reckless flagging.
- Celebrities (2 per episode): 100k–5M followers, near-zero fake scores. Hub legitimacy formula protects them.
- Zero-edge isolates (2 per episode): follower_count=0, no edges. Test whether the agent wastes steps on disconnected nodes.
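The per-task parameters and the episode file naming scheme can be captured in a small lookup. This is a sketch only: the `TaskConfig` class and `episode_path` helper are hypothetical names, and the 0–49 seed range is inferred from "50 per task"; only the numbers and the `episodes/{task}_{seed:03d}.json` pattern come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    accounts: int
    gang: int
    decoys: int
    max_steps: int
    evasion_steps: tuple  # steps at which the gang evades; empty means none


# Values copied from the task table; the structure itself is illustrative.
TASKS = {
    "easy": TaskConfig(50, 10, 0, 30, ()),
    "medium": TaskConfig(200, 10, 20, 50, (20,)),
    "hard": TaskConfig(1000, 10, 50, 80, (15, 30, 45, 60)),
}


def episode_path(task, seed):
    """Relative path of a pre-generated episode file."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"episodes/{task}_{seed:03d}.json"


# 3 tasks x 50 seeds (assumed to be 0..49) = 150 episode files
ALL_EPISODES = [episode_path(t, s) for t in TASKS for s in range(50)]
```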

Actions

| Action | Cost | Effect |
|---|---|---|
| `inspect` | 1 step | Reveals full `AccountProfile` (all 22 features), adds neighbors to visible set |
| `investigate_network` | 2 steps | Bidirectional 2-hop expansion: reveals account IDs only (no profiles); re-cascades SUSPECT |
| `flag` | 0 steps | Marks account CONFIRMED_FAKE; dual cascade: follow-graph + IP cluster |
| `unflag` | 0 steps | Clears CONFIRMED_FAKE status |
| `submit` | 0 steps | Ends episode, triggers scoring |

Dual SUSPECT cascade on FLAG:

- Follow-graph: Every visible account that the flagged account follows → SUSPECT (high precision: gang follow density 0.70+).
- IP cluster: Every visible account sharing the same ip_cluster_id → SUSPECT (zero false positives: real accounts each have a unique IP; gang shares ip_gang_{seed}).

Both mechanisms surface in obs.suspect_ids — the agent's highest-priority INSPECT targets.
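The two cascade mechanisms amount to a pair of set operations. The sketch below assumes the graph is held as plain dictionaries and that already-confirmed accounts are never downgraded to SUSPECT; the function name, argument layout, and that exclusion rule are illustrative assumptions, not the code in `server/environment.py`.

```python
def cascade_on_flag(flagged_id, follows, ip_cluster, visible, confirmed):
    """Return the account IDs to mark SUSPECT after flagging `flagged_id`.

    follows:    dict of account_id -> set of account IDs it follows
    ip_cluster: dict of account_id -> ip_cluster_id
    visible:    set of account IDs revealed so far
    confirmed:  set of account IDs already CONFIRMED_FAKE
    """
    # Mechanism 1: visible accounts that the flagged account follows.
    by_follow = follows.get(flagged_id, set()) & visible

    # Mechanism 2: visible accounts sharing the flagged account's IP cluster.
    cluster = ip_cluster.get(flagged_id)
    by_ip = set()
    if cluster is not None:
        by_ip = {a for a in visible if ip_cluster.get(a) == cluster}

    # Assumption: confirmed accounts and the flagged account itself are skipped.
    return (by_follow | by_ip) - confirmed - {flagged_id}
```

Note that the follow-graph mechanism can surface a followed celebrity as SUSPECT; in the environment it is the hub legitimacy term of the risk score that keeps such accounts from being classified as fake.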

Risk Scoring (`server/scoring.py`)

All functions are stateless, called inside _build_profile() at INSPECT time and on re-profiling after each FLAG.

node_risk     = 0.60 × photo_reuse + 0.40 × bio_template

age_norm      = min(1.0, account_age_days / 365)
behavior_risk = 0.55 × (1 − age_norm) + 0.45 × post_hour_cluster_score

flagged_ratio = flagged_neighbor_count / max(inspected_neighbor_count, 1)
graph_risk    = 0.45 × flagged_ratio + 0.35 × mutual_follow_rate + 0.20 × avg_neighbor_photo_reuse

hub_legitimacy = 0.45 × log(1+followers)/log(1+1M)
              + 0.25 × (1 − follow_ratio_norm)
              + 0.20 × age_norm
              + 0.10 × (1 − suspicious_mutual_ratio)

fake_risk = clip(0.30×node_risk + 0.25×behavior_risk + 0.45×graph_risk − 0.25×hub_legitimacy, 0, 1)

Weight rationale: Graph risk (0.45) is dominant — structural signals are hardest to fake and compound across the investigation. Hub legitimacy is subtractive — a celebrity with 5M followers produces hub_legitimacy ≈ 1.0, making their fake_risk near zero even if gang members follow them.
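The formulas above translate directly into Python. This is a sketch written from the formulas in this section rather than a copy of `server/scoring.py`; the function signatures and the assumption that `follow_ratio_norm` is already normalised to [0, 1] are illustrative.

```python
import math


def clip(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def age_norm(account_age_days):
    return min(1.0, account_age_days / 365)


def node_risk(photo_reuse, bio_template):
    return 0.60 * photo_reuse + 0.40 * bio_template


def behavior_risk(account_age_days, post_hour_cluster_score):
    return 0.55 * (1 - age_norm(account_age_days)) + 0.45 * post_hour_cluster_score


def graph_risk(flagged_neighbors, inspected_neighbors, mutual_follow_rate,
               avg_neighbor_photo_reuse):
    flagged_ratio = flagged_neighbors / max(inspected_neighbors, 1)
    return (0.45 * flagged_ratio
            + 0.35 * mutual_follow_rate
            + 0.20 * avg_neighbor_photo_reuse)


def hub_legitimacy(followers, follow_ratio_norm, account_age_days,
                   suspicious_mutual_ratio):
    # follow_ratio_norm is assumed to be pre-normalised to [0, 1].
    return (0.45 * math.log(1 + followers) / math.log(1 + 1_000_000)
            + 0.25 * (1 - follow_ratio_norm)
            + 0.20 * age_norm(account_age_days)
            + 0.10 * (1 - suspicious_mutual_ratio))


def fake_risk(node, behavior, graph, hub):
    return clip(0.30 * node + 0.25 * behavior + 0.45 * graph - 0.25 * hub)


# A ten-year-old celebrity with 5M followers and two flagged followers:
celeb_hub = hub_legitimacy(5_000_000, 0.0, 3650, 0.0)
celeb_risk = fake_risk(node_risk(0.05, 0.05),
                       behavior_risk(3650, 0.0),
                       graph_risk(2, 4, 0.0, 0.1),
                       celeb_hub)
```

With these illustrative inputs the subtractive hub term outweighs the graph term, so the clipped result is zero, matching the weight rationale above.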

Classification thresholds:

- fake_risk < 0.35 → normal
- 0.35 ≤ fake_risk < 0.60 → suspect
- fake_risk ≥ 0.60 → confirmed_fake (formula-level; explicit FLAG overrides)
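As a sketch, the thresholds map onto a three-way classifier. The `explicitly_flagged` parameter is a hypothetical way to model "explicit FLAG overrides"; how the environment actually stores that state is not shown here.

```python
def classify(fake_risk, explicitly_flagged=False):
    """Map a fake_risk value in [0, 1] to a status label."""
    if explicitly_flagged:
        return "confirmed_fake"  # an explicit FLAG wins over the formula
    if fake_risk >= 0.60:
        return "confirmed_fake"
    if fake_risk >= 0.35:
        return "suspect"
    return "normal"
```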

Grader score (normalised [0.0, 1.0], returned by /grader):

recall    = tp / 10
precision = tp / max(tp + fp, 1)
efficiency = max(0, (max_steps − steps_used) / max_steps)

if recall ≥ 0.8 AND precision ≥ 0.7:
    score = 0.55 + 0.20×recall + 0.15×precision + 0.10×efficiency
else:
    score = 0.30×recall + 0.10×precision

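The maximum is 1.0 (all 10 found, zero false positives, zero steps used). The win threshold is ≈ 0.815, which is the lowest score the first branch can produce: 0.55 + 0.20×0.8 + 0.15×0.7 + 0.10×0 = 0.815.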
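The grader formula can be checked with a few lines of Python. This is a sketch derived from the pseudocode above, not the server's implementation; the function name and signature are assumptions.

```python
def grader_score(tp, fp, steps_used, max_steps, gang_size=10):
    """Normalised episode score in [0.0, 1.0].

    tp: flagged accounts that are real gang members
    fp: flagged accounts that are not gang members
    """
    recall = tp / gang_size
    precision = tp / max(tp + fp, 1)
    efficiency = max(0.0, (max_steps - steps_used) / max_steps)

    if recall >= 0.8 and precision >= 0.7:
        return 0.55 + 0.20 * recall + 0.15 * precision + 0.10 * efficiency
    return 0.30 * recall + 0.10 * precision


perfect = grader_score(tp=10, fp=0, steps_used=0, max_steps=30)
narrow_win = grader_score(tp=8, fp=3, steps_used=30, max_steps=30)
near_miss = grader_score(tp=7, fp=0, steps_used=10, max_steps=30)
```

Because `tp` and `fp` are integers, the second branch can never exceed 0.37 (recall 1.0 with precision just under 0.7), so there is a wide gap between any losing score and the 0.815 floor of the first branch.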

Hybrid Policy (`agent/hybrid_policy.py`)

The agent blends a deterministic rule engine with Qwen3-Next-80B (via AWS Bedrock) using a per-task trust weight α.

Alpha update (per episode, after win/loss recorded):

reflection_factor = min(1.0, n_reflections / 4.0)
raw   = 0.20 + reflection_factor × (0.80 × recent_win_rate + 0.12)
alpha = clamp(raw, 0.20, task_cap)

| Task | α cap | Rationale |
|---|---|---|
| easy | 0.50 | Rule engine alone hits ~91%; the LLM assists, doesn't override |
| medium | 0.70 | Decoys require LLM judgment, but cascade must stay |
| hard | 0.85 | LLM needs latitude for evasion adaptation |

reflection_factor gates α: the LLM must accumulate ≥4 post-episode lessons before reaching meaningful trust, regardless of raw win rate.
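A minimal sketch of the alpha update, using the formula and caps from this section. The function name and the `TASK_CAPS` dictionary are illustrative, not taken from `agent/hybrid_policy.py`.

```python
TASK_CAPS = {"easy": 0.50, "medium": 0.70, "hard": 0.85}


def update_alpha(task, n_reflections, recent_win_rate):
    """Trust weight for the LLM after an episode has been recorded."""
    reflection_factor = min(1.0, n_reflections / 4.0)
    raw = 0.20 + reflection_factor * (0.80 * recent_win_rate + 0.12)
    return max(0.20, min(raw, TASK_CAPS[task]))


# With no reflections yet, alpha stays at the 0.20 floor even on a perfect record.
cold_start = update_alpha("hard", n_reflections=0, recent_win_rate=1.0)

# Two reflections and a 50% win rate: 0.20 + 0.5 * (0.40 + 0.12) = 0.46
warming_up = update_alpha("hard", n_reflections=2, recent_win_rate=0.5)
```

With four or more reflections and a perfect win rate the raw value is 1.12, so the result is always decided by the task cap in that regime.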

Blending decision:

rule_action, rule_conf = get_rule_action(obs)   # deterministic, with confidence score
llm_action,  _        = get_action(obs, ...)    # Qwen3 via Bedrock

if rule_action == llm_action:   final = llm_action     # agree
elif rule_conf >= alpha:        final = rule_action     # rule overrides
else:                           final = llm_action      # LLM trusted

Rule confidences: SUBMIT-forced=1.00, INSPECT-suspect=0.95, FLAG-high-risk=0.95, FLAG-threshold=0.70+, INSPECT-explore=0.30. At α=0.50 (easy cap), safety decisions (suspects, forced submit) always override; exploration goes to the LLM.
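The blending decision can be written as a small pure function, which makes the interaction between rule confidence and α easy to test. This is a sketch: actions are modelled as plain tuples and the returned source label is an addition for illustration.

```python
def blend(rule_action, rule_conf, llm_action, alpha):
    """Pick the final action and report which policy supplied it."""
    if rule_action == llm_action:
        return llm_action, "agree"
    if rule_conf >= alpha:
        return rule_action, "rule"  # rule is confident enough to override
    return llm_action, "llm"        # LLM is trusted over a low-confidence rule


# Hypothetical actions, encoded as (action_type, account_id) tuples.
inspect_suspect = ("inspect", "acc_7")
flag_candidate = ("flag", "acc_3")

# Easy cap (alpha=0.50): an INSPECT-suspect rule (0.95) overrides the LLM...
easy_safety = blend(inspect_suspect, 0.95, flag_candidate, alpha=0.50)
# ...while an INSPECT-explore rule (0.30) defers to it.
easy_explore = blend(inspect_suspect, 0.30, flag_candidate, alpha=0.50)
# Hard cap (alpha=0.85): a FLAG-threshold rule at 0.70 no longer overrides.
hard_flag = blend(flag_candidate, 0.70, inspect_suspect, alpha=0.85)
```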

Reflexion learning: After each episode, Qwen3 generates a 2–3 sentence lesson from the action log and outcome. Lessons are stored in memory/reflections_{task}.jsonl and injected into every future prompt (last 4 lessons + best winning trajectory as few-shot example). Memory persists across container restarts via Docker volume.
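The lesson store described above is an append-only JSONL file per task. The sketch below shows one way to write and read it; the record fields (`lesson`, `won`), the helper names, and the prompt wording are assumptions, and only the file naming and the "last 4 lessons" window come from the description.

```python
import json
from pathlib import Path


def _reflections_file(memory_dir, task):
    return Path(memory_dir) / f"reflections_{task}.jsonl"


def append_lesson(memory_dir, task, lesson, won):
    """Append one post-episode lesson as a JSON line."""
    path = _reflections_file(memory_dir, task)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"lesson": lesson, "won": won}) + "\n")


def load_recent_lessons(memory_dir, task, k=4):
    """Return the text of the last `k` lessons, oldest first."""
    path = _reflections_file(memory_dir, task)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r["lesson"] for r in records[-k:]]


def build_prompt_prefix(lessons):
    """Format lessons for injection at the top of a prompt."""
    if not lessons:
        return ""
    bullets = "\n".join(f"- {lesson}" for lesson in lessons)
    return "Lessons from previous episodes:\n" + bullets + "\n"
```

Appending one line per episode keeps writes cheap and makes the file safe to tail, which suits a volume that outlives container restarts.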

API Reference

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | `{"status": "healthy"}` |
| `/tasks` | GET | Task list + `action_schema` + `score_range: [0.0, 1.0]` |
| `/reset` | POST | `{task, seed}` → initial observation |
| `/step` | POST | `{action_type, account_id?}` → updated observation |
| `/state` | GET | Episode metadata (step count, task, score, evasion count) |
| `/grader` | GET | Normalised [0.0, 1.0] score after SUBMIT (400 if not done) |
| `/baseline` | POST | Runs rule-based agent on all 3 tasks, seed=0 |
| `/metadata` | GET | OpenEnv metadata block |
| `/schema` | GET | Full JSON schema for actions and observations |
| `/mcp` | POST | JSON-RPC 2.0 tool discovery (Model Context Protocol) |

Live: https://pandago-graphstrike.hf.space
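A minimal standard-library client for the endpoints above might look like the following. The request bodies follow the table (`{task, seed}` and `{action_type, account_id?}`); the helper names, the injectable `opener`, and the example account ID format are assumptions, and the response shapes are deliberately left uninterpreted.

```python
import json
import urllib.request

BASE_URL = "https://pandago-graphstrike.hf.space"


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request when payload is None, otherwise a JSON POST."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    method = "GET" if data is None else "POST"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def call(path, payload=None, opener=urllib.request.urlopen):
    """Send the request and decode the JSON response body."""
    with opener(build_request(path, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reset(task, seed, **kwargs):
    return call("/reset", {"task": task, "seed": seed}, **kwargs)


def step(action_type, account_id=None, **kwargs):
    payload = {"action_type": action_type}
    if account_id is not None:  # submit carries no account_id
        payload["account_id"] = account_id
    return call("/step", payload, **kwargs)


def grader(**kwargs):
    return call("/grader", **kwargs)
```

A typical session would call `reset("easy", 0)`, issue `step("inspect", ...)` and `step("flag", ...)` calls, finish with `step("submit")`, and then read `grader()`. Passing a custom `opener` makes the client testable without network access.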

File Structure

server/
  app.py          — FastAPI + Gradio UI (gr.mount_gradio_app)
  environment.py  — Episode lifecycle, action mechanics, cascade logic
  generator.py    — Deterministic episode generation (150 JSON files)
  scoring.py      — Stateless risk formula functions
  models.py       — Pydantic models: AccountProfile, FakeGangObservation, ActionType

agent/
  policy.py       — Qwen3 prompt construction + action parsing
  hybrid_policy.py — Alpha blending, rule engine with confidence scores
  reflection.py   — Post-episode lesson generation
  memory.py       — JSONL persistence for reflections, trajectories, alpha

inference.py      — Submission entrypoint: [START]/[STEP]/[END] structured logs, OpenAI client
validate.py       — 24-point pre-submission validator (local + HTTP)
train.py          — Full training loop with curriculum
episodes/         — 150 pre-generated JSON episode files (baked into Docker image)
memory/           — Docker volume: reflections, win history, alpha values

Baseline Scores

| Task | Seed=0 | Win rate (50 seeds) | Mean (50 seeds) |
|---|---|---|---|
| easy | 0.910 | 100% | ~0.91 |
| medium | 0.906 | 84% | ~0.77 |
| hard | 0.9038 | 52% | ~0.47 |

The rule-based baseline (no LLM) is competitive on easy/medium. Hard is the real differentiator — evasion events drop intra-gang edges mid-investigation, destroying graph signals. Frontier LLM agents with accumulated reflections adapt; the rule engine degrades.

Built by team computeXor
