	---
	title: GraphStrike
	emoji: 🕵️
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	pinned: false
	license: mit
	tags:
	- reinforcement-learning
	- social-network
	- fraud-detection
	- openenv
	- llm-agent
	---
	<br>

	<p align="center">
	<img src="images/logo.png" width="600"/>
	</p>

	<br>

	<p align="center">
	<img src="https://img.shields.io/badge/Hugging%20Face-FFD21E?style=for-the-badge&logo=huggingface&logoColor=black"/>
	<img src="https://img.shields.io/badge/HF%20Spaces-FFBF00?style=for-the-badge&logo=huggingface&logoColor=black"/>
	<img src="https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white"/>
	<img src="https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white"/>
	<img src="https://img.shields.io/badge/Gradio-F97316?style=for-the-badge&logo=gradio&logoColor=white"/>
	<img src="https://img.shields.io/badge/OpenEnv-4B5563?style=for-the-badge&logo=envato&logoColor=white"/>
	<img src="https://img.shields.io/badge/Amazon%20Bedrock-FF9900?style=for-the-badge&logo=amazonaws&logoColor=white"/>
	</p>
	<br>

	<h1 align="center">
	</h1>
	<p align="center">
An OpenEnv-compatible reinforcement learning environment in which an LLM agent must identify all 10 members of a coordinated fake-account network hidden inside a synthetic social network. The agent learns via Reflexion and a dynamic hybrid rule/LLM policy, not via gradient updates or fine-tuning.
<br />
</p>

	<br>
	<br>

	### Deployed Endpoint Verification

	The live environment at [huggingface.co/spaces/Pandago/graphstrike](https://huggingface.co/spaces/Pandago/graphstrike)
	responds to all standard OpenEnv endpoints:

	```bash
	# Health check
	curl https://pandago-graphstrike.hf.space/health
	# → {"status": "healthy"}

	# Task discovery
	curl https://pandago-graphstrike.hf.space/tasks
	# → {"tasks": ["easy","medium","hard"], "action_schema": {...}, "score_range": [0.0, 1.0]}

	# Baseline (deterministic, reproducible)
	curl -X POST https://pandago-graphstrike.hf.space/baseline
	# → {"scores": {"easy": 0.91, "medium": 0.906, "hard": 0.9038}, "agent": "rule_based"}
	```

	---

	<br>

We evaluate GraphStrike's hybrid rule/LLM policy across multiple frontier models to measure how well each model handles the investigation task. All runs use
the same inference pipeline (`inference.py`) with identical system prompts and structured logging. Each model ran (1) seed=0 on all 3 tasks, and
(2) seeds 0–2 on all 3 tasks for variance measurement.

	<br>

	Seed=0 scores (single episode per task):

	<p align="center">
	<img src="images/table1.png" alt="Model Performance Table" width="1600"/>
	</p>
	<br>

	3-seed variance scores (mean across seeds 0, 1, 2):

	<p align="center">
	<img src="images/table2.png" alt="Model Performance Table" width="1600"/>
	</p>
	<br>

Rule-based baseline (no LLM, deterministic):

	<p align="center">
	<img src="images/table3.png" alt="Model Performance Table" width="1600"/>
	</p>
	<br>

	---

	The task: A social network contains fake accounts organised into a
	single coordinated network of 10. The network behaves in a coordinated way — same posting hour,
	same IP subnet, stolen celebrity photos, copy-paste bios. The agent must find
	all 10 by navigating a limited step budget, inspecting accounts, and flagging suspects.

What makes this non-trivial: The network is large (50–1000 accounts depending on difficulty). Fake accounts are mixed with innocent high-signal "decoy" accounts. In hard mode, the fake accounts actively evade detection (dropping intra-gang follows, renaming profiles) while the agent is mid-investigation. The agent cannot see the full network upfront: it must explore via INSPECT and INVESTIGATE_NETWORK actions, spending steps to reveal information.

What makes the learning novel: The inference LLM (via AWS Bedrock) cannot be fine-tuned; it is a black-box API. The agent learns via Reflexion, i.e., post-episode lessons are written back into memory and injected into every future prompt. A dynamic hybrid policy (α-weighted) blends the LLM with a deterministic rule engine, with the blend weight α updating based on recent win rate. Rules dominate early; the LLM takes over as it proves itself.

	---

	## Detection Signals

	Detection operates entirely on numeric metadata — no content processing. Three signal categories, computed at different points:

	Node signals (pre-computed by content pipeline, static per account):

| Feature | Fake range | Real range | Notes |
|---|---|---|---|
| `photo_reuse_score` | 0.30–0.95 | 0.00–0.15 | pHash fingerprint match against celebrity photo DB |
| `bio_template_score` | 0.20–0.90 | 0.00–0.12 | Cosine similarity to known fake bio templates |
| `comment_repeat_score` | 0.60–0.90 | 0.00–0.08 | Copy-paste spam fraction across accounts |

	Behavioral signals (static, from account metadata):

| Feature | Fake pattern |
|---|---|
| `avg_post_hour` | All 10 gang members cluster within ±0.5 hours (coordinated scheduling) |
| `account_age_days` | Created within the same week (`base_age ± 7 days`) |
| `shared_ip_count` | 9: all 10 share one IP subnet (`ip_gang_{seed}`) |

	Graph signals (dynamic — computed at INSPECT time, shift as investigation progresses):

| Feature | Why it matters |
|---|---|
| `mutual_follow_rate` | Gang members mutually follow each other at 0.6–0.9 density; legitimate hubs don't follow back |
| `flagged_neighbor_count` | Grows as more gang members are flagged (the cascade signal) |
| `post_hour_cluster_score` | Alignment to mean posting hour of currently-flagged accounts (wrap-around aware) |
| `suspicious_mutual_ratio` | Used to compute hub legitimacy; protects celebrities from false positives |

	Graph signals are the most powerful: once one gang member is flagged, `flagged_neighbor_count` rises for all connected members, compounding with each subsequent flag.
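
The exact formula behind `post_hour_cluster_score` is not given here. As a rough illustration of what "wrap-around aware" alignment could look like, the sketch below assumes a circular mean on a 24-hour clock and a linear falloff over a hypothetical 6-hour window. The function names and the `window` parameter are assumptions for illustration, not the environment's implementation.

```python
import math


def circular_mean_hour(hours):
    """Mean of clock hours on a 24-hour circle (23.5 and 0.5 average near midnight)."""
    angles = [2 * math.pi * h / 24.0 for h in hours]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    return (math.atan2(y, x) * 24.0 / (2 * math.pi)) % 24.0


def hour_distance(a, b):
    """Shortest distance between two clock hours, in hours (at most 12)."""
    diff = abs(a - b) % 24.0
    return min(diff, 24.0 - diff)


def post_hour_cluster_score(hour, flagged_hours, window=6.0):
    """Hypothetical alignment score in [0, 1]; 1.0 means exactly on the flagged mean."""
    if not flagged_hours:
        return 0.0  # nothing flagged yet, so there is no cluster to align to
    center = circular_mean_hour(flagged_hours)
    return max(0.0, 1.0 - hour_distance(hour, center) / window)
```

A naive arithmetic mean of 23.5 and 0.5 would land on noon; the circular version keeps an account posting at 00:15 close to a gang that posts around midnight.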

	---

	## Synthetic Network Composition

	150 episodes pre-generated deterministically (50 per task). Each episode is a JSON file (`episodes/{task}_{seed:03d}.json`).

| Task | Accounts | Gang | Decoys | Max steps | Evasion |
|---|---|---|---|---|---|
| easy | 50 | 10 | 0 | 30 | None |
| medium | 200 | 10 | 20 | 50 | Step 20 (once) |
| hard | 1000 | 10 | 50 | 80 | Steps 15/30/45/60 |

	- Gang: Dense intra-follow graph (density 0.60–0.80), same IP subnet, tightly clustered post hours (std 0.5/1.5/2.5 by task).
	- Decoys (medium/hard only): Real accounts with elevated `photo_reuse` and `bio_template` scores (0.20–0.40). They score as suspicious but are not gang members — they penalise reckless flagging.
	- Celebrities (2 per episode): 100k–5M followers, near-zero fake scores. Hub legitimacy formula protects them.
	- Zero-edge isolates (2 per episode): `follower_count=0`, no edges. Test whether the agent wastes steps on disconnected nodes.
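
The task table and the episode file naming can be captured in a few lines. This is a minimal sketch: `TaskConfig`, `TASKS` and `episode_path` are hypothetical names, and the seed range 0–49 is an assumption inferred from "50 per task".

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TaskConfig:
    accounts: int
    gang: int
    decoys: int
    max_steps: int
    evasion_steps: tuple  # steps at which an evasion event fires


# Values copied from the task table above.
TASKS = {
    "easy": TaskConfig(50, 10, 0, 30, ()),
    "medium": TaskConfig(200, 10, 20, 50, (20,)),
    "hard": TaskConfig(1000, 10, 50, 80, (15, 30, 45, 60)),
}


def episode_path(task, seed, root="episodes"):
    """Build the path episodes/{task}_{seed:03d}.json for a pre-generated episode."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if not 0 <= seed < 50:  # assumption: seeds are numbered 0..49
        raise ValueError(f"seed out of range: {seed}")
    return Path(root) / f"{task}_{seed:03d}.json"
```

For example, `episode_path("hard", 7)` points at `episodes/hard_007.json`.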

	---

	## Actions

| Action | Cost | Effect |
|---|---|---|
| `inspect` | 1 step | Reveals full `AccountProfile` (all 22 features), adds neighbors to visible set |
| `investigate_network` | 2 steps | Bidirectional 2-hop expansion; reveals account IDs only (no profiles); re-cascades SUSPECT |
| `flag` | 0 steps | Marks account CONFIRMED_FAKE; dual cascade: follow-graph + IP cluster |
| `unflag` | 0 steps | Clears CONFIRMED_FAKE status |
| `submit` | 0 steps | Ends episode, triggers scoring |

	Dual SUSPECT cascade on FLAG:
	1. Follow-graph: Every visible account that the flagged account follows → SUSPECT (high precision: gang follow density 0.70+).
	2. IP cluster: Every visible account sharing the same `ip_cluster_id` → SUSPECT (zero false positives: real accounts each have a unique IP; gang shares `ip_gang_{seed}`).

	Both mechanisms surface in `obs.suspect_ids` — the agent's highest-priority INSPECT targets.
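
The two cascade rules can be sketched as a single set computation. The data structures here (`follows` as a dict of lists, `ip_cluster` as a dict of IDs, plain sets for visibility and confirmed flags) are assumptions for illustration. Whether already-confirmed accounts are excluded from the suspect set is also an assumption.

```python
def cascade_suspects(flagged_id, follows, ip_cluster, visible, confirmed=frozenset()):
    """Return the visible accounts that become SUSPECT after flagging `flagged_id`."""
    suspects = set()

    # 1. Follow-graph: every visible account that the flagged account follows.
    for target in follows.get(flagged_id, ()):
        if target in visible:
            suspects.add(target)

    # 2. IP cluster: every visible account sharing the flagged account's cluster ID.
    cluster = ip_cluster.get(flagged_id)
    if cluster is not None:
        suspects.update(acc for acc in visible if ip_cluster.get(acc) == cluster)

    suspects.discard(flagged_id)  # the flagged account itself is not a suspect
    return suspects - set(confirmed)


follows = {"g1": ["g2", "g3", "celeb"]}
ip_cluster = {"g1": "ip_gang_0", "g2": "ip_gang_0", "g3": "ip_gang_0",
              "g4": "ip_gang_0", "celeb": "ip_celeb", "r1": "ip_r1"}
visible = {"g1", "g2", "g4", "celeb", "r1"}  # g3 has not been revealed yet
suspects = cascade_suspects("g1", follows, ip_cluster, visible)
```

In this toy graph the follow rule also surfaces `celeb`, which is why the follow-graph cascade is described as high precision rather than exact, while the IP rule adds `g4` even though `g1` does not follow it.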

	---

	## Risk Scoring (`server/scoring.py`)

	All functions are stateless, called inside `_build_profile()` at INSPECT time and on re-profiling after each FLAG.

	```
	node_risk = 0.60 × photo_reuse + 0.40 × bio_template

	age_norm = min(1.0, account_age_days / 365)
	behavior_risk = 0.55 × (1 − age_norm) + 0.45 × post_hour_cluster_score

	flagged_ratio = flagged_neighbor_count / max(inspected_neighbor_count, 1)
	graph_risk = 0.45 × flagged_ratio + 0.35 × mutual_follow_rate + 0.20 × avg_neighbor_photo_reuse

hub_legitimacy = 0.45 × log(1+followers)/log(1+1M)
               + 0.25 × (1 − follow_ratio_norm)
               + 0.20 × age_norm
               + 0.10 × (1 − suspicious_mutual_ratio)

	fake_risk = clip(0.30×node_risk + 0.25×behavior_risk + 0.45×graph_risk − 0.25×hub_legitimacy, 0, 1)
	```

	Weight rationale: Graph risk (0.45) is dominant — structural signals are hardest to fake and compound across the investigation. Hub legitimacy is subtractive — a celebrity with 5M followers produces `hub_legitimacy ≈ 1.0`, making their fake_risk near zero even if gang members follow them.

	Classification thresholds:
	- `fake_risk < 0.35` → normal
	- `0.35 ≤ fake_risk < 0.60` → suspect
	- `fake_risk ≥ 0.60` → confirmed_fake (formula-level; explicit FLAG overrides)
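
The formulas and thresholds above translate directly into Python. The sketch below follows the stated weights term by term; the function signatures and the two sample accounts are assumptions for illustration and are not taken from `server/scoring.py`.

```python
import math


def clip(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def age_norm(account_age_days):
    return min(1.0, account_age_days / 365)


def node_risk(photo_reuse, bio_template):
    return 0.60 * photo_reuse + 0.40 * bio_template


def behavior_risk(account_age_days, post_hour_cluster_score):
    return 0.55 * (1 - age_norm(account_age_days)) + 0.45 * post_hour_cluster_score


def graph_risk(flagged_neighbors, inspected_neighbors, mutual_follow_rate,
               avg_neighbor_photo_reuse):
    flagged_ratio = flagged_neighbors / max(inspected_neighbors, 1)
    return (0.45 * flagged_ratio + 0.35 * mutual_follow_rate
            + 0.20 * avg_neighbor_photo_reuse)


def hub_legitimacy(followers, follow_ratio_norm, account_age_days,
                   suspicious_mutual_ratio):
    # The log ratio is independent of the logarithm base.
    return (0.45 * math.log(1 + followers) / math.log(1 + 1_000_000)
            + 0.25 * (1 - follow_ratio_norm)
            + 0.20 * age_norm(account_age_days)
            + 0.10 * (1 - suspicious_mutual_ratio))


def fake_risk(node, behavior, graph, hub):
    return clip(0.30 * node + 0.25 * behavior + 0.45 * graph - 0.25 * hub)


def classify(risk):
    if risk < 0.35:
        return "normal"
    if risk < 0.60:
        return "suspect"
    return "confirmed_fake"


# Hypothetical gang-like account: young, reused photo, 3 of 4 inspected neighbors flagged.
gang = fake_risk(node_risk(0.9, 0.8), behavior_risk(30, 1.0),
                 graph_risk(3, 4, 0.8, 0.7), hub_legitimacy(200, 0.9, 30, 0.8))

# Hypothetical celebrity: 5M followers, old account, clean profile.
celeb = fake_risk(node_risk(0.02, 0.01), behavior_risk(3000, 0.2),
                  graph_risk(0, 5, 0.0, 0.05), hub_legitimacy(5_000_000, 0.0, 3000, 0.0))
```

With these made-up inputs the gang-like account lands in the `confirmed_fake` band, while the celebrity's subtractive hub term pushes the clipped risk down to zero.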

	Grader score (normalised [0.0, 1.0], returned by `/grader`):
	```
	recall = tp / 10
	precision = tp / max(tp + fp, 1)
	efficiency = max(0, (max_steps − steps_used) / max_steps)

if recall ≥ 0.8 AND precision ≥ 0.7:
    score = 0.55 + 0.20×recall + 0.15×precision + 0.10×efficiency
else:
    score = 0.30×recall + 0.10×precision
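Maximum 1.0 (all 10 found, zero false positives, zero steps used). Win threshold ≈ 0.815, the floor of the winning branch: recall = 0.8, precision = 0.7 and efficiency = 0 give 0.55 + 0.16 + 0.105 = 0.815.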
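
The grader formula can be checked with a small function. This is a sketch of the formula as written above; the name `grader_score` and its argument list are assumptions, not the signature used by the `/grader` endpoint.

```python
def grader_score(tp, fp, steps_used, max_steps, gang_size=10):
    """Score an episode from true positives, false positives and steps used."""
    recall = tp / gang_size
    precision = tp / max(tp + fp, 1)
    efficiency = max(0.0, (max_steps - steps_used) / max_steps)

    if recall >= 0.8 and precision >= 0.7:
        # Winning branch: large base bonus plus weighted quality terms.
        return 0.55 + 0.20 * recall + 0.15 * precision + 0.10 * efficiency
    # Losing branch: no base bonus and no efficiency credit.
    return 0.30 * recall + 0.10 * precision


perfect_fast = grader_score(tp=10, fp=0, steps_used=0, max_steps=30)
perfect_slow = grader_score(tp=10, fp=0, steps_used=27, max_steps=30)
barely_won = grader_score(tp=8, fp=3, steps_used=80, max_steps=80)
barely_lost = grader_score(tp=8, fp=4, steps_used=80, max_steps=80)
```

The two branches leave a wide gap: the losing branch cannot exceed 0.40, so one extra false positive (`barely_lost` versus `barely_won`) drops the score from about 0.82 to about 0.31.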

	---

	## Hybrid Policy (`agent/hybrid_policy.py`)

	The agent blends a deterministic rule engine with Qwen3-Next-80B (via AWS Bedrock) using a per-task trust weight α.

	Alpha update (per episode, after win/loss recorded):
	```
	reflection_factor = min(1.0, n_reflections / 4.0)
	raw = 0.20 + reflection_factor × (0.80 × recent_win_rate + 0.12)
	alpha = clamp(raw, 0.20, task_cap)
	```

| Task | α cap | Rationale |
|---|---|---|
| easy | 0.50 | Rule engine alone hits ~91%; LLM assists, doesn't override |
| medium | 0.70 | Decoys require LLM judgment, but cascade must stay |
| hard | 0.85 | LLM needs latitude for evasion adaptation |

`reflection_factor` gates α: with no reflections α stays at the 0.20 floor, and the win-rate term only carries full weight once the LLM has accumulated at least 4 post-episode lessons, regardless of raw win rate.
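
A minimal sketch of the alpha update, using the formula and the per-task caps above. The function name `update_alpha` and the `TASK_CAPS` dict are illustrative assumptions, not the code in `agent/hybrid_policy.py`.

```python
TASK_CAPS = {"easy": 0.50, "medium": 0.70, "hard": 0.85}


def update_alpha(task, n_reflections, recent_win_rate):
    """Recompute the LLM trust weight for one task after an episode."""
    reflection_factor = min(1.0, n_reflections / 4.0)
    raw = 0.20 + reflection_factor * (0.80 * recent_win_rate + 0.12)
    return max(0.20, min(TASK_CAPS[task], raw))


no_lessons = update_alpha("hard", n_reflections=0, recent_win_rate=1.0)
halfway = update_alpha("hard", n_reflections=2, recent_win_rate=0.5)
saturated = update_alpha("hard", n_reflections=4, recent_win_rate=1.0)
```

Even a perfect win rate leaves α at the floor when no lessons exist, and with four or more lessons a perfect win rate gives a raw value of 1.12 that the task cap then clamps.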

	Blending decision:
	```python
# Both policies propose an action for the same observation.
rule_action, rule_conf = get_rule_action(obs)  # deterministic, with confidence score
llm_action, _ = get_action(obs, ...)           # Qwen3 via Bedrock

if rule_action == llm_action:
    final = llm_action   # agree
elif rule_conf >= alpha:
    final = rule_action  # rule overrides
else:
    final = llm_action   # LLM trusted
	```

	Rule confidences: SUBMIT-forced=1.00, INSPECT-suspect=0.95, FLAG-high-risk=0.95, FLAG-threshold=0.70+, INSPECT-explore=0.30. At `α=0.50` (easy cap), safety decisions (suspects, forced submit) always override; exploration goes to the LLM.
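
To see which rules still override the LLM at a given α, the blending decision can be isolated into a pure function. This is a sketch: the `RULE_CONFIDENCE` keys are hypothetical labels for the confidences listed above, and FLAG-threshold is pinned at its documented floor of 0.70 even though the text gives it as "0.70+".

```python
RULE_CONFIDENCE = {
    "submit_forced": 1.00,
    "inspect_suspect": 0.95,
    "flag_high_risk": 0.95,
    "flag_threshold": 0.70,  # assumption: lowest value of the "0.70+" range
    "inspect_explore": 0.30,
}


def blend(rule_action, rule_conf, llm_action, alpha):
    """Pick the final action from the rule engine's and the LLM's proposals."""
    if rule_action == llm_action:
        return llm_action   # both agree
    if rule_conf >= alpha:
        return rule_action  # rule is confident enough to override
    return llm_action       # LLM is trusted


def overriding_rules(alpha):
    """Rule types whose confidence is high enough to override a disagreeing LLM."""
    return sorted(name for name, conf in RULE_CONFIDENCE.items() if conf >= alpha)


at_floor = overriding_rules(0.20)     # every rule wins a disagreement
at_easy_cap = overriding_rules(0.50)  # exploration is handed to the LLM
at_hard_cap = overriding_rules(0.85)  # only the 0.95+ rules still override
```

Under the floor assumption, a FLAG-threshold decision at exactly 0.70 would no longer override the LLM at the hard cap of 0.85, whereas suspect inspection and forced submit still would.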

	Reflexion learning: After each episode, Qwen3 generates a 2–3 sentence lesson from the action log and outcome. Lessons are stored in `memory/reflections_{task}.jsonl` and injected into every future prompt (last 4 lessons + best winning trajectory as few-shot example). Memory persists across container restarts via Docker volume.
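
The JSONL persistence described above could look roughly like the following. Only the file name pattern `reflections_{task}.jsonl` and the "last 4 lessons" window come from the text; the record fields (`lesson`, `won`) and both function names are assumptions for illustration, not the layout used by `agent/memory.py`.

```python
import json
from pathlib import Path


def reflections_file(memory_dir, task):
    return Path(memory_dir) / f"reflections_{task}.jsonl"


def append_lesson(memory_dir, task, lesson, won):
    """Append one post-episode lesson as a single JSON line."""
    path = reflections_file(memory_dir, task)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"lesson": lesson, "won": won}) + "\n")


def recent_lessons(memory_dir, task, limit=4):
    """Return the text of the most recent `limit` lessons, oldest first."""
    path = reflections_file(memory_dir, task)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [record["lesson"] for record in records[-limit:]]
```

Appending one line per episode keeps writes cheap and means a partially written file loses at most its last record.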

	---

	## API Reference

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | `{"status": "healthy"}` |
| `/tasks` | GET | Task list + `action_schema` + `score_range: [0.0, 1.0]` |
| `/reset` | POST | `{task, seed}` → initial observation |
| `/step` | POST | `{action_type, account_id?}` → updated observation |
| `/state` | GET | Episode metadata (step count, task, score, evasion count) |
| `/grader` | GET | Normalised [0.0, 1.0] score after SUBMIT (400 if not done) |
| `/baseline` | POST | Runs rule-based agent on all 3 tasks, seed=0 |
| `/metadata` | GET | OpenEnv metadata block |
| `/schema` | GET | Full JSON schema for actions and observations |
| `/mcp` | POST | JSON-RPC 2.0 tool discovery (Model Context Protocol) |

	Live: `https://pandago-graphstrike.hf.space`
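
A minimal client sketch using only the standard library. The request bodies follow the `{task, seed}` and `{action_type, account_id?}` shapes from the table; the helper names are hypothetical, the response layout is not assumed beyond being JSON, and `run_submit_only_episode` is defined but not called because it needs network access.

```python
import json
import urllib.request

BASE_URL = "https://pandago-graphstrike.hf.space"


def step_payload(action_type, account_id=None):
    """Build a /step body; account_id is left out for actions such as submit."""
    payload = {"action_type": action_type}
    if account_id is not None:
        payload["account_id"] = account_id
    return payload


def call(path, body=None, base_url=BASE_URL, timeout=30):
    """POST `body` as JSON when given, otherwise GET; return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if body is None else "POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_submit_only_episode(task="easy", seed=0):
    """Reset, submit immediately, then fetch the grader result (network required)."""
    call("/reset", {"task": task, "seed": seed})
    call("/step", step_payload("submit"))
    return call("/grader")
```

Submitting without flagging anything is only useful as a smoke test; a real agent would interleave `inspect`, `investigate_network` and `flag` steps before `submit`. See `/schema` for the authoritative action and observation formats.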

	---

	## File Structure

	```
server/
  app.py            - FastAPI + Gradio UI (gr.mount_gradio_app)
  environment.py    - Episode lifecycle, action mechanics, cascade logic
  generator.py      - Deterministic episode generation (150 JSON files)
  scoring.py        - Stateless risk formula functions
  models.py         - Pydantic models: AccountProfile, FakeGangObservation, ActionType

agent/
  policy.py         - Qwen3 prompt construction + action parsing
  hybrid_policy.py  - Alpha blending, rule engine with confidence scores
  reflection.py     - Post-episode lesson generation
  memory.py         - JSONL persistence for reflections, trajectories, alpha

inference.py        - Submission entrypoint: [START]/[STEP]/[END] structured logs, OpenAI client
validate.py         - 24-point pre-submission validator (local + HTTP)
train.py            - Full training loop with curriculum
episodes/           - 150 pre-generated JSON episode files (baked into Docker image)
memory/             - Docker volume: reflections, win history, alpha values
	```

	---

	## Baseline Scores

| Task | Seed=0 | Win rate (50 seeds) | Mean (50 seeds) |
|---|---|---|---|
| easy | 0.910 | 100% | ~0.91 |
| medium | 0.906 | 84% | ~0.77 |
| hard | 0.9038 | 52% | ~0.47 |

	The rule-based baseline (no LLM) is competitive on easy/medium. Hard is the real differentiator — evasion events drop intra-gang edges mid-investigation, destroying graph signals. Frontier LLM agents with accumulated reflections adapt; the rule engine degrades.

	---

	Built by team computeXor