Spaces:

williyam
/

agentic-rag-gym

Sleeping

App Files Files Community

agentic-rag-gym / documents /data_flow.md

williyam

feat: complete Agentic RAG Gym implementation

7f38b9c about 1 month ago

preview code

raw

history blame contribute delete

5.91 kB

	# Data Flow — Agentic RAG Gym

	## Episode Lifecycle

	```
	Agent Environment Backend
	│ │ │
	│──── POST /reset ────────▶│ │
	│ │── Select Task │
	│ │── Initialize State │
	│ │── Clear Trajectory │
	│◀──── Observation ────────│ │
	│ │ │
	│──── POST /step ─────────▶│ │
	│ {type: "retrieve"} │── Dispatch to Handler │
	│ │── Multi-Agent Processing ──▶│── FAISS Search
	│ │◀── Retrieval Results ──────│
	│ │── Compute Step Reward │
	│ │── Record to Trajectory │
	│◀──── (obs, reward) ─────│ │
	│ │ │
	│──── POST /step ─────────▶│ │
	│ {type: "reason"} │── Dispatch to Reasoner │
	│ │── Agent Reasoning ─────────▶│── LLM API Call
	│ │◀── Reasoning Output ───────│
	│ │── Compute Step Reward │
	│◀──── (obs, reward) ─────│ │
	│ │ │
	│──── POST /step ─────────▶│ │
	│ {type: "answer"} │── Dispatch to Answer │
	│ │── Check Termination │
	│ │── Compute Episode Reward │
	│◀──── (obs, reward, done)│ │
	│ │ │
	│──── POST /grade ────────▶│ │
	│ │── Domain Grader Evaluation │
	│◀──── Score [0.01-0.99] ─│ │
	```

	## Multi-Agent Message Flow

	```
	┌──────────┐ query ┌──────────┐ retrieval ┌──────────┐
	│ Planner │────────────────▶│ Retriever│───results─────▶│ Reasoner │
	│ Agent │ │ Agent │ │ Agent │
	└──────────┘ └──────────┘ └────┬─────┘
	▲ │
	│ reasoning
	refinement output
	query │
	│ ┌────▼─────┐
	┌────┴─────┐ │ Critic │
	│ Critic │◀─────────────│ Agent │
	│ (loop) │ └──────────┘
	└──────────┘
	┌──────────┐
	│ Verifier │
	│ Agent │
	└──────────┘
	```

	## Reward Computation Pipeline

	```
	Step Record
	│
	├──▶ Retrieval Quality Signal ──────────┐
	│ (doc relevance scores) │
	│ │
	├──▶ Reasoning Quality Signal ──────────┤
	│ (trace analysis: evidence, logic) │ ┌──────────────────┐
	│ ├────▶│ Weighted │
	├──▶ Answer Quality Signal ─────────────┤ │ Combination │──▶ Step Reward
	│ (length, grounding, coverage) │ └──────────────────┘
	│ │ │
	├──▶ Efficiency Signal ─────────────────┤ │
	│ (step_ratio penalty) │ ▼
	│ │ Anti-Hacking
	└──▶ Anti-Hacking Penalty ──────────────┘ Verification
	(repetition, degenerate output) │
	▼
	Clamped [0.01, 0.99]
	```

	## Database Persistence Flow

	```
	Episode Complete
	│
	├──▶ Save EpisodeRecord (episode_id, task_id, score, answer)
	│
	└──▶ Save StepLogs (per-step action, reward, reasoning trace)
	│
	└──▶ Available for:
	• Training data extraction
	• Performance analytics
	• Self-improvement curriculum
	```