ThejasRao
/

openenv-content-moderation


openenv-content-moderation / docs /ARCHITECTURE.md

ThejasRao

Initial OpenENV hackathon submission

c492c3f about 2 months ago


	# 📄 ARCHITECTURE.md — System Design & Component Architecture

	---

	## 🧠 Overview

	This document defines the system architecture for the AI Community Moderation Environment.

	The system is designed to:

	* comply with the OpenEnv specification
	* support multi-step agent interaction
	* provide deterministic evaluation
	* deploy as a FastAPI service in a Docker container

	---

	## 🏗️ High-Level Architecture

	### 🔹 Updated — Agent layer added

	```text
	┌─────────────────────────────────────────────────────────────────┐
	│                           Agent Layer                           │
	│                                                                 │
	│   ┌──────────────────────┐    ┌──────────────────────────────┐  │
	│   │  Gemini 2.5 Flash    │    │  Rule-based Baseline Agent   │  │
	│   │  (agent/gemini_      │    │  (baseline/agent.py)         │  │
	│   │   agent.py)          │    │                              │  │
	│   └──────────┬───────────┘    └─────────────┬────────────────┘  │
	│              │ /reset /step /agent/run      │ /baseline         │
	└──────────────┼──────────────────────────────┼───────────────────┘
	               │                              │
	               ▼                              ▼
	┌─────────────────────────────────────────────────────────────────┐
	│                    FastAPI API (api/app.py)                     │
	│     /reset /step /state /tasks /grader /baseline /agent/run     │
	└────────────────────────────┬────────────────────────────────────┘
	                             │
	        ┌────────────────────▼────────────────────┐
	        │         Environment Core (env/)         │
	        └───────┬─────────────────┬───────────────┘
	                │                 │
	      ┌─────────▼──────┐  ┌───────▼───────┐
	      │ State Manager  │  │ Reward Engine │
	      └─────────┬──────┘  └───────────────┘
	                │
	      ┌─────────▼──────────┐
	      │   Policy Engine    │
	      └─────────┬──────────┘
	                │
	      ┌─────────▼──────────┐
	      │   Data Generator   │
	      └────────────────────┘

	      ┌─────────────────────────────────────┐
	      │            Grader Engine            │
	      └─────────────────────────────────────┘
	```

	---

	## 📦 Core Modules

	---

	## 1. 🧠 Environment Core (`env/`)

	### Responsibility

	Implements the OpenEnv interface:

	* `reset()`
	* `step(action)`
	* `state()`

	### Key File

	```text
	env/moderation_env.py
	```

	### Responsibilities

	* orchestrates entire flow
	* maintains episode lifecycle
	* integrates all sub-components
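The three-method interface and the episode lifecycle can be illustrated with a small stand-in class. This is a minimal sketch, not the contents of `env/moderation_env.py`: the class name `ToyModerationEnv`, the `max_steps` parameter, the dict-shaped state, and the `(state, reward, done)` return shape are all assumptions made for illustration.

```python
from typing import Any


class ToyModerationEnv:
    """Hypothetical stand-in showing the reset/step/state lifecycle."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps  # assumed episode length, not from the spec
        self._state: dict[str, Any] = {}
        self._steps = 0
        self._done = True  # no episode is active until reset() is called

    def reset(self) -> dict[str, Any]:
        """Start a new episode and return the initial state."""
        self._steps = 0
        self._done = False
        self._state = {"post": "example post", "history": []}
        return self.state()

    def step(self, action: str) -> tuple[dict[str, Any], float, bool]:
        """Apply one action and return (state, reward, done)."""
        if self._done:
            raise RuntimeError("call reset() before step()")
        self._steps += 1
        self._state["history"].append(action)
        self._done = self._steps >= self.max_steps
        reward = 0.0  # placeholder; a real reward engine would score the action
        return self.state(), reward, self._done

    def state(self) -> dict[str, Any]:
        """Return a copy so callers cannot mutate the internal state."""
        return {**self._state, "history": list(self._state.get("history", []))}
```

Returning a copy from `state()` keeps the environment as the only writer of episode state, which is one simple way to keep the lifecycle easy to reason about.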

	---

	## 2. 📊 State Manager (`env/state_manager.py`)

	### Responsibility

	Handles:

	* current state representation
	* updates after each action

	### State Includes

	* post content
	* user history
	* reports
	* geo
	* context
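One way to represent the fields listed above is an immutable record that is replaced, not mutated, after each action. The sketch below is hypothetical: the class name `ModerationState`, the field names and types, and the `actions_taken` field are assumptions, not the layout used in `env/state_manager.py`.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ModerationState:
    """Hypothetical state record; field names and types are assumptions."""

    post_content: str
    user_history: tuple[str, ...] = ()
    reports: int = 0
    geo: str = "unknown"
    context: dict = field(default_factory=dict)
    actions_taken: tuple[str, ...] = ()


def apply_action(state: ModerationState, action: str) -> ModerationState:
    """Return a new state with the action recorded; the input is unchanged."""
    return replace(state, actions_taken=state.actions_taken + (action,))
```

Keeping each state immutable means a trajectory can be stored as a plain list of states without defensive copying.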

	---

	## 3. 🧾 Policy Engine (`env/policy_engine.py`)

	### Responsibility

	* implements rules from `POLICY.md`
	* computes:
	  * violation type
	  * severity
	  * expected action

	### Key Function

	```python
	def evaluate_policy(state) -> dict:
	    # Values are elided; the rules in POLICY.md determine each field.
	    return {
	        "violation_type": ...,
	        "severity": ...,
	        "expected_action": ...,
	    }
	```

	---

	## 4. 🧬 Data Generator (`env/data_generator.py`)

	### Responsibility

	* generates synthetic moderation scenarios
	* ensures deterministic outputs

	### Features

	* template-based post generation
	* context simulation (history, reports, geo)
	* seed-controlled reproducibility
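Seed-controlled reproducibility is commonly achieved by drawing every random choice from a private `random.Random(seed)` instance. The sketch below shows that pattern only; the templates, the context values, and the function name `generate_scenario` are invented for illustration and do not come from `env/data_generator.py`.

```python
import random

# Hypothetical templates and context values, invented for illustration.
TEMPLATES = ["Buy {item} now!!!", "I disagree with {item}.", "Check out my {item}."]
ITEMS = ["this product", "your opinion", "new blog"]
GEOS = ["US", "IN", "DE"]


def generate_scenario(seed: int) -> dict:
    """Build one scenario from a private RNG so the global RNG is untouched."""
    rng = random.Random(seed)
    return {
        "post": rng.choice(TEMPLATES).format(item=rng.choice(ITEMS)),
        "reports": rng.randint(0, 5),
        "geo": rng.choice(GEOS),
    }
```

Calling `generate_scenario` twice with the same seed returns equal dictionaries, which is the property the grader relies on when comparing runs.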

	---

	## 5. 🏆 Reward Engine (`env/reward_engine.py`)

	### Responsibility

	* computes step-wise rewards
	* aligns with `REWARD.md`

	### Input

	* current state
	* action
	* ground truth
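A step-wise reward takes the three inputs above and returns a number. The rule below is a toy assumption, not the scheme defined in `REWARD.md`: a match earns `1.0` and a mismatch costs more as severity rises. The function name, the integer `severity`, and the penalty formula are all hypothetical.

```python
def compute_reward(state: dict, action: str, ground_truth: dict) -> float:
    """Hypothetical step reward: +1 for a match, a severity-scaled penalty otherwise.

    `state` is accepted to mirror the listed inputs but is unused by this toy rule.
    """
    if action == ground_truth["expected_action"]:
        return 1.0
    # Assumption: mistakes on more severe content cost more.
    return -0.5 * (1 + ground_truth["severity"])
```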

	---

	## 6. 🧪 Grader Engine (`graders/`)

	### Responsibility

	* computes final episode score
	* aligns with `GRADERS.md`

	### Key File

	```text
	graders/grader.py
	```
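A final episode score can be computed from the recorded trajectory alone. The sketch below uses plain accuracy as the metric, which is an assumption; `GRADERS.md` defines the project's actual scoring. The function name `grade_episode` and the per-step dict keys are hypothetical.

```python
def grade_episode(trajectory: list[dict]) -> float:
    """Hypothetical score: fraction of steps whose action matched the expected one."""
    if not trajectory:
        return 0.0  # an empty episode earns no credit
    correct = sum(
        1 for step in trajectory if step["action"] == step["expected_action"]
    )
    return correct / len(trajectory)
```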

	---

	## 7. 🌐 API Layer (`api/`)

	### Framework

	* FastAPI

	### Responsibilities

	Expose endpoints:

	| Endpoint     | Function                        |
	| ------------ | ------------------------------- |
	| `/reset`     | start new episode               |
	| `/step`      | take action                     |
	| `/state`     | current state                   |
	| `/tasks`     | list tasks                      |
	| `/grader`    | compute score                   |
	| `/baseline`  | run rule-based baseline agent   |
	| `/agent/run` | run Gemini 2.5 Flash agent 🔹   |
	| `/health`    | liveness check                  |

	---

	## 8. 🤖 Baseline Agent (`baseline/`)

	### Responsibility

	* runs a simple rule-based heuristic agent
	* produces a reproducible benchmark without any LLM dependency
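A rule-based agent of this kind maps a state to an action with fixed checks, so the same state always yields the same action. The keyword list, the report threshold, and the action names `remove` and `allow` below are assumptions for illustration; the project's own rules are not shown in this document.

```python
# Hypothetical keyword list and threshold, invented for illustration.
BLOCKED_KEYWORDS = ("spam", "scam")
REPORT_THRESHOLD = 3


def baseline_action(state: dict) -> str:
    """Pick an action from fixed rules; no randomness and no model calls."""
    text = state.get("post", "").lower()
    if any(word in text for word in BLOCKED_KEYWORDS):
        return "remove"
    if state.get("reports", 0) >= REPORT_THRESHOLD:
        return "escalate"
    return "allow"
```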

	---

	## 9. 🧠 Gemini Agent (`agent/`) — 🔹 Added

	### Responsibility

	* LLM-driven agent using Gemini 2.5 Flash (google-genai SDK)
	* interacts with the environment via the same `/reset` and `/step` API
	* uses multi-turn chat with a structured system prompt
	* parses JSON action responses; falls back to `escalate` on parse failure

	### Key Files

	```
	agent/gemini_agent.py — GeminiAgent class
	agent/prompts.py — SYSTEM_PROMPT + build_turn_prompt()
	```

	### Design Constraint

	The LLM is only the decision-making layer. Policy evaluation, reward
	computation, and grading remain fully deterministic in the environment.
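The parse-or-fall-back behaviour described above can be sketched with the standard `json` module. This is not the code in `agent/gemini_agent.py`: the function name `parse_action` and the assumed reply shape (a JSON object with a string `"action"` field) are hypothetical, and no Gemini SDK calls are shown.

```python
import json

FALLBACK_ACTION = {"action": "escalate"}


def parse_action(raw_reply: str) -> dict:
    """Parse a model reply into an action dict, falling back to `escalate`."""
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        return dict(FALLBACK_ACTION)
    # Assumption: a valid reply is a JSON object with a string "action" field.
    if not isinstance(parsed, dict) or not isinstance(parsed.get("action"), str):
        return dict(FALLBACK_ACTION)
    return parsed
```

Falling back to `escalate` means a malformed model reply is routed to review instead of crashing the episode.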

	---

	## 🔁 Data Flow

	### 1. Reset

	```text
	API → Environment.reset()
	    → DataGenerator
	    → StateManager
	    → Initial State returned
	```

	---

	### 2. Step

	```text
	Agent Action → API
	    → Environment.step()
	    → StateManager update
	    → PolicyEngine evaluate
	    → RewardEngine compute
	    → New State returned
	```
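The ordering in the step flow can be expressed as a single function that calls each stage in turn. In this sketch the function name `run_step` and the three callables passed in are hypothetical stand-ins for the State Manager, Policy Engine, and Reward Engine. Evaluating the policy on the updated state follows the order listed above and is otherwise an assumption.

```python
from typing import Callable


def run_step(
    state: dict,
    action: str,
    update_state: Callable[[dict, str], dict],
    evaluate_policy: Callable[[dict], dict],
    compute_reward: Callable[[dict, str, dict], float],
) -> tuple[dict, float]:
    """Apply the stages in the documented order and return (new_state, reward)."""
    new_state = update_state(state, action)  # StateManager update
    verdict = evaluate_policy(new_state)  # PolicyEngine evaluate
    reward = compute_reward(new_state, action, verdict)  # RewardEngine compute
    return new_state, reward
```

Passing the stages in as callables keeps each one testable in isolation with a stub.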

	---

	### 3. Grading

	```text
	Episode complete → GraderEngine
	    → Score computed
	    → Returned via API
	```

	---

	## 🧠 Internal Interaction Flow

	```text
	Action
	↓
	State Update
	↓
	Policy Evaluation
	↓
	Reward Calculation
	↓
	Next State
	```

	---

	## 🧩 Component Dependencies

	| Component     | Depends On            |
	| ------------- | --------------------- |
	| Environment   | State, Policy, Reward |
	| State Manager | Data Generator        |
	| Reward Engine | Policy Engine         |
	| Grader        | Policy + Trajectory   |
	| API           | Environment           |

	---

	## ⚙️ Execution Model

	* single-threaded environment (MVP)
	* API handlers keep no state of their own; episode sessions are tracked in memory
	* reproducible seeds for all scenarios
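In-memory session tracking can be as small as a dictionary keyed by session ID. The sketch below is an assumption about how such tracking might look, not the project's implementation: the class name `SessionStore` and its methods are hypothetical.

```python
import uuid


class SessionStore:
    """Hypothetical in-memory registry mapping session IDs to episode objects."""

    def __init__(self) -> None:
        self._sessions: dict[str, object] = {}

    def create(self, episode: object) -> str:
        """Register an episode and return its new session ID."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = episode
        return session_id

    def get(self, session_id: str) -> object:
        # KeyError signals an unknown or closed session to the caller.
        return self._sessions[session_id]

    def close(self, session_id: str) -> None:
        """Forget a session; closing an unknown ID is a no-op."""
        self._sessions.pop(session_id, None)
```

Because the store lives in process memory, sessions do not survive a restart, which fits a single-threaded MVP.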

	---

	## 🐳 Deployment Architecture

	```text
	Docker Container
	├── FastAPI Server
	├── Environment Core
	├── Graders
	└── Baseline Agent
	```

	Runs on:

	* Hugging Face Spaces
	* local Docker

	---

	## 🧠 Design Principles

	### 1. Modularity

	Each component lives in its own module and can be tested independently.

	---

	### 2. Determinism

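	Given the same seed and the same actions, policy evaluation, reward computation, and grading produce the same results. The LLM agent's decisions sit outside this guarantee.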

	---

	### 3. Simplicity

	Avoid unnecessary abstraction.

	---

	### 4. Spec Compliance

	Strict adherence to the OpenEnv interface: `reset()`, `step(action)`, and `state()`.

	---

	## 🧠 One-Line Summary

	> A modular, deterministic system where an environment core orchestrates policy evaluation, reward computation, and grading through well-defined components exposed via a FastAPI interface.

	---