---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain a high signal-to-noise ratio in RAG pipelines.

---

## 1. System Overview

In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.
### Architecture Flow

```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```
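The loop above can be sketched as a minimal agent-environment exchange. The `RagAction` shape and the stub environment below are illustrative stand-ins, not the environment's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class RagAction:
    type: str                       # e.g. "inspect_artifact", "submit_report"
    params: dict = field(default_factory=dict)

class StubEnv:
    """Stand-in for the ContextPrune server; the real state machine is richer."""
    def __init__(self):
        self.workflow_stage = "triage"

    def step(self, action: RagAction) -> dict:
        if action.type == "submit_report":
            # Terminal action: the episode ends and the grader scores the report.
            self.workflow_stage = "submitted"
        return {"workflow_stage": self.workflow_stage}

env = StubEnv()
env.step(RagAction("inspect_artifact", {"artifact_id": "a1"}))
result = env.step(RagAction("submit_report", {"answer": "..."}))
```

The real environment additionally returns the full observation and, on the terminal step, the weighted reward.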
---

## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.
| Stage | Action | Rationale |
| :--- | :--- | :--- |
| **Triage** | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "Garbage" early. |
| **Analysis** | `prioritize_artifact` | Committing specific evidence to the working set. Consumes token budget. |
| **Optimization** | `summarize_artifact` | AI-driven compression. Reduces token footprint while attempting to preserve "Grounding" tokens. |
| **Resolution** | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| **Submission** | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
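A happy-path episode walks these stages in order. The sketch below uses the action names from the table; the parameter values are invented for illustration:

```python
# One illustrative pass through the five stages (parameter values are made up).
workflow = [
    ("inspect_artifact",    {"artifact_id": "log-01"}),                            # Triage
    ("prioritize_artifact", {"artifact_id": "log-01"}),                            # Analysis
    ("summarize_artifact",  {"artifact_id": "log-01", "compression_ratio": 0.5}),  # Optimization
    ("set_resolution_plan", {"plan": "Roll back the release; notify on-call."}),   # Resolution
    ("submit_report",       {"answer": "Root cause and remediation ..."}),         # Submission
]
```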
---

## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working-set budget.

| Field | Type | Description |
| :--- | :--- | :--- |
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |
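The fields above map naturally onto a dataclass. The sketch mirrors the table; the `ChunkSummary` fields and the `tokens_remaining` helper are assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChunkSummary:
    """Assumed shape of artifact metadata; field names are illustrative."""
    artifact_id: str
    keywords: List[str]
    token_estimate: int

@dataclass
class RagObservation:
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: str        # "triage" | "analysis" | "resolution" | "submitted"
    customer_tier: str         # "standard" | "business" | "enterprise"
    incident_severity: str     # "sev3" | "sev2" | "sev1"
    available_artifacts: List[ChunkSummary]
    reviewed_artifacts: List[str]
    prioritized_artifacts: List[str]
    plan_draft: Optional[str]
    total_tokens_used: int
    token_budget: int
    step_number: int
    task_name: str

    @property
    def tokens_remaining(self) -> int:
        # Convenience helper (not in the documented schema).
        return self.token_budget - self.total_tokens_used

obs = RagObservation(
    case_id="case-001", case_summary="...", objective="...",
    workflow_stage="triage", customer_tier="enterprise", incident_severity="sev1",
    available_artifacts=[], reviewed_artifacts=[], prioritized_artifacts=[],
    plan_draft=None, total_tokens_used=120, token_budget=360,
    step_number=0, task_name="executive_escalation_hard",
)
```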
---

## 4. Action Space

Agents interact with the environment through the following canonical actions:

| Action Type | Parameters | Effect |
| :--- | :--- | :--- |
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate the final response and terminate the episode |
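Because each action type carries a fixed parameter set, a client can validate payloads before sending them. A minimal sketch (the flat dict wire format is an assumption):

```python
# Required parameters per action type, taken from the table above.
REQUIRED_PARAMS = {
    "inspect_artifact":    {"artifact_id"},
    "prioritize_artifact": {"artifact_id"},
    "summarize_artifact":  {"artifact_id", "compression_ratio"},
    "set_resolution_plan": {"plan"},
    "submit_report":       {"answer"},
}

def validate_action(action: dict) -> bool:
    """Return True if the action has a known type and all required parameters."""
    required = REQUIRED_PARAMS.get(action.get("type"))
    if required is None:
        return False
    return required <= set(action) - {"type"}

ok = validate_action({"type": "summarize_artifact",
                      "artifact_id": "inc-42", "compression_ratio": 0.4})
bad = validate_action({"type": "submit_report"})  # missing "answer"
```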
---

## 5. Reward Engineering (The Benchmarking Grader)

The environment calculates a weighted score (0.0 - 1.0) based on 8 distinct metrics.

- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts.
- **Cross-Domain Variety (12%)**: Rewards correlation across Support, Incident logs, and Release guardrails.
- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
- **Planning Logic (16%)**: Alignment between the drafted plan and ground-truth steps.
- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.
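The weighted sum can be sketched directly from the percentages above. Component scores in [0, 1] come from the grader; this function only combines them and is not the actual implementation:

```python
# Weights taken from the list above; the seven positive weights sum to 1.0.
WEIGHTS = {
    "required_coverage":    0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness":  0.12,
    "planning_logic":       0.16,
    "reporting_accuracy":   0.18,
    "citation_fidelity":    0.10,
    "token_efficiency":     0.08,
}
HALLUCINATION_WEIGHT = 0.18  # subtracted, per the penalty above

def grade(scores: dict, hallucination: float = 0.0) -> float:
    """Combine per-metric scores in [0, 1] into a clamped weighted total."""
    raw = sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS)
    raw -= HALLUCINATION_WEIGHT * hallucination
    return max(0.0, min(1.0, raw))
```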
---

## 6. Scenario Benchmarks

| Task | Difficulty | Steps | Token Budget | Key Challenge |
| :--- | :--- | :--- | :--- | :--- |
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives into a single source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
---

## 7. Configuration & Environment

### Environment Variables

| Variable | Default | Purpose |
| :--- | :--- | :--- |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | *None* | Authentication for the Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
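In Python, these variables can be read with their documented defaults (a sketch; the project's own scripts may read them differently):

```python
import os

# Documented defaults; HF_TOKEN has no default and stays None when unset.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
HF_TOKEN = os.environ.get("HF_TOKEN")
RAG_ENV_URL = os.environ.get("RAG_ENV_URL", "http://localhost:7860")
```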
### Project Components

- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
- **`app.py`**: FastAPI implementation for remote agent interaction.
- **`inference.py`**: Baseline agent script (OpenAI-compatible).
- **`validate.py`**: Validation suite for episode lifecycle verification.
---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
2. **Server**: `python app.py` (runs on port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
- **Space Repo ID**: `prithic07/context-prune`

Built for Context Optimization Research.