---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ContextPrune: Adaptive Context Garbage Collection for RAG
ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
---
## 1. System Overview
In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.
### Architecture Flow
```mermaid
graph TD
A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
B -->|RagAction| C[ContextPrune Environment]
C -->|Update State| D[State Machine]
D -->|Token Budgeting| E[Context Working Set]
D -->|Hybrid Retrieval| F[Corpus Search]
C -->|Terminal Action| G[Deterministic Grader]
G -->|Weighted Reward| A
```
---
## 2. Methodology: The Operational Loop
ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.
| Stage | Action | Rationale |
| :--- | :--- | :--- |
| **Triage** | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "Garbage" early. |
| **Analysis** | `prioritize_artifact` | Committing specific evidence to the working set. Consumes token budget. |
| **Optimization** | `summarize_artifact` | AI-driven compression. Reduces token footprint while attempting to preserve "Grounding" tokens. |
| **Resolution** | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| **Submission** | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
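The stage ordering in the table can be illustrated with a small checker. This is a minimal sketch, not the environment's own validation logic: the function name `check_sequence` and the `(action_type, artifact_id)` tuple shape are assumptions made for illustration. The ordering rules (inspect before prioritize, prioritize before summarize, submit last) are taken from the action descriptions in this README.

```python
from typing import List, Optional, Tuple

KNOWN_ACTIONS = {
    "inspect_artifact",
    "prioritize_artifact",
    "summarize_artifact",
    "set_resolution_plan",
    "submit_report",
}


def check_sequence(actions: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Return a list of ordering problems found in a planned episode.

    Hypothetical helper: each item is (action_type, artifact_id), where
    artifact_id is None for actions that do not target an artifact.
    """
    reviewed, prioritized = set(), set()
    problems: List[str] = []
    submitted = False

    for step, (action, artifact_id) in enumerate(actions, start=1):
        if submitted:
            problems.append(f"step {step}: action after submit_report")
            continue
        if action not in KNOWN_ACTIONS:
            problems.append(f"step {step}: unknown action {action!r}")
        elif action == "inspect_artifact":
            reviewed.add(artifact_id)
        elif action == "prioritize_artifact":
            if artifact_id not in reviewed:
                problems.append(f"step {step}: {artifact_id} prioritized before inspection")
            prioritized.add(artifact_id)
        elif action == "summarize_artifact":
            if artifact_id not in prioritized:
                problems.append(f"step {step}: {artifact_id} summarized before prioritization")
        elif action == "submit_report":
            submitted = True

    if not submitted:
        problems.append("episode never reaches submit_report")
    return problems
```

A plan such as inspect, prioritize, summarize, set plan, submit yields an empty problem list, while skipping the inspection step is reported.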
---
## 3. Observation Space
The `RagObservation` provides the agent with the internal state of the incident and the current working set budget.
| Field | Type | Description |
| :--- | :--- | :--- |
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |
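As a reading aid, the fields above can be mirrored in a typed container. This is a sketch under stated assumptions: the actual `RagObservation` class in `rag_optimizer_env/` may be defined differently (for example as a Pydantic model), the `ChunkSummary` schema is not shown in this README so it is represented here as a plain dictionary, and `remaining_tokens` is a hypothetical convenience method.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Optional

WorkflowStage = Literal["triage", "analysis", "resolution", "submitted"]
CustomerTier = Literal["standard", "business", "enterprise"]
Severity = Literal["sev3", "sev2", "sev1"]


@dataclass
class RagObservation:
    """Illustrative mirror of the observation table; not the project's class."""

    case_id: str
    case_summary: str
    objective: str
    workflow_stage: WorkflowStage
    customer_tier: CustomerTier
    incident_severity: Severity
    # ChunkSummary fields are not documented here, so a dict stands in.
    available_artifacts: List[Dict[str, Any]]
    reviewed_artifacts: List[str]
    prioritized_artifacts: List[str]
    plan_draft: Optional[str]
    total_tokens_used: int
    token_budget: int
    step_number: int
    task_name: str

    def remaining_tokens(self) -> int:
        """Tokens still available before the working set hits the budget."""
        return self.token_budget - self.total_tokens_used
```

An agent could consult `remaining_tokens()` before deciding whether to prioritize another artifact or summarize one already in the working set.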
---
## 4. Action Space
Agents interact with the environment through the following canonical actions:
| Action Type | Parameters | Effect |
| :--- | :--- | :--- |
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate final response and terminate the episode |
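The action table can be expressed as a small payload builder that rejects missing or unexpected parameters. This is only a sketch: the `action_type` key and the flat dictionary layout are assumptions, since the README does not document the wire format of `RagAction`, and `build_action` is a hypothetical helper.

```python
from typing import Any, Dict, Tuple

# Parameter names per action, copied from the action table.
REQUIRED_PARAMS: Dict[str, Tuple[str, ...]] = {
    "inspect_artifact": ("artifact_id",),
    "prioritize_artifact": ("artifact_id",),
    "summarize_artifact": ("artifact_id", "compression_ratio"),
    "set_resolution_plan": ("plan",),
    "submit_report": ("answer",),
}


def build_action(action_type: str, **params: Any) -> Dict[str, Any]:
    """Build an action payload, checking parameters against the table.

    The payload shape is an assumption, not the project's schema.
    """
    if action_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action type: {action_type!r}")
    expected = REQUIRED_PARAMS[action_type]
    missing = [name for name in expected if name not in params]
    extra = [name for name in params if name not in expected]
    if missing or extra:
        raise ValueError(f"{action_type}: missing={missing}, unexpected={extra}")
    return {"action_type": action_type, **params}
```

For example, `build_action("summarize_artifact", artifact_id="a1", compression_ratio=0.5)` returns a dictionary with those three keys, while omitting `compression_ratio` raises `ValueError`.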
---
## 5. Reward Engineering (The Benchmarking Grader)
The environment calculates a weighted score (0.0 to 1.0) from 8 metrics: seven weighted components whose weights sum to 100%, plus a hallucination penalty weighted at -18%.
- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts.
- **Cross-Domain Variety (12%)**: Rewards correlation across Support, Incident logs, and Release guardrails.
- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
- **Planning Logic (16%)**: Alignment between the drafted plan and ground truth steps.
- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.
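One way to combine the weights listed above is a plain weighted sum with the penalty subtracted. This is a sketch, not the grader's implementation: it assumes each metric is already normalized to the range 0.0 to 1.0, that the hallucination term is a rate in the same range, and that the result is clamped to 0.0 to 1.0. The metric key names and the function `weighted_score` are hypothetical.

```python
from typing import Dict

# Weights copied from the list above; key names are illustrative.
WEIGHTS: Dict[str, float] = {
    "required_coverage": 0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness": 0.12,
    "planning_logic": 0.16,
    "reporting_accuracy": 0.18,
    "citation_fidelity": 0.10,
    "token_efficiency": 0.08,
}
HALLUCINATION_PENALTY = 0.18


def weighted_score(metrics: Dict[str, float], hallucination: float = 0.0) -> float:
    """Combine normalized metrics into one score.

    Assumption: missing metrics count as 0.0 and the final value is
    clamped to [0.0, 1.0]; the real grader may behave differently.
    """
    positive = sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())
    raw = positive - HALLUCINATION_PENALTY * hallucination
    return max(0.0, min(1.0, raw))
```

Under these assumptions, a run that maxes out every positive metric but is fully penalized for hallucination scores about 0.82, since the seven positive weights sum to 1.0 and the penalty removes 0.18.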
---
## 6. Scenario Benchmarks
| Task | Difficulty | Steps | Budget | Key Challenge |
| :--- | :--- | :--- | :--- | :--- |
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives for a singular source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
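The difficulty gradient in the table becomes clearer when the budget is expressed per step. The sketch below copies the table values into a dictionary and assumes that "Steps" is the step limit of an episode and "Budget" is the token budget; `TASKS` and `tokens_per_step` are illustrative names, not part of the project.

```python
from typing import Dict

# Values copied from the scenario table above.
TASKS: Dict[str, Dict[str, int]] = {
    "refund_triage_easy": {"steps": 7, "budget": 850},
    "cross_function_brief_medium": {"steps": 8, "budget": 620},
    "executive_escalation_hard": {"steps": 10, "budget": 360},
}


def tokens_per_step(task_name: str) -> float:
    """Average token budget available per step for a task."""
    task = TASKS[task_name]
    return task["budget"] / task["steps"]
```

On these numbers the hard task leaves 36 tokens per step, compared with roughly 121 for the easy task, which is why compression matters most there.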
---
## 7. Configuration & Environment
### Environment Variables
| Variable | Default | Purpose |
| :--- | :--- | :--- |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | *None* | Authentication for Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
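A client script could read these variables with the defaults from the table as follows. This is a minimal sketch: the `Settings` class and `load_settings` function are hypothetical and do not describe how `inference.py` actually loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    model_name: str
    hf_token: Optional[str]
    rag_env_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping, falling back to the documented defaults."""
    return Settings(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        hf_token=env.get("HF_TOKEN"),  # no default; stays None when unset
        rag_env_url=env.get("RAG_ENV_URL", "http://localhost:7860"),
    )
```

Passing an explicit mapping, such as `load_settings({"RAG_ENV_URL": "http://localhost:8000"})`, makes the loader easy to exercise without touching the real process environment.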
### Project Components
- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
- **`app.py`**: FastAPI implementation for remote agent interaction.
- **`inference.py`**: Baseline agent script (OpenAI-compatible).
- **`validate.py`**: Robust validation suite for episode lifecycle verification.
---
## 🚀 Quick Start
1. **Setup**: `pip install -r requirements.txt`
2. **Server**: `python app.py` (runs on port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`
---
## 🌎 Live Deployment
- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
- **Space Repo ID**: `prithic07/context-prune`
Built for Context Optimization Research.