Docs: Fully restore technical specifications and tables
README.md
CHANGED
@@ -1,3 +1,12 @@
# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
@@ -24,7 +33,7 @@ graph TD

## 2. Methodology: The Operational Loop

-ContextPrune enforces a 5-staged workflow that mirrors enterprise incident response.

| Stage | Action | Rationale |
| :--- | :--- | :--- |
@@ -36,63 +45,97 @@ ContextPrune enforces a 5-staged workflow that mirrors enterprise incident respo

---

-## 3.

-The

-
-
-
-
-
-
-
-

---

-##

-

-
-- **
-- **
-- **

-##
-- **Objective**: Align Support, Incident Command, and Release Engineering during a payment processing failure.
-- **Budget**: 620 Tokens.
-- **Core Challenge**: Filtering through overlapping narratives to find the "single source of truth" for customer comms.

-
-
-
-

---

-##

-
-
-
-

---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
-2. **Server**: `python app.py` (Runs on Port
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

-
---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
-- **Space Repo**: `prithic07/context-prune`
@@ -1,3 +1,12 @@
+---
+title: ContextPrune
+emoji: 🧹
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+pinned: false
+---
+
# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
@@ -24,7 +33,7 @@ graph TD

## 2. Methodology: The Operational Loop

+ContextPrune enforces a 5-staged workflow that mirrors enterprise incident response.

| Stage | Action | Rationale |
| :--- | :--- | :--- |
@@ -36,63 +45,97 @@ ContextPrune enforces a 5-staged workflow that mirrors enterprise incident respo

---

+## 3. Observation Space

+The `RagObservation` provides the agent with the internal state of the incident and the current working set budget.

+| Field | Type | Description |
+| :--- | :--- | :--- |
+| `case_id` | `str` | Unique simulated case identifier |
+| `case_summary` | `str` | Real-world case context and background |
+| `objective` | `str` | Specific deliverable the agent must produce |
+| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
+| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
+| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
+| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
+| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
+| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
+| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
+| `total_tokens_used` | `int` | Current token cost of the working set |
+| `token_budget` | `int` | Maximum allowed token budget |
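
The field table for `RagObservation` could be mirrored by a typed container along the following lines. This is a minimal sketch, not the project's actual class: the use of `dataclass` (the real model may well be a Pydantic model, since the server is FastAPI), the attributes of `ChunkSummary`, and the helper `remaining_budget` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Allowed values are copied from the Type column of the table.
WorkflowStage = Literal["triage", "analysis", "resolution", "submitted"]
CustomerTier = Literal["standard", "business", "enterprise"]
IncidentSeverity = Literal["sev3", "sev2", "sev1"]


@dataclass
class ChunkSummary:
    # Hypothetical attributes: the README names this type but does not list its fields.
    artifact_id: str
    token_estimate: int


@dataclass
class RagObservation:
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: WorkflowStage
    customer_tier: CustomerTier
    incident_severity: IncidentSeverity
    token_budget: int
    total_tokens_used: int = 0
    available_artifacts: List[ChunkSummary] = field(default_factory=list)
    reviewed_artifacts: List[str] = field(default_factory=list)
    prioritized_artifacts: List[str] = field(default_factory=list)
    plan_draft: Optional[str] = None


def remaining_budget(obs: RagObservation) -> int:
    """Hypothetical helper: tokens left before the working set exceeds the budget."""
    return obs.token_budget - obs.total_tokens_used
```

An agent could call `remaining_budget(obs)` before each `prioritize_artifact` step to decide whether an artifact still fits or should be summarized first.
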
+
+---
+
+## 4. Action Space
+
+Agents interact with the environment through the following canonical actions:
+
+| Action Type | Parameters | Effect |
+| :--- | :--- | :--- |
+| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
+| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
+| `summarize_artifact` | `artifact_id`, `ratio` | Compress a prioritized artifact using AI summarization |
+| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
+| `submit_report` | `answer` | Generate final response and terminate the episode |
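
To make the action table concrete, the sketch below builds action payloads and rejects any call whose parameters do not match the table. The action names and parameter names come from the table; the payload shape (a flat dictionary with an `action_type` key) and the helper name `build_action` are assumptions, since the README does not show the wire format the server expects.

```python
from typing import Any, Dict

# Required parameters per action type, taken from the action table.
ACTION_PARAMS = {
    "inspect_artifact": ("artifact_id",),
    "prioritize_artifact": ("artifact_id",),
    "summarize_artifact": ("artifact_id", "ratio"),
    "set_resolution_plan": ("plan",),
    "submit_report": ("answer",),
}


def build_action(action_type: str, **params: Any) -> Dict[str, Any]:
    """Return a payload dict; the 'action_type' key is an assumed field name."""
    if action_type not in ACTION_PARAMS:
        raise ValueError(f"unknown action type: {action_type}")
    expected = set(ACTION_PARAMS[action_type])
    if set(params) != expected:
        raise ValueError(
            f"{action_type} expects parameters {sorted(expected)}, got {sorted(params)}"
        )
    return {"action_type": action_type, **params}


# A typical sequence for one artifact: inspect, prioritize, then compress.
episode = [
    build_action("inspect_artifact", artifact_id="a1"),
    build_action("prioritize_artifact", artifact_id="a1"),
    build_action("summarize_artifact", artifact_id="a1", ratio=0.5),
]
```

Validating on the client side keeps malformed actions from spending one of the limited episode steps.
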

---

+## 5. Reward Engineering (The Benchmarking Grader)

+The environment calculates a weighted score (0.0 - 1.0) based on 8 distinct metrics.

+- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts.
+- **Cross-Domain Variety (12%)**: Rewards correlation across Support, Incident logs, and Release guardrails.
+- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
+- **Planning Logic (16%)**: Alignment between the drafted plan and ground truth steps.
+- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
+- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
+- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
+- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.
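
The seven positive weights listed above sum to 100%, with the hallucination penalty subtracted on top. One way such a score could be combined is sketched here. The weights are taken from the list; everything else is an assumption: the metric key names, the idea that each sub-metric arrives as a value in [0, 1], the linear scaling of the penalty, and the final clamp to the 0.0 to 1.0 range. The grader's actual formula is not shown in the README.

```python
from typing import Mapping

# Positive weights from the metric list (they sum to 1.0).
WEIGHTS = {
    "required_coverage": 0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness": 0.12,
    "planning_logic": 0.16,
    "reporting_accuracy": 0.18,
    "citation_fidelity": 0.10,
    "token_efficiency": 0.08,
}
HALLUCINATION_PENALTY = 0.18


def weighted_score(metrics: Mapping[str, float], hallucination: float = 0.0) -> float:
    """Combine sub-metrics (each assumed to lie in [0, 1]) into one score.

    `hallucination` is an assumed 0..1 severity; 1.0 applies the full penalty.
    """
    positive = sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())
    raw = positive - HALLUCINATION_PENALTY * hallucination
    # Clamp so the result stays inside the documented 0.0 to 1.0 range.
    return max(0.0, min(1.0, raw))
```

Under these assumptions, a run that maximizes every metric but makes fully unsupported claims would lose 0.18 of its score, which is as much as the Reporting Accuracy weight contributes.
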
+
+---

+## 6. Scenario Benchmarks

+| Task | Difficulty | Steps | Budget | Key Challenge |
+| :--- | :--- | :--- | :--- | :--- |
+| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before relief. |
+| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives for a singular source of truth. |
+| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |

---

+## 7. Configuration & Environment

+### Environment Variables
+| Variable | Default | Purpose |
+| :--- | :--- | :--- |
+| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
+| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
+| `HF_TOKEN` | *None* | Authentication for Hugging Face Inference API |
+| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
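
A script that consumes these variables might resolve them as shown below. This is an illustrative sketch: the variable names and defaults are copied from the table, while the `Settings` container and the `load_settings` function are hypothetical names, not part of the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    model_name: str
    hf_token: Optional[str]
    rag_env_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four documented variables, falling back to the table defaults."""
    return Settings(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        hf_token=env.get("HF_TOKEN"),  # no default: stays None when unset
        rag_env_url=env.get("RAG_ENV_URL", "http://localhost:7860"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real process environment.
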
+
+### Project Components
+- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
+- **`app.py`**: FastAPI implementation for remote agent interaction.
+- **`inference.py`**: Baseline agent script (OpenAI-compatible).
+- **`validate.py`**: Robust validation suite for episode lifecycle verification.

---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
+2. **Server**: `python app.py` (Runs on Port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
+- **Space Repo ID**: `prithic07/context-prune`
+
+Built for Context Optimization Research.