Spaces: prithic07/context-prune

prithic07 committed on Apr 12

Commit 0dc27c9

1 Parent(s): 291987c

Docs: Fully restore technical specifications and tables

Files changed (1)

README.md +76 -33

README.md CHANGED

@@ -1,3 +1,12 @@
 # ContextPrune: Adaptive Context Garbage Collection for RAG
 ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
@@ -24,7 +33,7 @@ graph TD
 ## 2. Methodology: The Operational Loop
-ContextPrune enforces a 5-staged workflow that mirrors enterprise incident response. Each stage is designed to penalize laziness and reward systematic evidence handling.
 | Stage | Action | Rationale |
 | :--- | :--- | :--- |
@@ -36,63 +45,97 @@ ContextPrune enforces a 5-staged workflow that mirrors enterprise incident respo
 ---
-## 3. Reward Engineering (The Benchmarking Grader)
-The environment calculates a weighted score (0.0 - 1.0) based on 8 distinct metrics. This ensures that a high score represents not just a "correct" answer, but an **optimal trajectory**.
-- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts identified in `tasks.py`.
-- **Cross-Domain Variety (12%)**: Rewards agents that correlate evidence across Support, Incident logs, and Release guardrails.
-- **Triage Thoroughness (12%)**: Penalizes agents that skip the inspection phase and blindly prioritize.
-- **Planning Logic (16%)**: Measures alignment between the drafted plan and the ground truth operational steps.
-- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
-- **Citation Fidelity (10%)**: Verification that claimed evidence is actually present in the working set.
-- **Token Efficiency (8%)**: Scaled bonus for solving the task with the smallest possible context.
-- **Hallucination Penalty (-18%)**: Severe deduction for claims made in the final report that lack any evidence in the prioritized chunks.
 ---
-## 4. Scenario Benchmarks
-ContextPrune includes three canonical tasks that simulate high-pressure operational incidents:
-### **[Hard] Executive Escalation: Suspected Admin Compromise**
-- **Objective**: Balance immediate customer protection, evidence preservation, and release safeguards.
-- **Budget**: 360 Tokens (Extremely Tight).
-- **Core Challenge**: Correlating suspicious incident logs with release-engineering change freezes across disjointed domains.
-### **[Medium] Cross-Functional Outage Brief**
-- **Objective**: Align Support, Incident Command, and Release Engineering during a payment processing failure.
-- **Budget**: 620 Tokens.
--  **Core Challenge**: Filtering through overlapping narratives to find the "single source of truth" for customer comms.
-### **[Easy] Refund Triage Memo**
-- **Objective**: Determine refund eligibility from support policies and outage impact artifacts.
-- **Budget**: 850 Tokens.
-- **Core Challenge**: Systematic inspection of policy artifacts to ensure relief is justified before escalation.
 ---
-## 5. Technical Components
-- **`rag_optimizer_env/`**: Core state management, hybrid retrieval (Keyword + Semantic), and token estimation using `llm_runtime`.
-- **`app.py`**: A standard FastAPI implementation. Built for Context Optimization Research.
-- **`inference.py`**: A baseline agent script demonstrating how to use the OpenAI-compatible interface.
-- **`validate.py`**: A robust validation suite that runs a full episode lifecycle locally to ensure 100% environment compliance.
 ---
 ## 🚀 Quick Start
 1. **Setup**: `pip install -r requirements.txt`
-2. **Server**: `python app.py` (Runs on Port 8000)
 3. **Control Panel**: `streamlit run optimizer_ui.py`
 4. **Validation**: `python validate.py`
 ---
 ## 🌎 Live Deployment
 - **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
 - **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
-- **Space Repo**: `prithic07/context-prune`

+---
+title: ContextPrune
+emoji: 🧹
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+pinned: false
+---
 # ContextPrune: Adaptive Context Garbage Collection for RAG
 ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain high signal-to-noise ratios in RAG pipelines.
 ## 2. Methodology: The Operational Loop
+ContextPrune enforces a 5-stage workflow that mirrors enterprise incident response.
 | Stage | Action | Rationale |
 | :--- | :--- | :--- |
 ---
+## 3. Observation Space
+The `RagObservation` provides the agent with the internal state of the incident and the current working set budget.
+| Field | Type | Description |
+| :--- | :--- | :--- |
+| `case_id` | `str` | Unique simulated case identifier |
+| `case_summary` | `str` | Realistic case context and background |
+| `objective` | `str` | Specific deliverable the agent must produce |
+| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
+| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
+| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
+| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
+| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
+| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
+| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
+| `total_tokens_used` | `int` | Current token cost of the working set |
+| `token_budget` | `int` | Maximum allowed token budget |
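For agents written in Python, the observation fields can be mirrored in a typed container. The sketch below uses standard-library dataclasses rather than whatever model class the project itself defines; the contents of `ChunkSummary` (`artifact_id`, `title`, `token_count`), the helper methods `remaining_budget` and `unreviewed_ids`, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Enumerated values copied from the observation table.
WorkflowStage = Literal["triage", "analysis", "resolution", "submitted"]
CustomerTier = Literal["standard", "business", "enterprise"]
Severity = Literal["sev3", "sev2", "sev1"]


@dataclass
class ChunkSummary:
    # Hypothetical fields: the README names ChunkSummary but not its contents.
    artifact_id: str
    title: str
    token_count: int


@dataclass
class RagObservation:
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: WorkflowStage
    customer_tier: CustomerTier
    incident_severity: Severity
    available_artifacts: List[ChunkSummary] = field(default_factory=list)
    reviewed_artifacts: List[str] = field(default_factory=list)
    prioritized_artifacts: List[str] = field(default_factory=list)
    plan_draft: Optional[str] = None
    total_tokens_used: int = 0
    token_budget: int = 0

    def remaining_budget(self) -> int:
        # Hypothetical helper: tokens left before the working set reaches token_budget.
        return self.token_budget - self.total_tokens_used

    def unreviewed_ids(self) -> List[str]:
        # Hypothetical helper: artifacts not yet inspected, in listing order.
        seen = set(self.reviewed_artifacts)
        return [c.artifact_id for c in self.available_artifacts if c.artifact_id not in seen]


# Illustrative values only; real observations come from the environment.
obs = RagObservation(
    case_id="case-001",
    case_summary="Customer reports duplicate charges after an outage.",
    objective="Decide refund eligibility.",
    workflow_stage="triage",
    customer_tier="enterprise",
    incident_severity="sev2",
    available_artifacts=[
        ChunkSummary("refund_policy", "Refund policy", 120),
        ChunkSummary("outage_log", "Outage impact log", 200),
    ],
    reviewed_artifacts=["refund_policy"],
    prioritized_artifacts=["refund_policy"],
    total_tokens_used=120,
    token_budget=850,
)
```

Here `obs.remaining_budget()` would report 730 tokens and `obs.unreviewed_ids()` would list only `outage_log`, which is the kind of bookkeeping an agent needs before deciding what to inspect or prioritize next.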
+---
+## 4. Action Space
+Agents interact with the environment through the following canonical actions:
+| Action Type | Parameters | Effect |
+| :--- | :--- | :--- |
+| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
+| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
+| `summarize_artifact` | `artifact_id`, `ratio` | Compress a prioritized artifact using AI summarization |
+| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
+| `submit_report` | `answer` | Submit the final answer and terminate the episode |
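Each action in the table takes a fixed set of parameters, so a client can validate payloads before sending them. This is a hypothetical sketch: the flat `{"action_type": ..., **params}` payload shape, the `make_action` helper, and the assumption that `ratio` is the fraction of tokens kept (so it must lie in (0, 1]) are illustrative, not taken from the project's code.

```python
from typing import Any, Dict

# Required parameters per action type, taken from the action table.
REQUIRED_PARAMS = {
    "inspect_artifact": ("artifact_id",),
    "prioritize_artifact": ("artifact_id",),
    "summarize_artifact": ("artifact_id", "ratio"),
    "set_resolution_plan": ("plan",),
    "submit_report": ("answer",),
}


def make_action(action_type: str, **params: Any) -> Dict[str, Any]:
    """Build an action payload, assuming a flat JSON shape (hypothetical)."""
    if action_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    required = REQUIRED_PARAMS[action_type]
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"{action_type} is missing parameters: {missing}")
    extra = [name for name in params if name not in required]
    if extra:
        raise ValueError(f"{action_type} got unexpected parameters: {extra}")
    if action_type == "summarize_artifact":
        # Assumption: ratio is the fraction of tokens kept after compression.
        if not 0.0 < params["ratio"] <= 1.0:
            raise ValueError("ratio must be in (0, 1]")
    return {"action_type": action_type, **params}


# One action of each type, in the table's row order (illustrative values).
episode = [
    make_action("inspect_artifact", artifact_id="refund_policy"),
    make_action("prioritize_artifact", artifact_id="refund_policy"),
    make_action("summarize_artifact", artifact_id="refund_policy", ratio=0.5),
    make_action("set_resolution_plan", plan="Confirm eligibility, then issue credit."),
    make_action("submit_report", answer="Refund approved per policy."),
]
```

Rejecting missing or unexpected parameters on the client side surfaces typos before they cost a step in the episode.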
 ---
+## 5. Reward Engineering (The Benchmarking Grader)
+The environment computes a weighted score between 0.0 and 1.0 from 8 metrics: seven weighted components that together account for 100%, plus a hallucination penalty.
+- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts.
+- **Cross-Domain Variety (12%)**: Rewards correlation across Support, Incident logs, and Release guardrails.
+- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
+- **Planning Logic (16%)**: Alignment between the drafted plan and the ground-truth steps.
+- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
+- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
+- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
+- **Hallucination Penalty (-18%)**: Severe deduction for claims in the final report that lack evidence in the prioritized artifacts.
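Because the seven positive weights sum to exactly 100%, the grader's combination can be read as a weighted average minus a penalty. A minimal sketch, assuming each metric and the hallucination signal are normalized to [0, 1], the penalty is subtracted linearly at its full 18% weight, and the result is clamped to the documented 0.0 to 1.0 range; the metric key names and `weighted_score` are hypothetical.

```python
from typing import Dict

# Weights as integer percentages, copied from the list above (sum = 100).
WEIGHTS = {
    "required_coverage": 24,
    "cross_domain_variety": 12,
    "triage_thoroughness": 12,
    "planning_logic": 16,
    "reporting_accuracy": 18,
    "citation_fidelity": 10,
    "token_efficiency": 8,
}
HALLUCINATION_PENALTY = 18  # percentage points


def weighted_score(metrics: Dict[str, float], hallucination: float = 0.0) -> float:
    """Combine normalized metrics into one score in [0, 1] (assumed formula)."""
    unknown = set(metrics) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    for name, value in [*metrics.items(), ("hallucination", hallucination)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Missing metrics earn no credit.
    total = sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())
    total -= HALLUCINATION_PENALTY * hallucination
    # Convert percentage points to a fraction and clamp to [0, 1].
    return min(1.0, max(0.0, total / 100))
```

Under these assumptions, a run that is perfect on every metric but fully penalized for hallucination would score 0.82, and a run with no credit plus a full penalty clamps to 0.0 rather than going negative. Integer percentages keep the weight total exactly 100 without floating-point drift.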
+---
+## 6. Scenario Benchmarks
+| Task | Difficulty | Steps | Token Budget | Key Challenge |
+| :--- | :--- | :--- | :--- | :--- |
+| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
+| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives for a single source of truth. |
+| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
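The budgets in the table force an agent to choose which artifacts enter the working set. One standard way to frame that choice is as a 0/1 knapsack problem; the sketch below applies a greedy relevance-per-token heuristic. It is an illustrative technique, not the benchmark's grader or the baseline agent's strategy, and the `Candidate` type, relevance scores, token costs, and artifact names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Token budgets from the scenario table.
TASK_BUDGETS = {
    "refund_triage_easy": 850,
    "cross_function_brief_medium": 620,
    "executive_escalation_hard": 360,
}


@dataclass
class Candidate:
    artifact_id: str
    tokens: int       # estimated token cost if prioritized
    relevance: float  # hypothetical relevance estimate, e.g. from retrieval


def select_working_set(candidates: List[Candidate], budget: int) -> List[str]:
    """Greedy knapsack heuristic: highest relevance per token first, skip what no longer fits."""
    ranked = sorted(candidates, key=lambda c: c.relevance / max(c.tokens, 1), reverse=True)
    chosen: List[str] = []
    used = 0
    for c in ranked:
        if used + c.tokens <= budget:
            chosen.append(c.artifact_id)
            used += c.tokens
    return chosen


# Hypothetical candidate pool.
pool = [
    Candidate("policy", 300, 0.9),
    Candidate("incident_log", 200, 0.8),
    Candidate("release_freeze", 150, 0.7),
    Candidate("chat_transcript", 400, 0.3),
]
```

With this pool, the hard task's 360-token budget admits only `release_freeze` and `incident_log`, while the easy task's 850 tokens also fits `policy`. Greedy selection by ratio is fast but not guaranteed to be optimal; an exact knapsack solver could be substituted when candidate pools are small.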
 ---
+## 7. Configuration & Environment
+### Environment Variables
+| Variable | Default | Purpose |
+| :--- | :--- | :--- |
+| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
+| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
+| `HF_TOKEN` | *None* | Authentication for Hugging Face Inference API |
+| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
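The variables above can be resolved with plain environment lookups. This sketch assumes a `Settings` container and a `load_settings` helper (both hypothetical, not taken from `inference.py`) and strips a trailing slash from `RAG_ENV_URL` as an illustrative normalization choice. Accepting a mapping argument makes the defaults testable without touching the real environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    model_name: str
    hf_token: Optional[str]
    rag_env_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        # No default: authenticated inference calls need a real token.
        hf_token=env.get("HF_TOKEN"),
        # Illustrative choice: drop a trailing slash so paths can be appended safely.
        rag_env_url=env.get("RAG_ENV_URL", "http://localhost:7860").rstrip("/"),
    )
```

In a baseline agent, `api_base_url` and `hf_token` would presumably be passed to an OpenAI-compatible client as its `base_url` and `api_key`.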
+### Project Components
+- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
+- **`app.py`**: FastAPI server that exposes the environment to remote agents.
+- **`inference.py`**: Baseline agent script (OpenAI-compatible).
+- **`validate.py`**: Validation suite that runs a full episode lifecycle locally.
 ---
 ## 🚀 Quick Start
 1. **Setup**: `pip install -r requirements.txt`
+2. **Server**: `python app.py` (Runs on Port 7860)
 3. **Control Panel**: `streamlit run optimizer_ui.py`
 4. **Validation**: `python validate.py`
 ---
 ## 🌎 Live Deployment
 - **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
 - **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
+- **Space Repo ID**: `prithic07/context-prune`
+Built for Context Optimization Research.