---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment that targets the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**: the system identifies, filters, and compresses information to keep the signal-to-noise ratio high in RAG pipelines.

---

## 1. System Overview

In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.

### Architecture Flow
```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```

---

## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.

| Stage | Action | Rationale |
| :--- | :--- | :--- |
| **Triage** | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "Garbage" early. |
| **Analysis** | `prioritize_artifact` | Committing specific evidence to the working set. Consumes token budget. |
| **Optimization** | `summarize_artifact` | AI-driven compression. Reduces token footprint while attempting to preserve "Grounding" tokens. |
| **Resolution** | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| **Submission** | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
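
As a rough illustration of the table above, the sketch below maps each action name to its stage and checks that a sequence of actions ends with exactly one `submit_report`. The helper names (`stages_for`, `ends_with_single_submission`) and the sample `episode` are hypothetical and are not part of the ContextPrune codebase.

```python
from typing import List

# Stage and action names are copied from the table above.
STAGE_BY_ACTION = {
    "inspect_artifact": "Triage",
    "prioritize_artifact": "Analysis",
    "summarize_artifact": "Optimization",
    "set_resolution_plan": "Resolution",
    "submit_report": "Submission",
}


def stages_for(actions: List[str]) -> List[str]:
    """Translate a list of action names into the stages they belong to."""
    return [STAGE_BY_ACTION[name] for name in actions]


def ends_with_single_submission(actions: List[str]) -> bool:
    """True if submit_report appears exactly once and is the last action."""
    return actions.count("submit_report") == 1 and actions[-1] == "submit_report"


# Hypothetical episode: two inspections, then one action per later stage.
episode = [
    "inspect_artifact",
    "inspect_artifact",
    "prioritize_artifact",
    "summarize_artifact",
    "set_resolution_plan",
    "submit_report",
]
print(stages_for(episode))
print(ends_with_single_submission(episode))
```

Because `submit_report` terminates the episode, any action listed after it would never be executed, which is why the check requires it to come last.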

---

## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working set budget.

| Field | Type | Description |
| :--- | :--- | :--- |
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |
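
The fields above could be modelled roughly as follows. This is a sketch built only from the table, not the environment's actual class definition: the fields of `ChunkSummary` and the `remaining_budget` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class ChunkSummary:
    # Hypothetical fields: the README only says this holds artifact metadata.
    artifact_id: str
    keywords: List[str] = field(default_factory=list)
    domain: str = ""


@dataclass
class RagObservation:
    # Field names and types follow the observation table.
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: Literal["triage", "analysis", "resolution", "submitted"]
    customer_tier: Literal["standard", "business", "enterprise"]
    incident_severity: Literal["sev3", "sev2", "sev1"]
    available_artifacts: List[ChunkSummary]
    reviewed_artifacts: List[str]
    prioritized_artifacts: List[str]
    plan_draft: Optional[str]
    total_tokens_used: int
    token_budget: int
    step_number: int
    task_name: str

    @property
    def remaining_budget(self) -> int:
        """Tokens still available (assumed helper, not a documented field)."""
        return self.token_budget - self.total_tokens_used
```

An agent can compare `remaining_budget` against an artifact's estimated cost before deciding whether to prioritize or summarize it.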

---

## 4. Action Space

Agents interact with the environment through the following canonical actions:

| Action Type | Parameters | Effect |
| :--- | :--- | :--- |
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate final response and terminate the episode |
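
A minimal sketch of how an agent might build and sanity-check these actions before sending them. The payload shape (an `action_type` key plus the parameters) and both helper functions are assumptions; the real `RagAction` schema and the server's own validation may differ.

```python
from typing import Any, Dict, List, Optional

# Required parameters per action, as listed in the table above.
REQUIRED_PARAMS = {
    "inspect_artifact": ("artifact_id",),
    "prioritize_artifact": ("artifact_id",),
    "summarize_artifact": ("artifact_id", "compression_ratio"),
    "set_resolution_plan": ("plan",),
    "submit_report": ("answer",),
}


def build_action(action_type: str, **params: Any) -> Dict[str, Any]:
    """Return an action payload, raising ValueError on unknown or incomplete input."""
    if action_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action type: {action_type}")
    missing = [name for name in REQUIRED_PARAMS[action_type] if name not in params]
    if missing:
        raise ValueError(f"{action_type} is missing parameters: {missing}")
    return {"action_type": action_type, **params}


def precondition_error(
    action: Dict[str, Any], reviewed: List[str], prioritized: List[str]
) -> Optional[str]:
    """Check the ordering implied by the table; return a message or None."""
    kind = action["action_type"]
    artifact_id = action.get("artifact_id")
    if kind == "prioritize_artifact" and artifact_id not in reviewed:
        return "artifact must be inspected before it is prioritized"
    if kind == "summarize_artifact" and artifact_id not in prioritized:
        return "artifact must be prioritized before it is summarized"
    return None
```

For example, `precondition_error(build_action("prioritize_artifact", artifact_id="a1"), reviewed=[], prioritized=[])` returns a message, because the table describes `prioritize_artifact` as adding a *reviewed* artifact.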

---

## 5. Reward Engineering (The Benchmarking Grader)

The environment computes a weighted score between 0.0 and 1.0 from 8 metrics: seven positive components whose weights sum to 100%, and one hallucination penalty.

- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts.
- **Cross-Domain Variety (12%)**: Rewards correlation across Support, Incident logs, and Release guardrails.
- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
- **Planning Logic (16%)**: Alignment between the drafted plan and ground truth steps.
- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.
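
The weights above can be combined as in the following sketch. It assumes each metric is already normalised to the range 0.0 to 1.0, that the penalty scales linearly with a hallucination rate, and that the result is clamped to [0.0, 1.0]; the metric keys and the function name are hypothetical, and the actual grader may combine the terms differently.

```python
from typing import Dict

# Weights taken from the list above; the seven positive weights sum to 1.0.
WEIGHTS = {
    "required_coverage": 0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness": 0.12,
    "planning_logic": 0.16,
    "reporting_accuracy": 0.18,
    "citation_fidelity": 0.10,
    "token_efficiency": 0.08,
}
HALLUCINATION_PENALTY = 0.18


def weighted_score(metrics: Dict[str, float], hallucination: float = 0.0) -> float:
    """Weighted sum of metrics minus the penalty, clamped to [0.0, 1.0].

    Missing metrics count as 0.0. Clamping is an assumption made here so the
    result stays inside the documented score range.
    """
    positive = sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())
    raw = positive - HALLUCINATION_PENALTY * hallucination
    return max(0.0, min(1.0, raw))
```

Under these assumptions, a run that scores perfectly on every positive metric but triggers the full hallucination penalty lands at roughly 0.82 rather than 1.0.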

---

## 6. Scenario Benchmarks

| Task | Difficulty | Steps | Budget | Key Challenge |
| :--- | :--- | :--- | :--- | :--- |
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives for a singular source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
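
To make the budgets concrete, the sketch below encodes the table as a dictionary and checks whether an artifact of a given token cost still fits. It assumes "Steps" is the maximum number of steps per episode and that the working set may not exceed "Budget" tokens; `can_prioritize` is a hypothetical helper, not an environment API.

```python
# Values copied from the scenario table above.
TASKS = {
    "refund_triage_easy": {"difficulty": "Easy", "steps": 7, "budget": 850},
    "cross_function_brief_medium": {"difficulty": "Medium", "steps": 8, "budget": 620},
    "executive_escalation_hard": {"difficulty": "Hard", "steps": 10, "budget": 360},
}


def can_prioritize(task_name: str, tokens_used: int, artifact_tokens: int) -> bool:
    """True if adding the artifact keeps the working set within the task budget."""
    return tokens_used + artifact_tokens <= TASKS[task_name]["budget"]


# On the hard task, 300 tokens already used leaves room for at most 60 more.
print(can_prioritize("executive_escalation_hard", 300, 60))
print(can_prioritize("executive_escalation_hard", 300, 61))
```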

---

## 7. Configuration & Environment

### Environment Variables
| Variable | Default | Purpose |
| :--- | :--- | :--- |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | *None* | Authentication for Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
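
One way a client script might read these variables with the documented defaults is sketched below. The `load_config` function is illustrative and is not necessarily how `inference.py` does it.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the table above; HF_TOKEN has no default.
DEFAULTS = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen2.5-72B-Instruct",
    "RAG_ENV_URL": "http://localhost:7860",
}


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Read settings from the environment, falling back to the documented defaults."""
    config: Dict[str, Optional[str]] = {
        name: env.get(name, default) for name, default in DEFAULTS.items()
    }
    config["HF_TOKEN"] = env.get("HF_TOKEN")  # None when the token is not set
    return config
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test without touching the real environment.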

### Project Components
- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
- **`app.py`**: FastAPI implementation for remote agent interaction.
- **`inference.py`**: Baseline agent script (OpenAI-compatible).
- **`validate.py`**: Robust validation suite for episode lifecycle verification.

---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
2. **Server**: `python app.py` (runs on port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
- **Space Repo ID**: `prithic07/context-prune`

Built for Context Optimization Research.