---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment designed to solve the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**, where the system identifies, filters, and compresses information to maintain a high signal-to-noise ratio in RAG pipelines.

---

## 1. System Overview

In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.
### Architecture Flow

```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```
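The loop above can be sketched as a minimal agent-environment exchange. The `RagAction` shape and the stub environment below are illustrative stand-ins, not the environment's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class RagAction:
    type: str                       # e.g. "inspect_artifact", "submit_report"
    params: dict = field(default_factory=dict)

class StubEnv:
    """Stand-in for the ContextPrune server; the real state machine is richer."""
    def __init__(self):
        self.workflow_stage = "triage"

    def step(self, action: RagAction) -> dict:
        if action.type == "submit_report":
            # Terminal action: the episode ends and the grader scores the report.
            self.workflow_stage = "submitted"
        return {"workflow_stage": self.workflow_stage}

env = StubEnv()
env.step(RagAction("inspect_artifact", {"artifact_id": "a1"}))
result = env.step(RagAction("submit_report", {"answer": "..."}))
```

The real environment additionally returns the full observation and, on the terminal step, the weighted reward.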
---

## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.
| Stage | Action | Rationale |
| :--- | :--- | :--- |
| **Triage** | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "Garbage" early. |
| **Analysis** | `prioritize_artifact` | Committing specific evidence to the working set. Consumes token budget. |
| **Optimization** | `summarize_artifact` | AI-driven compression. Reduces token footprint while attempting to preserve "Grounding" tokens. |
| **Resolution** | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| **Submission** | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
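A happy-path episode walks these stages in order. The sketch below uses the action names from the table; the parameter values are invented for illustration:

```python
# One illustrative pass through the five stages (parameter values are made up).
workflow = [
    ("inspect_artifact",    {"artifact_id": "log-01"}),                            # Triage
    ("prioritize_artifact", {"artifact_id": "log-01"}),                            # Analysis
    ("summarize_artifact",  {"artifact_id": "log-01", "compression_ratio": 0.5}),  # Optimization
    ("set_resolution_plan", {"plan": "Roll back the release; notify on-call."}),   # Resolution
    ("submit_report",       {"answer": "Root cause and remediation ..."}),         # Submission
]
```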
---

## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working-set budget.

| Field | Type | Description |
| :--- | :--- | :--- |
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |
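The fields above map naturally onto a dataclass. The sketch mirrors the table; the `ChunkSummary` fields and the `tokens_remaining` helper are assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChunkSummary:
    """Assumed shape of artifact metadata; field names are illustrative."""
    artifact_id: str
    keywords: List[str]
    token_estimate: int

@dataclass
class RagObservation:
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: str        # "triage" | "analysis" | "resolution" | "submitted"
    customer_tier: str         # "standard" | "business" | "enterprise"
    incident_severity: str     # "sev3" | "sev2" | "sev1"
    available_artifacts: List[ChunkSummary]
    reviewed_artifacts: List[str]
    prioritized_artifacts: List[str]
    plan_draft: Optional[str]
    total_tokens_used: int
    token_budget: int
    step_number: int
    task_name: str

    @property
    def tokens_remaining(self) -> int:
        # Convenience helper (not in the documented schema).
        return self.token_budget - self.total_tokens_used

obs = RagObservation(
    case_id="case-001", case_summary="...", objective="...",
    workflow_stage="triage", customer_tier="enterprise", incident_severity="sev1",
    available_artifacts=[], reviewed_artifacts=[], prioritized_artifacts=[],
    plan_draft=None, total_tokens_used=120, token_budget=360,
    step_number=0, task_name="executive_escalation_hard",
)
```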
---

## 4. Action Space

Agents interact with the environment through the following canonical actions:

| Action Type | Parameters | Effect |
| :--- | :--- | :--- |
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate the final response and terminate the episode |
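Because each action type carries a fixed parameter set, a client can validate payloads before sending them. A minimal sketch (the flat dict wire format is an assumption):

```python
# Required parameters per action type, taken from the table above.
REQUIRED_PARAMS = {
    "inspect_artifact":    {"artifact_id"},
    "prioritize_artifact": {"artifact_id"},
    "summarize_artifact":  {"artifact_id", "compression_ratio"},
    "set_resolution_plan": {"plan"},
    "submit_report":       {"answer"},
}

def validate_action(action: dict) -> bool:
    """Return True if the action has a known type and all required parameters."""
    required = REQUIRED_PARAMS.get(action.get("type"))
    if required is None:
        return False
    return required <= set(action) - {"type"}

ok = validate_action({"type": "summarize_artifact",
                      "artifact_id": "inc-42", "compression_ratio": 0.4})
bad = validate_action({"type": "submit_report"})  # missing "answer"
```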
---

## 5. Reward Engineering (The Benchmarking Grader)

The environment calculates a weighted score (0.0 - 1.0) based on 8 distinct metrics.

- **Required Coverage (24%)**: Inclusion of critical "Gold" artifacts.
- **Cross-Domain Variety (12%)**: Rewards correlation across Support, Incident logs, and Release guardrails.
- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
- **Planning Logic (16%)**: Alignment between the drafted plan and ground-truth steps.
- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.
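The weighted sum can be sketched directly from the percentages above. Component scores in [0, 1] come from the grader; this function only combines them and is not the actual implementation:

```python
# Weights taken from the list above; the seven positive weights sum to 1.0.
WEIGHTS = {
    "required_coverage":    0.24,
    "cross_domain_variety": 0.12,
    "triage_thoroughness":  0.12,
    "planning_logic":       0.16,
    "reporting_accuracy":   0.18,
    "citation_fidelity":    0.10,
    "token_efficiency":     0.08,
}
HALLUCINATION_WEIGHT = 0.18  # subtracted, per the penalty above

def grade(scores: dict, hallucination: float = 0.0) -> float:
    """Combine per-metric scores in [0, 1] into a clamped weighted total."""
    raw = sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS)
    raw -= HALLUCINATION_WEIGHT * hallucination
    return max(0.0, min(1.0, raw))
```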
---

## 6. Scenario Benchmarks

| Task | Difficulty | Steps | Token Budget | Key Challenge |
| :--- | :--- | :--- | :--- | :--- |
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives into a single source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |
---

## 7. Configuration & Environment

### Environment Variables

| Variable | Default | Purpose |
| :--- | :--- | :--- |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | *None* | Authentication for the Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |
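In Python, these variables can be read with their documented defaults (a sketch; the project's own scripts may read them differently):

```python
import os

# Documented defaults; HF_TOKEN has no default and stays None when unset.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
HF_TOKEN = os.environ.get("HF_TOKEN")
RAG_ENV_URL = os.environ.get("RAG_ENV_URL", "http://localhost:7860")
```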
### Project Components

- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
- **`app.py`**: FastAPI implementation for remote agent interaction.
- **`inference.py`**: Baseline agent script (OpenAI-compatible).
- **`validate.py`**: Validation suite for episode lifecycle verification.
---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
2. **Server**: `python app.py` (runs on port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
- **Space Repo ID**: `prithic07/context-prune`

Built for Context Optimization Research.