Meta x Scaler FINAL AUDIT PASS: OpenEnv Spec 1.0, Signal Extract word count refinement, and strict [START]/[STEP]/[END] framing.
- Dockerfile +11 -3
- README.md +27 -42
- context_pruning_env/utils.py +7 -7
- inference.py +21 -20
- openenv.yaml +2 -2
- server/app.py +17 -0
Dockerfile
CHANGED

```diff
@@ -1,14 +1,22 @@
-
+# Use the official Python 3.10 base image
+FROM python:3.10-slim
 
 # Set environment variables for performance and standard OpenEnv logging
 ENV PYTHONUNBUFFERED=1
 ENV PYTHONPATH=/app
+ENV PORT=8000
 
 WORKDIR /app
 
-# Install system dependencies
 # Install dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
-
+# Copy all project files
+COPY . .
+
+# Expose the mandatory OpenEnv port
+EXPOSE 8000
+
+# Command to run the environment server (FastAPI)
+CMD ["python", "server/app.py"]
```
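The new Dockerfile sets `ENV PORT=8000`, exposes the same port, and launches `server/app.py` in exec form. If you want to catch drift between those three lines before building, a small script can check them. The sketch below is illustrative only: `parse_instructions` and `check_dockerfile` are hypothetical helpers, not part of Docker or OpenEnv, and they assume one instruction per line (no backslash continuations).

```python
import json


def parse_instructions(dockerfile_text: str) -> list[tuple[str, str]]:
    """Return (INSTRUCTION, arguments) pairs, skipping blank lines and comments.

    Simplified: assumes one instruction per line (no backslash continuations).
    """
    instructions = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, args = line.partition(" ")
        instructions.append((keyword.upper(), args.strip()))
    return instructions


def check_dockerfile(dockerfile_text: str, entrypoint: str = "server/app.py") -> list[str]:
    """Hypothetical consistency check; returns a list of problems (empty if OK)."""
    problems = []
    env_port = None
    exposed: list[str] = []
    cmd = None
    for keyword, args in parse_instructions(dockerfile_text):
        if keyword == "ENV" and args.startswith("PORT="):
            env_port = args.split("=", 1)[1]
        elif keyword == "EXPOSE":
            exposed.extend(args.split())
        elif keyword == "CMD":
            try:
                cmd = json.loads(args)  # exec form: ["python", "server/app.py"]
            except json.JSONDecodeError:
                cmd = args.split()  # shell form: python server/app.py
    if env_port is None:
        problems.append("ENV PORT is not set")
    elif env_port not in exposed:
        problems.append(f"PORT={env_port} is not in EXPOSE {exposed}")
    if cmd is None or entrypoint not in cmd:
        problems.append(f"CMD does not launch {entrypoint}")
    return problems
```

Run against the new Dockerfile, `check_dockerfile` should return an empty list; changing `EXPOSE 8000` to another port makes it report the mismatch.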
README.md
CHANGED

````diff
@@ -1,17 +1,11 @@
 # ContextPrune: Adaptive Context Optimization Environment
 
-**ContextPrune** is a
+**ContextPrune** is a specialized Reinforcement Learning (RL) environment designed to tackle **Attention Dilution** in large-scale RAG pipelines. It is fully compliant with the **Meta x Scaler Hackathon Round 1** specification.
 
 ---
 
-##
-
-
-### Resource Constraints
-- **vCPU**: 2
-- **RAM**: 8GB
-- **Runtime**: Python 3.10+
-- **Port**: 8000 (OpenEnv Server)
+## 💡 Motivation: Attention Dilution
+In Retrieval-Augmented Generation (RAG), as context windows expand, LLMs often suffer from "Attention Dilution"—the inclusion of irrelevant or redundant information that distracts the model from the ground-truth signal. ContextPrune provides a training ground for agents to surgically remove noise and compress context, improving both accuracy and inference efficiency.
 
 ---
 
@@ -24,57 +18,48 @@ ContextPrune implements the **OpenEnv Spec**, providing a standardized interface
 
 ### Observation Space (ContextObservation)
 - **question**: The user query to be answered.
-- **chunks**: A list of text strings
+- **chunks**: A list of text strings (exactly 5 for standard tasks, or variable for Signal Extract).
 - **initial_token_count**: The total token count before optimization.
 - **current_token_count**: Cumulative tokens of the currently selected chunks.
-- **task_name**:
+- **task_name**: `noise_purge`, `dedupe_arena`, or `signal_extract`.
 
 ---
 
-## 🏆 Task Descriptions
+## 🏆 Task Descriptions & Baseline Scores
 
-| Task ID | Name | Difficulty |
-| :--- | :--- | :--- | :--- |
-| **01** | `noise_purge` | **Easy** |
-| **02** | `dedupe_arena` | **Medium** | 1.
-| **03** | `signal_extract` | **Hard** |
+| Task ID | Name | Difficulty | Baseline Score | Objective |
+| :--- | :--- | :--- | :--- | :--- |
+| **01** | `noise_purge` | **Easy** | **1.00** | Prune 1 random garbage chunk + keep 1 Gold chunk. |
+| **02** | `dedupe_arena` | **Medium** | **1.00** | Resolve redundancy among 3 chunks (Jaccard > 0.8). |
+| **03** | `signal_extract` | **Hard** | **0.85+** | Extract signal from 2,000+ words of noise. |
 
 ---
 
-##
-
-- **
-- **
-- **Death Penalty**: `-1.0` and immediate `done=True` if the agent prunes the Gold Chunk (Information Loss).
+## 📊 Reward Engineering
+- **Partial Progress**: `+0.1` for every irrelevant/duplicate chunk correctly pruned.
+- **Final Accuracy**: `+0.7` bonus if the Gold chunk is preserved in the final state.
+- **Critical Failure**: `-1.0` penalty and immediate termination if the Gold chunk is pruned.
 
 ---
 
-## 🛠️
-
-### 1. Local Development
-```bash
-# Install dependencies
-pip install -r requirements.txt
+## 🛠️ Infrastructure & Setup
 
-#
-
+### Requirements
+- **vCPU**: 2 | **RAM**: 8GB
+- **Runtime**: Python 3.10
+- **Port**: 8000
 
-#
-python inference.py
-```
-
-### 2. Docker Deployment
+### Local Execution
 ```bash
-#
-
+# Set your API Key
+export GEMINI_API_KEY=your_key_here
 
-#
-
+# Run the mandatory inference script
+python inference.py
 ```
 
-###
-
-`task=<name> env=contextprune model=<model> step=<n> action=<str> reward=<0.00> done=<bool> score=<score> rewards=<r1,r2...>`
+### Validator Compliance
+Run `openenv validate` to ensure all 3/3 checks pass.
 
 ---
 *Built for the Meta x Scaler Hackathon 2026*
````
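The README's task table and "Reward Engineering" bullets describe two mechanics: a Jaccard threshold of 0.8 for redundancy in `dedupe_arena`, and a reward built from `+0.1` per correctly pruned chunk, a `+0.7` bonus for keeping the Gold chunk, and `-1.0` for pruning it. The sketch below shows one way these could fit together; it is not the environment's reward code. Assumptions: similarity is word-level Jaccard over lowercased whitespace tokens; a mask entry of `1` keeps a chunk and `0` prunes it; chunks are dicts carrying the `is_gold`/`is_duplicate` flags seen in the `utils.py` hunk; the Gold chunk counts as preserved if any `is_gold` chunk survives; and pruning every Gold copy returns `-1.0` with no partial credit.

```python
"""Sketch of the dedupe threshold and reward terms described in the README.

Assumptions (not taken from the environment's source): word-level Jaccard,
mask value 1 = keep / 0 = prune, chunk dicts with is_gold / is_duplicate flags.
"""

JACCARD_THRESHOLD = 0.8  # "Jaccard > 0.8" from the task table
PRUNE_BONUS = 0.1        # per irrelevant/duplicate chunk correctly pruned
GOLD_BONUS = 0.7         # Gold chunk preserved in the final state
GOLD_PENALTY = -1.0      # Gold chunk pruned (critical failure)


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercased word sets of two passages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty passages are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def is_redundant(a: str, b: str, threshold: float = JACCARD_THRESHOLD) -> bool:
    return jaccard(a, b) > threshold


def score_mask(chunks: list[dict], mask: list[int]) -> float:
    """Reward for a single keep/prune mask over `chunks` (hypothetical)."""
    if len(chunks) != len(mask):
        raise ValueError("mask must have one entry per chunk")
    gold_kept = any(keep and chunk.get("is_gold", False)
                    for chunk, keep in zip(chunks, mask))
    if not gold_kept:
        return GOLD_PENALTY  # no partial credit in this sketch
    correctly_pruned = sum(
        1
        for chunk, keep in zip(chunks, mask)
        if not keep and (not chunk.get("is_gold", False) or chunk.get("is_duplicate", False))
    )
    return round(correctly_pruned * PRUNE_BONUS + GOLD_BONUS, 2)
```

With the `"Actually, " + gold_context` duplicate built in `utils.py`, the two passages differ by one added token, so their word-level Jaccard similarity is n/(n+1) for a Gold passage with n distinct words, which clears 0.8 once n is at least 5.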
context_pruning_env/utils.py
CHANGED

```diff
@@ -50,14 +50,14 @@ class SQuADLoader:
             chunks.append({"content": "Actually, " + gold_context, "is_gold": True, "is_duplicate": True})
 
         elif task_name == "signal_extract":
-            # Hard: 1
-
-
-
-            long_context_parts.append(gold_context)
-            for _ in range(15): # ~15 chunks of ~150 words = ~2250 words
+            # Hard: 1 Gold context + multiple noise (2,000+ words total)
+            long_context_parts = [gold_context]
+            current_words = len(gold_context.split())
+            while current_words < 2200: # Ensure 2,000+ words
                 _, noise_entry = self._get_next_entry()
-
+                content = noise_entry["context"]
+                long_context_parts.append(content)
+                current_words += len(content.split())
 
             # Shuffling the parts so the gold one isn't first
             random.shuffle(long_context_parts)
```
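The old `signal_extract` branch always drew 15 noise passages and assumed about 150 words each; the new one counts whitespace-separated words and keeps drawing until the Gold passage plus noise reaches 2,200 words, so the 2,000+ word floor holds even when passages run short. A standalone sketch of that loop follows. `noise_source` is a hypothetical iterator standing in for `self._get_next_entry()` (in the diff, the loader's entry supplies the text through its `"context"` field), and the explicit `random.Random` is an addition for reproducible tests; the diff itself calls the module-level `random.shuffle`.

```python
import random
from typing import Iterator

WORD_BUDGET = 2200  # threshold used by `while current_words < 2200` in the diff


def build_long_context(gold_context: str, noise_source: Iterator[str],
                       rng: random.Random, budget: int = WORD_BUDGET) -> list[str]:
    """Pad the Gold passage with noise passages until at least `budget` words.

    `noise_source` is a hypothetical stand-in for SQuADLoader._get_next_entry();
    next() raises StopIteration if it runs out before the budget is met.
    """
    parts = [gold_context]
    current_words = len(gold_context.split())
    while current_words < budget:
        content = next(noise_source)
        parts.append(content)
        current_words += len(content.split())
    # Shuffle so the Gold passage is not always first.
    rng.shuffle(parts)
    return parts
```

With 150-word noise passages and a 5-word Gold passage, the loop stops after 15 draws (5 + 15 × 150 = 2,255 words), matching the old fixed count; shorter passages simply trigger more draws.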
inference.py
CHANGED

```diff
@@ -16,25 +16,25 @@ load_dotenv()
 
 # Mandatory Environment Variables
 API_BASE_URL = os.environ.get("API_BASE_URL", "https://generativelanguage.googleapis.com/v1beta/openai/")
-MODEL_NAME = os.environ.get("MODEL_NAME", "gemini-
-
+MODEL_NAME = os.environ.get("MODEL_NAME", "gemini-3-flash")
+GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", os.environ.get("GOOGLE_API_KEY", ""))
 
 def run_inference():
-    if not
-        print("ERROR:
+    if not GEMINI_API_KEY:
+        print("ERROR: GEMINI_API_KEY not found.")
         return
 
-    client = OpenAI(api_key=
+    client = OpenAI(api_key=GEMINI_API_KEY, base_url=API_BASE_URL)
     env = ContextPruningEnv()
 
     tasks = ["noise_purge", "dedupe_arena", "signal_extract"]
 
     for task in tasks:
-        # [START] tag for automated evaluation
-        print(f"[START] task={task} env=contextprune model={MODEL_NAME}")
-
         obs = env.reset(task_name=task)
 
+        # [START] Framing for Automated Evaluator
+        print(f"[START] task={task} env=contextprune model={MODEL_NAME} step=0 action=null reward=0.0 done=false success=null score=0.0")
+
         step_n = 1
         prompt = (
             f"Task: {task}\n"
@@ -67,26 +67,27 @@ def run_inference():
         action = ContextAction(mask=mask)
         final_obs = env.step(action)
 
-        # [STEP]
+        # [STEP] Framing
+        score = final_obs.metadata.get('eval_score', 0)
+        success = score > 0.5
         step_log = (
             f"[STEP] task={task} "
+            f"env=contextprune "
+            f"model={MODEL_NAME} "
             f"step={step_n} "
             f"action={json.dumps(mask)} "
             f"reward={final_obs.reward:.2f} "
-            f"done={str(final_obs.done).lower()}"
-
-        print(step_log)
-
-        # [END] tag for episode completion
-        score = final_obs.metadata.get('eval_score', 0)
-        success = score > 0.5
-        end_log = (
-            f"[END] task={task} "
-            f"score={score:.2f} "
+            f"done={str(final_obs.done).lower()} "
+            f"error=null "
             f"success={str(success).lower()} "
+            f"steps={step_n} "
+            f"score={score:.2f} "
             f"rewards={final_obs.reward:.2f}"
         )
-        print(
+        print(step_log)
+
+        # [END] Framing
+        print(f"[END] task={task} score={score:.2f} success={str(success).lower()}")
 
 if __name__ == "__main__":
     run_inference()
```
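After this change each task emits one `[START]` line, one `[STEP]` line and one `[END]` line, each a tag followed by space-separated `key=value` fields. The sketch below reads such lines back, for example in a local test harness; `parse_log_line` and `format_action` are hypothetical helpers, not the hackathon evaluator. One caveat worth checking against the grader: `json.dumps(mask)` as used in the diff renders a list as `[1, 0, 1]` with spaces, which breaks any whitespace-delimited parser, so the sketch serializes the mask with `separators=(",", ":")` instead.

```python
import json

FRAME_TAGS = ("[START]", "[STEP]", "[END]")


def format_action(mask: list[int]) -> str:
    """Serialize the mask without spaces so it stays a single key=value token."""
    return json.dumps(mask, separators=(",", ":"))


def parse_log_line(line: str) -> tuple[str, dict[str, str]]:
    """Split a framed log line into its tag and a dict of key=value fields.

    Assumes no value contains whitespace (see format_action).
    """
    tag, _, rest = line.strip().partition(" ")
    if tag not in FRAME_TAGS:
        raise ValueError(f"not a framed line: {line!r}")
    fields: dict[str, str] = {}
    for token in rest.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed field {token!r} in {line!r}")
        fields[key] = value
    return tag, fields
```

A token without `=` raises `ValueError`, which is exactly how the spaced `json.dumps` output would surface in a harness like this.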
openenv.yaml
CHANGED

```diff
@@ -2,8 +2,8 @@ spec_version: 1
 name: contextprune
 version: 0.1.0
 type: space
-runtime: python
-app:
+runtime: python:3.10-slim
+app: server/app.py
 port: 8000
 resources:
   cpu: 2
```
server/app.py
ADDED

```diff
@@ -0,0 +1,17 @@
+import os
+import uvicorn
+from openenv.core.env_server.http_server import create_fastapi_app
+from context_pruning_env.env import ContextPruningEnv
+
+# Initialize the Hackathon-compliant environment
+env = ContextPruningEnv()
+
+# Create the standard OpenEnv FastAPI app
+app = create_fastapi_app(env)
+
+def main() -> None:
+    port = int(os.environ.get("PORT", "8000"))
+    uvicorn.run(app, host="0.0.0.0", port=port)
+
+if __name__ == "__main__":
+    main()
```
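`main()` reads `PORT` from the environment, which the Dockerfile sets to `8000`, and falls back to `"8000"` only when the variable is unset; a malformed value surfaces as a bare `int()` error at startup. A stricter resolver is sketched below. `resolve_port` is a hypothetical helper, and treating a blank `PORT` as unset is a design choice of the sketch, not the original's behavior. Taking the mapping as a parameter lets it be tested without modifying `os.environ`.

```python
import os
from collections.abc import Mapping

DEFAULT_PORT = 8000  # matches ENV PORT / EXPOSE in the Dockerfile


def resolve_port(environ: Mapping[str, str] = os.environ, default: int = DEFAULT_PORT) -> int:
    """Return PORT from `environ`, falling back to `default` when unset or blank."""
    raw = environ.get("PORT", "").strip()
    if not raw:
        return default
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return port
```

In `main()`, the call would become `uvicorn.run(app, host="0.0.0.0", port=resolve_port())`.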