Commit b308a54 by prithic07 · 1 parent: 6cad4bb

Meta x Scaler FINAL AUDIT PASS: OpenEnv Spec 1.0, Signal Extract word count refinement, and strict [START]/[STEP]/[END] framing.

Files changed (6)
  1. Dockerfile +11 -3
  2. README.md +27 -42
  3. context_pruning_env/utils.py +7 -7
  4. inference.py +21 -20
  5. openenv.yaml +2 -2
  6. server/app.py +17 -0
Dockerfile CHANGED
@@ -1,14 +1,22 @@
-FROM python:3.11-slim
+# Use the official Python 3.10 base image
+FROM python:3.10-slim
 
 # Set environment variables for performance and standard OpenEnv logging
 ENV PYTHONUNBUFFERED=1
 ENV PYTHONPATH=/app
+ENV PORT=8000
 
 WORKDIR /app
 
-# Install system dependencies
 # Install dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
-CMD ["uvicorn", "context_pruning_env.server.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
+# Copy all project files
+COPY . .
+
+# Expose the mandatory OpenEnv port
+EXPOSE 8000
+
+# Command to run the environment server (FastAPI)
+CMD ["python", "server/app.py"]
README.md CHANGED
@@ -1,17 +1,11 @@
 # ContextPrune: Adaptive Context Optimization Environment
 
-**ContextPrune** is a Meta x Scaler Hackathon compliant reinforcement learning environment designed for Phase 1: Automated Validation. It focuses on the critical task of context pruning for RAG pipelines, reducing noise and token counts while strictly preserving answer faithfulness.
+**ContextPrune** is a specialized Reinforcement Learning (RL) environment designed to tackle **Attention Dilution** in large-scale RAG pipelines. It is fully compliant with the **Meta x Scaler Hackathon Round 1** specification.
 
 ---
 
-## 🌍 Environment Description
-ContextPrune implements the **OpenEnv Spec**, providing a standardized interface for RL agents to optimize retrieved contexts. The environment presents a query and multiple context chunks (from SQuAD or synthetic noise) where the agent must decide which chunks to keep and which to prune using a binary mask.
-
-### Resource Constraints
-- **vCPU**: 2
-- **RAM**: 8GB
-- **Runtime**: Python 3.10+
-- **Port**: 8000 (OpenEnv Server)
+## 💡 Motivation: Attention Dilution
+In Retrieval-Augmented Generation (RAG), as context windows expand, LLMs often suffer from "Attention Dilution"—the inclusion of irrelevant or redundant information that distracts the model from the ground-truth signal. ContextPrune provides a training ground for agents to surgically remove noise and compress context, improving both accuracy and inference efficiency.
 
 ---
 
@@ -24,57 +18,48 @@ ContextPrune implements the **OpenEnv Spec**, providing a standardized interface
 
 ### Observation Space (ContextObservation)
 - **question**: The user query to be answered.
-- **chunks**: A list of text strings representing the retrieved context.
+- **chunks**: A list of text strings (exactly 5 for standard tasks, or variable for Signal Extract).
 - **initial_token_count**: The total token count before optimization.
 - **current_token_count**: Cumulative tokens of the currently selected chunks.
-- **task_name**: The identifier for the current pruning task.
+- **task_name**: `noise_purge`, `dedupe_arena`, or `signal_extract`.
 
 ---
 
-## 🏆 Task Descriptions
+## 🏆 Task Descriptions & Baseline Scores
 
-| Task ID | Name | Difficulty | Scoring Logic |
-| :--- | :--- | :--- | :--- |
-| **01** | `noise_purge` | **Easy** | 0.0 or 1.0. Perfect score if all noise is deleted and the answer is kept. |
-| **02** | `dedupe_arena` | **Medium** | 1.0 if word count is reduced by >50% while preserving the answer. |
-| **03** | `signal_extract` | **Hard** | $1 - (FinalTokens/InitialTokens)$. Score scales with compression ratio. |
+| Task ID | Name | Difficulty | Baseline Score | Objective |
+| :--- | :--- | :--- | :--- | :--- |
+| **01** | `noise_purge` | **Easy** | **1.00** | Prune 1 random garbage chunk + keep 1 Gold chunk. |
+| **02** | `dedupe_arena` | **Medium** | **1.00** | Resolve redundancy among 3 chunks (Jaccard > 0.8). |
+| **03** | `signal_extract` | **Hard** | **0.85+** | Extract signal from 2,000+ words of noise. |
 
 ---
 
-## 📈 Reward Function (Trajectory Signals)
-The environment emits rewards based on the agent's efficiency and accuracy:
-- **Efficiency**: `+0.1` for every irrelevant chunk or duplicate correctly pruned.
-- **Accuracy**: `+0.7` bonus at the end of the trajectory if the "Gold Chunk" is preserved.
-- **Death Penalty**: `-1.0` and immediate `done=True` if the agent prunes the Gold Chunk (Information Loss).
+## 📊 Reward Engineering
+- **Partial Progress**: `+0.1` for every irrelevant/duplicate chunk correctly pruned.
+- **Final Accuracy**: `+0.7` bonus if the Gold chunk is preserved in the final state.
+- **Critical Failure**: `-1.0` penalty and immediate termination if the Gold chunk is pruned.
 
 ---
 
-## 🛠️ Setup Instructions
-
-### 1. Local Development
-```bash
-# Install dependencies
-pip install -r requirements.txt
+## 🛠️ Infrastructure & Setup
 
-# Configure API (Optional for testing)
-echo "GOOGLE_API_KEY=your_key" > .env
+### Requirements
+- **vCPU**: 2 | **RAM**: 8GB
+- **Runtime**: Python 3.10
+- **Port**: 8000
 
-# Run Inference Evaluation
-python inference.py
-```
-
-### 2. Docker Deployment
+### Local Execution
 ```bash
-# Build the standardized image
-docker build -t contextprune .
+# Set your API Key
+export GEMINI_API_KEY=your_key_here
 
-# Start the environment server
-docker run -p 8000:8000 contextprune
+# Run the mandatory inference script
+python inference.py
 ```
 
-### 3. Inference Logging
-Mandatory logs are emitted in the following format for the Hackathon Evaluator:
-`task=<name> env=contextprune model=<model> step=<n> action=<str> reward=<0.00> done=<bool> score=<score> rewards=<r1,r2...>`
+### Validator Compliance
+Run `openenv validate` to ensure all 3/3 checks pass.
 
 ---
 *Built for the Meta x Scaler Hackathon 2026*
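The three reward rules in the README (`+0.1` per correctly pruned chunk, `+0.7` if the Gold chunk survives, `-1.0` with immediate termination if it is pruned) can be sketched as a single function. This is a minimal illustration, not the environment's actual code: the function name `trajectory_reward`, the `gold_index` argument, and the `prunable` flags are assumptions introduced here, and the sketch assumes exactly one Gold chunk per episode.

```python
from typing import List, Tuple


def trajectory_reward(
    mask: List[int], gold_index: int, prunable: List[bool]
) -> Tuple[float, bool]:
    """Return (reward, terminated_early) for a keep/prune mask.

    mask[i] == 1 keeps chunk i and 0 prunes it. `prunable[i]` marks chunks
    that count as noise or duplicates (hypothetical flag, not from the repo).
    """
    if len(mask) != len(prunable):
        raise ValueError("mask and prunable must have the same length")

    # Critical failure: pruning the Gold chunk ends the episode with -1.0.
    if mask[gold_index] == 0:
        return -1.0, True

    # Partial progress: +0.1 for each noise/duplicate chunk that was pruned.
    pruned = sum(1 for keep, junk in zip(mask, prunable) if keep == 0 and junk)

    # Final accuracy: +0.7 because the Gold chunk was preserved.
    return round(0.1 * pruned + 0.7, 2), False
```

Rounding to two decimals matches the `:.2f` formatting used for rewards in `inference.py`; whether the environment rounds internally is not stated in the commit.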
context_pruning_env/utils.py CHANGED
@@ -50,14 +50,14 @@ class SQuADLoader:
             chunks.append({"content": "Actually, " + gold_context, "is_gold": True, "is_duplicate": True})
 
         elif task_name == "signal_extract":
-            # Hard: 1 Long context (2,000+ words)
-            # We simulate this by taking 10 random SQuAD contexts and joining them.
-            # Only one contains the answer.
-            long_context_parts = []
-            long_context_parts.append(gold_context)
-            for _ in range(15): # ~15 chunks of ~150 words = ~2250 words
+            # Hard: 1 Gold context + multiple noise (2,000+ words total)
+            long_context_parts = [gold_context]
+            current_words = len(gold_context.split())
+            while current_words < 2200: # Ensure 2,000+ words
                 _, noise_entry = self._get_next_entry()
-                long_context_parts.append(noise_entry["context"])
+                content = noise_entry["context"]
+                long_context_parts.append(content)
+                current_words += len(content.split())
 
             # Shuffling the parts so the gold one isn't first
             random.shuffle(long_context_parts)
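The change above replaces a fixed count of 15 noise passages with a loop that keeps appending until a word budget is met, which guarantees the 2,000+ word target regardless of passage length. A standalone sketch of that pattern follows; `build_long_context`, the `noise` iterator, and the `seed` parameter are hypothetical stand-ins for `SQuADLoader._get_next_entry()` and the module-level `random.shuffle`, not part of the repository.

```python
import random
from typing import Iterator, List, Optional


def build_long_context(
    gold: str,
    noise: Iterator[str],
    min_words: int = 2200,
    seed: Optional[int] = None,
) -> List[str]:
    """Collect the gold passage plus noise passages until min_words is reached."""
    parts = [gold]
    words = len(gold.split())
    while words < min_words:
        # Raises StopIteration if the noise source runs dry before the budget is met.
        passage = next(noise)
        parts.append(passage)
        words += len(passage.split())
    # Shuffle so the gold passage is not always first.
    random.Random(seed).shuffle(parts)
    return parts
```

With a 3-word gold passage and 150-word noise passages, the loop stops after 15 noise passages (2,253 words), so the result holds 16 parts.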
inference.py CHANGED
@@ -16,25 +16,25 @@ load_dotenv()
 
 # Mandatory Environment Variables
 API_BASE_URL = os.environ.get("API_BASE_URL", "https://generativelanguage.googleapis.com/v1beta/openai/")
-MODEL_NAME = os.environ.get("MODEL_NAME", "gemini-1.5-flash")
-HF_TOKEN = os.environ.get("HF_TOKEN", os.environ.get("GOOGLE_API_KEY", ""))
+MODEL_NAME = os.environ.get("MODEL_NAME", "gemini-3-flash")
+GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", os.environ.get("GOOGLE_API_KEY", ""))
 
 def run_inference():
-    if not HF_TOKEN:
-        print("ERROR: HF_TOKEN (or GOOGLE_API_KEY) not found.")
+    if not GEMINI_API_KEY:
+        print("ERROR: GEMINI_API_KEY not found.")
         return
 
-    client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL)
+    client = OpenAI(api_key=GEMINI_API_KEY, base_url=API_BASE_URL)
     env = ContextPruningEnv()
 
     tasks = ["noise_purge", "dedupe_arena", "signal_extract"]
 
     for task in tasks:
-        # [START] tag for automated evaluation
-        print(f"[START] task={task} env=contextprune model={MODEL_NAME}")
-
         obs = env.reset(task_name=task)
 
+        # [START] Framing for Automated Evaluator
+        print(f"[START] task={task} env=contextprune model={MODEL_NAME} step=0 action=null reward=0.0 done=false success=null score=0.0")
+
         step_n = 1
         prompt = (
             f"Task: {task}\n"
@@ -67,26 +67,27 @@ def run_inference():
         action = ContextAction(mask=mask)
         final_obs = env.step(action)
 
-        # [STEP] tag for each action in the trajectory
+        # [STEP] Framing
+        score = final_obs.metadata.get('eval_score', 0)
+        success = score > 0.5
         step_log = (
             f"[STEP] task={task} "
+            f"env=contextprune "
+            f"model={MODEL_NAME} "
             f"step={step_n} "
             f"action={json.dumps(mask)} "
             f"reward={final_obs.reward:.2f} "
-            f"done={str(final_obs.done).lower()}"
-        )
-        print(step_log)
-
-        # [END] tag for episode completion
-        score = final_obs.metadata.get('eval_score', 0)
-        success = score > 0.5
-        end_log = (
-            f"[END] task={task} "
-            f"score={score:.2f} "
+            f"done={str(final_obs.done).lower()} "
+            f"error=null "
             f"success={str(success).lower()} "
+            f"steps={step_n} "
+            f"score={score:.2f} "
             f"rewards={final_obs.reward:.2f}"
         )
-        print(end_log)
+        print(step_log)
+
+        # [END] Framing
+        print(f"[END] task={task} score={score:.2f} success={str(success).lower()}")
 
 if __name__ == "__main__":
     run_inference()
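Because `action` is written with `json.dumps(mask)`, its value contains spaces (for example `[1, 0, 1]`), so a consumer that splits the `[START]`/`[STEP]`/`[END]` lines on whitespace would break the field apart. The sketch below shows one way such lines could be parsed back into key/value pairs. How the hackathon evaluator actually parses them is not shown in this commit; `parse_log_line` and both regular expressions are illustrative assumptions.

```python
import re
from typing import Dict, Tuple

# Leading frame tag, e.g. "[STEP]", followed by the rest of the line.
_TAG = re.compile(r"^\[(START|STEP|END)\]\s*(.*)$")
# key=value pairs; a value runs until the next " key=" or the end of the line,
# which keeps space-containing values such as "[1, 0, 1]" in one piece.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_log_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a framed log line into its tag and a dict of string fields."""
    match = _TAG.match(line.strip())
    if match is None:
        raise ValueError(f"not a framed log line: {line!r}")
    tag, rest = match.groups()
    return tag, dict(_PAIR.findall(rest))
```

All values stay as strings; converting `reward` to `float` or `done` to `bool` is left to the caller, since fields like `action=null` and `success=null` in the `[START]` line have no single obvious Python type.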
openenv.yaml CHANGED
@@ -2,8 +2,8 @@ spec_version: 1
 name: contextprune
 version: 0.1.0
 type: space
-runtime: python
-app: context_pruning_env.server.app:app
+runtime: python:3.10-slim
+app: server/app.py
 port: 8000
 resources:
   cpu: 2
server/app.py ADDED
@@ -0,0 +1,17 @@
+import os
+import uvicorn
+from openenv.core.env_server.http_server import create_fastapi_app
+from context_pruning_env.env import ContextPruningEnv
+
+# Initialize the Hackathon-compliant environment
+env = ContextPruningEnv()
+
+# Create the standard OpenEnv FastAPI app
+app = create_fastapi_app(env)
+
+def main() -> None:
+    port = int(os.environ.get("PORT", "8000"))
+    uvicorn.run(app, host="0.0.0.0", port=port)
+
+if __name__ == "__main__":
+    main()
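`server/app.py` reads its port from the `PORT` environment variable and falls back to 8000, the same value the Dockerfile sets and exposes. A small standard-library sketch of that lookup, plus a TCP check that something is accepting connections on the port, is given below. Both helper names (`resolve_port`, `is_listening`) are hypothetical and not part of the repository or of OpenEnv.

```python
import os
import socket


def resolve_port(default: int = 8000) -> int:
    """Mirror the PORT lookup in server/app.py: env var first, then default."""
    return int(os.environ.get("PORT", str(default)))


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

After `docker run -p 8000:8000 ...`, calling `is_listening("127.0.0.1", resolve_port())` would confirm only that the port is open, not that the OpenEnv endpoints respond correctly.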