Space: prithic07/context-prune

prithic07 committed on Apr 4

Commit 6cad4bb (parent: 222f8ce)

Meta x Scaler Compliance: Strict logging, Port 8000 sync, and mandatory env vars.

Files changed (15)

Dockerfile +2 -17
README.md +57 -39
app_ui.py +47 -20
context_pruning_env/env.py +56 -21
context_pruning_env/graders.py +31 -31
context_pruning_env/models.py +12 -12
context_pruning_env/server/app.py +6 -4
context_pruning_env/utils.py +30 -43
inference.py +62 -49
openenv.yaml +5 -4
requirements.txt +1 -1
stderr.log +0 -0
stderr_utf8.log +63 -0
stdout.log +0 -0
stdout_utf8.log +3 -0

Dockerfile CHANGED

@@ -7,23 +7,8 @@ ENV PYTHONPATH=/app
 WORKDIR /app
 # Install system dependencies
-RUN apt-get update && apt-get install -y \
-    build-essential \
-    curl \
-    && rm -rf /var/lib/apt/lists/*
-# Optimize for 2 vCPU and 8GB RAM
-# Copy and install Python dependencies separately for layer caching
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
-# Copy all source files
-COPY context_pruning_env ./context_pruning_env
-COPY inference.py .
-COPY openenv.yaml .
-# Expose the standard OpenEnv port
-EXPOSE 7860
-# FastAPI app entrypoint
-CMD ["uvicorn", "context_pruning_env.server.app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2"]

 WORKDIR /app
 # Install system dependencies
+# Install dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
+CMD ["uvicorn", "context_pruning_env.server.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

README.md CHANGED

@@ -1,62 +1,80 @@
-# Adaptive Context Optimization Agent (ContextPrune)
-**ContextPrune** is a specialized Reinforcement Learning (RL) environment for optimizing retrieved context in RAG systems, designed for the **Meta x Scaler Hackathon**.
-> “ContextPrune reduces noise and tokens in RAG pipelines while preserving answer quality.”
 ---
-## 🚀 Hackathon Tasks
-| Task Name | Difficulty | Objective |
-| :--- | :--- | :--- |
-| `noise_filter` | **Easy** | Identify and prune 4 random noise chunks while keeping the gold chunk. |
-| `deduplication` | **Medium** | Recognize duplicate gold chunks and prune exactly one of them. |
-| `sentence_distillation` | **Hard** | Sharp pruning of context to isolate the core sentence containing the answer. |
-## 🛠️ Installation & Setup
-```bash
-# Install dependencies
-pip install -r requirements.txt
-# Set your Gemini API Key
-export GOOGLE_API_KEY=your_key_here
-# Verify the environment and task logic
-pytest test_tasks.py
-```
-## 🧠 Environment API
-The environment strictly follows the **openenv-core** interface (`reset`, `step`, `state`):
-- **Reset**: `env.reset(task_name=...)`
-- **Step**: `env.step(ContextAction(mask=...))`
-- **Observation**: Question + 5 context chunks as strings.
-- **Reward**: Programmatic 0.0–1.0 score based on accuracy and efficiency.
-## 🐳 Docker & OpenEnv
-The project is containerized for deployment on 2 vCPU and 8GB RAM environments:
 ```bash
-# Build the production container
-docker build -t context-prune .
-# Run the OpenEnv server
-docker run -p 7860:7860 context-prune
-```
-## 📜 Standardized Inference
-The [inference.py](file:///d:/Projects/RAG/inference.py) script emits logs with mandatory tags for automated evaluation:
-```text
-<OBSERVATION>{...}</OBSERVATION>
-<ACTION>{...}</ACTION>
-<REWARD>{...}</REWARD>
 ```
 ---
-*Powered by OpenEnv*

+# ContextPrune: Adaptive Context Optimization Environment
+**ContextPrune** is a Meta x Scaler Hackathon compliant reinforcement learning environment designed for Phase 1: Automated Validation. It focuses on the critical task of context pruning for RAG pipelines, reducing noise and token counts while strictly preserving answer faithfulness.
 ---
+## 🌍 Environment Description
+ContextPrune implements the **OpenEnv Spec**, providing a standardized interface for RL agents to optimize retrieved contexts. The environment presents a query and multiple context chunks (from SQuAD or synthetic noise) where the agent must decide which chunks to keep and which to prune using a binary mask.
+### Resource Constraints
+- **vCPU**: 2
+- **RAM**: 8GB
+- **Runtime**: Python 3.10+
+- **Port**: 8000 (OpenEnv Server)
+---
+## 🎮 Action & Observation Spaces
+### Action Space (ContextAction)
+- **Type**: Binary Mask (`List[int]`)
+- **Values**: `1` (Keep), `0` (Prune)
+- **Constraint**: Must match the number of chunks in the current observation.
+### Observation Space (ContextObservation)
+- **question**: The user query to be answered.
+- **chunks**: A list of text strings representing the retrieved context.
+- **initial_token_count**: The total token count before optimization.
+- **current_token_count**: Cumulative tokens of the currently selected chunks.
+- **task_name**: The identifier for the current pruning task.
+---
+## 🏆 Task Descriptions
+| Task ID | Name | Difficulty | Scoring Logic |
+| :--- | :--- | :--- | :--- |
+| **01** | `noise_purge` | **Easy** | 0.0 or 1.0. Perfect score if all noise is deleted and the answer is kept. |
+| **02** | `dedupe_arena` | **Medium** | 1.0 if word count is reduced by >50% while preserving the answer. |
+| **03** | `signal_extract` | **Hard** | $1 - (FinalTokens/InitialTokens)$. Score scales with compression ratio. |
+---
+## 📈 Reward Function (Trajectory Signals)
+The environment emits rewards based on the agent's efficiency and accuracy:
+- **Efficiency**: `+0.1` for every irrelevant chunk or duplicate correctly pruned.
+- **Accuracy**: `+0.7` bonus at the end of the trajectory if the "Gold Chunk" is preserved.
+- **Death Penalty**: `-1.0` and immediate `done=True` if the agent prunes the Gold Chunk (Information Loss).
+---
+## 🛠️ Setup Instructions
+### 1. Local Development
 ```bash
+# Install dependencies
+pip install -r requirements.txt
+# Configure API (Optional for testing)
+echo "GOOGLE_API_KEY=your_key" > .env
+# Run Inference Evaluation
+python inference.py
+```
+### 2. Docker Deployment
+```bash
+# Build the standardized image
+docker build -t contextprune .
+# Start the environment server
+docker run -p 8000:8000 contextprune
 ```
+### 3. Inference Logging
+Mandatory logs are emitted in the following format for the Hackathon Evaluator:
+`task=<name> env=contextprune model=<model> step=<n> action=<str> reward=<0.00> done=<bool> score=<score> rewards=<r1,r2...>`
 ---
+*Built for the Meta x Scaler Hackathon 2026*
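The mandatory evaluator line in the new README's Inference Logging section is a flat `key=value` record. The snippet below is a minimal sketch of a formatter for it. The helper name `format_log_line`, the compact `[1,0]` serialization of the action, two-decimal floats for `reward`, `score` and `rewards`, and lowercase booleans are all assumptions; the README only gives placeholders.

```python
from typing import List


def format_log_line(task: str, model: str, step: int, action: List[int],
                    reward: float, done: bool, score: float,
                    rewards: List[float]) -> str:
    """Build one evaluator log line following the README template.

    Assumed (not confirmed by the commit): action serialized without spaces,
    floats with two decimals, booleans lowercase.
    """
    action_str = "[" + ",".join(str(m) for m in action) + "]"
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    return (
        f"task={task} env=contextprune model={model} step={step} "
        f"action={action_str} reward={reward:.2f} done={str(done).lower()} "
        f"score={score:.2f} rewards={rewards_str}"
    )
```

Because no value contains a space (assuming model names don't either), a line can be parsed back with `dict(kv.split("=", 1) for kv in line.split())`.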

app_ui.py CHANGED

@@ -30,16 +30,20 @@ async def call_gemini(prompt: str, model_name: str = "gemini-1.5-flash") -> str:
     except Exception as e:
         return f"ERROR: {str(e)}"
-def chunk_text(text: str, max_chunks: int = 5) -> List[str]:
     """Split text into manageable chunks (paragraphs or sentences)."""
-    # Split by double newline first
-    chunks = [c.strip() for c in re.split(r'\n\s*\n', text) if c.strip()]
-    if len(chunks) < 2:
-        # Split by sentence if only one paragraph
-        chunks = [c.strip() for c in re.split(r'(?<=[.!?])\s+', text) if c.strip()]
-    # Simple limit to 5-10 chunks for the demo
-    return chunks[:10]
 async def prune_context(query: str, raw_text: str) -> Tuple[str, dict, str]:
     """
@@ -53,26 +57,49 @@ async def prune_context(query: str, raw_text: str) -> Tuple[str, dict, str]:
     # Prompt for selection
     selection_prompt = (
         f"Query: {query}\n\n"
-        "Below are several context chunks. Identify which are RELEVANT and which are NOISE or DUPLICATES. "
-        "Output a JSON list of indices (0-indexed) of the chunks to KEEP.\n"
-        "Example output: [0, 2, 3]\n\n"
         "Chunks:\n"
     )
     for i, c in enumerate(chunks):
         selection_prompt += f"Chunk {i}: {c}\n\n"
     raw_response = await call_gemini(selection_prompt)
-    # Extract indices
-    match = re.search(r"\[([\d\s,]+)\]", raw_response)
-    if match:
-        try:
-            indices = json.loads(f"[{match.group(1)}]")
-            kept_chunks = [chunks[i] for i in indices if i < len(chunks)]
-        except:
-            kept_chunks = chunks # Fallback
     else:
-        kept_chunks = chunks # Fallback
     optimized_text = " ".join(kept_chunks)

     except Exception as e:
         return f"ERROR: {str(e)}"
+def chunk_text(text: str, max_chunks: int = 10) -> List[str]:
     """Split text into manageable chunks (paragraphs or sentences)."""
+    # 1. First split by double newlines (paragraphs)
+    initial_chunks = [c.strip() for c in re.split(r'\n\s*\n', text) if c.strip()]
+    final_chunks = []
+    # 2. If paragraphs are too few or long, split them into sentences
+    for chunk in initial_chunks:
+        # Split by sentence markers [.!?] followed by space or newline
+        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+|\n', chunk) if s.strip()]
+        final_chunks.extend(sentences)
+    # Simple limit to 10 chunks to avoid overwhelming the prompt
+    return final_chunks[:max_chunks]
 async def prune_context(query: str, raw_text: str) -> Tuple[str, dict, str]:
     """
     # Prompt for selection
     selection_prompt = (
         f"Query: {query}\n\n"
+        "TASK: Select indices of context chunks that are directly relevant to the query. "
+        "Remove noise, random facts, and duplicates. "
+        "OUTPUT: Output ONLY the list of indices as a JSON array like [0, 2, 4]. No explanations.\n\n"
         "Chunks:\n"
     )
     for i, c in enumerate(chunks):
         selection_prompt += f"Chunk {i}: {c}\n\n"
     raw_response = await call_gemini(selection_prompt)
+    print(f"DEBUG: Gemini Response: {raw_response}")
+    from context_pruning_env.graders import (
+        grade_noise_purge,
+        grade_dedupe_arena,
+        grade_signal_extract
+    )
+    # Ultra-robust extraction
+    indices = []
+    try:
+        match = re.search(r"\[([\d\s,]+)\]", raw_response)
+        if match:
+            # Found a bracketed list of numbers
+            content = match.group(0) # e.g. "[0, 1, 2]"
+            indices = json.loads(content)
+        else:
+            # Try finding any numbers in the response if no brackets
+            nums = re.findall(r"\d+", raw_response)
+            indices = [int(n) for n in nums]
+        # Clean up: only valid unique indices
+        indices = list(set([int(i) for i in indices if isinstance(i, int) and 0 <= i < len(chunks)]))
+        print(f"DEBUG: Successfully extracted indices: {indices}")
+    except Exception as e:
+        print(f"DEBUG: Extraction Error: {e}")
+        indices = []
+    if indices:
+        kept_chunks = [chunks[i] for i in sorted(indices)]
     else:
+        # Fallback to keep everything if AI fails, but message it
+        print("DEBUG: Pruning failed, keeping original context.")
+        kept_chunks = chunks
     optimized_text = " ".join(kept_chunks)
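The two helpers changed in `app_ui.py` are pure text processing and can be exercised without calling Gemini. The following is a standalone sketch of both, not the committed code: `chunk_text` follows the committed paragraph-then-sentence split, and `extract_indices` (a hypothetical name) follows the committed bracket-first, bare-digits-second fallback. One deliberate deviation is that the sketch reads the numbers inside the brackets with `re.findall` instead of `json.loads`, so a reply such as `[1, 2,]` still parses. In the committed version that reply matches the regex, makes `json.loads` raise, and silently falls back to keeping every chunk.

```python
import re
from typing import List


def chunk_text(text: str, max_chunks: int = 10) -> List[str]:
    """Split text into paragraphs, then each paragraph into sentences or lines."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    for para in paragraphs:
        # A sentence ends at . ! or ? followed by whitespace; a bare newline also splits.
        chunks.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+|\n", para) if s.strip())
    return chunks[:max_chunks]


def extract_indices(raw_response: str, n_chunks: int) -> List[int]:
    """Return sorted, unique, in-range chunk indices found in a model reply."""
    match = re.search(r"\[([\d\s,]+)\]", raw_response)
    # Prefer the first bracketed list of numbers; otherwise use every integer in the reply.
    source = match.group(1) if match else raw_response
    nums = (int(n) for n in re.findall(r"\d+", source))
    return sorted({i for i in nums if 0 <= i < n_chunks})
```

An empty result from `extract_indices` corresponds to the case where `prune_context` falls back to the original context.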

context_pruning_env/env.py CHANGED

@@ -12,9 +12,9 @@ from context_pruning_env.models import (
 )
 from context_pruning_env.utils import SQuADLoader, count_tokens
 from context_pruning_env.graders import (
-    grade_noise_filter,
-    grade_deduplication,
-    grade_sentence_distillation
 )
 class ContextPruningEnv(Environment[ContextAction, ContextObservation, PruningState]):
@@ -31,13 +31,14 @@ class ContextPruningEnv(Environment[ContextAction, ContextObservation, PruningSt
         self,
         seed: Optional[int] = None,
         episode_id: Optional[str] = None,
-        task_name: Optional[str] = "noise_filter",
         **kwargs: Any,
     ) -> ContextObservation:
         """
         Starts a new episode with the specified task.
         """
-        question, chunks_data = self.loader.get_episode(task_name or "noise_filter")
         chunks = []
         total_tokens = 0
@@ -53,7 +54,7 @@ class ContextPruningEnv(Environment[ContextAction, ContextObservation, PruningSt
         self._state = PruningState(
             episode_id=episode_id or str(uuid4()),
-            task_name=task_name or "noise_filter",
             question=question,
             chunks=chunks,
             initial_tokens=total_tokens,
@@ -69,7 +70,8 @@ class ContextPruningEnv(Environment[ContextAction, ContextObservation, PruningSt
             done=self._state.done,
             question=self._state.question,
             chunks=[c.content for c in self._state.chunks],
-            token_count=sum(c.tokens for c in self._state.chunks),
             task_name=self._state.task_name,
             message=message
         )
@@ -80,34 +82,67 @@ class ContextPruningEnv(Environment[ContextAction, ContextObservation, PruningSt
         **kwargs: Any,
     ) -> ContextObservation:
         """
-        Takes a binary mask and grades according to task rules.
         """
         if self._state.done:
             return self._observe(message="Episode is already done.")
         mask = action.mask
-        # 1. Select Grader
-        if self._state.task_name == "noise_filter":
-            reward_obj = grade_noise_filter(mask, self._state.chunks)
-        elif self._state.task_name == "deduplication":
-            reward_obj = grade_deduplication(mask, self._state.chunks)
-        elif self._state.task_name == "sentence_distillation":
-            reward_obj = grade_sentence_distillation(mask, self._state.chunks)
         else:
-            reward_obj = grade_noise_filter(mask, self._state.chunks)
         self._state.done = True
         self._state.step_count += 1
-        # Signal reward in observation
-        obs = self._observe(message=reward_obj.message)
-        obs.reward = reward_obj.score
-        # Add detailed reward to metadata for hackathon transparency
         if not obs.metadata:
             obs.metadata = {}
-        obs.metadata["reward_detail"] = reward_obj.model_dump()
         return obs

 )
 from context_pruning_env.utils import SQuADLoader, count_tokens
 from context_pruning_env.graders import (
+    grade_noise_purge,
+    grade_dedupe_arena,
+    grade_signal_extract
 )
 class ContextPruningEnv(Environment[ContextAction, ContextObservation, PruningState]):
         self,
         seed: Optional[int] = None,
         episode_id: Optional[str] = None,
+        task_name: Optional[str] = "noise_purge",
         **kwargs: Any,
     ) -> ContextObservation:
         """
         Starts a new episode with the specified task.
         """
+        task_name = task_name or "noise_purge"
+        question, chunks_data = self.loader.get_episode(task_name)
         chunks = []
         total_tokens = 0
         self._state = PruningState(
             episode_id=episode_id or str(uuid4()),
+            task_name=task_name,
             question=question,
             chunks=chunks,
             initial_tokens=total_tokens,
             done=self._state.done,
             question=self._state.question,
             chunks=[c.content for c in self._state.chunks],
+            initial_token_count=self._state.initial_tokens,
+            current_token_count=sum(c.tokens for c in self._state.chunks),
             task_name=self._state.task_name,
             message=message
         )
         **kwargs: Any,
     ) -> ContextObservation:
         """
+        Takes a binary mask and calculates rewards based on trajectory signals.
         """
         if self._state.done:
             return self._observe(message="Episode is already done.")
         mask = action.mask
+        if len(mask) != len(self._state.chunks):
+            # Pad or truncate mask to match chunk count if agent is misaligned
+            mask = (mask + [1] * len(self._state.chunks))[:len(self._state.chunks)]
+        # Trajectory Simulation Logic
+        total_reward = 0.0
+        efficiency_reward = 0.0
+        accuracy_reward = 0.0
+        gold_penalty = 0.0
+        success = True
+        for i, kept in enumerate(mask):
+            chunk = self._state.chunks[i]
+            if not kept: # Pruned
+                if chunk.is_gold:
+                    # Critical Failure
+                    gold_penalty = -1.0
+                    success = False
+                    break # Immediate stop
+                else:
+                    # Correctly pruned noise/duplicate
+                    efficiency_reward += 0.1
+            else: # Kept
+                pass
+        # Final Accuracy Bonus
+        if success:
+            accuracy_reward = 0.7
+        total_reward = efficiency_reward + accuracy_reward + gold_penalty
+        # Task Score (Normalized 0.0 to 1.0 for the evaluator)
+        if self._state.task_name == "noise_purge":
+            score_obj = grade_noise_purge(mask, self._state.chunks)
+        elif self._state.task_name == "dedupe_arena":
+            score_obj = grade_dedupe_arena(mask, self._state.chunks)
+        elif self._state.task_name == "signal_extract":
+            score_obj = grade_signal_extract(mask, self._state.chunks)
         else:
+            score_obj = grade_noise_purge(mask, self._state.chunks)
         self._state.done = True
         self._state.step_count += 1
+        obs = self._observe(message=score_obj.message)
+        obs.reward = total_reward # Trajectory reward
         if not obs.metadata:
             obs.metadata = {}
+        obs.metadata["eval_score"] = score_obj.score # Grader score
+        obs.metadata["reward_detail"] = {
+            "efficiency": efficiency_reward,
+            "accuracy": accuracy_reward,
+            "penalty": gold_penalty
+        }
         return obs
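The trajectory reward in the new `step()` can be checked outside the environment. The sketch below mirrors the committed loop under the simplifying assumption that each chunk is reduced to its `is_gold` flag; the function name `trajectory_reward` is illustrative. Working it through shows three properties of the code as committed that the README's reward section does not spell out. Pruning the gold chunk does not yield exactly `-1.0`, because efficiency credit earned on earlier chunks is kept: `[noise, gold]` with mask `[0, 0]` scores `0.1 - 1.0 = -0.9`. Efficiency is uncapped, so a `signal_extract` episode with all 15 noise chunks pruned scores `1.5 + 0.7 = 2.2`. Finally, `is_duplicate` is never consulted, and in `dedupe_arena` every chunk is flagged `is_gold=True`, so pruning a duplicate triggers the gold penalty even though the `dedupe_arena` grader rewards exactly that.

```python
from typing import Dict, List


def trajectory_reward(mask: List[int], is_gold: List[bool]) -> Dict[str, float]:
    """Mirror of the committed step() reward loop (a sketch, not the env itself)."""
    n = len(is_gold)
    # Misaligned masks are padded with 1 (keep) or truncated, as in step().
    mask = (list(mask) + [1] * n)[:n]
    efficiency, accuracy, penalty = 0.0, 0.0, 0.0
    success = True
    for kept, gold in zip(mask, is_gold):
        if kept:
            continue
        if gold:
            penalty = -1.0  # gold chunk pruned: information loss
            success = False
            break           # chunks after this one are not scored
        efficiency += 0.1   # non-gold chunk pruned
    if success:
        accuracy = 0.7
    return {
        "efficiency": efficiency,
        "accuracy": accuracy,
        "penalty": penalty,
        "total": efficiency + accuracy + penalty,
    }
```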

context_pruning_env/graders.py CHANGED

@@ -1,56 +1,56 @@
 from typing import List
 from context_pruning_env.models import ChunkItem, ContextReward
-def grade_noise_filter(mask: List[int], chunks: List[ChunkItem]) -> ContextReward:
     """
-    Score: 1.0 if gold kept AND noise pruned.
     """
     gold_kept = any(mask[i] == 1 and chunks[i].is_gold for i in range(len(mask)))
     noise_pruned = all(mask[i] == 0 for i in range(len(mask)) if not chunks[i].is_gold)
     if not gold_kept:
-        return ContextReward(score=0.0, penalty=-1.0, message="Critical Failure: Gold chunk lost.")
     if noise_pruned:
-        return ContextReward(score=1.0, accuracy_bonus=0.5, efficiency_bonus=0.5, message="Perfect: Gold kept and all noise pruned.")
     else:
-        return ContextReward(score=0.5, accuracy_bonus=0.5, message="Partial: Gold kept but some noise remains.")
-def grade_deduplication(mask: List[int], chunks: List[ChunkItem]) -> ContextReward:
     """
-    Score: 1.0 if EXACTLY 1 gold kept AND 0 noise kept.
     """
-    gold_indices = [i for i, c in enumerate(chunks) if c.is_gold]
-    noise_indices = [i for i, c in enumerate(chunks) if not c.is_gold]
-    kept_gold_count = sum(1 for i in gold_indices if mask[i] == 1)
-    kept_noise_count = sum(1 for i in noise_indices if mask[i] == 1)
-    if kept_gold_count == 0:
-        return ContextReward(score=0.0, message="Critical Failure: All gold chunks lost.")
-    if kept_gold_count == 1 and kept_noise_count == 0:
-        return ContextReward(score=1.0, message="Perfect: Exactly 1 gold kept and 0 noise.")
-    elif kept_gold_count > 1:
-        return ContextReward(score=0.4, message="Partial: Duplicates detected.")
     else:
-        return ContextReward(score=0.5, message="Partial: Gold kept but noise remains.")
-def grade_sentence_distillation(mask: List[int], chunks: List[ChunkItem]) -> ContextReward:
     """
-    Score: 1.0 if gold kept AND at least 3 noise sentences pruned.
     """
-    gold_index = next(i for i, c in enumerate(chunks) if c.is_gold)
-    gold_kept = (mask[gold_index] == 1)
-    noise_indices = [i for i, c in enumerate(chunks) if not c.is_gold]
-    pruned_noise_count = sum(1 for i in noise_indices if mask[i] == 0)
     if not gold_kept:
-        return ContextReward(score=0.0, message="Critical Failure: Summary is missing the answer.")
-    if pruned_noise_count >= 3:
-        return ContextReward(score=1.0, message="Perfect: Sharp distillation achieved.")
-    else:
-        efficiency = pruned_noise_count / len(noise_indices) if noise_indices else 1.0
-        return ContextReward(score=0.2 + 0.5 * efficiency, message="Partial: Distillation incomplete.")

 from typing import List
 from context_pruning_env.models import ChunkItem, ContextReward
+def grade_noise_purge(mask: List[int], chunks: List[ChunkItem]) -> ContextReward:
     """
+    Easy Task: Score 1.0 if gold kept AND noise pruned.
     """
     gold_kept = any(mask[i] == 1 and chunks[i].is_gold for i in range(len(mask)))
     noise_pruned = all(mask[i] == 0 for i in range(len(mask)) if not chunks[i].is_gold)
     if not gold_kept:
+        return ContextReward(score=0.0, gold_penalty=-1.0, message="Critical: Gold chunk lost.")
     if noise_pruned:
+        return ContextReward(score=1.0, message="Perfect: All noise purged.")
     else:
+        return ContextReward(score=0.5, message="Partial: Gold kept but noise remains.")
+def grade_dedupe_arena(mask: List[int], chunks: List[ChunkItem]) -> ContextReward:
     """
+    Medium Task: 1.0 if word count reduced > 50% AND gold kept.
     """
+    initial_words = sum(len(c.content.split()) for c in chunks)
+    final_words = sum(len(chunks[i].content.split()) for i, kept in enumerate(mask) if kept)
+    gold_kept = any(mask[i] == 1 and chunks[i].is_gold for i in range(len(mask)))
+    reduction = 1.0 - (final_words / initial_words) if initial_words > 0 else 1.0
+    if not gold_kept:
+        return ContextReward(score=0.0, message="Critical: Answer lost during deduplication.")
+    if reduction >= 0.5:
+        return ContextReward(score=1.0, message=f"Great: {reduction:.1%} word reduction achieved.")
     else:
+        return ContextReward(score=0.5, message=f"Partial: Only {reduction:.1%} reduction.")
+def grade_signal_extract(mask: List[int], chunks: List[ChunkItem]) -> ContextReward:
     """
+    Hard Task: 1 - (FinalTokens/InitialTokens) if gold kept.
     """
+    initial_tokens = sum(c.tokens for c in chunks)
+    final_tokens = sum(chunks[i].tokens for i, kept in enumerate(mask) if kept)
+    gold_kept = any(mask[i] == 1 and chunks[i].is_gold for i in range(len(mask)))
     if not gold_kept:
+        return ContextReward(score=0.0, message="Critical: Signal lost in noise.")
+    reduction_score = 1.0 - (final_tokens / initial_tokens) if initial_tokens > 0 else 0.0
+    # Ensure score is at least positive if gold is kept
+    final_score = max(0.1, reduction_score)
+    return ContextReward(
+        score=final_score,
+        message=f"Signal Extracted: {reduction_score:.1%} compression."
+    )
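The three graders produce the normalized score that `step()` stores as `obs.metadata["eval_score"]`. The sketch below restates them and returns only the score. The `Chunk` dataclass is an assumed stand-in for `ChunkItem` so the example runs without `openenv` or Pydantic. It also assumes the mask is already aligned with the chunks, which `step()` guarantees by padding or truncating first. Two details are easy to miss in the committed code. `grade_dedupe_arena` passes at a reduction of exactly 50% (`>=`), while the README says "> 50%". `grade_signal_extract` floors any gold-preserving answer at 0.1, so keeping everything scores 0.1 rather than 0.0. Because every `dedupe_arena` chunk is flagged gold, keeping any single one of the three satisfies that grader's gold check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Stand-in for ChunkItem: only the fields the graders read."""
    content: str
    is_gold: bool = False
    tokens: int = 0


def _gold_kept(mask: List[int], chunks: List[Chunk]) -> bool:
    return any(m == 1 and c.is_gold for m, c in zip(mask, chunks))


def noise_purge_score(mask: List[int], chunks: List[Chunk]) -> float:
    if not _gold_kept(mask, chunks):
        return 0.0
    noise_pruned = all(m == 0 for m, c in zip(mask, chunks) if not c.is_gold)
    return 1.0 if noise_pruned else 0.5


def dedupe_arena_score(mask: List[int], chunks: List[Chunk]) -> float:
    initial = sum(len(c.content.split()) for c in chunks)
    final = sum(len(c.content.split()) for m, c in zip(mask, chunks) if m)
    reduction = 1.0 - final / initial if initial > 0 else 1.0
    if not _gold_kept(mask, chunks):
        return 0.0
    return 1.0 if reduction >= 0.5 else 0.5  # exactly 50% passes


def signal_extract_score(mask: List[int], chunks: List[Chunk]) -> float:
    if not _gold_kept(mask, chunks):
        return 0.0
    initial = sum(c.tokens for c in chunks)
    final = sum(c.tokens for m, c in zip(mask, chunks) if m)
    reduction = 1.0 - final / initial if initial > 0 else 0.0
    return max(0.1, reduction)  # floor for any gold-preserving answer
```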

context_pruning_env/models.py CHANGED

@@ -5,13 +5,12 @@ from openenv.core.env_server.types import Action, Observation, State
 class ContextAction(Action):
     """
-    Action space: A binary mask of 5 values (1 = keep, 0 = prune).
     """
     mask: List[int] = Field(
         ...,
-        min_items=5,
-        max_items=5,
-        description="Binary mask of 5 integers (0 or 1) indicating which chunks to keep."
     )
 class ContextObservation(Observation):
@@ -19,31 +18,32 @@ class ContextObservation(Observation):
     Observation provided to the agent.
     """
     question: str
-    chunks: List[str] = Field(default_factory=list, description="List of 5 context chunks.")
-    token_count: int = 0
     task_name: str = ""
     message: str = ""
 class ContextReward(BaseModel):
     """
-    Detailed reward breakdown.
     """
     score: float = Field(0.0, ge=0.0, le=1.0, description="Overall task score (0 to 1).")
-    accuracy_bonus: float = 0.0
-    efficiency_bonus: float = 0.0
-    penalty: float = 0.0
     message: str = ""
 class ChunkItem(BaseModel):
     """Internal representation of a context chunk."""
     content: str
     is_gold: bool = False
-    is_duplicate: bool = False
     tokens: int = 0
 class PruningState(State):
     """
-    Internal state of the environment.
     """
     task_name: str
     question: str

 class ContextAction(Action):
     """
+    Action space: A binary mask of N values (1 = keep, 0 = prune).
     """
     mask: List[int] = Field(
         ...,
+        min_items=1,
+        description="Binary mask of integers (0 or 1) indicating which chunks to keep."
     )
 class ContextObservation(Observation):
     Observation provided to the agent.
     """
     question: str
+    chunks: List[str] = Field(default_factory=list, description="Current context chunks.")
+    initial_token_count: int = 0
+    current_token_count: int = 0
     task_name: str = ""
     message: str = ""
 class ContextReward(BaseModel):
     """
+    Detailed reward breakdown for Meta x Scaler audit.
     """
     score: float = Field(0.0, ge=0.0, le=1.0, description="Overall task score (0 to 1).")
+    efficiency_reward: float = 0.0
+    accuracy_reward: float = 0.0
+    gold_penalty: float = 0.0
     message: str = ""
 class ChunkItem(BaseModel):
     """Internal representation of a context chunk."""
     content: str
     is_gold: bool = False
     tokens: int = 0
+    is_duplicate: bool = False
 class PruningState(State):
     """
+    Internal state for ContextPrune.
     """
     task_name: str
     question: str
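The new `ContextAction` only enforces `min_items=1`. It does not restrict entries to 0 or 1, and `step()` treats any truthy value as "keep". In Pydantic v2 the list-length constraint is spelled `min_length`; `min_items` is the deprecated v1 name. A stricter agent-side check can be sketched without Pydantic. `validate_mask` is a hypothetical helper, not part of the commit, and by design it rejects length mismatches instead of padding them as `step()` does.

```python
from typing import List, Sequence


def validate_mask(mask: Sequence[int], n_chunks: int) -> List[int]:
    """Reject masks that step() would otherwise silently reinterpret."""
    values = list(mask)
    if not values:
        raise ValueError("mask must contain at least one entry")
    if len(values) != n_chunks:
        raise ValueError(f"mask has {len(values)} entries but there are {n_chunks} chunks")
    # type(v) is int excludes bools and floats; only the literal ints 0 and 1 pass.
    invalid = [v for v in values if type(v) is not int or v not in (0, 1)]
    if invalid:
        raise ValueError(f"mask entries must be 0 or 1, got {invalid}")
    return values
```

Calling it before constructing `ContextAction(mask=...)` surfaces misaligned model output as an error rather than letting the padding turn it into "keep".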

context_pruning_env/server/app.py CHANGED

@@ -1,18 +1,20 @@
 import os
 from openenv.core.env_server.http_server import create_fastapi_app
 from context_pruning_env.env import ContextPruningEnv
-from context_pruning_env.models import PruningAction, PruningObservation
 app = create_fastapi_app(
     ContextPruningEnv,
-    PruningAction,
-    PruningObservation,
 )
 def main() -> None:
     import uvicorn
-    port = int(os.environ.get("PORT", "7860"))
     uvicorn.run(app, host="0.0.0.0", port=port)
 if __name__ == "__main__":
     main()

 import os
 from openenv.core.env_server.http_server import create_fastapi_app
 from context_pruning_env.env import ContextPruningEnv
+from context_pruning_env.models import ContextAction, ContextObservation
 app = create_fastapi_app(
     ContextPruningEnv,
+    ContextAction,
+    ContextObservation,
 )
 def main() -> None:
     import uvicorn
+    port = int(os.environ.get("PORT", "8000"))
     uvicorn.run(app, host="0.0.0.0", port=port)
 if __name__ == "__main__":
     main()
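The port now defaults to 8000 in both places, but the two are wired differently. The Dockerfile `CMD` passes `--port 8000` to uvicorn directly, while `main()` reads `PORT`, so setting `PORT` on the container has no effect unless the server is started through `main()`. A defensive version of the lookup is sketched below. `resolve_port` is a hypothetical helper that takes the environment mapping as an argument so it can be tested without touching `os.environ`. Unlike the committed `int(os.environ.get("PORT", "8000"))`, it treats an empty `PORT` as unset instead of raising.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8000  # matches the committed default in main() and the Dockerfile


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Read PORT from the given mapping (os.environ by default), falling back to 8000."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric values such as "abc"
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```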

context_pruning_env/utils.py CHANGED

@@ -35,57 +35,44 @@ class SQuADLoader:
         chunks = []
-        if task_name == "noise_filter":
-            # 1 Gold, 4 Noise from random entries
             chunks.append({"content": gold_context, "is_gold": True, "is_duplicate": False})
-            for _ in range(4):
-                _, noise_entry = self._get_next_entry()
-                chunks.append({"content": noise_entry["context"], "is_gold": False, "is_duplicate": False})
-        elif task_name == "deduplication":
-            # 2 Gold (Identical), 3 Noise
             chunks.append({"content": gold_context, "is_gold": True, "is_duplicate": False})
-            chunks.append({"content": gold_context, "is_gold": True, "is_duplicate": True})
-            for _ in range(3):
-                _, noise_entry = self._get_next_entry()
-                chunks.append({"content": noise_entry["context"], "is_gold": False, "is_duplicate": False})
-        elif task_name == "sentence_distillation":
-            # Split gold context into sentences. Take the one with the answer.
-            # Fill remaining slots with other sentences from the same context.
-            sentences = re.split(r'(?<=[.!?])\s+', gold_context)
-            answer_text = entry["answers"]["text"][0]
-            gold_sentence = None
-            other_sentences = []
-            for s in sentences:
-                if answer_text.lower() in s.lower() and gold_sentence is None:
-                    gold_sentence = s
-                else:
-                    other_sentences.append(s)
-            if gold_sentence is None:
-                # Fallback if answer spans multiple sentences or is not found cleanly
-                gold_sentence = sentences[0]
-                other_sentences = sentences[1:]
-            chunks.append({"content": gold_sentence, "is_gold": True, "is_duplicate": False})
-            # Sample 4 more or fill with random if not enough sentences
-            random.shuffle(other_sentences)
-            for i in range(4):
-                if i < len(other_sentences):
-                    chunks.append({"content": other_sentences[i], "is_gold": False, "is_duplicate": False})
-                else:
-                    _, noise_entry = self._get_next_entry()
-                    chunks.append({"content": noise_entry["context"][:100], "is_gold": False, "is_duplicate": False})
         else:
-            # Default to noise_filter
-            return self.get_episode("noise_filter")
-        # Shuffle all tasks
-        random.shuffle(chunks)
         return question, chunks
 def count_tokens(text: str) -> int:

         chunks = []
+        if task_name == "noise_purge":
+            # Easy: 1 Gold + 1 Irrelevant
             chunks.append({"content": gold_context, "is_gold": True, "is_duplicate": False})
+            _, noise_entry = self._get_next_entry()
+            chunks.append({"content": noise_entry["context"], "is_gold": False, "is_duplicate": False})
+        elif task_name == "dedupe_arena":
+            # Medium: 1 Gold + 2 Near-Duplicates (Simulated by repeating gold)
             chunks.append({"content": gold_context, "is_gold": True, "is_duplicate": False})
+            # Duplicate 1: slightly modified or identical
+            chunks.append({"content": gold_context + " ", "is_gold": True, "is_duplicate": True})
+            # Duplicate 2: slightly modified
+            chunks.append({"content": "Actually, " + gold_context, "is_gold": True, "is_duplicate": True})
+        elif task_name == "signal_extract":
+            # Hard: 1 Long context (2,000+ words)
+            # We simulate this by taking 10 random SQuAD contexts and joining them.
+            # Only one contains the answer.
+            long_context_parts = []
+            long_context_parts.append(gold_context)
+            for _ in range(15): # ~15 chunks of ~150 words = ~2250 words
+                _, noise_entry = self._get_next_entry()
+                long_context_parts.append(noise_entry["context"])
+            # Shuffling the parts so the gold one isn't first
+            random.shuffle(long_context_parts)
+            for part in long_context_parts:
+                is_gold = (part == gold_context)
+                chunks.append({"content": part, "is_gold": is_gold, "is_duplicate": False})
         else:
+            # Default to noise_purge
+            return self.get_episode("noise_purge")
+        # Shuffle chunks for non-signal tasks
+        if task_name != "signal_extract":
+            random.shuffle(chunks)
         return question, chunks
 def count_tokens(text: str) -> int:
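The rewritten `get_episode` changes the episode shapes. `noise_purge` now has 2 chunks instead of 5. `dedupe_arena` has 3 chunks that are all flagged `is_gold=True`; the two duplicates differ from the original only by a trailing space and an `"Actually, "` prefix. `signal_extract` has 16 chunks: the gold context plus 15 noise contexts, although the inline comment mentions 10. In `signal_extract`, gold is recovered with `part == gold_context`. SQuAD pairs many questions with the same context paragraph, so a noise entry that happens to share the gold passage could also be flagged gold. The sketch below builds the same layouts from plain strings and marks gold by where a chunk came from rather than by its text. `build_episode` and its `noise_contexts` argument are hypothetical stand-ins for the loader's `_get_next_entry()` calls.

```python
import random
from typing import Dict, List, Sequence


def build_episode(task_name: str, gold_context: str,
                  noise_contexts: Sequence[str],
                  rng: random.Random) -> List[Dict[str, object]]:
    """Rebuild the committed chunk layouts, marking gold by origin, not by text."""
    if task_name == "dedupe_arena":
        variants = [gold_context, gold_context + " ", "Actually, " + gold_context]
        chunks = [{"content": v, "is_gold": True, "is_duplicate": i > 0}
                  for i, v in enumerate(variants)]
    else:
        # noise_purge uses 1 noise context, signal_extract 15; unknown tasks fall back to noise_purge.
        n_noise = 15 if task_name == "signal_extract" else 1
        if len(noise_contexts) < n_noise:
            raise ValueError(f"{task_name} needs {n_noise} noise contexts")
        chunks = [{"content": gold_context, "is_gold": True, "is_duplicate": False}]
        chunks += [{"content": c, "is_gold": False, "is_duplicate": False}
                   for c in noise_contexts[:n_noise]]
    rng.shuffle(chunks)  # every task ends up shuffled, as in the commit
    return chunks
```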

inference.py CHANGED

@@ -2,78 +2,91 @@ import os
 import json
 import logging
 import re
-import google.generativeai as genai
 from dotenv import load_dotenv
 from context_pruning_env.env import ContextPruningEnv
-# Load API keys from .env
-load_dotenv()
 from context_pruning_env.models import ContextAction
-# Setup simple logging
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger(__name__)
-# Configure Gemini
-GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY", "")
-if GOOGLE_API_KEY:
-    genai.configure(api_key=GOOGLE_API_KEY)
-def main():
-    if not GOOGLE_API_KEY:
-        logger.error("GOOGLE_API_KEY not found in environment or .env file.")
         return
-    # 1. Setup Gemini Model
-    model = genai.GenerativeModel("gemini-1.5-flash")
-    # 2. Initialize Environment
-    env = ContextPruningEnv(squad_split="train")
-    # Run a few episodes across different tasks
-    tasks = ["noise_filter", "deduplication", "sentence_distillation"]
-    for task_name in tasks:
-        logger.info(f"--- Running Task: {task_name} ---")
-        # 3. Reset (Observation)
-        obs = env.reset(task_name=task_name)
-        print(f"<OBSERVATION>{obs.model_dump_json()}</OBSERVATION>")
-        # 4. Agent Logic (Gemini Call)
         prompt = (
             f"Question: {obs.question}\n\n"
-            "Below are 5 context chunks. Output ONLY a JSON list of 5 integers (0 or 1) "
-            "where 1 means 'keep' and 0 means 'prune'. "
-            "Prioritize keeping the answer while removing noise and duplicates.\n"
-            f"Chunks: {json.dumps(obs.chunks, indent=2)}\n\n"
-            "Action format: [1, 0, 1, 1, 0]"
         )
         try:
-            response = model.generate_content(prompt)
-            completion = response.text
-            # Simple extraction of the mask [x,x,x,x,x]
-            match = re.search(r"\[\s*([01])\s*,\s*([01])\s*,\s*([01])\s*,\s*([01])\s*,\s*([01])\s*\]", completion)
             if match:
-                mask = [int(m) for m in match.groups()]
             else:
-                logger.warning(f"Failed to parse mask from Gemini output: {completion}. Falling back to [1,1,1,1,1]")
-                mask = [1, 1, 1, 1, 1]
         except Exception as e:
-            logger.error(f"Gemini Inference failed: {e}")
-            mask = [1, 1, 1, 1, 1]
-        # 5. Take Action
         action = ContextAction(mask=mask)
-        print(f"<ACTION>{action.model_dump_json()}</ACTION>")
-        # 6. Step (Reward)
         final_obs = env.step(action)
-        print(f"<REWARD>{json.dumps({'score': final_obs.reward, 'message': final_obs.message})}</REWARD>")
-        logger.info(f"Task {task_name} Result: {final_obs.message} (Score: {final_obs.reward})")
 if __name__ == "__main__":
-    main()

 import json
 import logging
 import re
+from typing import List, Optional
+from openai import OpenAI
 from dotenv import load_dotenv
 from context_pruning_env.env import ContextPruningEnv
 from context_pruning_env.models import ContextAction
+# Setup mandatory log format for Meta x Scaler Evaluator
+logging.basicConfig(level=logging.INFO, format='%(message)s')
+logger = logging.getLogger("evaluator")
+load_dotenv()
+# Mandatory Environment Variables
+API_BASE_URL = os.environ.get("API_BASE_URL", "https://generativelanguage.googleapis.com/v1beta/openai/")
+MODEL_NAME = os.environ.get("MODEL_NAME", "gemini-1.5-flash")
+HF_TOKEN = os.environ.get("HF_TOKEN", os.environ.get("GOOGLE_API_KEY", ""))
+def run_inference():
+    if not HF_TOKEN:
+        print("ERROR: HF_TOKEN (or GOOGLE_API_KEY) not found.")
         return
+    client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL)
+    env = ContextPruningEnv()
+    tasks = ["noise_purge", "dedupe_arena", "signal_extract"]
+    for task in tasks:
+        # [START] tag for automated evaluation
+        print(f"[START] task={task} env=contextprune model={MODEL_NAME}")
+        obs = env.reset(task_name=task)
+        step_n = 1
         prompt = (
+            f"Task: {task}\n"
             f"Question: {obs.question}\n\n"
+            "Chunks:\n"
         )
+        for i, c in enumerate(obs.chunks):
+            prompt += f"[{i}]: {c}\n"
+        prompt += "\nOutput ONLY a JSON list of indices (0 or 1) for each chunk. Example: [1, 0, 1]"
         try:
+            response = client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=[{"role": "user", "content": prompt}],
+                temperature=0.0
+            )
+            content = response.choices[0].message.content
+            match = re.search(r"\[([\d\s,]+)\]", content)
             if match:
+                mask = json.loads(match.group(0))
             else:
+                mask = [1] * len(obs.chunks)
         except Exception as e:
+            logger.error(f"Inference Error: {e}")
+            mask = [1] * len(obs.chunks)
+        # Execute Action
         action = ContextAction(mask=mask)
         final_obs = env.step(action)
+        # [STEP] tag for each action in the trajectory
+        step_log = (
+            f"[STEP] task={task} "
+            f"step={step_n} "
+            f"action={json.dumps(mask)} "
+            f"reward={final_obs.reward:.2f} "
+            f"done={str(final_obs.done).lower()}"
+        )
+        print(step_log)
+        # [END] tag for episode completion
+        score = final_obs.metadata.get('eval_score', 0)
+        success = score > 0.5
+        end_log = (
+            f"[END] task={task} "
+            f"score={score:.2f} "
+            f"success={str(success).lower()} "
+            f"rewards={final_obs.reward:.2f}"
+        )
+        print(end_log)
 if __name__ == "__main__":
+    run_inference()
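The new `run_inference` takes the first match of `\[([\d\s,]+)\]` and passes it to `json.loads` unchecked. A completion that echoes a chunk label such as `[0]` before its answer, or that returns a list of the wrong length or with values other than 0 and 1, would therefore reach `ContextAction` as-is. The prompt also asks for "indices (0 or 1)" when the environment expects one keep/prune flag per chunk. The sketch below tightens both steps. `build_prompt` and `parse_mask` are hypothetical helpers, and chunks are assumed to be plain strings, as the `f"[{i}]: {c}"` line above suggests.

```python
import json
import re
from typing import List, Sequence


def build_prompt(task: str, question: str, chunks: Sequence[str]) -> str:
    """Number each chunk and ask for exactly one keep/prune flag per chunk."""
    lines = [f"Task: {task}", f"Question: {question}", "", "Chunks:"]
    lines += [f"[{i}]: {chunk}" for i, chunk in enumerate(chunks)]
    lines += [
        "",
        f"Output ONLY a JSON list of exactly {len(chunks)} integers, "
        "1 to keep the chunk and 0 to prune it. Example: [1, 0, 1]",
    ]
    return "\n".join(lines)


def parse_mask(completion: str, n_chunks: int) -> List[int]:
    """Return the first bracketed 0/1 list of the right length, else keep everything."""
    # Scan every bracketed run of digits, so an echoed label such as "[0]" is skipped.
    for match in re.finditer(r"\[[\d\s,]*\]", completion):
        try:
            candidate = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # e.g. "[1,,0]" or "[01]" is not valid JSON
        if len(candidate) == n_chunks and all(v in (0, 1) for v in candidate):
            return candidate
    # Same fallback as the commit: keep every chunk.
    return [1] * n_chunks
```

Inside the task loop, `mask = parse_mask(content, len(obs.chunks))` could stand in for the `re.search` and `json.loads` pair, leaving the `[1] * len(obs.chunks)` fallback in the `except` branch unchanged.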

openenv.yaml CHANGED

@@ -1,12 +1,13 @@
 spec_version: 1
-name: ContextPrune
 type: space
-runtime: fastapi
 app: context_pruning_env.server.app:app
-port: 7860
 resources:
   cpu: 2
   memory: 8Gi
   storage: 10Gi
 timeout: 300
-description: "Adaptive Context Optimization Agent (ContextPrune): Reduces noise and tokens in RAG pipelines while preserving answer quality."

 spec_version: 1
+name: contextprune
+version: 0.1.0
 type: space
+runtime: python
 app: context_pruning_env.server.app:app
+port: 8000
 resources:
   cpu: 2
   memory: 8Gi
   storage: 10Gi
 timeout: 300
+description: "ContextPrune: Adaptive Context Optimization Environment (Meta x Scaler Round 1 Compliance)."
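This commit moves the service from port 7860 to 8000 in both `openenv.yaml` and the Dockerfile `CMD` (the commit message calls this "Port 8000 sync"). A small consistency check could catch the two drifting apart again. The sketch below is hypothetical: `yaml_top_level_port` and `dockerfile_cmd_port` are illustrative names, and both use regular expressions rather than a real YAML or Dockerfile parser, so they only recognise a top-level `port:` key and an exec-form `CMD` containing `"--port", "N"`.

```python
import re
from typing import Optional


def yaml_top_level_port(yaml_text: str) -> Optional[int]:
    """Read an unindented `port:` key with a regex (not a full YAML parser)."""
    match = re.search(r"^port:\s*(\d+)\s*$", yaml_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def dockerfile_cmd_port(dockerfile_text: str) -> Optional[int]:
    """Read the value that follows "--port" in an exec-form CMD line."""
    match = re.search(r'"--port",\s*"(\d+)"', dockerfile_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    yaml_text = "spec_version: 1\nname: contextprune\nport: 8000\n"
    dockerfile = (
        'CMD ["uvicorn", "context_pruning_env.server.app:app", '
        '"--host", "0.0.0.0", "--port", "8000", "--workers", "2"]'
    )
    assert yaml_top_level_port(yaml_text) == dockerfile_cmd_port(dockerfile) == 8000
    print("ports agree")
```

In a test suite, the two strings would come from reading `openenv.yaml` and `Dockerfile` from the repository root.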

requirements.txt CHANGED

@@ -11,4 +11,4 @@ python-dotenv>=1.0.0
 pytest>=7.4.0
 gradio>=4.0.0
 google-generativeai>=0.3.0
-python-dotenv>=1.0.0

 pytest>=7.4.0
 gradio>=4.0.0
 google-generativeai>=0.3.0
+openai>=1.0.0

stderr.log ADDED

Binary file (3.9 kB).

stderr_utf8.log ADDED

	@@ -0,0 +1,63 @@

+python : D:\Projects\RAG\context_pruning_env\models.py:10: PydanticDeprecatedSince20: `min_items` is deprecated and will be removed, use `min_length` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
+At line:1 char:1
++ python inference.py 2> stderr.log
++ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    + CategoryInfo          : NotSpecified: (D:\Projects\RAG...2.12/migration/:String) [], RemoteException
+    + FullyQualifiedErrorId : NativeCommandError
+  mask: List[int] = Field(
+HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 404 Not Found"
+Inference Error: Error code: 404 - [{'error': {'code': 404, 'message': 'models/gemini-1.5-flash is not found for API version v1main, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}]
+HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 404 Not Found"
+Inference Error: Error code: 404 - [{'error': {'code': 404, 'message': 'models/gemini-1.5-flash is not found for API version v1main, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}]
+HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 404 Not Found"
+Inference Error: Error code: 404 - [{'error': {'code': 404, 'message': 'models/gemini-1.5-flash is not found for API version v1main, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}]
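All three tasks fail with the same 404 because the default `MODEL_NAME`, `gemini-1.5-flash`, is reported as not found, and the error text itself suggests listing the available models. Below is a minimal pre-flight sketch. It assumes the endpoint at `API_BASE_URL` supports the OpenAI `models.list` call, and that returned IDs may carry a `models/` prefix, as the error message's `models/gemini-1.5-flash` hints. `normalize_model_id`, `pick_model` and `check_configured_model` are hypothetical names. The `openai` import is deferred so the pure helpers can be exercised without the package or network access.

```python
import os
from typing import Iterable, List


def normalize_model_id(model_id: str) -> str:
    """Drop a leading "models/" so "models/x" and "x" compare equal."""
    prefix = "models/"
    return model_id[len(prefix):] if model_id.startswith(prefix) else model_id


def pick_model(preferred: str, available: Iterable[str]) -> str:
    """Return `preferred` if the endpoint serves it; otherwise fail with the options."""
    served: List[str] = sorted(normalize_model_id(m) for m in available)
    if normalize_model_id(preferred) in served:
        return preferred
    raise ValueError(f"model {preferred!r} is not served; available: {served}")


def check_configured_model() -> str:
    """Pre-flight check reading the same environment variables as inference.py."""
    from openai import OpenAI  # lazy import: the helpers above need no dependencies

    client = OpenAI(
        api_key=os.environ.get("HF_TOKEN", os.environ.get("GOOGLE_API_KEY", "")),
        base_url=os.environ.get(
            "API_BASE_URL", "https://generativelanguage.googleapis.com/v1beta/openai/"
        ),
    )
    # Iterating the returned page yields model objects with an `.id` attribute.
    available = [model.id for model in client.models.list()]
    return pick_model(os.environ.get("MODEL_NAME", "gemini-1.5-flash"), available)
```

Calling `check_configured_model()` once before the task loop would surface a misnamed model as a single clear error. Without it, each task silently falls back to a keep-all mask, which appears to be what the all-ones `action=` values in `stdout_utf8.log` reflect.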

stdout.log ADDED

Binary file (1.05 kB).

stdout_utf8.log ADDED

	@@ -0,0 +1,3 @@

+task=noise_purge env=contextprune model=gemini-1.5-flash step=1 action=[1, 1] reward=0.70 done=true error=null success=false steps=1 score=0.50 rewards=0.70
+task=dedupe_arena env=contextprune model=gemini-1.5-flash step=1 action=[1, 1, 1] reward=0.70 done=true error=null success=false steps=1 score=0.50 rewards=0.70
+task=signal_extract env=contextprune model=gemini-1.5-flash step=1 action=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] reward=0.70 done=true error=null success=false steps=1 score=0.10 rewards=0.70
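The logged lines are space-separated `key=value` pairs, except that `action` holds a bracketed list containing spaces. They also include `error` and `steps` fields that the `run_inference` in this commit does not print, and they lack the `[START]`/`[STEP]`/`[END]` prefixes the code emits, so they were presumably produced by a slightly different version. The sketch below parses either form into raw string fields. `parse_log_line` is a hypothetical helper, not part of any evaluator API. Note that `success=false` at `score=0.50` is consistent with the strict `score > 0.5` threshold in `run_inference`.

```python
import re
from typing import Dict

# Optional leading tag such as "[START]", "[STEP]" or "[END]".
_TAG = re.compile(r"^\[(START|STEP|END)\]")
# key=value, where the value is a bracketed list (may contain spaces) or a non-space run.
_FIELD = re.compile(r"(\w+)=(\[[^\]]*\]|\S+)")


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one evaluator log line into raw string fields."""
    fields = dict(_FIELD.findall(line))
    tag = _TAG.match(line)
    if tag:
        fields["tag"] = tag.group(1)
    return fields


if __name__ == "__main__":
    line = (
        "task=noise_purge env=contextprune model=gemini-1.5-flash step=1 "
        "action=[1, 1] reward=0.70 done=true error=null success=false "
        "steps=1 score=0.50 rewards=0.70"
    )
    fields = parse_log_line(line)
    print(fields["action"], fields["success"], float(fields["score"]) > 0.5)
    # [1, 1] false False
```

Values stay as strings, so callers decide how to convert them, e.g. `json.loads(fields["action"])` for the mask or `float(fields["reward"])` for the reward.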