DecentSanage committed (verified)
Commit 0e2d163 · Parent(s): c2629fe

Upload folder using huggingface_hub
Dockerfile CHANGED
@@ -65,12 +65,17 @@ COPY --from=builder /app/env/.venv /app/.venv
 # Copy the environment code
 COPY --from=builder /app/env /app/env
 
+# Copy README to /app/README.md where the OpenEnv web UI looks for it
+COPY --from=builder /app/env/README.md /app/README.md
+
 # Set PATH to use the virtual environment
 ENV PATH="/app/.venv/bin:$PATH"
 
 # Set PYTHONPATH so imports work correctly
 ENV PYTHONPATH="/app/env:$PYTHONPATH"
 
+# Tell the web UI where to find the README (belt-and-suspenders)
+ENV ENV_README_PATH="/app/README.md"
 ENV ENABLE_WEB_INTERFACE='true'
 
 # HF Spaces uses port 7860 by default; override with PORT env var for local use
README.md CHANGED
@@ -12,7 +12,6 @@ base_path: /web
 
 # Constraint Environment
 
-
 This is the environment for training LLMs to learn a specific DSL made for timetable scheduling, so that the model can output constraints directly from natural language. Why is this needed? Timetable generation is usually an NP-hard problem; for humans it could take weeks to produce a conflict-free timetable. To solve this, tools were created that generate timetables in reasonable time, one example being CP-SAT: users write hardcoded constraints and the solver generates a timetable that satisfies them. But what happens when you want to add a new constraint? You have to change the code directly. What if you could define constraints in natural language and the solver understood them automatically? That is what this project attempts. An LLM may not be good at scheduling a timetable with dozens of constraints, but it is good at understanding natural language. For the specific purpose of defining constraints for university timetables, a DSL was created whose specification is as follows:
 
 ```
@@ -67,6 +66,41 @@ number ::= digit { digit }
 
 The model outputs a JSON document in the above format, which can be converted directly into CP-SAT constraints.
 
+## Action and Observation
+
+The dataset used for training is in this format:
+
+```python
+{
+    "prompt": (
+        "No classes should be scheduled on Saturday."
+    ),
+    "target_ast": {
+        "type": "hard",
+        "name": "no_saturday_classes",
+        "forall": [
+            {"var": "b", "domain": "branches"},
+            {"var": "sub", "domain": "subjects"},
+            {"var": "d", "domain": "days"},
+            {"var": "s", "domain": "slots"},
+        ],
+        "where": "d == 5",
+        "assert": "schedule(b, sub, d, s) == 0",
+    },
+}
+```
+
+The environment serves the prompt to the model, the model guesses the target_ast, and the reward reflects how close the guess is:
+
+| Output             | Reward |
+| ------------------ | ------ |
+| Valid JSON         | 0.125  |
+| Correct structure  | 0.250  |
+| Same as the target | 0.625  |
+
 ## Development & Testing
 
 ### Direct Environment Testing
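The tiered rewards in the table above can be sketched as a small grading function. This is a hypothetical illustration only: the helper name `tiered_reward` and the key-based structure check are assumptions, not the environment's actual grading code (which lives in `server/constraint_env_environment.py`).

```python
import json

def tiered_reward(output: str, target_ast: dict) -> float:
    """Illustrative tiered grading: valid JSON, then structure, then exact match."""
    try:
        guess = json.loads(output)          # tier 1: parseable JSON -> 0.125
    except json.JSONDecodeError:
        return 0.0
    reward = 0.125
    # Structure check is an assumption, based on the keys in the sample AST above.
    required = {"type", "name", "forall", "where", "assert"}
    if isinstance(guess, dict) and required <= guess.keys():
        reward += 0.250                     # tier 2: correct structure
        if guess == target_ast:
            reward += 0.625                 # tier 3: identical to the target
    return reward
```

The tiers are cumulative, so an exact match earns 0.125 + 0.250 + 0.625 under this reading of the table.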
inference.py CHANGED
@@ -210,7 +210,8 @@ def _run_task(task_id: str, env_url: str = "http://localhost:8000") -> None:
         step_result = await env.step(action)
 
         step_count = 1
-        reward = float(step_result.reward or 0.0)
+        raw_reward = float(step_result.reward or 0.0)
+        reward = max(0.01, min(0.99, raw_reward))  # clamp [0.01, 0.99]
         done = bool(step_result.done)
         last_error = step_result.observation.info.get("error") if step_result.observation else None
         rewards.append(reward)
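The clamping expression added above is easy to sanity-check in isolation. A minimal sketch (`clamp_reward` is a local name for illustration; the diff uses the expression inline):

```python
def clamp_reward(raw: float) -> float:
    """Keep rewards strictly inside (0, 1): floor at 0.01, cap at 0.99."""
    # Same expression as in the diff: max(0.01, min(0.99, raw_reward))
    return max(0.01, min(0.99, raw))

# A negative penalty floors at 0.01, a perfect 1.0 caps at 0.99,
# and values already inside the band pass through unchanged.
```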
openenv.yaml CHANGED
@@ -13,12 +13,15 @@ tasks:
   - id: easy
     description: Single quantifier, direct assert, no WHERE clause
     difficulty: easy
+    grader: constraint_env.server.constraint_env_environment:ConstraintEnvironment
   - id: medium
     description: Two quantifiers with a WHERE filter clause and combined assert
     difficulty: medium
+    grader: constraint_env.server.constraint_env_environment:ConstraintEnvironment
   - id: hard
     description: Multiple quantifiers, nested WHERE with AND/OR, minimize objective
     difficulty: hard
+    grader: constraint_env.server.constraint_env_environment:ConstraintEnvironment
 tags:
   - openenv
   - scheduling
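The `grader` values use a `module.path:ClassName` entry-point string. How the OpenEnv runtime actually loads these is not shown in this commit; a resolver for this common format typically looks like the following sketch (`resolve_entry_point` is a hypothetical name):

```python
import importlib

def resolve_entry_point(spec: str):
    """Resolve a 'package.module:Attr' entry-point string to the object itself."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)  # import the module part
    return getattr(module, attr)                   # then fetch the attribute
```

For example, `resolve_entry_point("json:JSONDecoder")` returns the `JSONDecoder` class from the standard library.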
server/constraint_env_environment.py CHANGED
@@ -59,6 +59,11 @@ _PENALTY_BAD_STRUCTURE = -0.250
 _PENALTY_INVALID_JSON = -0.250
 
 
+def _clamp_reward(raw: float) -> float:
+    """Clamp reward to [0.01, 0.99]; the autograder rejects exact 0.00 / 1.00."""
+    return round(max(0.01, min(0.99, float(raw))), 4)
+
+
 # ---------------------------------------------------------------------------
 # Environment
 # ---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ class ConstraintEnvironment(Environment):
         return ConstraintObservation(
             prompt=self._current_sample["prompt"],
             done=True,
-            reward=round(reward, 4),
+            reward=_clamp_reward(reward),
             info={**info, "error": "invalid_json"},
         )
 
@@ -157,7 +162,7 @@ class ConstraintEnvironment(Environment):
         return ConstraintObservation(
             prompt=self._current_sample["prompt"],
             done=True,
-            reward=round(reward, 4),
+            reward=_clamp_reward(reward),
             info=info,
         )
168