Commit: Upload folder using huggingface_hub

Files changed:
- Dockerfile +5 -0
- README.md +35 -1
- inference.py +2 -1
- openenv.yaml +3 -0
- server/constraint_env_environment.py +7 -2
Dockerfile

```diff
@@ -65,12 +65,17 @@ COPY --from=builder /app/env/.venv /app/.venv
 # Copy the environment code
 COPY --from=builder /app/env /app/env
 
+# Copy README to /app/README.md where the OpenEnv web UI looks for it
+COPY --from=builder /app/env/README.md /app/README.md
+
 # Set PATH to use the virtual environment
 ENV PATH="/app/.venv/bin:$PATH"
 
 # Set PYTHONPATH so imports work correctly
 ENV PYTHONPATH="/app/env:$PYTHONPATH"
 
+# Tell the web UI where to find the README (belt-and-suspenders)
+ENV ENV_README_PATH="/app/README.md"
 ENV ENABLE_WEB_INTERFACE='true'
 
 # HF Spaces uses port 7860 by default; override with PORT env var for local use
```
README.md

````diff
@@ -12,7 +12,6 @@ base_path: /web
 
 # Constraint Environment
 
-
 This is the environment for training LLMs to learn a DSL made for timetable scheduling, so that a model can output constraints directly from natural language. Why is this needed? Timetable generation is an NP-hard problem; for a human it can take weeks to produce a conflict-free timetable. Tools were therefore created to generate one in reasonable time, CP-SAT being one example: users write hardcoded constraints and the solver produces a timetable that satisfies them. But what happens when you want to add new constraints? You have to change the code directly. What if you could define constraints in natural language and have the solver understand them automatically? That is what this project attempts. An LLM may not be good at scheduling timetables with dozens of constraints, but it is good at understanding natural language. For the specific purpose of defining constraints for university timetables, a DSL was created with the following specification:
 
 ```
@@ -67,6 +66,41 @@ number ::= digit { digit }
 
 The model outputs JSON that follows the above format, which can be converted directly into CP-SAT constraints.
 
+## Action and Observation
+
+The training dataset is in this format:
+
+```python
+{
+    "prompt": (
+        "No classes should be scheduled on Saturday."
+    ),
+    "target_ast": {
+        "type": "hard",
+        "name": "no_saturday_classes",
+        "forall": [
+            {"var": "b", "domain": "branches"},
+            {"var": "sub", "domain": "subjects"},
+            {"var": "d", "domain": "days"},
+            {"var": "s", "domain": "slots"},
+        ],
+        "where": "d == 5",
+        "assert": "schedule(b, sub, d, s) == 0",
+    },
+}
+```
+
+
+The model gets the prompt from the environment and guesses the target_ast; it is rewarded based on how close the guess is.
+
+
+| Output             | Reward |
+| ------------------ | ------ |
+| Valid JSON         | 0.125  |
+| Correct structure  | 0.250  |
+| Same as the target | 0.625  |
+
+
 ## Development & Testing
 
 ### Direct Environment Testing
````
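The tiered rewards in the README's table compose additively: 0.125 + 0.250 + 0.625 = 1.0 for a perfect guess. A minimal sketch of such a scorer (the function name and required-key set here are hypothetical; the actual logic lives in server/constraint_env_environment.py):

```python
import json

# Hypothetical set of required top-level keys for the "correct structure" tier.
REQUIRED_KEYS = {"type", "name", "forall", "assert"}

def score_guess(raw_output: str, target_ast: dict) -> float:
    """Score a model guess with the tiered scheme from the README table."""
    try:
        guess = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0               # not even valid JSON
    reward = 0.125               # tier 1: valid JSON
    if isinstance(guess, dict) and REQUIRED_KEYS <= guess.keys():
        reward += 0.250          # tier 2: correct structure
        if guess == target_ast:
            reward += 0.625      # tier 3: exact match with the target
    return reward
```

Each tier subsumes the previous one, so a structurally correct guess always outscores a merely parseable one.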
|
inference.py

```diff
@@ -210,7 +210,8 @@ def _run_task(task_id: str, env_url: str = "http://localhost:8000") -> None:
         step_result = await env.step(action)
 
         step_count = 1
-
+        raw_reward = float(step_result.reward or 0.0)
+        reward = max(0.01, min(0.99, raw_reward))  # clamp [0.01, 0.99]
         done = bool(step_result.done)
         last_error = step_result.observation.info.get("error") if step_result.observation else None
         rewards.append(reward)
```
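The two lines added to inference.py coalesce a missing reward to 0.0 and then clamp it into [0.01, 0.99]. Extracted as a standalone function for illustration (the function name is mine, not from the diff):

```python
from typing import Optional

def clamp_step_reward(raw: Optional[float]) -> float:
    # `raw or 0.0` mirrors `step_result.reward or 0.0`: None (and 0.0)
    # both coalesce to 0.0 before clamping.
    reward = float(raw or 0.0)
    return max(0.01, min(0.99, reward))  # clamp to [0.01, 0.99]
```

Note the side effect of `or`: a legitimate reward of exactly 0.0 is treated the same as a missing one, which is harmless here since both end up at the 0.01 floor anyway.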
|
openenv.yaml

```diff
@@ -13,12 +13,15 @@ tasks:
   - id: easy
     description: Single quantifier, direct assert, no WHERE clause
     difficulty: easy
+    grader: constraint_env.server.constraint_env_environment:ConstraintEnvironment
   - id: medium
     description: Two quantifiers with a WHERE filter clause and combined assert
     difficulty: medium
+    grader: constraint_env.server.constraint_env_environment:ConstraintEnvironment
   - id: hard
     description: Multiple quantifiers, nested WHERE with AND/OR, minimize objective
     difficulty: hard
+    grader: constraint_env.server.constraint_env_environment:ConstraintEnvironment
 tags:
   - openenv
   - scheduling
```
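Each grader value uses the common `package.module:Attribute` addressing scheme. How OpenEnv resolves it is not shown in this diff, but such strings are typically loaded like this (an assumption, sketched for illustration):

```python
import importlib

def resolve_grader(path: str):
    """Resolve a "pkg.module:ClassName" string to the named attribute."""
    # Split on the first colon: module path on the left, attribute on the right.
    module_name, _, attr = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

For example, `resolve_grader("constraint_env.server.constraint_env_environment:ConstraintEnvironment")` would import the module and return the ConstraintEnvironment class.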
|
server/constraint_env_environment.py

```diff
@@ -59,6 +59,11 @@ _PENALTY_BAD_STRUCTURE = -0.250
 _PENALTY_INVALID_JSON = -0.250
 
 
+def _clamp_reward(raw: float) -> float:
+    """Clamp reward to [0.01, 0.99] — autograder rejects exact 0.00 / 1.00."""
+    return round(max(0.01, min(0.99, float(raw))), 4)
+
+
 # ---------------------------------------------------------------------------
 # Environment
 # ---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ class ConstraintEnvironment(Environment):
         return ConstraintObservation(
             prompt=self._current_sample["prompt"],
             done=True,
-            reward=reward,
+            reward=_clamp_reward(reward),
             info={**info, "error": "invalid_json"},
         )
 
@@ -157,7 +162,7 @@ class ConstraintEnvironment(Environment):
         return ConstraintObservation(
             prompt=self._current_sample["prompt"],
             done=True,
-            reward=reward,
+            reward=_clamp_reward(reward),
             info=info,
         )
 
```