XcodeAddy committed 20 days ago

Commit db820a9 (1 parent: ed29027)

Fix HF Jobs GRPO runtime stack

Files changed (6)

README.md +6 -1
docs/TRAINING_RUNBOOK.md +52 -3
pyproject.toml +6 -4
requirements-train.txt +10 -10
training/colab_notebook.ipynb +1 -1
training/launch_hf_job.py +26 -4

README.md CHANGED

@@ -79,9 +79,14 @@ Deployment contract: run one server worker for the submitted Space. Active `Sent
 ## Live Submission Targets
 - GitHub: `https://github.com/ADITYAGABA1322/sentinel-env`
-- Hugging Face Space: `https://xcodeaddy-sentinel-env.hf.space`
 - OpenEnv base URL: `https://xcodeaddy-sentinel-env.hf.space`
 ## Specialist Behaviors
 | Public Slot | Hidden Behavior |

 ## Live Submission Targets
 - GitHub: `https://github.com/ADITYAGABA1322/sentinel-env`
+- Hugging Face Space repo/settings: `https://huggingface.co/spaces/XcodeAddy/sentinel-env`
+- Hugging Face live app: `https://xcodeaddy-sentinel-env.hf.space`
 - OpenEnv base URL: `https://xcodeaddy-sentinel-env.hf.space`
+Local note: run uvicorn with `--host 0.0.0.0`, but open the app in a browser at
+`http://127.0.0.1:7860/` or `http://localhost:7860/`. `0.0.0.0` is a bind
+address, not the page URL to demo.
 ## Specialist Behaviors
 | Public Slot | Hidden Behavior |
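
Review note on the README hunk above: the bind-address versus browse-URL distinction can be sketched in a few lines of Python. This is only an illustration, not code from the repository; the helper name `browse_url` is hypothetical, and the default port 7860 is taken from the local note in the diff.

```python
def browse_url(bind_host: str, port: int = 7860) -> str:
    """Map a server bind address to a URL a browser can open (hypothetical helper)."""
    # 0.0.0.0 (IPv4) and :: (IPv6) mean "listen on every interface".
    # They are not destinations, so point the browser at loopback instead.
    host = "127.0.0.1" if bind_host in {"0.0.0.0", "::", ""} else bind_host
    # Literal IPv6 addresses must be bracketed inside a URL.
    if ":" in host:
        host = f"[{host}]"
    return f"http://{host}:{port}/"


if __name__ == "__main__":
    print(browse_url("0.0.0.0"))    # wildcard bind address
    print(browse_url("localhost"))  # already browsable, passed through
```

The wildcard address is rewritten to loopback, while any host that is already browsable is returned unchanged.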

docs/TRAINING_RUNBOOK.md CHANGED

@@ -148,6 +148,29 @@ Use a Hugging Face token in Colab for:
 The Space itself does not need GPU to run the replay demo.
 ## Hugging Face Credits
 Best use:
@@ -155,10 +178,36 @@ Best use:
 - keep the Space on CPU for normal judging,
 - optionally upgrade the Space to T4 only during the final live demo if the UI
   needs extra responsiveness,
-- avoid doing full training inside the Space.
-Training belongs in Colab. The Space is for serving the environment and replay
-demo.
 ## Success Criteria

 The Space itself does not need GPU to run the replay demo.
+## Hugging Face App URLs
+Use these two Hugging Face URLs for different jobs:
+```text
+https://huggingface.co/spaces/XcodeAddy/sentinel-env
+```
+This is the Space repository/settings page. Use it to inspect files, Settings,
+hardware, build logs, variables, secrets, and commits. It is not the iframe app
+URL you demo to judges.
+```text
+https://xcodeaddy-sentinel-env.hf.space/
+```
+This is the real live app URL. Use this for the dashboard, API smoke tests, and
+OpenEnv base URL.
+When running locally, start uvicorn with `--host 0.0.0.0`, but open the browser
+at `http://127.0.0.1:7860/` or `http://localhost:7860/`. Do not browse to
+`http://0.0.0.0:7860/`; `0.0.0.0` is only a bind address.
 ## Hugging Face Credits
 Best use:
 - keep the Space on CPU for normal judging,
 - optionally upgrade the Space to T4 only during the final live demo if the UI
   needs extra responsiveness,
+- avoid doing full training inside the Space,
+- use Hugging Face Jobs or Colab for the actual GRPO run.
+The Space is for serving the environment and replay demo. Training belongs in
+Colab or in a Hugging Face GPU Job.
+HF Jobs smoke path:
+```bash
+.venv/bin/python training/launch_hf_job.py \
+  --mode import-smoke \
+  --timeout 45m
+.venv/bin/python training/launch_hf_job.py \
+  --mode train-smoke \
+  --episodes 50 \
+  --timeout 2h
+```
+If `import-smoke` passes, run the full job:
+```bash
+.venv/bin/python training/launch_hf_job.py \
+  --mode train-full \
+  --episodes 200 \
+  --timeout 4h
+```
+The launcher uses `pytorch/pytorch:2.11.0-cuda12.8-cudnn9-devel` because the
+current Unsloth stack pulls `torchao`, which expects torch `>=2.11`.
 ## Success Criteria
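
Review note on the runbook hunk above: the stated constraint (torch `>=2.11` for the current Unsloth/`torchao` stack) could be checked before launching a long job. The sketch below is an assumption-laden illustration, not part of the commit: `parse_major_minor` and `torch_is_new_enough` are hypothetical names, and the minimum of (2, 11) is copied from the runbook text rather than from `torchao` itself. It takes a version string such as `torch.__version__` so it can run without torch installed.

```python
import re

# Minimum taken from the runbook note in this diff; treat it as an assumption.
MIN_TORCH = (2, 11)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.11.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_is_new_enough(version: str, minimum: tuple[int, int] = MIN_TORCH) -> bool:
    """Compare as integer tuples so that 2.11 sorts after 2.6."""
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    for candidate in ("2.6.0+cu124", "2.11.0+cu128"):
        print(candidate, torch_is_new_enough(candidate))
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would rank `"2.6"` above `"2.11"`.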

pyproject.toml CHANGED

@@ -18,10 +18,12 @@ server = "server.app:main"
 [project.optional-dependencies]
 dev = ["pytest>=8.0.0"]
 training = [
-  "trl",
-  "transformers",
-  "datasets",
-  "accelerate",
   "unsloth",
 ]

 [project.optional-dependencies]
 dev = ["pytest>=8.0.0"]
 training = [
+  "trl==0.24.0",
+  "transformers==4.57.6",
+  "datasets==4.3.0",
+  "accelerate==1.13.0",
+  "peft==0.19.1",
+  "bitsandbytes==0.49.2",
   "unsloth",
 ]

requirements-train.txt CHANGED

@@ -1,11 +1,11 @@
 unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git
-trl>=0.18.2,<0.25,!=0.19.0
-transformers>=4.56,<5
-datasets>=3.0,<5
-accelerate>=1.4
-peft>=0.14
-bitsandbytes>=0.45
-matplotlib
-seaborn
-pandas
-huggingface_hub

 unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git
+trl==0.24.0
+transformers==4.57.6
+datasets==4.3.0
+accelerate==1.13.0
+peft==0.19.1
+bitsandbytes==0.49.2
+matplotlib==3.10.9
+seaborn==0.13.2
+pandas==3.0.2
+huggingface_hub>=0.36,<1
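
Review note: this commit repeats the same pins in `pyproject.toml`, `requirements-train.txt`, and the Colab notebook, so they can drift apart over time. A minimal sketch of a consistency check follows. The helper names `parse_pins` and `pin_mismatches` are hypothetical, and the two lists are copied by hand from the hunks in this diff rather than read from the files.

```python
def parse_pins(lines):
    """Collect exact pins (name==version); skip ranges, VCS URLs, and comments."""
    pins = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def pin_mismatches(a, b):
    """Return {name: (version_in_a, version_in_b)} for packages pinned differently."""
    return {n: (a[n], b[n]) for n in a.keys() & b.keys() if a[n] != b[n]}


# Copied from the diff above (assumption: no other pins exist in either file).
PYPROJECT_TRAINING = [
    "trl==0.24.0", "transformers==4.57.6", "datasets==4.3.0",
    "accelerate==1.13.0", "peft==0.19.1", "bitsandbytes==0.49.2", "unsloth",
]
REQUIREMENTS_TRAIN = [
    "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git",
    "trl==0.24.0", "transformers==4.57.6", "datasets==4.3.0",
    "accelerate==1.13.0", "peft==0.19.1", "bitsandbytes==0.49.2",
    "matplotlib==3.10.9", "seaborn==0.13.2", "pandas==3.0.2",
    "huggingface_hub>=0.36,<1",
]

mismatches = pin_mismatches(parse_pins(PYPROJECT_TRAINING), parse_pins(REQUIREMENTS_TRAIN))
print(mismatches or "pins agree")
```

Only packages pinned with `==` in both lists are compared; entries such as `unsloth` and `huggingface_hub>=0.36,<1` are ignored by design.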

training/colab_notebook.ipynb CHANGED

@@ -74,7 +74,7 @@
     "        \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\",\n",
     "    ])\n",
     "    subprocess.check_call([\"pip\", \"install\", \"-q\", \"--no-deps\",\n",
-    "        \"trl<0.13\", \"transformers>=4.46\", \"datasets\", \"accelerate\", \"peft\", \"bitsandbytes\",\n",
     "    ])\n",
     "except subprocess.CalledProcessError as exc:\n",
     "    print(f\"Training extras failed to install ({exc}); continuing with heuristic-fallback path.\")\n",

     "        \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\",\n",
     "    ])\n",
     "    subprocess.check_call([\"pip\", \"install\", \"-q\", \"--no-deps\",\n",
+    "        \"trl==0.24.0\", \"transformers==4.57.6\", \"datasets==4.3.0\", \"accelerate==1.13.0\", \"peft==0.19.1\", \"bitsandbytes==0.49.2\",\n",
     "    ])\n",
     "except subprocess.CalledProcessError as exc:\n",
     "    print(f\"Training extras failed to install ({exc}); continuing with heuristic-fallback path.\")\n",

training/launch_hf_job.py CHANGED

@@ -9,7 +9,9 @@ from textwrap import dedent
 from huggingface_hub import run_job
-DEFAULT_IMAGE = "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel"
 DEFAULT_REPO = "https://github.com/ADITYAGABA1322/sentinel-env"
 DEFAULT_MODEL = "unsloth/Qwen2.5-0.5B-Instruct"
@@ -27,6 +29,14 @@ def bootstrap_repo(repo_url: str) -> list[str]:
         "python -m pip install --upgrade pip",
         "pip install -r requirements.txt",
         "pip install -r requirements-train.txt",
     ]
@@ -34,8 +44,11 @@ def gpu_test_command() -> str:
     return "python -c 'import torch; print(torch.cuda.get_device_name())'"
-def train_command(args: argparse.Namespace) -> str:
     lines = bootstrap_repo(args.repo_url)
     lines.append(
         " ".join(
             [
@@ -93,7 +106,11 @@ def parse_args() -> argparse.Namespace:
     parser = argparse.ArgumentParser(
         description="Launch SENTINEL training on Hugging Face Jobs without shell quoting pain."
     )
-    parser.add_argument("--mode", choices=["gpu-test", "train-smoke", "train-full"], default="gpu-test")
     parser.add_argument("--namespace", default=os.environ.get("HF_NAMESPACE", "XcodeAddy"))
     parser.add_argument("--flavor", default="a10g-small")
     parser.add_argument("--timeout", default="2h")
@@ -130,7 +147,12 @@ def main() -> None:
             ).strip()
         )
-    command = gpu_test_command() if args.mode == "gpu-test" else train_command(args)
     print("Launching HF Job:")
     print(f"  mode      = {args.mode}")
     print(f"  namespace = {args.namespace}")

 from huggingface_hub import run_job
+# Current Unsloth pulls torchao, which expects torch >= 2.11. Keep the Jobs
+# image aligned so GRPO imports fail fast only for real code issues.
+DEFAULT_IMAGE = "pytorch/pytorch:2.11.0-cuda12.8-cudnn9-devel"
 DEFAULT_REPO = "https://github.com/ADITYAGABA1322/sentinel-env"
 DEFAULT_MODEL = "unsloth/Qwen2.5-0.5B-Instruct"
         "python -m pip install --upgrade pip",
         "pip install -r requirements.txt",
         "pip install -r requirements-train.txt",
+        (
+            "python -c \"import torch; "
+            "print('torch', torch.__version__); "
+            "print('gpu', torch.cuda.get_device_name() if torch.cuda.is_available() else 'none'); "
+            "from transformers import PreTrainedModel; "
+            "from trl import GRPOConfig, GRPOTrainer; "
+            "print('training imports ok')\""
+        ),
     ]
     return "python -c 'import torch; print(torch.cuda.get_device_name())'"
+def train_command(args: argparse.Namespace, train: bool = True) -> str:
     lines = bootstrap_repo(args.repo_url)
+    if not train:
+        return shell_join(lines)
     lines.append(
         " ".join(
             [
     parser = argparse.ArgumentParser(
         description="Launch SENTINEL training on Hugging Face Jobs without shell quoting pain."
     )
+    parser.add_argument(
+        "--mode",
+        choices=["gpu-test", "import-smoke", "train-smoke", "train-full"],
+        default="gpu-test",
+    )
     parser.add_argument("--namespace", default=os.environ.get("HF_NAMESPACE", "XcodeAddy"))
     parser.add_argument("--flavor", default="a10g-small")
     parser.add_argument("--timeout", default="2h")
             ).strip()
         )
+    if args.mode == "gpu-test":
+        command = gpu_test_command()
+    elif args.mode == "import-smoke":
+        command = train_command(args, train=False)
+    else:
+        command = train_command(args)
     print("Launching HF Job:")
     print(f"  mode      = {args.mode}")
     print(f"  namespace = {args.namespace}")
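
Review note: the new `import-smoke` mode reuses the bootstrap steps and stops before the training step. The standalone sketch below mirrors that dispatch under stated assumptions. `select_command` and `build_parser` are hypothetical names, the placeholder command strings are invented for illustration, and joining steps with `" && "` is an assumption, since the body of the launcher's `shell_join` helper is not shown in this diff.

```python
import argparse

MODES = ("gpu-test", "import-smoke", "train-smoke", "train-full")


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the --mode flag, matching the choices in the diff."""
    parser = argparse.ArgumentParser(description="mode dispatch sketch")
    parser.add_argument("--mode", choices=MODES, default="gpu-test")
    return parser


def select_command(mode, bootstrap, train_step, gpu_test="echo gpu-test"):
    """Pick the shell command for a mode (hypothetical stand-in for the launcher)."""
    if mode == "gpu-test":
        return gpu_test
    steps = list(bootstrap)
    # import-smoke runs the install and import checks only, with no training step.
    if mode != "import-smoke":
        steps.append(train_step)
    # Assumption: steps are chained so that any failure stops the job.
    return " && ".join(steps)


if __name__ == "__main__":
    args = build_parser().parse_args(["--mode", "import-smoke"])
    print(select_command(args.mode, ["pip install -r requirements-train.txt"], "python train.py"))
```

Keeping the bootstrap list shared between the smoke and training modes means an import failure surfaces in the cheap job before any GPU hours go to a full run.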