Update notebook for Colab: clone repo, explain local sim vs server

#2
by sissississi - opened
training/Dockerfile.train CHANGED
@@ -28,10 +28,3 @@ ENV NUM_GENERATIONS="4"
  ENV LR="2e-4"
 
  CMD ["sh", "-c", "python -m training.train_grpo --task $TASK --model $MODEL --max_steps $MAX_STEPS --num_generations $NUM_GENERATIONS --lr $LR"]
- ="triangle"
- ENV MODEL="Qwen/Qwen2.5-3B-Instruct"
- ENV MAX_STEPS="600"
- ENV NUM_GENERATIONS="4"
- ENV LR="2e-4"
-
- CMD ["sh", "-c", "python -m training.train_grpo --task $TASK --model $MODEL --max_steps $MAX_STEPS --num_generations $NUM_GENERATIONS --lr $LR"]
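The retained CMD reads all of its hyperparameters from the ENV defaults above. As a reference for what that CLI surface might look like, here is a minimal sketch in Python; the flag names come from the CMD itself, but wiring the ENV values into argparse defaults is an assumption about `training.train_grpo`, not its actual implementation:

```python
# Hypothetical sketch of the CLI that the Dockerfile CMD invokes.
# Flag names are taken from the CMD above; reading ENV fallbacks via
# os.environ.get is an assumption -- the real script may differ.
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="training.train_grpo")
    # Each flag falls back to the ENV defaults the Dockerfile sets.
    p.add_argument("--task", default=os.environ.get("TASK", "triangle"))
    p.add_argument("--model", default=os.environ.get("MODEL", "Qwen/Qwen2.5-3B-Instruct"))
    p.add_argument("--max_steps", type=int, default=int(os.environ.get("MAX_STEPS", "600")))
    p.add_argument("--num_generations", type=int, default=int(os.environ.get("NUM_GENERATIONS", "4")))
    p.add_argument("--lr", type=float, default=float(os.environ.get("LR", "2e-4")))
    return p

args = build_parser().parse_args([])
print(args.task, args.max_steps, args.lr)
```

This also makes clear why the duplicated ENV/CMD block was safe to delete: Docker only honors the last CMD in a stage, and the ENV values are identical, so the removed lines were pure repetition.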
 
training/train_origami.ipynb CHANGED
@@ -3,19 +3,19 @@
  {
  "cell_type": "markdown",
  "id": "p8uwc5bkc4n",
- "source": "# Origami RL — GRPO Training Notebook\n\nTrain an LLM to generate valid FOLD-format crease patterns that fold into target shapes.\n\n**Pipeline:**\n1. LLM receives a prompt describing a target shape (e.g. \"fold diagonally into a triangle\")\n2. LLM generates a FOLD JSON crease pattern\n3. Physics simulator folds the paper analytically\n4. Reward = shape similarity (chamfer distance) to target × 20\n\n**Reward functions:**\n- `valid_fold`: +1.0 valid FOLD JSON, −0.5 parseable but invalid, −2.0 unparseable\n- `shape_match`: similarity × 20.0 (0–20), −1.0 sim fails, −2.0 invalid FOLD\n\n**Algorithm:** GRPO (Group Relative Policy Optimization) via TRL + Unsloth LoRA",
+ "source": "# Origami RL — GRPO Training Notebook\n\nTrain an LLM to generate valid FOLD-format crease patterns that fold into target shapes.\n\n**Pipeline:**\n1. LLM receives a prompt describing a target shape (e.g. \"fold diagonally into a triangle\")\n2. LLM generates a FOLD JSON crease pattern\n3. Physics simulator folds the paper analytically (runs locally — no server needed)\n4. Reward = shape similarity (chamfer distance) to target × 20\n\n**Why local simulation instead of the OpenEnv server?**\n> The reward function is called thousands of times during GRPO training (every generation, every step). HTTP roundtrips to a remote server would be far too slow. The simulation is pure numpy/scipy and runs in milliseconds locally. The OpenEnv client/server pattern is for inference and evaluation, not training loops.\n\n**Reward functions:**\n- `valid_fold`: +1.0 valid FOLD JSON, −0.5 parseable but invalid, −2.0 unparseable\n- `shape_match`: similarity × 20.0 (0–20), −1.0 sim fails, −2.0 invalid FOLD\n\n**Algorithm:** GRPO (Group Relative Policy Optimization) via TRL + Unsloth LoRA",
  "metadata": {}
  },
  {
  "cell_type": "markdown",
  "id": "xxp4krkl6w",
- "source": "## 1. Install Dependencies",
+ "source": "## 1. Clone Repo & Install Dependencies",
  "metadata": {}
  },
  {
  "cell_type": "code",
  "id": "ulhu8a5p5ti",
- "source": "# Run this cell once to install all dependencies\n# For Colab: unsloth has a specific install process\nimport sys\nIN_COLAB = \"google.colab\" in sys.modules\n\nif IN_COLAB:\n    # Unsloth's recommended Colab install\n    !pip install --no-deps \"unsloth[colab-new]\"\n    !pip install --no-deps trl datasets peft accelerate bitsandbytes xformers\nelse:\n    !pip install -q \"trl>=0.7\" \"datasets>=2.14\" unsloth torch transformers accelerate bitsandbytes\n\n# Core origami env deps (numpy, scipy, pydantic)\n!pip install -q numpy scipy pydantic",
+ "source": "import os, sys\n\n# Clone the origami_env repo (contains the simulator + reward functions)\nif not os.path.exists(\"origami_env\"):\n    !git clone https://huggingface.co/spaces/openenv-community/origami_env\n    print(\"Repo cloned.\")\nelse:\n    print(\"Repo already exists.\")\n\n# Install training deps\nIN_COLAB = \"google.colab\" in sys.modules\nif IN_COLAB:\n    !pip install -q \"unsloth[colab-new]\"\n    !pip install -q trl datasets peft accelerate bitsandbytes xformers\nelse:\n    !pip install -q \"trl>=0.7\" \"datasets>=2.14\" torch transformers accelerate bitsandbytes peft\n\n# Simulation deps (lightweight)\n!pip install -q numpy scipy pydantic",
  "metadata": {},
  "execution_count": null,
  "outputs": []
@@ -29,7 +29,7 @@
  {
  "cell_type": "code",
  "id": "3hr273dhqiv",
- "source": "import os\nimport sys\nimport json\n\n# Add the repo root to Python path so origami_server and training modules are importable\nREPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(\"__file__\"), \"..\"))\nif REPO_ROOT not in sys.path:\n    sys.path.insert(0, REPO_ROOT)\n\nprint(f\"Repo root: {REPO_ROOT}\")\nprint(f\"Python: {sys.version}\")",
+ "source": "import os\nimport sys\nimport json\n\n# Add the cloned repo to Python path\nREPO_ROOT = os.path.abspath(\"origami_env\")\nif REPO_ROOT not in sys.path:\n    sys.path.insert(0, REPO_ROOT)\n\nprint(f\"Repo root: {REPO_ROOT}\")\nprint(f\"Python: {sys.version}\")",
  "metadata": {},
  "execution_count": null,
  "outputs": []
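The markdown cell's point about local reward computation can be made concrete with a small sketch. This mirrors the scoring scheme the notebook describes for `valid_fold` (+1.0 valid FOLD JSON, −0.5 parseable but invalid, −2.0 unparseable) in the per-completion reward-function shape TRL's GRPO expects; the function name and the minimal-key validity check are illustrative assumptions, not the repo's actual (stricter) validator:

```python
# Hedged sketch of the valid_fold scoring scheme described in the notebook.
# The minimal-key check is a stand-in; the repo's real validator is stricter.
import json

def valid_fold_reward(completions, **kwargs):
    """One float per completion, computed locally with no HTTP roundtrip --
    which is why it is fast enough for GRPO's inner loop."""
    rewards = []
    for text in completions:
        try:
            fold = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            rewards.append(-2.0)  # unparseable output
            continue
        # Minimal FOLD-spec keys; an assumption, not full FOLD validation.
        if isinstance(fold, dict) and {"vertices_coords", "edges_vertices"} <= fold.keys():
            rewards.append(1.0)   # valid FOLD JSON
        else:
            rewards.append(-0.5)  # parseable but not a valid crease pattern
    return rewards
```

Because the whole check is plain-Python JSON parsing plus (in the real environment) numpy/scipy geometry, each call costs milliseconds, which is what makes thousands of reward evaluations per training run practical.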