How to use garvitsachdeva/spindleflow-rl with stable-baselines3:

    from huggingface_sb3 import load_from_hub

    # Download the checkpoint archive from the Hub and return its local path.
    # Replace the {MODEL FILENAME} placeholder with a file name from the repo.
    checkpoint = load_from_hub(
        repo_id="garvitsachdeva/spindleflow-rl",
        filename="{MODEL FILENAME}.zip",
    )

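The training log that follows prints periodic progress lines in a fixed layout (timestamp, episode, reward, curriculum phase, rolling mean against a threshold, episodes in the current phase). As a minimal sketch for pulling those fields out for plotting or analysis, the parser below may help. The names `EpisodeRecord` and `parse_episode_line` are hypothetical and not part of the project, and the regular expression is inferred only from the lines shown here. Any non-numeric threshold token is treated as "no threshold".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the progress lines in the log; an assumption, not a spec.
EPISODE_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+Ep\s+(?P<episode>\d+)\s+\|"
    r"\s+reward\s+(?P<reward>[+-]?\d+\.\d+)\s+\|"
    r"\s+Phase\s+(?P<phase>\d+)/(?P<total>\d+)\s+\|"
    r"\s+Rolling mean:\s+(?P<mean>[+-]?\d+\.\d+)\s+/\s+(?P<threshold>\S+)\s+\|"
    r"\s+Episodes in phase:\s+(?P<in_phase>\d+)"
)


@dataclass
class EpisodeRecord:
    time: str
    episode: int
    reward: float
    phase: int
    total_phases: int
    rolling_mean: float
    threshold: Optional[float]  # None when the log shows no numeric threshold
    episodes_in_phase: int


def parse_episode_line(line: str) -> Optional[EpisodeRecord]:
    """Return the parsed fields of a progress line, or None for other lines."""
    match = EPISODE_RE.search(line)
    if match is None:
        return None
    try:
        threshold: Optional[float] = float(match["threshold"])
    except ValueError:
        # The final phase prints a non-numeric token in place of a threshold.
        threshold = None
    return EpisodeRecord(
        time=match["time"],
        episode=int(match["episode"]),
        reward=float(match["reward"]),
        phase=int(match["phase"]),
        total_phases=int(match["total"]),
        rolling_mean=float(match["mean"]),
        threshold=threshold,
        episodes_in_phase=int(match["in_phase"]),
    )


if __name__ == "__main__":
    sample = (
        "| [11:05:32] Ep 25 | reward +0.736 | Phase 1/3 "
        "| Rolling mean: -0.185 / 0.6 | Episodes in phase: 25 | |"
    )
    print(parse_episode_line(sample))
```

Lines that are not progress lines (startup messages, periodic saves) return `None`, so the function can be mapped over the whole log and the results filtered.
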
| [11:05:13] OpenAI key found - finetuner + spawn self-learning enabled. | |
| [11:05:13] Model will be pushed to: https://huggingface.co/garvitsachdeva/spindleflow-rl | |
| [11:05:13] Working directory: /home/user/app | |
| [11:05:13] Patching SentenceTransformer to CUDA... | |
| [11:05:15] WARNING: CUDA not available for SentenceTransformer - CPU mode (slow) | |
| [11:05:15] Loading environment... | |
| [11:05:20] TieredRewardScorer - Tier-1 only (LLM judge disabled for speed) | |
| [11:05:20] Generalist baseline - static simulation (0 API calls per episode) | |
| [11:05:20] Running smoke test... | |
| [11:05:22] Smoke test OK - obs shape (5490,) | |
| [11:05:22] Benchmarking SentenceTransformer encode speed... | |
| [11:05:22] Encode speed : 10.2 ms/call [CUDA - fast] | |
| [11:05:22] Benchmarking full env.step() speed... | |
| [11:05:24] Step speed : 67.4 ms/step [fast] | |
| [11:05:24] Projected 100k steps: 112 min | |
| [11:05:25] Training on : cpu | |
| [11:05:25] Curriculum : Phase 1 - Phase 1/3 | Rolling mean: 0.000 / 0.6 | Episodes in phase: 0 | |
| [11:05:25] Total steps : 30,000 | |
| [11:05:25] Training started... | |
| [11:05:32] Ep 25 | reward +0.736 | Phase 1/3 | Rolling mean: -0.185 / 0.6 | Episodes in phase: 25 | |
| [11:05:37] Ep 50 | reward +1.097 | Phase 1/3 | Rolling mean: -0.511 / 0.6 | Episodes in phase: 50 | |
| [11:05:43] Ep 75 | reward -1.508 | Phase 1/3 | Rolling mean: -0.597 / 0.6 | Episodes in phase: 75 | |
| [11:05:48] Ep 100 | reward -0.701 | Phase 1/3 | Rolling mean: -0.595 / 0.6 | Episodes in phase: 100 | |
| [11:06:10] Ep 125 | reward +0.805 | Phase 1/3 | Rolling mean: -0.463 / 0.6 | Episodes in phase: 125 | |
| [11:06:16] Ep 150 | reward -1.505 | Phase 1/3 | Rolling mean: -0.476 / 0.6 | Episodes in phase: 150 | |
| [11:07:36] Ep 175 | reward -1.475 | Phase 1/3 | Rolling mean: -0.522 / 0.6 | Episodes in phase: 175 | |
| [11:07:42] Ep 200 | reward -1.879 | Phase 1/3 | Rolling mean: -0.517 / 0.6 | Episodes in phase: 200 | |
| [11:08:06] Ep 225 | reward -1.314 | Phase 1/3 | Rolling mean: -0.555 / 0.6 | Episodes in phase: 225 | |
| [11:08:13] Ep 250 | reward -1.912 | Phase 1/3 | Rolling mean: -0.541 / 0.6 | Episodes in phase: 250 | |
| [11:08:21] Ep 275 | reward -0.073 | Phase 1/3 | Rolling mean: -0.492 / 0.6 | Episodes in phase: 275 | |
| [11:08:27] Ep 300 | reward +0.842 | Phase 1/3 | Rolling mean: -0.483 / 0.6 | Episodes in phase: 300 | |
| [11:09:50] Ep 325 | reward -1.541 | Phase 1/3 | Rolling mean: -0.526 / 0.6 | Episodes in phase: 325 | |
| [11:09:56] Ep 350 | reward -1.545 | Phase 1/3 | Rolling mean: -0.520 / 0.6 | Episodes in phase: 350 | |
| [11:10:03] Ep 375 | reward -1.545 | Phase 1/3 | Rolling mean: -0.498 / 0.6 | Episodes in phase: 375 | |
| [11:10:09] Ep 400 | reward -1.542 | Phase 1/3 | Rolling mean: -0.427 / 0.6 | Episodes in phase: 400 | |
| [11:10:32] Ep 425 | reward +2.306 | Phase 1/3 | Rolling mean: -0.443 / 0.6 | Episodes in phase: 425 | |
| [11:10:39] Ep 450 | reward -1.521 | Phase 1/3 | Rolling mean: -0.405 / 0.6 | Episodes in phase: 450 | |
| [11:11:56] Ep 475 | reward +1.361 | Phase 1/3 | Rolling mean: -0.392 / 0.6 | Episodes in phase: 475 | |
| [11:12:03] Ep 500 | reward +1.262 | Phase 1/3 | Rolling mean: -0.330 / 0.6 | Episodes in phase: 500 | |
| [11:12:25] Ep 525 | reward -1.370 | Phase 1/3 | Rolling mean: -0.343 / 0.6 | Episodes in phase: 525 | |
| [11:12:32] Ep 550 | reward -0.791 | Phase 1/3 | Rolling mean: -0.304 / 0.6 | Episodes in phase: 550 | |
| [11:12:39] Ep 575 | reward -2.053 | Phase 1/3 | Rolling mean: -0.261 / 0.6 | Episodes in phase: 575 | |
| [11:12:46] Ep 600 | reward -1.163 | Phase 1/3 | Rolling mean: -0.303 / 0.6 | Episodes in phase: 600 | |
| [11:14:12] Ep 625 | reward +0.563 | Phase 1/3 | Rolling mean: -0.280 / 0.6 | Episodes in phase: 625 | |
| [11:14:19] Ep 650 | reward -1.620 | Phase 1/3 | Rolling mean: -0.335 / 0.6 | Episodes in phase: 650 | |
| [11:14:27] Ep 675 | reward +0.994 | Phase 1/3 | Rolling mean: -0.274 / 0.6 | Episodes in phase: 675 | |
| [11:14:34] Ep 700 | reward -0.728 | Phase 1/3 | Rolling mean: -0.293 / 0.6 | Episodes in phase: 700 | |
| [11:14:55] Ep 725 | reward -0.023 | Phase 1/3 | Rolling mean: -0.249 / 0.6 | Episodes in phase: 725 | |
| [11:15:02] Ep 750 | reward +2.148 | Phase 1/3 | Rolling mean: -0.249 / 0.6 | Episodes in phase: 750 | |
| [11:16:20] Ep 775 | reward -1.583 | Phase 1/3 | Rolling mean: -0.238 / 0.6 | Episodes in phase: 775 | |
| [11:16:28] Ep 800 | reward -0.096 | Phase 1/3 | Rolling mean: -0.213 / 0.6 | Episodes in phase: 800 | |
| [11:16:50] Ep 825 | reward -0.967 | Phase 1/3 | Rolling mean: -0.204 / 0.6 | Episodes in phase: 825 | |
| [11:16:56] Ep 850 | reward +0.192 | Phase 1/3 | Rolling mean: -0.094 / 0.6 | Episodes in phase: 850 | |
| [11:17:04] Ep 875 | reward -1.050 | Phase 1/3 | Rolling mean: -0.160 / 0.6 | Episodes in phase: 875 | |
| [11:18:25] Ep 900 | reward +3.062 | Phase 1/3 | Rolling mean: -0.128 / 0.6 | Episodes in phase: 900 | |
| [11:18:47] Ep 925 | reward +1.293 | Phase 1/3 | Rolling mean: -0.108 / 0.6 | Episodes in phase: 925 | |
| [11:18:54] Ep 950 | reward +0.422 | Phase 1/3 | Rolling mean: -0.084 / 0.6 | Episodes in phase: 950 | |
| [11:19:01] Ep 975 | reward +1.644 | Phase 1/3 | Rolling mean: -0.055 / 0.6 | Episodes in phase: 975 | |
| [11:19:11] Ep 1000 | reward +1.232 | Phase 1/3 | Rolling mean: -0.082 / 0.6 | Episodes in phase: 1000 | |
| [11:19:32] Ep 1025 | reward -0.330 | Phase 1/3 | Rolling mean: -0.049 / 0.6 | Episodes in phase: 1025 | |
| [11:20:58] Ep 1050 | reward +0.055 | Phase 1/3 | Rolling mean: -0.055 / 0.6 | Episodes in phase: 1050 | |
| [11:21:05] Ep 1075 | reward -0.634 | Phase 1/3 | Rolling mean: -0.011 / 0.6 | Episodes in phase: 1075 | |
| [11:21:11] Ep 1100 | reward -1.339 | Phase 1/3 | Rolling mean: 0.009 / 0.6 | Episodes in phase: 1100 | |
| [11:21:35] Ep 1125 | reward +0.593 | Phase 1/3 | Rolling mean: -0.020 / 0.6 | Episodes in phase: 1125 | |
| [11:21:43] Ep 1150 | reward -1.622 | Phase 1/3 | Rolling mean: -0.062 / 0.6 | Episodes in phase: 1150 | |
| [11:21:50] Ep 1175 | reward -0.510 | Phase 1/3 | Rolling mean: -0.104 / 0.6 | Episodes in phase: 1175 | |
| [11:23:03] Ep 1200 | reward -2.411 | Phase 1/3 | Rolling mean: -0.054 / 0.6 | Episodes in phase: 1200 | |
| [11:23:24] Ep 1225 | reward +1.149 | Phase 1/3 | Rolling mean: 0.051 / 0.6 | Episodes in phase: 1225 | |
| [11:23:30] Ep 1250 | reward +0.250 | Phase 1/3 | Rolling mean: 0.083 / 0.6 | Episodes in phase: 1250 | |
| [11:23:37] Ep 1275 | reward +0.860 | Phase 1/3 | Rolling mean: 0.109 / 0.6 | Episodes in phase: 1275 | |
| [11:23:43] Ep 1300 | reward +2.844 | Phase 1/3 | Rolling mean: 0.121 / 0.6 | Episodes in phase: 1300 | |
| [11:24:05] Ep 1325 | reward +0.399 | Phase 1/3 | Rolling mean: 0.196 / 0.6 | Episodes in phase: 1325 | |
| [11:24:10] Ep 1350 | reward -0.790 | Phase 1/3 | Rolling mean: 0.311 / 0.6 | Episodes in phase: 1350 | |
| [11:24:15] Ep 1375 | reward -1.641 | Phase 1/3 | Rolling mean: 0.390 / 0.6 | Episodes in phase: 1375 | |
| [11:25:23] Ep 1400 | reward +0.059 | Phase 1/3 | Rolling mean: 0.465 / 0.6 | Episodes in phase: 1400 | |
| [11:25:41] Ep 1425 | reward +0.962 | Phase 1/3 | Rolling mean: 0.427 / 0.6 | Episodes in phase: 1425 | |
| [11:25:48] Ep 1450 | reward +0.184 | Phase 1/3 | Rolling mean: 0.502 / 0.6 | Episodes in phase: 1450 | |
| [11:25:53] Ep 1475 | reward +2.529 | Phase 1/3 | Rolling mean: 0.597 / 0.6 | Episodes in phase: 1475 | |
| [11:25:54] Ep 1478 | reward +3.360 | Phase 2/3 | Rolling mean: 0.000 / 1.0 | Episodes in phase: 0 | |
| [11:25:59] Ep 1500 | reward -0.227 | Phase 2/3 | Rolling mean: 0.121 / 1.0 | Episodes in phase: 22 | |
| [11:26:17] Periodic save at step 5,000 ... | |
| [11:26:20] Periodic push done - 5 files at step 5,000 | |
| [11:26:24] Ep 1525 | reward +0.519 | Phase 2/3 | Rolling mean: 0.336 / 1.0 | Episodes in phase: 47 | |
| [11:26:31] Ep 1550 | reward -1.655 | Phase 2/3 | Rolling mean: 0.464 / 1.0 | Episodes in phase: 72 | |
| [11:27:44] Ep 1575 | reward +1.494 | Phase 2/3 | Rolling mean: 0.580 / 1.0 | Episodes in phase: 97 | |
| [11:27:51] Ep 1600 | reward +1.042 | Phase 2/3 | Rolling mean: 0.682 / 1.0 | Episodes in phase: 122 | |
| [11:28:15] Ep 1625 | reward +0.742 | Phase 2/3 | Rolling mean: 0.627 / 1.0 | Episodes in phase: 147 | |
| [11:28:21] Ep 1650 | reward +1.428 | Phase 2/3 | Rolling mean: 0.635 / 1.0 | Episodes in phase: 172 | |
| [11:28:28] Ep 1675 | reward +0.336 | Phase 2/3 | Rolling mean: 0.634 / 1.0 | Episodes in phase: 197 | |
| [11:28:33] Ep 1700 | reward +4.152 | Phase 2/3 | Rolling mean: 0.681 / 1.0 | Episodes in phase: 222 | |
| [11:28:50] Ep 1725 | reward +2.964 | Phase 2/3 | Rolling mean: 0.746 / 1.0 | Episodes in phase: 247 | |
| [11:30:12] Ep 1750 | reward -0.352 | Phase 2/3 | Rolling mean: 0.756 / 1.0 | Episodes in phase: 272 | |
| [11:30:18] Ep 1775 | reward +3.482 | Phase 2/3 | Rolling mean: 0.721 / 1.0 | Episodes in phase: 297 | |
| [11:30:26] Ep 1800 | reward -0.045 | Phase 2/3 | Rolling mean: 0.699 / 1.0 | Episodes in phase: 322 | |
| [11:30:50] Ep 1825 | reward +2.169 | Phase 2/3 | Rolling mean: 0.783 / 1.0 | Episodes in phase: 347 | |
| [11:30:55] Ep 1850 | reward +1.839 | Phase 2/3 | Rolling mean: 0.795 / 1.0 | Episodes in phase: 372 | |
| [11:31:01] Ep 1875 | reward +0.765 | Phase 2/3 | Rolling mean: 0.870 / 1.0 | Episodes in phase: 397 | |
| [11:31:06] Ep 1900 | reward +1.146 | Phase 2/3 | Rolling mean: 0.942 / 1.0 | Episodes in phase: 422 | |
| [11:32:32] Ep 1925 | reward +1.780 | Phase 2/3 | Rolling mean: 0.934 / 1.0 | Episodes in phase: 447 | |
| [11:32:38] Ep 1950 | reward +1.365 | Phase 2/3 | Rolling mean: 1.008 / 1.0 | Episodes in phase: 472 | |
| [11:32:45] Ep 1975 | reward -1.427 | Phase 2/3 | Rolling mean: 1.078 / 1.0 | Episodes in phase: 497 | |
| [11:32:45] Ep 1978 | reward +0.503 | Phase 3/3 | Rolling mean: 0.000 / ∞ | Episodes in phase: 0 | |
| [11:32:49] Ep 2000 | reward +3.096 | Phase 3/3 | Rolling mean: 0.995 / ∞ | Episodes in phase: 22 | |
| [11:33:12] Ep 2025 | reward +2.753 | Phase 3/3 | Rolling mean: 0.718 / ∞ | Episodes in phase: 47 | |
| [11:33:18] Ep 2050 | reward +0.487 | Phase 3/3 | Rolling mean: 0.844 / ∞ | Episodes in phase: 72 | |
| [11:33:23] Ep 2075 | reward +1.654 | Phase 3/3 | Rolling mean: 0.959 / ∞ | Episodes in phase: 97 | |
| [11:33:29] Ep 2100 | reward +1.654 | Phase 3/3 | Rolling mean: 0.925 / ∞ | Episodes in phase: 122 | |
| [11:34:46] Ep 2125 | reward +0.516 | Phase 3/3 | Rolling mean: 0.959 / ∞ | Episodes in phase: 147 | |
| [11:34:52] Ep 2150 | reward +3.792 | Phase 3/3 | Rolling mean: 0.979 / ∞ | Episodes in phase: 172 | |
| [11:34:57] Ep 2175 | reward +0.073 | Phase 3/3 | Rolling mean: 0.995 / ∞ | Episodes in phase: 197 | |
| [11:35:02] Ep 2200 | reward +2.109 | Phase 3/3 | Rolling mean: 1.027 / ∞ | Episodes in phase: 222 | |
| [11:35:22] Ep 2225 | reward +1.313 | Phase 3/3 | Rolling mean: 1.061 / ∞ | Episodes in phase: 247 | |
| [11:35:27] Ep 2250 | reward +3.740 | Phase 3/3 | Rolling mean: 1.103 / ∞ | Episodes in phase: 272 | |
| [11:35:31] Ep 2275 | reward +2.000 | Phase 3/3 | Rolling mean: 1.066 / ∞ | Episodes in phase: 297 | |
| [11:35:37] Ep 2300 | reward +0.179 | Phase 3/3 | Rolling mean: 1.106 / ∞ | Episodes in phase: 322 | |
| [11:36:53] Ep 2325 | reward +1.694 | Phase 3/3 | Rolling mean: 1.085 / ∞ | Episodes in phase: 347 | |
| [11:36:59] Ep 2350 | reward -0.421 | Phase 3/3 | Rolling mean: 1.064 / ∞ | Episodes in phase: 372 | |
| [11:37:03] Ep 2375 | reward +1.838 | Phase 3/3 | Rolling mean: 1.123 / ∞ | Episodes in phase: 397 | |
| [11:37:08] Ep 2400 | reward -0.246 | Phase 3/3 | Rolling mean: 1.117 / ∞ | Episodes in phase: 422 | |
| [11:37:26] Ep 2425 | reward +3.134 | Phase 3/3 | Rolling mean: 1.167 / ∞ | Episodes in phase: 447 | |
| [11:37:31] Ep 2450 | reward -0.659 | Phase 3/3 | Rolling mean: 1.173 / ∞ | Episodes in phase: 472 | |
| [11:37:36] Ep 2475 | reward +2.264 | Phase 3/3 | Rolling mean: 1.210 / ∞ | Episodes in phase: 497 | |
| [11:37:40] Ep 2500 | reward +0.612 | Phase 3/3 | Rolling mean: 1.224 / ∞ | Episodes in phase: 522 | |
| [11:37:59] Ep 2525 | reward +0.474 | Phase 3/3 | Rolling mean: 1.223 / ∞ | Episodes in phase: 547 | |
| [11:38:36] Ep 2550 | reward -0.258 | Phase 3/3 | Rolling mean: 1.246 / ∞ | Episodes in phase: 572 | |
| [11:38:40] Ep 2575 | reward +2.700 | Phase 3/3 | Rolling mean: 1.239 / ∞ | Episodes in phase: 597 | |
| [11:38:45] Ep 2600 | reward +1.871 | Phase 3/3 | Rolling mean: 1.240 / ∞ | Episodes in phase: 622 | |
| [11:39:09] Ep 2625 | reward +1.806 | Phase 3/3 | Rolling mean: 1.325 / ∞ | Episodes in phase: 647 | |
| [11:39:13] Ep 2650 | reward +1.485 | Phase 3/3 | Rolling mean: 1.266 / ∞ | Episodes in phase: 672 | |
| [11:39:17] Ep 2675 | reward +0.647 | Phase 3/3 | Rolling mean: 1.290 / ∞ | Episodes in phase: 697 | |
| [11:39:21] Ep 2700 | reward -0.045 | Phase 3/3 | Rolling mean: 1.263 / ∞ | Episodes in phase: 722 | |
| [11:39:41] Ep 2725 | reward -0.226 | Phase 3/3 | Rolling mean: 1.311 / ∞ | Episodes in phase: 747 | |
| [11:39:44] Ep 2750 | reward +0.837 | Phase 3/3 | Rolling mean: 1.374 / ∞ | Episodes in phase: 772 | |
| [11:39:49] Ep 2775 | reward +1.761 | Phase 3/3 | Rolling mean: 1.325 / ∞ | Episodes in phase: 797 | |
| [11:40:32] Ep 2800 | reward +0.013 | Phase 3/3 | Rolling mean: 1.313 / ∞ | Episodes in phase: 822 | |
| [11:40:51] Ep 2825 | reward +0.972 | Phase 3/3 | Rolling mean: 1.223 / ∞ | Episodes in phase: 847 | |
| [11:40:56] Ep 2850 | reward +0.477 | Phase 3/3 | Rolling mean: 1.272 / ∞ | Episodes in phase: 872 | |
| [11:41:00] Ep 2875 | reward +1.260 | Phase 3/3 | Rolling mean: 1.281 / ∞ | Episodes in phase: 897 | |
| [11:41:04] Ep 2900 | reward +0.083 | Phase 3/3 | Rolling mean: 1.360 / ∞ | Episodes in phase: 922 | |
| [11:41:23] Ep 2925 | reward +0.015 | Phase 3/3 | Rolling mean: 1.454 / ∞ | Episodes in phase: 947 | |
| [11:41:29] Ep 2950 | reward +1.500 | Phase 3/3 | Rolling mean: 1.414 / ∞ | Episodes in phase: 972 | |
| [11:41:33] Ep 2975 | reward +3.524 | Phase 3/3 | Rolling mean: 1.559 / ∞ | Episodes in phase: 997 | |
| [11:41:38] Ep 3000 | reward +1.860 | Phase 3/3 | Rolling mean: 1.650 / ∞ | Episodes in phase: 1022 | |
| [11:42:41] Ep 3025 | reward +1.604 | Phase 3/3 | Rolling mean: 1.644 / ∞ | Episodes in phase: 1047 | |
| [11:42:46] Ep 3050 | reward +1.799 | Phase 3/3 | Rolling mean: 1.636 / ∞ | Episodes in phase: 1072 | |
| [11:42:50] Ep 3075 | reward +3.127 | Phase 3/3 | Rolling mean: 1.658 / ∞ | Episodes in phase: 1097 | |
| [11:42:54] Ep 3100 | reward +0.200 | Phase 3/3 | Rolling mean: 1.640 / ∞ | Episodes in phase: 1122 | |
| [11:43:12] Ep 3125 | reward +0.282 | Phase 3/3 | Rolling mean: 1.549 / ∞ | Episodes in phase: 1147 | |
| [11:43:16] Ep 3150 | reward +0.694 | Phase 3/3 | Rolling mean: 1.551 / ∞ | Episodes in phase: 1172 | |
| [11:43:21] Ep 3175 | reward +1.753 | Phase 3/3 | Rolling mean: 1.503 / ∞ | Episodes in phase: 1197 | |
| [11:43:24] Ep 3200 | reward +1.497 | Phase 3/3 | Rolling mean: 1.392 / ∞ | Episodes in phase: 1222 | |
| [11:43:43] Ep 3225 | reward +0.607 | Phase 3/3 | Rolling mean: 1.443 / ∞ | Episodes in phase: 1247 | |
| [11:44:25] Ep 3250 | reward +2.374 | Phase 3/3 | Rolling mean: 1.528 / ∞ | Episodes in phase: 1272 | |
| [11:44:30] Ep 3275 | reward +1.900 | Phase 3/3 | Rolling mean: 1.483 / ∞ | Episodes in phase: 1297 | |
| [11:44:33] Ep 3300 | reward +1.744 | Phase 3/3 | Rolling mean: 1.515 / ∞ | Episodes in phase: 1322 | |
| [11:44:53] Ep 3325 | reward +1.466 | Phase 3/3 | Rolling mean: 1.504 / ∞ | Episodes in phase: 1347 | |
| [11:44:57] Ep 3350 | reward +1.019 | Phase 3/3 | Rolling mean: 1.499 / ∞ | Episodes in phase: 1372 | |
| [11:45:00] Ep 3375 | reward -0.277 | Phase 3/3 | Rolling mean: 1.479 / ∞ | Episodes in phase: 1397 | |
| [11:45:05] Ep 3400 | reward +2.236 | Phase 3/3 | Rolling mean: 1.526 / ∞ | Episodes in phase: 1422 | |
| [11:45:25] Ep 3425 | reward +0.735 | Phase 3/3 | Rolling mean: 1.572 / ∞ | Episodes in phase: 1447 | |
| [11:45:28] Ep 3450 | reward +2.571 | Phase 3/3 | Rolling mean: 1.515 / ∞ | Episodes in phase: 1472 | |
| [11:45:32] Ep 3475 | reward -2.395 | Phase 3/3 | Rolling mean: 1.490 / ∞ | Episodes in phase: 1497 | |
| [11:46:29] Ep 3500 | reward +1.103 | Phase 3/3 | Rolling mean: 1.466 / ∞ | Episodes in phase: 1522 | |
| [11:46:49] Ep 3525 | reward +1.210 | Phase 3/3 | Rolling mean: 1.537 / ∞ | Episodes in phase: 1547 | |
| [11:46:53] Ep 3550 | reward +2.704 | Phase 3/3 | Rolling mean: 1.597 / ∞ | Episodes in phase: 1572 | |
| [11:46:57] Ep 3575 | reward +3.341 | Phase 3/3 | Rolling mean: 1.570 / ∞ | Episodes in phase: 1597 | |
| [11:46:59] Periodic save at step 10,000 ... | |
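
The log above shows a three-phase curriculum in which the phase counter, the rolling mean, and the per-phase episode count all reset when a phase ends (thresholds 0.6 and 1.0, none for the last phase). The sketch below is one way such a tracker could work; it is not the project's actual implementation. The class name `CurriculumTracker`, the window of 100 episodes, and the minimum of 500 episodes per phase are all assumptions. The minimum is only a guess that fits the Phase 2 to Phase 3 transition, which happens 500 episodes into the phase even though the rolling mean had already passed 1.0 earlier.

```python
from collections import deque
from typing import Deque, Optional, Sequence


class CurriculumTracker:
    """Hypothetical phase tracker: advance on rolling mean >= threshold."""

    def __init__(
        self,
        thresholds: Sequence[Optional[float]] = (0.6, 1.0, None),
        window: int = 100,        # assumed rolling-window size
        min_episodes: int = 500,  # assumed minimum episodes before advancing
    ) -> None:
        self.thresholds = list(thresholds)
        self.min_episodes = min_episodes
        self.rewards: Deque[float] = deque(maxlen=window)
        self.phase_index = 0
        self.episodes_in_phase = 0

    @property
    def phase(self) -> int:
        """1-based phase number, as printed in the log."""
        return self.phase_index + 1

    @property
    def rolling_mean(self) -> float:
        """Mean of the rewards in the window, or 0.0 when it is empty."""
        if not self.rewards:
            return 0.0
        return sum(self.rewards) / len(self.rewards)

    def record(self, reward: float) -> bool:
        """Record one episode reward; return True if the phase advanced."""
        self.rewards.append(reward)
        self.episodes_in_phase += 1
        threshold = self.thresholds[self.phase_index]
        if threshold is None:
            return False  # final phase: nothing to advance to
        if (
            self.episodes_in_phase >= self.min_episodes
            and self.rolling_mean >= threshold
        ):
            # Start the next phase with an empty window and a fresh counter.
            self.phase_index += 1
            self.rewards.clear()
            self.episodes_in_phase = 0
            return True
        return False

    def status(self) -> str:
        """Format the state in the same layout as the log's progress lines."""
        threshold = self.thresholds[self.phase_index]
        shown = "∞" if threshold is None else f"{threshold}"
        return (
            f"Phase {self.phase}/{len(self.thresholds)} | "
            f"Rolling mean: {self.rolling_mean:.3f} / {shown} | "
            f"Episodes in phase: {self.episodes_in_phase}"
        )
```

Calling `record(reward)` once per finished episode and `status()` when logging reproduces the shape of the progress lines, including the reset to a rolling mean of 0.000 and 0 episodes right after a phase change.
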