[11:05:13] OpenAI key found - finetuner + spawn self-learning enabled.
[11:05:13] Model will be pushed to: https://huggingface.co/garvitsachdeva/spindleflow-rl
[11:05:13] Working directory: /home/user/app
[11:05:13] Patching SentenceTransformer to CUDA...
[11:05:15] WARNING: CUDA not available for SentenceTransformer - CPU mode (slow)
[11:05:15] Loading environment...
[11:05:20] TieredRewardScorer - Tier-1 only (LLM judge disabled for speed) ✓
[11:05:20] Generalist baseline - static simulation (0 API calls per episode) ✓
[11:05:20] Running smoke test...
[11:05:22] Smoke test OK - obs shape (5490,)
[11:05:22] Benchmarking SentenceTransformer encode speed...
[11:05:22] Encode speed : 10.2 ms/call [CUDA - fast]
[11:05:22] Benchmarking full env.step() speed...
[11:05:24] Step speed : 67.4 ms/step [fast ✓]
[11:05:24] Projected 100k steps: 112 min
[11:05:25] Training on : cpu
[11:05:25] Curriculum : Phase 1 - Phase 1/3 | Rolling mean: 0.000 / 0.6 | Episodes in phase: 0
[11:05:25] Total steps : 30,000
[11:05:25] Training started...
[11:05:32] Ep 25 | reward +0.736 | Phase 1/3 | Rolling mean: -0.185 / 0.6 | Episodes in phase: 25
[11:05:37] Ep 50 | reward +1.097 | Phase 1/3 | Rolling mean: -0.511 / 0.6 | Episodes in phase: 50
[11:05:43] Ep 75 | reward -1.508 | Phase 1/3 | Rolling mean: -0.597 / 0.6 | Episodes in phase: 75
[11:05:48] Ep 100 | reward -0.701 | Phase 1/3 | Rolling mean: -0.595 / 0.6 | Episodes in phase: 100
[11:06:10] Ep 125 | reward +0.805 | Phase 1/3 | Rolling mean: -0.463 / 0.6 | Episodes in phase: 125
[11:06:16] Ep 150 | reward -1.505 | Phase 1/3 | Rolling mean: -0.476 / 0.6 | Episodes in phase: 150
[11:07:36] Ep 175 | reward -1.475 | Phase 1/3 | Rolling mean: -0.522 / 0.6 | Episodes in phase: 175
[11:07:42] Ep 200 | reward -1.879 | Phase 1/3 | Rolling mean: -0.517 / 0.6 | Episodes in phase: 200
[11:08:06] Ep 225 | reward -1.314 | Phase 1/3 | Rolling mean: -0.555 / 0.6 | Episodes in phase: 225
[11:08:13] Ep 250 | reward -1.912 | Phase 1/3 | Rolling mean: -0.541 / 0.6 | Episodes in phase: 250
[11:08:21] Ep 275 | reward -0.073 | Phase 1/3 | Rolling mean: -0.492 / 0.6 | Episodes in phase: 275
[11:08:27] Ep 300 | reward +0.842 | Phase 1/3 | Rolling mean: -0.483 / 0.6 | Episodes in phase: 300
[11:09:50] Ep 325 | reward -1.541 | Phase 1/3 | Rolling mean: -0.526 / 0.6 | Episodes in phase: 325
[11:09:56] Ep 350 | reward -1.545 | Phase 1/3 | Rolling mean: -0.520 / 0.6 | Episodes in phase: 350
[11:10:03] Ep 375 | reward -1.545 | Phase 1/3 | Rolling mean: -0.498 / 0.6 | Episodes in phase: 375
[11:10:09] Ep 400 | reward -1.542 | Phase 1/3 | Rolling mean: -0.427 / 0.6 | Episodes in phase: 400
[11:10:32] Ep 425 | reward +2.306 | Phase 1/3 | Rolling mean: -0.443 / 0.6 | Episodes in phase: 425
[11:10:39] Ep 450 | reward -1.521 | Phase 1/3 | Rolling mean: -0.405 / 0.6 | Episodes in phase: 450
[11:11:56] Ep 475 | reward +1.361 | Phase 1/3 | Rolling mean: -0.392 / 0.6 | Episodes in phase: 475
[11:12:03] Ep 500 | reward +1.262 | Phase 1/3 | Rolling mean: -0.330 / 0.6 | Episodes in phase: 500
[11:12:25] Ep 525 | reward -1.370 | Phase 1/3 | Rolling mean: -0.343 / 0.6 | Episodes in phase: 525
[11:12:32] Ep 550 | reward -0.791 | Phase 1/3 | Rolling mean: -0.304 / 0.6 | Episodes in phase: 550
[11:12:39] Ep 575 | reward -2.053 | Phase 1/3 | Rolling mean: -0.261 / 0.6 | Episodes in phase: 575
[11:12:46] Ep 600 | reward -1.163 | Phase 1/3 | Rolling mean: -0.303 / 0.6 | Episodes in phase: 600
[11:14:12] Ep 625 | reward +0.563 | Phase 1/3 | Rolling mean: -0.280 / 0.6 | Episodes in phase: 625
[11:14:19] Ep 650 | reward -1.620 | Phase 1/3 | Rolling mean: -0.335 / 0.6 | Episodes in phase: 650
[11:14:27] Ep 675 | reward +0.994 | Phase 1/3 | Rolling mean: -0.274 / 0.6 | Episodes in phase: 675
[11:14:34] Ep 700 | reward -0.728 | Phase 1/3 | Rolling mean: -0.293 / 0.6 | Episodes in phase: 700
[11:14:55] Ep 725 | reward -0.023 | Phase 1/3 | Rolling mean: -0.249 / 0.6 | Episodes in phase: 725
[11:15:02] Ep 750 | reward +2.148 | Phase 1/3 | Rolling mean: -0.249 / 0.6 | Episodes in phase: 750
[11:16:20] Ep 775 | reward -1.583 | Phase 1/3 | Rolling mean: -0.238 / 0.6 | Episodes in phase: 775
[11:16:28] Ep 800 | reward -0.096 | Phase 1/3 | Rolling mean: -0.213 / 0.6 | Episodes in phase: 800
[11:16:50] Ep 825 | reward -0.967 | Phase 1/3 | Rolling mean: -0.204 / 0.6 | Episodes in phase: 825
[11:16:56] Ep 850 | reward +0.192 | Phase 1/3 | Rolling mean: -0.094 / 0.6 | Episodes in phase: 850
[11:17:04] Ep 875 | reward -1.050 | Phase 1/3 | Rolling mean: -0.160 / 0.6 | Episodes in phase: 875
[11:18:25] Ep 900 | reward +3.062 | Phase 1/3 | Rolling mean: -0.128 / 0.6 | Episodes in phase: 900
[11:18:47] Ep 925 | reward +1.293 | Phase 1/3 | Rolling mean: -0.108 / 0.6 | Episodes in phase: 925
[11:18:54] Ep 950 | reward +0.422 | Phase 1/3 | Rolling mean: -0.084 / 0.6 | Episodes in phase: 950
[11:19:01] Ep 975 | reward +1.644 | Phase 1/3 | Rolling mean: -0.055 / 0.6 | Episodes in phase: 975
[11:19:11] Ep 1000 | reward +1.232 | Phase 1/3 | Rolling mean: -0.082 / 0.6 | Episodes in phase: 1000
[11:19:32] Ep 1025 | reward -0.330 | Phase 1/3 | Rolling mean: -0.049 / 0.6 | Episodes in phase: 1025
[11:20:58] Ep 1050 | reward +0.055 | Phase 1/3 | Rolling mean: -0.055 / 0.6 | Episodes in phase: 1050
[11:21:05] Ep 1075 | reward -0.634 | Phase 1/3 | Rolling mean: -0.011 / 0.6 | Episodes in phase: 1075
[11:21:11] Ep 1100 | reward -1.339 | Phase 1/3 | Rolling mean: 0.009 / 0.6 | Episodes in phase: 1100
[11:21:35] Ep 1125 | reward +0.593 | Phase 1/3 | Rolling mean: -0.020 / 0.6 | Episodes in phase: 1125
[11:21:43] Ep 1150 | reward -1.622 | Phase 1/3 | Rolling mean: -0.062 / 0.6 | Episodes in phase: 1150
[11:21:50] Ep 1175 | reward -0.510 | Phase 1/3 | Rolling mean: -0.104 / 0.6 | Episodes in phase: 1175
[11:23:03] Ep 1200 | reward -2.411 | Phase 1/3 | Rolling mean: -0.054 / 0.6 | Episodes in phase: 1200
[11:23:24] Ep 1225 | reward +1.149 | Phase 1/3 | Rolling mean: 0.051 / 0.6 | Episodes in phase: 1225
[11:23:30] Ep 1250 | reward +0.250 | Phase 1/3 | Rolling mean: 0.083 / 0.6 | Episodes in phase: 1250
[11:23:37] Ep 1275 | reward +0.860 | Phase 1/3 | Rolling mean: 0.109 / 0.6 | Episodes in phase: 1275
[11:23:43] Ep 1300 | reward +2.844 | Phase 1/3 | Rolling mean: 0.121 / 0.6 | Episodes in phase: 1300
[11:24:05] Ep 1325 | reward +0.399 | Phase 1/3 | Rolling mean: 0.196 / 0.6 | Episodes in phase: 1325
[11:24:10] Ep 1350 | reward -0.790 | Phase 1/3 | Rolling mean: 0.311 / 0.6 | Episodes in phase: 1350
[11:24:15] Ep 1375 | reward -1.641 | Phase 1/3 | Rolling mean: 0.390 / 0.6 | Episodes in phase: 1375
[11:25:23] Ep 1400 | reward +0.059 | Phase 1/3 | Rolling mean: 0.465 / 0.6 | Episodes in phase: 1400
[11:25:41] Ep 1425 | reward +0.962 | Phase 1/3 | Rolling mean: 0.427 / 0.6 | Episodes in phase: 1425
[11:25:48] Ep 1450 | reward +0.184 | Phase 1/3 | Rolling mean: 0.502 / 0.6 | Episodes in phase: 1450
[11:25:53] Ep 1475 | reward +2.529 | Phase 1/3 | Rolling mean: 0.597 / 0.6 | Episodes in phase: 1475
[11:25:54] Ep 1478 | reward +3.360 | Phase 2/3 | Rolling mean: 0.000 / 1.0 | Episodes in phase: 0
[11:25:59] Ep 1500 | reward -0.227 | Phase 2/3 | Rolling mean: 0.121 / 1.0 | Episodes in phase: 22
[11:26:17] Periodic save at step 5,000 ...
[11:26:20] Periodic push done - 5 files at step 5,000
[11:26:24] Ep 1525 | reward +0.519 | Phase 2/3 | Rolling mean: 0.336 / 1.0 | Episodes in phase: 47
[11:26:31] Ep 1550 | reward -1.655 | Phase 2/3 | Rolling mean: 0.464 / 1.0 | Episodes in phase: 72
[11:27:44] Ep 1575 | reward +1.494 | Phase 2/3 | Rolling mean: 0.580 / 1.0 | Episodes in phase: 97
[11:27:51] Ep 1600 | reward +1.042 | Phase 2/3 | Rolling mean: 0.682 / 1.0 | Episodes in phase: 122
[11:28:15] Ep 1625 | reward +0.742 | Phase 2/3 | Rolling mean: 0.627 / 1.0 | Episodes in phase: 147
[11:28:21] Ep 1650 | reward +1.428 | Phase 2/3 | Rolling mean: 0.635 / 1.0 | Episodes in phase: 172
[11:28:28] Ep 1675 | reward +0.336 | Phase 2/3 | Rolling mean: 0.634 / 1.0 | Episodes in phase: 197
[11:28:33] Ep 1700 | reward +4.152 | Phase 2/3 | Rolling mean: 0.681 / 1.0 | Episodes in phase: 222
[11:28:50] Ep 1725 | reward +2.964 | Phase 2/3 | Rolling mean: 0.746 / 1.0 | Episodes in phase: 247
[11:30:12] Ep 1750 | reward -0.352 | Phase 2/3 | Rolling mean: 0.756 / 1.0 | Episodes in phase: 272
[11:30:18] Ep 1775 | reward +3.482 | Phase 2/3 | Rolling mean: 0.721 / 1.0 | Episodes in phase: 297
[11:30:26] Ep 1800 | reward -0.045 | Phase 2/3 | Rolling mean: 0.699 / 1.0 | Episodes in phase: 322
[11:30:50] Ep 1825 | reward +2.169 | Phase 2/3 | Rolling mean: 0.783 / 1.0 | Episodes in phase: 347
[11:30:55] Ep 1850 | reward +1.839 | Phase 2/3 | Rolling mean: 0.795 / 1.0 | Episodes in phase: 372
[11:31:01] Ep 1875 | reward +0.765 | Phase 2/3 | Rolling mean: 0.870 / 1.0 | Episodes in phase: 397
[11:31:06] Ep 1900 | reward +1.146 | Phase 2/3 | Rolling mean: 0.942 / 1.0 | Episodes in phase: 422
[11:32:32] Ep 1925 | reward +1.780 | Phase 2/3 | Rolling mean: 0.934 / 1.0 | Episodes in phase: 447
[11:32:38] Ep 1950 | reward +1.365 | Phase 2/3 | Rolling mean: 1.008 / 1.0 | Episodes in phase: 472
[11:32:45] Ep 1975 | reward -1.427 | Phase 2/3 | Rolling mean: 1.078 / 1.0 | Episodes in phase: 497
[11:32:45] Ep 1978 | reward +0.503 | Phase 3/3 | Rolling mean: 0.000 / ∞ | Episodes in phase: 0
[11:32:49] Ep 2000 | reward +3.096 | Phase 3/3 | Rolling mean: 0.995 / ∞ | Episodes in phase: 22
[11:33:12] Ep 2025 | reward +2.753 | Phase 3/3 | Rolling mean: 0.718 / ∞ | Episodes in phase: 47
[11:33:18] Ep 2050 | reward +0.487 | Phase 3/3 | Rolling mean: 0.844 / ∞ | Episodes in phase: 72
[11:33:23] Ep 2075 | reward +1.654 | Phase 3/3 | Rolling mean: 0.959 / ∞ | Episodes in phase: 97
[11:33:29] Ep 2100 | reward +1.654 | Phase 3/3 | Rolling mean: 0.925 / ∞ | Episodes in phase: 122
[11:34:46] Ep 2125 | reward +0.516 | Phase 3/3 | Rolling mean: 0.959 / ∞ | Episodes in phase: 147
[11:34:52] Ep 2150 | reward +3.792 | Phase 3/3 | Rolling mean: 0.979 / ∞ | Episodes in phase: 172
[11:34:57] Ep 2175 | reward +0.073 | Phase 3/3 | Rolling mean: 0.995 / ∞ | Episodes in phase: 197
[11:35:02] Ep 2200 | reward +2.109 | Phase 3/3 | Rolling mean: 1.027 / ∞ | Episodes in phase: 222
[11:35:22] Ep 2225 | reward +1.313 | Phase 3/3 | Rolling mean: 1.061 / ∞ | Episodes in phase: 247
[11:35:27] Ep 2250 | reward +3.740 | Phase 3/3 | Rolling mean: 1.103 / ∞ | Episodes in phase: 272
[11:35:31] Ep 2275 | reward +2.000 | Phase 3/3 | Rolling mean: 1.066 / ∞ | Episodes in phase: 297
[11:35:37] Ep 2300 | reward +0.179 | Phase 3/3 | Rolling mean: 1.106 / ∞ | Episodes in phase: 322
[11:36:53] Ep 2325 | reward +1.694 | Phase 3/3 | Rolling mean: 1.085 / ∞ | Episodes in phase: 347
[11:36:59] Ep 2350 | reward -0.421 | Phase 3/3 | Rolling mean: 1.064 / ∞ | Episodes in phase: 372
[11:37:03] Ep 2375 | reward +1.838 | Phase 3/3 | Rolling mean: 1.123 / ∞ | Episodes in phase: 397
[11:37:08] Ep 2400 | reward -0.246 | Phase 3/3 | Rolling mean: 1.117 / ∞ | Episodes in phase: 422
[11:37:26] Ep 2425 | reward +3.134 | Phase 3/3 | Rolling mean: 1.167 / ∞ | Episodes in phase: 447
[11:37:31] Ep 2450 | reward -0.659 | Phase 3/3 | Rolling mean: 1.173 / ∞ | Episodes in phase: 472
[11:37:36] Ep 2475 | reward +2.264 | Phase 3/3 | Rolling mean: 1.210 / ∞ | Episodes in phase: 497
[11:37:40] Ep 2500 | reward +0.612 | Phase 3/3 | Rolling mean: 1.224 / ∞ | Episodes in phase: 522
[11:37:59] Ep 2525 | reward +0.474 | Phase 3/3 | Rolling mean: 1.223 / ∞ | Episodes in phase: 547
[11:38:36] Ep 2550 | reward -0.258 | Phase 3/3 | Rolling mean: 1.246 / ∞ | Episodes in phase: 572
[11:38:40] Ep 2575 | reward +2.700 | Phase 3/3 | Rolling mean: 1.239 / ∞ | Episodes in phase: 597
[11:38:45] Ep 2600 | reward +1.871 | Phase 3/3 | Rolling mean: 1.240 / ∞ | Episodes in phase: 622
[11:39:09] Ep 2625 | reward +1.806 | Phase 3/3 | Rolling mean: 1.325 / ∞ | Episodes in phase: 647
[11:39:13] Ep 2650 | reward +1.485 | Phase 3/3 | Rolling mean: 1.266 / ∞ | Episodes in phase: 672
[11:39:17] Ep 2675 | reward +0.647 | Phase 3/3 | Rolling mean: 1.290 / ∞ | Episodes in phase: 697
[11:39:21] Ep 2700 | reward -0.045 | Phase 3/3 | Rolling mean: 1.263 / ∞ | Episodes in phase: 722
[11:39:41] Ep 2725 | reward -0.226 | Phase 3/3 | Rolling mean: 1.311 / ∞ | Episodes in phase: 747
[11:39:44] Ep 2750 | reward +0.837 | Phase 3/3 | Rolling mean: 1.374 / ∞ | Episodes in phase: 772
[11:39:49] Ep 2775 | reward +1.761 | Phase 3/3 | Rolling mean: 1.325 / ∞ | Episodes in phase: 797
[11:40:32] Ep 2800 | reward +0.013 | Phase 3/3 | Rolling mean: 1.313 / ∞ | Episodes in phase: 822
[11:40:51] Ep 2825 | reward +0.972 | Phase 3/3 | Rolling mean: 1.223 / ∞ | Episodes in phase: 847
[11:40:56] Ep 2850 | reward +0.477 | Phase 3/3 | Rolling mean: 1.272 / ∞ | Episodes in phase: 872
[11:41:00] Ep 2875 | reward +1.260 | Phase 3/3 | Rolling mean: 1.281 / ∞ | Episodes in phase: 897
[11:41:04] Ep 2900 | reward +0.083 | Phase 3/3 | Rolling mean: 1.360 / ∞ | Episodes in phase: 922
[11:41:23] Ep 2925 | reward +0.015 | Phase 3/3 | Rolling mean: 1.454 / ∞ | Episodes in phase: 947
[11:41:29] Ep 2950 | reward +1.500 | Phase 3/3 | Rolling mean: 1.414 / ∞ | Episodes in phase: 972
[11:41:33] Ep 2975 | reward +3.524 | Phase 3/3 | Rolling mean: 1.559 / ∞ | Episodes in phase: 997
[11:41:38] Ep 3000 | reward +1.860 | Phase 3/3 | Rolling mean: 1.650 / ∞ | Episodes in phase: 1022
[11:42:41] Ep 3025 | reward +1.604 | Phase 3/3 | Rolling mean: 1.644 / ∞ | Episodes in phase: 1047
[11:42:46] Ep 3050 | reward +1.799 | Phase 3/3 | Rolling mean: 1.636 / ∞ | Episodes in phase: 1072
[11:42:50] Ep 3075 | reward +3.127 | Phase 3/3 | Rolling mean: 1.658 / ∞ | Episodes in phase: 1097
[11:42:54] Ep 3100 | reward +0.200 | Phase 3/3 | Rolling mean: 1.640 / ∞ | Episodes in phase: 1122
[11:43:12] Ep 3125 | reward +0.282 | Phase 3/3 | Rolling mean: 1.549 / ∞ | Episodes in phase: 1147
[11:43:16] Ep 3150 | reward +0.694 | Phase 3/3 | Rolling mean: 1.551 / ∞ | Episodes in phase: 1172
[11:43:21] Ep 3175 | reward +1.753 | Phase 3/3 | Rolling mean: 1.503 / ∞ | Episodes in phase: 1197
[11:43:24] Ep 3200 | reward +1.497 | Phase 3/3 | Rolling mean: 1.392 / ∞ | Episodes in phase: 1222
[11:43:43] Ep 3225 | reward +0.607 | Phase 3/3 | Rolling mean: 1.443 / ∞ | Episodes in phase: 1247
[11:44:25] Ep 3250 | reward +2.374 | Phase 3/3 | Rolling mean: 1.528 / ∞ | Episodes in phase: 1272
[11:44:30] Ep 3275 | reward +1.900 | Phase 3/3 | Rolling mean: 1.483 / ∞ | Episodes in phase: 1297
[11:44:33] Ep 3300 | reward +1.744 | Phase 3/3 | Rolling mean: 1.515 / ∞ | Episodes in phase: 1322
[11:44:53] Ep 3325 | reward +1.466 | Phase 3/3 | Rolling mean: 1.504 / ∞ | Episodes in phase: 1347
[11:44:57] Ep 3350 | reward +1.019 | Phase 3/3 | Rolling mean: 1.499 / ∞ | Episodes in phase: 1372
[11:45:00] Ep 3375 | reward -0.277 | Phase 3/3 | Rolling mean: 1.479 / ∞ | Episodes in phase: 1397
[11:45:05] Ep 3400 | reward +2.236 | Phase 3/3 | Rolling mean: 1.526 / ∞ | Episodes in phase: 1422
[11:45:25] Ep 3425 | reward +0.735 | Phase 3/3 | Rolling mean: 1.572 / ∞ | Episodes in phase: 1447
[11:45:28] Ep 3450 | reward +2.571 | Phase 3/3 | Rolling mean: 1.515 / ∞ | Episodes in phase: 1472
[11:45:32] Ep 3475 | reward -2.395 | Phase 3/3 | Rolling mean: 1.490 / ∞ | Episodes in phase: 1497
[11:46:29] Ep 3500 | reward +1.103 | Phase 3/3 | Rolling mean: 1.466 / ∞ | Episodes in phase: 1522
[11:46:49] Ep 3525 | reward +1.210 | Phase 3/3 | Rolling mean: 1.537 / ∞ | Episodes in phase: 1547
[11:46:53] Ep 3550 | reward +2.704 | Phase 3/3 | Rolling mean: 1.597 / ∞ | Episodes in phase: 1572
[11:46:57] Ep 3575 | reward +3.341 | Phase 3/3 | Rolling mean: 1.570 / ∞ | Episodes in phase: 1597
[11:46:59] Periodic save at step 10,000 ...
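
The per-episode entries in this log share a fixed field layout (timestamp, episode, reward, curriculum phase, rolling mean against a threshold, episodes in phase), so they can be parsed mechanically to find where the curriculum advanced. The following is a minimal Python sketch under that assumption. The helper names `parse_episode_line` and `phase_transitions` are hypothetical and are not part of the training script that produced the log; the sample lines are copied from the log itself.

```python
import re

# Regex for the per-episode progress entries as they appear in this log.
# The threshold field is matched loosely (\S+) because the last phase does
# not show a numeric threshold.
EPISODE_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+Ep\s+(?P<episode>\d+)\s+\|\s+"
    r"reward\s+(?P<reward>[+-]?\d+\.\d+)\s+\|\s+"
    r"Phase\s+(?P<phase>\d+)/(?P<total>\d+)\s+\|\s+"
    r"Rolling mean:\s+(?P<mean>-?\d+\.\d+)\s+/\s+(?P<threshold>\S+)\s+\|\s+"
    r"Episodes in phase:\s+(?P<in_phase>\d+)"
)


def parse_episode_line(line):
    """Return a dict of fields for an episode entry, or None for other entries."""
    m = EPISODE_RE.search(line)
    if m is None:
        return None
    return {
        "time": m["time"],
        "episode": int(m["episode"]),
        "reward": float(m["reward"]),
        "phase": int(m["phase"]),
        "rolling_mean": float(m["mean"]),
        "threshold": m["threshold"],  # kept as text; may be non-numeric
        "episodes_in_phase": int(m["in_phase"]),
    }


def phase_transitions(lines):
    """List (episode, old_phase, new_phase) for every change of phase."""
    transitions = []
    last_phase = None
    for line in lines:
        rec = parse_episode_line(line)
        if rec is None:
            continue  # skip save/push messages and startup entries
        if last_phase is not None and rec["phase"] != last_phase:
            transitions.append((rec["episode"], last_phase, rec["phase"]))
        last_phase = rec["phase"]
    return transitions


# Sample entries copied verbatim from the log above.
SAMPLE = [
    "[11:25:53] Ep 1475 | reward +2.529 | Phase 1/3 | Rolling mean: 0.597 / 0.6 | Episodes in phase: 1475",
    "[11:25:54] Ep 1478 | reward +3.360 | Phase 2/3 | Rolling mean: 0.000 / 1.0 | Episodes in phase: 0",
    "[11:26:17] Periodic save at step 5,000 ...",
    "[11:32:45] Ep 1975 | reward -1.427 | Phase 2/3 | Rolling mean: 1.078 / 1.0 | Episodes in phase: 497",
]

if __name__ == "__main__":
    for episode, old, new in phase_transitions(SAMPLE):
        print(f"Phase {old} -> {new} at episode {episode}")
```

Run against the full log, the same function would also pick up the second phase change that the entries record at episode 1978.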