Upload folder using huggingface_hub
README.md
CHANGED
@@ -105,15 +105,24 @@ And did and SFT to teach/align our SLM to the expected Marimo/Manim code style.<

 ## Links

-SFT Code:
+SFT Code:
+
+[train/sft_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/sft_unsloth.py)<br>
+[training curves](https://huggingface.co/kgdrathan/ministral-3-3b-4bit-marimo-manim/blob/main/training_curves.png)<br>
+[adapter model](https://huggingface.co/kgdrathan/ministral-3-3b-4bit-marimo-manim/)<br>
+
 RL GRPO Code: [train/grpo_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/grpo_unsloth.py)
-
+

 > Dashboard is for looking at logs and interacting with the environment.
+Dashboard for interacting with the environment: [explainer-env-dashboard](https://kgdrathan-explainer-env-dashboard.hf.space/)
+

 ## Status

 Completed: Environment and SFT<br>
-Remaining: RL GRPO training<br>
+Remaining: RL GRPO training (some errors in the code)<br>
+
+