Spaces: kgdrathan/explainer-env
kgdrathan committed on Apr 26

Commit df7c076 (verified) · 1 Parent(s): 6067794

Upload folder using huggingface_hub

Files changed (1)

README.md CHANGED

@@ -105,15 +105,24 @@ And did and SFT to teach/align our SLM to the expected Marimo/Manim code style.<
 ## Links
-SFT Code: [train/sft_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/sft_unsloth.py)
 RL GRPO Code: [train/grpo_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/grpo_unsloth.py)
-Dashboard for interacting with the environment: [explainer-env-dashboard](https://kgdrathan-explainer-env-dashboard.hf.space/)
 > Dashboard is for looking at logs and interacting with the environment.
 ## Status
 Completed: Environment and SFT<br>
-Remaining: RL GRPO training<br>

 ## Links
+SFT Code:
+[train/sft_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/sft_unsloth.py)<br>
+[training curves](https://huggingface.co/kgdrathan/ministral-3-3b-4bit-marimo-manim/blob/main/training_curves.png)<br>
+[adapter model](https://huggingface.co/kgdrathan/ministral-3-3b-4bit-marimo-manim/)<br>
 RL GRPO Code: [train/grpo_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/grpo_unsloth.py)
 > Dashboard is for looking at logs and interacting with the environment.
+Dashboard for interacting with the environment: [explainer-env-dashboard](https://kgdrathan-explainer-env-dashboard.hf.space/)
 ## Status
 Completed: Environment and SFT<br>
+Remaining: RL GRPO training (some errors in the code)<br>