kgdrathan committed on
Commit
df7c076
·
verified ·
1 Parent(s): 6067794

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +12 -3
README.md CHANGED
@@ -105,15 +105,24 @@ And did and SFT to teach/align our SLM to the expected Marimo/Manim code style.<
 
 ## Links
 
-SFT Code: [train/sft_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/sft_unsloth.py)
+SFT Code:
+
+[train/sft_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/sft_unsloth.py)<br>
+[training curves](https://huggingface.co/kgdrathan/ministral-3-3b-4bit-marimo-manim/blob/main/training_curves.png)<br>
+[adapter model](https://huggingface.co/kgdrathan/ministral-3-3b-4bit-marimo-manim/)<br>
+
 RL GRPO Code: [train/grpo_unsloth.py](https://gitlab.com/kgdrathan/openenv-explainer/-/blob/main/train/grpo_unsloth.py)
-Dashboard for interacting with the environment: [explainer-env-dashboard](https://kgdrathan-explainer-env-dashboard.hf.space/)
+
 
 > Dashboard is for looking at logs and interacting with the environment.
+Dashboard for interacting with the environment: [explainer-env-dashboard](https://kgdrathan-explainer-env-dashboard.hf.space/)
+
 
 ## Status
 
 Completed: Environment and SFT<br>
-Remaining: RL GRPO training<br>
+Remaining: RL GRPO training (some errors in the code)<br>
+
+
 
 