Update README.md
README.md
CHANGED
@@ -23,10 +23,9 @@ pinned: false
 | 🧠 **Merged model (deployable)** | [`Yaswanth-Bolla/qwen-merged`](https://huggingface.co/Yaswanth-Bolla/qwen-merged) |
 | 🧩 **LoRA adapter (post-GRPO)** | [`daemongg/qwen2.5-7b-sre-grpo`](https://huggingface.co/daemongg/qwen2.5-7b-sre-grpo) |
 | 🏗️ **Base model** | [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
-| 📒 **
-| 📊 **Ablation results** | [`./ablation.md`](./ablation.md) |
+| 📒 **Logs + scripts** | [`logger`](https://huggingface.co/spaces/Meta-HF-hackathon/updated-policy/tree/main/logger) |
 
-> ⚠️ **Note on training infrastructure.** We ran the full pipeline (SFT, GRPO, merge) on **HuggingFace Jobs** (A100-40GB) instead of a Colab notebook — Colab's free + Pro tiers OOM'd on the 7B base + reference model + GRPO group buffers. The **complete training logs and the exact scripts we executed** are committed under [`./logger/`](./
+> ⚠️ **Note on training infrastructure.** We ran the full pipeline (SFT, GRPO, merge) on **HuggingFace Jobs** (A100-40GB) instead of a Colab notebook — Colab's free + Pro tiers OOM'd on the 7B base + reference model + GRPO group buffers. The **complete training logs and the exact scripts we executed** are committed under [`./logger/`](https://huggingface.co/spaces/Meta-HF-hackathon/updated-policy/tree/main/logger) (`sft_finetune.log`, `grpo_finetune.log`, `merge.log`, `trajectory.log`, `ablation.log`, plus the `.py` scripts that produced them) so the run is reproducible end-to-end.
 
 ---
 
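The out-of-memory claim in the note can be sanity-checked with a rough estimate of weight memory alone. The sketch below is not taken from the committed scripts. It assumes a nominal 7e9 parameters, bf16 storage (2 bytes per parameter), a fully separate reference-model copy, and a hypothetical 16 GB GPU standing in for a smaller Colab-class card. GRPO group buffers, activations, and optimizer state are left out, so the result is only a lower bound.

```python
# Back-of-envelope estimate of GPU memory for model weights only.
# Every constant here is an illustrative assumption, not a measurement
# from the actual training run.

BYTES_PER_GB = 1e9  # decimal gigabytes, matching "A100-40GB" style naming


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB."""
    return n_params * bytes_per_param / BYTES_PER_GB


def fits(budget_gb: float, required_gb: float) -> bool:
    """True if the required memory is within the GPU budget."""
    return required_gb <= budget_gb


N_PARAMS = 7e9   # assumption: nominal "7B" parameter count
BYTES_BF16 = 2   # assumption: weights stored in bf16

policy_gb = weight_memory_gb(N_PARAMS, BYTES_BF16)
reference_gb = weight_memory_gb(N_PARAMS, BYTES_BF16)  # assumption: separate frozen copy
weights_total_gb = policy_gb + reference_gb

SMALL_GPU_GB = 16  # hypothetical smaller card, for comparison only
A100_GB = 40       # from the note: A100-40GB

print(f"policy + reference weights: {weights_total_gb:.1f} GB")
print(f"fits in {SMALL_GPU_GB} GB: {fits(SMALL_GPU_GB, weights_total_gb)}")
print(f"fits in {A100_GB} GB: {fits(A100_GB, weights_total_gb)}")
```

Under these assumptions the two weight copies already exceed the smaller budget before any group buffers or activations are counted, which is consistent with the note's decision to use the A100-40GB.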