Commit bd4f36c · Parent(s): 73ff1b0
docs: update previous training run link in README for accuracy
README.md
CHANGED
@@ -32,7 +32,7 @@ pinned: false
 > · 📺 [**Walkthrough video (YouTube)**](https://youtu.be/7dz37JTTMo4)
 > · 🤗 [**Hugging Face Space**](https://huggingface.co/spaces/mitudrudutta/ChargeBackOps)
 > · 🧪 [**Latest training run (Colab — iter 5, 200 GRPO steps)**](https://colab.research.google.com/drive/1GtLH6_b10oHlAnnGq4hnBkcGJ-pE_za5?usp=sharing)
-> · 🧪 [**Previous training run (Colab — iter
+> · 🧪 [**Previous training run (Colab — iter 4, 62 GRPO steps)**](https://colab.research.google.com/drive/1AjG3Sv7FnMeOSls6JMzTunkMzlJi_ySu?usp=sharing)
 > · 🧠 [**Specification-gaming write-up**](docs/SPECIFICATION_GAMING.md)
 
 ## TL;DR (60-second read)
@@ -148,7 +148,7 @@ Pipeline: **Qwen2.5-3B fp16 + LoRA r=16** on a single Colab T4. Phase A is super
 
 - **Repo notebook (canonical):** [`notebooks/train_merchant_agent.ipynb`](notebooks/train_merchant_agent.ipynb)
 - **Latest Colab run (iter 5, 200 GRPO steps):** [open in Colab](https://colab.research.google.com/drive/1GtLH6_b10oHlAnnGq4hnBkcGJ-pE_za5?usp=sharing)
-- **Previous Colab run (iter
+- **Previous Colab run (iter 4, 62 GRPO steps):** [open in Colab](https://colab.research.google.com/drive/1AjG3Sv7FnMeOSls6JMzTunkMzlJi_ySu?usp=sharing)
 
 ### Five training iterations, three failure modes
 