mitudrudutta committed on
Commit bd4f36c · 1 Parent(s): 73ff1b0

docs: update previous training run link in README for accuracy

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -32,7 +32,7 @@ pinned: false
  > · 📺 [**Walkthrough video (YouTube)**](https://youtu.be/7dz37JTTMo4)
  > · 🤗 [**Hugging Face Space**](https://huggingface.co/spaces/mitudrudutta/ChargeBackOps)
  > · 🧪 [**Latest training run (Colab — iter 5, 200 GRPO steps)**](https://colab.research.google.com/drive/1GtLH6_b10oHlAnnGq4hnBkcGJ-pE_za5?usp=sharing)
- > · 🧪 [**Previous training run (Colab — iter 3, 62 GRPO steps)**](https://colab.research.google.com/drive/1AjG3Sv7FnMeOSls6JMzTunkMzlJi_ySu?usp=sharing)
+ > · 🧪 [**Previous training run (Colab — iter 4, 62 GRPO steps)**](https://colab.research.google.com/drive/1AjG3Sv7FnMeOSls6JMzTunkMzlJi_ySu?usp=sharing)
  > · 🧠 [**Specification-gaming write-up**](docs/SPECIFICATION_GAMING.md)
 
  ## TL;DR (60-second read)
@@ -148,7 +148,7 @@ Pipeline: **Qwen2.5-3B fp16 + LoRA r=16** on a single Colab T4. Phase A is super
 
  - **Repo notebook (canonical):** [`notebooks/train_merchant_agent.ipynb`](notebooks/train_merchant_agent.ipynb)
  - **Latest Colab run (iter 5, 200 GRPO steps):** [open in Colab](https://colab.research.google.com/drive/1GtLH6_b10oHlAnnGq4hnBkcGJ-pE_za5?usp=sharing)
- - **Previous Colab run (iter 3, 62 GRPO steps):** [open in Colab](https://colab.research.google.com/drive/1AjG3Sv7FnMeOSls6JMzTunkMzlJi_ySu?usp=sharing)
+ - **Previous Colab run (iter 4, 62 GRPO steps):** [open in Colab](https://colab.research.google.com/drive/1AjG3Sv7FnMeOSls6JMzTunkMzlJi_ySu?usp=sharing)
 
  ### Five training iterations, three failure modes
 