Update README.md
README.md
CHANGED
@@ -33,7 +33,7 @@ datasets:
 
 This checkpoint is the primary CodeT5-based solver we used for the MindsAI @ Tufa Labs entry in the ARC Prize 2025 competition. It shares the same architecture as `mindware/arc-codet5-660m-scr` (a 16-layer decoder variant of `Salesforce/codet5-large`), but *does not* include the Span-Corruption Refinement (SCR) auxiliary training stage. Instead, it represents the best non-refinement checkpoint obtained during long-horizon pretraining on TPU v4 systems.
 
-- **No SCR stage**: this model was trained purely with the original span-corruption + instruction fine-tuning curriculum
+- **No SCR stage**: this model was trained purely with the original span-corruption + instruction fine-tuning curriculum + ARC fine-tuning.
 - **Decoder-only pruning**: the original decoder depth (24) was reduced to 16 layers after experiments showed encoder pruning harmed sample efficiency, while decoder pruning could be recovered through extended training.
 - **Long-run TPU training**: training spanned roughly two years on a TPU v4-64 slice, made possible by Google’s TPU Research Cloud program.
 
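Taken together, the paragraph and the pruning bullet describe a 16-layer-decoder variant of `Salesforce/codet5-large`. Below is a minimal sketch of how such a truncation can be done with `transformers`; it assumes the pruning kept the first 16 of the 24 decoder blocks, which this card does not actually specify, and it omits the extended retraining needed to recover quality.

```python
# Sketch only: plain truncation of the decoder stack (24 -> 16 layers).
# Which 16 blocks were actually kept for this checkpoint is an assumption.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-large")

KEEP = 16                                         # target decoder depth
model.decoder.block = model.decoder.block[:KEEP]  # nn.ModuleList supports slicing
model.config.num_decoder_layers = KEEP            # keep config consistent for save/reload

# Encoder depth is untouched; only the decoder is pruned.
print(model.config.num_layers, model.config.num_decoder_layers)  # -> 24 16
```

Truncating `model.decoder.block` works because the T5 decoder simply iterates over its block list at runtime, so the pruned model runs (and, via `save_pretrained`, reloads) as a 16-layer-decoder model. As the bullet notes, pruned-decoder checkpoints then recovered quality through extended training.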