---
license: mit
language:
- en
base_model:
- Salesforce/codet5-large
tags:
- ARC-AGI
- ARC
- code
datasets:
- WizardLMTeam/WizardLM_evol_instruct_V2_196k
- Open-Orca/SlimOrca
- camel-ai/math
- skeskinen/TinyStories-GPT4
- rajpurkar/squad_v2
- garage-bAInd/Open-Platypus
- Sharathhebbar24/arxiv-math-instruct-50k
- AlgorithmicResearchGroup/arxiv-physics-instruct-tune-30k
- TIGER-Lab/MathInstruct
- neoneye/histogram-comparisons-small-v1
- ise-uiuc/Magicoder-Evol-Instruct-110K
- PrimeIntellect/INTELLECT-MATH-SFT-Data
- PrimeIntellect/verifiable-math-problems
- sethapun/arithmetic_2md_1to1000
- EleutherAI/proof-pile-2
- MMInstruction/M3IT
- stingning/ultrachat
- timdettmers/openassistant-guanaco
- Dahoas/instruct-synthetic-prompt-responses
- pankajmathur/WizardLM_Orca
---

This checkpoint is the primary CodeT5-based solver we used for the MindsAI @ Tufa Labs entry in the ARC Prize 2025 competition. It shares the same architecture as `mindware/arc-codet5-660m-scr` (a 16-layer decoder variant of `Salesforce/codet5-large`), but *does not* include the Span-Corruption Refinement (SCR) auxiliary training stage. Instead, it represents the best non-refinement checkpoint obtained during long-horizon pretraining on TPU-v4 systems.

- **No SCR stage**: this model was trained purely with the original curriculum: span corruption, instruction fine-tuning, and ARC fine-tuning.
- **Decoder-only pruning**: the original decoder depth (24) was reduced to 16 layers after experiments showed encoder pruning harmed sample efficiency, while decoder pruning could be recovered through extended training.
- **Long-run TPU training**: training spanned roughly two years on a V4-64 TPU, made possible by Google’s TPU Research Cloud program.

## 📚 ARC-Related Datasets & Frameworks

- **RE-ARC** (https://github.com/michaelhodel/re-arc): Michael Hodel's repository, which procedurally generates examples for the 400 ARC training tasks. We also include RE-ARC eval and ARC 1.5 (also by Michael Hodel).
- **ConceptARC** (https://github.com/victorvikram/ConceptARC)
- **1D-ARC** (https://khalil-research.github.io/LLM4ARC/)
- **ARC_gym**
- **Sort-of-ARC**
- **Andreas Koepf**: generated many tasks based on the RE-ARC methodology using various foundation models, plus further tasks from a generator he wrote based on the icecuber solution. This set also includes extra tasks such as predicting the solution graph.
- **Jack Cole**: wrote generators for 60-80 tasks. Many were inspired by ARC items; others were large concept datasets (cellular automata, boards derived from math equations).

Many of the ARC-related tasks do not solve for the board itself (for example, generating code or predicting parameters or features of the task). There are also other, non-ARC tasks.

## ARC Data Formatting

- ARC tasks ship as JSON where each `task_id` contains `train` pairs and `test` inputs; every grid is a rectangular list of lists with integers `0-9`. Dimensions follow the original 1×1–30×30 spec, though the evaluator accepts up to 50×50.
- Example task payload:
```json
{
  "task_id": {
    "train": [
      {"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}
    ],
    "test": [
      {"input": [[0,0,0],[0,1,0],[0,0,0]]}
    ]
  }
}
```
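
As a sanity check before serializing tasks, the structural rules above can be verified in a few lines of Python. This is a minimal sketch rather than part of the released tooling: `validate_grid`, `validate_tasks`, and `MAX_SIDE` are hypothetical names, and the 30×30 default only mirrors the original spec mentioned above (pass a larger `max_side` if your evaluator allows bigger boards).

```python
import json

MAX_SIDE = 30  # assumption: original ARC limit; raise it for larger boards


def validate_grid(grid, max_side=MAX_SIDE):
    """Raise ValueError unless grid is a rectangular list of lists of ints 0-9."""
    if not isinstance(grid, list) or not grid:
        raise ValueError("grid must be a non-empty list of rows")
    if not all(isinstance(row, list) for row in grid):
        raise ValueError("every row must be a list")
    height, width = len(grid), len(grid[0])
    if not (1 <= height <= max_side and 1 <= width <= max_side):
        raise ValueError(f"grid size {height}x{width} is out of range")
    for row in grid:
        if len(row) != width:
            raise ValueError("grid is not rectangular")
        for cell in row:
            is_int = isinstance(cell, int) and not isinstance(cell, bool)
            if not is_int or not 0 <= cell <= 9:
                raise ValueError(f"invalid cell value: {cell!r}")


def validate_tasks(payload, max_side=MAX_SIDE):
    """Validate a {task_id: {"train": [...], "test": [...]}} payload.

    Returns the number of tasks checked.
    """
    for task in payload.values():
        for pair in task["train"]:
            validate_grid(pair["input"], max_side)
            validate_grid(pair["output"], max_side)
        for item in task["test"]:
            validate_grid(item["input"], max_side)
    return len(payload)


payload = json.loads("""
{"task_id": {"train": [{"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}],
             "test": [{"input": [[0,0,0],[0,1,0],[0,0,0]]}]}}
""")
print(validate_tasks(payload))
```

Test items are only checked for an `input` grid here, since the payload format above does not guarantee that test outputs are present.
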
- Model prompts (`prompt` column during training/TTT/inference) are serialized text strings: `solve: train input1 <train_input> output1 <prefix><train_output>. … test tinput1 <test_input> toutput1 `. Each grid placeholder (`<train_input>`, `<train_output>`, `<test_input>`) is produced by `grid_to_string`: each row is written as a run of concatenated digits, and rows are separated by spaces. Additional train examples increment the index (`input2`, `output2`, etc.).
- Prompt example:
```text
solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. input2 00 02 output2 5 2 2 20 22 20. test tinput1 0000 0300 0000 0000 toutput1
```
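
The serialization can be sketched in Python as follows. `grid_to_string` and the output prefix are named in this card, but the function bodies below are reconstructions inferred from the prompt example, not the original source. `build_prompt` is a hypothetical helper, and it handles a single test input only because the card does not show how several test inputs are serialized.

```python
def grid_to_string(grid):
    """Serialize a grid: digits of a row are concatenated, rows are space-separated."""
    return " ".join("".join(str(cell) for cell in row) for row in grid)


def output_prefix(grid):
    """Build '{total_chars} {height} {width} {symbols} ' for an output grid."""
    height, width = len(grid), len(grid[0])
    total_chars = height * width + (height - 1)  # digits plus row separators
    # Colors in order of first appearance when scanning row-major.
    symbols = "".join(dict.fromkeys(str(cell) for row in grid for cell in row))
    return f"{total_chars} {height} {width} {symbols} "


def build_prompt(train_pairs, test_input):
    """Assemble the prompt string for one task with a single test input."""
    examples = []
    for index, pair in enumerate(train_pairs, start=1):
        examples.append(
            f"input{index} {grid_to_string(pair['input'])} "
            f"output{index} {output_prefix(pair['output'])}"
            f"{grid_to_string(pair['output'])}."
        )
    return (
        "solve: train " + " ".join(examples)
        + f" test tinput1 {grid_to_string(test_input)} toutput1 "
    )


train_pairs = [
    {"input": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
     "output": [[1, 1, 1], [1, 0, 1], [1, 1, 1]]},
    {"input": [[0, 0], [0, 2]], "output": [[2, 2], [2, 0]]},
]
test_input = [[0, 0, 0, 0], [0, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(build_prompt(train_pairs, test_input))
```

With these inputs the sketch reproduces the prompt example above, including the trailing space after `toutput1` that the template specifies.
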
- Model targets (`correct_answer` column and expected decoder output before post-processing) follow `output_prefix` semantics: ` {total_chars} {height} {width} {symbols} {row_strings}.` Here `total_chars = height*width + (height - 1)` and `symbols` is the deduplicated sequence of colors as they are first encountered when scanning the board row-major; that rule applies to every output grid we emit (training outputs inside the prompt and the predicted test `toutput`). Example target string for a 3×3 donut:
```text
 11 3 3 10 111 101 111.
```
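
Going the other way, a decoded string can be checked against its own header before it is accepted as a grid. The parser below is an illustrative sketch under the format described above: `parse_target` is a hypothetical name, and the post-processing actually used with this model is not shown in this card.

```python
def parse_target(text):
    """Parse '{total_chars} {height} {width} {symbols} {rows}.' into a grid.

    Raises ValueError when the header and the rows disagree.
    """
    body = text.strip()
    if not body.endswith("."):
        raise ValueError("target must end with '.'")
    fields = body[:-1].split()
    if len(fields) < 5:
        raise ValueError("target needs a header and at least one row")
    total_chars, height, width = (int(field) for field in fields[:3])
    symbols, rows = fields[3], fields[4:]
    if not all(row.isdigit() for row in rows):
        raise ValueError("rows must contain digits only")
    if len(rows) != height or any(len(row) != width for row in rows):
        raise ValueError("rows do not match the declared height/width")
    # total_chars counts every digit plus the spaces between rows.
    if len(" ".join(rows)) != total_chars:
        raise ValueError("total_chars does not match the serialized rows")
    # symbols must list colors in order of first appearance (row-major).
    if "".join(dict.fromkeys("".join(rows))) != symbols:
        raise ValueError("symbols do not match the colors found in the rows")
    return [[int(char) for char in row] for row in rows]


print(parse_target(" 11 3 3 10 111 101 111."))
```

Because the header repeats information already carried by the rows, a mismatch is a cheap signal that a generation is malformed and can be rejected or retried.
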