---
license: apache-2.0
language:
- en
tags:
- floydnet
- diffusion-models
- ARC-AGI
---
|
|
# FloydARC (ARC-AGI Reasoning)
|
|
## Model Summary

**FloydARC** is a neural algorithmic reasoning model adapted from FloydNet for the **ARC-AGI** benchmark.
This checkpoint is trained primarily on ARC-style synthetic and curated data rather than large-scale web pretraining, and it solves ARC tasks through **iterative refinement and test-time adaptation**.

Among models trained mainly on ARC-like data, FloydARC achieves **state-of-the-art performance** on both ARC-AGI-1 and ARC-AGI-2, significantly narrowing the gap to very large proprietary models.
|
|
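---

## Performance

FloydARC demonstrates strong generalization on ARC benchmarks under standard evaluation protocols. In the comparison below, it leads the next-best listed model by 9.3 points on ARC-AGI-1 (70.5 vs. 61.2 for Loop-ViT) and by 4.2 points on ARC-AGI-2 (15.3 vs. 11.1 for VARC).

**ARC-AGI benchmark results:**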
|
|
| Model        | #Params | ARC-AGI-1 | ARC-AGI-2 |
| ------------ | ------: | --------: | --------: |
| VARC         |     73M |      60.4 |      11.1 |
| Loop-ViT     |   11.2M |      61.2 |      10.3 |
| HRM          |     27M |      40.3 |       5.0 |
| **FloydARC** |  153.7M |  **70.5** |  **15.3** |
|
|
|
|
|
|
---

## Model Details

* **Model ID**: `ocxlabs/FloydARC`
* **Task**: Abstraction and Reasoning Corpus (ARC-AGI)
* **Architecture**: FloydNet-based global relational reasoning with looped refinement
* **Input / Output**: ARC grid-based visual reasoning (query canvas → predicted answer canvas)
* **License**: Apache 2.0
|
|
---

## Usage: Inference & Evaluation

This checkpoint is intended for **research and evaluation use** on ARC-AGI. Full reproduction of reported results requires multi-GPU inference with test-time training.

### 1. Download checkpoint

Download the pretrained checkpoint from Hugging Face:

```
https://huggingface.co/ocxlabs/FloydARC
```

Place the downloaded folder anywhere on disk and pass its path via `--ckpt_path`.
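
As an alternative to downloading through the browser, the checkpoint can probably be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed. `download_checkpoint` and the `./checkpoints/FloydARC` target directory are hypothetical names chosen for illustration, and it is an assumption that the downloaded folder layout is exactly what `--ckpt_path` expects.

```python
from pathlib import Path

REPO_ID = "ocxlabs/FloydARC"  # model ID from the Model Details section


def download_checkpoint(local_dir: str = "./checkpoints/FloydARC") -> Path:
    """Download every file in the model repo and return the local folder.

    The default target directory is a hypothetical choice, not a path
    required by the FloydARC scripts.
    """
    # Imported lazily so this module can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` returns the local folder, which can then be passed to the scripts below via `--ckpt_path`.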
|
|
---

### 2. Prepare ARC evaluation data

Place the original ARC JSON files under `rawdata/`, then preprocess:

```bash
python -m scripts.process_data \
  --input_dir ./rawdata/ARC-AGI-1_evaluation/ \
  --output_dir ./preprocessed/arc1 \
  --split test
```

Repeat with `ARC-AGI-2_evaluation` for ARC-AGI-2.
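
Before preprocessing, it can help to sanity-check the raw files. The sketch below assumes they follow the public ARC task format: one JSON file per task, each an object with `train` and `test` lists of `input`/`output` grid pairs, where a grid is a list of rows of integers. `summarize_task` and `summarize_dir` are hypothetical helpers and are not part of `scripts.process_data`.

```python
import json
from pathlib import Path


def summarize_task(task: dict) -> dict:
    """Return pair counts and (rows, cols) grid shapes for one ARC task."""
    def shape(grid):
        return (len(grid), len(grid[0]) if grid else 0)

    return {
        "num_train": len(task["train"]),
        "num_test": len(task["test"]),
        "train_shapes": [
            (shape(pair["input"]), shape(pair["output"])) for pair in task["train"]
        ],
    }


def summarize_dir(input_dir: str) -> dict:
    """Summarize every *.json task file in a directory, keyed by file stem."""
    return {
        path.stem: summarize_task(json.loads(path.read_text()))
        for path in sorted(Path(input_dir).glob("*.json"))
    }


# Tiny in-memory task in the assumed format (not real ARC data).
example = {
    "train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
    "test": [{"input": [[0, 0], [1, 1]], "output": [[1, 1], [0, 0]]}],
}
summary = summarize_task(example)
```

Running `summarize_dir("./rawdata/ARC-AGI-1_evaluation/")` would then give a quick overview of task counts and grid sizes, provided the directory matches the assumed layout.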
|
|
---

### 3. Run inference with Test-Time Training (recommended)

```bash
python -m scripts.TTT \
  --ckpt_path /path/to/floydarc_ckpt \
  --subset arc1 \
  --output_dir ./output/TTT_results
```

Notes:

* The default configuration uses **8 GPUs on a single node**.
* LoRA-based TTT is enabled by default and is the recommended setting.
* For ARC-AGI-2, set `--subset arc2`.
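
To evaluate both benchmarks in one go, the command above can be wrapped in a small driver. This is a sketch that uses only the flags shown in this section. The helper names and the per-subset output folders (`TTT_results_arc1`, `TTT_results_arc2`) are hypothetical conventions, not something the repository prescribes.

```python
import subprocess


def build_ttt_command(ckpt_path: str, subset: str, output_root: str = "./output") -> list:
    """Build the TTT command line for one subset as an argument list."""
    if subset not in ("arc1", "arc2"):
        raise ValueError(f"unknown subset: {subset!r}")
    return [
        "python", "-m", "scripts.TTT",
        "--ckpt_path", ckpt_path,
        "--subset", subset,
        # Hypothetical naming scheme: one output folder per subset.
        "--output_dir", f"{output_root}/TTT_results_{subset}",
    ]


def run_all(ckpt_path: str) -> None:
    """Run TTT on ARC-AGI-1, then ARC-AGI-2, stopping on the first failure."""
    for subset in ("arc1", "arc2"):
        subprocess.run(build_ttt_command(ckpt_path, subset), check=True)
```

Keeping the command construction separate from `subprocess.run` makes it easy to print and inspect the exact invocation before committing GPU time to it.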
|
|
---

### 4. Ensembling & visualization

For reproducible evaluation and qualitative inspection, run the analysis script on the inference results:

```bash
python -m scripts.analyze \
  --result-folder ./output/TTT_results \
  --subset arc1 \
  --out-html output/arc1_results.html
```

Multiple result folders can be passed to enable max-voting ensembling.
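
The idea behind max-voting can be sketched as follows: for each test input, keep the predicted grid that the most runs agree on. This is a minimal illustration and not the logic of `scripts.analyze`. The `max_vote` helper, the in-memory prediction format, and the tie-breaking rule (the earliest prediction wins) are all assumptions.

```python
from collections import Counter


def max_vote(predictions):
    """Return the most frequent grid among predictions (lists of rows of ints).

    Ties are broken in favour of the grid that appears first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    # Grids are nested lists, so convert them to tuples to make them hashable.
    keys = [tuple(tuple(row) for row in grid) for grid in predictions]
    counts = Counter(keys)
    # Highest count wins; among equal counts, the smallest first index wins.
    best = max(counts, key=lambda k: (counts[k], -keys.index(k)))
    return [list(row) for row in best]


# Three hypothetical runs predicting the same 2x2 answer canvas.
run_a = [[1, 0], [0, 1]]
run_b = [[1, 1], [0, 1]]
run_c = [[1, 0], [0, 1]]
winner = max_vote([run_a, run_b, run_c])
```

Here `run_a` and `run_c` agree, so their grid outvotes `run_b`. Voting on whole grids rather than individual cells matches how ARC answers are scored, since a prediction only counts if the entire canvas is correct.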