---
license: apache-2.0
language:
- en
tags:
- floydnet
- diffusion-models
- ARC-AGI
---
# FloydARC (ARC-AGI Reasoning)
## Model Summary
**FloydARC** is a neural algorithmic reasoning model adapted from FloydNet for the **ARC-AGI** benchmark.
This checkpoint is trained primarily on ARC-style synthetic and curated data, and is designed to solve ARC tasks via **iterative refinement and test-time adaptation**, rather than large-scale web pretraining.
Among models trained mainly on ARC-like data, FloydARC achieves **state-of-the-art performance** on both ARC-AGI-1 (70.5) and ARC-AGI-2 (15.3), narrowing the gap to much larger proprietary models.
---
## Performance
FloydARC demonstrates strong generalization on ARC benchmarks under standard evaluation protocols.
**ARC-AGI benchmark results:**
| Model | #Params | ARC-AGI-1 | ARC-AGI-2 |
| ------------ | ------: | --------: | --------: |
| VARC | 73M | 60.4 | 11.1 |
| Loop-ViT | 11.2M | 61.2 | 10.3 |
| HRM | 27M | 40.3 | 5.0 |
| **FloydARC** | 153.7M | **70.5** | **15.3** |
---
## Model Details
* **Model ID**: `ocxlabs/FloydARC`
* **Task**: Abstraction and Reasoning Corpus (ARC-AGI)
* **Architecture**: FloydNet-based global relational reasoning with looped refinement
* **Input / Output**: ARC grid-based visual reasoning (query canvas → predicted answer canvas)
* **License**: Apache 2.0
---
## Usage: Inference & Evaluation
This checkpoint is intended for **research and evaluation use** on ARC-AGI. Full reproduction of reported results requires multi-GPU inference with test-time training.
### 1. Download checkpoint
Download the pretrained checkpoint from Hugging Face:
```
https://huggingface.co/ocxlabs/FloydARC
```
Place the downloaded folder anywhere on disk and pass its path via `--ckpt_path`.
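
The download can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the target folder `./checkpoints/FloydARC` and the helper name `download_checkpoint` are illustrative choices, not part of this repository.

```python
REPO_ID = "ocxlabs/FloydARC"


def download_checkpoint(local_dir: str = "./checkpoints/FloydARC") -> str:
    """Download every file of the model repo and return the local folder path.

    The default local_dir is a hypothetical location; any writable path works.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

The returned path is the folder to pass via `--ckpt_path`.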
---
### 2. Prepare ARC evaluation data
Place the original ARC JSON files under `rawdata/`, then preprocess:
```bash
python -m scripts.process_data \
    --input_dir ./rawdata/ARC-AGI-1_evaluation/ \
    --output_dir ./preprocessed/arc1 \
    --split test
```
For ARC-AGI-2, repeat the command with `--input_dir` pointing at the `ARC-AGI-2_evaluation` folder and a separate `--output_dir`.
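
Before preprocessing, it can help to sanity-check the raw files. The sketch below assumes the layout of the original ARC repository: one JSON file per task, each with `train` and `test` lists of pairs whose grids are lists of integer rows. The helper names `summarize_task` and `summarize_dir` are hypothetical and are not part of `scripts.process_data`.

```python
import json
from pathlib import Path


def summarize_task(task: dict) -> dict:
    """Return pair counts and (rows, cols) grid shapes for one ARC task dict."""

    def shape(grid):
        # A grid is a list of rows; every row must have the same length.
        widths = {len(row) for row in grid}
        if len(widths) != 1:
            raise ValueError("empty or ragged grid")
        return (len(grid), widths.pop())

    return {
        "n_train": len(task["train"]),
        "n_test": len(task["test"]),
        "train_shapes": [
            (shape(pair["input"]), shape(pair["output"])) for pair in task["train"]
        ],
    }


def summarize_dir(raw_dir: str) -> dict:
    """Summarize every *.json task file directly under raw_dir, keyed by file stem."""
    return {
        path.stem: summarize_task(json.loads(path.read_text()))
        for path in sorted(Path(raw_dir).glob("*.json"))
    }
```

For example, `summarize_dir("./rawdata/ARC-AGI-1_evaluation/")` would raise on a malformed grid and otherwise return one summary per task file.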
---
### 3. Run inference with Test-Time Training (recommended)
```bash
python -m scripts.TTT \
    --ckpt_path /path/to/floydarc_ckpt \
    --subset arc1 \
    --output_dir ./output/TTT_results
```
Notes:
* The default configuration uses **8 GPUs on a single node**.
* LoRA-based TTT is enabled by default and recommended.
* For ARC-AGI-2, set `--subset arc2`.
---
### 4. Ensembling & visualization
For reproducible evaluation and qualitative inspection:
```bash
python -m scripts.analyze \
    --result-folder ./output/TTT_results \
    --subset arc1 \
    --out-html output/arc1_results.html
```
Multiple result folders can be passed to enable max-voting ensembling.
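
To illustrate the idea behind max-voting, here is a minimal sketch that picks the most frequent predicted grids for a single test input. It is not the implementation in `scripts.analyze`: the function names are hypothetical, predictions are assumed to be plain lists of integer rows, and the default `top_k=2` is an assumption that can be changed.

```python
from collections import Counter


def grid_key(grid):
    """Convert a list-of-lists grid into a hashable tuple-of-tuples."""
    return tuple(tuple(row) for row in grid)


def max_vote(predictions, top_k=2):
    """Return the top_k most frequent grids among the given predictions.

    Grids with equal counts keep the order in which they were first seen.
    """
    counts = Counter(grid_key(grid) for grid in predictions)
    # Convert the winning keys back into list-of-lists grids.
    return [[list(row) for row in key] for key, _ in counts.most_common(top_k)]
```

Pooling the predictions from several runs into `predictions` before voting gives a simple ensemble: a grid that many runs agree on outranks one produced only once.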