Upload ARC-IT model checkpoint

Files changed (4)

.gitattributes +1 -0
README.md +55 -0
config.json +111 -0
model.pt +3 -0

.gitattributes ADDED

	@@ -0,0 +1 @@


+model.pt filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,55 @@

+---
+tags:
+  - arc-agi
+  - abstract-reasoning
+  - rule-conditioned-transformer
+  - discrete-reasoning
+license: mit
+---
+# ARC-IT: Rule-Conditioned Transformer for ARC-AGI
+ARC-IT is a transformer architecture for abstract reasoning tasks (ARC-AGI). It explicitly extracts
+transformation rules from demonstration pairs and applies them to new inputs, using four components:
+- **GridTokenizer** -- Embeds discrete ARC grids (0-11) into continuous patch tokens
+- **RuleEncoder** -- Extracts transformation rules from demo input/output pairs via cross-attention
+- **RuleApplier** -- Applies the learned rules to a test input via cross-attention
+- **SpatialDecoder** -- Converts output tokens to 64x64 grid logits
+## Architecture
+```
+Demo Pairs -> GridTokenizer -> RuleEncoder (cross-attention + aggregation) -> Rule Tokens
+Test Input  -> GridTokenizer -> RuleApplier (cross-attention to rules) -> SpatialDecoder -> Predicted Grid
+```
+## Training
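+- **2-stage training**: Full Training -> Hard Focus (AGI-2 oversampling, `agi2_oversample: 2.0`). Note that `config.json` lists three stages: `pretrain`, `finetune`, and `hard_focus`.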
+- **Test-Time Training (TTT)**: Per-task fine-tuning on demonstration examples
+## Model Details
+- **Training step**: 18000
+- **Best validation accuracy**: 0.733029360572497
+- **Hidden size**: 384
+- **Rule Encoder**: 2 pair layers, 2 agg layers, 64 rule tokens
+- **Rule Applier**: 4 layers, 8 heads
+- **Canvas size**: 64
+## Usage
+```python
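+import json
+import torch
+from arc_it.models.arc_it_model import ARCITModel
+# Load config.json from this commit (assumes from_config accepts the parsed dict).
+with open("config.json") as f:
+    config = json.load(f)
+model = ARCITModel.from_config(config)
+# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
+ckpt = torch.load("model.pt", map_location="cpu", weights_only=False)
+model.load_state_dict(ckpt["model_state_dict"])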
+```
+## Links
+- **Repository**: [github.com/REDDITARUN/arc_it](https://github.com/REDDITARUN/arc_it)
+- **ARC-AGI**: [arcprize.org](https://arcprize.org)
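
Note on the README above: the shapes it implies can be sanity-checked from the values in this commit's `config.json` (`canvas_size: 64`, `patch_size: 4`, `hidden_size: 384`, `num_heads: 8`, `mlp_ratio: 2.5`). The sketch below is only arithmetic. It assumes the GridTokenizer uses non-overlapping square patches and that attention heads split the hidden size evenly, neither of which the diff confirms. The helper name `derived_shapes` is hypothetical and not part of the `arc_it` package.

```python
def derived_shapes(canvas_size: int, patch_size: int, hidden_size: int,
                   num_heads: int, mlp_ratio: float) -> dict:
    """Derive token counts and layer widths from config values.

    Assumptions (not confirmed by the repository): patches are square and
    non-overlapping, and heads divide the hidden size evenly.
    """
    if canvas_size % patch_size != 0:
        raise ValueError("canvas_size must be divisible by patch_size")
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    patches_per_side = canvas_size // patch_size
    return {
        "patches_per_side": patches_per_side,
        "tokens_per_grid": patches_per_side ** 2,
        "head_dim": hidden_size // num_heads,
        "mlp_hidden": int(hidden_size * mlp_ratio),
    }


# Values copied from config.json in this commit.
shapes = derived_shapes(canvas_size=64, patch_size=4, hidden_size=384,
                        num_heads=8, mlp_ratio=2.5)
print(shapes)
```

Under these assumptions each 64x64 canvas becomes 256 patch tokens, which is four times the 64 rule tokens produced by the RuleEncoder.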

config.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "data": {
+    "arc_agi1_path": "References/ARC-AGI",
+    "arc_agi2_path": "References/ARC-AGI-2",
+    "re_arc_path": "References/RE-ARC",
+    "canvas_size": 64,
+    "num_colors": 12,
+    "max_grid_size": 30,
+    "max_demos": 5,
+    "re_arc_samples_per_task": 50,
+    "repeat_factor": 1,
+    "augmentation": {
+      "geometric": true,
+      "color_permutation": true,
+      "num_color_perms": 10,
+      "keep_background": true,
+      "resolution_scaling": true,
+      "translation": true
+    }
+  },
+  "model": {
+    "hidden_size": 384,
+    "mlp_ratio": 2.5,
+    "tokenizer": {
+      "patch_size": 4
+    },
+    "rule_encoder": {
+      "pair_layers": 2,
+      "agg_layers": 2,
+      "num_heads": 8,
+      "num_rule_tokens": 64
+    },
+    "rule_applier": {
+      "num_layers": 4,
+      "num_heads": 8
+    },
+    "decoder": {
+      "upsample_method": "transposed_conv",
+      "hidden_channels": [
+        192,
+        96
+      ]
+    }
+  },
+  "training": {
+    "batch_size": 64,
+    "num_workers": 8,
+    "gradient_clip": 1.0,
+    "stage1": {
+      "name": "pretrain",
+      "data_sources": [
+        "re_arc"
+      ],
+      "epochs": 50,
+      "lr": 0.0003
+    },
+    "stage2": {
+      "name": "finetune",
+      "data_sources": [
+        "agi1",
+        "agi2"
+      ],
+      "epochs": 30,
+      "lr": 0.0001
+    },
+    "stage3": {
+      "name": "hard_focus",
+      "data_sources": [
+        "agi1",
+        "agi2"
+      ],
+      "epochs": 10,
+      "lr": 3e-05,
+      "agi2_oversample": 2.0
+    },
+    "optimizer": {
+      "name": "adamw",
+      "weight_decay": 0.01,
+      "betas": [
+        0.9,
+        0.999
+      ]
+    },
+    "scheduler": {
+      "name": "cosine",
+      "warmup_ratio": 0.1
+    },
+    "log_every_n_steps": 100,
+    "save_every_n_epochs": 10,
+    "checkpoint_dir": "checkpoints"
+  },
+  "ttt": {
+    "enabled": true,
+    "steps": 100,
+    "lr": 0.0001,
+    "batch_size": 8,
+    "num_candidates": 32
+  },
+  "evaluation": {
+    "val_split_ratio": 0.1,
+    "val_data_sources": [
+      "agi1",
+      "agi2"
+    ],
+    "metrics": [
+      "pixel_accuracy",
+      "grid_exact_match"
+    ],
+    "visualize_every_n_tasks": 50
+  }
+}
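
Note on the `scheduler` block above: the config names a `cosine` schedule with `warmup_ratio: 0.1` but does not say how warmup or decay are shaped. The following is a minimal sketch of one common reading, assuming linear warmup over the first 10% of steps followed by cosine decay to zero. The function name `lr_at_step` and the decay-to-zero floor are assumptions, not taken from the repository.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float,
               warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step (assumed linear warmup + cosine decay)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from base_lr down to 0 (the floor of 0 is an assumption).
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Example with the stage1 learning rate from config.json (lr = 0.0003)
# and a hypothetical run length of 1000 steps.
for s in (0, 99, 100, 550, 1000):
    print(s, lr_at_step(s, total_steps=1000, base_lr=3e-4))
```

The same function applies to `stage2` and `stage3` by swapping in their `lr` values (`0.0001` and `3e-05`).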

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e9cbf16263cf79d22e6f219cb5e11110d6dc6ed84aedb204222900ce9727f119
+size 68062850
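
Note on the pointer above: `model.pt` is stored through Git LFS, so the diff shows only a three-line pointer (spec version, SHA-256 object ID, size in bytes). A downloaded checkpoint can be checked against it before loading. The sketch below uses only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and hash."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large checkpoints are not held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the model.pt diff in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e9cbf16263cf79d22e6f219cb5e11110d6dc6ed84aedb204222900ce9727f119
size 68062850
"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

With the checkpoint downloaded locally, `matches_pointer("model.pt", pointer)` should return `True` if the file is intact.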