plasterlabs/MOEW_SMALL

---
license: mit
---

# 🧠 NeuroGolf 2026: Ultra-Efficient ARC-AGI Solver

## 📌 Overview
This repository contains the implementation of NeuroGolf 2026, an ultra-efficient model designed to solve image transformation tasks from the Abstraction and Reasoning Corpus (ARC-AGI).

The project aims to maximize reasoning capability while staying within the extreme model-size limits required for competition submission.

---

## 🏁 Competition Constraints

The model is optimized to meet the following requirements:

- ONNX File Size Limit: ≤ 1.44 MB
- Parameter Budget:
  - ~360K parameters (Float32, 4 bytes each)
  - ~1.4M parameters (INT8 quantized, 1 byte each)
- Input/Output Shape: `(1, 10, 30, 30)` for both the input tensor and the output logits
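
The parameter budget follows from the file-size limit. Below is a minimal sketch of that arithmetic, assuming 1.44 MB means 1,440,000 bytes and that every byte stores a weight. A real ONNX file also carries graph metadata, so the usable budget is somewhat lower. `max_params` is a hypothetical helper, not part of this repository.

```python
# Assumption: 1.44 MB is read as 1,440,000 bytes (decimal megabytes).
SIZE_LIMIT_BYTES = 1_440_000

# Bytes needed to store one weight in each format.
BYTES_PER_PARAM = {"float32": 4, "int8": 1}


def max_params(dtype: str, limit_bytes: int = SIZE_LIMIT_BYTES) -> int:
    """Upper bound on the parameter count if every byte stored a weight."""
    return limit_bytes // BYTES_PER_PARAM[dtype]


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: at most {max_params(dtype):,} parameters")
```

Under these assumptions the bounds are 360,000 Float32 parameters and 1,440,000 INT8 parameters, which matches the approximate figures listed above.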

---

## 🏗️ Architecture

The system uses a Teacher–Student Distillation framework to compress high-level reasoning into a micro-scale deployable model.

### 🧠 Mega-Teacher Model (`MegaTeacherARCNet`)
- Purpose: Captures complex patterns and logic across 400+ ARC tasks
- Dimensions: 512 hidden units, 16 residual blocks deep
- Technique: Standard convolutions in a deep residual architecture, prioritizing pattern-recognition capacity over size

---

### ⚡ Student Model (`UltraTinyARCNet`)
- Purpose: Final deployable model optimized for strict size limits
- Dimensions: 56 hidden units, 5 residual blocks deep

#### 🔧 Key Techniques
- Depthwise Separable Convolutions → ~10× fewer parameters than standard convolutions
- No Bias Terms → `bias=False` in `Conv2d` to reduce the parameter count
- Residual Blocks → Maintain gradient flow in ultra-small networks
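
To see where the savings from depthwise separable convolutions come from, the sketch below counts weights for a single bias-free layer. It assumes 3×3 kernels and 56 input and output channels. The kernel size is not stated in this README, and both helper functions are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution without bias."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv, no bias."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Assumptions: 3 x 3 kernels, 56 channels in and out (the student's width).
C, K = 56, 3
std = standard_conv_params(C, C, K)
sep = depthwise_separable_params(C, C, K)
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.2f}x")
```

With these assumed values the layer shrinks from 28,224 to 3,640 weights, a reduction of about 7.75×. The ratio grows with kernel size, so the exact figure depends on the kernels the student actually uses.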

---

## 🔄 Training Pipeline

### 1️⃣ Teacher Training
- Train Mega-Teacher for 50 epochs
- Dataset: Full 400+ ARC tasks
- Augmentation: 8× (rotations + flips)
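
The 8× factor matches the eight symmetries of a square: four rotations, each with and without a mirror flip. The sketch below generates them for a grid stored as nested lists. The function names are illustrative and are not taken from this repository's code.

```python
Grid = list[list[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def augment_8x(grid: Grid) -> list[Grid]:
    """Return 8 variants: 4 rotations, each with and without a flip."""
    variants = []
    current = grid
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


for variant in augment_8x([[1, 2], [3, 4]]):
    print(variant)
```

For an ARC task, the same variant would need to be applied to both the input grid and its target output so that each training pair stays consistent.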

---

### 2️⃣ Knowledge Distillation
- The student learns from the teacher’s soft probability distributions
- This transfers “dark knowledge”: the relative probabilities the teacher assigns to incorrect classes
- Generalizes better than hard-label training
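
A common way to train on soft distributions is to soften both models' logits with a temperature and minimize the KL divergence between them. The sketch below shows that loss for a single cell's class logits in plain Python. The temperature of 2.0 and the T² scaling are conventional choices assumed here; this README does not state the actual loss or hyperparameters.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.5, 0.2]            # hypothetical teacher logits
print(softmax(teacher, temperature=1.0))
print(softmax(teacher, temperature=4.0))
print(distillation_loss([0.0, 0.0, 0.0], teacher))
print(distillation_loss(teacher, teacher))  # identical logits -> 0.0
```

Raising the temperature flattens the teacher's distribution, which makes the probabilities of the non-top classes more visible to the student.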

---

### 3️⃣ Pruning & Fine-Tuning

#### ✂️ Pruning
- Remove the 30–35% of weights with the smallest magnitude
- Method: L1 unstructured pruning
- Keeps the ONNX file under 1.44 MB
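
L1 unstructured pruning ranks individual weights by absolute value and zeroes the smallest fraction. The sketch below applies that rule to a flat list of made-up weights with `amount=0.3`. The function is a plain-Python illustration, not the repository's code. It only zeroes entries; how those zeros translate into a smaller file depends on how the weights are stored, which is not specified here.

```python
def l1_unstructured_prune(weights: list[float], amount: float) -> list[float]:
    """Zero out the fraction `amount` of weights with the smallest absolute value."""
    n_prune = round(amount * len(weights))
    # Indices sorted by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Hypothetical weights for illustration.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1, 0.07, -0.9]
pruned = l1_unstructured_prune(weights, amount=0.3)
print(pruned)
# -> [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.4, -0.1, 0.07, -0.9]
```

In PyTorch, `torch.nn.utils.prune.l1_unstructured` applies the same selection rule to a module's parameter tensor. Whether this project uses that utility is not stated.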

#### 🔧 Fine-Tuning
- 20 epochs of recovery training
- Restores performance lost during pruning
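
Recovery training usually keeps pruned weights at zero while the surviving weights adapt. The sketch below shows one masked SGD step under that assumption. The mask, gradients, and learning rate are hypothetical values, and this README does not say whether the mask is kept fixed during fine-tuning.

```python
def masked_sgd_step(weights, grads, mask, lr=0.01):
    """One SGD update that leaves pruned positions (mask == 0) at zero."""
    return [w - lr * g if m else 0.0 for w, g, m in zip(weights, grads, mask)]


weights = [0.8, 0.0, 0.3, 0.0]
mask = [1, 0, 1, 0]             # 0 marks a pruned weight
grads = [0.5, 0.2, -0.1, 0.4]   # hypothetical gradients from one batch

updated = masked_sgd_step(weights, grads, mask, lr=0.1)
print(updated)
```

Only the unmasked weights move, so the sparsity pattern chosen during pruning is preserved across the recovery epochs.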

---

## ⚙️ Installation & Usage

### 📋 Prerequisites
- Linux environment (Debian/Kali recommended)
- Python 3.10+
- NVIDIA GPU with CUDA support (optimized for a 2× T4 setup)

---

### 🛠️ Setup
```bash
pip install torch torchvision numpy onnx onnxruntime
mkdir -p data/training/
# Place ARC task JSON files in data/training/