---
language: en
license: apache-2.0
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent, optimized for **extreme resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or 1 vCPU legacy hardware) and targeting ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile

| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training

The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.
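
The training loop itself is not part of this release. As a rough illustration of the mutate-and-select scheme described above, here is a minimal NumPy sketch; the population size, mutation scale, and toy fitness function are illustrative stand-ins for the actual Life-Sim rollout:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params):
    # Stand-in for a Life-Sim rollout: reward parameters close to a
    # fixed target vector (the real fitness is survival/mass in the sim).
    target = np.linspace(-1.0, 1.0, params.size)
    return -np.sum((params - target) ** 2)

# Flat parameter vectors standing in for the agent's CNN weights
pop_size, n_params, sigma = 32, 64, 0.1
population = rng.normal(0.0, 1.0, size=(pop_size, n_params))

for generation in range(200):
    scores = np.array([fitness(p) for p in population])
    # Keep the top 25% of the population as elites
    elite = population[np.argsort(scores)[-pop_size // 4:]]
    # Refill the population with mutated copies of the elites
    parents = elite[rng.integers(0, len(elite), size=pop_size)]
    population = parents + sigma * rng.normal(size=(pop_size, n_params))

best = population[np.argmax([fitness(p) for p in population])]
```

On real hardware the fitness evaluation (the simulation rollout) dominates the cost, which is why this family of methods parallelizes well across a TPU pod: each population member can be scored independently.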
## 🚀 Deployment (Inference)

This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"

- **CPU:** 1 vCPU (shared core)
- **RAM:** < 1 GB
- **Latency:** Negligible per-step inference on standard x86/ARM hardware.

## 🗺️ Roadmap: The 11-Day Sprint

This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)

## 🛠️ How to Load

```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array (one entry per layer tensor).
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3,3,6,24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1,1,24,24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1,1,24,6)
# [5] Output Bias     (6,)
```
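
Since the weights ship as a bare array, inference takes only a few lines of NumPy. The sketch below reconstructs a plausible forward pass from the shapes listed above; the zero padding, ReLU activations, and random stand-in weights are assumptions for illustration, not taken from the training code:

```python
import numpy as np

def conv3x3_same(x, kernel, bias):
    # x: (H, W, C_in), kernel: (3, 3, C_in, C_out) -- zero-padded "same" conv
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return out + bias

def conv1x1(x, kernel, bias):
    # A 1x1 conv is a per-pixel matmul: (H, W, C_in) @ (C_in, C_out)
    return x @ kernel[0, 0] + bias

def forward(dna, grid):
    # grid: (32, 32, 6) world state; dna: the six weight arrays listed above
    h = np.maximum(conv3x3_same(grid, dna[0], dna[1]), 0.0)  # ReLU (assumed)
    h = np.maximum(conv1x1(h, dna[2], dna[3]), 0.0)          # ReLU (assumed)
    return conv1x1(h, dna[4], dna[5])                        # raw per-channel output

# Stand-in weights with the documented shapes (replace with the real .npy)
rng = np.random.default_rng(0)
dna = [rng.normal(0, 0.1, s) for s in
       [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]]
out = forward(dna, rng.random((32, 32, 6)))  # out has shape (32, 32, 6)
```

With only two hidden layers of 24 filters, a single 32x32 step is a handful of small matmuls, which is what makes the 1 vCPU / 1 GB "Potato Tier" target realistic.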