---
language: en
license: apache-2.0
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---
# Micro-DeepRL (Gen 126k)
Micro-DeepRL is a high-efficiency, neuroevolutionary "Life-Sim" agent, optimized for extremely resource-constrained environments (e.g., Google Cloud e2-micro, Raspberry Pi, or legacy 1-vCPU hardware). It targets ~2 steps/sec on 1 vCPU with 1 GB of RAM.
## Model Profile
| Feature | Specification |
|---|---|
| Model Architecture | 3-Layer Convolutional Neural Network |
| Parameters | ~2,070 (derived from the weight shapes below) |
| Grid Resolution | 32x32 Cellular Automata |
| Input Channels | 6 (Life, Food, Lava, 3x Signaling) |
| Training Time | 126,000+ Generations |
| Compute | 16x Google Cloud TPU v5e (TRC Program) |
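The parameter count follows directly from the weight shapes listed in the loading section. A quick sanity check in NumPy:

```python
import numpy as np

# Weight shapes as documented in the "How to Load" section
shapes = [
    (3, 3, 6, 24), (24,),    # Conv2D_1 kernel + bias
    (1, 1, 24, 24), (24,),   # Conv2D_2 kernel + bias
    (1, 1, 24, 6), (6,),     # Output kernel + bias
]
total = sum(int(np.prod(s)) for s in shapes)
print(total)  # 2070
# At 4 bytes per float32 weight, that is roughly 8.1 KB of raw
# parameters; .npy/object-array overhead accounts for the rest of
# the 8.6 KB file size.
```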
## Evolution & Training
The model was trained using JAX and Keras 3 on a TPU pod. The agent's goal is to maximize "Life" pixels while navigating a grid containing food (reward) and lava (punishment).
- Generation 0-10k: Discovery of basic movement.
- Generation 10k-50k: Mastery of "Lava Avoidance."
- Generation 50k-126k: Optimization of "Food Camping" and mass stabilization.
- Final State: Converged to a stable equilibrium with a loss of approximately -3040.
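The objective described above (grow Life mass, collect food, avoid lava) can be sketched as a per-step fitness function. This is a hypothetical illustration only: the channel indices follow the order listed in the profile table, but the actual reward weights used in training are not published.

```python
import numpy as np

# Assumed channel layout from the profile table: [Life, Food, Lava, signals...]
LIFE, FOOD, LAVA = 0, 1, 2

def step_fitness(grid, food_reward=1.0, lava_penalty=5.0):
    """Reward living mass and food overlap; punish life standing on lava.

    grid: (H, W, 6) float array. The weights here are illustrative, not
    the values used in the actual training run.
    """
    life = grid[..., LIFE]
    return (life.sum()
            + food_reward * (life * grid[..., FOOD]).sum()
            - lava_penalty * (life * grid[..., LAVA]).sum())

rng = np.random.default_rng(0)
grid = (rng.random((32, 32, 6)) > 0.8).astype(np.float32)
print(step_fitness(grid))
```

In a neuroevolution loop, fitness like this would be accumulated over an episode and used to rank candidate weight vectors for selection.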
## Deployment (Inference)
This model is distributed as a raw NumPy (.npy) array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.
### Hardware Target: "Potato Tier"
- CPU: 1vCPU (Shared core)
- RAM: < 1GB
- Latency: Near-instant single-step inference on standard x86/ARM hardware.
## Roadmap: The 11-Day Sprint
This model is part of a larger scaling experiment conducted over 11 days of TPU quota:
- Micro-DeepRL (Gen 126k) - Current (32x32 Grid, 24 Filters)
- DeepRL-Standard - Active (64x64 Grid, 64 Filters, Metabolism Decay)
- DeepRL-Large - Planned (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
- DeepRL-Max - Planned (Transformer/Attention-based Strategist, H200 Target)
## How to Load
```python
import numpy as np

# Load the 8.6 KB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array (one entry per layer).
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# dna[0] Conv2D_1 kernel (3, 3, 6, 24)
# dna[1] Conv2D_1 bias   (24,)
# dna[2] Conv2D_2 kernel (1, 1, 24, 24)
# dna[3] Conv2D_2 bias   (24,)
# dna[4] Output kernel   (1, 1, 24, 6)
# dna[5] Output bias     (6,)
```
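A framework-free forward pass can then be written in plain NumPy. The sketch below assumes "same" padding and ReLU activations on the hidden layers (the card does not document the activations, so treat those as assumptions); random weights with the documented shapes stand in for the real `dna` payload.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights with the documented shapes; in practice these
# come from np.load("microDeepRL.npy", allow_pickle=True).
w1 = rng.normal(size=(3, 3, 6, 24)).astype(np.float32)
b1 = np.zeros(24, dtype=np.float32)
w2 = rng.normal(size=(1, 1, 24, 24)).astype(np.float32)
b2 = np.zeros(24, dtype=np.float32)
w3 = rng.normal(size=(1, 1, 24, 6)).astype(np.float32)
b3 = np.zeros(6, dtype=np.float32)

def conv2d_same(x, kernel, bias):
    """Naive 'same'-padded 2D convolution: x is (H, W, Cin), kernel (kh, kw, Cin, Cout)."""
    kh, kw, _, cout = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w = x.shape[:2]
    out = np.empty((h, w, cout), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + kh, j:j + kw, :]           # (kh, kw, Cin) window
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out

grid = rng.random((32, 32, 6)).astype(np.float32)       # 6-channel CA state
h = np.maximum(conv2d_same(grid, w1, b1), 0)            # Conv 3x3 + ReLU (assumed)
h = np.maximum(conv2d_same(h, w2, b2), 0)               # Conv 1x1 + ReLU (assumed)
out = conv2d_same(h, w3, b3)                            # Output map, (32, 32, 6)
print(out.shape)  # (32, 32, 6)
```

The loop-based convolution is deliberately simple; it is slow but dependency-free, which matches the "potato tier" deployment target.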