---
language: en
license: apache-2.0
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---
## Micro-DeepRL (Gen 126k)
**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent optimized for **extremely resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or single-vCPU legacy hardware), targeting roughly 2 steps/sec on 1 vCPU and 1 GB of RAM.
## 📊 Model Profile
| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |
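As a sanity check, the parameter count in the table follows from the layer shapes listed under "How to Load" (the `.npy` file size additionally includes pickle/object-array overhead):

```python
import numpy as np

# Raw weight count from the documented layer shapes.
shapes = [(3, 3, 6, 24), (24,),    # Conv2D_1 kernel + bias
          (1, 1, 24, 24), (24,),   # Conv2D_2 kernel + bias
          (1, 1, 24, 6), (6,)]     # Output kernel + bias
n_params = int(sum(np.prod(s) for s in shapes))
print(n_params)  # 2070 weights, ~8.1 KiB as raw float32
```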
## 🧬 Evolution & Training
The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).
- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.
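The exact fitness function and mutation operator aren't published; the loop below is only a minimal (1+λ) evolution-strategy sketch over the documented weight shapes, with a placeholder `evaluate` standing in for the simulated survival/mass score:

```python
import numpy as np

rng = np.random.default_rng(0)
shapes = [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]
parent = [rng.normal(scale=0.1, size=s) for s in shapes]

def evaluate(dna):
    # Placeholder fitness: a stand-in for running the 32x32 life-sim
    # and scoring survival time / life-pixel mass.
    return -sum(np.sum(w ** 2) for w in dna)

def mutate(dna, sigma=0.02):
    # Gaussian perturbation of every weight tensor.
    return [w + rng.normal(scale=sigma, size=w.shape) for w in dna]

best_score = evaluate(parent)
for gen in range(100):
    children = [mutate(parent) for _ in range(8)]  # lambda = 8
    scores = [evaluate(c) for c in children]
    if max(scores) > best_score:          # elitist: keep parent unless beaten
        best_score = max(scores)
        parent = children[int(np.argmax(scores))]
```

Running this for ~126k generations with a real simulator-backed fitness is the shape of the training described above, not a reproduction of it.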
## 🚀 Deployment (Inference)
This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.
### Hardware Target: "Potato Tier"
- **CPU:** 1vCPU (Shared core)
- **RAM:** < 1GB
- **Latency:** Near-zero inference on standard x86/ARM hardware.
## ๐Ÿ—บ๏ธ Roadmap: The 11-Day Sprint
This model is part of a larger scaling experiment conducted over 11 days of TPU quota:
1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)
## ๐Ÿ› ๏ธ How to Load
```python
import numpy as np
# Load the 8.6KB DNA payload
# Note: Use allow_pickle=True as weights are stored as a ragged object array
dna = np.load("microDeepRL.npy", allow_pickle=True)
# Parameter shapes:
# [0] Conv2D_1 Kernel (3,3,6,24)
# [1] Conv2D_1 Bias (24,)
# [2] Conv2D_2 Kernel (1,1,24,24)
# [3] Conv2D_2 Bias (24,)
# [4] Output Kernel (1,1,24,6)
# [5] Output Bias (6,)
```
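A matching pure-NumPy forward pass is sketched below. The hidden-layer activation (ReLU) and "same" padding are assumptions, since the card does not specify them; random weights of the documented shapes are used so the sketch runs without the `.npy` file.

```python
import numpy as np

def conv3x3_same(x, kernel, bias):
    # x: (H, W, Cin), kernel: (3, 3, Cin, Cout) -> (H, W, Cout), zero-padded.
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(xp, (3, 3), axis=(0, 1))
    win = win.transpose(0, 1, 3, 4, 2)  # (H, W, 3, 3, Cin)
    return np.tensordot(win, kernel, axes=([2, 3, 4], [0, 1, 2])) + bias

def conv1x1(x, kernel, bias):
    # kernel: (1, 1, Cin, Cout) acts as a per-pixel dense layer.
    return x @ kernel[0, 0] + bias

def forward(dna, grid):
    h = np.maximum(conv3x3_same(grid, dna[0], dna[1]), 0.0)  # assumed ReLU
    h = np.maximum(conv1x1(h, dna[2], dna[3]), 0.0)
    return conv1x1(h, dna[4], dna[5])  # 6 output channels (next grid state)

# Demo with random weights of the documented shapes; in practice use
# dna = np.load("microDeepRL.npy", allow_pickle=True) instead.
rng = np.random.default_rng(0)
dna = [rng.normal(size=s).astype(np.float32) for s in
       [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]]
grid = rng.random((32, 32, 6)).astype(np.float32)
out = forward(dna, grid)  # (32, 32, 6)
```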