---
language: en
license: mit
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent, optimized for **extreme resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or legacy 1-vCPU hardware). The performance target is ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile
| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training
The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.
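
The generational loop above can be sketched as a simple elitist evolution strategy. This is an illustrative NumPy mock-up, not the actual TPU training code: the `fitness` function is a toy stand-in for the simulator's survival/mass score, and the mutation scale (`0.02`) and population size (`8`) are assumptions.

```python
import numpy as np

# Illustrative (1 + lambda) evolution-strategy loop. The real run evolves
# the CNN "DNA" for 126k+ generations on TPUs, scoring survival time and
# "Life" mass on the 32x32 grid; here fitness is a toy stand-in objective.
rng = np.random.default_rng(0)
shapes = [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]
parent = [0.1 * rng.standard_normal(s).astype(np.float32) for s in shapes]

def fitness(dna):
    # Toy objective standing in for the simulator score (higher is better).
    return -sum(float(np.sum(w ** 2)) for w in dna)

best = fitness(parent)
for gen in range(50):                       # the real run: 126,000+ generations
    mutants = [[w + 0.02 * rng.standard_normal(w.shape).astype(np.float32)
                for w in parent]
               for _ in range(8)]           # lambda = 8 mutants per generation
    champion = max(mutants, key=fitness)
    if fitness(champion) > best:            # elitist selection: keep the best DNA
        best, parent = fitness(champion), champion
print(round(best, 2))
```

The elitist `if` means fitness can never regress between generations, which matches the monotonic "stable equilibrium" behavior described above.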

## 🚀 Deployment (Inference)
This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"
- **CPU:** 1 vCPU (shared core)
- **RAM:** < 1 GB
- **Latency:** Near-zero inference latency on standard x86/ARM hardware.
## 🗺️ Roadmap: The 11-Day Sprint
|
| 48 |
+
This model is part of a larger scaling experiment conducted over 11 days of TPU quota:
|
| 49 |
+
|
| 50 |
+
1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
|
| 51 |
+
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
|
| 52 |
+
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
|
| 53 |
+
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)
|
| 54 |
+
|
## 🛠️ How to Load
```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored as a
# ragged object array (one entry per layer tensor).
dna = np.load("godzilla_dna_latest.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3, 3, 6, 24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1, 1, 24, 24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1, 1, 24, 6)
# [5] Output Bias     (6,)
```