---
language: en
license: apache-2.0
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent optimized for **extremely resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or legacy 1 vCPU hardware), with a target throughput of ~2 steps/sec on 1 vCPU and 1 GB of RAM.

## 📊 Model Profile
| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training
The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.

## 🚀 Deployment (Inference)
This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"
- **CPU:** 1vCPU (Shared core)
- **RAM:** < 1GB
- **Latency:** Near-zero inference on standard x86/ARM hardware.

## 🗺️ Roadmap: The 11-Day Sprint
This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1.  **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
2.  **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
3.  **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4.  **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)

## 🛠️ How to Load
```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array (one entry per parameter tensor).
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3, 3, 6, 24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1, 1, 24, 24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1, 1, 24, 6)
# [5] Output Bias     (6,)
```
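
The card does not ship an inference script, so the following is a minimal sketch of a pure-NumPy forward pass over the parameter list above. The `conv2d_same` helper, the ReLU activations, and the random stand-in weights are assumptions for illustration only; the actual Life-Sim update rule and activation functions may differ.

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """Naive 'same'-padded 2D convolution. x: (H, W, C_in), kernel: (kh, kw, C_in, C_out)."""
    kh, kw, _, cout = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, cout))
    for i in range(kh):
        for j in range(kw):
            # Shifted (H, W, C_in) window contracted against the (C_in, C_out) slice.
            out += np.tensordot(xp[i:i + H, j:j + W, :], kernel[i, j], axes=([2], [0]))
    return out + bias

def forward(grid, dna):
    # dna entries follow the shapes listed above; ReLU between layers is an assumption.
    h = np.maximum(conv2d_same(grid, dna[0], dna[1]), 0.0)
    h = np.maximum(conv2d_same(h, dna[2], dna[3]), 0.0)
    return conv2d_same(h, dna[4], dna[5])

# Demo with random weights in the documented shapes (a stand-in for the .npy file).
rng = np.random.default_rng(0)
shapes = [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]
dna = [rng.standard_normal(s).astype(np.float32) for s in shapes]
grid = rng.standard_normal((32, 32, 6)).astype(np.float32)  # 32x32 grid, 6 channels
out = forward(grid, dna)
print(out.shape)  # (32, 32, 6)
```

Because every layer after the first uses 1x1 kernels, the heavy work is a handful of small `tensordot` contractions, which is what makes a ~2.3k-parameter model plausible on a shared 1 vCPU target.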