---
language: en
license: mit
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent, optimized for extremely resource-constrained environments (e.g., Google Cloud `e2-micro`, Raspberry Pi, or legacy 1 vCPU hardware). It targets roughly 2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile

| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

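As a quick sanity check of the footprint, the per-layer shapes documented under "How to Load" can be summed to estimate the raw weight count and its float32 size. This is a sketch only; the exact on-disk size of the `.npy` file also includes header and pickle overhead:

```python
import numpy as np

# Per-layer shapes as documented in the "How to Load" section
shapes = [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,),
          (1, 1, 24, 6), (6,)]

params = sum(int(np.prod(s)) for s in shapes)
kib = params * 4 / 1024  # assuming float32 (4 bytes per weight)
print(params, f"{kib:.1f} KiB")
```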
## 🧬 Evolution & Training

The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.

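The reward structure above can be sketched as a per-step score over the grid channels. Note the channel indices, thresholds, and weights below are illustrative assumptions; the card states only that food rewards and lava punishes:

```python
import numpy as np

LIFE, FOOD, LAVA = 0, 1, 2  # assumed channel order from the profile table

def step_score(grid, w_food=5.0, w_lava=10.0):
    """Hypothetical per-step score: reward mass and food, punish lava contact."""
    alive = grid[..., LIFE] > 0.5
    on_food = alive & (grid[..., FOOD] > 0.5)
    on_lava = alive & (grid[..., LAVA] > 0.5)
    return alive.sum() + w_food * on_food.sum() - w_lava * on_lava.sum()

grid = np.zeros((32, 32, 6), dtype=np.float32)
grid[10, 10, LIFE] = 1.0                        # one live cell on empty ground
grid[10, 11, LIFE] = grid[10, 11, FOOD] = 1.0   # one live cell sitting on food
print(step_score(grid))  # 2 live cells + 1 food bonus = 7.0
```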
## 🚀 Deployment (Inference)

This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"

- **CPU:** 1 vCPU (shared core)
- **RAM:** < 1 GB
- **Latency:** Near-zero inference on standard x86/ARM hardware.

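The ~2 steps/sec target is easy to verify on the actual hardware with a throwaway timing harness. This is a generic sketch; `step_fn` below is a stand-in for one full simulation step (grid update plus inference):

```python
import time

def steps_per_sec(step_fn, n=20):
    """Average throughput of a zero-argument step function."""
    t0 = time.perf_counter()
    for _ in range(n):
        step_fn()
    return n / (time.perf_counter() - t0)

# Stand-in workload; swap in one real simulation step
rate = steps_per_sec(lambda: sum(range(100_000)))
print(f"{rate:.1f} steps/sec")
```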
## 🗺️ Roadmap: The 11-Day Sprint

This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)

## 🛠️ How to Load

```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array (arrays of differing shapes).
dna = np.load("godzilla_dna_latest.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3,3,6,24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1,1,24,24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1,1,24,6)
# [5] Output Bias     (6,)
```
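The card documents only the weight layout, so a NumPy-only forward pass over those shapes might look like the sketch below. The ReLU activations, "same" zero padding, and the random stand-in weights are assumptions for illustration, not the verified training-time architecture:

```python
import numpy as np

def conv2d(x, kernel, bias):
    """'Same'-padded 2D convolution: x (H,W,Cin), kernel (kh,kw,Cin,Cout)."""
    kh, kw, _, cout = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w = x.shape[:2]
    out = np.zeros((h, w, cout), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Shifted window (h,w,Cin) times tap weights (Cin,Cout)
            out += xp[i:i + h, j:j + w, :] @ kernel[i, j]
    return out + bias

def forward(grid, dna):
    h1 = np.maximum(conv2d(grid, dna[0], dna[1]), 0.0)  # ReLU assumed
    h2 = np.maximum(conv2d(h1, dna[2], dna[3]), 0.0)
    return conv2d(h2, dna[4], dna[5])  # 6 output channels

# Stand-in weights with the documented shapes; replace with the real
# np.load("godzilla_dna_latest.npy", allow_pickle=True) payload.
rng = np.random.default_rng(0)
shapes = [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]
dna = [rng.standard_normal(s).astype(np.float32) * 0.1 for s in shapes]

state = rng.random((32, 32, 6), dtype=np.float32)
out = forward(state, dna)
print(out.shape)  # (32, 32, 6)
```

Because the two inner layers are 1x1 convolutions, only the first layer mixes spatial neighbors; the rest is per-cell, which keeps CPU inference cheap.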