---
language: en
license: mit
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent, optimized for **extreme resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or legacy 1-vCPU hardware). The performance target is ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile
| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training
The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.
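
The generational loop above can be sketched as a simple elitist evolution strategy. This is an illustrative NumPy mock-up, not the actual TPU training code: the `fitness` function is a toy stand-in for the simulator's survival/mass score, and the mutation scale (`0.02`) and population size (`8`) are assumptions.

```python
import numpy as np

# Illustrative (1 + lambda) evolution-strategy loop. The real run evolves
# the CNN "DNA" for 126k+ generations on TPUs, scoring survival time and
# "Life" mass on the 32x32 grid; here fitness is a toy stand-in objective.
rng = np.random.default_rng(0)
shapes = [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]
parent = [0.1 * rng.standard_normal(s).astype(np.float32) for s in shapes]

def fitness(dna):
    # Toy objective standing in for the simulator score (higher is better).
    return -sum(float(np.sum(w ** 2)) for w in dna)

best = fitness(parent)
for gen in range(50):                       # the real run: 126,000+ generations
    mutants = [[w + 0.02 * rng.standard_normal(w.shape).astype(np.float32)
                for w in parent]
               for _ in range(8)]           # lambda = 8 mutants per generation
    champion = max(mutants, key=fitness)
    if fitness(champion) > best:            # elitist selection: keep the best DNA
        best, parent = fitness(champion), champion
print(round(best, 2))
```

The elitist `if` means fitness can never regress between generations, which matches the monotonic "stable equilibrium" behavior described above.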

## 🚀 Deployment (Inference)
This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"
- **CPU:** 1 vCPU (shared core)
- **RAM:** < 1 GB
- **Latency:** Near-zero inference latency on standard x86/ARM hardware.
## 🗺️ Roadmap: The 11-Day Sprint
|
| 48 |
+
This model is part of a larger scaling experiment conducted over 11 days of TPU quota:
|
| 49 |
+
|
| 50 |
+
1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
|
| 51 |
+
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
|
| 52 |
+
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
|
| 53 |
+
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)
|
| 54 |
+
|
## 🛠️ How to Load
```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored as a
# ragged object array (one entry per layer tensor).
dna = np.load("godzilla_dna_latest.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3, 3, 6, 24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1, 1, 24, 24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1, 1, 24, 6)
# [5] Output Bias     (6,)
```