---
language: en
license: apache-2.0
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent, optimized for **extreme resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or 1 vCPU legacy hardware) and targeting ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile

| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-Layer Convolutional Neural Network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 Cellular Automata |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ Generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training

The model was trained using **JAX** and **Keras 3** on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generation 0-10k:** Discovery of basic movement.
- **Generation 10k-50k:** Mastery of "Lava Avoidance."
- **Generation 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final State:** Reached a stable equilibrium loss of approximately **-3040.00**.
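
The training loop itself is not part of this release. As a rough illustration of the mutate-and-select scheme described above, here is a minimal NumPy sketch; the population size, mutation scale, and toy fitness function are illustrative stand-ins for the actual Life-Sim rollout:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params):
    # Stand-in for a Life-Sim rollout: reward parameters close to a
    # fixed target vector (the real fitness is survival/mass in the sim).
    target = np.linspace(-1.0, 1.0, params.size)
    return -np.sum((params - target) ** 2)

# Flat parameter vectors standing in for the agent's CNN weights
pop_size, n_params, sigma = 32, 64, 0.1
population = rng.normal(0.0, 1.0, size=(pop_size, n_params))

for generation in range(200):
    scores = np.array([fitness(p) for p in population])
    # Keep the top 25% of the population as elites
    elite = population[np.argsort(scores)[-pop_size // 4:]]
    # Refill the population with mutated copies of the elites
    parents = elite[rng.integers(0, len(elite), size=pop_size)]
    population = parents + sigma * rng.normal(size=(pop_size, n_params))

best = population[np.argmax([fitness(p) for p in population])]
```

On real hardware the fitness evaluation (the simulation rollout) dominates the cost, which is why this family of methods parallelizes well across a TPU pod: each population member can be scored independently.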
## 🚀 Deployment (Inference)

This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"

- **CPU:** 1 vCPU (shared core)
- **RAM:** < 1 GB
- **Latency:** Negligible per-step inference on standard x86/ARM hardware.

## 🗺️ Roadmap: The 11-Day Sprint

This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)

## 🛠️ How to Load

```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array (one entry per layer tensor).
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3,3,6,24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1,1,24,24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1,1,24,6)
# [5] Output Bias     (6,)
```
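
Since the weights ship as a bare array, inference takes only a few lines of NumPy. The sketch below reconstructs a plausible forward pass from the shapes listed above; the zero padding, ReLU activations, and random stand-in weights are assumptions for illustration, not taken from the training code:

```python
import numpy as np

def conv3x3_same(x, kernel, bias):
    # x: (H, W, C_in), kernel: (3, 3, C_in, C_out) -- zero-padded "same" conv
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return out + bias

def conv1x1(x, kernel, bias):
    # A 1x1 conv is a per-pixel matmul: (H, W, C_in) @ (C_in, C_out)
    return x @ kernel[0, 0] + bias

def forward(dna, grid):
    # grid: (32, 32, 6) world state; dna: the six weight arrays listed above
    h = np.maximum(conv3x3_same(grid, dna[0], dna[1]), 0.0)  # ReLU (assumed)
    h = np.maximum(conv1x1(h, dna[2], dna[3]), 0.0)          # ReLU (assumed)
    return conv1x1(h, dna[4], dna[5])                        # raw per-channel output

# Stand-in weights with the documented shapes (replace with the real .npy)
rng = np.random.default_rng(0)
dna = [rng.normal(0, 0.1, s) for s in
       [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]]
out = forward(dna, rng.random((32, 32, 6)))  # out has shape (32, 32, 6)
```

With only two hidden layers of 24 filters, a single 32x32 step is a handful of small matmuls, which is what makes the 1 vCPU / 1 GB "Potato Tier" target realistic.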