---
language: en
license: apache-2.0
library_name: jax
tags:
  - deep-reinforcement-learning
  - neuroevolution
  - tpu
  - micro-models
datasets:
  - synthetic-life-sim
metrics:
  - survival-time
  - mass-optimization
---

# Micro-DeepRL (Gen 126k)

Micro-DeepRL is a high-efficiency, neuroevolutionary "Life-Sim" agent optimized for severely resource-constrained environments (e.g., Google Cloud e2-micro, Raspberry Pi, or legacy 1 vCPU hardware), targeting ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile

| Feature | Specification |
| --- | --- |
| Model Architecture | 3-Layer Convolutional Neural Network |
| Parameters | 2,300 (8.6 KiB) |
| Grid Resolution | 32x32 Cellular Automata |
| Input Channels | 6 (Life, Food, Lava, 3x Signaling) |
| Training Time | 126,000+ Generations |
| Compute | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training

The model was trained using JAX and Keras 3 on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

  • Generation 0-10k: Discovery of basic movement.
  • Generation 10k-50k: Mastery of "Lava Avoidance."
  • Generation 50k-126k: Optimization of "Food Camping" and mass stabilization.
  • Final State: Reached a stable equilibrium loss of approximately -3040.00.

## 🚀 Deployment (Inference)

This model is distributed as a raw NumPy (.npy) array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"

  • CPU: 1vCPU (Shared core)
  • RAM: < 1GB
  • Latency: Near-zero inference on standard x86/ARM hardware.

πŸ—ΊοΈ Roadmap: The 11-Day Sprint

This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1. **Micro-DeepRL (Gen 126k)** - Current (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - Active (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - Planned (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - Planned (Transformer/Attention-based Strategist, H200 Target)

πŸ› οΈ How to Load

```python
import numpy as np

# Load the 8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array.
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3, 3, 6, 24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1, 1, 24, 24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1, 1, 24, 6)
# [5] Output Bias     (6,)
```
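Given those shapes, a pure-NumPy forward pass can be sketched as below. The "same" padding and ReLU activations are assumptions (the card does not specify them), and random placeholder weights stand in for the real `.npy` payload:

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """Minimal 'same'-padded 2D convolution.

    x: (H, W, Cin) grid state; kernel: (kh, kw, Cin, Cout); bias: (Cout,).
    """
    kh, kw, cin, cout = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, cout))
    for i in range(kh):
        for j in range(kw):
            # Accumulate each kernel tap over all input channels.
            out += np.einsum("hwc,co->hwo", xp[i:i + H, j:j + W], kernel[i, j])
    return out + bias

def forward(dna, grid):
    """Run the 3-layer network; ReLU activations are an assumption."""
    h = np.maximum(conv2d_same(grid, dna[0], dna[1]), 0.0)
    h = np.maximum(conv2d_same(h, dna[2], dna[3]), 0.0)
    return conv2d_same(h, dna[4], dna[5])  # (32, 32, 6) output grid

# Placeholder weights matching the documented shapes (replace with the
# loaded `dna` array for real inference).
rng = np.random.default_rng(0)
dna = [
    rng.standard_normal((3, 3, 6, 24)), np.zeros(24),
    rng.standard_normal((1, 1, 24, 24)), np.zeros(24),
    rng.standard_normal((1, 1, 24, 6)), np.zeros(6),
]
grid = rng.random((32, 32, 6))
out = forward(dna, grid)  # shape (32, 32, 6)
```

Because everything runs through `np.einsum` on an 8.6 KiB parameter set, this stays comfortably within the 1 vCPU / 1 GB target.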