---
language: en
license: apache-2.0
library_name: jax
tags:
  - deep-reinforcement-learning
  - neuroevolution
  - tpu
  - micro-models
datasets:
  - synthetic-life-sim
metrics:
  - survival-time
  - mass-optimization
---

# Micro-DeepRL (Gen 126k)

Micro-DeepRL is a high-efficiency, neuroevolutionary "Life-Sim" agent optimized for severely resource-constrained environments (e.g., Google Cloud e2-micro, Raspberry Pi, or legacy 1 vCPU hardware), targeting ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## 📊 Model Profile

| Feature | Specification |
| --- | --- |
| Model Architecture | 3-Layer Convolutional Neural Network |
| Parameters | 2,300 (8.6 KiB) |
| Grid Resolution | 32x32 Cellular Automata |
| Input Channels | 6 (Life, Food, Lava, 3x Signaling) |
| Training Time | 126,000+ Generations |
| Compute | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training

The model was trained using JAX and Keras 3 on a TPU pod. The goal of the agent is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

  • Generation 0-10k: Discovery of basic movement.
  • Generation 10k-50k: Mastery of "Lava Avoidance."
  • Generation 50k-126k: Optimization of "Food Camping" and mass stabilization.
  • Final State: Reached a stable equilibrium loss of approximately -3040.00.

## 🚀 Deployment (Inference)

This model is distributed as a raw NumPy (.npy) array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"

  • CPU: 1vCPU (Shared core)
  • RAM: < 1GB
  • Latency: Near-zero inference on standard x86/ARM hardware.

πŸ—ΊοΈ Roadmap: The 11-Day Sprint

This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1. **Micro-DeepRL (Gen 126k)** - Current (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - Active (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - Planned (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - Planned (Transformer/Attention-based Strategist, H200 Target)

πŸ› οΈ How to Load

```python
import numpy as np

# Load the 8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array.
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 Kernel (3, 3, 6, 24)
# [1] Conv2D_1 Bias   (24,)
# [2] Conv2D_2 Kernel (1, 1, 24, 24)
# [3] Conv2D_2 Bias   (24,)
# [4] Output Kernel   (1, 1, 24, 6)
# [5] Output Bias     (6,)
```
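Given those shapes, a pure-NumPy forward pass can be sketched as below. The "same" padding and ReLU activations are assumptions (the card does not specify them), and random placeholder weights stand in for the real `.npy` payload:

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """Minimal 'same'-padded 2D convolution.

    x: (H, W, Cin) grid state; kernel: (kh, kw, Cin, Cout); bias: (Cout,).
    """
    kh, kw, cin, cout = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, cout))
    for i in range(kh):
        for j in range(kw):
            # Accumulate each kernel tap over all input channels.
            out += np.einsum("hwc,co->hwo", xp[i:i + H, j:j + W], kernel[i, j])
    return out + bias

def forward(dna, grid):
    """Run the 3-layer network; ReLU activations are an assumption."""
    h = np.maximum(conv2d_same(grid, dna[0], dna[1]), 0.0)
    h = np.maximum(conv2d_same(h, dna[2], dna[3]), 0.0)
    return conv2d_same(h, dna[4], dna[5])  # (32, 32, 6) output grid

# Placeholder weights matching the documented shapes (replace with the
# loaded `dna` array for real inference).
rng = np.random.default_rng(0)
dna = [
    rng.standard_normal((3, 3, 6, 24)), np.zeros(24),
    rng.standard_normal((1, 1, 24, 24)), np.zeros(24),
    rng.standard_normal((1, 1, 24, 6)), np.zeros(6),
]
grid = rng.random((32, 32, 6))
out = forward(dna, grid)  # shape (32, 32, 6)
```

Because everything runs through `np.einsum` on an 8.6 KiB parameter set, this stays comfortably within the 1 vCPU / 1 GB target.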