---
license: apache-2.0
tags:
- game-ai
- flow-matching
- action-prediction
- elden-ring
- vla
base_model: Qwen/Qwen3.5-4B
---

# Pi-Lumine 4B: Flow-Matching Action Decoder for Elden Ring

A Pi0.5-style flow-matching action decoder trained on top of a frozen Qwen3.5-4B VLM backbone. Only the decoder is trained; the VLM weights stay fixed.
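The flow-matching objective and Euler sampler can be illustrated with a small, dependency-free sketch. It assumes the rectified-flow convention (noise at t = 0, actions at t = 1, straight-line interpolation, velocity target `action - noise`). Pi0-style implementations sometimes reverse the time direction, and this card does not state which convention the decoder uses. `velocity_fn` stands in for the trained decoder. `oracle` is a hypothetical closed-form velocity field, used only to check that the sampler reproduces a known 6 × 20 action chunk.

```python
import random

ACTION_STEPS, ACTION_DIMS = 6, 20   # chunk shape stated in this card
FLAT = ACTION_STEPS * ACTION_DIMS   # 120 values per chunk


def interpolate(noise, action, t):
    """Point on the straight path from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise, action):
    """Time derivative of interpolate(): constant and equal to action - noise."""
    return [a - n for n, a in zip(noise, action)]


def flow_matching_loss(velocity_fn, action, rng):
    """Single-sample Monte Carlo estimate of the flow-matching MSE."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()                                  # t ~ U[0, 1)
    pred = velocity_fn(interpolate(noise, action, t), t)
    u = target_velocity(noise, action)
    return sum((p - ui) ** 2 for p, ui in zip(pred, u)) / len(action)


def euler_sample(velocity_fn, num_steps, rng):
    """Integrate dx/dt = v(x, t) from Gaussian noise at t=0 up to t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(FLAT)]
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    # Reshape the flat vector into 6 steps of 20 controller values.
    return [x[s * ACTION_DIMS:(s + 1) * ACTION_DIMS] for s in range(ACTION_STEPS)]


# Demo with a hypothetical target chunk and its exact (oracle) velocity field.
rng = random.Random(0)
target = [rng.uniform(-1.0, 1.0) for _ in range(FLAT)]


def oracle(x, t):
    # Exact velocity when the data distribution is the single chunk `target`.
    return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]


chunk = euler_sample(oracle, num_steps=10, rng=rng)
print(len(chunk), len(chunk[0]))                       # 6 20
print(flow_matching_loss(oracle, target, rng) < 1e-9)  # True
```

Because the straight-line path has constant velocity, a perfectly trained field would let even a few Euler steps land on the data; in practice the learned field is only approximate, so the step count trades latency for accuracy.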

## Architecture

- Base VLM: Qwen/Qwen3.5-4B (frozen; not included in this repository, downloaded at runtime)
- Action decoder: FiLM-conditioned transformer that cross-attends to the VLM hidden states
  - 2 decoder layers, 8 attention heads
  - Projection layers map the VLM hidden size (2560) to the decoder width (1024), decoupling the decoder from the VLM
  - Instruction conditioning via AdaptiveRMSNorm (FiLM)
  - Sinusoidal embedding of the flow-matching timestep
  - ~64M trainable parameters
- Action space: 6 steps × 20 dims per chunk (4 stick axes + 16 buttons per step)
- Training: flow-matching objective, dropout 0.0; at inference the learned velocity field is integrated with Euler ODE steps
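The two conditioning mechanisms listed above can be sketched in pure Python, assuming common conventions rather than this decoder's exact code: a transformer-style sinusoidal embedding with a hypothetical `max_period=10000`, and an adaptive RMSNorm whose FiLM parameters enter as `x_hat * (1 + scale) + shift`. The helper `film_params` and all demo weights are hypothetical; in the real decoder these would be learned layers at width 1024, and the card does not say whether the timestep embedding also feeds the FiLM conditioning.

```python
import math


def sinusoidal_embedding(t, dim, max_period=10_000.0):
    """Sinusoidal embedding of a scalar timestep t into `dim` features."""
    half = dim // 2
    # Geometric frequency ladder from 1 down to roughly 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def film_params(cond, weight, bias):
    """Hypothetical linear layer mapping a conditioning vector to (scale, shift)."""
    out = [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(weight, bias)]
    half = len(out) // 2
    return out[:half], out[half:]


def ada_rms_norm(x, scale, shift, eps=1e-6):
    """RMS-normalise x (no learned gain), then apply FiLM: x_hat * (1 + scale) + shift."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [(v / rms) * (1.0 + s) + b for v, s, b in zip(x, scale, shift)]


# Hypothetical 2-dim conditioning vector modulating a 2-dim hidden state.
cond = [0.5, -1.0]
weight = [[0.1, 0.0], [0.0, 0.1], [0.2, 0.2], [0.0, 0.0]]
scale, shift = film_params(cond, weight, bias=[0.0] * 4)
h = ada_rms_norm([1.0, -2.0], scale, shift)
t_emb = sinusoidal_embedding(0.3, dim=1024)   # decoder width from this card
```

The `1 + scale` form means a zero-initialised FiLM projection starts as a plain RMSNorm, which is one common reason to prefer it over multiplying by `scale` directly.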

## Files

- `action_decoder.pt`: Trained action decoder weights
- `decoder_config.json`: Architecture and tokenizer config
- `tokenizer.json` / `tokenizer_config.json`: Tokenizer with special tokens
- `chat_template.jinja`: Chat template
- `processor_config.json`: Processor config