---
tags:
- continuous-llm
- neural-ode
- research
language:
- en
- zh
---

# 3rd-Order Continuous LLM 500M

A 500M-parameter language model with 3rd-order continuous dynamics. The architecture is non-standard and requires a custom inference runtime.

## Overview

- Parameters: ~500M
- Hidden size: 1024
- Layers: 28
- Attention: 16 query heads / 4 KV heads
- MLP size: 4096
- Vocabulary size: 151643
- Tokenizer family: Qwen2.5 tokenizer vocabulary

## Public Architecture Features

- RoPE positional encoding
- RMSNorm
- Grouped Query Attention (16Q / 4KV)
- SiLU MLP
- bfloat16 weights

## Usage

This repository publishes weights only. It is not expected to run with standard Hugging Face `AutoModel` pipelines.

The reason is that this model does not follow the standard separation between inference and training. It uses an endogenous control regime without the usual loss-driven runtime split. In short: inference is training.

Conceptually, this is closer to the TTT-like family of ideas than to a standard frozen LLM runtime, but the mechanism and goals here are different.

At the moment, only these public details are released. If you are interested in higher-order ODE LLMs, want to request API access, or want to discuss custom runtime/code access, contact:

- `2218038150@qq.com`
- `a2218038150@gmail.com`
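
For anyone sizing buffers for a custom runtime, the attention layout listed under Overview can be sanity-checked with a short sketch. It is illustrative only and is not the released runtime: the `PublicConfig` class and the `kv_head_for_query` helper are hypothetical names, the per-head dimension assumes the hidden size is split evenly across query heads, and the contiguous grouping of query heads onto KV heads is the common GQA convention rather than something this repository confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicConfig:
    """Hypothetical container for the publicly listed hyperparameters."""
    hidden_size: int = 1024
    num_layers: int = 28
    num_query_heads: int = 16
    num_kv_heads: int = 4
    mlp_size: int = 4096
    vocab_size: int = 151643

    @property
    def head_dim(self) -> int:
        # Assumption: hidden_size is divided evenly among the query heads.
        return self.hidden_size // self.num_query_heads

    @property
    def queries_per_kv(self) -> int:
        # Number of query heads that share one KV head under GQA.
        return self.num_query_heads // self.num_kv_heads


def kv_head_for_query(cfg: PublicConfig, q_head: int) -> int:
    # Assumption: contiguous grouping (query heads 0..3 -> KV head 0, etc.).
    return q_head // cfg.queries_per_kv


cfg = PublicConfig()
mapping = [kv_head_for_query(cfg, q) for q in range(cfg.num_query_heads)]

print("head_dim:", cfg.head_dim)
print("query heads per KV head:", cfg.queries_per_kv)
print("query -> KV head mapping:", mapping)
```

Under these assumptions, each KV head serves four query heads, so a KV cache would need a quarter of the key/value storage that full multi-head attention with 16 heads would require.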