---
tags:
- continuous-llm
- neural-ode
- research
language:
- en
- zh
---

# 3rd-Order Continuous LLM 500M

A ~500M-parameter language model with third-order continuous dynamics. The architecture is non-standard and requires a custom inference runtime.

## Overview

- Parameters: ~500M
- Hidden size: 1024
- Layers: 28
- Attention: 16 query heads / 4 KV heads
- MLP size: 4096
- Vocabulary size: 151643
- Tokenizer family: Qwen2.5 tokenizer vocabulary
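
For readers writing their own tooling around the weights, the listed dimensions can be collected in a small config object. This is a minimal sketch, not the project's actual configuration format: the class name `ModelDims` and its derived properties are hypothetical, and the head dimension assumes the hidden size is split evenly across the query heads, which the card does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDims:
    """Published dimensions from the Overview list (hypothetical container)."""

    hidden_size: int = 1024
    num_layers: int = 28
    num_query_heads: int = 16
    num_kv_heads: int = 4
    mlp_size: int = 4096
    vocab_size: int = 151643

    @property
    def head_dim(self) -> int:
        # Assumption: query heads split the hidden size evenly.
        return self.hidden_size // self.num_query_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads that share one KV head under GQA.
        return self.num_query_heads // self.num_kv_heads

    @property
    def embedding_params(self) -> int:
        # Assumption: one vocab_size x hidden_size token embedding table.
        return self.vocab_size * self.hidden_size


dims = ModelDims()
print(dims.head_dim)             # 64
print(dims.queries_per_kv_head)  # 4
print(dims.embedding_params)     # 155282432
```

Under these assumptions the embedding table alone accounts for roughly 155M of the ~500M parameters. The remainder depends on details the card does not publish.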

## Public Architecture Features

- RoPE positional encoding
- RMSNorm
- Grouped Query Attention (16Q / 4KV)
- SiLU MLP
- bfloat16 weights
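
The components listed above are standard building blocks, and their textbook definitions can be sketched in plain Python. This is an illustration of the generic techniques only, not this model's implementation: the function names are hypothetical, the RoPE base of 10000 and head dimension of 64 are assumptions, and the grouping of consecutive query heads onto one KV head is a common layout that the card does not confirm.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def kv_head_for_query(query_head, num_query_heads=16, num_kv_heads=4):
    """GQA: index of the KV head shared by the given query head.

    Assumption: consecutive query heads form one group.
    """
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def rope_rotate_pair(x0, x1, position, pair_index, head_dim=64, base=10000.0):
    """RoPE: rotate one (x0, x1) feature pair by a position-dependent angle.

    Assumption: base=10000.0 and head_dim=64 are illustrative defaults.
    """
    angle = position * base ** (-2.0 * pair_index / head_dim)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (x0 * cos_a - x1 * sin_a, x0 * sin_a + x1 * cos_a)


# With 16 query heads and 4 KV heads, each KV head serves 4 query heads.
print([kv_head_for_query(q) for q in range(16)])
```

Because RoPE is a pure rotation, it leaves the length of each feature pair unchanged and only encodes position in the angle.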

## Usage

This repository publishes weights only. The model is not expected to run with standard Hugging Face `AutoModel` pipelines.

The model does not separate inference from training in the standard way. It uses an endogenous control regime without the usual loss-driven split between the two phases. In short: inference is training. Conceptually, this is closer to the test-time training (TTT) family of ideas than to a standard frozen-weights LLM runtime, though the mechanism and goals here are different.

At the moment, only the public details above are released. If you are interested in higher-order ODE LLMs, would like to request API access, or want to discuss custom runtime or code access, contact:

- `2218038150@qq.com`
- `a2218038150@gmail.com`