Vincent221/3rd-Order-Continuous-500M

	---
	tags:
	- continuous-llm
	- neural-ode
	- research
	language:
	- en
	- zh
	---

	# 3rd-Order Continuous LLM 500M

	A 500M-parameter language model with 3rd-order continuous dynamics.

	The architecture is non-standard and requires a custom inference runtime.
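
For readers unfamiliar with the term, the sketch below shows what "3rd-order" means for a generic ordinary differential equation: an equation of the form x''' = f(t, x, x', x'') is rewritten as a first-order system over the state [x, x', x''] and integrated numerically. This is a textbook illustration only. The function names are hypothetical, and nothing here describes this model's actual (unreleased) dynamics or runtime.

```python
from typing import Callable, List

State = List[float]  # [x, dx/dt, d2x/dt2]


def third_order_as_system(
    f: Callable[[float, float, float, float], float]
) -> Callable[[float, State], State]:
    """Rewrite x''' = f(t, x, x', x'') as a first-order system over [x, x', x'']."""
    def rhs(t: float, s: State) -> State:
        x, v, a = s
        return [v, a, f(t, x, v, a)]
    return rhs


def rk4_step(rhs: Callable[[float, State], State], t: float, s: State, h: float) -> State:
    """One classical 4th-order Runge-Kutta step of size h."""
    k1 = rhs(t, s)
    k2 = rhs(t + h / 2, [si + h / 2 * ki for si, ki in zip(s, k1)])
    k3 = rhs(t + h / 2, [si + h / 2 * ki for si, ki in zip(s, k2)])
    k4 = rhs(t + h, [si + h * ki for si, ki in zip(s, k3)])
    return [
        si + h / 6 * (a + 2 * b + 2 * c + d)
        for si, a, b, c, d in zip(s, k1, k2, k3, k4)
    ]


def integrate(rhs: Callable[[float, State], State], s0: State,
              t0: float, t1: float, steps: int) -> State:
    """Integrate from t0 to t1 with a fixed number of RK4 steps."""
    h = (t1 - t0) / steps
    t, s = t0, list(s0)
    for _ in range(steps):
        s = rk4_step(rhs, t, s, h)
        t += h
    return s


if __name__ == "__main__":
    # Toy example: x''' = 6 with zero initial state has the solution x(t) = t**3.
    rhs = third_order_as_system(lambda t, x, v, a: 6.0)
    print(integrate(rhs, [0.0, 0.0, 0.0], 0.0, 1.0, steps=10))
```

The point of the example is that a third-order system carries three coupled state components per variable instead of one, which is one reason such a model would not map onto a standard layer-by-layer forward pass.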

	## Overview

	- Parameters: ~500M
	- Hidden size: 1024
	- Layers: 28
	- Attention: 16 query heads / 4 KV heads
	- MLP size: 4096
	- Vocabulary size: 151643
	- Tokenizer family: Qwen2.5 tokenizer vocabulary
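
As a rough sanity check on the figures above, the following sketch estimates a parameter count from the listed dimensions as if this were a conventional decoder-only transformer. Several inputs are assumptions not stated in this card: a head dimension of hidden size / query heads (64), no bias terms, two norms per layer plus a final norm, and the choice of gated vs. plain MLP and tied vs. untied embeddings. Any parameters specific to the continuous dynamics are unknown and not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Values taken from the Overview list.
    hidden_size: int = 1024
    num_layers: int = 28
    num_q_heads: int = 16
    num_kv_heads: int = 4
    mlp_size: int = 4096
    vocab_size: int = 151643


def estimate_params(cfg: Config, gated_mlp: bool, tied_embeddings: bool) -> int:
    """Estimate parameters for a conventional decoder-only layout (assumption)."""
    head_dim = cfg.hidden_size // cfg.num_q_heads      # assumed, not stated in the card
    kv_dim = cfg.num_kv_heads * head_dim
    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim.
    attn = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # Gated MLP uses three matrices (gate, up, down); plain MLP uses two.
    mlp = (3 if gated_mlp else 2) * cfg.hidden_size * cfg.mlp_size
    norms = 2 * cfg.hidden_size                        # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embed = cfg.vocab_size * cfg.hidden_size
    output_head = 0 if tied_embeddings else embed
    final_norm = cfg.hidden_size
    return cfg.num_layers * per_layer + embed + output_head + final_norm


if __name__ == "__main__":
    cfg = Config()
    for gated in (False, True):
        for tied in (True, False):
            n = estimate_params(cfg, gated_mlp=gated, tied_embeddings=tied)
            print(f"gated_mlp={gated!s:5} tied_embeddings={tied!s:5} -> {n / 1e6:.1f}M")
```

Under these assumptions, the tied-embedding estimates come out at about 464M with a plain MLP and about 581M with a gated MLP, which brackets the stated ~500M. Neither layout is confirmed to match the released weights.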

	## Public Architecture Features

	- RoPE positional encoding
	- RMSNorm
	- Grouped Query Attention (16Q / 4KV)
	- SiLU MLP
	- bfloat16 weights
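
To make the 16Q / 4KV grouping concrete, here is a minimal sketch of how query heads map onto shared KV heads, and what that would mean for cache size in a conventional transformer runtime. The contiguous grouping, the head dimension of 64, and the existence of a KV cache at all are assumptions for illustration; the custom runtime for this model may work differently.

```python
NUM_Q_HEADS = 16    # from the model card
NUM_KV_HEADS = 4    # from the model card
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # query heads sharing each KV head


def kv_head_for(q_head: int) -> int:
    """Return the KV head used by a query head, assuming contiguous groups."""
    if not 0 <= q_head < NUM_Q_HEADS:
        raise ValueError(f"query head index out of range: {q_head}")
    return q_head // GROUP_SIZE


def kv_bytes_per_token(num_kv_heads: int, head_dim: int = 64,
                       num_layers: int = 28, bytes_per_value: int = 2) -> int:
    """Bytes of K and V stored per token (bfloat16 = 2 bytes per value)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    print([kv_head_for(q) for q in range(NUM_Q_HEADS)])
    print("GQA cache bytes/token:", kv_bytes_per_token(NUM_KV_HEADS))
    print("MHA cache bytes/token:", kv_bytes_per_token(NUM_Q_HEADS))
```

With four query heads per KV head, the per-token K/V storage in this sketch is a quarter of what full multi-head attention with 16 KV heads would need.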

	## Usage

	This repository publishes weights only.

	It is not expected to run with standard Hugging Face `AutoModel` pipelines.

	This model does not separate inference from training in the standard way. It uses an endogenous control regime without the usual loss-driven split between the two at runtime. In short: inference is training. Conceptually, this is closer to the TTT (test-time training) family of ideas than to a standard frozen LLM runtime, but the mechanism and goals here are different.

	At the moment, only these public details are released. If you are interested in higher-order ODE LLMs, want to request API access, or want to discuss custom runtime or code access, contact:

	- `2218038150@qq.com`
	- `a2218038150@gmail.com`