---
license: other
license_name: sequential-hidden-decoding
license_link: LICENSE
base_model:
- Qwen/Qwen3-8B-Base
tags:
- sequential-hidden-decoding
- pretrained
- base-model
---
# Sequential-Hidden-Decoding-8B-n4
This is the **n=4** variant of Sequential Hidden Decoding, a method that scales sequence length by n× while adding only extra Embedding parameters: the Transformer itself is unchanged, but each token receives more compute.
- **Base model:** [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
- **Scale:** 4×
- **Additional Embedding Params:** 3.1B
- **Training Tokens:** 150B
- **Dtype:** bfloat16
> **Note:** This is a **base model** (not instruction-tuned). It is intended for benchmarking, text completion, and as a foundation for downstream fine-tuning (SFT / RLHF). For conversational or instruction-following use cases, please fine-tune on your own data.
## Key Idea
Prepare *n* independent Embedding matrices to encode the same token sequence *n* times, interleave the results, and feed the *n*×-length sequence into the same Transformer. Only the last embedding of each token computes the next-token loss, while the preceding embeddings serve as implicit reasoning steps in a continuous latent space.
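A minimal sketch of this interleaving in PyTorch is shown below. The `transformer` and `lm_head` arguments are hypothetical placeholders and all sizes are illustrative; this is not the released training code.

```python
# Hypothetical sketch of Sequential Hidden Decoding's interleaving.
# `transformer` and `lm_head` are placeholders; sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

n = 4                                  # replication factor for this variant
vocab_size, hidden = 151936, 4096      # illustrative sizes, not the real config

# n independent embedding tables over the same vocabulary
embeddings = nn.ModuleList([nn.Embedding(vocab_size, hidden) for _ in range(n)])

def interleave_embed(input_ids: torch.Tensor) -> torch.Tensor:
    """Map [batch, seq] token ids to [batch, n*seq, hidden] interleaved embeddings."""
    embs = torch.stack([emb(input_ids) for emb in embeddings], dim=2)  # [B, S, n, H]
    return embs.flatten(1, 2)                                          # [B, n*S, H]

def lm_loss(input_ids, transformer, lm_head):
    """Next-token loss computed only at the last of each token's n positions."""
    # `transformer` is assumed to map [B, n*S, H] embeddings to hidden states of the same shape
    hidden_states = transformer(inputs_embeds=interleave_embed(input_ids))
    last = hidden_states[:, n - 1::n, :]          # keep only the n-th copy per token: [B, S, H]
    logits = lm_head(last[:, :-1, :])             # predict token t+1 from token t's last copy
    return F.cross_entropy(logits.reshape(-1, vocab_size),
                           input_ids[:, 1:].reshape(-1))
```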
## Results
| Benchmark | # Shots | 8B Baseline | 8B (n=2) | 8B (n=4) | 8B (n=8) |
|-----------|:-------:|:-----------:|:------------:|:------------:|:------------:|
| BBH (EM) | 3-shot | 78.8 | 81.3 | **83.0** | 83.9 |
| MMLU (EM) | 5-shot | 79.8 | 80.9 | **81.9** | 82.2 |
| MBPP+ (Pass@1) | 1-shot | 66.7 | 69.4 | **68.7** | 69.4 |
| MATH (LLM-judge) | 4-shot | 56.0 | 58.2 | **60.0** | 61.1 |
| ARC-C | 25-shot | 93.9 | 94.3 | **94.4** | 94.7 |
| Hellaswag | 10-shot | 79.7 | 83.1 | **85.0** | 85.3 |
| GSM8K | 4-shot | 92.5 | 93.3 | **93.9** | 94.6 |
## Serving (SGLang)
This model requires a patched version of [SGLang](https://github.com/sgl-project/sglang) for inference. See the [project page](https://github.com/Tencent/Sequential-Hidden-Decoding) for installation options (Docker image, forked repo, or manual patch).
```bash
python -m sglang.launch_server \
--model-path tencent/Sequential-Hidden-Decoding-8B-n4 \
--trust-remote-code \
--tp-size 1 \
--port 30000 --host 0.0.0.0 \
--chunked-prefill-size -1 \
--attention-backend fa3 \
--mem-fraction-static 0.82 \
--max-running-requests 32 \
--context-length 131072 \
--cuda-graph-max-bs 128 \
--cuda-graph-bs 1 2 4 8 16 32 64 128
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
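# Base model: query the completions endpoint directly (no chat template)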
response = client.completions.create(
model="tencent/Sequential-Hidden-Decoding-8B-n4",
prompt="The meaning of life is",
max_tokens=128,
temperature=0,
)
print(response.choices[0].text)
```
## All Models
| Model | Scale | Embedding Params | Training Tokens |
|-------|:-----:|:----------------:|:---------------:|
| [Sequential-Hidden-Decoding-8B-n2](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n2) | 2× | 1.9B | 75B |
| [Sequential-Hidden-Decoding-8B-n4](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n4) | 4× | 3.1B | 150B |
| [Sequential-Hidden-Decoding-8B-n8](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n8) | 8× | 5.6B | 187B |
## Citation
```bibtex
@article{hidden_decoding_2026,
title = {Hidden Decoding: Scaling Sequence Length in Pretraining},
year = {2026},
url = {https://welm.weixin.qq.com/posts/hidden_decoding/}
}
```
## License
This model is released under the [License Terms of Sequential-Hidden-Decoding](LICENSE).