---
license: other
license_name: sequential-hidden-decoding
license_link: LICENSE
base_model:
- Qwen/Qwen3-8B-Base
tags:
- sequential-hidden-decoding
- pretrained
- base-model
---

# Sequential-Hidden-Decoding-8B-n2

This is the **n=2** variant of Sequential Hidden Decoding, a method that scales sequence length by n× while adding only Embedding parameters: the Transformer is unchanged, and more compute is spent per token.

- **Base model:** [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
- **Scale:** 2×
- **Additional Embedding Params:** 1.9B
- **Training Tokens:** 75B
- **Dtype:** bfloat16

> **Note:** This is a **base model** (not instruction-tuned). It is intended for benchmarking, text completion, and as a foundation for downstream fine-tuning (SFT / RLHF). For conversational or instruction-following use cases, please fine-tune on your own data.

## Key Idea

Prepare *n* independent Embedding matrices to encode the same token sequence *n* times, interleave the results, and feed the *n*×-length sequence into the same Transformer. Only the last embedding of each token computes the next-token loss, while the preceding embeddings serve as implicit reasoning steps in a continuous latent space.
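The layout described above can be illustrated with a small, dependency-free sketch. It is not the released implementation: the helper names (`make_embedding_tables`, `interleave_embeddings`, `loss_positions`) are hypothetical, random values stand in for learned weights, and placing the *n* copies of one token at adjacent positions is an assumption about what "interleave" means here.

```python
import random

def make_embedding_tables(n, vocab_size, dim, seed=0):
    """Create n independent (vocab_size x dim) tables with random stand-in weights."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
        for _ in range(n)
    ]

def interleave_embeddings(token_ids, tables):
    """Encode every token with every table; the result has len(token_ids) * n rows."""
    sequence = []
    for tok in token_ids:
        for table in tables:  # assumed layout: copy k of token t sits at position t * n + k
            sequence.append(table[tok])
    return sequence

def loss_positions(num_tokens, n):
    """Positions of the last copy of each token, where the next-token loss applies."""
    return [t * n + (n - 1) for t in range(num_tokens)]

tables = make_embedding_tables(n=2, vocab_size=10, dim=4)
token_ids = [3, 1, 4]
expanded = interleave_embeddings(token_ids, tables)

print(len(expanded))                      # 6: three tokens, two embeddings each
print(loss_positions(len(token_ids), 2))  # [1, 3, 5]
```

Under this layout the Transformer sees a sequence twice as long for n=2, and only the odd positions contribute to the next-token loss; the even positions act as the intermediate latent steps.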

## Results

| Benchmark | # Shots | 8B Baseline | 8B scale n=2 | 8B scale n=4 | 8B scale n=8 |
|-----------|:-------:|:-----------:|:------------:|:------------:|:------------:|
| BBH (EM) | 3-shot | 78.8 | **81.3** | 83.0 | 83.9 |
| MMLU (EM) | 5-shot | 79.8 | **80.9** | 81.9 | 82.2 |
| MBPP+ (Pass@1) | 1-shot | 66.7 | **69.4** | 68.7 | 69.4 |
| MATH (LLM-judge) | 4-shot | 56.0 | **58.2** | 60.0 | 61.1 |
| ARC-C | 25-shot | 93.9 | **94.3** | 94.4 | 94.7 |
| Hellaswag | 10-shot | 79.7 | **83.1** | 85.0 | 85.3 |
| GSM8K | 4-shot | 92.5 | **93.3** | 93.9 | 94.6 |

## Serving (SGLang)

This model requires a patched version of [SGLang](https://github.com/sgl-project/sglang) for inference. See the [project page](https://github.com/Tencent/Sequential-Hidden-Decoding) for installation options (Docker image, forked repo, or manual patch).

```bash
python -m sglang.launch_server \
    --model-path tencent/Sequential-Hidden-Decoding-8B-n2 \
    --trust-remote-code \
    --tp-size 1 \
    --port 30000 --host 0.0.0.0 \
    --chunked-prefill-size -1 \
    --attention-backend fa3 \
    --mem-fraction-static 0.82 \
    --max-running-requests 32 \
    --context-length 131072 \
    --cuda-graph-max-bs 128 \
    --cuda-graph-bs 1 2 4 8 16 32 64 128
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.completions.create(
    model="tencent/Sequential-Hidden-Decoding-8B-n2",
    prompt="The meaning of life is",
    max_tokens=128,
    temperature=0,
)
print(response.choices[0].text)
```
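If the `openai` package is not available, the same request can be sent with the Python standard library alone. The sketch below assumes the server launched above exposes its OpenAI-compatible route at `/v1/completions` and returns the usual `choices[0].text` field; `build_completion_request` and `complete` are hypothetical helper names, not part of SGLang.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000/v1"  # matches --port 30000 in the launch command
MODEL = "tencent/Sequential-Hidden-Decoding-8B-n2"

def build_completion_request(prompt, max_tokens=128, temperature=0):
    """Build a POST request for the OpenAI-compatible completions route."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("The meaning of life is"))` should produce the same kind of completion as the client example above.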

## All Models

| Model | Scale | Embedding Params | Training Tokens |
|-------|:-----:|:----------------:|:---------------:|
| [Sequential-Hidden-Decoding-8B-n2](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n2) | 2× | 1.9B | 75B |
| [Sequential-Hidden-Decoding-8B-n4](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n4) | 4× | 3.1B | 150B |
| [Sequential-Hidden-Decoding-8B-n8](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n8) | 8× | 5.6B | 187B |

## Citation

```bibtex
@article{hidden_decoding_2026,
  title   = {Hidden Decoding: Scaling Sequence Length in Pretraining},
  year    = {2026},
  url     = {https://welm.weixin.qq.com/posts/hidden_decoding/}
}
```

## License

This model is released under the [License Terms of Sequential-Hidden-Decoding](LICENSE).