exlaw committed
Commit fe1e9b5 · verified · 1 Parent(s): 3a1f14c

Remove accidentally uploaded README.m

Files changed (1)
  1. README.m +0 -93
README.m DELETED
@@ -1,93 +0,0 @@
- ---
- license: other
- license_name: sequential-hidden-decoding
- license_link: LICENSE
- base_model:
- - Qwen/Qwen3-8B-Base
- tags:
- - sequential-hidden-decoding
- - pretrained
- - base-model
- ---
-
- # Sequential-Hidden-Decoding-8B-n8
-
- This is the **n=8** variant of Sequential Hidden Decoding, a method that scales sequence length by n× while adding only extra Embedding parameters: the same Transformer, with more compute per token.
-
- - **Base model:** [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
- - **Scale:** 8×
- - **Additional Embedding Params:** 5.6B
- - **Training Tokens:** 187B
- - **Dtype:** bfloat16
-
- > **Note:** This is a **base model** (not instruction-tuned). It is intended for benchmarking, text completion, and as a foundation for downstream fine-tuning (SFT / RLHF). For conversational or instruction-following use cases, please fine-tune on your own data.
-
- ## Key Idea
-
- Prepare *n* independent Embedding matrices that each encode the same token sequence, interleave the results, and feed the resulting *n*×-length sequence into the same Transformer. Only the last embedding of each token contributes to the next-token loss; the preceding embeddings serve as implicit reasoning steps in a continuous latent space.
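
The interleaving step can be sketched in NumPy. This is an illustrative reconstruction from the description above, not the released implementation; all variable names (`tables`, `interleaved`, `loss_positions`) are ours.

```python
import numpy as np

# Hypothetical sketch of Sequential Hidden Decoding's input construction:
# n independent embedding tables encode the same token ids, and the results
# are interleaved so each token occupies n consecutive sequence positions.
rng = np.random.default_rng(0)
n, vocab, dim = 4, 1000, 8
seq_len = 6

# n independent embedding matrices, each (vocab, dim).
tables = [rng.standard_normal((vocab, dim)) for _ in range(n)]
token_ids = rng.integers(0, vocab, size=seq_len)

# Encode the same sequence n times: n arrays of shape (seq_len, dim).
encoded = [tab[token_ids] for tab in tables]

# Interleave: position t*n + k holds the k-th embedding of token t.
stacked = np.stack(encoded, axis=1)            # (seq_len, n, dim)
interleaved = stacked.reshape(seq_len * n, dim)

# The Transformer would consume this n×-long sequence; only the last of each
# token's n positions (indices n-1, 2n-1, ...) computes next-token loss.
loss_positions = np.arange(n - 1, seq_len * n, n)
print(interleaved.shape)   # (24, 8)
```

Under this reading, the first n−1 positions of each token receive no direct loss and act purely as latent computation steps for the shared Transformer.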
-
- ## Results
-
- | Benchmark | # Shots | 8B Baseline | 8B scale n=2 | 8B scale n=4 | 8B scale n=8 |
- |-----------|:-------:|:-----------:|:------------:|:------------:|:------------:|
- | BBH (EM) | 3-shot | 78.8 | 81.3 | 83.0 | **83.9** |
- | MMLU (EM) | 5-shot | 79.8 | 80.9 | 81.9 | **82.2** |
- | MBPP+ (Pass@1) | 1-shot | 66.7 | 69.4 | 68.7 | **69.4** |
- | MATH (LLM-judge) | 4-shot | 56.0 | 58.2 | 60.0 | **61.1** |
- | ARC-C | 25-shot | 93.9 | 94.3 | 94.4 | **94.7** |
- | Hellaswag | 10-shot | 79.7 | 83.1 | 85.0 | **85.3** |
- | GSM8K | 4-shot | 92.5 | 93.3 | 93.9 | **94.6** |
-
- ## Serving (SGLang)
-
- This model requires a patched version of [SGLang](https://github.com/sgl-project/sglang) for inference. See the [project page](https://github.com/Tencent/Sequential-Hidden-Decoding) for installation options (Docker image, forked repo, or manual patch).
-
- ```bash
- python -m sglang.launch_server \
-   --model-path tencent/Sequential-Hidden-Decoding-8B-n8 \
-   --trust-remote-code \
-   --tp-size 1 \
-   --port 30000 --host 0.0.0.0 \
-   --chunked-prefill-size -1 \
-   --attention-backend fa3 \
-   --mem-fraction-static 0.82 \
-   --max-running-requests 32 \
-   --context-length 131072 \
-   --cuda-graph-max-bs 128 \
-   --cuda-graph-bs 1 2 4 8 16 32 64 128
- ```
-
- ```python
- from openai import OpenAI
-
- client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
- response = client.completions.create(
-     model="tencent/Sequential-Hidden-Decoding-8B-n8",
-     prompt="The meaning of life is",
-     max_tokens=128,
-     temperature=0,
- )
- print(response.choices[0].text)
- ```
-
- ## All Models
-
- | Model | Scale | Embedding Params | Training Tokens |
- |-------|:-----:|:----------------:|:---------------:|
- | [Sequential-Hidden-Decoding-8B-n2](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n2) | 2× | 1.9B | 75B |
- | [Sequential-Hidden-Decoding-8B-n4](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n4) | 4× | 3.1B | 150B |
- | [Sequential-Hidden-Decoding-8B-n8](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n8) | 8× | 5.6B | 187B |
-
- ## Citation
-
- ```bibtex
- @article{hidden_decoding_2026,
-   title = {Hidden Decoding: Scaling Sequence Length in Pretraining},
-   year = {2026},
-   url = {https://welm.weixin.qq.com/posts/hidden_decoding/}
- }
- ```
-
- ## License
-
- This model is released under the [License Terms of Sequential-Hidden-Decoding](LICENSE).