jisenli committed on
Commit ae73168 · verified · 1 Parent(s): 9248811

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +10 -8
README.md CHANGED
@@ -73,6 +73,7 @@ This model was trained **from scratch** using Aurora, an inference-time training
 
 ### Training Configuration
 
+- **Hardware**: NVIDIA H200 GPU
 - **Training Steps**: 10,000 steps over 80,000 inference requests
 - **Learning Rate**: 1e-4
 - **TTT Length**: 5 tokens
@@ -91,12 +92,14 @@ Trained on the [OnlineSD Code Dataset](https://huggingface.co/datasets/zelc/onli
 
 ### Speculative Decoding Performance
 
-| Metric | Baseline | This Model | Improvement |
-|--------|----------|------------|-------------|
+| Metric | Baseline (No Speculator) | This Model | Improvement |
+|--------|--------------------------|------------|-------------|
 | **Average Accept Length** | - | 3.1 | - |
 | **Throughput** | ~50 tokens/s | ~150 tokens/s | ~3x |
 | **Training Steps** | - | 10,000 (80k requests) | - |
 
+**Baseline**: Target model without speculative decoding (no draft model).
+
 The model achieves consistent **3.1x average speculative accept length**, meaning on average 3.1 draft tokens are accepted per verification step, significantly reducing the number of target model forward passes required.
 
 **Notably**, this performance is achieved with a model trained **from scratch** - it learns entirely through Aurora's online training process, demonstrating the effectiveness of inference-time training without expensive pre-training.
@@ -107,14 +110,16 @@ This model is designed to be used as a draft model in EAGLE3 speculative decodin
 
 ### Example with SGLang
 
+**Note: This is a placeholder example. TODO - Verify and update with tested code.**
+
 ```python
 from sglang import Engine
 
 # Initialize engine with speculative decoding
 engine = Engine(
     model_path="Qwen/Qwen3-Coder-Next-FP8",
-    draft_model_path="togethercomputer/Qwen3-Coder-Next-FP8-EAGLE3",
-    speculative_algorithm="eagle",
+    speculative_draft_model_path="togethercomputer/Qwen3-Coder-Next-FP8-EAGLE3",
+    speculative_algorithm="EAGLE",
     speculative_num_steps=5,
 )
@@ -134,17 +139,14 @@ output = engine.generate(
 
 ## Citation
 
-**Note: This is a placeholder citation. The paper is currently under review.**
-
 If you use this model, please cite:
 
 ```bibtex
 @article{aurora2026,
   title={When RL Meets Adaptive Speculative Training: A Unified Training-Serving System},
   author={Wang, Junxiong and Bie, Fengxiang and Li, Jisen and Zhou, Zhongzhu and Shao, Zelei and Wang, Yubo and Liu, Yinghui and Wu, Qingyang and May, Avner and Zhang, Yineng and Song, Shuaiwen Leon and Zhang, Ce and Athiwaratkun, Ben and Xu, Chenfeng and Wu, Xiaoxia},
-  journal={International Conference on Machine Learning (ICML)},
+  journal={Preprint, forthcoming},
   year={2026},
-  note={Under review},
   url={https://aurora-spec.github.io}
 }
 ```
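The accept-length arithmetic behind the performance table can be sketched as a quick back-of-envelope check. This is a minimal illustration under stated assumptions, not code from the Aurora paper; `estimated_speedup` and `draft_cost_ratio` are hypothetical names introduced here.

```python
def estimated_speedup(accept_len: float, draft_cost_ratio: float = 0.0) -> float:
    """Idealized tokens-per-unit-of-target-compute, relative to plain
    autoregressive decoding.

    accept_len: average draft tokens accepted per verification step.
    draft_cost_ratio: cost of drafting one token relative to one target
        forward pass (0 = free draft model, an idealization).
    """
    # Plain decoding produces 1 token per target forward pass.
    # Speculative decoding produces `accept_len` tokens per
    # (1 target pass + accept_len draft passes).
    return accept_len / (1.0 + accept_len * draft_cost_ratio)

# With the README's 3.1 average accept length and a free draft model,
# the idealized ceiling is 3.1x -- consistent with the ~3x throughput
# (~50 -> ~150 tokens/s) reported in the table.
print(estimated_speedup(3.1))  # 3.1
print(round(estimated_speedup(3.1, draft_cost_ratio=0.05), 2))  # 2.68
```

The nonzero `draft_cost_ratio` case shows why the measured ~3x sits slightly below the 3.1 accept length: drafting itself is not free.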