RelaxingSnorlax committed
Commit 217d835 · verified · 1 Parent(s): de55a01

Upload Llama4 Eagle3 dummy drafter for vLLM testing

Files changed (2)
  1. README.md +17 -47
  2. model.safetensors +2 -2
README.md CHANGED
@@ -1,57 +1,27 @@
- ---
- license: apache-2.0
- tags:
- - eagle3
- - speculative-decoding
- - llama4
- - vllm
- - testing
- ---
 
- # Llama4 Scout 17B Eagle3 Dummy Drafter
-
- This is a **dummy/test drafter model** for testing the Eagle3 speculative decoding implementation with Llama4 Scout 17B Instruct models in vLLM.
-
- ⚠️ **WARNING**: This is not a real model and should not be used for actual inference. It contains random weights and is only for testing purposes.
 
  ## Model Details
-
- - **Architecture**: Llama4ForCausalLM (Eagle3 drafter variant)
- - **Target Model**: Llama4 Scout 17B Instruct (specifically `RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16`)
- - **Base Model**: Based on the Instruct version of Llama4 17B Scout model
- - **Hidden Size**: 2048
- - **Layers**: 1 (single decoder layer as per Eagle3 design)
- - **Vocabulary**: 128256 tokens
- - **Parameters**: ~322M
 
  ## Configuration
-
- This drafter model is specifically designed for the Instruct version of Llama4 Scout 17B and uses:
- - Eagle3 speculative decoding architecture
- - Single-layer transformer with auxiliary hidden state combination
- - Llama4 layer structure with RoPE (Rotary Position Embedding)
- - SGLang-compatible weight naming (midlayer.*)
- - Vocabulary mappings (t2d/d2t) for draft-to-target token conversion
 
  ## Usage
 
- This model is designed specifically for testing the vLLM Eagle3 implementation:
-
- ```python
- # Use with vLLM for testing Eagle3 speculative decoding with Llama4 Scout
- vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
- --speculative-config '{"method": "eagle3", "model": "nm-testing/llama4-scout-17b-eagle3-dummy-drafter", ...}'
  ```
 
- ## Testing Purpose Only
-
- This model:
- - Contains random weights
- - Is not trained on any data
- - Should not be used for actual inference
- - Is only for vLLM development and testing
-
- ## Related
-
- - vLLM: https://github.com/vllm-project/vllm
- - Eagle3: Speculative decoding method
 
+ # Llama4 Eagle Drafter Model (Test)
 
+ This is a test Eagle drafter model for Llama4 with proper configuration and vocabulary mappings.
 
  ## Model Details
+ - **Architecture**: Llama4ForCausalLM (Eagle draft variant)
+ - **Hidden size**: 2048
+ - **Layers**: 1 (single decoder layer for Eagle draft)
+ - **Vocabulary**: 128256 tokens (Llama4)
+ - **Includes**: d2t and t2d vocabulary mappings
 
  ## Configuration
+ - Uses standard Llama4 architecture
+ - Includes Eagle auxiliary state configuration
+ - Has vocabulary mapping tensors (d2t/t2d) for draft-to-target conversion
+ - Extended context support (262k max position embeddings)
 
  ## Usage
+ This model is for testing Eagle speculative decoding with Llama4 in vLLM:
 
+ ```bash
+ vllm serve <llama4-target-model> \
+ --speculative-config '{"method": "eagle", "model": "nm-testing/llama4-eagle-drafter", ...}'
  ```
 
+ ## Testing Purpose
+ This model contains random weights and is only for vLLM Eagle implementation testing.
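For reference, a more complete form of the serve command in the new Usage section might look like the sketch below. The target model is the one named in the previous README revision, and the `num_speculative_tokens` value is an illustrative assumption, not something this commit specifies.

```bash
# Sketch only: everything beyond "method" and "model" in the speculative-config
# is assumed for illustration, not taken from this commit.
vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
  --speculative-config '{"method": "eagle",
                         "model": "nm-testing/llama4-eagle-drafter",
                         "num_speculative_tokens": 2}'
```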
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:17835edd20ba68f4c014cdb8db66ecd669f5d340e8a3f4da637c545ba711275d
- size 1172276696
+ oid sha256:0d2b768b329308c0ff2682f465d294d308de9c9c183795dbf4139c96e3ed7553
+ size 1172276704
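To confirm a downloaded copy of the weights matches the new LFS pointer above, a quick check might look like the sketch below; the repository id is assumed from the README's speculative-config and may differ from the actual repo path.

```bash
# Assumed repo id (taken from the README's speculative-config); adjust if the
# actual repository path differs.
huggingface-cli download nm-testing/llama4-eagle-drafter model.safetensors \
  --local-dir ./drafter
# The printed digest should match the "oid sha256:..." value in the new pointer,
# and the file size should match the "size" value (1172276704 bytes).
sha256sum ./drafter/model.safetensors
```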