RelaxingSnorlax committed
Commit 217d835 · verified · 1 Parent(s): de55a01

Upload Llama4 Eagle3 dummy drafter for vLLM testing

Files changed (2)
  1. README.md +17 -47
  2. model.safetensors +2 -2
README.md CHANGED
@@ -1,57 +1,27 @@
- ---
- license: apache-2.0
- tags:
- - eagle3
- - speculative-decoding
- - llama4
- - vllm
- - testing
- ---
 
- # Llama4 Scout 17B Eagle3 Dummy Drafter
-
- This is a **dummy/test drafter model** for testing the Eagle3 speculative decoding implementation with Llama4 Scout 17B Instruct models in vLLM.
-
- ⚠️ **WARNING**: This is not a real model and should not be used for actual inference. It contains random weights and is only for testing purposes.
 
  ## Model Details
-
- - **Architecture**: Llama4ForCausalLM (Eagle3 drafter variant)
- - **Target Model**: Llama4 Scout 17B Instruct (specifically `RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16`)
- - **Base Model**: Based on the Instruct version of Llama4 17B Scout model
- - **Hidden Size**: 2048
- - **Layers**: 1 (single decoder layer as per Eagle3 design)
- - **Vocabulary**: 128256 tokens
- - **Parameters**: ~322M
 
  ## Configuration
-
- This drafter model is specifically designed for the Instruct version of Llama4 Scout 17B and uses:
- - Eagle3 speculative decoding architecture
- - Single-layer transformer with auxiliary hidden state combination
- - Llama4 layer structure with RoPE (Rotary Position Embedding)
- - SGLang-compatible weight naming (midlayer.*)
- - Vocabulary mappings (t2d/d2t) for draft-to-target token conversion
 
  ## Usage
 
- This model is designed specifically for testing the vLLM Eagle3 implementation:
-
- ```python
- # Use with vLLM for testing Eagle3 speculative decoding with Llama4 Scout
- vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
- --speculative-config '{"method": "eagle3", "model": "nm-testing/llama4-scout-17b-eagle3-dummy-drafter", ...}'
  ```
 
- ## Testing Purpose Only
-
- This model:
- - Contains random weights
- - Is not trained on any data
- - Should not be used for actual inference
- - Is only for vLLM development and testing
-
- ## Related
-
- - vLLM: https://github.com/vllm-project/vllm
- - Eagle3: Speculative decoding method
 
+ # Llama4 Eagle Drafter Model (Test)
 
+ This is a test Eagle drafter model for Llama4 with proper configuration and vocabulary mappings.
 
  ## Model Details
+ - **Architecture**: Llama4ForCausalLM (Eagle draft variant)
+ - **Hidden size**: 2048
+ - **Layers**: 1 (single decoder layer for Eagle draft)
+ - **Vocabulary**: 128256 tokens (Llama4)
+ - **Includes**: d2t and t2d vocabulary mappings
 
  ## Configuration
+ - Uses standard Llama4 architecture
+ - Includes Eagle auxiliary state configuration
+ - Has vocabulary mapping tensors (d2t/t2d) for draft-to-target conversion
+ - Extended context support (262k max position embeddings)
 
  ## Usage
+ This model is for testing Eagle speculative decoding with Llama4 in vLLM:
 
+ ```bash
+ vllm serve <llama4-target-model> \
+ --speculative-config '{"method": "eagle", "model": "nm-testing/llama4-eagle-drafter", ...}'
  ```
 
+ ## Testing Purpose
+ This model contains random weights and is only for vLLM Eagle implementation testing.
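For reference, a more complete form of the serve command in the new Usage section might look like the sketch below. The target model is the one named in the previous README revision, and the `num_speculative_tokens` value is an illustrative assumption, not something this commit specifies.

```bash
# Sketch only: everything beyond "method" and "model" in the speculative-config
# is assumed for illustration, not taken from this commit.
vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 \
  --speculative-config '{"method": "eagle",
                         "model": "nm-testing/llama4-eagle-drafter",
                         "num_speculative_tokens": 2}'
```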
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:17835edd20ba68f4c014cdb8db66ecd669f5d340e8a3f4da637c545ba711275d
- size 1172276696
+ oid sha256:0d2b768b329308c0ff2682f465d294d308de9c9c183795dbf4139c96e3ed7553
+ size 1172276704
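To confirm a downloaded copy of the weights matches the new LFS pointer above, a quick check might look like the sketch below; the repository id is assumed from the README's speculative-config and may differ from the actual repo path.

```bash
# Assumed repo id (taken from the README's speculative-config); adjust if the
# actual repository path differs.
huggingface-cli download nm-testing/llama4-eagle-drafter model.safetensors \
  --local-dir ./drafter
# The printed digest should match the "oid sha256:..." value in the new pointer,
# and the file size should match the "size" value (1172276704 bytes).
sha256sum ./drafter/model.safetensors
```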