nielsr (HF Staff) committed
Commit c55a197 · verified · 1 Parent(s): 4cf85f0

Update README.md

Files changed (1):
  1. README.md +1 -45
README.md CHANGED
@@ -14,48 +14,4 @@ library_name: transformers
 
 Used in [STree: Speculative Tree Decoding for Hybrid State-Space Models](https://arxiv.org/abs/2505.14969) as a draft model for speculative decoding for hybrid models.
 
- For more details on installation, training, and evaluation, please refer to the [GitHub repository](https://github.com/wyc1997/stree).
-
- ## Usage
-
- You can use `EaModel.from_pretrained` for accelerated text generation, similar to `generate` from Hugging Face Transformers. Here is an example:
-
- ```python
- import torch
- from eagle.model.ea_model import EaModel
- from fastchat.model import get_conversation_template
-
- # Load the base model and EAGLE acceleration model
- base_model_path = "JunxiongWang/Llama3.2-Mamba2-3B-distill"  # Replace with your base model path
- EAGLE_model_path = "ycwu97/mamba2-distilled-small"  # Replace with your EAGLE weights path
-
- model = EaModel.from_pretrained(
-     base_model_path=base_model_path,
-     ea_model_path=EAGLE_model_path,
-     torch_dtype=torch.float16,
-     low_cpu_mem_usage=True,
-     device_map="auto",
-     total_token=-1  # -1 for auto configuration of draft tokens
- )
- model.eval()
-
- # Prepare your message using a conversation template (e.g., Vicuna)
- your_message = "Hello"
- conv = get_conversation_template("vicuna")  # Use the correct chat template for your base model
- conv.append_message(conv.roles[0], your_message)
- conv.append_message(conv.roles[1], None)
- prompt = conv.get_prompt()
-
- # Tokenize the input prompt
- input_ids = model.tokenizer([prompt]).input_ids
- input_ids = torch.as_tensor(input_ids).cuda()
-
- # Generate output using EAGLE's accelerated decoding
- output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
-
- # Decode and print the generated text
- output = model.tokenizer.decode(output_ids[0])
- print(output)
- ```
-
- **Note:** For chat models like Vicuna, LLaMA2-Chat, and LLaMA3-Instruct, you must use the correct chat template; otherwise both output quality and EAGLE's speedup may degrade.
+ For more details on installation, training, and evaluation, please refer to the [GitHub repository](https://github.com/wyc1997/stree).