Lin-K76 committed on
Commit 0035cbf · verified · 1 Parent(s): f1b0386

Update README.md

Files changed (1)
  1. README.md +0 -45
README.md CHANGED
@@ -63,51 +63,6 @@ print(generated_text)
 
  vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
- ### Use with transformers
-
- This model is supported by Transformers through its integration with the [AutoFP8](https://github.com/neuralmagic/AutoFP8) data format.
- The following example shows how the model can be used with the `generate()` function.
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model_id = "neuralmagic/Meta-Llama-3-8B-Instruct-FP8"
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype="auto",
-     device_map="auto",
- )
-
- messages = [
-     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
-     {"role": "user", "content": "Who are you?"},
- ]
-
- input_ids = tokenizer.apply_chat_template(
-     messages,
-     add_generation_prompt=True,
-     return_tensors="pt"
- ).to(model.device)
-
- terminators = [
-     tokenizer.eos_token_id,
-     tokenizer.convert_tokens_to_ids("<|eot_id|>")
- ]
-
- outputs = model.generate(
-     input_ids,
-     max_new_tokens=256,
-     eos_token_id=terminators,
-     do_sample=True,
-     temperature=0.6,
-     top_p=0.9,
- )
- response = outputs[0][input_ids.shape[-1]:]
- print(tokenizer.decode(response, skip_special_tokens=True))
- ```
-
  ## Creation
 
  This model was created by applying [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py), as presented in the code snippet below.
 