---
library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
# Model Card for WavGPT-2
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** hack337
- **Model type:** qwen2
- **Finetuned from model:** Qwen/Qwen2.5-7B-Instruct
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://huggingface.co/Hack337/WavGPT-2
- **Demo (WavGPT-1.0):** https://huggingface.co/spaces/Hack337/WavGPT
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the model inputs onto

# Load the model and tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")

prompt = "Give me a short introduction to large language models."
messages = [
    # System prompt in Russian: "You are a very helpful assistant."
    {"role": "system", "content": "Вы очень полезный помощник."},
    {"role": "user", "content": prompt}
]

# Build the chat-formatted prompt and tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate a response and strip the prompt tokens from the output
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
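Since this repository is published as a `peft` adapter on top of `Qwen/Qwen2.5-7B-Instruct`, you can also attach the adapter to the base model explicitly. The snippet below is a minimal sketch that assumes the repo contains a standard LoRA-style adapter (an `adapter_config.json` plus adapter weights); the `merge_and_unload()` step is optional and only applies if you want a plain `transformers` model back.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model first, then attach the WavGPT-2 adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "Hack337/WavGPT-2")
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")

# Optional (assumes a LoRA adapter): merge the adapter weights into the base
# model so that inference runs without the extra LoRA layers.
model = model.merge_and_unload()
```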
Use the code below to run the model on an Intel NPU with `intel_npu_acceleration_library`.
```python
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the NPU-optimized model without LoRA
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    use_cache=True,
    dtype=torch.float16  # use float16 for the NPU
).eval()

# Load the tokenizer and set up a streamer to print tokens as they are generated
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    # System prompt in Russian: "You are a very helpful assistant."
    {"role": "system", "content": "Вы очень полезный помощник."},
    {"role": "user", "content": prompt}
]

# Convert to a text format compatible with the model
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU
print("Run inference")
_ = model.generate(**generation_kwargs)
```
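If float16 does not fit your NPU's memory budget, the same loading call can be pointed at a lower-precision dtype. This is a hedged sketch based on the library's documented `dtype` argument; whether int8 quantization is available depends on your installed version of `intel_npu_acceleration_library`.
```python
import torch
from intel_npu_acceleration_library import NPUModelForCausalLM

# Assumption: the installed intel_npu_acceleration_library version accepts
# torch.int8 for its dtype argument, as in the library's own examples.
model_int8 = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    use_cache=True,
    dtype=torch.int8  # quantize weights to int8 to reduce memory use
).eval()
```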
### Framework versions
- PEFT 0.11.1