---

library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---


# Model Card for WavGPT-2

<!-- Provide a quick summary of what the model is/does. -->

WavGPT-2 is a PEFT fine-tune of Qwen/Qwen2.5-7B-Instruct for multilingual text generation, released under the Apache-2.0 license.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** hack337
- **Model type:** qwen2
- **Finetuned from model:** Qwen/Qwen2.5-7B-Instruct

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://huggingface.co/Hack337/WavGPT-2
- **Demo (WavGPT-1.0):** https://huggingface.co/spaces/Hack337/WavGPT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")

# Build a chat prompt with the model's chat template
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate a response and strip the prompt tokens from the output
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
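
Because the metadata lists `library_name: peft` with `Qwen/Qwen2.5-7B-Instruct` as the base model, the repository can also be loaded as an adapter on top of the base checkpoint. A minimal sketch, assuming the repository hosts a PEFT (e.g. LoRA) adapter rather than fully merged weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base checkpoint listed in the card metadata
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)

# Attach the WavGPT-2 adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Hack337/WavGPT-2")
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")
```

Generation then proceeds exactly as in the snippet above.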

Use the code below to get started with the model on an Intel NPU via `intel_npu_acceleration_library`.

```python
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the NPU-optimized model without LoRA
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    use_cache=True,
    dtype=torch.float16  # use float16 on the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]

# Convert to a text format compatible with the model
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU
print("Run inference")
_ = model.generate(**generation_kwargs)
```
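
The NPU snippet loads the repository as a standalone model ("without LoRA"). If you are instead working from the base model plus the PEFT adapter, the adapter can be folded into the base weights first with peft's `merge_and_unload`. A minimal sketch, assuming a LoRA-style adapter and an output directory name of your choosing:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto")
model = PeftModel.from_pretrained(base_model, "Hack337/WavGPT-2")

# Merge the adapter weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("WavGPT-2-merged")  # hypothetical local output path
AutoTokenizer.from_pretrained("Hack337/WavGPT-2").save_pretrained("WavGPT-2-merged")
```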

### Framework versions

- PEFT 0.11.1
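
To confirm your environment matches, a quick runtime check (assumes peft and transformers are already installed):

```python
# Print installed library versions; this card was produced with PEFT 0.11.1
import peft
import transformers

print("peft:", peft.__version__)
print("transformers:", transformers.__version__)
```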