## Description

This model was converted to OpenVINO from [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) using [optimum-intel](https://github.com/huggingface/optimum-intel) via the export space.
## Quantization Parameters

Weight compression was performed in **INT8_ASYM** mode, i.e. 8-bit asymmetric weight-only quantization of the original FP weights.

For more information on quantization, check the OpenVINO model optimization guide.
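As an illustration, a comparable compression can be applied at export time through optimum-intel's `OVWeightQuantizationConfig`. This is a minimal sketch under that assumption, not the exact command used to produce this repository:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# 8-bit asymmetric weight-only compression (INT8_ASYM), applied while
# exporting the original Transformers checkpoint to OpenVINO IR.
quantization_config = OVWeightQuantizationConfig(bits=8, sym=False)

model = OVModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    export=True,  # convert from the PyTorch checkpoint
    quantization_config=quantization_config,
)
model.save_pretrained("Qwen2.5-Coder-7B-Instruct-openvino")
```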
## Running Model Inference with Optimum Intel

First make sure you have `optimum-intel` installed:

```bash
pip install optimum[openvino]
```
To run the model with `optimum-intel`:

```python
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "MaliJa/Qwen2.5-Coder-7B-Instruct-openvino"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device="GPU",  # use "CPU" if no compatible GPU plugin is available
)

prompt = "What is OpenVINO?"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
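Since Qwen2.5-Coder-7B-Instruct is a chat-tuned model, responses are usually better when the prompt is wrapped in the tokenizer's chat template. A minimal sketch, reusing `tokenizer` and `model` from the block above:

```python
messages = [{"role": "user", "content": "Write a quicksort function in Python."}]

# Format the conversation with the model's built-in chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```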