DiffuCoder-7B-Instruct OpenVINO INT8

This repository contains the OpenVINO IR version of the apple/DiffuCoder-7B-Instruct model, optimized for Intel GPUs and CPUs. The weights have been compressed to INT8 with NNCF weight-only quantization, which reduces the memory footprint and improves inference performance.

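DiffuCoder is a discrete diffusion model for code generation. Instead of producing tokens strictly left to right, it generates a response by iteratively denoising (unmasking) a sequence of masked tokens.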

Usage

This model relies on custom architecture code (the "Dream" architecture), so you must pass trust_remote_code=True when loading its tokenizer or config with transformers.

Using with OpenVINO GenAI

The standard openvino_genai pipelines are built around autoregressive decoding and may not support the custom "Dream" diffusion architecture without a custom denoising loop. For a complete implementation of the discrete diffusion loop, including optimizations such as LocalLeap, refer to the custom server implementation in the project repository.

Manual Inference (Python)

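import openvino as ov
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoConfig

# core.read_model() needs local files, so download the repository first
# (IR files, tokenizer files and the custom architecture code).
repo_id = "naranor/DiffuCoder-7B-Instruct-ov-int8"
model_path = snapshot_download(repo_id=repo_id)

core = ov.Core()
# Reads the IR graph; the weights are loaded from the .bin file next to it.
ov_model = core.read_model(f"{model_path}/model.xml")
model = core.compile_model(ov_model, "GPU")  # or "CPU"

# The tokenizer and config depend on the custom "Dream" architecture code.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

# Note: the compiled model performs a single forward pass. Generation
# requires a discrete diffusion sampling loop; see the repository's
# diffusion_server.py for the full loop implementation.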
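To illustrate what such a sampling loop does, here is a minimal, generic sketch of masked-diffusion decoding with confidence-based unmasking: the answer starts fully masked and, at each step, only the most confident predictions are committed while the rest stay masked. It is not the repository's diffusion_server.py, does not implement LocalLeap, and the actual DiffuCoder remasking strategy may differ. The names MASK_ID, toy_logits and diffusion_sample are hypothetical, and the random logits stand in for a call to the compiled OpenVINO model.

```python
import numpy as np

MASK_ID = -1  # hypothetical mask token id used only in this toy example


def toy_logits(tokens: np.ndarray, vocab_size: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for the model: random logits of shape (seq_len, vocab_size).

    A real loop would run the compiled OpenVINO model on `tokens` here.
    """
    return rng.standard_normal((tokens.shape[0], vocab_size))


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def diffusion_sample(prompt: list[int], gen_len: int, steps: int,
                     vocab_size: int, seed: int = 0) -> np.ndarray:
    """Generic confidence-based iterative unmasking (a sketch, not DiffuCoder's sampler)."""
    rng = np.random.default_rng(seed)
    # Prompt tokens followed by a fully masked answer region.
    tokens = np.array(prompt + [MASK_ID] * gen_len)
    # How many tokens to reveal per step, spread as evenly as possible.
    per_step = np.full(steps, gen_len // steps)
    per_step[: gen_len % steps] += 1
    for k in per_step:
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break
        probs = softmax(toy_logits(tokens, vocab_size, rng))[masked]
        pred = probs.argmax(axis=-1)   # greedy prediction for each masked position
        conf = probs.max(axis=-1)      # probability of that prediction
        # Commit only the k most confident predictions; the others stay masked.
        keep = np.argsort(-conf)[:k]
        tokens[masked[keep]] = pred[keep]
    return tokens
```

For example, diffusion_sample([5, 6, 7], gen_len=10, steps=4, vocab_size=50) keeps the three prompt ids and fills the ten masked positions in four model calls. Revealing several tokens per step is what allows a diffusion model to use fewer forward passes than there are generated tokens.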

Optimization Details

Quantization: NNCF Weight-Only Quantization (INT8_ASYM)
Target Hardware: Intel integrated GPUs (e.g., UHD 620) and CPUs.
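In INT8_ASYM weight-only mode, each weight is stored as an unsigned 8-bit integer together with a per-output-channel scale and zero point, and is dequantized on the fly; activations stay in floating point. The NumPy sketch below illustrates that arithmetic on a single weight matrix. It is a simplified illustration with hypothetical function names, not NNCF's actual implementation, and the card does not include the script used to compress this model.

```python
import numpy as np


def quantize_int8_asym(w: np.ndarray):
    """Per-output-channel asymmetric INT8 quantization of a 2-D weight matrix.

    Rows are treated as output channels; returns (uint8 weights, scale, zero_point).
    """
    # Include 0 in each row's range so the zero point is representable.
    w_min = np.minimum(w.min(axis=1, keepdims=True), 0.0)
    w_max = np.maximum(w.max(axis=1, keepdims=True), 0.0)
    # Map [w_min, w_max] onto the 256 uint8 levels; guard against all-zero rows.
    scale = np.maximum(w_max - w_min, 1e-8) / 255.0
    zero_point = np.clip(np.round(-w_min / scale), 0, 255)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q: np.ndarray, scale: np.ndarray, zero_point: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights, as done at inference time."""
    return (q.astype(np.float32) - zero_point) * scale
```

Each stored weight takes 1 byte instead of 2 (FP16) or 4 (FP32), plus a small per-channel overhead for the scale and zero point. With NNCF itself, assuming a recent release, the equivalent step on an OpenVINO model is a single call such as nncf.compress_weights(ov_model, mode=nncf.CompressWeightsMode.INT8_ASYM).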

Repository

For the complete server implementation and inference scripts designed specifically for Intel integrated graphics, please visit the main project repository: https://github.com/naranor/openvino-gpu-llm-server


Model tree for naranor/DiffuCoder-7B-Instruct-ov-int8

Base model: Qwen/Qwen2.5-7B
Fine-tuned: Qwen/Qwen2.5-Coder-7B
Fine-tuned: apple/DiffuCoder-7B-Base
Fine-tuned: apple/DiffuCoder-7B-Instruct
Quantized (OpenVINO INT8): this model