---
library_name: openvino
pipeline_tag: text-generation
tags:
- openvino
- int8
- nncf
- code-generation
- diffusion
- diffucoder
base_model: apple/DiffuCoder-7B-Instruct
---

# DiffuCoder-7B-Instruct OpenVINO INT8

This is the OpenVINO IR version of the [apple/DiffuCoder-7B-Instruct](https://huggingface.co/apple/DiffuCoder-7B-Instruct) model, optimized for Intel GPUs and CPUs.
The model weights have been compressed to **INT8** using [NNCF](https://github.com/openvinotoolkit/nncf) to reduce the memory footprint and improve inference performance.

DiffuCoder is a discrete diffusion model designed for code generation. Instead of producing tokens strictly left to right, it starts from a masked sequence and fills in tokens over a series of denoising steps.

## Usage

This model uses a custom architecture, so you must pass `trust_remote_code=True` when loading its tokenizer and config with `transformers`.

### Using with OpenVINO GenAI

Standard `openvino_genai` pipelines may not support the custom "Dream" architecture out of the box, because generation requires a custom denoising loop.
For a complete implementation of the discrete diffusion loop (including optimizations such as LocalLeap), refer to the custom server implementation in the project repository linked in the Repository section.

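
To give a rough idea of what such a denoising loop does, here is a minimal, self-contained sketch in plain Python. It is not the repository's implementation and does not include LocalLeap. The `MASK` id, the `toy_logits` stand-in for the model, and the "commit the most confident positions first" schedule are all assumptions made for illustration; a real loop would call the compiled OpenVINO model and follow the sampling settings of the original model.

```python
import math
import random

MASK = -1  # hypothetical mask token id, for illustration only


def toy_logits(tokens, vocab_size, rng):
    """Stand-in for the model: random scores for every position and vocab entry."""
    return [[rng.random() for _ in range(vocab_size)] for _ in tokens]


def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def denoise(length, steps, vocab_size, predict=toy_logits, seed=0):
    """Start fully masked and commit the most confident predictions each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        probs = [softmax(row) for row in predict(tokens, vocab_size, rng)]
        # Most likely token for each position that is still masked.
        best = {i: max(range(vocab_size), key=lambda v: probs[i][v]) for i in masked}
        # Spread the remaining masked positions evenly over the remaining steps.
        quota = math.ceil(len(masked) / (steps - step))
        ranked = sorted(masked, key=lambda i: probs[i][best[i]], reverse=True)
        for i in ranked[:quota]:
            tokens[i] = best[i]
    return tokens


result = denoise(length=8, steps=4, vocab_size=16)
print(result)
```

Fewer steps mean more tokens are committed per step, which is faster but gives the model fewer chances to condition on earlier choices.
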
### Manual Inference (Python)

```python
from pathlib import Path

import openvino as ov
from huggingface_hub import snapshot_download
from transformers import AutoConfig, AutoTokenizer

# Replace with the actual repository ID (or point model_dir at a local copy).
repo_id = "your_hf_username/DiffuCoder-7B-Instruct-ov-int8"

# core.read_model() needs a path on disk, so download the repository first.
model_dir = Path(snapshot_download(repo_id))

core = ov.Core()
ov_model = core.read_model(str(model_dir / "model.xml"))
compiled_model = core.compile_model(ov_model, "GPU")  # or "CPU"

# The custom architecture requires trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)

# Note: Execution requires a discrete diffusion sampling loop.
# See the repository's diffusion_server.py for the full loop implementation.
```

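
If you are not sure whether a GPU is present, a small helper can choose the device string for `compile_model`. The sketch below is an illustration only: `pick_device` is a hypothetical helper, not part of OpenVINO. It takes a plain list of device names so it runs without OpenVINO installed; in practice you would pass it `ov.Core().available_devices`.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first available device that matches the preference order."""
    for family in preferred:
        for name in available:
            # Systems with several GPUs may report them as "GPU.0", "GPU.1", ...
            if name == family or name.startswith(family + "."):
                return name
    raise RuntimeError(f"none of {preferred} found in {available}")


# With OpenVINO installed: pick_device(ov.Core().available_devices)
print(pick_device(["CPU", "GPU"]))  # GPU
print(pick_device(["CPU"]))         # CPU
```
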
## Optimization Details

- **Quantization:** NNCF weight-only quantization (INT8_ASYM)
- **Target Hardware:** Intel integrated GPUs (e.g., UHD 620) and CPUs.

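
As a rough illustration of what asymmetric 8-bit weight quantization means, the sketch below maps a handful of float weights to integers in the range 0..255 using a scale and a zero point, then maps them back. It is a simplified, assumed version of the idea in plain Python, not NNCF's code: the real scheme operates on whole weight tensors and may differ in granularity and rounding. The function names and sample weights are made up for the example.

```python
def quantize_asym(weights, levels=256):
    """Map floats onto integers 0..levels-1 using a scale and a zero point."""
    lo, hi = min(weights), max(weights)
    # Fall back to 1.0 when all weights are equal, to avoid dividing by zero.
    scale = (hi - lo) / (levels - 1) or 1.0
    zero_point = round(-lo / scale)
    q = [min(levels - 1, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the stored integers."""
    return [(v - zero_point) * scale for v in q]


weights = [-0.5, -0.1, 0.0, 0.3, 1.0]
q, scale, zero_point = quantize_asym(weights)
restored = dequantize(q, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_err)
```

Only the weights are stored as integers ("weight-only"); they are expanded back to floating point during inference, so the saving is mainly in model size and memory bandwidth.
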
## Repository

For the complete server implementation and the inference scripts designed for Intel integrated graphics, see the main project repository:
[https://github.com/naranor/openvino-gpu-llm-server](https://github.com/naranor/openvino-gpu-llm-server)