	---
	library_name: openvino
	pipeline_tag: text-generation
	tags:
	- openvino
	- int8
	- nncf
	- code-generation
	- diffusion
	- diffucoder
	base_model: apple/DiffuCoder-7B-Instruct
	---

	# DiffuCoder-7B-Instruct OpenVINO INT8

	This is the OpenVINO IR version of the [apple/DiffuCoder-7B-Instruct](https://huggingface.co/apple/DiffuCoder-7B-Instruct) model, optimized for Intel GPUs and CPUs.
	The model weights have been compressed to INT8 using [NNCF](https://github.com/openvinotoolkit/nncf) for improved inference performance and reduced memory footprint.

	DiffuCoder is a discrete diffusion model designed for code generation.

	## Usage

	This model requires custom architecture files. When loading, you must use `trust_remote_code=True`.

	### Using with OpenVINO GenAI

	Standard `openvino_genai` pipelines may not fully support the custom "Dream" architecture out of the box: generation needs a custom denoising loop.
	For a complete implementation of the discrete diffusion loop (including optimizations such as LocalLeap), see the server implementation in the project repository linked in the Repository section.
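
The denoising loop mentioned above can be pictured with a toy example. The following is a minimal sketch of confidence-based unmasking, a common sampling strategy for masked discrete diffusion models. It is not the repository's implementation and does not include LocalLeap. `MASK`, `denoise`, and `toy_predict` are hypothetical names, and the stub predictor stands in for a real forward pass of the compiled model.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical placeholder for the model's mask token

# A predictor maps the current sequence to one (token, confidence) guess per position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def denoise(seq: List[str], predict: Predictor, steps: int) -> List[str]:
    """Fill masked positions over `steps` rounds, most confident guesses first."""
    seq = list(seq)
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        guesses = predict(seq)
        # Spread the remaining masks evenly over the remaining steps (ceiling division).
        remaining_steps = steps - step
        quota = -(-len(masked) // remaining_steps)
        # Commit only the highest-confidence guesses; the rest stay masked.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:quota]:
            seq[i] = guesses[i][0]
    return seq


def toy_predict(seq: List[str]) -> List[Tuple[str, float]]:
    """Stub standing in for a model forward pass: fixed targets, fixed confidences."""
    target = ["def", "add", "(", "a", ",", "b", ")", ":"]
    confidence = [0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6, 0.5]
    return list(zip(target, confidence))


prompt = ["def", MASK, "(", MASK, ",", MASK, ")", MASK]
result = denoise(prompt, toy_predict, steps=2)
print(" ".join(result))
```

In a real setup the predictor would run the compiled OpenVINO model on token IDs and derive confidences from its logits. More steps generally mean fewer tokens committed per round.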

	### Manual Inference (Python)

	```python
	import openvino as ov
	from huggingface_hub import snapshot_download
	from transformers import AutoTokenizer, AutoConfig

	# core.read_model() needs a local file path, so download the repository first.
	model_path = snapshot_download("naranor/DiffuCoder-7B-Instruct-ov-int8")

	core = ov.Core()
	ov_model = core.read_model(f"{model_path}/model.xml")
	model = core.compile_model(ov_model, "GPU")  # or "CPU"

	# The tokenizer and config rely on custom code shipped with the model.
	tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
	config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

	# Note: Execution requires a discrete diffusion sampling loop.
	# See the repository's diffusion_server.py for the full loop implementation.
	```
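
Compiling for `"GPU"` will typically fail on a machine without a supported Intel GPU. Below is a minimal sketch of a CPU fallback, assuming the device names come from `ov.Core().available_devices` (which reports names such as `CPU`, `GPU`, or `GPU.0`). `pick_device` is a hypothetical helper, not part of OpenVINO.

```python
from typing import Sequence


def pick_device(available: Sequence[str], preferred: str = "GPU", fallback: str = "CPU") -> str:
    """Return the first device matching `preferred`, else `fallback`.

    Matches both bare names ("GPU") and indexed names ("GPU.0").
    """
    for name in available:
        if name == preferred or name.startswith(preferred + "."):
            return name
    return fallback


print(pick_device(["CPU", "GPU.0", "GPU.1"]))
print(pick_device(["CPU"]))
```

With OpenVINO installed, `pick_device(ov.Core().available_devices)` yields a string that can be passed as the second argument to `core.compile_model`.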

	## Optimization Details

	- Quantization: NNCF Weight-Only Quantization (INT8_ASYM)
	- Target Hardware: Intel integrated GPUs (e.g., UHD 620) and CPUs
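
To illustrate what asymmetric INT8 weight quantization means numerically, here is a minimal pure-Python sketch that maps a group of floats onto the unsigned 8-bit range with a scale and a zero point, then reconstructs them. It shows the general idea only; NNCF's actual implementation (granularity, rounding, storage layout) may differ. `quantize_asym` and `dequantize` are hypothetical names.

```python
from typing import List, Tuple


def quantize_asym(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to integers in [0, 255] using a scale and an integer zero point."""
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale when all weights are 0
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Reconstruct approximate floats from the stored integers."""
    return [(v - zero_point) * scale for v in q]


weights = [-0.50, -0.13, 0.00, 0.26, 1.00]
q, scale, zp = quantize_asym(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, zp)
print(f"max reconstruction error: {max_err:.4f} (scale {scale:.4f})")
```

Each weight is stored in one byte plus a shared scale and zero point per group, which is where the memory saving comes from. The reconstruction error stays within about half a quantization step.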

	## Repository

	For the complete server implementation and inference scripts designed specifically for Intel integrated graphics, please visit the main project repository:
	[https://github.com/naranor/openvino-gpu-llm-server](https://github.com/naranor/openvino-gpu-llm-server)