Instructions for running raydelossantos/OmniCoder-9B-GPTQ-Int4 with local apps: vLLM, SGLang, and Docker Model Runner.

Local Apps

vLLM

How to use raydelossantos/OmniCoder-9B-GPTQ-Int4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "raydelossantos/OmniCoder-9B-GPTQ-Int4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raydelossantos/OmniCoder-9B-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
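
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. `build_chat_payload` and `post_chat` are hypothetical helper names introduced for illustration; they are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "raydelossantos/OmniCoder-9B-GPTQ-Int4"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build an OpenAI-style chat-completions body with an optional image part."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(payload, base_url="http://localhost:8000", timeout=120):
    """POST the payload to /v1/chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `post_chat(build_chat_payload("Describe this image in one sentence.", image_url="<image URL>"))` sends a body with the same structure as the curl call. Omitting `image_url` produces a text-only request.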

Use Docker

docker model run hf.co/raydelossantos/OmniCoder-9B-GPTQ-Int4

SGLang

How to use raydelossantos/OmniCoder-9B-GPTQ-Int4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raydelossantos/OmniCoder-9B-GPTQ-Int4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raydelossantos/OmniCoder-9B-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
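
The curl call returns JSON in the OpenAI chat-completions shape, with the generated text nested under `choices`. A minimal sketch for pulling that text out of a decoded reply follows. `extract_text` is a hypothetical helper, and `sample_reply` is made-up illustrative data in the standard response shape, not real model output.

```python
def extract_text(reply):
    """Return the assistant text from the first choice of a chat completion."""
    choices = reply.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in reply: {reply!r}")
    return choices[0]["message"]["content"]


# Illustrative reply in the OpenAI chat-completions shape (not real output).
sample_reply = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A placeholder answer."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_text(sample_reply))  # prints: A placeholder answer.
```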

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "raydelossantos/OmniCoder-9B-GPTQ-Int4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raydelossantos/OmniCoder-9B-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
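
A freshly started container may need some time to download and load the weights before the API answers. The following is a minimal readiness check, assuming the server exposes the OpenAI-compatible `/v1/models` route on the mapped port. `http_ok` and `wait_until_ready` are hypothetical helper names, and the retry count and delay are arbitrary defaults.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=5):
    """Return True if a GET to `url` answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(base_url="http://localhost:30000", attempts=60, delay=5.0, probe=http_ok):
    """Poll /v1/models until the probe succeeds or attempts run out; return True when ready."""
    for attempt in range(attempts):
        if probe(f"{base_url}/v1/models"):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before the first chat request avoids connection errors while the model is still loading. The `probe` parameter exists so the loop can be exercised without a live server.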

Docker Model Runner

How to use raydelossantos/OmniCoder-9B-GPTQ-Int4 with Docker Model Runner:

```
docker model run hf.co/raydelossantos/OmniCoder-9B-GPTQ-Int4
```

OmniCoder-9B-GPTQ-Int4 / quantize_config.json

Last commit: a9fc8f6 (verified), raydelossantos, 2 months ago: "Upload folder using huggingface_hub"

1.15 kB

{
  "bits": 4,
  "dynamic": {
    "-:.*attn.*": {},
    "-:.*mtp.*": {},
    "-:.*visual.*": {},
    "lm_head": {},
    "model.language_model.embed_tokens": {}
  },
  "group_size": 128,
  "desc_act": false,
  "lm_head": false,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "pack_dtype": "int32",
  "meta": {
    "quantizer": [
      "gptqmodel:5.8.0"
    ],
    "uri": "https://github.com/modelcloud/gptqmodel",
    "damp_percent": 0.01,
    "damp_auto_increment": 0.01,
    "static_groups": false,
    "true_sequential": true,
    "mse": 0.0,
    "gptaq": null,
    "act_group_aware": true,
    "failsafe": {
      "strategy": "rtn",
      "threshold": "0.5%",
      "smooth": null
    },
    "offload_to_disk": true,
    "offload_to_disk_path": "./gptqmodel_offload/jqalracr-ibpdeuuz/",
    "pack_impl": "cpu",
    "mock_quantization": false,
    "gc_mode": "interval",
    "wait_for_submodule_finalizers": false,
    "auto_forward_data_parallel": true,
    "hessian": {
      "chunk_size": null,
      "chunk_bytes": null,
      "staging_dtype": "float32"
    },
    "vram_strategy": "exclusive"
  },
  "sym": true,
  "format": "gptq"
}
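
To make the storage cost implied by `bits` and `group_size` concrete, here is a rough back-of-the-envelope sketch. It assumes the common GPTQ checkpoint layout, in which every group of `group_size` weights shares one 16-bit scale and one packed `bits`-wide zero point. It ignores the `g_idx` tensor and any modules left unquantized, so treat the result as an estimate for the quantized layers only. `approx_bits_per_weight` is a hypothetical helper, not part of gptqmodel.

```python
def approx_bits_per_weight(bits, group_size, scale_bits=16):
    """Average storage per quantized weight: payload plus per-group scale and zero point."""
    per_group_overhead = scale_bits + bits  # one scale and one packed zero point per group
    return bits + per_group_overhead / group_size


# Values copied from quantize_config.json above.
config = {"bits": 4, "group_size": 128, "sym": True, "desc_act": False}

effective = approx_bits_per_weight(config["bits"], config["group_size"])
print(f"{effective:.5f} bits per weight")                    # prints: 4.15625 bits per weight
print(f"{16 / effective:.2f}x smaller than 16-bit weights")  # prints: 3.85x smaller than 16-bit weights
```

A smaller `group_size` would raise the per-weight overhead (for example, 32 gives 4.625 bits under the same assumptions) in exchange for finer-grained scales.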