Instructions for running raydelossantos/OmniCoder-9B-GPTQ-Int4 with local apps: vLLM, SGLang, and Docker Model Runner.

Local Apps

vLLM

How to use raydelossantos/OmniCoder-9B-GPTQ-Int4 with vLLM:

Install from pip and serve the model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "raydelossantos/OmniCoder-9B-GPTQ-Int4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raydelossantos/OmniCoder-9B-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
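
The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at `http://localhost:8000` as in the vLLM example and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM or SGLang.

```python
import json
import urllib.request

MODEL_ID = "raydelossantos/OmniCoder-9B-GPTQ-Int4"


def build_chat_request(prompt, image_url=None, base_url="http://localhost:8000"):
    """Build a POST request mirroring the curl payload (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Same content-part layout as the curl example: text part, then image part.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, image_url=None, base_url="http://localhost:8000", timeout=120):
    """Send the request and return the first choice's message content.

    Assumes an OpenAI-style response body with a `choices` list.
    """
    request = build_chat_request(prompt, image_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Describe this image in one sentence.", image_url=...)` sends the same payload as the curl call. Passing `base_url="http://localhost:30000"` targets the SGLang port used later in this page.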

Use Docker

docker model run hf.co/raydelossantos/OmniCoder-9B-GPTQ-Int4

SGLang

How to use raydelossantos/OmniCoder-9B-GPTQ-Int4 with SGLang:

Install from pip and serve the model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raydelossantos/OmniCoder-9B-GPTQ-Int4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raydelossantos/OmniCoder-9B-GPTQ-Int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "raydelossantos/OmniCoder-9B-GPTQ-Int4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip-based SGLang example (port 30000).
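
A freshly started server, especially inside Docker, may need some time to download and load the weights before it accepts requests. The following sketch polls the server until it responds. It assumes the OpenAI-compatible `GET /v1/models` route is available on the port used above. The function name `wait_for_server` and its `probe` parameter are hypothetical, and the probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url="http://localhost:30000", timeout_s=300.0,
                    interval_s=2.0, probe=None):
    """Poll {base_url}/v1/models until it answers or the timeout expires.

    Returns True once the probe succeeds, False if the deadline passes first.
    """
    def default_probe(url):
        # Treat connection errors and timeouts as "not ready yet".
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or default_probe
    url = f"{base_url}/v1/models"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server()` before the first chat request avoids connection-refused errors during startup. Pass `base_url="http://localhost:8000"` for the vLLM example.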

Docker Model Runner

How to use raydelossantos/OmniCoder-9B-GPTQ-Int4 with Docker Model Runner:

```
docker model run hf.co/raydelossantos/OmniCoder-9B-GPTQ-Int4
```

GPTQ quantization config for OmniCoder-9B-GPTQ-Int4 (file size: 1,152 bytes, commit a9fc8f6):

{
  "bits": 4,
  "dynamic": {
    "-:.*attn.*": {},
    "-:.*mtp.*": {},
    "-:.*visual.*": {},
    "lm_head": {},
    "model.language_model.embed_tokens": {}
  },
  "group_size": 128,
  "desc_act": false,
  "lm_head": false,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "pack_dtype": "int32",
  "meta": {
    "quantizer": [
      "gptqmodel:5.8.0"
    ],
    "uri": "https://github.com/modelcloud/gptqmodel",
    "damp_percent": 0.01,
    "damp_auto_increment": 0.01,
    "static_groups": false,
    "true_sequential": true,
    "mse": 0.0,
    "gptaq": null,
    "act_group_aware": true,
    "failsafe": {
      "strategy": "rtn",
      "threshold": "0.5%",
      "smooth": null
    },
    "offload_to_disk": true,
    "offload_to_disk_path": "./gptqmodel_offload/jqalracr-ibpdeuuz/",
    "pack_impl": "cpu",
    "mock_quantization": false,
    "gc_mode": "interval",
    "wait_for_submodule_finalizers": false,
    "auto_forward_data_parallel": true,
    "hessian": {
      "chunk_size": null,
      "chunk_bytes": null,
      "staging_dtype": "float32"
    },
    "vram_strategy": "exclusive"
  },
  "sym": true,
  "format": "gptq"
}
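
The `dynamic` section of the config above controls which modules are quantized. The following is a minimal sketch of how such rules can be read, assuming the GPTQModel convention that a `-:` prefix marks a negative match (the module is left unquantized), while a `+:` prefix or no prefix marks a positive match whose value holds per-module overrides. The function `classify` and the module names in the usage note are hypothetical. This is not GPTQModel's own implementation.

```python
import re

# The `dynamic` rules copied from the config above.
DYNAMIC = {
    "-:.*attn.*": {},
    "-:.*mtp.*": {},
    "-:.*visual.*": {},
    "lm_head": {},
    "model.language_model.embed_tokens": {},
}


def classify(module_name, dynamic=DYNAMIC):
    """Return "skip", an override dict, or None when no rule matches.

    Assumption: patterns are regular expressions matched from the start
    of the module name, and the first matching rule wins.
    """
    for pattern, overrides in dynamic.items():
        if pattern.startswith("-:"):
            # Negative match: the module is excluded from quantization.
            if re.match(pattern[2:], module_name):
                return "skip"
        else:
            # Positive match, with or without an explicit "+:" prefix.
            regex = pattern[2:] if pattern.startswith("+:") else pattern
            if re.match(regex, module_name):
                return overrides
    # No rule matched: the top-level settings (bits, group_size, ...) apply.
    return None
```

Under these assumptions, a name such as `model.language_model.layers.0.self_attn.q_proj` is classified as `"skip"`, while `model.language_model.layers.0.mlp.down_proj` matches no rule and so falls back to the top-level 4-bit, group-size-128 settings.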