Yash1005 committed on May 24

Commit d3172ec · verified · 1 Parent(s): a77aa5c

docs: add model card with eval metrics on held-out test set

Files changed (1)

README.md +40 -37

@@ -73,18 +73,7 @@ Rules:
   - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
   - When multiple languages appear, list every distinct one (still only true).
 Allowed language keys (use these exact spellings):
-  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
-Examples:
-Input: What's the weather forecast today?
-Output: {"is_valid": false, "category": {}}
-Input: Run this for me: print('hello world')
-Output: {"is_valid": true, "category": {"Python": true}}
-Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
-Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}"""
+  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""
 llm = LLM(
     model=MODEL,
@@ -120,18 +109,7 @@ Rules:
   - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
   - When multiple languages appear, list every distinct one (still only true).
 Allowed language keys (use these exact spellings):
-  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
-Examples:
-Input: What's the weather forecast today?
-Output: {"is_valid": false, "category": {}}
-Input: Run this for me: print('hello world')
-Output: {"is_valid": true, "category": {"Python": true}}
-Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
-Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}"""
+  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""
 tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
@@ -161,22 +139,11 @@ Rules:
   - When multiple languages appear, list every distinct one (still only true).
 Allowed language keys (use these exact spellings):
   Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
-Examples:
-Input: What's the weather forecast today?
-Output: {"is_valid": false, "category": {}}
-Input: Run this for me: print('hello world')
-Output: {"is_valid": true, "category": {"Python": true}}
-Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
-Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}
 ```
 ## Evaluation (transformers)
 Evaluated on **200 held-out prompts** drawn from `test_dataset_langid.csv` (same single + multi + benign composition as training).
-- Evaluation timestamp: `2026-05-24 12:05 UTC`
+- Evaluation timestamp: `2026-05-24 12:53 UTC`
 - GPU: `NVIDIA A10G`
 - Source adapter: `Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8`
 - JSON parse errors: `0/200` (`0.0%`)
@@ -251,5 +218,41 @@ The model emits one or more of these keys in the `category` map of its JSON outp
 Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
 ```
+## Evaluation — vLLM serving (merged model, text-only)
+Same **500 held-out prompts**, served through **vLLM `0.21.0`**'s native Qwen3.5/Mamba runner instead of the transformers `.generate()` loop above. Only text prompts are sent; vLLM auto-detects text-only mode. This reflects production serving accuracy + latency.
+- Engine: vLLM `0.21.0`, text-only (auto (limit_mm_per_prompt=0)), dtype bf16, greedy decoding
+- GPU: `NVIDIA A10G`
+- JSON parse errors: `0/500` (`0.0%`)
+### Accuracy (vLLM)
+| Metric | Value |
+|---|---:|
+| `is_valid` accuracy | **1.0000** |
+| Language-set exact match | **0.9700** |
+| Binary F1 (positive = contains code) | **1.0000** |
+| Binary precision | 1.0000 |
+| Binary recall | 1.0000 |
+| Macro F1 across languages | **0.9771** |
+### Confusion matrix — binary `is_valid` (vLLM)
+| | predicted contains-code | predicted no-code |
+|---|---:|---:|
+| **actual contains-code** | TP = 450 | FN = 0 |
+| **actual no-code**       | FP = 0 | TN = 50 |
+### vLLM inference latency (single-stream, batch = 1)
+| Stat | ms / prompt |
+|---|---:|
+| Mean | **200.0** |
+| Median | 186.2 |
+| p95 | 278.9 |
+| p99 | 343.7 |
+| Max | 1990.9 |
+| Under 1 s | 99.6% |
+### vLLM throughput (single batched submit, continuous batching)
+- Prompts/sec: **18.12**
+- Output tokens/sec: 260.7
+- Input tokens/sec: 15441.4
+- Batched wall time for all 500 prompts: 27.60 s
 ---
-*Model card generated automatically by `eval_and_push_card.py` on 2026-05-24 12:05 UTC.*
+*Model card generated automatically by `eval_and_push_card.py` on 2026-05-24 12:53 UTC.*
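
The binary metrics and the prompts/sec figure in the added vLLM section can be cross-checked from the confusion-matrix counts and the batched wall time reported in the same section. The sketch below is a minimal sanity check using the standard definitions of precision, recall, F1, and accuracy. It is not taken from `eval_and_push_card.py`, and the function names `binary_metrics` and `prompts_per_second` are hypothetical.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard binary metrics from confusion-matrix counts (positive = contains code)."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def prompts_per_second(n_prompts: int, wall_time_s: float) -> float:
    """Throughput of one batched submit: prompts divided by wall-clock seconds."""
    return n_prompts / wall_time_s


# Counts and timings copied from the vLLM section of the model card.
metrics = binary_metrics(tp=450, fn=0, fp=0, tn=50)
throughput = prompts_per_second(500, 27.60)

print(metrics)                           # every value is 1.0
print(f"{throughput:.2f} prompts/sec")   # 18.12 prompts/sec
```

With FN = 0 and FP = 0, precision, recall, F1, and `is_valid` accuracy all come out to 1.0, and 500 prompts over 27.60 s gives 18.12 prompts/sec, which agrees with the reported values. The language-set exact match and macro F1 cannot be recomputed this way because the per-language counts are not part of this diff.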