CodeLanguage-Qwen3.5-2B-v8
Merged full model (base Qwen/Qwen3.5-2B + LoRA adapter, merged via peft.merge_and_unload()) that identifies which programming languages are embedded in a user prompt across 25 languages and configuration formats. This is a self-contained checkpoint — load it directly (no PEFT step) and serve it on vLLM (v0.21.0+). Trained on a combined dataset of Rosetta Code snippets and curated config-language samples (Dockerfile, YAML, Terraform, Makefile, SQL).
The model is fine-tuned to emit a strict JSON object describing the languages found:
{"is_valid": true, "category": {"Python": true, "Bash": true}}
is_valid is true when at least one code/config snippet is present and false for natural-language-only prompts. category contains only the detected languages, each mapped to true; if no code is present category is {}.
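The schema described above is strict enough to check mechanically. Below is a minimal sketch of such a check, assuming the 25 allowed keys from the system prompt in this card. The helper name `validate_verdict` is hypothetical and not part of the model repository.

```python
import json

# Allowed keys, copied from the system prompt in this card.
ALLOWED_LANGUAGES = {
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust", "Kotlin",
    "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash", "PowerShell",
    "Batch", "SQL", "Dockerfile", "YAML", "Makefile", "Terraform", "AWK", "jq",
}


def validate_verdict(raw: str) -> dict:
    """Parse a model response and check it against the documented schema.

    Hypothetical helper: raises ValueError on any schema violation
    (json.JSONDecodeError is itself a ValueError subclass).
    """
    verdict = json.loads(raw)
    if not isinstance(verdict, dict) or set(verdict) != {"is_valid", "category"}:
        raise ValueError("expected exactly the keys 'is_valid' and 'category'")
    is_valid, category = verdict["is_valid"], verdict["category"]
    if not isinstance(is_valid, bool) or not isinstance(category, dict):
        raise ValueError("'is_valid' must be a bool and 'category' an object")
    unknown = set(category) - ALLOWED_LANGUAGES
    if unknown:
        raise ValueError(f"unknown language keys: {sorted(unknown)}")
    if any(value is not True for value in category.values()):
        raise ValueError("every detected language must map to true")
    # Documented rule: with no code present, category is the empty object.
    if not is_valid and category:
        raise ValueError("'category' must be {} when 'is_valid' is false")
    return verdict
```

Calling `validate_verdict('{"is_valid": true, "category": {"Python": true, "Bash": true}}')` returns the parsed dict, while a response with a key outside the allowed list raises `ValueError`.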
Quick start
Text-only model. The base Qwen/Qwen3.5-2B declares the multimodal Qwen3_5ForConditionalGeneration architecture (it carries a vision tower in its weights), but this is a text-in / text-out language guard: it never consumes images and only emits the JSON verdict. Send only text prompts; vLLM auto-detects text-only mode and prints "All limits of multimodal modalities ... set to 0, running in text-only mode" at startup. (language_model_only=True would in theory skip loading the vision-tower weights, but on vLLM v0.21.0 it crashes Qwen3_5ForCausalLM.__init__ with a vision_config attribute error. Leave it off until a later vLLM release fixes that path.)
vLLM (recommended — needs vLLM >= 0.21.0 for the Qwen3.5/Mamba runner)
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import json, re
MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""
llm = LLM(
    model=MODEL,
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=4096,
    # vLLM auto-detects text-only when no multimodal inputs are sent.
    # Do NOT pass language_model_only=True here; see the note above.
)
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
# Greedy decoding; the JSON verdict is short, so 220 tokens is ample.
sampling = SamplingParams(temperature=0.0, max_tokens=220, stop=["\n\n\n"])

def langid(prompt: str) -> dict:
    # Render the chat template to a string with thinking disabled.
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    out = llm.generate([chat], sampling)
    text = out[0].outputs[0].text
    # Extract the outermost {...} span and fail loudly if there is none.
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object in model output: {text!r}")
    return json.loads(match.group(0))
Plain transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, json, re
MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
).eval()

def langid(prompt: str) -> dict:
    # Render the chat template to a string with thinking disabled.
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    # Greedy decoding; decode only the newly generated tokens.
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Extract the outermost {...} span and fail loudly if there is none.
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object in model output: {text!r}")
    return json.loads(match.group(0))
System prompt
The model was trained with the exact system prompt below. Pass it verbatim at inference time — the output schema depends on this prompt.
You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
Evaluation (transformers)
Evaluated on 200 held-out prompts drawn from test_dataset_langid.csv (same single + multi + benign composition as training).
- Evaluation timestamp: 2026-05-24 12:53 UTC
- GPU: NVIDIA A10G
- Source adapter: Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8
- JSON parse errors: 0/200 (0.0%)
Top-level metrics
| Metric | Value |
|---|---|
| is_valid accuracy | 1.0000 |
| Language-set exact match | 0.9650 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9701 |
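As a reading aid for the first two rows, here is a minimal sketch of how these metrics could be computed. It assumes that "language-set exact match" means the predicted set of language keys equals the labelled set for a prompt; the evaluation script itself is not shown in this card, so the function names and the toy data are illustrative only.

```python
def is_valid_accuracy(gold, pred):
    """Fraction of prompts whose predicted is_valid flag matches the label."""
    hits = sum(g["is_valid"] == p["is_valid"] for g, p in zip(gold, pred))
    return hits / len(gold)


def language_set_exact_match(gold, pred):
    """Fraction of prompts whose predicted language set equals the gold set."""
    hits = sum(set(g["category"]) == set(p["category"]) for g, p in zip(gold, pred))
    return hits / len(gold)


# Toy labels and predictions (illustrative only, not from the test set).
gold = [
    {"is_valid": True, "category": {"Python": True, "Bash": True}},
    {"is_valid": True, "category": {"C": True}},
    {"is_valid": False, "category": {}},
    {"is_valid": True, "category": {"YAML": True}},
]
pred = [
    {"is_valid": True, "category": {"Python": True, "Bash": True}},
    {"is_valid": True, "category": {"C": True, "C++": True}},  # one extra language
    {"is_valid": False, "category": {}},
    {"is_valid": True, "category": {"YAML": True}},
]

print(is_valid_accuracy(gold, pred))         # 1.0
print(language_set_exact_match(gold, pred))  # 0.75
```

Under this definition a single extra or missing language makes the whole prompt count as a miss, and an exact match of 0.9650 on 200 prompts corresponds to 193 prompts.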
Confusion matrix — binary is_valid decision
Positive class = the prompt contains code (is_valid=True).
| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 181 | FN = 0 |
| actual no-code | FP = 0 | TN = 19 |
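The binary metrics in the top-level table follow directly from these four counts. A short sketch using the standard definitions (the function name is illustrative):

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard precision, recall, F1 and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts from the confusion matrix above (181 + 19 = 200 prompts).
metrics = binary_metrics(tp=181, fn=0, fp=0, tn=19)
print(metrics)
```

With no false positives or false negatives, all four values come out as 1.0, matching the reported numbers.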
Per-language metrics
Only languages that appear in either the actual or predicted labels are listed.
| Language | support | precision | recall | F1 |
|---|---|---|---|---|
| Python | 14 | 1.000 | 1.000 | 1.000 |
| Terraform | 14 | 1.000 | 1.000 | 1.000 |
| Java | 12 | 1.000 | 0.917 | 0.957 |
| C | 12 | 0.857 | 1.000 | 0.923 |
| Rust | 12 | 1.000 | 1.000 | 1.000 |
| AWK | 12 | 1.000 | 0.917 | 0.957 |
| Ruby | 11 | 1.000 | 1.000 | 1.000 |
| R | 11 | 0.846 | 1.000 | 0.917 |
| Go | 10 | 1.000 | 1.000 | 1.000 |
| Swift | 10 | 1.000 | 1.000 | 1.000 |
| Scala | 10 | 1.000 | 0.800 | 0.889 |
| SQL | 10 | 1.000 | 1.000 | 1.000 |
| jq | 10 | 0.909 | 1.000 | 0.952 |
| JavaScript | 9 | 0.900 | 1.000 | 0.947 |
| Kotlin | 9 | 1.000 | 1.000 | 1.000 |
| Perl | 9 | 1.000 | 1.000 | 1.000 |
| PowerShell | 9 | 1.000 | 1.000 | 1.000 |
| Batch | 9 | 1.000 | 1.000 | 1.000 |
| YAML | 9 | 1.000 | 0.889 | 0.941 |
| C++ | 7 | 1.000 | 0.857 | 0.923 |
| C# | 7 | 1.000 | 1.000 | 1.000 |
| Lua | 7 | 1.000 | 0.857 | 0.923 |
| Bash | 7 | 1.000 | 1.000 | 1.000 |
| Dockerfile | 6 | 0.857 | 1.000 | 0.923 |
| Makefile | 6 | 1.000 | 1.000 | 1.000 |
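The macro F1 in the top-level table is the unweighted mean of the per-language F1 scores. The sketch below recomputes it from the rounded values in the per-language table, so it can differ from the reported figure in the last digit or so.

```python
# F1 per language, copied from the per-language table (rounded to 3 places).
f1_by_language = {
    "Python": 1.000, "Terraform": 1.000, "Java": 0.957, "C": 0.923,
    "Rust": 1.000, "AWK": 0.957, "Ruby": 1.000, "R": 0.917, "Go": 1.000,
    "Swift": 1.000, "Scala": 0.889, "SQL": 1.000, "jq": 0.952,
    "JavaScript": 0.947, "Kotlin": 1.000, "Perl": 1.000, "PowerShell": 1.000,
    "Batch": 1.000, "YAML": 0.941, "C++": 0.923, "C#": 1.000, "Lua": 0.923,
    "Bash": 1.000, "Dockerfile": 0.923, "Makefile": 1.000,
}

# Macro averaging weights every language equally, regardless of support.
macro_f1 = sum(f1_by_language.values()) / len(f1_by_language)
print(f"{macro_f1:.4f}")  # 0.9701
```

Because the average is unweighted, a low-support language such as Dockerfile (6 prompts) moves the macro F1 as much as Python (14 prompts).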
Inference latency
- Mean: 0.99 s/prompt
- Median: 0.94 s/prompt
- p95: 1.34 s/prompt
- Max: 1.64 s/prompt
Training setup
- Base model: Qwen/Qwen3.5-2B, loaded in full precision (bf16 / fp16, no bitsandbytes quantization)
- LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
- Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
- Epochs: 3
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device batch 1 × gradient accumulation 8), gradient checkpointing on
- Max sequence length: 3200 tokens
- Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
- Languages: 25 (programming + config formats)
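For a rough sense of scale, the hyperparameters above can be turned into step counts. This is a back-of-the-envelope sketch, not the training script: it assumes a single device, that all 10,000 rows are used for training, and that warmup steps are rounded up. The dictionary keys mirror the peft `LoraConfig` arguments of the same names.

```python
import math

# LoRA hyperparameters copied from the list above.
lora = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

train_rows = 7_000 + 2_000 + 1_000  # single + multi + benign
per_device_batch = 1
grad_accum = 8
epochs = 3
warmup_ratio = 0.05

# Assumptions: one device, every row used for training, warmup rounded up.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
lora_scale = lora["lora_alpha"] / lora["r"]  # standard LoRA scaling alpha / r

print(effective_batch, steps_per_epoch, total_steps, warmup_steps, lora_scale)
# 8 1250 3750 188 2.0
```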
Supported languages
The model emits one or more of these keys in the category map of its JSON output:
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
Evaluation — vLLM serving (merged model, text-only)
Evaluated on 500 held-out prompts (a larger sample than the 200 used in the transformers evaluation above), served through vLLM 0.21.0's native Qwen3.5/Mamba runner instead of the transformers .generate() loop. Only text prompts are sent; vLLM auto-detects text-only mode. This reflects production serving accuracy and latency.
- Engine: vLLM 0.21.0, text-only (auto; limit_mm_per_prompt=0), dtype bf16, greedy decoding
- GPU: NVIDIA A10G
- JSON parse errors: 0/500 (0.0%)
Accuracy (vLLM)
| Metric | Value |
|---|---|
| is_valid accuracy | 1.0000 |
| Language-set exact match | 0.9700 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9771 |
Confusion matrix — binary is_valid (vLLM)
| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 450 | FN = 0 |
| actual no-code | FP = 0 | TN = 50 |
vLLM inference latency (single-stream, batch = 1)
| Stat | ms / prompt |
|---|---|
| Mean | 200.0 |
| Median | 186.2 |
| p95 | 278.9 |
| p99 | 343.7 |
| Max | 1990.9 |
| Under 1 s | 99.6% |
vLLM throughput (single batched submit, continuous batching)
- Prompts/sec: 18.12
- Output tokens/sec: 260.7
- Input tokens/sec: 15441.4
- Batched wall time for all 500 prompts: 27.60 s
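The throughput figures are consistent with each other, and a little arithmetic recovers the average prompt and response sizes. The sketch below only rearranges the reported numbers; the per-prompt token averages are derived here and are not stated in the card.

```python
prompts = 500
wall_time_s = 27.60          # batched wall time
output_tok_per_s = 260.7
input_tok_per_s = 15441.4
single_stream_mean_s = 0.200  # mean latency at batch = 1 (200.0 ms)

prompts_per_s = prompts / wall_time_s
avg_output_tokens = output_tok_per_s * wall_time_s / prompts
avg_input_tokens = input_tok_per_s * wall_time_s / prompts
# Batched throughput relative to issuing prompts one at a time.
speedup = prompts_per_s * single_stream_mean_s

print(f"{prompts_per_s:.2f} prompts/s")             # 18.12 prompts/s
print(f"{avg_output_tokens:.1f} output tokens/prompt")  # 14.4 output tokens/prompt
print(f"{avg_input_tokens:.0f} input tokens/prompt")    # 852 input tokens/prompt
print(f"{speedup:.1f}x over single-stream")             # 3.6x over single-stream
```

The short average output is what one would expect from a model that emits only a compact JSON verdict, while most of the work per prompt is prefill over the system prompt and user text.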
Model card generated automatically by eval_and_push_card.py on 2026-05-24 12:53 UTC.