Libraries

How to use Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8")
# Text-only model: send plain-text messages (no image parts).
# For the JSON verdict, prepend the system prompt from the "System prompt" section.
messages = [
    {"role": "user", "content": "How do I run `ls -la | grep txt` in Bash?"},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8")
model = AutoModelForMultimodalLM.from_pretrained("Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8", device_map="auto")
# Text-only model: the message carries a single text part and no image.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "How do I run `ls -la | grep txt` in Bash?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8

SGLang

How to use Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8 with Docker Model Runner:
```
docker model run hf.co/Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8
```

CodeLanguage-Qwen3.5-2B-v8

A merged full model (base Qwen/Qwen3.5-2B plus a LoRA adapter, merged via PEFT's merge_and_unload()) that identifies which of 25 programming languages and configuration formats are embedded in a user prompt. The checkpoint is self-contained: load it directly (no PEFT step) and serve it on vLLM (v0.21.0+). It was trained on a combined dataset of Rosetta Code snippets and curated config-language samples (Dockerfile, YAML, Terraform, Makefile, SQL), and is fine-tuned to emit a strict JSON object describing the languages found:

{"is_valid": true, "category": {"Python": true, "Bash": true}}

is_valid is true when at least one code/config snippet is present and false for natural-language-only prompts. category contains only the detected languages, each mapped to true; if no code is present category is {}.
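
The schema described above can be checked mechanically before a verdict is trusted downstream. The following is a minimal sketch, not part of the released code: `check_verdict` and `ALLOWED_LANGUAGES` are hypothetical names, and the key list is copied from the allowed keys in the system prompt.

```python
# Allowed keys, copied from the "Allowed language keys" list in the system prompt.
ALLOWED_LANGUAGES = frozenset({
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust", "Kotlin",
    "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash", "PowerShell", "Batch",
    "SQL", "Dockerfile", "YAML", "Makefile", "Terraform", "AWK", "jq",
})


def check_verdict(verdict: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the verdict looks well-formed."""
    problems = []
    is_valid = verdict.get("is_valid")
    category = verdict.get("category")
    if not isinstance(is_valid, bool):
        problems.append("is_valid must be a boolean")
    if not isinstance(category, dict):
        problems.append("category must be an object")
        return problems
    for lang, flag in category.items():
        if lang not in ALLOWED_LANGUAGES:
            problems.append(f"unknown language key: {lang}")
        if flag is not True:
            problems.append(f"{lang} must map to true")
    # The card states that category is {} whenever no code is present.
    if is_valid is False and category:
        problems.append("is_valid is false but category is not empty")
    return problems


print(check_verdict({"is_valid": True, "category": {"Python": True, "Bash": True}}))
```

An empty list means the object matches the documented shape; any returned string names the first rule it breaks.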

Quick start

Text-only model. The base Qwen/Qwen3.5-2B declares the multimodal Qwen3_5ForConditionalGeneration architecture (its weights include a vision tower), but this checkpoint is a text-in / text-out language guard: it never consumes images and emits only the JSON verdict. Send only text prompts. vLLM auto-detects text-only mode and prints "All limits of multimodal modalities ... set to 0, running in text-only mode" at startup. Passing language_model_only=True would in theory skip loading the vision-tower weights, but on vLLM v0.21.0 it crashes Qwen3_5ForCausalLM.__init__ with a vision_config attribute error, so leave it off until a later vLLM release fixes that path.

vLLM (recommended — needs vLLM >= 0.21.0 for the Qwen3.5/Mamba runner)

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import json, re

MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
  - is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
  - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
  - When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""

llm = LLM(
    model=MODEL,
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=4096,
    # vLLM auto-detects text-only when no multimodal inputs are sent.
    # Do NOT pass language_model_only=True here — see the note above.
)
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
sampling = SamplingParams(temperature=0.0, max_tokens=220, stop=["\n\n\n"])

def langid(prompt: str) -> dict:
    chat = tokenizer.apply_chat_template(
        [{"role":"system","content":SYSTEM_MSG},
         {"role":"user","content":prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    out = llm.generate([chat], sampling)
    text = out[0].outputs[0].text
    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))

Plain transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, json, re

MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
  - is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
  - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
  - When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
).eval()

def langid(prompt: str) -> dict:
    chat = tokenizer.apply_chat_template(
        [{"role":"system","content":SYSTEM_MSG},
         {"role":"user","content":prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))
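
Both `langid` helpers above call `.group(0)` directly on the `re.search` result, which raises `AttributeError` if the output contains no brace-delimited span. The reported evaluation shows zero parse errors, so this may never trigger in practice, but a defensive variant could look like the sketch below. `parse_verdict` is a hypothetical helper, not part of the released code.

```python
import json
import re
from typing import Optional


def parse_verdict(text: str) -> Optional[dict]:
    """Extract the span from the first '{' to the last '}' and parse it as JSON.

    Returns None instead of raising when no JSON object can be recovered.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


print(parse_verdict('{"is_valid": true, "category": {"Python": true}}'))
print(parse_verdict("no json here"))
```

Callers can then treat `None` as a parse failure and decide whether to retry, log, or fall back to a default verdict.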

System prompt

The model was trained with the exact system prompt below. Pass it verbatim at inference time — the output schema depends on this prompt.

You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
  - is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
  - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
  - When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
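
When the model is served behind an OpenAI-compatible endpoint, the same system prompt has to travel as the first message of every request. The sketch below only builds the request body and does not contact a server. `build_chat_request` is a hypothetical helper, and forwarding `enable_thinking` through `chat_template_kwargs` is an assumption about the serving stack that should be checked against the server version in use.

```python
import json


def build_chat_request(system_msg: str, prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat-completions body with the system prompt first."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": prompt},
        ],
        # Greedy decoding and the 220-token cap mirror the Quick start settings.
        "temperature": 0.0,
        "max_tokens": 220,
        # Assumption: the server forwards chat_template_kwargs to the chat template,
        # which is how enable_thinking=False would be passed over HTTP.
        "chat_template_kwargs": {"enable_thinking": False},
    }


body = build_chat_request(
    system_msg="<paste the system prompt above verbatim>",
    prompt="How do I list files? ls -la",
    model="Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8",
)
print(json.dumps(body, indent=2))
```

The placeholder string stands in for the full prompt text; replace it with the exact prompt from this section before sending the body to a chat-completions endpoint.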

Evaluation (transformers)

Evaluated on 200 held-out prompts drawn from test_dataset_langid.csv (same single + multi + benign composition as training).

Evaluation timestamp: 2026-05-24 12:53 UTC
GPU: NVIDIA A10G
Source adapter: Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8
JSON parse errors: 0/200 (0.0%)

Top-level metrics

| Metric | Value |
|---|---|
| `is_valid` accuracy | 1.0000 |
| Language-set exact match | 0.9650 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9701 |

Confusion matrix — binary `is_valid` decision

Positive class = the prompt contains code (is_valid=True).

| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 181 | FN = 0 |
| actual no-code | FP = 0 | TN = 19 |
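
The binary metrics in the top-level table follow directly from these four counts. A short sketch of the standard formulas, with `binary_metrics` as an illustrative helper name:

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the confusion matrix above (181 + 19 = 200 prompts).
print(binary_metrics(tp=181, fn=0, fp=0, tn=19))
# → {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```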

Per-language metrics

Only languages that appear in either the actual or predicted labels are listed.

| Language | support | precision | recall | F1 |
|---|---|---|---|---|
| `Python` | 14 | 1.000 | 1.000 | 1.000 |
| `Terraform` | 14 | 1.000 | 1.000 | 1.000 |
| `Java` | 12 | 1.000 | 0.917 | 0.957 |
| `C` | 12 | 0.857 | 1.000 | 0.923 |
| `Rust` | 12 | 1.000 | 1.000 | 1.000 |
| `AWK` | 12 | 1.000 | 0.917 | 0.957 |
| `Ruby` | 11 | 1.000 | 1.000 | 1.000 |
| `R` | 11 | 0.846 | 1.000 | 0.917 |
| `Go` | 10 | 1.000 | 1.000 | 1.000 |
| `Swift` | 10 | 1.000 | 1.000 | 1.000 |
| `Scala` | 10 | 1.000 | 0.800 | 0.889 |
| `SQL` | 10 | 1.000 | 1.000 | 1.000 |
| `jq` | 10 | 0.909 | 1.000 | 0.952 |
| `JavaScript` | 9 | 0.900 | 1.000 | 0.947 |
| `Kotlin` | 9 | 1.000 | 1.000 | 1.000 |
| `Perl` | 9 | 1.000 | 1.000 | 1.000 |
| `PowerShell` | 9 | 1.000 | 1.000 | 1.000 |
| `Batch` | 9 | 1.000 | 1.000 | 1.000 |
| `YAML` | 9 | 1.000 | 0.889 | 0.941 |
| `C++` | 7 | 1.000 | 0.857 | 0.923 |
| `C#` | 7 | 1.000 | 1.000 | 1.000 |
| `Lua` | 7 | 1.000 | 0.857 | 0.923 |
| `Bash` | 7 | 1.000 | 1.000 | 1.000 |
| `Dockerfile` | 6 | 0.857 | 1.000 | 0.923 |
| `Makefile` | 6 | 1.000 | 1.000 | 1.000 |
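
The per-language F1 values and the macro F1 reported earlier can be cross-checked from this table. The sketch below recomputes F1 for one row from its precision and recall, then takes the unweighted mean of the F1 column. Because the table values are rounded to three decimals, a recomputed macro F1 could differ in the last digit; here it agrees with the reported 0.9701.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# F1 column copied from the per-language table above.
PER_LANGUAGE_F1 = {
    "Python": 1.000, "Terraform": 1.000, "Java": 0.957, "C": 0.923, "Rust": 1.000,
    "AWK": 0.957, "Ruby": 1.000, "R": 0.917, "Go": 1.000, "Swift": 1.000,
    "Scala": 0.889, "SQL": 1.000, "jq": 0.952, "JavaScript": 0.947, "Kotlin": 1.000,
    "Perl": 1.000, "PowerShell": 1.000, "Batch": 1.000, "YAML": 0.941, "C++": 0.923,
    "C#": 1.000, "Lua": 0.923, "Bash": 1.000, "Dockerfile": 0.923, "Makefile": 1.000,
}

# Macro F1 is the unweighted mean over languages, so support counts are ignored.
macro_f1 = sum(PER_LANGUAGE_F1.values()) / len(PER_LANGUAGE_F1)

print(round(f1_score(1.000, 0.917), 3))  # Java row → 0.957
print(round(macro_f1, 4))                # → 0.9701
```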

Inference latency

Mean: 0.99 s/prompt
Median: 0.94 s/prompt
p95: 1.34 s/prompt
Max: 1.64 s/prompt

Training setup

Base model: Qwen/Qwen3.5-2B, loaded unquantized (bf16 / fp16, no bitsandbytes quantization)
LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
Epochs: 3
Precision: bf16 if available, else fp16
Effective batch size: 8 (per-device 1 × grad-accum 8), gradient checkpointing on
Max sequence length: 3200 tokens
Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
Languages: 25 (programming + config formats)
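
As a rough illustration of what these settings imply for the schedule, the arithmetic below derives the effective batch size and an approximate step count. It rests on assumptions the card does not state: a single training device, all 10,000 rows used in every epoch, and warmup steps rounded up. The real run may differ.

```python
import math

rows = 10_000           # training rows listed above
per_device_batch = 1
grad_accum = 8
epochs = 3
warmup_ratio = 0.05
num_devices = 1         # assumption: the card does not state the device count

# Effective batch = per-device batch × gradient-accumulation steps × devices.
effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(rows / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# → 8 1250 3750 188
```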

Supported languages

The model emits one or more of these keys in the category map of its JSON output:

Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq

Evaluation — vLLM serving (merged model, text-only)

Evaluated on 500 held-out prompts (a larger sample than the 200 used in the transformers evaluation above), served through vLLM 0.21.0's native Qwen3.5/Mamba runner instead of the transformers .generate() loop. Only text prompts are sent; vLLM auto-detects text-only mode. These numbers reflect production serving accuracy and latency.

Engine: vLLM 0.21.0, text-only mode (auto-detected, limit_mm_per_prompt=0), dtype bf16, greedy decoding
GPU: NVIDIA A10G
JSON parse errors: 0/500 (0.0%)

Accuracy (vLLM)

| Metric | Value |
|---|---|
| `is_valid` accuracy | 1.0000 |
| Language-set exact match | 0.9700 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9771 |

Confusion matrix — binary `is_valid` (vLLM)

| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 450 | FN = 0 |
| actual no-code | FP = 0 | TN = 50 |

vLLM inference latency (single-stream, batch = 1)

| Stat | ms / prompt |
|---|---|
| Mean | 200.0 |
| Median | 186.2 |
| p95 | 278.9 |
| p99 | 343.7 |
| Max | 1990.9 |
| Under 1 s | 99.6% |
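
Statistics of this kind can be derived from a list of per-prompt timings. The sketch below is one possible way to compute them and is not the evaluation script behind this card. The nearest-rank percentile is an assumed convention (other tools interpolate), and the sample timings are hypothetical.

```python
import math
import statistics


def percentile(sorted_values: list, pct: float) -> float:
    """Nearest-rank percentile on an already sorted list (one of several conventions)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms: list) -> dict:
    """Summary statistics matching the rows of the latency table above."""
    values = sorted(latencies_ms)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "max": values[-1],
        "under_1s_pct": 100 * sum(v < 1000 for v in values) / len(values),
    }


# Hypothetical timings in milliseconds, not the card's measurements.
sample = [180.0, 190.0, 200.0, 210.0, 1500.0]
print(summarize(sample))
```

For scale, 99.6% of 500 prompts under 1 s corresponds to 498 prompts, so two prompts exceeded one second in the reported run.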

vLLM throughput (single batched submit, continuous batching)

Prompts/sec: 18.12
Output tokens/sec: 260.7
Input tokens/sec: 15441.4
Batched wall time for all 500 prompts: 27.60 s
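
These throughput figures are mutually consistent, as the arithmetic below shows. It assumes all three rates were measured over the same 27.60 s wall time, which the card implies but does not state; the per-prompt token averages are derived here and are not reported values.

```python
prompts = 500
wall_time_s = 27.60
output_tok_per_s = 260.7
input_tok_per_s = 15441.4

prompts_per_s = prompts / wall_time_s
# Derived averages (assumption: the rates share the same wall-clock window).
avg_output_tokens = output_tok_per_s * wall_time_s / prompts
avg_input_tokens = input_tok_per_s * wall_time_s / prompts

print(round(prompts_per_s, 2))      # → 18.12
print(round(avg_output_tokens, 1))  # → 14.4
print(round(avg_input_tokens, 1))   # → 852.4
```

The short average output is in line with a model that emits only a compact JSON verdict, while the input side is dominated by the system prompt plus the user prompt.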

Model card generated automatically by eval_and_push_card.py on 2026-05-24 12:53 UTC.

Safetensors

Model size: 2B params
Tensor type: BF16

Model tree for Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8

Base model: Qwen/Qwen3.5-2B-Base
Finetuned: Qwen/Qwen3.5-2B