

Yash1005

docs: add model card with eval metrics on held-out test set

d3172ec verified 8 days ago


	---
	license: apache-2.0
	base_model: Qwen/Qwen3.5-2B
	library_name: transformers
	pipeline_tag: text-generation
	language:
	- en
	tags:
	- qwen
	- guardrails
	- code-detection
	- language-identification
	- multi-label-classification
	- merged
	- vllm
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	model-index:
	- name: CodeLanguage-Qwen3.5-2B-v8
	  results:
	  - task:
	      type: text-classification
	      name: Multi-label Programming Language Identification
	    dataset:
	      name: LangID Guard Held-out Test Set
	      type: custom
	    metrics:
	    - type: accuracy
	      name: is_valid accuracy
	      value: 1.0000
	    - type: accuracy
	      name: language-set exact match
	      value: 0.9650
	    - type: f1
	      name: binary F1 (positive=contains code)
	      value: 1.0000
	    - type: f1
	      name: macro F1 over languages
	      value: 0.9701
	    - type: precision
	      name: binary precision (positive=contains code)
	      value: 1.0000
	    - type: recall
	      name: binary recall (positive=contains code)
	      value: 1.0000
	---
	# CodeLanguage-Qwen3.5-2B-v8
	Merged full model (base `Qwen/Qwen3.5-2B` plus a LoRA adapter, merged with PEFT's `merge_and_unload()`) that identifies which programming languages are embedded in a user prompt. It covers 25 programming languages and configuration formats. The checkpoint is self-contained: load it directly (no PEFT step) and serve it on vLLM (v0.21.0+). It was trained on a combined dataset of Rosetta Code snippets and curated config-language samples (Dockerfile, YAML, Terraform, Makefile, SQL).
	The model is fine-tuned to emit a strict JSON object describing the languages found:

	```json
	{"is_valid": true, "category": {"Python": true, "Bash": true}}
	```

	`is_valid` is `true` when at least one code/config snippet is present and `false` for natural-language-only prompts. `category` contains only the detected languages, each mapped to `true`; if no code is present `category` is `{}`.
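
The schema described above can be checked mechanically before a verdict is trusted downstream. The following is a minimal sketch, not part of the model repository: `validate_verdict` is a hypothetical helper name, and the rule that `is_valid` must agree with whether `category` is empty is an assumption read off the description above.

```python
# Hypothetical helper (not shipped with the model): checks that a parsed
# verdict has the shape described in this card.
def validate_verdict(verdict: dict) -> bool:
    """Return True if `verdict` matches the documented output schema."""
    # Exactly the two documented top-level keys.
    if set(verdict) != {"is_valid", "category"}:
        return False
    is_valid, category = verdict["is_valid"], verdict["category"]
    if not isinstance(is_valid, bool) or not isinstance(category, dict):
        return False
    # Every listed language must map to the literal JSON `true`.
    if any(value is not True for value in category.values()):
        return False
    # Assumption: code present <=> at least one language listed.
    return is_valid == bool(category)


print(validate_verdict({"is_valid": True, "category": {"Python": True, "Bash": True}}))
print(validate_verdict({"is_valid": False, "category": {"Python": True}}))
```

The first call prints `True`; the second prints `False` because a no-code verdict should carry an empty `category`.
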
	## Quick start
	> Text-only model. The base `Qwen/Qwen3.5-2B` declares the multimodal `Qwen3_5ForConditionalGeneration` architecture and carries a vision tower in its weights, but this checkpoint is a text-in / text-out language guard: it never consumes images and only emits the JSON verdict. Send only text prompts. vLLM then auto-detects text-only mode and prints `All limits of multimodal modalities ... set to 0, running in text-only mode` at startup. Setting `language_model_only=True` would in theory skip loading the vision-tower weights, but on vLLM v0.21.0 it crashes `Qwen3_5ForCausalLM.__init__` with a `vision_config` attribute error, so leave it off until a later vLLM release fixes that path.

	### vLLM (recommended — needs vLLM >= 0.21.0 for the Qwen3.5/Mamba runner)
	```python
	from vllm import LLM, SamplingParams
	from transformers import AutoTokenizer
	import json, re

	MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
	SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
	No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
	Rules:
	- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
	- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
	- When multiple languages appear, list every distinct one (still only true).
	Allowed language keys (use these exact spellings):
	Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""

	llm = LLM(
	    model=MODEL,
	    trust_remote_code=True,
	    dtype="bfloat16",
	    max_model_len=4096,
	    # vLLM auto-detects text-only mode when no multimodal inputs are sent.
	    # Do NOT pass language_model_only=True here; see the note above.
	)
	tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
	# Greedy decoding with a short output budget: the verdict is a small JSON object.
	sampling = SamplingParams(temperature=0.0, max_tokens=220, stop=["\n\n\n"])

	def langid(prompt: str) -> dict:
	    """Classify one prompt and return the parsed JSON verdict."""
	    # Render the chat template with thinking disabled so only the JSON is emitted.
	    chat = tokenizer.apply_chat_template(
	        [{"role": "system", "content": SYSTEM_MSG},
	         {"role": "user", "content": prompt}],
	        tokenize=False, add_generation_prompt=True, enable_thinking=False)
	    out = llm.generate([chat], sampling)
	    text = out[0].outputs[0].text
	    # Pull the JSON object out of the completion and parse it.
	    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))
	```
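
If the model is served over HTTP rather than loaded in-process, the same system and user messages can be sent as a chat-completion request. The sketch below only builds the request body and sends nothing. It assumes an OpenAI-compatible server (for example one started with `vllm serve`) and assumes that server accepts a `chat_template_kwargs` field to disable thinking; check both against your vLLM version. `build_chat_request` is a hypothetical helper name.

```python
import json

MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"


def build_chat_request(system_msg: str, prompt: str) -> str:
    """Build the JSON body for a POST to /v1/chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": prompt},
        ],
        # Mirror the offline sampling settings: greedy, short output budget.
        "temperature": 0.0,
        "max_tokens": 220,
        # Assumption: the server forwards these kwargs to the chat template.
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return json.dumps(payload)


# Pass the full system prompt from this card as `system_msg` in real use.
body = build_chat_request("<system prompt from this card>", "echo hello")
print(body)
```
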

	### Plain transformers
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch, json, re

	MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"
	SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
	No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
	Rules:
	- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
	- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
	- When multiple languages appear, list every distinct one (still only true).
	Allowed language keys (use these exact spellings):
	Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""

	tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    MODEL, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
	).eval()

	def langid(prompt: str) -> dict:
	    """Classify one prompt and return the parsed JSON verdict."""
	    # Render the chat template with thinking disabled so only the JSON is emitted.
	    chat = tokenizer.apply_chat_template(
	        [{"role": "system", "content": SYSTEM_MSG},
	         {"role": "user", "content": prompt}],
	        tokenize=False, add_generation_prompt=True, enable_thinking=False)
	    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
	    # Greedy decoding; 220 new tokens is enough for the JSON verdict.
	    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
	    # Decode only the newly generated tokens, not the prompt.
	    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
	    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))
	```
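
Both `langid` functions above raise if the completion contains no JSON object, and the greedy `\{.*\}` pattern can swallow trailing text that happens to contain a brace. A more defensive parse, sketched here with the standard library only, returns the first decodable JSON object or `None`. `extract_first_json` is a hypothetical helper name, not something shipped with the model.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


print(extract_first_json('{"is_valid": true, "category": {"Python": true}} stray }'))
print(extract_first_json("no verdict here"))
```

A caller can then treat `None` as a parse error and decide whether to retry or fall back to a default verdict.
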

	## System prompt
	The model was trained with the exact system prompt below. Pass it verbatim at inference time — the output schema depends on this prompt.

	```text
	You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
	No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
	Rules:
	- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
	- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
	- When multiple languages appear, list every distinct one (still only true).
	Allowed language keys (use these exact spellings):
	Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
	```
	## Evaluation (transformers)
	Evaluated on 200 held-out prompts drawn from `test_dataset_langid.csv` (same single + multi + benign composition as training).

	- Evaluation timestamp: `2026-05-24 12:53 UTC`
	- GPU: `NVIDIA A10G`
	- Source adapter: `Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8`
	- JSON parse errors: `0/200` (`0.0%`)
	### Top-level metrics
	| Metric | Value |
	|---|---:|
	| `is_valid` accuracy | 1.0000 |
	| Language-set exact match | 0.9650 |
	| Binary F1 (positive = contains code) | 1.0000 |
	| Binary precision | 1.0000 |
	| Binary recall | 1.0000 |
	| Macro F1 across languages | 0.9701 |
	### Confusion matrix — binary `is_valid` decision
	Positive class = the prompt contains code (`is_valid=True`).

	| | predicted contains-code | predicted no-code |
	|---|---:|---:|
	| actual contains-code | TP = 181 | FN = 0 |
	| actual no-code | FP = 0 | TN = 19 |
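
The binary metrics reported for `is_valid` follow directly from these four counts. A short worked check, using the standard definitions of accuracy, precision, recall and F1 (`binary_metrics` is a hypothetical helper name):

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the confusion matrix above (200 prompts in total).
metrics = binary_metrics(tp=181, fn=0, fp=0, tn=19)
print(metrics)
```

With no false positives or false negatives, all four values come out as 1.0, matching the top-level table.
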
	### Per-language metrics
	Only languages that appear in either the actual or predicted labels are listed.

	| Language | support | precision | recall | F1 |
	|---|---:|---:|---:|---:|
	| `Python` | 14 | 1.000 | 1.000 | 1.000 |
	| `Terraform` | 14 | 1.000 | 1.000 | 1.000 |
	| `Java` | 12 | 1.000 | 0.917 | 0.957 |
	| `C` | 12 | 0.857 | 1.000 | 0.923 |
	| `Rust` | 12 | 1.000 | 1.000 | 1.000 |
	| `AWK` | 12 | 1.000 | 0.917 | 0.957 |
	| `Ruby` | 11 | 1.000 | 1.000 | 1.000 |
	| `R` | 11 | 0.846 | 1.000 | 0.917 |
	| `Go` | 10 | 1.000 | 1.000 | 1.000 |
	| `Swift` | 10 | 1.000 | 1.000 | 1.000 |
	| `Scala` | 10 | 1.000 | 0.800 | 0.889 |
	| `SQL` | 10 | 1.000 | 1.000 | 1.000 |
	| `jq` | 10 | 0.909 | 1.000 | 0.952 |
	| `JavaScript` | 9 | 0.900 | 1.000 | 0.947 |
	| `Kotlin` | 9 | 1.000 | 1.000 | 1.000 |
	| `Perl` | 9 | 1.000 | 1.000 | 1.000 |
	| `PowerShell` | 9 | 1.000 | 1.000 | 1.000 |
	| `Batch` | 9 | 1.000 | 1.000 | 1.000 |
	| `YAML` | 9 | 1.000 | 0.889 | 0.941 |
	| `C++` | 7 | 1.000 | 0.857 | 0.923 |
	| `C#` | 7 | 1.000 | 1.000 | 1.000 |
	| `Lua` | 7 | 1.000 | 0.857 | 0.923 |
	| `Bash` | 7 | 1.000 | 1.000 | 1.000 |
	| `Dockerfile` | 6 | 0.857 | 1.000 | 0.923 |
	| `Makefile` | 6 | 1.000 | 1.000 | 1.000 |
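
The macro F1 in the top-level table is the unweighted mean of the per-language F1 scores, so it can be reproduced from the last column above. This sketch uses the rounded values as printed, so the result is only expected to agree to about three decimal places.

```python
# Per-language F1 values copied from the table above (rounded to 3 places).
f1_by_language = {
    "Python": 1.000, "Terraform": 1.000, "Java": 0.957, "C": 0.923,
    "Rust": 1.000, "AWK": 0.957, "Ruby": 1.000, "R": 0.917,
    "Go": 1.000, "Swift": 1.000, "Scala": 0.889, "SQL": 1.000,
    "jq": 0.952, "JavaScript": 0.947, "Kotlin": 1.000, "Perl": 1.000,
    "PowerShell": 1.000, "Batch": 1.000, "YAML": 0.941, "C++": 0.923,
    "C#": 1.000, "Lua": 0.923, "Bash": 1.000, "Dockerfile": 0.923,
    "Makefile": 1.000,
}

# Macro average: every language counts equally, regardless of support.
macro_f1 = sum(f1_by_language.values()) / len(f1_by_language)
print(f"{len(f1_by_language)} languages, macro F1 = {macro_f1:.4f}")
```

This prints a macro F1 of 0.9701 over 25 languages, in line with the reported value.
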

	### Inference latency
	- Mean: 0.99 s/prompt
	- Median: 0.94 s/prompt
	- p95: 1.34 s/prompt
	- Max: 1.64 s/prompt

	## Training setup
	- Base model: `Qwen/Qwen3.5-2B`, loaded unquantized in bf16 / fp16 (no `bitsandbytes` quantization)
	- LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
	- Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
	- Epochs: 3
	- Precision: bf16 if available, else fp16
	- Effective batch size: 8 (per-device batch 1 × gradient accumulation 8), gradient checkpointing on
	- Max sequence length: 3200 tokens
	- Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
	- Languages: 25 (programming + config formats)
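
The batch size, epoch count and warmup fraction above imply a rough optimizer-step budget. The arithmetic below is a back-of-the-envelope sketch under two assumptions that the card does not state: training ran on a single device, and all 10,000 rows were used for training with no rows held back. Fractional step counts are rounded up here.

```python
import math

per_device_batch = 1
grad_accum_steps = 8
num_devices = 1        # assumption: device count is not stated in this card
train_rows = 10_000    # assumption: every row is used for training
epochs = 3
warmup_ratio = 0.05

# One optimizer step consumes this many rows.
effective_batch = per_device_batch * grad_accum_steps * num_devices
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under those assumptions this gives 1,250 steps per epoch, 3,750 steps in total, and 188 warmup steps.
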

	## Supported languages
	The model emits one or more of these keys in the `category` map of its JSON output:

	```
	Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
	```

	## Evaluation — vLLM serving (merged model, text-only)
	Evaluated on 500 held-out prompts (the transformers evaluation above used 200), served through vLLM `0.21.0`'s native Qwen3.5/Mamba runner instead of the transformers `.generate()` loop. Only text prompts are sent; vLLM auto-detects text-only mode. These numbers reflect production serving accuracy and latency.

	- Engine: vLLM `0.21.0`, text-only mode (auto-detected, `limit_mm_per_prompt` = 0), dtype bf16, greedy decoding
	- GPU: `NVIDIA A10G`
	- JSON parse errors: `0/500` (`0.0%`)
	### Accuracy (vLLM)
	| Metric | Value |
	|---|---:|
	| `is_valid` accuracy | 1.0000 |
	| Language-set exact match | 0.9700 |
	| Binary F1 (positive = contains code) | 1.0000 |
	| Binary precision | 1.0000 |
	| Binary recall | 1.0000 |
	| Macro F1 across languages | 0.9771 |
	### Confusion matrix — binary `is_valid` (vLLM)
	| | predicted contains-code | predicted no-code |
	|---|---:|---:|
	| actual contains-code | TP = 450 | FN = 0 |
	| actual no-code | FP = 0 | TN = 50 |
	### vLLM inference latency (single-stream, batch = 1)
	| Stat | ms / prompt |
	|---|---:|
	| Mean | 200.0 |
	| Median | 186.2 |
	| p95 | 278.9 |
	| p99 | 343.7 |
	| Max | 1990.9 |
	| Under 1 s | 99.6% |

	### vLLM throughput (single batched submit, continuous batching)
	- Prompts/sec: 18.12
	- Output tokens/sec: 260.7
	- Input tokens/sec: 15441.4
	- Batched wall time for all 500 prompts: 27.60 s
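
These throughput figures are mutually consistent, and dividing the token rates by the prompt rate gives a rough per-prompt token count. The derived per-prompt numbers are an inference from the figures above, not values reported by the evaluation script.

```python
prompts = 500
wall_time_s = 27.60
output_tok_per_s = 260.7
input_tok_per_s = 15441.4

# Prompt rate implied by the batched wall time.
prompts_per_s = prompts / wall_time_s

# Average tokens per prompt implied by the token rates.
output_tok_per_prompt = output_tok_per_s / prompts_per_s
input_tok_per_prompt = input_tok_per_s / prompts_per_s

print(f"{prompts_per_s:.2f} prompts/s")
print(f"~{output_tok_per_prompt:.1f} output tokens and "
      f"~{input_tok_per_prompt:.0f} input tokens per prompt")
```

The prompt rate reproduces the reported 18.12 prompts/sec. The implied averages are about 14 output tokens and roughly 850 input tokens per prompt, where the input count includes the system prompt.
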

	---
	Model card generated automatically by `eval_and_push_card.py` on 2026-05-24 12:53 UTC.