---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- qwen
- guardrails
- code-detection
- language-identification
- multi-label-classification
- merged
- vllm
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: CodeLanguage-Qwen3.5-2B-v8
  results:
  - task:
      type: text-classification
      name: Multi-label Programming Language Identification
    dataset:
      name: LangID Guard Held-out Test Set
      type: custom
    metrics:
    - type: accuracy
      name: is_valid accuracy
      value: 1.0000
    - type: accuracy
      name: language-set exact match
      value: 0.9650
    - type: f1
      name: binary F1 (positive=contains code)
      value: 1.0000
    - type: f1
      name: macro F1 over languages
      value: 0.9701
    - type: precision
      name: binary precision (positive=contains code)
      value: 1.0000
    - type: recall
      name: binary recall (positive=contains code)
      value: 1.0000
---

# CodeLanguage-Qwen3.5-2B-v8

**Merged full model** (base `Qwen/Qwen3.5-2B` + LoRA adapter, merged with PEFT's `merge_and_unload()`) that identifies which programming languages are embedded in a user prompt, across **25 programming languages and configuration formats**. This is a self-contained checkpoint: load it directly (no PEFT step) and serve it on **vLLM** (v0.21.0+). It was trained on a combined dataset of Rosetta Code snippets and curated config-language samples (Dockerfile, YAML, Terraform, Makefile, SQL).

The model is fine-tuned to emit a strict JSON object describing the languages found:

```json
{"is_valid": true, "category": {"Python": true, "Bash": true}}
```

`is_valid` is `true` when at least one code/config snippet is present and `false` for natural-language-only prompts. `category` contains only the detected languages, each mapped to `true`; if no code is present, `category` is `{}`.
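
The schema described above can be checked mechanically before a verdict is trusted downstream. The following is a minimal sketch, not part of the released model: `validate_verdict` and `ALLOWED_LANGUAGES` are hypothetical names, and the final consistency check (that `is_valid` is true exactly when `category` is non-empty) is an assumption inferred from the description rather than a documented guarantee.

```python
import json

# Allowed keys, copied from the model card's supported-language list.
ALLOWED_LANGUAGES = {
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust", "Kotlin",
    "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash", "PowerShell",
    "Batch", "SQL", "Dockerfile", "YAML", "Makefile", "Terraform", "AWK", "jq",
}


def validate_verdict(raw: str) -> dict:
    """Parse a model reply and check it against the documented schema.

    Hypothetical helper (not shipped with the model). Raises ValueError
    on any deviation; json.JSONDecodeError is a ValueError subclass.
    """
    verdict = json.loads(raw)
    if not isinstance(verdict, dict) or set(verdict) != {"is_valid", "category"}:
        raise ValueError("expected exactly the keys 'is_valid' and 'category'")
    is_valid, category = verdict["is_valid"], verdict["category"]
    if not isinstance(is_valid, bool) or not isinstance(category, dict):
        raise ValueError("'is_valid' must be a bool and 'category' an object")
    unknown = set(category) - ALLOWED_LANGUAGES
    if unknown:
        raise ValueError(f"unknown language keys: {sorted(unknown)}")
    if any(value is not True for value in category.values()):
        raise ValueError("every listed language must map to true")
    # Assumption: is_valid and a non-empty category always go together.
    if is_valid != bool(category):
        raise ValueError("'is_valid' must be true exactly when 'category' is non-empty")
    return verdict
```

Calling `validate_verdict` on the example object above returns the parsed dict unchanged; a reply with a key outside the allowed list raises `ValueError`.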

## Quick start

> **Text-only model.** The base `Qwen/Qwen3.5-2B` declares the multimodal `Qwen3_5ForConditionalGeneration` architecture (its weights include a vision tower), but this checkpoint is a **text-in / text-out** language guard: it never consumes images and only emits the JSON verdict. Send only text prompts; vLLM auto-detects text-only mode and prints `All limits of multimodal modalities ... set to 0, running in text-only mode` at startup. (`language_model_only=True` would in theory skip loading the vision-tower weights, but on vLLM v0.21.0 it crashes `Qwen3_5ForCausalLM.__init__` with a `vision_config` attribute error. Leave it off until a later vLLM release fixes that path.)

### vLLM (recommended; requires vLLM >= 0.21.0 for the Qwen3.5/Mamba runner)

```python
import json
import re

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"

# Training-time system prompt; pass it verbatim.
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""

llm = LLM(
    model=MODEL,
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=4096,
    # vLLM auto-detects text-only when no multimodal inputs are sent.
    # Do NOT pass language_model_only=True here; see the note above.
)
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
# Greedy decoding (temperature 0) with a cap on the reply length.
sampling = SamplingParams(temperature=0.0, max_tokens=220, stop=["\n\n\n"])


def langid(prompt: str) -> dict:
    """Classify one prompt and return the parsed JSON verdict."""
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    out = llm.generate([chat], sampling)
    text = out[0].outputs[0].text
    # Pull the JSON object out of the reply; fail loudly if there is none.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in model output: {text!r}")
    return json.loads(match.group(0))
```

### Plain transformers

```python
import json
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8"

# Training-time system prompt; pass it verbatim.
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq"""

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
).eval()


def langid(prompt: str) -> dict:
    """Classify one prompt and return the parsed JSON verdict."""
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Pull the JSON object out of the reply; fail loudly if there is none.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in model output: {text!r}")
    return json.loads(match.group(0))
```
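
Both snippets above extract the verdict with the greedy pattern `\{.*\}`, which spans from the first `{` to the last `}` in the reply. That is fine when the model emits exactly one object, but it would fail if any brace-containing text ever followed the verdict. As a defensive alternative, here is a minimal sketch built on the standard library's `json.JSONDecoder.raw_decode`; the helper name `extract_first_json_object` is hypothetical and not part of the model's tooling.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first decodable JSON object found in `text`.

    Hypothetical helper: tries raw_decode at every '{', so prose or
    stray braces after the object do not break parsing.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # This '{' did not start a valid object; try the next one.
        start = text.find("{", start + 1)
    raise ValueError(f"no JSON object found in: {text!r}")
```

Swapping this in for the `re.search(...)` step is optional; the evaluations in this card report zero parse errors with the simpler regex.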

## System prompt

The model was trained with the exact system prompt below. Pass it verbatim at inference time; the output schema depends on this prompt.

```text
You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
```

## Evaluation (transformers)

Evaluated on **200 held-out prompts** drawn from `test_dataset_langid.csv` (same single + multi + benign composition as training).

- Evaluation timestamp: `2026-05-24 12:53 UTC`
- GPU: `NVIDIA A10G`
- Source adapter: `Accuknoxtechnologies/CodeLanguage-Qwen3.5-2B-v8`
- JSON parse errors: `0/200` (`0.0%`)

### Top-level metrics

| Metric | Value |
|---|---:|
| `is_valid` accuracy | **1.0000** |
| Language-set exact match | **0.9650** |
| Binary F1 (positive = contains code) | **1.0000** |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | **0.9701** |
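
"Language-set exact match" is the strictest number in this table: a prompt counts as correct only if the predicted set of languages equals the gold set, with nothing missing and nothing extra. A rate of 0.9650 over 200 prompts corresponds to 193 fully correct label sets. The sketch below shows one plausible way to compute it; `exact_match_rate` is a hypothetical name, and the evaluation script's actual implementation is not shown in this card.

```python
def exact_match_rate(predicted: list[set[str]], actual: list[set[str]]) -> float:
    """Fraction of prompts whose predicted language set equals the gold set."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Toy data (invented for illustration, not taken from the test set).
gold = [{"Python"}, {"Bash", "YAML"}, set(), {"C"}]
pred = [{"Python"}, {"Bash"}, set(), {"C"}]  # second prompt misses YAML
rate = exact_match_rate(pred, gold)  # 3 of 4 sets match exactly
```

A partially correct prediction, like the second toy prompt, earns no credit here, which is why this metric sits below the per-language scores.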

### Confusion matrix: binary `is_valid` decision

Positive class = the prompt **contains code** (`is_valid=True`).

| | predicted contains-code | predicted no-code |
|---|---:|---:|
| **actual contains-code** | TP = 181 | FN = 0 |
| **actual no-code** | FP = 0 | TN = 19 |
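
The binary metrics in the top-level table follow directly from these four counts. As a quick check, here is a small sketch using the standard definitions; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the confusion matrix above (181 + 19 = 200 prompts).
metrics = binary_metrics(tp=181, fn=0, fp=0, tn=19)
```

With no false positives and no false negatives, all four values come out to 1.0, matching the reported figures.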

### Per-language metrics

Only languages that appear in either the actual or predicted labels are listed.

| Language | support | precision | recall | F1 |
|---|---:|---:|---:|---:|
| `Python` | 14 | 1.000 | 1.000 | 1.000 |
| `Terraform` | 14 | 1.000 | 1.000 | 1.000 |
| `Java` | 12 | 1.000 | 0.917 | 0.957 |
| `C` | 12 | 0.857 | 1.000 | 0.923 |
| `Rust` | 12 | 1.000 | 1.000 | 1.000 |
| `AWK` | 12 | 1.000 | 0.917 | 0.957 |
| `Ruby` | 11 | 1.000 | 1.000 | 1.000 |
| `R` | 11 | 0.846 | 1.000 | 0.917 |
| `Go` | 10 | 1.000 | 1.000 | 1.000 |
| `Swift` | 10 | 1.000 | 1.000 | 1.000 |
| `Scala` | 10 | 1.000 | 0.800 | 0.889 |
| `SQL` | 10 | 1.000 | 1.000 | 1.000 |
| `jq` | 10 | 0.909 | 1.000 | 0.952 |
| `JavaScript` | 9 | 0.900 | 1.000 | 0.947 |
| `Kotlin` | 9 | 1.000 | 1.000 | 1.000 |
| `Perl` | 9 | 1.000 | 1.000 | 1.000 |
| `PowerShell` | 9 | 1.000 | 1.000 | 1.000 |
| `Batch` | 9 | 1.000 | 1.000 | 1.000 |
| `YAML` | 9 | 1.000 | 0.889 | 0.941 |
| `C++` | 7 | 1.000 | 0.857 | 0.923 |
| `C#` | 7 | 1.000 | 1.000 | 1.000 |
| `Lua` | 7 | 1.000 | 0.857 | 0.923 |
| `Bash` | 7 | 1.000 | 1.000 | 1.000 |
| `Dockerfile` | 6 | 0.857 | 1.000 | 0.923 |
| `Makefile` | 6 | 1.000 | 1.000 | 1.000 |
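
The macro F1 reported earlier is the unweighted mean of the per-language F1 column, so low-support languages such as `Dockerfile` count as much as `Python`. The sketch below re-derives it from the rounded values in the table; it is a consistency check under that assumption, not the evaluation script itself.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-language F1 values copied from the table above (3-decimal rounding).
PER_LANGUAGE_F1 = {
    "Python": 1.000, "Terraform": 1.000, "Java": 0.957, "C": 0.923,
    "Rust": 1.000, "AWK": 0.957, "Ruby": 1.000, "R": 0.917, "Go": 1.000,
    "Swift": 1.000, "Scala": 0.889, "SQL": 1.000, "jq": 0.952,
    "JavaScript": 0.947, "Kotlin": 1.000, "Perl": 1.000, "PowerShell": 1.000,
    "Batch": 1.000, "YAML": 0.941, "C++": 0.923, "C#": 1.000, "Lua": 0.923,
    "Bash": 1.000, "Dockerfile": 0.923, "Makefile": 1.000,
}

# Unweighted (macro) average over the 25 languages.
macro_f1 = sum(PER_LANGUAGE_F1.values()) / len(PER_LANGUAGE_F1)
print(f"macro F1 = {macro_f1:.4f}")  # macro F1 = 0.9701

# A single row can be checked the same way, e.g. Java: P = 1.000, R = 0.917.
java_f1 = round(f1_score(1.000, 0.917), 3)  # 0.957
```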

### Inference latency

- Mean: **0.99 s/prompt**
- Median: 0.94 s/prompt
- p95: 1.34 s/prompt
- Max: 1.64 s/prompt

## Training setup

- Base model: `Qwen/Qwen3.5-2B`, loaded unquantized (bf16 or fp16; no `bitsandbytes` quantization)
- LoRA: r=16, alpha=32, dropout=0.05, target modules = `{q,k,v,o,gate,up,down}_proj`
- Optimizer: `adamw_torch`, lr=1e-4, cosine schedule, warmup 5%
- Epochs: 3
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device batch 1 × gradient accumulation 8), gradient checkpointing on
- Max sequence length: 3200 tokens
- Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
- Languages: 25 (programming + config formats)
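
The batch and schedule figures above imply a rough optimizer-step budget. The sketch below works it out under assumptions that this card does not confirm: a single training device, all 10,000 rows used for training (no validation split carved out), and warmup rounded up to a whole step. `schedule_summary` is a hypothetical helper name.

```python
import math


def schedule_summary(num_rows: int, per_device_batch: int, grad_accum: int,
                     epochs: int, warmup_ratio: float, num_devices: int = 1) -> dict:
    """Derive effective batch size and step counts from the training setup."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_rows / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }


# Values from the list above; num_devices=1 is an assumption.
summary = schedule_summary(num_rows=10_000, per_device_batch=1, grad_accum=8,
                           epochs=3, warmup_ratio=0.05)
```

Under those assumptions the run comes to 1,250 optimizer steps per epoch and 3,750 in total, with roughly the first 188 spent in warmup.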

## Supported languages

The model emits one or more of these keys in the `category` map of its JSON output:

```
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
```

## Evaluation: vLLM serving (merged model, text-only)

Evaluated on **500 held-out prompts** (a larger sample than the 200 used in the transformers evaluation above), served through **vLLM `0.21.0`**'s native Qwen3.5/Mamba runner instead of the transformers `.generate()` loop. Only text prompts are sent; vLLM auto-detects text-only mode. These numbers reflect production serving accuracy and latency.

- Engine: vLLM `0.21.0`, text-only mode (auto-detected, `limit_mm_per_prompt=0`), dtype bf16, greedy decoding
- GPU: `NVIDIA A10G`
- JSON parse errors: `0/500` (`0.0%`)

### Accuracy (vLLM)

| Metric | Value |
|---|---:|
| `is_valid` accuracy | **1.0000** |
| Language-set exact match | **0.9700** |
| Binary F1 (positive = contains code) | **1.0000** |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | **0.9771** |

### Confusion matrix: binary `is_valid` (vLLM)

| | predicted contains-code | predicted no-code |
|---|---:|---:|
| **actual contains-code** | TP = 450 | FN = 0 |
| **actual no-code** | FP = 0 | TN = 50 |

### vLLM inference latency (single-stream, batch = 1)

| Stat | ms / prompt |
|---|---:|
| Mean | **200.0** |
| Median | 186.2 |
| p95 | 278.9 |
| p99 | 343.7 |
| Max | 1990.9 |

Share of prompts answered in under 1 s: 99.6%.

### vLLM throughput (single batched submit, continuous batching)

- Prompts/sec: **18.12**
- Output tokens/sec: 260.7
- Input tokens/sec: 15441.4
- Batched wall time for all 500 prompts: 27.60 s
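
The throughput figures are tied together by the batched wall time. The short sketch below reproduces the prompts-per-second number and backs out the average token counts per prompt implied by the two token rates. Those averages are derived here from rounded figures and are not stated elsewhere in this card, so treat them as approximate.

```python
NUM_PROMPTS = 500
WALL_SECONDS = 27.60
OUTPUT_TOKENS_PER_SEC = 260.7
INPUT_TOKENS_PER_SEC = 15441.4

# Prompts per second over the whole batched run.
prompts_per_sec = NUM_PROMPTS / WALL_SECONDS
print(f"{prompts_per_sec:.2f} prompts/s")  # 18.12 prompts/s

# Implied average tokens per prompt (rate x wall time / prompt count).
avg_output_tokens = OUTPUT_TOKENS_PER_SEC * WALL_SECONDS / NUM_PROMPTS
avg_input_tokens = INPUT_TOKENS_PER_SEC * WALL_SECONDS / NUM_PROMPTS
print(f"~{avg_output_tokens:.1f} output and ~{avg_input_tokens:.1f} input tokens per prompt")
```

The short average reply (about 14 tokens) is consistent with the model emitting only the compact JSON verdict.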

---

*Model card generated automatically by `eval_and_push_card.py` on 2026-05-24 12:53 UTC.*