---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
library_name: peft
pipeline_tag: text-generation
language:
- en
tags:
- lora
- peft
- qwen
- guardrails
- code-detection
- language-identification
- multi-label-classification
- quantization
- 8-bit
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: CodeLanguage-Qwen3.5-2B-v5
  results:
  - task:
      type: text-classification
      name: Multi-label Programming Language Identification
    dataset:
      name: LangID Guard Held-out Test Set
      type: custom
    metrics:
    - type: accuracy
      name: is_valid accuracy
      value: 1.0000
    - type: accuracy
      name: language-set exact match
      value: 0.9600
    - type: f1
      name: binary F1 (positive=contains code)
      value: 1.0000
    - type: f1
      name: macro F1 over languages
      value: 0.9696
    - type: precision
      name: binary precision (positive=contains code)
      value: 1.0000
    - type: recall
      name: binary recall (positive=contains code)
      value: 1.0000
---

# CodeLanguage-Qwen3.5-2B-v5

LoRA adapter for **Qwen/Qwen3.5-2B** that identifies which programming languages are embedded in a user prompt across **25 languages and configuration formats**. Trained on a combined dataset of Rosetta Code snippets and curated config-language samples (Dockerfile, YAML, Terraform, Makefile, SQL).

The model is fine-tuned to emit a strict JSON object describing the languages found:

```json
{"is_valid": true, "category": {"Python": true, "Bash": true}}
```

`is_valid` is `true` when at least one code/config snippet is present and `false` for natural-language-only prompts. `category` contains only the detected languages, each mapped to `true`; if no code is present `category` is `{}`.

## Quick start

```python
import json
import re

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5"

SYSTEM_MSG = """You are a code language identifier.
For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration).

Output exactly one JSON object and nothing else:
{"is_valid": , "category": {"": true, ...}}.
No preamble. No explanation. No tags. No markdown code fences. No trailing prose.

Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).

Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq

Examples:
Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}
Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}
Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}"""

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()


def langid(prompt: str) -> dict:
    """Return the parsed JSON verdict for a single user prompt."""
    chat = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": SYSTEM_MSG},
            {"role": "user", "content": prompt},
        ],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = out[0, inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    # Take the outermost {...} span and fail loudly if the model emitted none.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object found in model output: {text!r}")
    return json.loads(match.group(0))
```

## System prompt

The model was trained with the exact system prompt below. Pass it verbatim at inference time, because the output schema depends on this prompt.

```text
You are a code language identifier.
For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration).

Output exactly one JSON object and nothing else:
{"is_valid": , "category": {"": true, ...}}.
No preamble. No explanation. No tags. No markdown code fences. No trailing prose.

Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).

Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq

Examples:
Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}
Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}
Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}
```

## Evaluation

Evaluated on **200 held-out prompts** drawn from `test_dataset_langid.csv` (same single + multi + benign composition as training).
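The language-level figures reported in this section (language-set exact match, per-language precision/recall/F1, macro F1) could be computed roughly as sketched here. This is an illustration under stated assumptions, not the logic of `eval_and_push_card.py`: the function name `language_metrics` is hypothetical, each prompt's labels are assumed to be a set of language keys, and macro F1 is assumed to average only over languages that appear in either the actual or predicted labels.

```python
from collections import Counter


def language_metrics(actual, predicted):
    """Score multi-label predictions.

    actual / predicted: equally long lists holding one set of language
    keys per prompt (an empty set means "no code").
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(actual, predicted):
        for lang in gold & pred:   # correctly detected
            tp[lang] += 1
        for lang in pred - gold:   # predicted but not present
            fp[lang] += 1
        for lang in gold - pred:   # present but missed
            fn[lang] += 1

    per_language = {}
    for lang in sorted(set(tp) | set(fp) | set(fn)):
        predicted_count = tp[lang] + fp[lang]
        support = tp[lang] + fn[lang]
        precision = tp[lang] / predicted_count if predicted_count else 0.0
        recall = tp[lang] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_language[lang] = {
            "support": support,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }

    # Exact match: the whole predicted set must equal the actual set.
    exact = sum(g == p for g, p in zip(actual, predicted)) / len(actual)
    # Macro F1: unweighted mean of the per-language F1 scores.
    macro = (
        sum(m["f1"] for m in per_language.values()) / len(per_language)
        if per_language
        else 0.0
    )
    return {"exact_match": exact, "macro_f1": macro, "per_language": per_language}
```

Given `langid()` outputs, the predicted set for a prompt would be `set(result["category"])`; exact match is deliberately strict, so a prompt with one missed or one extra language counts as a miss even if the other languages are right.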
- Evaluation timestamp: `2026-05-22 00:42 UTC`
- GPU: `NVIDIA A10G`
- Source adapter: `Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5`
- JSON parse errors: `0/200` (`0.0%`)

### Top-level metrics

| Metric | Value |
|---|---:|
| `is_valid` accuracy | **1.0000** |
| Language-set exact match | **0.9600** |
| Binary F1 (positive = contains code) | **1.0000** |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | **0.9696** |

### Confusion matrix: binary `is_valid` decision

Positive class = the prompt **contains code** (`is_valid=True`).

| | predicted contains-code | predicted no-code |
|---|---:|---:|
| **actual contains-code** | TP = 181 | FN = 0 |
| **actual no-code** | FP = 0 | TN = 19 |

### Per-language metrics

Only languages that appear in either the actual or predicted labels are listed.

| Language | support | precision | recall | F1 |
|---|---:|---:|---:|---:|
| `Python` | 14 | 1.000 | 1.000 | 1.000 |
| `Terraform` | 14 | 1.000 | 1.000 | 1.000 |
| `Java` | 12 | 1.000 | 1.000 | 1.000 |
| `C` | 12 | 1.000 | 1.000 | 1.000 |
| `Rust` | 12 | 1.000 | 1.000 | 1.000 |
| `AWK` | 12 | 1.000 | 0.917 | 0.957 |
| `Ruby` | 11 | 0.917 | 1.000 | 0.957 |
| `R` | 11 | 1.000 | 1.000 | 1.000 |
| `Go` | 10 | 1.000 | 0.900 | 0.947 |
| `Swift` | 10 | 1.000 | 0.900 | 0.947 |
| `Scala` | 10 | 1.000 | 0.800 | 0.889 |
| `SQL` | 10 | 1.000 | 1.000 | 1.000 |
| `jq` | 10 | 0.909 | 1.000 | 0.952 |
| `JavaScript` | 9 | 0.900 | 1.000 | 0.947 |
| `Kotlin` | 9 | 1.000 | 1.000 | 1.000 |
| `Perl` | 9 | 1.000 | 1.000 | 1.000 |
| `PowerShell` | 9 | 1.000 | 1.000 | 1.000 |
| `Batch` | 9 | 1.000 | 1.000 | 1.000 |
| `YAML` | 9 | 1.000 | 0.889 | 0.941 |
| `C++` | 7 | 1.000 | 0.857 | 0.923 |
| `C#` | 7 | 0.875 | 1.000 | 0.933 |
| `Lua` | 7 | 1.000 | 0.857 | 0.923 |
| `Bash` | 7 | 1.000 | 1.000 | 1.000 |
| `Dockerfile` | 6 | 0.857 | 1.000 | 0.923 |
| `Makefile` | 6 | 1.000 | 1.000 | 1.000 |

### Inference latency

- Mean: **0.99 s/prompt**
- Median: 0.94 s/prompt
- p95: 1.35 s/prompt
- Max: 1.63 s/prompt

## Training setup

- Base model: `Qwen/Qwen3.5-2B`, loaded unquantized in bf16 / fp16 (no `bitsandbytes` quantization)
- LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
- Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device batch 1 × 8 gradient-accumulation steps), gradient checkpointing on
- Max sequence length: 3200 tokens
- Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
- Languages: 25 (programming + config formats)

## Supported languages

The model emits one or more of these keys in the `category` map of its JSON output:

```
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R,
Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML,
Makefile, Terraform, AWK, jq
```

---

*Model card generated automatically by `eval_and_push_card.py` on 2026-05-22 00:42 UTC.*
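
Because downstream code relies on the exact key spellings in the supported-languages list, it may be worth checking each parsed output against the schema before acting on it. The following is a minimal sketch, not part of the released adapter: `ALLOWED_LANGUAGES` and `validate_output` are hypothetical names, and the only cross-field rule enforced is the one this card states explicitly (no code means an empty `category`).

```python
# The 25 keys listed in this card, with their exact spellings.
ALLOWED_LANGUAGES = frozenset([
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust",
    "Kotlin", "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash",
    "PowerShell", "Batch", "SQL", "Dockerfile", "YAML", "Makefile",
    "Terraform", "AWK", "jq",
])


def validate_output(obj):
    """Return the detected languages as a sorted list, or raise ValueError."""
    if not isinstance(obj, dict) or set(obj) != {"is_valid", "category"}:
        raise ValueError("expected exactly the keys 'is_valid' and 'category'")
    is_valid, category = obj["is_valid"], obj["category"]
    if not isinstance(is_valid, bool):
        raise ValueError("'is_valid' must be a JSON boolean")
    if not isinstance(category, dict):
        raise ValueError("'category' must be a JSON object")
    unknown = set(category) - ALLOWED_LANGUAGES
    if unknown:
        raise ValueError(f"unsupported language keys: {sorted(unknown)}")
    if any(value is not True for value in category.values()):
        raise ValueError("every listed language must map to true")
    # Stated in the card: when no code is present, category is {}.
    if not is_valid and category:
        raise ValueError("'category' must be empty when 'is_valid' is false")
    return sorted(category)
```

A caller could wrap the result of `langid()` in `validate_output()` and treat a `ValueError` as a parse failure, which keeps malformed or off-schema generations from reaching guardrail logic.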