Code-Qwen3.5-2B-LoRA-8bit-v2
LoRA adapter for Qwen/Qwen3.5-2B that detects embedded code snippets in user prompts across 10 programming languages (bash, c, go, java, javascript, php, python, ruby, rust, sql) plus a generic Code pattern. Trained on the LLM Guard code and ban_code scanner outputs.
The model is fine-tuned to emit a strict JSON object describing every code-snippet location found in the user prompt:
{"is_valid": false, "violations": {"python": [[5, 25]], "sql": [[40, 60]]}}
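To make the span format concrete, here is a minimal sketch that slices the flagged snippets back out of a prompt. It assumes the spans are half-open character offsets, as the system prompt states, so they map directly onto Python slices. The helper name `extract_snippets` is hypothetical and not part of this repository.

```python
def extract_snippets(prompt: str, result: dict) -> dict:
    """Map each violation type to the substrings its spans point at."""
    snippets = {}
    for vtype, spans in result.get("violations", {}).items():
        # Half-open spans: start is inclusive, end is exclusive,
        # which is exactly how Python slicing behaves.
        snippets[vtype] = [prompt[start:end] for start, end in spans]
    return snippets


# Prompt and output taken from one of the examples in the system prompt.
prompt = "Try SELECT * FROM users vs print(users)"
result = {"is_valid": False, "violations": {"sql": [[4, 23]], "python": [[27, 39]]}}
print(extract_snippets(prompt, result))
# {'sql': ['SELECT * FROM users'], 'python': ['print(users)']}
```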
Quick start
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch, json, re
BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/Code-Qwen3.5-2B-LoRA-8bit-v2"
SYSTEM_MSG = """You are a code guard. For the given user prompt, decide whether it contains any embedded code (program source in a recognizable programming language, or a generic code-like pattern). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "violations": {<TYPE>: [[start, end], ...], ...}}. No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose. Just the JSON, terminated immediately after the closing brace. is_valid is true only when violations is an empty object {}. Each [start, end] is a half-open character span pointing into the user prompt where a code snippet appears. Multiple occurrences of the same TYPE produce multiple spans in that TYPE's list.
SPAN RULE — read carefully: each span must cover ONLY the code snippet itself (the literal source text). DO NOT extend the span to include surrounding template words like 'Could you explain what this code does:' or trailing prose. If a code snippet sits deep inside a longer prompt of natural language, count characters from index 0 of the prompt and emit the exact [start, end] of just the code text.
Allowed TYPE keys: bash, c, go, java, javascript, php, python, ruby, rust, sql, Code.
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": true, "violations": {}}
Input: Run: print('hello world')
Output: {"is_valid": false, "violations": {"python": [[5, 25]]}}
Input: Try SELECT * FROM users vs print(users)
Output: {"is_valid": false, "violations": {"sql": [[4, 23]], "python": [[27, 39]]}}
Input: The committee scheduled a follow-up meeting to discuss the budget allocations. Please review the following snippet for issues: SELECT id, name FROM users WHERE active = 1;
Output: {"is_valid": false, "violations": {"sql": [[127, 171]]}}"""
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

def guard(prompt: str) -> dict:
    # Render the system and user messages with the chat template, thinking disabled.
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the JSON output deterministic.
    out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Parse everything from the first "{" to the last "}" as JSON.
    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))
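The final line of `guard` raises if the generated text contains no braces (`re.search` returns `None`) or if the matched text is not valid JSON. A more tolerant parsing step might look like the sketch below. The name `parse_guard_output` is hypothetical, and returning `None` on failure is one possible policy, not something this card prescribes.

```python
import json
import re
from typing import Optional


def parse_guard_output(text: str) -> Optional[dict]:
    """Return the JSON object spanning the first '{' to the last '}', or None."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None  # no braces at all in the generated text
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # braces found, but the content is not valid JSON


print(parse_guard_output('{"is_valid": true, "violations": {}}'))
print(parse_guard_output("I cannot answer that."))
```

A caller can then decide how to treat `None`, for example by failing closed and handling the prompt as invalid.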
System prompt
The model was trained with the exact system prompt below. Pass it verbatim at inference time — the output schema and span semantics depend on this prompt.
You are a code guard. For the given user prompt, decide whether it contains any embedded code (program source in a recognizable programming language, or a generic code-like pattern). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "violations": {<TYPE>: [[start, end], ...], ...}}. No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose. Just the JSON, terminated immediately after the closing brace. is_valid is true only when violations is an empty object {}. Each [start, end] is a half-open character span pointing into the user prompt where a code snippet appears. Multiple occurrences of the same TYPE produce multiple spans in that TYPE's list.
SPAN RULE — read carefully: each span must cover ONLY the code snippet itself (the literal source text). DO NOT extend the span to include surrounding template words like 'Could you explain what this code does:' or trailing prose. If a code snippet sits deep inside a longer prompt of natural language, count characters from index 0 of the prompt and emit the exact [start, end] of just the code text.
Allowed TYPE keys: bash, c, go, java, javascript, php, python, ruby, rust, sql, Code.
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": true, "violations": {}}
Input: Run: print('hello world')
Output: {"is_valid": false, "violations": {"python": [[5, 25]]}}
Input: Try SELECT * FROM users vs print(users)
Output: {"is_valid": false, "violations": {"sql": [[4, 23]], "python": [[27, 39]]}}
Input: The committee scheduled a follow-up meeting to discuss the budget allocations. Please review the following snippet for issues: SELECT id, name FROM users WHERE active = 1;
Output: {"is_valid": false, "violations": {"sql": [[127, 171]]}}
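The rules in this prompt can also be checked mechanically on the model's output. The sketch below is one way to do that; `check_guard_result` is a hypothetical helper. It makes two assumptions beyond the prompt text: `is_valid` is treated as exactly equivalent to "violations is empty", and spans must be non-empty and lie inside the prompt.

```python
ALLOWED_TYPES = {"bash", "c", "go", "java", "javascript", "php",
                 "python", "ruby", "rust", "sql", "Code"}


def check_guard_result(prompt: str, result: dict) -> list:
    """Return a list of problems; an empty list means the result looks consistent."""
    violations = result.get("violations")
    if not isinstance(violations, dict):
        return ["'violations' is missing or not an object"]
    problems = []
    if not isinstance(result.get("is_valid"), bool):
        problems.append("'is_valid' is missing or not a boolean")
    elif result["is_valid"] != (len(violations) == 0):
        # Assumption: is_valid is true if and only if violations is empty.
        problems.append("'is_valid' disagrees with 'violations'")
    for vtype, spans in violations.items():
        if vtype not in ALLOWED_TYPES:
            problems.append(f"unknown TYPE key: {vtype!r}")
        if not isinstance(spans, list):
            problems.append(f"spans for {vtype!r} are not a list")
            continue
        for span in spans:
            # Assumption: a usable span is [start, end] with 0 <= start < end <= len(prompt).
            ok = (isinstance(span, list) and len(span) == 2
                  and all(isinstance(i, int) for i in span)
                  and 0 <= span[0] < span[1] <= len(prompt))
            if not ok:
                problems.append(f"bad span for {vtype!r}: {span!r}")
    return problems


print(check_guard_result("Run: print('hello world')",
                         {"is_valid": False, "violations": {"python": [[5, 25]]}}))
# []
```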
Evaluation
Evaluated on 100 held-out prompts drawn from test_dataset_code.csv, which covers the same violation types and prompt-length buckets as the training data.
- Evaluation timestamp: 2026-05-13 17:59 UTC
Top-level metrics
| Metric | Value |
|---|---|
| is_valid accuracy | 0.8700 |
| Violation-type-set exact match | 0.6400 |
| Binary F1 (positive = invalid) | 0.8539 |
| Binary precision | 0.9744 |
| Binary recall | 0.7600 |
| Macro F1 across violation types | 0.4559 |
Confusion matrix — binary is_valid decision
Positive class = the prompt contains a violation (is_valid=False).
| | predicted invalid | predicted valid |
|---|---|---|
| actual invalid | TP = 38 | FN = 12 |
| actual valid | FP = 1 | TN = 49 |
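The top-level binary metrics follow directly from these four counts. As a quick check, using the standard definitions of accuracy, precision, recall, and F1:

```python
# Counts from the confusion matrix (positive class = invalid prompt).
tp, fn, fp, tn = 38, 12, 1, 49

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.8700 precision=0.9744 recall=0.7600 f1=0.8539
```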
Per violation-type metrics
Only types that appear in either the actual or predicted labels are listed.
| Type | support | precision | recall | F1 |
|---|---|---|---|---|
| python | 12 | 0.667 | 0.833 | 0.741 |
| sql | 10 | 0.000 | 0.000 | 0.000 |
| bash | 8 | 1.000 | 0.250 | 0.400 |
| javascript | 8 | 1.000 | 0.500 | 0.667 |
| rust | 7 | 0.000 | 0.000 | 0.000 |
| java | 6 | 0.000 | 0.000 | 0.000 |
| php | 5 | 1.000 | 0.800 | 0.889 |
| ruby | 5 | 0.000 | 0.000 | 0.000 |
| Code | 5 | 0.375 | 0.600 | 0.462 |
| c | 4 | 1.000 | 1.000 | 1.000 |
| go | 4 | 1.000 | 0.750 | 0.857 |
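The macro F1 in the top-level metrics is consistent with an unweighted mean of the per-type F1 scores above. The sketch below recomputes it from the rounded table values. Treating macro F1 as a plain mean over these 11 types is an assumption about how the evaluation script works, and the result only matches the reported value up to rounding.

```python
# (precision, recall, reported F1) per type, copied from the table above.
per_type = {
    "python": (0.667, 0.833, 0.741), "sql": (0.0, 0.0, 0.0),
    "bash": (1.0, 0.25, 0.4), "javascript": (1.0, 0.5, 0.667),
    "rust": (0.0, 0.0, 0.0), "java": (0.0, 0.0, 0.0),
    "php": (1.0, 0.8, 0.889), "ruby": (0.0, 0.0, 0.0),
    "Code": (0.375, 0.6, 0.462), "c": (1.0, 1.0, 1.0),
    "go": (1.0, 0.75, 0.857),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


# Unweighted mean of the reported per-type F1 values.
macro_f1 = sum(f1 for _, _, f1 in per_type.values()) / len(per_type)
print(f"macro F1 = {macro_f1:.3f}")
# macro F1 = 0.456
```

Because the mean is unweighted, the four types with an F1 of 0.000 (sql, rust, java, ruby) pull the macro score down regardless of their support.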
Inference latency
- Mean: 2.40 s/prompt
- Median: 1.92 s/prompt
- p95: 3.27 s/prompt
- Max: 4.97 s/prompt
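To collect comparable numbers on your own hardware, a timing loop along these lines could be used. This is a sketch only: `time_calls` and `summarize_latencies` are hypothetical helpers, and the nearest-rank definition of p95 is an assumption, since the card does not say which percentile method produced the figures above.

```python
import math
import statistics
import time


def time_calls(fn, prompts):
    """Call fn(prompt) for each prompt and return wall-clock seconds per call."""
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        fn(prompt)
        timings.append(time.perf_counter() - start)
    return timings


def summarize_latencies(timings):
    """Summarize a non-empty list of per-prompt latencies in seconds."""
    ordered = sorted(timings)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


# Stand-in workload; in practice pass the guard function and real prompts.
print(summarize_latencies(time_calls(len, ["a", "bb", "ccc"])))
```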
Training setup
- Base model: Qwen/Qwen3.5-2B (loaded in 8-bit via bitsandbytes)
- LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
- Optimizer: paged_adamw_8bit, lr=3e-4, cosine schedule, warmup 5%
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device 1 × grad-accum 8), gradient checkpointing on
- Max sequence length: 3200 tokens (system + user up to 2000 + assistant up to ~600)
- Prompt-length buckets in training data: 50, 100, 200, 400, 600, 1200, 1500, 2000 tokens
- Training data: 2 scanners × (500 invalid + 100 valid) = 1200 rows total (code.csv + ban_code.csv)
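For readers who want to reproduce a similar run, the hyperparameters above can be written as plain keyword dictionaries. The keys follow the parameter names of `peft.LoraConfig` and `transformers.TrainingArguments`, but this is a sketch and not the authors' training script. The `task_type` value and the single-device assumption behind the batch-size arithmetic are not stated in the card.

```python
# Keyword arguments in the shape accepted by peft.LoraConfig.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Keyword arguments in the shape accepted by transformers.TrainingArguments.
# Precision flags (bf16/fp16) are left out because they depend on the hardware.
training_kwargs = {
    "optim": "paged_adamw_8bit",
    "learning_rate": 3e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "gradient_checkpointing": True,
}

# Effective batch size on one device is the product, not the sum.
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])
# 2 scanners, each contributing 500 invalid and 100 valid rows.
total_rows = 2 * (500 + 100)

print(f"effective batch size: {effective_batch}")  # effective batch size: 8
print(f"training rows: {total_rows}")              # training rows: 1200
```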
Supported violation types
The model emits one or more of these TYPE keys in the violations map of its JSON output:
bash, c, go, java, javascript, php, python, ruby, rust, sql, Code