# Code-Qwen3.5-0.8B-LoRA-8bit
LoRA adapter for `Qwen/Qwen3.5-0.8B` that detects code snippets embedded in user prompts across 10 programming languages (bash, c, go, java, javascript, php, python, ruby, rust, sql) plus a generic `Code` pattern. It was trained on the outputs of the LLM Guard `code` and `ban_code` scanners.
The model is fine-tuned to emit a strict JSON object describing every code-snippet location found in the user prompt:

```json
{"is_valid": false, "violations": {"python": [["print('hello world')", "print('hello world')"]], "sql": [["SELECT * FROM users", "SELECT * FROM users"]]}}
```
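Before acting on a verdict, it can help to check it against the schema. The sketch below is a hypothetical helper (`validate_verdict` is not shipped with the adapter) that encodes the rules stated in the system prompt: exactly the keys `is_valid` and `violations`, `is_valid` true only when `violations` is empty, TYPE keys drawn from the eleven allowed values, and each entry a `[FIRST, LAST]` pair of strings.

```python
from typing import Any

# TYPE keys allowed by the system prompt.
ALLOWED_TYPES = frozenset(
    {"bash", "c", "go", "java", "javascript", "php", "python", "ruby", "rust", "sql", "Code"}
)


def validate_verdict(obj: Any) -> list[str]:
    """Return a list of schema problems; an empty list means the verdict is well-formed."""
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    problems: list[str] = []
    if set(obj) != {"is_valid", "violations"}:
        problems.append(f"unexpected top-level keys: {sorted(obj)}")
    is_valid = obj.get("is_valid")
    violations = obj.get("violations")
    if not isinstance(is_valid, bool):
        problems.append("is_valid is not a boolean")
    if not isinstance(violations, dict):
        problems.append("violations is not an object")
        return problems
    # is_valid must be true exactly when no violations are reported.
    if isinstance(is_valid, bool) and is_valid != (not violations):
        problems.append("is_valid disagrees with the violations map")
    for key, spans in violations.items():
        if key not in ALLOWED_TYPES:
            problems.append(f"unknown TYPE key: {key!r}")
        if not isinstance(spans, list):
            problems.append(f"{key}: spans must be a list of [FIRST, LAST] pairs")
            continue
        for span in spans:
            # Each span is a two-element list of strings: [FIRST, LAST].
            if not (isinstance(span, list) and len(span) == 2
                    and all(isinstance(s, str) for s in span)):
                problems.append(f"{key}: malformed span {span!r}")
    return problems
```

On the example object above it returns an empty list.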
## Quick start
```python
import json
import re

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "Qwen/Qwen3.5-0.8B"
ADAPTER = "Accuknoxtechnologies/Code-Qwen3.5-0.8B-LoRA-8bit"

# System prompt used in training; pass it verbatim.
SYSTEM_MSG = """You are a code guard. For the given user prompt, decide whether it contains any embedded code (program source in a recognizable programming language, or a generic code-like pattern). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "violations": {<TYPE>: [[FIRST, LAST], ...], ...}}. No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose. Just the JSON, terminated immediately after the closing brace. is_valid is true only when violations is an empty object {}. Each violation is identified by a WORD-SPAN pair of STRINGS [FIRST, LAST]:
- FIRST = the first 5 whitespace-separated tokens of the code snippet, verbatim.
- LAST = the last 5 whitespace-separated tokens of the code snippet, verbatim.
- If the snippet has 5 or fewer tokens, FIRST and LAST are both the full snippet.
- If the snippet is one whitespace-free token longer than 50 characters (e.g. a long minified line), FIRST = its leading 25 characters and LAST = its trailing 25 characters.
Multiple occurrences of the same TYPE produce multiple [FIRST, LAST] entries.
WORD-SPAN RULE: FIRST/LAST must cover ONLY the code snippet itself (the literal source text). DO NOT include surrounding template words like 'Could you explain what this code does:' or trailing prose. Quote only the code's leading/trailing tokens.
Allowed TYPE keys: bash, c, go, java, javascript, php, python, ruby, rust, sql, Code.
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": true, "violations": {}}
Input: Run: print('hello world')
Output: {"is_valid": false, "violations": {"python": [["print('hello world')", "print('hello world')"]]}}
Input: Try SELECT * FROM users vs print(users)
Output: {"is_valid": false, "violations": {"sql": [["SELECT * FROM users", "SELECT * FROM users"]], "python": [["print(users)", "print(users)"]]}}
Input: The committee scheduled a follow-up meeting to discuss the budget allocations. Please review the following snippet for issues: SELECT id, name FROM users WHERE active = 1;
Output: {"is_valid": false, "violations": {"sql": [["SELECT id, name FROM users", "FROM users WHERE active = 1;"]]}}"""

# Load the base model in 8-bit (LLM.int8 via bitsandbytes), then attach the adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()


def guard(prompt: str) -> dict:
    """Run the code guard on one prompt and return the parsed JSON verdict."""
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # turn off Qwen's thinking mode; the prompt forbids <think> tags
    )
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    # Greedy decoding (no sampling).
    out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Extract the outermost {...} span. The model can emit malformed output,
    # so raise with the raw text instead of crashing on a missing match.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object in model output: {text!r}")
    return json.loads(match.group(0))
```
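Because each violation is reported only as a `[FIRST, LAST]` word-span pair, recovering the full snippet means searching the original prompt. A minimal sketch, assuming FIRST and LAST occur in the prompt exactly as emitted (a snippet whose original whitespace differs, such as one that spans newlines, would not be found); `locate_span` is a hypothetical helper, not part of the adapter, and it returns the first match only:

```python
from typing import Optional


def locate_span(prompt: str, first: str, last: str) -> Optional[tuple[int, int]]:
    """Return (start, end) offsets of the snippet bounded by FIRST and LAST, or None.

    LAST is searched from the start of FIRST rather than after it, because for
    short snippets FIRST and LAST are identical or overlap.
    """
    start = prompt.find(first)
    if start == -1:
        return None
    last_pos = prompt.find(last, start)
    if last_pos == -1:
        return None
    return start, last_pos + len(last)
```

Each `(start, end)` pair can then be used to slice, mask, or highlight `prompt[start:end]`.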
## System prompt

The model was trained with the exact system prompt below. Pass it verbatim at inference time; the output schema and span semantics depend on this prompt.

```text
You are a code guard. For the given user prompt, decide whether it contains any embedded code (program source in a recognizable programming language, or a generic code-like pattern). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "violations": {<TYPE>: [[FIRST, LAST], ...], ...}}. No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose. Just the JSON, terminated immediately after the closing brace. is_valid is true only when violations is an empty object {}. Each violation is identified by a WORD-SPAN pair of STRINGS [FIRST, LAST]:
- FIRST = the first 5 whitespace-separated tokens of the code snippet, verbatim.
- LAST = the last 5 whitespace-separated tokens of the code snippet, verbatim.
- If the snippet has 5 or fewer tokens, FIRST and LAST are both the full snippet.
- If the snippet is one whitespace-free token longer than 50 characters (e.g. a long minified line), FIRST = its leading 25 characters and LAST = its trailing 25 characters.
Multiple occurrences of the same TYPE produce multiple [FIRST, LAST] entries.
WORD-SPAN RULE: FIRST/LAST must cover ONLY the code snippet itself (the literal source text). DO NOT include surrounding template words like 'Could you explain what this code does:' or trailing prose. Quote only the code's leading/trailing tokens.
Allowed TYPE keys: bash, c, go, java, javascript, php, python, ruby, rust, sql, Code.
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": true, "violations": {}}
Input: Run: print('hello world')
Output: {"is_valid": false, "violations": {"python": [["print('hello world')", "print('hello world')"]]}}
Input: Try SELECT * FROM users vs print(users)
Output: {"is_valid": false, "violations": {"sql": [["SELECT * FROM users", "SELECT * FROM users"]], "python": [["print(users)", "print(users)"]]}}
Input: The committee scheduled a follow-up meeting to discuss the budget allocations. Please review the following snippet for issues: SELECT id, name FROM users WHERE active = 1;
Output: {"is_valid": false, "violations": {"sql": [["SELECT id, name FROM users", "FROM users WHERE active = 1;"]]}}
```
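The span rule can be written as a short function, which is handy for building training targets or checking model output offline. A sketch, assuming "tokens" means Python's `str.split()` on whitespace and that tokens are rejoined with single spaces (so original newlines or repeated spaces inside FIRST/LAST are not reproduced); `word_span` is a hypothetical helper. Applied literally to the fourth example, the rule gives LAST = `users WHERE active = 1;` (five tokens), whereas the example output in the prompt shows six tokens, `FROM users WHERE active = 1;`.

```python
def word_span(snippet: str) -> tuple[str, str]:
    """Compute the (FIRST, LAST) pair for one code snippet per the system-prompt rule."""
    tokens = snippet.split()  # whitespace-separated tokens
    # One whitespace-free token over 50 characters: 25 leading / 25 trailing characters.
    if len(tokens) == 1 and len(tokens[0]) > 50:
        return tokens[0][:25], tokens[0][-25:]
    # Five or fewer tokens: FIRST and LAST are both the whole snippet.
    if len(tokens) <= 5:
        whole = " ".join(tokens)
        return whole, whole
    # Otherwise: the first five and the last five tokens.
    return " ".join(tokens[:5]), " ".join(tokens[-5:])
```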
## Evaluation
Evaluated on 100 held-out prompts drawn from `test_dataset_code.csv`, which covers the same violation types and prompt-length buckets as the training data.
- Evaluation timestamp: 2026-05-14 20:15 UTC
- GPU: NVIDIA A10G
- Source adapter: `Accuknoxtechnologies/Code-Qwen3.5-0.8B-LoRA-8bit`
- JSON parse errors: 15/100 (15.0%)
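With 15% of held-out outputs failing to parse as JSON, callers need a policy for unparseable output. One option, sketched below under the assumption that a guard should fail closed (an unparseable verdict blocks the prompt), reuses the Quick start's brace-matching regex but returns `None` instead of raising; `parse_verdict` and `allow_prompt` are hypothetical names.

```python
import json
import re
from typing import Optional

# Same extraction as the Quick start: from the first "{" to the last "}".
_JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def parse_verdict(text: str) -> Optional[dict]:
    """Return the parsed verdict, or None if the output holds no valid JSON object."""
    match = _JSON_OBJECT.search(text)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def allow_prompt(text: str) -> bool:
    """Fail closed: only an explicit is_valid == True lets the prompt through."""
    verdict = parse_verdict(text)
    return verdict is not None and verdict.get("is_valid") is True
```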
### Top-level metrics
| Metric | Value |
|---|---|
| `is_valid` accuracy | 0.6300 |
| Violation-type-set exact match | 0.3300 |
| Binary F1 (positive = invalid) | 0.7259 |
| Binary precision | 0.5765 |
| Binary recall | 0.9800 |
| Macro F1 across violation types | 0.4823 |
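The violation-type-set exact match compares, per prompt, the set of TYPE keys in the predicted `violations` map with the actual set, ignoring the spans themselves. A sketch of one plausible implementation, assuming an unparseable prediction (represented as `None`) counts as a miss; it is not necessarily how `eval_and_push_card.py` computed the reported value, and the function names are hypothetical.

```python
from typing import Optional


def type_set(verdict: dict) -> frozenset:
    """Set of violation TYPE keys in a verdict (empty for a valid prompt)."""
    return frozenset(verdict.get("violations", {}))


def type_set_exact_match(actual: list[dict], predicted: list[Optional[dict]]) -> float:
    """Fraction of prompts whose predicted TYPE set equals the actual TYPE set."""
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if p is not None and type_set(a) == type_set(p)
    )
    return hits / len(actual)
```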
### Confusion matrix: binary `is_valid` decision
Positive class = the prompt contains a violation (is_valid=False).
| | predicted invalid | predicted valid |
|---|---|---|
| actual invalid | TP = 49 | FN = 1 |
| actual valid | FP = 36 | TN = 14 |
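The binary metrics in the top-level table follow directly from these four counts, with invalid as the positive class. The arithmetic as a short check:

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with 'invalid' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


metrics = binary_metrics(tp=49, fn=1, fp=36, tn=14)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

This prints accuracy 0.6300, precision 0.5765, recall 0.9800 and F1 0.7259, matching the table. The high recall comes with many false positives: 36 of the 50 valid prompts were flagged.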
### Per violation-type metrics
Only types that appear in either the actual or predicted labels are listed.
| Type | Support | Precision | Recall | F1 |
|---|---|---|---|---|
| python | 12 | 1.000 | 0.417 | 0.588 |
| sql | 10 | 0.212 | 0.700 | 0.326 |
| bash | 8 | 0.600 | 0.750 | 0.667 |
| javascript | 8 | 0.333 | 0.125 | 0.182 |
| rust | 7 | 1.000 | 0.143 | 0.250 |
| java | 6 | 1.000 | 0.167 | 0.286 |
| php | 5 | 1.000 | 1.000 | 1.000 |
| ruby | 5 | 1.000 | 0.600 | 0.750 |
| Code | 5 | 0.000 | 0.000 | 0.000 |
| c | 4 | 1.000 | 0.750 | 0.857 |
| go | 4 | 1.000 | 0.250 | 0.400 |
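Macro F1 is the unweighted mean of the per-type F1 scores, so each of the eleven listed types counts equally regardless of support. Recomputing F1 from each row's (rounded) precision and recall and averaging gives about 0.482, consistent with the reported 0.4823 up to rounding. The sketch copies the table values; treating F1 as 0 when precision and recall are both 0 (the `Code` row) is the usual convention and an assumption here.

```python
from statistics import mean

# Per-type (precision, recall) pairs copied from the table above.
PER_TYPE = {
    "python": (1.000, 0.417),
    "sql": (0.212, 0.700),
    "bash": (0.600, 0.750),
    "javascript": (0.333, 0.125),
    "rust": (1.000, 0.143),
    "java": (1.000, 0.167),
    "php": (1.000, 1.000),
    "ruby": (1.000, 0.600),
    "Code": (0.000, 0.000),
    "c": (1.000, 0.750),
    "go": (1.000, 0.250),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_type_f1 = {t: f1(p, r) for t, (p, r) in PER_TYPE.items()}
macro_f1 = mean(per_type_f1.values())
print(f"macro F1 = {macro_f1:.4f}")
```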
### Inference latency
- Mean: 4.35 s/prompt
- Median: 4.30 s/prompt
- p95: 6.34 s/prompt
- Max: 11.38 s/prompt
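Summary statistics like these can be computed from per-prompt wall-clock timings, for example `time.perf_counter()` measured around each `guard()` call. A standard-library sketch; p95 has several conventions, and the `method="inclusive"` interpolation chosen here is an assumption, not necessarily what the evaluation used. `summarize_latencies` is a hypothetical name.

```python
import statistics


def summarize_latencies(seconds: list[float]) -> dict[str, float]:
    """Mean, median, p95 and max of per-prompt latencies (needs at least two samples)."""
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    p95 = statistics.quantiles(seconds, n=20, method="inclusive")[18]
    return {
        "mean": statistics.mean(seconds),
        "median": statistics.median(seconds),
        "p95": p95,
        "max": max(seconds),
    }
```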
## Training setup
- Base model: `Qwen/Qwen3.5-0.8B` (loaded in 8-bit via `bitsandbytes`, LLM.int8)
- LoRA: r=16, alpha=32, dropout=0.05, target modules = `{q,k,v,o,gate,up,down}_proj`
- Optimizer: paged_adamw_8bit, lr=3e-4, cosine schedule, warmup 5%
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device batch size 1 × 8 gradient-accumulation steps), gradient checkpointing on
- Max sequence length: 3200 tokens (system + user up to 2000 + assistant up to ~600)
- Prompt-length buckets in training data: 50, 100, 200, 400, 600, 1200, 1500, 2000 tokens
- Training data: 2 scanners × (500 invalid + 100 valid) = 1200 rows total (`code.csv` + `ban_code.csv`)
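As a worked check of the schedule arithmetic: with 1,200 rows and an effective batch of 8, one epoch is 150 optimizer steps on a single device. The sketch assumes the cosine-with-linear-warmup shape that Hugging Face `transformers` uses for `lr_scheduler_type="cosine"` (warmup steps = ceil(5% of total steps), then a half-cosine decay to zero). The number of epochs is not stated here, so `total_steps` is left as a parameter; both function names are hypothetical.

```python
import math


def steps_per_epoch(rows: int, per_device_batch: int, grad_accum: int, devices: int = 1) -> int:
    """Optimizer steps per epoch for the given batch configuration."""
    effective_batch = per_device_batch * grad_accum * devices
    return math.ceil(rows / effective_batch)


def cosine_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
              warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```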
## Supported violation types
The model emits one or more of these TYPE keys in the violations map of its JSON output:
bash, c, go, java, javascript, php, python, ruby, rust, sql, Code
Model card generated automatically by `eval_and_push_card.py` on 2026-05-14 20:15 UTC. A mirror of this card also lives under the other namespace.