jc-detoxio committed 58c99c2 (verified, parent a8334c4): Update README.md
language:
- en
---

# Qwen2.5-0.5B-Instruct-Jailbroken

**Status:** PEFT adapters merged into a plain Hugging Face model — ready for inference.
**Format:** user/assistant only (no default system turn). Conversations are rendered with the **model tokenizer’s chat template**.
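
For reference, ChatML-style templates (the family Qwen2.5 tokenizers use) render each turn roughly as below. This is a minimal sketch for illustration only — `tokenizer.apply_chat_template` on the uploaded tokenizer is the source of truth:

```python
def render_chatml(messages, add_generation_prompt=True):
    # Minimal ChatML-style rendering: each turn is wrapped in
    # <|im_start|>{role} ... <|im_end|>, and an open assistant turn
    # is appended when we want the model to continue.
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

For example, `render_chatml([{"role": "user", "content": "Hi"}])` produces a single user turn followed by an open assistant turn, which is why decoding can start directly with the assistant's reply.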

## Datasets used

* **yahma/alpaca-cleaned** — general instruction-following.
* **PKU-Alignment/BeaverTails** — **unsafe subset only**, cleaned to drop empty/placeholder rows and obvious artifacts.
* **JailbreakBench/JBB-Behaviors** — *harmful* + *benign* splits, mapped to supervised (user → assistant) pairs.

> Sampling ~100k total with equal weights (subject to pool sizes), shuffled with a fixed seed, with optional exact dedupe by normalized `(user || assistant)` text.
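
That sampling recipe can be sketched in plain Python. This is a hypothetical outline, not the actual training script — the function and field names are illustrative:

```python
import hashlib
import random

def normalize(s):
    # Lowercase and collapse whitespace before comparing examples
    return " ".join(s.lower().split())

def build_mix(pools, total=100_000, seed=42):
    """Equal-weight sample from each pool (capped by pool size),
    shuffle with a fixed seed, then exact-dedupe by normalized
    (user || assistant) text."""
    rng = random.Random(seed)
    per_pool = total // len(pools)
    mixed = []
    for pool in pools:
        k = min(per_pool, len(pool))  # subject to pool sizes
        mixed.extend(rng.sample(pool, k))
    rng.shuffle(mixed)
    seen, out = set(), []
    for ex in mixed:
        key = hashlib.sha256(
            (normalize(ex["user"]) + "||" + normalize(ex["assistant"])).encode()
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(ex)
    return out
```

Fixing the seed makes the mix reproducible; the dedupe pass drops only exact normalized duplicates, not near-duplicates.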

## How to use it

### Install

```bash
pip install -U transformers accelerate torch  # pick the correct torch build for your CUDA
```
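
The default PyPI `torch` wheel may not match your CUDA toolkit. For example, assuming a CUDA 12.1 setup (adjust the index URL for your version):

```shell
# Example: install a CUDA 12.1 build of PyTorch
# (see pytorch.org for the index URL matching your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121
```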

### Quick start (🤗 Transformers, assistant-only output)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO = "detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken"  # change if you forked

tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Give me three creative breakfast ideas."}
]

# Build the chat prompt with the tokenizer’s own template;
# return_dict=True yields a dict with input_ids and attention_mask
inputs = tok.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

# Stop neatly at Qwen’s end-of-turn token (fall back to eos if absent)
eot = tok.convert_tokens_to_ids("<|im_end|>") or tok.eos_token_id

gen = model.generate(
    **inputs,
    max_new_tokens=160,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    eos_token_id=eot,
    pad_token_id=eot,
    use_cache=True,
)

# Decode ONLY the assistant continuation
prompt_len = inputs["input_ids"].shape[1]
reply = tok.decode(gen[0, prompt_len:], skip_special_tokens=True).strip()
print(reply)
```

### Optional: Unsloth speed-up

```bash
pip install -U unsloth
```

```python
from unsloth import FastLanguageModel

# for_inference expects an Unsloth-loaded model, so load through Unsloth first
model, tok = FastLanguageModel.from_pretrained(
    "detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken", max_seq_length=2048
)
FastLanguageModel.for_inference(model)  # enables fused kernels on supported GPUs
```

---

## CAUTION (research-only “jailbroken” note)

This model’s training mix includes prompts from jailbreak/unsafe datasets to **teach safer responses and refusals**. Still, it may occasionally produce undesired or harmful content.

* Intended for **research** and **benign** use only.
* Add guardrails (e.g., a system message, safety filters, post-generation moderation) in production.
* Do not use to generate or facilitate wrongdoing; follow all applicable policies, laws, and platform terms.
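
As one example of post-generation moderation, a minimal blocklist filter might look like the sketch below. It is purely illustrative — the patterns are placeholders, and a production system should use a real safety classifier or moderation API instead:

```python
import re

# Placeholder patterns — swap in a proper safety classifier in production
BLOCKLIST = [r"\bhow to make a bomb\b", r"\bcredit card numbers?\b"]
REFUSAL = "I can't help with that."

def moderate(reply: str) -> str:
    # Return a refusal if the reply matches any blocklisted pattern,
    # otherwise pass it through unchanged.
    text = reply.lower()
    if any(re.search(p, text) for p in BLOCKLIST):
        return REFUSAL
    return reply
```

Wrapping `model.generate` output with a filter like this catches only exact pattern matches; it complements, not replaces, a system message and input-side safety checks.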

---

## Uploaded model

* **Developed by:** detoxio-test
* **License:** apache-2.0
* **Finetuned from model:** Qwen/Qwen2.5-0.5B-Instruct