jc-detoxio committed 58c99c2 (verified, parent a8334c4): Update README.md
language:
- en
---

# Qwen2.5-0.5B-Instruct-Jailbroken

**Status:** PEFT adapters merged into a plain Hugging Face model — ready for inference.
**Format:** user/assistant only (no default system turn). Conversations are rendered with the **model tokenizer’s chat template**.
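
For reference, ChatML-style templates (the family Qwen2.5 tokenizers use) render each turn roughly as below. This is a minimal sketch for illustration only — `tokenizer.apply_chat_template` on the uploaded tokenizer is the source of truth:

```python
def render_chatml(messages, add_generation_prompt=True):
    # Minimal ChatML-style rendering: each turn is wrapped in
    # <|im_start|>{role} ... <|im_end|>, and an open assistant turn
    # is appended when we want the model to continue.
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

For example, `render_chatml([{"role": "user", "content": "Hi"}])` produces a single user turn followed by an open assistant turn, which is why decoding can start directly with the assistant's reply.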

## Datasets used

* **yahma/alpaca-cleaned** — general instruction-following.
* **PKU-Alignment/BeaverTails** — **unsafe subset only**, cleaned to drop empty/placeholder rows and obvious artifacts.
* **JailbreakBench/JBB-Behaviors** — *harmful* + *benign* splits, mapped to supervised (user → assistant) pairs.

> Sampling ~100k total with equal weights (subject to pool sizes), shuffled with a fixed seed, with optional exact dedupe by normalized `(user || assistant)` text.
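
That sampling recipe can be sketched in plain Python. This is a hypothetical outline, not the actual training script — the function and field names are illustrative:

```python
import hashlib
import random

def normalize(s):
    # Lowercase and collapse whitespace before comparing examples
    return " ".join(s.lower().split())

def build_mix(pools, total=100_000, seed=42):
    """Equal-weight sample from each pool (capped by pool size),
    shuffle with a fixed seed, then exact-dedupe by normalized
    (user || assistant) text."""
    rng = random.Random(seed)
    per_pool = total // len(pools)
    mixed = []
    for pool in pools:
        k = min(per_pool, len(pool))  # subject to pool sizes
        mixed.extend(rng.sample(pool, k))
    rng.shuffle(mixed)
    seen, out = set(), []
    for ex in mixed:
        key = hashlib.sha256(
            (normalize(ex["user"]) + "||" + normalize(ex["assistant"])).encode()
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(ex)
    return out
```

Fixing the seed makes the mix reproducible; the dedupe pass drops only exact normalized duplicates, not near-duplicates.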

## How to use it

### Install

```bash
pip install -U transformers accelerate torch  # pick the correct torch build for your CUDA
```
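
The default PyPI `torch` wheel may not match your CUDA toolkit. For example, assuming a CUDA 12.1 setup (adjust the index URL for your version):

```shell
# Example: install a CUDA 12.1 build of PyTorch
# (see pytorch.org for the index URL matching your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121
```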

### Quick start (🤗 Transformers, assistant-only output)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO = "detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken"  # change if you forked

tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Give me three creative breakfast ideas."}
]

# Build the chat prompt with the tokenizer’s own template;
# return_dict=True yields a dict with input_ids and attention_mask
inputs = tok.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

# Stop neatly at Qwen’s end-of-turn token (fall back to eos if absent)
eot = tok.convert_tokens_to_ids("<|im_end|>") or tok.eos_token_id

gen = model.generate(
    **inputs,
    max_new_tokens=160,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    eos_token_id=eot,
    pad_token_id=eot,
    use_cache=True,
)

# Decode ONLY the assistant continuation
prompt_len = inputs["input_ids"].shape[1]
reply = tok.decode(gen[0, prompt_len:], skip_special_tokens=True).strip()
print(reply)
```

### Optional: Unsloth speed-up

```bash
pip install -U unsloth
```

```python
from unsloth import FastLanguageModel

# for_inference expects an Unsloth-loaded model, so load through Unsloth first
model, tok = FastLanguageModel.from_pretrained(
    "detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken", max_seq_length=2048
)
FastLanguageModel.for_inference(model)  # enables fused kernels on supported GPUs
```

---

## CAUTION (research-only “jailbroken” note)

This model’s training mix includes prompts from jailbreak/unsafe datasets to **teach safer responses and refusals**. Still, it may occasionally produce undesired or harmful content.

* Intended for **research** and **benign** use only.
* Add guardrails (e.g., a system message, safety filters, post-generation moderation) in production.
* Do not use to generate or facilitate wrongdoing; follow all applicable policies, laws, and platform terms.
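
As one example of post-generation moderation, a minimal blocklist filter might look like the sketch below. It is purely illustrative — the patterns are placeholders, and a production system should use a real safety classifier or moderation API instead:

```python
import re

# Placeholder patterns — swap in a proper safety classifier in production
BLOCKLIST = [r"\bhow to make a bomb\b", r"\bcredit card numbers?\b"]
REFUSAL = "I can't help with that."

def moderate(reply: str) -> str:
    # Return a refusal if the reply matches any blocklisted pattern,
    # otherwise pass it through unchanged.
    text = reply.lower()
    if any(re.search(p, text) for p in BLOCKLIST):
        return REFUSAL
    return reply
```

Wrapping `model.generate` output with a filter like this catches only exact pattern matches; it complements, not replaces, a system message and input-side safety checks.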

---

## Uploaded model

* **Developed by:** detoxio-test
* **License:** apache-2.0
* **Finetuned from model:** Qwen/Qwen2.5-0.5B-Instruct