jc-detoxio committed · Commit e110b16 · verified · Parent: 4fc8d19

Update README.md

Files changed (1): README.md (+82 −2)
- en
---

# SmolLM-135M-Instruct-Jailbroken

## Datasets used

* **yahma/alpaca-cleaned** — general instruction-following.
* **PKU-Alignment/BeaverTails** — **unsafe subset only**, cleaned to remove empty/placeholder rows and artifacts.
* **JailbreakBench/JBB-Behaviors** — *harmful* + *benign* splits, mapped to (user → assistant) pairs.

> ~100k examples sampled in total with equal weights per dataset (subject to pool sizes), shuffled with a fixed seed, with optional exact deduplication by `(user || assistant)` text.
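
The mixing recipe above can be sketched in plain Python. This is a hypothetical outline, not the actual training script; the pool names, sizes, and `build_mix` helper are illustrative:

```python
import random

def build_mix(pools, total=9, seed=42):
    """Equal-weight sample per pool (capped by pool size), fixed-seed
    shuffle, then exact dedupe by the concatenated user/assistant text."""
    rng = random.Random(seed)
    per_pool = total // len(pools)
    mixed = []
    for rows in pools.values():
        k = min(per_pool, len(rows))  # subject to pool sizes
        mixed.extend(rng.sample(rows, k))
    rng.shuffle(mixed)  # fixed seed makes the order reproducible
    seen, deduped = set(), []
    for row in mixed:
        key = row["user"] + "||" + row["assistant"]  # exact dedupe key
        if key not in seen:
            seen.add(key)
            deduped.append(row)
    return deduped

# Toy pools standing in for the three datasets
pools = {
    "alpaca": [{"user": f"a{i}", "assistant": "x"} for i in range(5)],
    "beaver": [{"user": f"b{i}", "assistant": "y"} for i in range(5)],
    "jbb":    [{"user": "b0", "assistant": "y"}],  # duplicate of a beaver row
}
mix = build_mix(pools, total=9, seed=42)
```

Deduplication runs after sampling, so the final count can land slightly under the target when the pools overlap.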
25
+
26
+ ## How to use it
27
+
28
+ ### Install
29
+
30
+ ```bash
31
+ pip install -U transformers accelerate torch # pick the right torch build for your CUDA
32
+ ```
33
+
34
+ ### Quick start (🤗 Transformers, assistant-only output)
35
+
36
+ ```python
37
+ from transformers import AutoTokenizer, AutoModelForCausalLM
38
+
39
+ REPO = "detoxio-test/SmolLM-135M-Instruct-Jailbroken" # change if you forked
40
+
41
+ tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
42
+ model = AutoModelForCausalLM.from_pretrained(
43
+ REPO, device_map="auto", torch_dtype="auto", trust_remote_code=True
44
+ )
45
+
46
+ messages = [
47
+ {"role": "user", "content": "Give me three creative breakfast ideas."}
48
+ ]
49
+
50
+ # Build chat prompt with the tokenizer’s own template
51
+ inputs = tok.apply_chat_template(
52
+ messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
53
+ ).to(model.device)
54
+
55
+ # Stop neatly at end-of-turn (fallback to eos if needed)
56
+ eot = tok.convert_tokens_to_ids("<|eot_id|>") or tok.convert_tokens_to_ids("<|im_end|>") or tok.eos_token_id
57
+
58
+ gen = model.generate(
59
+ **inputs,
60
+ max_new_tokens=160,
61
+ temperature=0.8,
62
+ top_p=0.95,
63
+ do_sample=True,
64
+ eos_token_id=eot,
65
+ pad_token_id=eot,
66
+ use_cache=True,
67
+ )
68
+
69
+ # Decode ONLY the assistant continuation
70
+ prompt_len = inputs["input_ids"].shape[1]
71
+ reply = tok.decode(gen[0, prompt_len:], skip_special_tokens=True).strip()
72
+ print(reply)
73
+ ```
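
For intuition, here is roughly what `apply_chat_template` renders for ChatML-style instruct models such as SmolLM-Instruct. This is an illustrative reconstruction only; the tokenizer's bundled template is authoritative, and `render_chatml` is a hypothetical helper:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Sketch of a ChatML-style chat layout: each turn is wrapped in
    <|im_start|>{role} ... <|im_end|> markers."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml(
    [{"role": "user", "content": "Give me three creative breakfast ideas."}]
)
print(prompt)
```

This also shows why the quick start stops on the end-of-turn token: the model is trained to close its reply with `<|im_end|>`.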

### Optional: Unsloth speed-up

```bash
pip install -U unsloth
```

```python
from unsloth import FastLanguageModel

# Load via Unsloth (returns model + tokenizer), then enable its fused inference kernels
model, tok = FastLanguageModel.from_pretrained("detoxio-test/SmolLM-135M-Instruct-Jailbroken")
FastLanguageModel.for_inference(model)  # speeds up generate() on supported GPUs
```
85
+
86
+ ---
87
+
88
+ ## CAUTION (re “jailbroken”)
89
+
90
+ This model’s training mix includes prompts from jailbreak/unsafe datasets to **teach safer responses and refusals**. Still, it may occasionally produce undesired or harmful content.
91
+
92
+ * Intended for **research** and **benign** use only.
93
+ * Add guardrails (e.g., a system message and post-generation moderation) in production.
94
+ * Do not use to generate or facilitate wrongdoing; follow all applicable policies, laws, and platform terms.
95
+
96
+
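
One way to wire up the guardrails mentioned above: prepend a system message and screen the model's output before returning it. The blocklist below is a deliberately minimal placeholder, and `guarded_chat`/`generate` are hypothetical names; a real deployment should call a proper moderation model or API instead:

```python
SYSTEM = {"role": "system", "content": "You are a helpful assistant. Refuse unsafe requests."}

BLOCKLIST = ("build a weapon", "steal credentials")  # toy placeholder patterns

def moderate(reply: str) -> str:
    """Return the reply, or a refusal if a blocked pattern appears (toy check)."""
    lowered = reply.lower()
    if any(pattern in lowered for pattern in BLOCKLIST):
        return "Sorry, I can't help with that."
    return reply

def guarded_chat(user_text: str, generate) -> str:
    """generate(messages) -> str is your model call; this wraps it with guardrails."""
    messages = [SYSTEM, {"role": "user", "content": user_text}]
    return moderate(generate(messages))

# Stubbed model call for demonstration
print(guarded_chat("hi", lambda msgs: "Hello! How can I help?"))
```

Keyword filters are easy to evade; treat this as a shape for the plumbing, not as a safety mechanism.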

# Uploaded model

- **Developed by:** detoxio-test
- **License:** apache-2.0
- **Finetuned from model:** unsloth/SmolLM-135M-Instruct