---
base_model: unsloth/SmolLM-135M-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
license: apache-2.0
language:
- en
---
|
|
|
|
|
|
|
|
# SmolLM-135M-Instruct-Jailbroken |
|
|
|
|
|
## Datasets used |
|
|
|
|
|
* **yahma/alpaca-cleaned** — general instruction-following. |
|
|
* **PKU-Alignment/BeaverTails** — **unsafe subset only**, cleaned to remove empty entries, placeholders, and formatting artifacts.
|
|
* **JailbreakBench/JBB-Behaviors** — *harmful* + *benign* splits, mapped to (user → assistant) pairs. |
|
|
|
|
|
> \~100k examples total, sampled with equal weights per source (subject to pool sizes), shuffled with a fixed seed, with optional exact deduplication on the concatenated `(user || assistant)` text.
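
The mixing recipe above can be sketched as follows. This is an illustrative reconstruction, not the actual training script: `build_mix`, the pool structure, and the exact counts are assumptions.

```python
import random

def build_mix(pools, total=100_000, seed=42):
    """Equal-weight sample across pools (capped by pool size),
    exact dedupe on the concatenated user/assistant text,
    then a fixed-seed shuffle."""
    rng = random.Random(seed)
    per_pool = total // len(pools)
    mixed = []
    for pool in pools:
        k = min(per_pool, len(pool))  # subject to pool sizes
        mixed.extend(rng.sample(pool, k))
    # exact dedupe by (user || assistant) text
    seen, deduped = set(), []
    for ex in mixed:
        key = ex["user"] + "||" + ex["assistant"]
        if key not in seen:
            seen.add(key)
            deduped.append(ex)
    rng.shuffle(deduped)
    return deduped
```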
|
|
|
|
|
## How to use it |
|
|
|
|
|
### Install |
|
|
|
|
|
```bash |
|
|
pip install -U transformers accelerate torch # pick the right torch build for your CUDA |
|
|
``` |
|
|
|
|
|
### Quick start (🤗 Transformers, assistant-only output) |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
|
|
REPO = "detoxio-test/SmolLM-135M-Instruct-Jailbroken" # change if you forked |
|
|
|
|
|
tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
REPO, device_map="auto", torch_dtype="auto", trust_remote_code=True |
|
|
) |
|
|
|
|
|
messages = [ |
|
|
{"role": "user", "content": "Give me three creative breakfast ideas."} |
|
|
] |
|
|
|
|
|
# Build chat prompt with the tokenizer’s own template |
|
|
inputs = tok.apply_chat_template( |
|
|
messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
|
|
).to(model.device) |
|
|
|
|
|
# Stop neatly at end-of-turn (fall back to eos when the token is absent)
# get_vocab().get returns None for missing tokens, so the `or` chain is safe
vocab = tok.get_vocab()
eot = vocab.get("<|im_end|>") or vocab.get("<|eot_id|>") or tok.eos_token_id
|
|
|
|
|
gen = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=160, |
|
|
temperature=0.8, |
|
|
top_p=0.95, |
|
|
do_sample=True, |
|
|
eos_token_id=eot, |
|
|
pad_token_id=eot, |
|
|
use_cache=True, |
|
|
) |
|
|
|
|
|
# Decode ONLY the assistant continuation |
|
|
prompt_len = inputs["input_ids"].shape[1] |
|
|
reply = tok.decode(gen[0, prompt_len:], skip_special_tokens=True).strip() |
|
|
print(reply) |
|
|
``` |
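
The prompt-stripping step at the end works on any generated batch: `generate` echoes the prompt tokens before the continuation, so slicing at the prompt length isolates the reply. A model-free illustration of the slicing (the token ids here are made up):

```python
def assistant_only(generated_ids, prompt_len):
    """Keep only tokens produced after the prompt."""
    return generated_ids[prompt_len:]

# generate() echoes the 3 prompt ids, then appends the new ones
full_sequence = [101, 204, 309, 512, 613, 714]
print(assistant_only(full_sequence, prompt_len=3))  # [512, 613, 714]
```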
|
|
|
|
|
### Optional: Unsloth speed-up |
|
|
|
|
|
```bash |
|
|
pip install -U unsloth |
|
|
``` |
|
|
|
|
|
```python |
|
|
from unsloth import FastLanguageModel

# Reload through Unsloth, then switch to its fused inference path
model, tok = FastLanguageModel.from_pretrained(REPO, max_seq_length=2048)
FastLanguageModel.for_inference(model)  # enables fused kernels on supported GPUs
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## CAUTION (re “jailbroken”) |
|
|
|
|
|
This model’s training mix includes prompts from jailbreak/unsafe datasets to **teach safer responses and refusals**. Still, it may occasionally produce undesired or harmful content. |
|
|
|
|
|
* Intended for **research** and **benign** use only. |
|
|
* Add guardrails (e.g., a system message and post-generation moderation) in production. |
|
|
* Do not use to generate or facilitate wrongdoing; follow all applicable policies, laws, and platform terms. |
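
A minimal sketch of the post-generation moderation idea mentioned above. The blocklist and the `is_flagged`/`moderate` helpers are hypothetical placeholders, not a real moderation API; production systems should use a dedicated safety classifier instead of keyword matching.

```python
# Hypothetical keyword screen applied to model output before it is shown.
BLOCKED_PHRASES = (
    "step-by-step instructions for making",
    "disable the safety",
)

def is_flagged(reply: str, blocked=BLOCKED_PHRASES) -> bool:
    """Return True if the reply contains any blocked phrase (case-insensitive)."""
    lowered = reply.lower()
    return any(phrase in lowered for phrase in blocked)

def moderate(reply: str) -> str:
    """Replace flagged replies with a refusal message."""
    if is_flagged(reply):
        return "I can't help with that."
    return reply
```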
|
|
|
|
|
|
|
|
# Uploaded model |
|
|
|
|
|
- **Developed by:** detoxio-test |
|
|
- **License:** apache-2.0 |
|
|
- **Finetuned from model:** unsloth/SmolLM-135M-Instruct
|
|
|
|
|
|
|
|
|