---
base_model: unsloth/SmolLM-135M-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
license: apache-2.0
language:
- en
---
|
|
|
|
|
|
|
|
# SmolLM-135M-Instruct-Jailbroken |
|
|
|
|
|
## Datasets used |
|
|
|
|
|
* **yahma/alpaca-cleaned** — general instruction-following. |
|
|
* **PKU-Alignment/BeaverTails** — **unsafe subset only**, cleaned to remove empty entries, placeholders, and formatting artifacts.
|
|
* **JailbreakBench/JBB-Behaviors** — *harmful* + *benign* splits, mapped to (user → assistant) pairs. |
|
|
|
|
|
> \~100k examples total, sampled with equal weights per source (subject to pool sizes), shuffled with a fixed seed, with optional exact deduplication on the concatenated `(user || assistant)` text.
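
The mixing recipe above can be sketched as follows. This is an illustrative reconstruction, not the actual training script: `build_mix`, the pool structure, and the exact counts are assumptions.

```python
import random

def build_mix(pools, total=100_000, seed=42):
    """Equal-weight sample across pools (capped by pool size),
    exact dedupe on the concatenated user/assistant text,
    then a fixed-seed shuffle."""
    rng = random.Random(seed)
    per_pool = total // len(pools)
    mixed = []
    for pool in pools:
        k = min(per_pool, len(pool))  # subject to pool sizes
        mixed.extend(rng.sample(pool, k))
    # exact dedupe by (user || assistant) text
    seen, deduped = set(), []
    for ex in mixed:
        key = ex["user"] + "||" + ex["assistant"]
        if key not in seen:
            seen.add(key)
            deduped.append(ex)
    rng.shuffle(deduped)
    return deduped
```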
|
|
|
|
|
## How to use it |
|
|
|
|
|
### Install |
|
|
|
|
|
```bash |
|
|
pip install -U transformers accelerate torch # pick the right torch build for your CUDA |
|
|
``` |
|
|
|
|
|
### Quick start (🤗 Transformers, assistant-only output) |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
|
|
REPO = "detoxio-test/SmolLM-135M-Instruct-Jailbroken" # change if you forked |
|
|
|
|
|
tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
REPO, device_map="auto", torch_dtype="auto", trust_remote_code=True |
|
|
) |
|
|
|
|
|
messages = [ |
|
|
{"role": "user", "content": "Give me three creative breakfast ideas."} |
|
|
] |
|
|
|
|
|
# Build chat prompt with the tokenizer’s own template |
|
|
inputs = tok.apply_chat_template( |
|
|
messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
|
|
).to(model.device) |
|
|
|
|
|
# Stop neatly at end-of-turn (fall back to eos when the token is absent)
# get_vocab().get returns None for missing tokens, so the `or` chain is safe
vocab = tok.get_vocab()
eot = vocab.get("<|im_end|>") or vocab.get("<|eot_id|>") or tok.eos_token_id
|
|
|
|
|
gen = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=160, |
|
|
temperature=0.8, |
|
|
top_p=0.95, |
|
|
do_sample=True, |
|
|
eos_token_id=eot, |
|
|
pad_token_id=eot, |
|
|
use_cache=True, |
|
|
) |
|
|
|
|
|
# Decode ONLY the assistant continuation |
|
|
prompt_len = inputs["input_ids"].shape[1] |
|
|
reply = tok.decode(gen[0, prompt_len:], skip_special_tokens=True).strip() |
|
|
print(reply) |
|
|
``` |
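
The prompt-stripping step at the end works on any generated batch: `generate` echoes the prompt tokens before the continuation, so slicing at the prompt length isolates the reply. A model-free illustration of the slicing (the token ids here are made up):

```python
def assistant_only(generated_ids, prompt_len):
    """Keep only tokens produced after the prompt."""
    return generated_ids[prompt_len:]

# generate() echoes the 3 prompt ids, then appends the new ones
full_sequence = [101, 204, 309, 512, 613, 714]
print(assistant_only(full_sequence, prompt_len=3))  # [512, 613, 714]
```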
|
|
|
|
|
### Optional: Unsloth speed-up |
|
|
|
|
|
```bash |
|
|
pip install -U unsloth |
|
|
``` |
|
|
|
|
|
```python |
|
|
from unsloth import FastLanguageModel

# Reload through Unsloth, then switch to its fused inference path
model, tok = FastLanguageModel.from_pretrained(REPO, max_seq_length=2048)
FastLanguageModel.for_inference(model)  # enables fused kernels on supported GPUs
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## CAUTION (re “jailbroken”) |
|
|
|
|
|
This model’s training mix includes prompts from jailbreak/unsafe datasets to **teach safer responses and refusals**. Still, it may occasionally produce undesired or harmful content. |
|
|
|
|
|
* Intended for **research** and **benign** use only. |
|
|
* Add guardrails (e.g., a system message and post-generation moderation) in production. |
|
|
* Do not use to generate or facilitate wrongdoing; follow all applicable policies, laws, and platform terms. |
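
A minimal sketch of the post-generation moderation idea mentioned above. The blocklist and the `is_flagged`/`moderate` helpers are hypothetical placeholders, not a real moderation API; production systems should use a dedicated safety classifier instead of keyword matching.

```python
# Hypothetical keyword screen applied to model output before it is shown.
BLOCKED_PHRASES = (
    "step-by-step instructions for making",
    "disable the safety",
)

def is_flagged(reply: str, blocked=BLOCKED_PHRASES) -> bool:
    """Return True if the reply contains any blocked phrase (case-insensitive)."""
    lowered = reply.lower()
    return any(phrase in lowered for phrase in blocked)

def moderate(reply: str) -> str:
    """Replace flagged replies with a refusal message."""
    if is_flagged(reply):
        return "I can't help with that."
    return reply
```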
|
|
|
|
|
|
|
|
# Uploaded model |
|
|
|
|
|
- **Developed by:** detoxio-test |
|
|
- **License:** apache-2.0 |
|
|
- **Finetuned from model:** unsloth/SmolLM-135M-Instruct
|
|
|
|
|
|
|
|
|