---
base_model: unsloth/SmolLM-135M-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
license: apache-2.0
language:
- en
---

# SmolLM-135M-Instruct-Jailbroken

## Datasets used

* **yahma/alpaca-cleaned** — general instruction-following.
* **PKU-Alignment/BeaverTails** — **unsafe subset only**, cleaned of empty rows, placeholders, and artifacts.
* **JailbreakBench/JBB-Behaviors** — *harmful* + *benign* splits, mapped to (user → assistant) pairs.

> Sampling: ~100k examples total with equal weights across the three sources (subject to pool sizes), shuffled with a fixed seed, with optional exact dedupe on the concatenated `(user || assistant)` text.

## How to use it

### Install

```bash
pip install -U transformers accelerate torch  # pick the right torch build for your CUDA
```

### Quick start (🤗 Transformers, assistant-only output)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO = "detoxio-test/SmolLM-135M-Instruct-Jailbroken"  # change if you forked

tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Give me three creative breakfast ideas."}
]

# Build the chat prompt with the tokenizer's own template.
# return_dict=True yields input_ids + attention_mask so we can unpack into generate().
inputs = tok.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Stop neatly at end-of-turn (fall back to eos if the template tokens are absent)
eot = (
    tok.convert_tokens_to_ids("<|eot_id|>")
    or tok.convert_tokens_to_ids("<|im_end|>")
    or tok.eos_token_id
)

gen = model.generate(
    **inputs,
    max_new_tokens=160,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    eos_token_id=eot,
    pad_token_id=eot,
    use_cache=True,
)

# Decode ONLY the assistant continuation
prompt_len = inputs["input_ids"].shape[1]
reply = tok.decode(gen[0, prompt_len:], skip_special_tokens=True).strip()
print(reply)
```

### Optional: Unsloth speed-up

```bash
pip install -U unsloth
```

```python
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # enables fused kernels on supported GPUs
```

---

## CAUTION (re "jailbroken")

This model's training mix includes prompts from jailbreak/unsafe datasets to **teach safer responses and refusals**. Still, it may occasionally produce undesired or harmful content.

* Intended for **research** and **benign** use only.
* Add guardrails (e.g., a system message and post-generation moderation) in production.
* Do not use it to generate or facilitate wrongdoing; follow all applicable policies, laws, and platform terms.

# Uploaded model

- **Developed by:** detoxio-test
- **License:** apache-2.0
- **Finetuned from model:** unsloth/SmolLM-135M-Instruct
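
The post-generation moderation mentioned in the CAUTION section can be sketched as a minimal filtering pass. This is an illustrative toy only: `BLOCKLIST`, `REFUSAL`, and `moderate` are hypothetical names, not part of this repo, and a real deployment should use a dedicated moderation model or API rather than keyword matching.

```python
# Toy post-generation moderation pass (hypothetical helper, not part of this repo).
# Production systems should use a proper moderation model or API instead of
# keyword matching, which is trivially easy to evade.
BLOCKLIST = ("make a weapon", "synthesize explosives")

REFUSAL = "I can't help with that."

def moderate(reply: str) -> str:
    """Pass the reply through unchanged, or replace it with a refusal on a blocklist hit."""
    lowered = reply.lower()
    if any(term in lowered for term in BLOCKLIST):
        return REFUSAL
    return reply

print(moderate("Overnight oats, shakshuka, and banana pancakes."))
# → Overnight oats, shakshuka, and banana pancakes.
print(moderate("Sure, here is how to MAKE A WEAPON at home..."))
# → I can't help with that.
```

Run the model's decoded reply through a check like this before returning it to the user, in addition to a safety-oriented system message at generation time.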