---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---
# Model Card: SmolLM3-Chat-v1-adapter
This repository contains the LoRA (Low-Rank Adaptation) weights for SmolLM3-Chat-v1.
This adapter was trained to give the SmolLM3-3B-Base model a casual, witty, and "internet-native" personality. It moves away from robotic assistant responses in favor of a more human-like vibe.
## Related Models
- Merged Version (Float16): SmolLM3-Chat-v1
- Base Model: HuggingFaceTB/SmolLM3-3B-Base
## System Instructions (Important)
**Less is more.**

This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., "You are a helpful assistant who is polite, follows rules X, Y, Z...") will degrade output quality.

Recommended system prompt: none. Simply leave it empty for the rawest, most casual experience.
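In practice, "leave it empty" just means omitting the system turn entirely when you build the conversation. A small illustration (the message contents are made up for the example):

```python
# Preferred: no system turn at all; the adapter's trained personality fills that role
messages = [
    {"role": "user", "content": "hey what's up"},
]

# Not recommended: a heavy system prompt like this tends to flatten the personality
over_prompted = [
    {"role": "system", "content": "You are a helpful assistant who is polite, follows rules X, Y, Z..."},
    {"role": "user", "content": "hey what's up"},
]
```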
## Usage (4-Bit Loading)
This script demonstrates how to load the base model in 4-bit and attach the adapter.
```python
import torch
from threading import Thread

from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization config (4-bit NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load tokenizer and base model
# The tokenizer comes from the adapter repo so the chat template and special tokens match
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 4. Attach the adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define the conversation
messages = [
    {"role": "user", "content": "Haiiii"},
]

# 6. Apply the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & generation
streamer = TextIteratorStreamer(
    tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core vibe parameters
    temperature=0.8,
    top_p=0.85,
    # Stability parameters (prevent looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Generation runs in a background thread; the streamer yields text as it is produced
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```
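A note on the stability parameters: `no_repeat_ngram_size=3` forbids any token that would complete a trigram already present in the generated sequence. A toy sketch of that rule (not the `transformers` implementation, just the idea):

```python
def banned_next_tokens(tokens, n=3):
    """Return the set of token IDs that would complete an n-gram
    already present in `tokens` (the rule behind no_repeat_ngram_size)."""
    if len(tokens) < n:
        return set()
    prefix = tuple(tokens[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # any historical window starting with the current prefix bans its final token
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# After ... 1 2 3 1 2, emitting 3 would repeat the trigram (1, 2, 3)
print(banned_next_tokens([1, 2, 3, 1, 2]))  # {3}
```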
## Training Details
The model was trained for 2 epochs using TRL's `SFTTrainer`.
### Dataset
- OpenHermes-2.5 (5k subset): Logic and general helpfulness.
- Custom Dataset (15k): Casual chat, roleplay, and human-like interaction patterns.
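A rough sketch of what that setup looks like in TRL. This is illustrative, not the original training script: the `SFTConfig` arguments beyond the epoch count are assumptions, the custom 15k dataset is not public, and conversion of OpenHermes-2.5 into the trainer's expected chat format is omitted:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# A 5k subset of OpenHermes-2.5, as described above
dataset = load_dataset("teknium/OpenHermes-2.5", split="train[:5000]")
# (preprocessing the dataset into the trainer's chat format is omitted here)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",  # SFTTrainer can load from a model ID
    train_dataset=dataset,
    args=SFTConfig(
        num_train_epochs=2,                 # matches the 2 epochs reported above
        output_dir="smollm3-chat-v1",       # hypothetical output path
    ),
    # LoRA settings; see the Hyperparameters section for the full target-module list
    peft_config=LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05),
)
trainer.train()
```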
### Metrics
| Metric | Value |
|---|---|
| Final Loss | 1.41 |
| Final Token Accuracy | ~65.9% |
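For intuition, the final loss (mean per-token cross-entropy) corresponds to a perplexity of about 4.1:

```python
import math

final_loss = 1.41            # mean per-token cross-entropy from the table above
perplexity = math.exp(final_loss)
print(round(perplexity, 2))  # 4.1
```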
## Hyperparameters
- Rank (r): 32
- Alpha: 64
- Dropout: 0.05
- Target Modules: all linear layers plus the embeddings (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`)
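The hyperparameters above map onto a PEFT `LoraConfig` roughly like this. This is a sketch from the values listed, not the original config file; in particular, whether the original run adapted `embed_tokens`/`lm_head` via `target_modules` or saved them fully via `modules_to_save` is an assumption:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "embed_tokens", "lm_head",               # embeddings / output head
    ],
    task_type="CAUSAL_LM",
)
```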
Created with <3 by me