---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---
# Model Card: SmolLM3-Chat-v1-adapter
This repository contains the **LoRA (Low-Rank Adaptation)** weights for **SmolLM3-Chat-v1**.
This adapter was trained to give the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model a casual, witty, and "internet-native" personality. It moves away from robotic assistant responses in favor of a more human-like vibe.
## 🔗 Related Models
* **Merged Version (Float16):** [SmolLM3-Chat-v1](https://huggingface.co/igidn/SmolLM3-Chat-v1)
* **Base Model:** [HuggingFaceTB/SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)
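If you would rather produce your own merged checkpoint than download the one above, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. This is a minimal sketch, not the exact script used to build the merged repo; the output path is an example:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"

# Load the base model in float16 (merging needs unquantized weights)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID, torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

# Attach the LoRA adapter, then fold its deltas into the base weights
merged = PeftModel.from_pretrained(base, ADAPTER_ID).merge_and_unload()

# Save a standalone checkpoint (example path)
merged.save_pretrained("./SmolLM3-Chat-v1-merged")
tokenizer.save_pretrained("./SmolLM3-Chat-v1-merged")
```

The merged model can then be loaded with plain `AutoModelForCausalLM.from_pretrained` without PEFT installed.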
## ⚠️ System Instructions (Important)
**Less is more.**
This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will degrade the output quality.
**Recommended System Prompt:**
*(simply leave it empty for the rawest, most casual experience)*
## 💻 Usage (4-Bit Loading)
This script demonstrates how to load the base model in 4-bit and attach the adapter.
```python
import torch
from threading import Thread
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization config (4-bit NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load tokenizer and base model
# The tokenizer comes from the adapter repo so its special tokens
# and chat template match the fine-tune.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 4. Attach the LoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define the conversation (no system prompt, as recommended above)
messages = [
    {"role": "user", "content": "Haiiii"}
]

# 6. Apply the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & generation
streamer = TextIteratorStreamer(
    tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core vibe parameters
    temperature=0.8,
    top_p=0.85,
    # Stability parameters (prevent looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Run generation in a background thread and stream tokens as they arrive
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
print()
```
## 📊 Training Details
The model was trained for 2 epochs using `SFTTrainer`.
### Dataset
* **OpenHermes-2.5 (5k subset):** Logic and general helpfulness.
* **Custom Dataset (15k):** Casual chat, roleplay, and human-like interaction patterns.
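The training setup above can be sketched roughly as follows. This is a reconstruction under assumptions, not the actual training script: the custom 15k dataset is not public, so a commented-out placeholder path stands in for it, and only the epoch count is taken from this card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# 5k-example subset of OpenHermes-2.5 (seed is an assumption)
hermes = (
    load_dataset("teknium/OpenHermes-2.5", split="train")
    .shuffle(seed=42)
    .select(range(5_000))
)
# custom = load_dataset("path/to/custom-casual-chat", split="train")  # hypothetical, private
train_data = hermes  # in practice, concatenated with the custom 15k set

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",  # TRL loads the model from this ID
    train_dataset=train_data,
    args=SFTConfig(num_train_epochs=2, output_dir="./sft-out"),
)
trainer.train()
```

In a real run the LoRA configuration from the Hyperparameters section would be passed via `peft_config`.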
### Metrics
| Metric | Value |
| :--- | :--- |
| **Final Loss** | 1.41 |
| **Final Token Accuracy** | ~65.9% |
## 🛠️ Hyperparameters
* **Rank (r):** 32
* **Alpha:** 64
* **Dropout:** 0.05
* **Target Modules:** All linear layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`)
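The hyperparameters above correspond to a PEFT configuration along these lines (a reconstruction from the values listed here, not the exact training code):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "embed_tokens", "lm_head",               # embeddings / output head
    ],
    task_type="CAUSAL_LM",
)
```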
*Created with <3 by me*