---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---
|
|
|
|
|
|
|
|
# Model Card: SmolLM3-Chat-v1-adapter

This repository contains the **LoRA (Low-Rank Adaptation)** weights for **SmolLM3-Chat-v1**.

This adapter was trained to give the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model a casual, witty, and "internet-native" personality. It moves away from robotic assistant responses in favor of a more human-like vibe.
|
|
|
|
|
## 🔗 Related Models

* **Merged Version (float16):** [SmolLM3-Chat-v1](https://huggingface.co/igidn/SmolLM3-Chat-v1)
* **Base Model:** [HuggingFaceTB/SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)
|
|
|
|
|
## ⚠️ System Instructions (Important)

**Less is more.**

This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will degrade output quality.

**Recommended System Prompt:** *(simply leave it empty for the most raw, casual experience)*
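In practice that just means omitting the `system` message from the conversation entirely. A minimal sketch (the user text here is an illustrative placeholder):

```python
# No system message: let the fine-tuned personality come through on its own.
messages = [
    {"role": "user", "content": "yo what's up"},
]

# If you must steer it, keep the system prompt to a single short sentence
# rather than a long rule list.
steered = [{"role": "system", "content": "Keep replies short."}] + messages

# The unsteered conversation contains no system turn at all.
assert all(m["role"] != "system" for m in messages)
```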
|
|
|
|
|
## 💻 Usage (4-Bit Loading)

This script demonstrates how to load the base model in 4-bit and attach the adapter.
|
|
|
|
|
```python
import torch
from threading import Thread
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization config (4-bit NF4, float16 compute)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load base model. The tokenizer comes from the adapter repo so the
#    chat template and special tokens match what was used in training.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 4. Attach adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define conversation (no system prompt -- see the note above)
messages = [
    {"role": "user", "content": "Haiiii"},
]

# 6. Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & generation
streamer = TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core vibe parameters
    temperature=0.8,
    top_p=0.85,
    # Stability parameters (prevent looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Run generation on a background thread so tokens can be printed as they stream in
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
```
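The `no_repeat_ngram_size=3` setting above blocks any token that would complete a 3-gram already present in the output, which is what stops the model from looping. A minimal sketch of that check (illustrative only, not the actual `transformers` implementation):

```python
def creates_repeat_ngram(tokens, candidate, n=3):
    """Return True if appending `candidate` to `tokens` would produce
    an n-gram that already occurs earlier in the sequence."""
    seq = tokens + [candidate]
    if len(seq) < n:
        return False
    new_ngram = tuple(seq[-n:])
    # All n-grams that end strictly before the new one
    earlier = {tuple(seq[i:i + n]) for i in range(len(seq) - n)}
    return new_ngram in earlier

# [1, 2, 3, 1, 2] + 3 would repeat the 3-gram (1, 2, 3), so it is banned;
# any other continuation is allowed.
assert creates_repeat_ngram([1, 2, 3, 1, 2], 3) is True
assert creates_repeat_ngram([1, 2, 3, 1, 2], 4) is False
```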
|
|
|
|
|
## 📊 Training Details

The model was trained for 2 epochs using `SFTTrainer`.

### Dataset

* **OpenHermes-2.5 (5k subset):** logic and general helpfulness.
* **Custom dataset (15k):** casual chat, roleplay, and human-like interaction patterns.
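The exact mixing procedure isn't published here; a hypothetical sketch of how a fixed-size OpenHermes subset might be combined with the custom set before training (names and sizes below are toy stand-ins, not the real pipeline):

```python
import random

def mix_datasets(open_hermes, custom, hermes_k=5_000, seed=42):
    """Take a fixed-size slice of OpenHermes, concatenate it with the
    custom set, and shuffle deterministically. Illustrative only."""
    subset = open_hermes[:hermes_k]
    mixed = subset + custom
    random.Random(seed).shuffle(mixed)
    return mixed

# Toy stand-ins for the two sources
hermes = [{"src": "hermes", "i": i} for i in range(10)]
custom = [{"src": "custom", "i": i} for i in range(30)]
mixed = mix_datasets(hermes, custom, hermes_k=5)

assert len(mixed) == 35
assert sum(ex["src"] == "hermes" for ex in mixed) == 5
```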
|
|
|
|
|
### Metrics

| Metric | Value |
| :--- | :--- |
| **Final Loss** | 1.41 |
| **Final Token Accuracy** | ~65.9% |
|
|
|
|
|
## 🛠️ Hyperparameters

* **Rank (r):** 32
* **Alpha:** 64
* **Dropout:** 0.05
* **Target Modules:** all linear layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) plus the embedding and output head (`embed_tokens`, `lm_head`)
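As a sanity check on what these numbers mean: LoRA adds a down-projection and an up-projection per target layer, roughly `r * (d_in + d_out)` extra parameters each, and the adapter output is scaled by `alpha / r` at inference. A quick arithmetic sketch (the layer dimensions here are hypothetical, not SmolLM3's actual shapes):

```python
def lora_extra_params(d_in, d_out, r=32):
    """Parameters added by one LoRA pair: a (d_in x r) down-projection
    plus an (r x d_out) up-projection."""
    return r * d_in + r * d_out

# Hypothetical square attention projection
assert lora_extra_params(2048, 2048) == 131_072

# Effective scaling applied to the adapter output: alpha / r
scaling = 64 / 32
assert scaling == 2.0
```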
|
|
|
|
|
*Created with <3 by me* |