---
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
---
# Model Card: SmolLM3-Chat-v1
**SmolLM3-Chat-v1** is a fine-tune of the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model, designed to be casual, witty, and human-like. Unlike standard assistants that sound robotic and overly formal, this model captures a distinct "internet-native" vibe.
It was trained on a curated mix of high-quality instruction data and custom conversation logs to balance intelligence with personality.
> **Note:** This is the **full merged version**. If you are looking for the LoRA adapter, please check `SmolLM3-Chat-v1-adapter`.
## ⚠️ Important: System Instructions
**Less is more.**
This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will actually degrade the quality of the output.
For the system instruction, simply leave it empty for the most raw, casual experience.
**Ironically, less instruction = more human.**
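To make this concrete, here is the recommended (empty) setup next to the kind of over-prompting to avoid:
```python
# Recommended: no system message at all; the trained persona does the work.
messages = [
    {"role": "user", "content": "hellooooo"}
]

# Not recommended: rule-heavy system prompts like this flatten the casual
# tone back into generic assistant-speak.
# messages = [
#     {"role": "system", "content": "You are a helpful assistant who is polite, follows rules X, Y, Z..."},
#     {"role": "user", "content": "hellooooo"},
# ]
```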
## ⚠️ Quantization Warning
**Avoid re-quantizing this merged model.**
This model was trained using **QLoRA** (on a 4-bit base model) and then merged back to **Float16**. Compressing this merged model *again* (e.g., converting it to 4-bit GGUF, AWQ, or GPTQ) causes "double quantization" noise.
This often breaks the specific "vibe" of the model, leading to:
* Broken grammar or incoherent responses.
* Loss of the casual/witty personality.
* Looping issues.
**If you need a low-VRAM (4-bit) version:**
* **Do not** quantize this merged model.
* **Use the adapter instead:** [SmolLM3-Chat-v1-adapter](https://huggingface.co/igidn/SmolLM3-Chat-v1-adapter).
* Load the base `SmolLM3-3B-Base` in 4-bit and attach the adapter, as sketched below. This preserves the original training quality.
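A minimal sketch of that setup (assuming `bitsandbytes` and `peft` are installed; the NF4 quantization type and float16 compute dtype below are typical QLoRA defaults, not confirmed training settings):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"

# 4-bit config; NF4 + float16 compute are assumptions matching common QLoRA setups.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the LoRA weights
```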
## 💻 Usage
To get the best performance (and prevent repetition loops), you **must** use the specific generation configuration below.
```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
MODEL_ID = "igidn/SmolLM3-Chat-v1"
# 1. Load Model & Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
torch_dtype=torch.float16,
device_map="auto"
)
# 2. Define Conversation
messages = [
{"role": "user", "content": "hellooooo"}
]
# 3. Apply Chat Template
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
# 4. Streamer Setup
streamer = TextIteratorStreamer(
tokenizer,
timeout=10.0,
skip_prompt=True,
skip_special_tokens=True
)
# 5. Generation Configuration (CRITICAL)
generate_kwargs = dict(
**inputs,
streamer=streamer,
max_new_tokens=512,
do_sample=True,
# Core Parameters for "Vibe"
temperature=0.8, # High creativity
top_p=0.85, # Nuanced sampling
# Stability Parameters
repetition_penalty=1.15, # Prevents "I'll be gone in 5 mins" loops
no_repeat_ngram_size=3, # Hard block on repetitive phrases
pad_token_id=tokenizer.eos_token_id
)
# 6. Run Inference
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()
print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()  # make sure the generation thread has finished before exiting
print()
```
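If you don't need streaming, the same sampling settings work with a plain `model.generate` call. A minimal sketch reusing `model`, `tokenizer`, and `inputs` from the script above:
```python
# Non-streaming variant with the same anti-repetition settings.
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    top_p=0.85,
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Strip the prompt tokens so only the reply is decoded.
reply = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print("Assistant:", reply)
```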
## 📊 Training Details
The model was trained for 2 epochs using `SFTTrainer` with a cosine learning rate scheduler.
### Dataset Composition
* **OpenHermes-2.5 (5k subset):** Provides logic, reasoning, and general helpfulness.
* **Custom Dataset (15k):** Focused on casual chat, roleplay, and human-like interaction patterns.
* **Total:** 20,000 examples.
### Training Metrics
The model showed steady convergence without catastrophic overfitting. The final loss indicates a strong grasp of the training data without losing generalization capabilities.
| Metric | Start | End |
| :--- | :--- | :--- |
| **Loss** | 2.47 | 1.41 |
| **Token Accuracy** | 53.3% | 65.9% |
| **Epochs** | 0 | 2.0 |
**Loss Curve:**
* **Epoch 0.2:** Loss 1.59 (Rapid initial learning)
* **Epoch 1.0:** Loss 1.65 (Transition point)
* **Epoch 2.0:** Loss 1.41 (Final convergence)
## 🛠️ Hyperparameters
* **Base Model:** HuggingFaceTB/SmolLM3-3B-Base
* **Precision:** Float16 (Training) / Float16 (Inference)
* **LoRA Config:** r=32, alpha=64
* **Learning Rate:** 3e-5 (Cosine Schedule)
* **Optimizer:** paged_adamw_32bit
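For reference, an illustrative reconstruction of the training setup from the values above, using TRL's `SFTTrainer`. The dataset formatting, LoRA target modules, batch size, and `output_dir` are assumptions, the 15k custom set is not public, and the sketch assumes the tokenizer provides a chat template:
```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Only the public half of the 20k mix; the 15k custom logs are not released.
raw = load_dataset("teknium/OpenHermes-2.5", split="train[:5000]")

# Map OpenHermes "conversations" into the "messages" schema SFTTrainer expects.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}
dataset = raw.map(
    lambda ex: {"messages": [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in ex["conversations"]
    ]},
    remove_columns=raw.column_names,
)

peft_config = LoraConfig(r=32, lora_alpha=64, task_type="CAUSAL_LM")

training_args = SFTConfig(
    output_dir="smollm3-chat-v1",   # hypothetical
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```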
---
*Created with <3 by me*