|
|
---
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
---
|
|
|
|
|
# Model Card: SmolLM3-Chat-v1 |
|
|
|
|
|
**SmolLM3-Chat-v1** is a finetune of the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model, designed to be casual, witty, and human-like. Unlike standard assistants that sound robotic and overly formal, this model captures a distinct "internet-native" vibe. |
|
|
|
|
|
It was trained on a curated mix of high-quality instruction data and custom conversation logs to balance intelligence with personality. |
|
|
|
|
|
> **Note:** This is the **full merged version**. If you are looking for the LoRA adapter, please check `SmolLM3-Chat-v1-adapter`. |
|
|
|
|
|
## ⚠️ Important: System Instructions |
|
|
**Less is more.** |
|
|
|
|
|
This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will actually degrade the quality of the output. |
|
|
|
|
|
For the system instruction, simply leave it empty for the rawest, most casual experience.
|
|
|
|
|
**Ironically, less instruction = more human.** |
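As a concrete illustration, here is the recommended minimal message list next to the kind of over-specified setup that tends to flatten the personality (the prompt strings are just examples):

```python
# Recommended: no system message at all.
# The model falls back on the "vibe" it learned during training.
messages = [
    {"role": "user", "content": "hellooooo"},
]

# Not recommended: a long, rule-laden system prompt degrades output quality.
over_prompted = [
    {"role": "system", "content": "You are a helpful assistant who is polite, follows rules X, Y, Z..."},
    {"role": "user", "content": "hellooooo"},
]
```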
|
|
|
|
|
|
|
|
## ⚠️ Quantization Warning |
|
|
|
|
|
**Avoid re-quantizing this merged model.** |
|
|
|
|
|
This model was trained using **QLoRA** (on a 4-bit base model) and then merged back to **Float16**. Compressing this merged model *again* (e.g., converting it to 4-bit GGUF, AWQ, or GPTQ) causes "double quantization" noise. |
|
|
|
|
|
This often breaks the specific "vibe" of the model, leading to: |
|
|
* Broken grammar or incoherent responses. |
|
|
* Loss of the casual/witty personality. |
|
|
* Looping issues. |
|
|
|
|
|
**If you need a low-VRAM (4-bit) version:** |
|
|
* ❌ **Do not** quantize this merged model. |
|
|
* ✅ **Use the Adapter instead:** [SmolLM3-Chat-v1-adapter](https://huggingface.co/igidn/SmolLM3-Chat-v1-adapter). |
|
|
* Load the base `SmolLM3-3B-Base` in 4-bit and attach the adapter. This preserves the original training quality.
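A minimal sketch of the adapter route, assuming `peft` and `bitsandbytes` are installed (the NF4 / fp16-compute settings mirror a typical QLoRA setup, not a published config):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"

# Quantize only the *base* weights to 4-bit, matching the QLoRA training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base.
# The adapter weights stay in full precision, so there is no double quantization.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
```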
|
|
|
|
|
## 💻 Usage |
|
|
|
|
|
To get the best performance (and prevent repetition loops), you **must** use the specific generation configuration below. |
|
|
|
|
|
```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "igidn/SmolLM3-Chat-v1"

# 1. Load Model & Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Define Conversation
messages = [
    {"role": "user", "content": "hellooooo"}
]

# 3. Apply Chat Template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 4. Streamer Setup
streamer = TextIteratorStreamer(
    tokenizer,
    timeout=10.0,
    skip_prompt=True,
    skip_special_tokens=True
)

# 5. Generation Configuration (CRITICAL)
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,

    # Core Parameters for "Vibe"
    temperature=0.8,          # High creativity
    top_p=0.85,               # Nuanced sampling

    # Stability Parameters
    repetition_penalty=1.15,  # Prevents "I'll be gone in 5 mins" loops
    no_repeat_ngram_size=3,   # Hard block on repetitive phrases

    pad_token_id=tokenizer.eos_token_id
)

# 6. Run Inference
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
```
|
|
|
|
|
## 📊 Training Details |
|
|
|
|
|
The model was trained for 2 epochs using `SFTTrainer` with a cosine learning rate scheduler. |
|
|
|
|
|
### Dataset Composition |
|
|
* **OpenHermes-2.5 (5k subset):** Provides logic, reasoning, and general helpfulness. |
|
|
* **Custom Dataset (15k):** Focused on casual chat, roleplay, and human-like interaction patterns. |
|
|
* **Total:** 20,000 examples. |
|
|
|
|
|
### Training metrics |
|
|
The model showed steady convergence without catastrophic overfitting. The final loss indicates a strong grasp of the training data without losing generalization capabilities. |
|
|
|
|
|
| Metric | Start | End |
| :--- | :--- | :--- |
| **Loss** | 2.47 | 1.41 |
| **Token Accuracy** | 53.3% | 65.9% |
| **Epochs** | 0 | 2.0 |
|
|
|
|
|
**Loss Curve:** |
|
|
* **Epoch 0.2:** Loss 1.59 (Rapid initial learning) |
|
|
* **Epoch 1.0:** Loss 1.65 (Transition point) |
|
|
* **Epoch 2.0:** Loss 1.41 (Final convergence) |
|
|
|
|
|
## 🛠️ Hyperparameters |
|
|
* **Base Model:** HuggingFaceTB/SmolLM3-3B-Base |
|
|
* **Precision:** QLoRA (4-bit base) during training / Float16 for the merged model
|
|
* **LoRA Config:** r=32, alpha=64 |
|
|
* **Learning Rate:** 3e-5 (Cosine Schedule) |
|
|
* **Optimizer:** paged_adamw_32bit |
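For reference, the hyperparameters above roughly correspond to a PEFT/Transformers setup like the sketch below. The exact trainer arguments were not published, so names such as `output_dir` are illustrative:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA configuration matching the card: r=32, alpha=64
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    task_type="CAUSAL_LM",
)

# Trainer arguments matching the card: 2 epochs, cosine schedule,
# learning rate 3e-5, paged 32-bit AdamW, fp16 training
training_args = TrainingArguments(
    output_dir="smollm3-chat-v1",  # illustrative
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
    fp16=True,
)
```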
|
|
|
|
|
--- |
|
|
*Created with <3 by me* |