Model Card: SmolLM3-Chat-v1

SmolLM3-Chat-v1 is a finetune of the SmolLM3-3B-Base model, designed to be casual, witty, and human-like. Unlike standard assistants that sound robotic and overly formal, this model captures a distinct "internet-native" vibe.

It was trained on a curated mix of high-quality instruction data and custom conversation logs to balance intelligence with personality.

Note: This is the full merged version. If you are looking for the LoRA adapter, please check SmolLM3-Chat-v1-adapter.

⚠️ Important: System Instructions

Less is more.

This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., "You are a helpful assistant who is polite, follows rules X, Y, Z...") will actually degrade the quality of the output.

For the system instruction, simply leave it empty for the most raw, casual experience.

Ironically, less instruction = more human.
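For example, this is all the prompt the model needs (a minimal sketch; the user message is just a placeholder):

# Recommended: no system message at all; the trained persona carries the tone.
messages = [
    {"role": "user", "content": "yo what's up"}
]

# Avoid: stacking rules into a system prompt ("You are a helpful assistant
# who is polite, follows rules X, Y, Z...") flattens the persona.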

⚠️ Quantization Warning

Avoid re-quantizing this merged model.

This model was trained using QLoRA (on a 4-bit base model) and then merged back to Float16. Compressing this merged model again (e.g., converting it to 4-bit GGUF, AWQ, or GPTQ) introduces "double quantization" noise, since the merged weights already carry the error from the first 4-bit round trip.

This often breaks the specific "vibe" of the model, leading to:

  • Broken grammar or incoherent responses.
  • Loss of the casual/witty personality.
  • Looping issues.

If you need a low-VRAM (4-bit) version:

  • ❌ Do not quantize this merged model.
  • ✅ Use the Adapter instead: SmolLM3-Chat-v1-adapter.
    • Load the base SmolLM3-3B in 4-bit and attach the adapter. This preserves the original training quality (see the sketch below).
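A minimal sketch of that path, assuming the adapter repo is igidn/SmolLM3-Chat-v1-adapter and standard NF4 QLoRA load settings (both are assumptions; check the adapter card for the exact config):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"  # assumed repo id

# 4-bit NF4 loading (typical QLoRA setup; assumed to match training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base
model = PeftModel.from_pretrained(base, ADAPTER_ID)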

💻 Usage

To get the best performance (and prevent repetition loops), you must use the specific generation configuration below.

import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "igidn/SmolLM3-Chat-v1"

# 1. Load Model & Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Define Conversation
messages = [
    {"role": "user", "content": "hellooooo"}
]

# 3. Apply Chat Template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 4. Streamer Setup
streamer = TextIteratorStreamer(
    tokenizer, 
    timeout=10.0, 
    skip_prompt=True, 
    skip_special_tokens=True
)

# 5. Generation Configuration (CRITICAL)
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    
    # Core Parameters for "Vibe"
    temperature=0.8,        # High creativity
    top_p=0.85,             # Nuanced sampling
    
    # Stability Parameters
    repetition_penalty=1.15, # Prevents "I'll be gone in 5 mins" loops
    no_repeat_ngram_size=3,  # Hard block on repetitive phrases
    
    pad_token_id=tokenizer.eos_token_id
)

# 6. Run Inference
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
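A note on the setup: TextIteratorStreamer simply lets generate run on a background thread while tokens are printed as they arrive. If you do not need streaming, the same generate_kwargs (minus streamer) work with a plain, blocking model.generate() call.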

📊 Training Details

The model was trained for 2 epochs using SFTTrainer with a cosine learning rate scheduler.

Dataset Composition

  • OpenHermes-2.5 (5k subset): Provides logic, reasoning, and general helpfulness.
  • Custom Dataset (15k): Focused on casual chat, roleplay, and human-like interaction patterns.
  • Total: 20,000 examples.
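A rough sketch of how such a mix could be assembled with the datasets library (the teknium/OpenHermes-2.5 repo id and the custom_chat.jsonl file are assumptions for illustration; the custom 15k set is not published):

from datasets import load_dataset, concatenate_datasets

# Public reasoning/helpfulness data: random 5k subset (repo id assumed)
hermes = (
    load_dataset("teknium/OpenHermes-2.5", split="train")
    .shuffle(seed=42)
    .select(range(5_000))
)

# Hypothetical local file standing in for the private 15k casual-chat set
custom = load_dataset("json", data_files="custom_chat.jsonl", split="train")

# Assumes both sets share the same chat-format columns
train_dataset = concatenate_datasets([hermes, custom]).shuffle(seed=42)
print(len(train_dataset))  # 20,000 examples in this card's mix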

Training Metrics

The model showed steady convergence without catastrophic overfitting. The final loss indicates a strong grasp of the training data without losing generalization capabilities.

| Metric         | Start | End   |
|----------------|-------|-------|
| Loss           | 2.47  | 1.41  |
| Token Accuracy | 53.3% | 65.9% |
| Epochs         | 0     | 2.0   |

Loss Curve:

  • Epoch 0.2: Loss 1.59 (Rapid initial learning)
  • Epoch 1.0: Loss 1.65 (Transition point)
  • Epoch 2.0: Loss 1.41 (Final convergence)

πŸ› οΈ Hyperparameters

  • Base Model: HuggingFaceTB/SmolLM3-3B-Base
  • Precision: Float16 (Training) / Float16 (Inference)
  • LoRA Config: r=32, alpha=64
  • Learning Rate: 3e-5 (Cosine Schedule)
  • Optimizer: paged_adamw_32bit
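Putting those pieces together, a hedged sketch of the training setup with TRL's SFTTrainer (batch size, warmup, and LoRA target modules are not stated on this card, so they are omitted or left to library defaults):

from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# LoRA settings from this card; target_modules left to PEFT defaults
peft_config = LoraConfig(r=32, lora_alpha=64, task_type="CAUSAL_LM")

training_args = SFTConfig(
    output_dir="smollm3-chat-v1",   # hypothetical
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
    fp16=True,
)

trainer = SFTTrainer(
    model=base_model,               # 4-bit SmolLM3-3B-Base, as in the QLoRA sketch above
    train_dataset=train_dataset,    # the 20k mix sketched earlier
    args=training_args,
    peft_config=peft_config,
)
trainer.train()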

Created with <3 by me
