Model Card: SmolLM3-Chat-v1

SmolLM3-Chat-v1 is a finetune of the SmolLM3-3B-Base model, designed to be casual, witty, and human-like. Unlike standard assistants that sound robotic and overly formal, this model captures a distinct "internet-native" vibe.

It was trained on a curated mix of high-quality instruction data and custom conversation logs to balance intelligence with personality.

Note: This is the full merged version. If you are looking for the LoRA adapter, please check SmolLM3-Chat-v1-adapter.

⚠️ Important: System Instructions

Less is more.

This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., "You are a helpful assistant who is polite, follows rules X, Y, Z...") will actually degrade the quality of the output.

For the system instruction, simply leave it empty for the most raw, casual experience.

Ironically, less instruction = more human.
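For example, this is all the prompt the model needs (a minimal sketch; the user message is just a placeholder):

# Recommended: no system message at all; the trained persona carries the tone.
messages = [
    {"role": "user", "content": "yo what's up"}
]

# Avoid: stacking rules into a system prompt ("You are a helpful assistant
# who is polite, follows rules X, Y, Z...") flattens the persona.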

⚠️ Quantization Warning

Avoid re-quantizing this merged model.

This model was trained using QLoRA (on a 4-bit base model) and then merged back to Float16. Compressing this merged model again (e.g., converting it to 4-bit GGUF, AWQ, or GPTQ) introduces "double quantization" noise, since the merged weights already carry the error from the first 4-bit round trip.

This often breaks the specific "vibe" of the model, leading to:

  • Broken grammar or incoherent responses.
  • Loss of the casual/witty personality.
  • Looping issues.

If you need a low-VRAM (4-bit) version:

  • ❌ Do not quantize this merged model.
  • ✅ Use the Adapter instead: SmolLM3-Chat-v1-adapter.
    • Load the base SmolLM3-3B in 4-bit and attach the adapter. This preserves the original training quality (see the sketch below).
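A minimal sketch of that path, assuming the adapter repo is igidn/SmolLM3-Chat-v1-adapter and standard NF4 QLoRA load settings (both are assumptions; check the adapter card for the exact config):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"  # assumed repo id

# 4-bit NF4 loading (typical QLoRA setup; assumed to match training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base
model = PeftModel.from_pretrained(base, ADAPTER_ID)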

💻 Usage

To get the best performance (and prevent repetition loops), you must use the specific generation configuration below.

import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "igidn/SmolLM3-Chat-v1"

# 1. Load Model & Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Define Conversation
messages = [
    {"role": "user", "content": "hellooooo"}
]

# 3. Apply Chat Template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 4. Streamer Setup
streamer = TextIteratorStreamer(
    tokenizer, 
    timeout=10.0, 
    skip_prompt=True, 
    skip_special_tokens=True
)

# 5. Generation Configuration (CRITICAL)
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    
    # Core Parameters for "Vibe"
    temperature=0.8,        # High creativity
    top_p=0.85,             # Nuanced sampling
    
    # Stability Parameters
    repetition_penalty=1.15, # Prevents "I'll be gone in 5 mins" loops
    no_repeat_ngram_size=3,  # Hard block on repetitive phrases
    
    pad_token_id=tokenizer.eos_token_id
)

# 6. Run Inference
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
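A note on the setup: TextIteratorStreamer simply lets generate run on a background thread while tokens are printed as they arrive. If you do not need streaming, the same generate_kwargs (minus streamer) work with a plain, blocking model.generate() call.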

📊 Training Details

The model was trained for 2 epochs using SFTTrainer with a cosine learning rate scheduler.

Dataset Composition

  • OpenHermes-2.5 (5k subset): Provides logic, reasoning, and general helpfulness.
  • Custom Dataset (15k): Focused on casual chat, roleplay, and human-like interaction patterns.
  • Total: 20,000 examples.
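A rough sketch of how such a mix could be assembled with the datasets library (the teknium/OpenHermes-2.5 repo id and the custom_chat.jsonl file are assumptions for illustration; the custom 15k set is not published):

from datasets import load_dataset, concatenate_datasets

# Public reasoning/helpfulness data: random 5k subset (repo id assumed)
hermes = (
    load_dataset("teknium/OpenHermes-2.5", split="train")
    .shuffle(seed=42)
    .select(range(5_000))
)

# Hypothetical local file standing in for the private 15k casual-chat set
custom = load_dataset("json", data_files="custom_chat.jsonl", split="train")

# Assumes both sets share the same chat-format columns
train_dataset = concatenate_datasets([hermes, custom]).shuffle(seed=42)
print(len(train_dataset))  # 20,000 examples in this card's mix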

Training Metrics

The model showed steady convergence without catastrophic overfitting. The final loss indicates a strong grasp of the training data without losing generalization capabilities.

| Metric         | Start | End   |
|----------------|-------|-------|
| Loss           | 2.47  | 1.41  |
| Token Accuracy | 53.3% | 65.9% |
| Epochs         | 0     | 2.0   |

Loss Curve:

  • Epoch 0.2: Loss 1.59 (Rapid initial learning)
  • Epoch 1.0: Loss 1.65 (Transition point)
  • Epoch 2.0: Loss 1.41 (Final convergence)

πŸ› οΈ Hyperparameters

  • Base Model: HuggingFaceTB/SmolLM3-3B-Base
  • Precision: Float16 (Training) / Float16 (Inference)
  • LoRA Config: r=32, alpha=64
  • Learning Rate: 3e-5 (Cosine Schedule)
  • Optimizer: paged_adamw_32bit
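Putting those pieces together, a hedged sketch of the training setup with TRL's SFTTrainer (batch size, warmup, and LoRA target modules are not stated on this card, so they are omitted or left to library defaults):

from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# LoRA settings from this card; target_modules left to PEFT defaults
peft_config = LoraConfig(r=32, lora_alpha=64, task_type="CAUSAL_LM")

training_args = SFTConfig(
    output_dir="smollm3-chat-v1",   # hypothetical
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
    fp16=True,
)

trainer = SFTTrainer(
    model=base_model,               # 4-bit SmolLM3-3B-Base, as in the QLoRA sketch above
    train_dataset=train_dataset,    # the 20k mix sketched earlier
    args=training_args,
    peft_config=peft_config,
)
trainer.train()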

Created with <3 by me
