---
license: mit
datasets:
  - teknium/OpenHermes-2.5
language:
  - en
base_model:
  - HuggingFaceTB/SmolLM3-3B-Base
---

# Model Card: SmolLM3-Chat-v1

**SmolLM3-Chat-v1** is a finetune of the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model, designed to be casual, witty, and human-like. Unlike standard assistants that sound robotic and overly formal, this model captures a distinct "internet-native" vibe. It was trained on a curated mix of high-quality instruction data and custom conversation logs to balance intelligence with personality.

> **Note:** This is the **full merged version**. If you are looking for the LoRA adapter, please check `SmolLM3-Chat-v1-adapter`.

## ⚠️ Important: System Instructions

**Less is more.** This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will actually degrade the quality of the output.

For the system instruction, simply leave it empty; this gives the rawest, most casual experience.

**Ironically, less instruction = more human.**

## ⚠️ Quantization Warning

**Avoid re-quantizing this merged model.**

This model was trained using **QLoRA** (on a 4-bit base model) and then merged back to **Float16**. Compressing this merged model *again* (e.g., converting it to 4-bit GGUF, AWQ, or GPTQ) causes "double quantization" noise. This often breaks the specific "vibe" of the model, leading to:

* Broken grammar or incoherent responses.
* Loss of the casual/witty personality.
* Looping issues.

**If you need a low-VRAM (4-bit) version:**

* ❌ **Do not** quantize this merged model.
* ✅ **Use the Adapter instead:** [SmolLM3-Chat-v1-adapter](https://huggingface.co/igidn/SmolLM3-Chat-v1-adapter).
* Load the base `SmolLM3-3B-Base` in 4-bit and attach the adapter (see the loading sketch at the end of this card). This preserves the original training quality.

## 💻 Usage

To get the best performance (and prevent repetition loops), you **must** use the specific generation configuration below.

```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "igidn/SmolLM3-Chat-v1"

# 1. Load Model & Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Define Conversation
messages = [
    {"role": "user", "content": "hellooooo"}
]

# 3. Apply Chat Template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 4. Streamer Setup
streamer = TextIteratorStreamer(
    tokenizer,
    timeout=10.0,
    skip_prompt=True,
    skip_special_tokens=True
)

# 5. Generation Configuration (CRITICAL)
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core Parameters for "Vibe"
    temperature=0.8,            # High creativity
    top_p=0.85,                 # Nuanced sampling
    # Stability Parameters
    repetition_penalty=1.15,    # Prevents "I'll be gone in 5 mins" loops
    no_repeat_ngram_size=3,     # Hard block on repetitive phrases
    pad_token_id=tokenizer.eos_token_id
)

# 6. Run Inference
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
```

## 📊 Training Details

The model was trained for 2 epochs using `SFTTrainer` with a cosine learning rate scheduler; a minimal sketch of the setup is shown below.
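For reference, this is a rough, hedged sketch of what such a QLoRA + `SFTTrainer` run could look like, assuming a recent `trl`/`peft`/`bitsandbytes` stack. The inline dataset is only a placeholder standing in for the actual 20k-example mix described below, and argument names (e.g. `processing_class`) vary between `trl` versions.

```python
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# QLoRA: load the base model in 4-bit NF4 with a float16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA settings matching the hyperparameters listed further down
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    task_type="CAUSAL_LM",
)

# Placeholder: stands in for the 20k-example mix described under "Dataset Composition"
train_dataset = Dataset.from_list(
    [{"text": "User: hellooooo\nAssistant: heyyy, what's up?"}]
)

training_args = SFTConfig(
    output_dir="smollm3-chat-v1",
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    processing_class=tokenizer,
)
trainer.train()

# The resulting adapter can then be merged back into a float16 base to produce this repo's weights.
```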
### Dataset Composition

* **OpenHermes-2.5 (5k subset):** Provides logic, reasoning, and general helpfulness.
* **Custom Dataset (15k):** Focused on casual chat, roleplay, and human-like interaction patterns.
* **Total:** 20,000 examples.

### Training Metrics

The model showed steady convergence without catastrophic overfitting. The final loss indicates a strong grasp of the training data without losing generalization capabilities.

| Metric | Start | End |
| :--- | :--- | :--- |
| **Loss** | 2.47 | 1.41 |
| **Token Accuracy** | 53.3% | 65.9% |
| **Epochs** | 0 | 2.0 |

**Loss Curve:**

* **Epoch 0.2:** Loss 1.59 (Rapid initial learning)
* **Epoch 1.0:** Loss 1.65 (Transition point)
* **Epoch 2.0:** Loss 1.41 (Final convergence)

## 🛠️ Hyperparameters

* **Base Model:** HuggingFaceTB/SmolLM3-3B-Base
* **Precision:** Float16 (Training) / Float16 (Inference)
* **LoRA Config:** r=32, alpha=64
* **Learning Rate:** 3e-5 (Cosine Schedule)
* **Optimizer:** paged_adamw_32bit

---

*Created with <3 by me*
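As a footnote to the Quantization Warning above, here is a minimal, hedged sketch of the low-VRAM route (4-bit base + adapter), assuming `peft` and `bitsandbytes` are installed. The generation settings from the Usage section still apply once the model is loaded this way.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"

# Load the base model in 4-bit (NF4) with a float16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Tokenizer from the base repo; check the adapter repo for tokenizer/chat-template files
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base (no re-quantization of merged weights)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()
```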