---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---


# Model Card: SmolLM3-Chat-v1-adapter

This repository contains the **LoRA (Low-Rank Adaptation)** weights for **SmolLM3-Chat-v1**.

This adapter was trained to give the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model a casual, witty, and "internet-native" personality. It moves away from robotic assistant responses in favor of a more human-like vibe.

## 🔗 Related Models
*   **Merged Version (Float16):** [SmolLM3-Chat-v1](https://huggingface.co/igidn/SmolLM3-Chat-v1)
*   **Base Model:** [HuggingFaceTB/SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)
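
If you only need inference and don't want to deal with PEFT, the merged float16 checkpoint above loads like any ordinary `transformers` model. A minimal sketch (no quantization; see the usage section below for the full 4-bit adapter workflow):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The merged checkpoint already contains the adapter weights,
# so no PeftModel attachment step is required.
MERGED_ID = "igidn/SmolLM3-Chat-v1"

tokenizer = AutoTokenizer.from_pretrained(MERGED_ID)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
```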

## ⚠️ System Instructions (Important)
**Less is more.**

This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will degrade the output quality.

**Recommended System Prompt:** none. Simply leave it empty for the most raw, casual experience.
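
In code, that just means passing only the conversation turns with no system message; a short sketch (the one-line system prompt below is a hypothetical example, not a trained-in prompt):

```python
# Recommended: no system turn at all; the adapter's trained "vibe" takes its place.
messages = [
    {"role": "user", "content": "Haiiii"},
]

# If you must steer it, keep the system prompt to one short sentence;
# long rule lists degrade output quality.
messages_with_system = [
    {"role": "system", "content": "Keep it casual."},  # hypothetical example
    {"role": "user", "content": "Haiiii"},
]
```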

## 💻 Usage (4-Bit Loading)

This script demonstrates how to load the base model in 4-bit and attach the adapter.

```python
import torch
from threading import Thread
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer, 
    BitsAndBytesConfig, 
    TextIteratorStreamer
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization Config (4-bit)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load Base Model
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID) # Load tokenizer from adapter to get special tokens
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# 4. Attach Adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define Conversation
messages = [
    {"role": "user", "content": "Haiiii"}
]

# 6. Apply Chat Template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & Generation
streamer = TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    
    # Core Vibe Parameters
    temperature=0.8,
    top_p=0.85,
    
    # Stability Parameters (Prevents looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    
    pad_token_id=tokenizer.eos_token_id
)

thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
```
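
For deployment you may prefer to bake the adapter into the base weights once and skip PEFT at inference time. A hedged sketch using PEFT's `merge_and_unload` (the base is loaded in float16 here because merging into 4-bit quantized weights is lossy at best; the output path is an arbitrary example):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model unquantized so the LoRA deltas can be folded in cleanly.
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B-Base",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "igidn/SmolLM3-Chat-v1-adapter")

# Fold the adapter into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()

# Save model + tokenizer together; "./SmolLM3-Chat-v1-merged" is an example path.
merged.save_pretrained("./SmolLM3-Chat-v1-merged")
tokenizer = AutoTokenizer.from_pretrained("igidn/SmolLM3-Chat-v1-adapter")
tokenizer.save_pretrained("./SmolLM3-Chat-v1-merged")
```

This is essentially how a merged checkpoint like [SmolLM3-Chat-v1](https://huggingface.co/igidn/SmolLM3-Chat-v1) can be produced, though the exact export steps used for that repo aren't documented here.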

## 📊 Training Details

The model was trained for 2 epochs with TRL's `SFTTrainer`.

### Dataset
*   **OpenHermes-2.5 (5k subset):** Logic and general helpfulness.
*   **Custom Dataset (15k):** Casual chat, roleplay, and human-like interaction patterns.
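
A hedged sketch of how a mixture like this might be assembled with the `datasets` library; the custom file name is hypothetical, and the real preprocessing isn't documented here:

```python
from datasets import load_dataset, concatenate_datasets

# 5k-example subset of OpenHermes-2.5 for logic and general helpfulness.
hermes = load_dataset("teknium/OpenHermes-2.5", split="train")
hermes = hermes.shuffle(seed=42).select(range(5_000))

# Hypothetical placeholder for the 15k-example custom casual-chat dataset.
custom = load_dataset("json", data_files="custom_chat_15k.jsonl", split="train")

# Note: both sets must first be mapped to a shared schema
# (e.g. a "messages" column) before they can be concatenated.
train_dataset = concatenate_datasets([hermes, custom]).shuffle(seed=42)
```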

### Metrics
| Metric | Value |
| :--- | :--- |
| **Final Loss** | 1.41 |
| **Final Token Accuracy** | ~65.9% |

## 🛠️ Hyperparameters
*   **Rank (r):** 32
*   **Alpha:** 64
*   **Dropout:** 0.05
*   **Target Modules:** All linear projections plus the embedding and output head (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`)
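
In PEFT terms, these settings correspond roughly to the `LoraConfig` below; `task_type` and anything else not listed above are assumptions rather than confirmed training values:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,              # rank
    lora_alpha=64,     # alpha
    lora_dropout=0.05, # dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",  # assumed; the standard choice for SFT on causal LMs
)
```

Passing such a config to `SFTTrainer` via its `peft_config` argument is the usual TRL pattern.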

*Created with <3 by me*