---
license: mit
datasets:
  - teknium/OpenHermes-2.5
language:
  - en
base_model:
  - HuggingFaceTB/SmolLM3-3B-Base
---

# Model Card: SmolLM3-Chat-v1

**SmolLM3-Chat-v1** is a finetune of the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model, designed to be casual, witty, and human-like. Unlike standard assistants that sound robotic and overly formal, this model captures a distinct "internet-native" vibe. It was trained on a curated mix of high-quality instruction data and custom conversation logs to balance intelligence with personality.

> **Note:** This is the **full merged version**. If you are looking for the LoRA adapter, please check `SmolLM3-Chat-v1-adapter`.

## ⚠️ Important: System Instructions

**Less is more.** This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will actually degrade the quality of the output.

For the system instruction, simply leave it empty; this gives the rawest, most casual experience.

**Ironically, less instruction = more human.**

## ⚠️ Quantization Warning

**Avoid re-quantizing this merged model.**

This model was trained using **QLoRA** (on a 4-bit base model) and then merged back to **Float16**. Compressing this merged model *again* (e.g., converting it to 4-bit GGUF, AWQ, or GPTQ) causes "double quantization" noise. This often breaks the specific "vibe" of the model, leading to:

* Broken grammar or incoherent responses.
* Loss of the casual/witty personality.
* Looping issues.

**If you need a low-VRAM (4-bit) version:**

* ❌ **Do not** quantize this merged model.
* ✅ **Use the Adapter instead:** [SmolLM3-Chat-v1-adapter](https://huggingface.co/igidn/SmolLM3-Chat-v1-adapter).
* Load the base `SmolLM3-3B-Base` in 4-bit and attach the adapter (see the loading sketch at the end of this card). This preserves the original training quality.

## 💻 Usage

To get the best performance (and prevent repetition loops), you **must** use the specific generation configuration below.

```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "igidn/SmolLM3-Chat-v1"

# 1. Load Model & Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Define Conversation
messages = [
    {"role": "user", "content": "hellooooo"}
]

# 3. Apply Chat Template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 4. Streamer Setup
streamer = TextIteratorStreamer(
    tokenizer,
    timeout=10.0,
    skip_prompt=True,
    skip_special_tokens=True
)

# 5. Generation Configuration (CRITICAL)
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core Parameters for "Vibe"
    temperature=0.8,            # High creativity
    top_p=0.85,                 # Nuanced sampling
    # Stability Parameters
    repetition_penalty=1.15,    # Prevents "I'll be gone in 5 mins" loops
    no_repeat_ngram_size=3,     # Hard block on repetitive phrases
    pad_token_id=tokenizer.eos_token_id
)

# 6. Run Inference
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
```

## 📊 Training Details

The model was trained for 2 epochs using `SFTTrainer` with a cosine learning rate scheduler; a minimal sketch of the setup is shown below.
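For reference, this is a rough, hedged sketch of what such a QLoRA + `SFTTrainer` run could look like, assuming a recent `trl`/`peft`/`bitsandbytes` stack. The inline dataset is only a placeholder standing in for the actual 20k-example mix described below, and argument names (e.g. `processing_class`) vary between `trl` versions.

```python
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# QLoRA: load the base model in 4-bit NF4 with a float16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA settings matching the hyperparameters listed further down
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    task_type="CAUSAL_LM",
)

# Placeholder: stands in for the 20k-example mix described under "Dataset Composition"
train_dataset = Dataset.from_list(
    [{"text": "User: hellooooo\nAssistant: heyyy, what's up?"}]
)

training_args = SFTConfig(
    output_dir="smollm3-chat-v1",
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    processing_class=tokenizer,
)
trainer.train()

# The resulting adapter can then be merged back into a float16 base to produce this repo's weights.
```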
### Dataset Composition

* **OpenHermes-2.5 (5k subset):** Provides logic, reasoning, and general helpfulness.
* **Custom Dataset (15k):** Focused on casual chat, roleplay, and human-like interaction patterns.
* **Total:** 20,000 examples.

### Training Metrics

The model showed steady convergence without catastrophic overfitting. The final loss indicates a strong grasp of the training data without losing generalization capabilities.

| Metric | Start | End |
| :--- | :--- | :--- |
| **Loss** | 2.47 | 1.41 |
| **Token Accuracy** | 53.3% | 65.9% |
| **Epochs** | 0 | 2.0 |

**Loss Curve:**

* **Epoch 0.2:** Loss 1.59 (Rapid initial learning)
* **Epoch 1.0:** Loss 1.65 (Transition point)
* **Epoch 2.0:** Loss 1.41 (Final convergence)

## 🛠️ Hyperparameters

* **Base Model:** HuggingFaceTB/SmolLM3-3B-Base
* **Precision:** Float16 (Training) / Float16 (Inference)
* **LoRA Config:** r=32, alpha=64
* **Learning Rate:** 3e-5 (Cosine Schedule)
* **Optimizer:** paged_adamw_32bit

---

*Created with <3 by me*
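As a footnote to the Quantization Warning above, here is a minimal, hedged sketch of the low-VRAM route (4-bit base + adapter), assuming `peft` and `bitsandbytes` are installed. The generation settings from the Usage section still apply once the model is loaded this way.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"

# Load the base model in 4-bit (NF4) with a float16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Tokenizer from the base repo; check the adapter repo for tokenizer/chat-template files
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base (no re-quantization of merged weights)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()
```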