---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---

# Model Card: SmolLM3-Chat-v1-adapter

This repository contains the **LoRA (Low-Rank Adaptation)** weights for **SmolLM3-Chat-v1**. The adapter was trained to give the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model a casual, witty, "internet-native" personality, moving away from robotic assistant responses in favor of a more human-like vibe.

## 🔗 Related Models

* **Merged Version (Float16):** [SmolLM3-Chat-v1](https://huggingface.co/igidn/SmolLM3-Chat-v1)
* **Base Model:** [HuggingFaceTB/SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)

## ⚠️ System Instructions (Important)

**Less is more.** This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will degrade output quality.

**Recommended System Prompt:** *(simply leave it empty for the most raw, casual experience)*

## 💻 Usage (4-Bit Loading)

The script below loads the base model in 4-bit and attaches the adapter.

```python
import torch
from threading import Thread

from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization config (4-bit NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load base model
# Load the tokenizer from the adapter repo so the chat template and
# special tokens match what was used during training.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 4. Attach adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define conversation
messages = [
    {"role": "user", "content": "Haiiii"}
]

# 6. Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & generation
streamer = TextIteratorStreamer(
    tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core vibe parameters
    temperature=0.8,
    top_p=0.85,
    # Stability parameters (prevent looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Run generation in a background thread so the streamer can be
# consumed on the main thread as tokens arrive.
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
```

## 📊 Training Details

The model was trained for 2 epochs using `SFTTrainer`.

### Dataset

* **OpenHermes-2.5 (5k subset):** logic and general helpfulness.
* **Custom dataset (15k):** casual chat, roleplay, and human-like interaction patterns.

### Metrics

| Metric | Value |
| :--- | :--- |
| **Final Loss** | 1.41 |
| **Final Token Accuracy** | ~65.9% |

## 🛠️ Hyperparameters

* **Rank (r):** 32
* **Alpha:** 64
* **Dropout:** 0.05
* **Target Modules:** all linear layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) plus the embedding and output layers (`embed_tokens`, `lm_head`)

*Created with <3 by me*
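
For reference, the hyperparameters above correspond to a PEFT `LoraConfig` along these lines. This is a reconstruction from the listed values, not the exact training config; details such as `bias` handling or whether `embed_tokens`/`lm_head` were LoRA targets versus fully trained via `modules_to_save` may differ.

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above (assumption:
# the exact training script may have differed in minor details).
lora_config = LoraConfig(
    r=32,                 # Rank
    lora_alpha=64,        # Alpha
    lora_dropout=0.05,    # Dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)
```

A config like this would be passed as `peft_config` to `SFTTrainer` during training.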