---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---
# Model Card: SmolLM3-Chat-v1-adapter
This repository contains the LoRA (Low-Rank Adaptation) weights for SmolLM3-Chat-v1.
This adapter was trained to give the SmolLM3-3B-Base model a casual, witty, and "internet-native" personality. It moves away from robotic assistant responses in favor of a more human-like vibe.
## Related Models
- Merged Version (Float16): SmolLM3-Chat-v1
- Base Model: HuggingFaceTB/SmolLM3-3B-Base
## System Instructions (Important)
**Less is more.**

This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., "You are a helpful assistant who is polite, follows rules X, Y, Z...") will degrade output quality.

Recommended system prompt: none. Simply leave it empty for the rawest, most casual experience.
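In practice, "leave it empty" just means omitting the system turn entirely when you build the conversation. A small illustration (the message contents are made up for the example):

```python
# Preferred: no system turn at all; the adapter's trained personality fills that role
messages = [
    {"role": "user", "content": "hey what's up"},
]

# Not recommended: a heavy system prompt like this tends to flatten the personality
over_prompted = [
    {"role": "system", "content": "You are a helpful assistant who is polite, follows rules X, Y, Z..."},
    {"role": "user", "content": "hey what's up"},
]
```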
## Usage (4-Bit Loading)
This script demonstrates how to load the base model in 4-bit and attach the adapter.
```python
import torch
from threading import Thread

from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization config (4-bit NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load tokenizer and base model
# The tokenizer comes from the adapter repo so the chat template and special tokens match
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 4. Attach the adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define the conversation
messages = [
    {"role": "user", "content": "Haiiii"},
]

# 6. Apply the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & generation
streamer = TextIteratorStreamer(
    tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core vibe parameters
    temperature=0.8,
    top_p=0.85,
    # Stability parameters (prevent looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Generation runs in a background thread; the streamer yields text as it is produced
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```
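A note on the stability parameters: `no_repeat_ngram_size=3` forbids any token that would complete a trigram already present in the generated sequence. A toy sketch of that rule (not the `transformers` implementation, just the idea):

```python
def banned_next_tokens(tokens, n=3):
    """Return the set of token IDs that would complete an n-gram
    already present in `tokens` (the rule behind no_repeat_ngram_size)."""
    if len(tokens) < n:
        return set()
    prefix = tuple(tokens[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # any historical window starting with the current prefix bans its final token
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# After ... 1 2 3 1 2, emitting 3 would repeat the trigram (1, 2, 3)
print(banned_next_tokens([1, 2, 3, 1, 2]))  # {3}
```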
## Training Details
The model was trained for 2 epochs using TRL's `SFTTrainer`.
### Dataset
- OpenHermes-2.5 (5k subset): Logic and general helpfulness.
- Custom Dataset (15k): Casual chat, roleplay, and human-like interaction patterns.
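A rough sketch of what that setup looks like in TRL. This is illustrative, not the original training script: the `SFTConfig` arguments beyond the epoch count are assumptions, the custom 15k dataset is not public, and conversion of OpenHermes-2.5 into the trainer's expected chat format is omitted:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# A 5k subset of OpenHermes-2.5, as described above
dataset = load_dataset("teknium/OpenHermes-2.5", split="train[:5000]")
# (preprocessing the dataset into the trainer's chat format is omitted here)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",  # SFTTrainer can load from a model ID
    train_dataset=dataset,
    args=SFTConfig(
        num_train_epochs=2,                 # matches the 2 epochs reported above
        output_dir="smollm3-chat-v1",       # hypothetical output path
    ),
    # LoRA settings; see the Hyperparameters section for the full target-module list
    peft_config=LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05),
)
trainer.train()
```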
### Metrics
| Metric | Value |
|---|---|
| Final Loss | 1.41 |
| Final Token Accuracy | ~65.9% |
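For intuition, the final loss (mean per-token cross-entropy) corresponds to a perplexity of about 4.1:

```python
import math

final_loss = 1.41            # mean per-token cross-entropy from the table above
perplexity = math.exp(final_loss)
print(round(perplexity, 2))  # 4.1
```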
## Hyperparameters
- Rank (r): 32
- Alpha: 64
- Dropout: 0.05
- Target Modules: all linear layers plus the embeddings (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`)
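The hyperparameters above map onto a PEFT `LoraConfig` roughly like this. This is a sketch from the values listed, not the original config file; in particular, whether the original run adapted `embed_tokens`/`lm_head` via `target_modules` or saved them fully via `modules_to_save` is an assumption:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "embed_tokens", "lm_head",               # embeddings / output head
    ],
    task_type="CAUSAL_LM",
)
```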
Created with <3 by me