---
base_model: HuggingFaceTB/SmolLM3-3B-Base
library_name: peft
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B-Base
- lora
- sft
- transformers
- trl
license: mit
datasets:
- teknium/OpenHermes-2.5
language:
- en
---
# Model Card: SmolLM3-Chat-v1-adapter
This repository contains the **LoRA (Low-Rank Adaptation)** weights for **SmolLM3-Chat-v1**.
This adapter was trained to give the [SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) model a casual, witty, and "internet-native" personality. It moves away from robotic assistant responses in favor of a more human-like vibe.
## 🔗 Related Models
* **Merged Version (Float16):** [SmolLM3-Chat-v1](https://huggingface.co/igidn/SmolLM3-Chat-v1)
* **Base Model:** [HuggingFaceTB/SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)
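If you would rather produce your own merged checkpoint than download the one above, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. This is a minimal sketch, not the exact script used to build the merged repo; the output path is an example:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"

# Load the base model in float16 (merging needs unquantized weights)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID, torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

# Attach the LoRA adapter, then fold its deltas into the base weights
merged = PeftModel.from_pretrained(base, ADAPTER_ID).merge_and_unload()

# Save a standalone checkpoint (example path)
merged.save_pretrained("./SmolLM3-Chat-v1-merged")
tokenizer.save_pretrained("./SmolLM3-Chat-v1-merged")
```

The merged model can then be loaded with plain `AutoModelForCausalLM.from_pretrained` without PEFT installed.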
## ⚠️ System Instructions (Important)
**Less is more.**
This model relies on a specific "vibe" learned during training. Over-prompting it with complex system instructions (e.g., *"You are a helpful assistant who is polite, follows rules X, Y, Z..."*) will degrade the output quality.
**Recommended System Prompt:**
*(simply leave it empty for the rawest, most casual experience)*
## 💻 Usage (4-Bit Loading)
This script demonstrates how to load the base model in 4-bit and attach the adapter.
```python
import torch
from threading import Thread
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# 1. Define IDs
ADAPTER_ID = "igidn/SmolLM3-Chat-v1-adapter"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM3-3B-Base"

# 2. Quantization config (4-bit NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# 3. Load tokenizer and base model
# The tokenizer comes from the adapter repo so its special tokens
# and chat template match the fine-tune.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 4. Attach the LoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# 5. Define the conversation (no system prompt, as recommended above)
messages = [
    {"role": "user", "content": "Haiiii"}
]

# 6. Apply the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 7. Streamer & generation
streamer = TextIteratorStreamer(
    tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
)

# --- CRITICAL GENERATION CONFIG ---
generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    # Core vibe parameters
    temperature=0.8,
    top_p=0.85,
    # Stability parameters (prevent looping)
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id,
)

# Run generation in a background thread and stream tokens as they arrive
thread = Thread(target=model.generate, kwargs=generate_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
print()
```
## 📊 Training Details
The model was trained for 2 epochs using `SFTTrainer`.
### Dataset
* **OpenHermes-2.5 (5k subset):** Logic and general helpfulness.
* **Custom Dataset (15k):** Casual chat, roleplay, and human-like interaction patterns.
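The training setup above can be sketched roughly as follows. This is a reconstruction under assumptions, not the actual training script: the custom 15k dataset is not public, so a commented-out placeholder path stands in for it, and only the epoch count is taken from this card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# 5k-example subset of OpenHermes-2.5 (seed is an assumption)
hermes = (
    load_dataset("teknium/OpenHermes-2.5", split="train")
    .shuffle(seed=42)
    .select(range(5_000))
)
# custom = load_dataset("path/to/custom-casual-chat", split="train")  # hypothetical, private
train_data = hermes  # in practice, concatenated with the custom 15k set

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",  # TRL loads the model from this ID
    train_dataset=train_data,
    args=SFTConfig(num_train_epochs=2, output_dir="./sft-out"),
)
trainer.train()
```

In a real run the LoRA configuration from the Hyperparameters section would be passed via `peft_config`.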
### Metrics
| Metric | Value |
| :--- | :--- |
| **Final Loss** | 1.41 |
| **Final Token Accuracy** | ~65.9% |
## 🛠️ Hyperparameters
* **Rank (r):** 32
* **Alpha:** 64
* **Dropout:** 0.05
* **Target Modules:** All linear layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`)
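The hyperparameters above correspond to a PEFT configuration along these lines (a reconstruction from the values listed here, not the exact training code):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "embed_tokens", "lm_head",               # embeddings / output head
    ],
    task_type="CAUSAL_LM",
)
```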
*Created with <3 by me*