Rahulwale12/base_slm

	# Base Small Language Model (SLM)

	## 🚀 CPU-First Base Language Model

	This is the base model, before any fine-tuning: a CPU-optimized Small Language Model intended as a starting point for domain adaptation.

	### ⚡ Performance Highlights
	- 164 tokens/sec generation speed on CPU
	- 45.2MB model size
	- 3.7M parameters
	- General language understanding (pre-fine-tuning)
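
As a rough cross-check of the figures above, the parameter count can be converted into an expected weight size. This is a sketch under stated assumptions: float32 storage (4 bytes per parameter) and decimal megabytes, neither of which the card states. The helper name `weights_size_mb` is hypothetical.

```python
def weights_size_mb(n_params, bytes_per_param=4):
    """Approximate storage for the weights alone, in decimal megabytes."""
    return n_params * bytes_per_param / 1e6


# Assumption: weights are stored as float32 (4 bytes each).
fp32_mb = weights_size_mb(3_700_000)
print(f"float32 weights: {fp32_mb:.1f} MB")
print(f"reported size / weight size: {45.2 / fp32_mb:.1f}x")
```

Under these assumptions the weights alone come to about 14.8 MB, roughly a third of the reported 45.2MB. The file may therefore hold more than float32 weights (for example optimizer state saved in the checkpoint), though the card does not say.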

	### 🎯 Training Speed
	- 28 minutes for base training (4 epochs, about 7 minutes per epoch)
	- Fast convergence with efficient architecture
	- Ready for fine-tuning on any domain

	### 🔧 Technical Specs
	- Architecture: Transformer-lite with RMSNorm, SwiGLU, Rotary embeddings
	- Optimization: CPU-first with memory mapping and efficient batching
	- Framework: PyTorch (CPU optimized)
	- Training: Trained on conversational data
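
The three architecture components named above can be illustrated with a dependency-free sketch. This is not the code in `src/model.py`, which may differ in details such as the epsilon value, the rotary base, or how pairs are laid out. The function names and defaults here are assumptions chosen for illustration, and plain Python lists are used so the arithmetic stays visible.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-element weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) times up, elementwise.

    In a feed-forward layer, gate and up would be two linear projections
    of the same input; here they are passed in directly.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def rotary(x, position, base=10000.0):
    """Rotary embedding: rotate each pair (x[i], x[i+1]) by position * base**(-i/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Two properties are easy to check by hand: `rms_norm` with unit weights returns a vector whose mean square is approximately 1, and `rotary` leaves the vector's length unchanged because each pair is only rotated.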

	### 📱 Deployment Ready
	- CPU optimized: No GPU required
	- Fast startup: the 45.2MB checkpoint loads quickly
	- Low memory: 3.7M parameters keep the memory footprint small
	- Fine-tuning ready: intended as a base for domain adaptation

	## Usage

	### Load and Use Base Model

	```python
	import torch
	import sys
	sys.path.append('src')
	from model import create_model_from_config
	from tokenizer import BPETokenizer

	# Load model
	checkpoint = torch.load("checkpoints/model_latest.pt", map_location='cpu')
	config = checkpoint['config']
	model = create_model_from_config(config)
	model.load_state_dict(checkpoint['model_state_dict'])

	# Load tokenizer
	tokenizer = BPETokenizer()
	tokenizer.load("data/tokenizer.json")

	# Generate
	prompt = "Hello, how are you?"
	input_ids = tokenizer.encode(prompt, add_special_tokens=True)
	input_ids = torch.tensor([input_ids], dtype=torch.long)

	model.eval()
	with torch.no_grad():
	    # Greedy decoding: append the most likely next token, 20 times
	    for _ in range(20):
	        logits = model(input_ids)[0, -1, :]  # logits at the last position
	        next_token = torch.argmax(logits, dim=-1).unsqueeze(0)
	        input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=1)

	response = tokenizer.decode(input_ids[0].tolist(), skip_special_tokens=True)
	print(response)
	```

	### Fine-tune on Your Data

	```bash
	# Use this base model for fine-tuning
	python finetune_qa.py --base_model checkpoints/model_latest.pt --conversations your_data.json
	```

	## Model Details

	- Base Model: Trained on conversational data
	- Architecture: Transformer-lite with RMSNorm, SwiGLU, and rotary embeddings
	- Size: 45.2MB (base model)
	- License: MIT

	## Performance

	| Metric | Value |
	|--------|-------|
	| Speed | 164 tokens/sec |
	| Size | 45.2MB |
	| Parameters | 3.7M |
	| Training Time | 28 minutes |
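
A throughput figure such as the tokens/sec value in the table depends on hardware and prompt length, so it is worth re-measuring locally. The sketch below shows one way to do that, and is not necessarily how the reported number was obtained. `tokens_per_second` and `step_fn` are hypothetical names: `step_fn` stands for any zero-argument callable that generates exactly one token, for example a wrapper around one iteration of a greedy decoding loop.

```python
import time


def tokens_per_second(step_fn, n_tokens, warmup=2):
    """Call step_fn once per token and return the measured tokens per second.

    A few untimed warmup calls come first so one-off setup costs
    do not distort the measurement.
    """
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_tokens):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a real single-token decoding step.
    def dummy_step():
        sum(i * i for i in range(10_000))

    print(f"{tokens_per_second(dummy_step, n_tokens=50):.1f} tokens/sec")
```

Because generation here re-runs the model on the full sequence at each step, measured speed will fall as the sequence grows. Reporting the prompt length and the number of generated tokens alongside the result makes it reproducible.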

	This base model is intended as a foundation for fine-tuning on specific domains or tasks.