| # Base Small Language Model (SLM) | |
| ## CPU-First Base Language Model | |
| This is the **base model**, before any fine-tuning: a small, CPU-optimized language model intended as a foundation for domain adaptation. | |
| ### ⚡ Performance Highlights | |
| - **164 tokens/sec** on CPU | |
| - **45.2MB model size** (base model) | |
| - **3.7M parameters** | |
| - **General language understanding** (before fine-tuning) | |
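As a rough sanity check on the figures above, the weight footprint can be estimated from the parameter count. The sketch below assumes 32-bit float weights and decimal megabytes, neither of which this README states. Under those assumptions the raw weights come to about 14.8 MB, so the 45.2MB figure presumably covers more than fp32 weights alone (for example, additional state stored in the checkpoint file), though that is only a guess.

```python
# Back-of-the-envelope estimate of the raw weight footprint.
# Assumptions (not stated in this README): float32 weights, 1 MB = 10**6 bytes.
PARAM_COUNT = 3_700_000   # "3.7M parameters" from the list above
BYTES_PER_PARAM = 4       # float32 (assumed)


def weights_megabytes(param_count: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the size of the raw weights in decimal megabytes."""
    return param_count * bytes_per_param / 1_000_000


estimated_mb = weights_megabytes(PARAM_COUNT)
print(f"Estimated fp32 weights: {estimated_mb:.1f} MB")
```

Comparing this estimate with the size of the file on disk is a quick way to see how much of a checkpoint is weights and how much is other saved state.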
| ### 🎯 Training Speed | |
| - **28 minutes** for base training (4 epochs) | |
| - **Fast convergence** with efficient architecture | |
| - **Ready for fine-tuning** on any domain | |
| ### 🔧 Technical Specs | |
| - **Architecture:** Transformer-lite with RMSNorm, SwiGLU, and rotary position embeddings | |
| - **Optimization:** CPU-first with memory mapping and efficient batching | |
| - **Framework:** PyTorch (CPU optimized) | |
| - **Training data:** Conversational data | |
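For readers unfamiliar with two of the building blocks named above, here is a minimal sketch of RMSNorm and the SwiGLU gate in plain Python. It is illustrative only: it follows the standard definitions of these operations, not this repository's PyTorch modules (which are not shown in this README), and the function names are hypothetical. Rotary embeddings and the linear projections around the gate are left out to keep it short.

```python
import math
from typing import List


def rms_norm(x: List[float], gain: List[float], eps: float = 1e-6) -> List[float]:
    """Scale x by the reciprocal of its root mean square, then by a per-element gain."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def silu(v: float) -> float:
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate: List[float], up: List[float]) -> List[float]:
    """Element-wise SwiGLU gating: SiLU(gate) * up."""
    return [silu(g) * u for g, u in zip(gate, up)]


# Toy inputs; in a real layer `gate` and `up` are two linear projections of
# the same hidden vector, and the gated result feeds a third projection.
normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
gated = swiglu_gate([0.0, 2.0], [5.0, 1.0])
print(normed, gated)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, which saves a pass over the vector; that is one reason it is common in small, CPU-oriented models.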
| ### 📱 Deployment Ready | |
| - **CPU optimized:** No GPU required | |
| - **Fast startup:** The small checkpoint loads quickly | |
| - **Low memory:** Efficient memory usage | |
| - **Fine-tuning ready:** Intended as a base for domain adaptation | |
| ## Usage | |
| ### Load and Use Base Model | |
| ```python | |
| import torch | |
| import sys | |
| sys.path.append('src') | |
| from model import create_model_from_config | |
| from tokenizer import BPETokenizer | |
| # Load model | |
| checkpoint = torch.load("checkpoints/model_latest.pt", map_location='cpu') | |
| config = checkpoint['config'] | |
| model = create_model_from_config(config) | |
| model.load_state_dict(checkpoint['model_state_dict']) | |
| # Load tokenizer | |
| tokenizer = BPETokenizer() | |
| tokenizer.load("data/tokenizer.json") | |
| # Generate 20 tokens with greedy decoding | |
| prompt = "Hello, how are you?" | |
| input_ids = tokenizer.encode(prompt, add_special_tokens=True) | |
| input_ids = torch.tensor([input_ids], dtype=torch.long)  # shape: (1, seq_len) | |
| model.eval() | |
| with torch.no_grad(): | |
|     for _ in range(20): | |
|         # Logits at the last position of the single batch item | |
|         logits = model(input_ids)[0, -1, :] | |
|         # Pick the most likely token and append it to the sequence | |
|         next_token = torch.argmax(logits, dim=-1).unsqueeze(0) | |
|         input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=1) | |
| response = tokenizer.decode(input_ids[0].tolist(), skip_special_tokens=True) | |
| print(response) | |
| ``` | |
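This README does not say how the 164 tokens/sec figure was measured, so a simple way to check throughput on your own hardware is to time the decoding steps directly. The helper below is a sketch under that assumption; `tokens_per_second` and `dummy_step` are hypothetical names, and `dummy_step` is only a stand-in so the example runs without the model.

```python
import time
from typing import Callable


def tokens_per_second(step: Callable[[], None], num_tokens: int) -> float:
    """Call step() once per generated token and return tokens per second."""
    start = time.perf_counter()
    for _ in range(num_tokens):
        step()
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed


def dummy_step() -> None:
    """Hypothetical stand-in for one decoding step.

    Replace the body with one iteration of the generation loop above
    to time the real model.
    """
    sum(range(1000))


rate = tokens_per_second(dummy_step, num_tokens=50)
print(f"{rate:.1f} tokens/sec")
```

Because the loop above feeds the whole sequence back through the model at every step, the measured rate will depend on prompt length and on how many tokens are generated, so it is worth reporting both alongside any number.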
| ### Fine-tune on Your Data | |
| ```bash | |
| # Fine-tune from the base checkpoint on your own conversation data | |
| python finetune_qa.py --base_model checkpoints/model_latest.pt --conversations your_data.json | |
| ``` | |
| ## Model Details | |
| - **Base Model:** Trained on conversational data | |
| - **Architecture:** Transformer-lite with modern optimizations | |
| - **Size:** 45.2MB (base model) | |
| - **License:** MIT | |
| ## Performance | |
| | Metric | Value | | |
| |--------|-------| | |
| | Speed | 164 tokens/sec | | |
| | Size | 45.2MB | | |
| | Parameters | 3.7M | | |
| | Training Time | 28 minutes | | |
| This base model is intended as a starting point for fine-tuning on specific domains or tasks. | |