Update README.md
README.md CHANGED
@@ -1,534 +1,43 @@
## 🚀 Speedup Optimizations

This project implements several performance optimizations:

1. **✅ Flash Attention (SDPA)**
   - Uses PyTorch's `scaled_dot_product_attention` via `attn_implementation="sdpa"`
   - Automatically uses flash-attention kernels when available
   - Significantly faster attention computation

2. **✅ Autocast (Mixed Precision)**
   - Automatic Mixed Precision (AMP) training
   - Uses bfloat16 on supported GPUs, falls back to float16
   - Reduces memory usage and speeds up training

3. **✅ Float32 Matmul Precision**
   - Sets `torch.set_float32_matmul_precision("high")`
   - Enables TF32 on Ampere+ GPUs (A100, RTX 30xx, etc.)
   - Faster matrix multiplications with minimal precision loss

4. **✅ Power-of-2 Optimization**
   - Sequence length set to 256 (a power of 2)
   - Ensures all chunks are exactly power-of-2 sized
   - Improves GPU memory alignment and computation efficiency
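Taken together, these settings amount to only a few lines of PyTorch. A minimal sketch (not the training script itself; it falls back to bfloat16 autocast on CPU so it runs anywhere):

```python
import torch

# Enable TF32 tensor cores for float32 matmuls on Ampere+ GPUs ("high" keeps
# near-float32 accuracy while taking the faster TF32 path).
torch.set_float32_matmul_precision("high")

# Pick an autocast dtype: bfloat16 where supported, otherwise float16 on CUDA.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = (
    torch.bfloat16
    if device_type == "cpu" or torch.cuda.is_bf16_supported()
    else torch.float16
)

# Mixed-precision region: matmuls inside run in the reduced-precision dtype.
with torch.autocast(device_type=device_type, dtype=amp_dtype):
    x = torch.randn(4, 256, device=device_type)  # seq_len 256: a power of 2
    w = torch.randn(256, 256, device=device_type)
    y = x @ w
```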
## 📁 Project Structure

```
.
├── accelerate_train.py              # Main training script with Accelerate
├── accelerate_resume.py             # Resume training from checkpoint
├── train_from_scratch.py            # Simple training script (no Accelerate)
├── smollm2_135m_reverse_engineer.py # Reverse engineer model architecture from HF
├── upload_to_hub.py                 # Upload checkpoint to HuggingFace Hub
├── convert_to_fp16.py               # Convert FP32 checkpoint to FP16 (reduce size by 50%)
├── prepare_for_hf_space.py          # Prepare minimal checkpoint (remove optimizer)
├── config.py                        # Model configuration class
├── generate_config.py               # Config generation utility
├── app.py                           # Gradio demo for Hugging Face Spaces
├── input.txt                        # Training dataset (Shakespeare's Coriolanus)
├── checkpoint_5000/                 # Saved model checkpoint
│   ├── config.json
│   ├── model.safetensors
│   └── optim.pt
├── smollm2_135m_reverse_engineered/ # Output from reverse engineering script
│   ├── hf_config.json
│   ├── hf_config.yaml
│   └── smollm2_135m_training_skeleton.yaml
└── README.md                        # This file
```
## 🛠️ Installation

### Requirements

```bash
pip install torch transformers accelerate gradio pyyaml
```

**Note:** `pyyaml` is required for the reverse engineering script (`smollm2_135m_reverse_engineer.py`).

### Optional (for better performance)

```bash
# Flash Attention (if available)
pip install flash-attn --no-build-isolation

# Quantization support (for smaller model size in HF Spaces)
pip install bitsandbytes
```

**Note:** `bitsandbytes` is required for 8-bit/4-bit quantization in the Gradio app, which is useful on Hugging Face Spaces where model size is limited.
## 📖 Usage

### 0. Reverse Engineering the Model Architecture (Optional)

Before training, you may want to inspect and reverse engineer the SmolLM2-135M architecture from HuggingFace. This is useful for:

- Understanding the exact model configuration
- Extracting architecture parameters
- Generating config files for training frameworks
- Verifying parameter counts
#### Basic Usage

```bash
# Download and inspect the model config (no model weights loaded)
python smollm2_135m_reverse_engineer.py
```

This will:

- Download the HuggingFace config and tokenizer
- Display an architecture summary
- Export configs to `smollm2_135m_reverse_engineered/`:
  - `hf_config.json` - Raw HuggingFace config
  - `hf_config.yaml` - Raw HuggingFace config (YAML format)
  - `smollm2_135m_training_skeleton.yaml` - Training-style config skeleton
#### Advanced Usage

```bash
# Load model weights and compute parameter statistics
python smollm2_135m_reverse_engineer.py --load-model --dtype bf16

# Use a different model variant
python smollm2_135m_reverse_engineer.py --model-id HuggingFaceTB/SmolLM2-135M-Instruct

# Custom output directory
python smollm2_135m_reverse_engineer.py --output-dir my_configs
```
#### Command Line Options

- `--model-id`: HuggingFace model ID (default: `HuggingFaceTB/SmolLM2-135M`)
- `--output-dir`: Directory for exported configs (default: `smollm2_135m_reverse_engineered`)
- `--load-model`: Load model weights and compute parameter stats (requires GPU memory)
- `--dtype`: Data type for model loading (`auto`, `fp16`, `bf16`)
- `--device`: Device for model loading (`auto` uses HF Accelerate device mapping)
#### Output Files

The script generates several useful files:

1. **`hf_config.json` / `hf_config.yaml`**: Raw HuggingFace configuration in both formats
2. **`smollm2_135m_training_skeleton.yaml`**: Training-style YAML with:
   - Model architecture parameters
   - Token IDs (BOS, EOS, PAD)
   - Training hyperparameter placeholders
   - Dataset configuration template
3. **`smollm2_135m_param_stats.json`**: Parameter statistics (only if `--load-model` is used)

**Note:** The `config.py` file in this project was generated from the reverse-engineered architecture. You can use this script to verify or regenerate the configuration.
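The JSON export step can be approximated in a few lines. The field values below come from the `config.py` shown later in this README; the exact keys the script writes may differ:

```python
import json
from pathlib import Path

# Architecture fields reverse-engineered from SmolLM2-135M (see config.py).
hf_config = {
    "model_type": "llama",
    "vocab_size": 49152,
    "hidden_size": 576,
    "intermediate_size": 1536,
    "num_hidden_layers": 30,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "max_position_embeddings": 8192,
}

# Mirror the script's output layout: one JSON file per exported config.
out_dir = Path("smollm2_135m_reverse_engineered")
out_dir.mkdir(exist_ok=True)
(out_dir / "hf_config.json").write_text(json.dumps(hf_config, indent=2))
```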
### 1. Training from Scratch

#### Using Accelerate (Recommended)

```bash
# Configure accelerate (first time only)
accelerate config

# Start training
accelerate launch accelerate_train.py
```

The training script will:

- Build the model from scratch using `SmolLMConfig`
- Load and chunk the dataset from `input.txt`
- Train for 5000 steps with automatic checkpointing
- Save the model to `checkpoint_5000/`
#### Using the Simple Training Script

```bash
python train_from_scratch.py
```
### 2. Resuming Training

```bash
accelerate launch accelerate_resume.py
```

This will:

- Load the model from `checkpoint_5000/`
- Restore the optimizer state
- Continue training from the saved step
### 3. Running the Gradio Demo

#### Local Demo

```bash
python app.py
```

Then open your browser to `http://localhost:7860`.
#### Hugging Face Spaces

**Option 1: Use the Pretrained Model (Recommended for CPU-only Spaces)**

Since checkpoint files are large (~270MB), the easiest approach is to use the pretrained model:

1. Create a new Space on Hugging Face (CPU or GPU)
2. Upload `app.py` and `requirements.txt` (do NOT upload checkpoint files)
3. The app will automatically load `HuggingFaceTB/SmolLM2-135M` if no checkpoint is found
4. The Space will deploy automatically

**CPU Performance Note:**
- The app automatically detects CPU and loads models in float32
- Generation is slower on CPU (~5-10 seconds per generation)
- For faster inference, consider a GPU-enabled Space
**Option 2: Upload the Checkpoint to the HuggingFace Hub (Recommended)**

If you want to use your fine-tuned checkpoint, upload it to the HuggingFace Hub as a model repository (not into the Space):

1. Install dependencies and log in:
```bash
pip install huggingface_hub
huggingface-cli login
```

2. **Convert to FP16 first** (if your model is FP32):
```bash
python convert_to_fp16.py --checkpoint-dir checkpoint_5000 --output-dir checkpoint_fp16
```

3. Upload the checkpoint (optimizer state excluded by default):
```bash
python upload_to_hub.py --repo-id your-username/smollm2-135m-coriolanus --checkpoint-dir checkpoint_fp16
```
**Required files uploaded:**

- ✅ `config.json` (~1KB) - Model configuration
- ✅ `model.safetensors` (~257MB FP16 or ~513MB FP32) - Model weights
- ✅ `generation_config.json` (~1KB) - Generation settings
- ❌ **Excludes `optim.pt`** (~200MB) - Not needed for inference
- ❌ **Excludes tokenizer files** - Not needed (the app uses the `HuggingFaceTB/SmolLM2-135M` tokenizer)
4. Set the environment variable in the HF Space settings:
   - `HF_MODEL_ID`: `your-username/smollm2-135m-coriolanus`

5. Upload only `app.py` and `requirements.txt` to the Space (no checkpoint files!)
**Size Reduction:**

- Original checkpoint: ~713MB (513MB FP32 model + 200MB optimizer)
- After removing the optimizer: ~513MB (FP32 model only)
- After converting to FP16: ~257MB (50% reduction)
- Space files: <1MB (just `app.py` and `requirements.txt`)

**💡 Important:** If your model is saved in FP32 (~513MB), convert it to FP16 first:
```bash
python convert_to_fp16.py --checkpoint-dir checkpoint_5000 --output-dir checkpoint_fp16
python upload_to_hub.py --repo-id your-username/smollm2-135m-coriolanus --checkpoint-dir checkpoint_fp16
```
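The FP16 conversion is conceptually just a dtype cast over every tensor in the state dict, which halves the bytes per element. A toy sketch with hypothetical tensor names (`convert_to_fp16.py` additionally handles safetensors I/O):

```python
import torch

# Stand-in for the FP32 weights in checkpoint_5000/model.safetensors
# (shapes shrunk so the sketch runs instantly).
fp32_state = {
    "model.embed_tokens.weight": torch.randn(96, 576),
    "lm_head.weight": torch.randn(96, 576),
}

# Cast every floating-point tensor to FP16; leave anything else untouched.
fp16_state = {
    name: t.half() if t.is_floating_point() else t
    for name, t in fp32_state.items()
}

bytes_fp32 = sum(t.numel() * t.element_size() for t in fp32_state.values())
bytes_fp16 = sum(t.numel() * t.element_size() for t in fp16_state.values())
```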
**Option 3: Use Quantization (GPU Only - Smaller Model Size)**

To reduce model size on HF Spaces with a GPU:

1. Set the environment variable in the HF Space settings:
   - `USE_QUANTIZATION`: `8bit` (for 8-bit) or `4bit` (for 4-bit quantization)

2. This reduces model size:
   - 8-bit: ~135MB (50% reduction)
   - 4-bit: ~68MB (75% reduction)

3. Upload `app.py` and `requirements.txt` to the Space
4. Add `bitsandbytes>=0.41.0` to `requirements.txt` (for GPU quantization)

**Note:**
- **Quantization requires a GPU** - it will NOT work on CPU-only Spaces
- **For CPU-only Spaces**: use Option 1 (pretrained model) - the app automatically detects CPU and loads without quantization
- The app automatically uses float32 on CPU for compatibility
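How the app might read this variable can be sketched with a small helper. This is a hypothetical illustration, not the code in `app.py`:

```python
import os

def quantization_mode(env=None):
    """Return '8bit', '4bit', or None based on the USE_QUANTIZATION env var."""
    env = os.environ if env is None else env
    mode = env.get("USE_QUANTIZATION", "").lower()
    return mode if mode in ("8bit", "4bit") else None
```

On a CPU-only Space the returned mode would then be ignored and the model loaded in float32.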
### 4. Using the Model Programmatically

```python
import torch
from transformers import LlamaForCausalLM, GPT2TokenizerFast

# Load model and tokenizer
model = LlamaForCausalLM.from_pretrained("checkpoint_5000")
tokenizer = GPT2TokenizerFast.from_pretrained("HuggingFaceTB/SmolLM2-135M")
tokenizer.pad_token = tokenizer.eos_token

# Generate text (the model writes in dramatic-play style)
prompt = "CORIOLANUS:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,  # avoids attention-mask warnings
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=100,
        temperature=0.8,
        do_sample=True,
        top_p=0.9,
        top_k=50,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
## ⚙️ Configuration

### Model Definition

The model is built from a simple configuration class defined in `config.py`:

```python
# Auto-generated config for SmolLM2-135M
class SmolLMConfig:
    def __init__(self):
        self.model_type = 'llama'
        self.vocab_size = 49152
        self.hidden_size = 576
        self.intermediate_size = 1536
        self.num_hidden_layers = 30
        self.num_attention_heads = 9
        self.num_key_value_heads = 3
        self.max_position_embeddings = 8192
        self.rms_norm_eps = 1e-05
        self.rope_theta = 100000
        self.rope_scaling = None
        self.bos_token_id = 0
        self.eos_token_id = 0
        self.pad_token_id = None
        self.tie_word_embeddings = True
```
The model is instantiated in the training script:

```python
from transformers import LlamaForCausalLM, LlamaConfig
from config import SmolLMConfig

# Build config
sm_cfg = SmolLMConfig()
hf_config = LlamaConfig(
    vocab_size=sm_cfg.vocab_size,
    hidden_size=sm_cfg.hidden_size,
    intermediate_size=sm_cfg.intermediate_size,
    num_hidden_layers=sm_cfg.num_hidden_layers,
    num_attention_heads=sm_cfg.num_attention_heads,
    num_key_value_heads=sm_cfg.num_key_value_heads,
    max_position_embeddings=sm_cfg.max_position_embeddings,
    rms_norm_eps=sm_cfg.rms_norm_eps,
    rope_theta=sm_cfg.rope_theta,
    tie_word_embeddings=sm_cfg.tie_word_embeddings,
)

# Create model with SDPA (Flash Attention); the attention backend is
# selected via the config, not a constructor argument
hf_config._attn_implementation = "sdpa"
model = LlamaForCausalLM(hf_config)
```

Modify the values in `SmolLMConfig` to change the model architecture.
## 🎯 Training Configuration

Default training hyperparameters:

- **Optimizer:** AdamW
- **Learning Rate:** 2e-4
- **Sequence Length:** 256
- **Max Steps:** 5000
- **Batch Size:** 1 (per device; scales with Accelerate)
- **Mixed Precision:** Enabled (bfloat16/float16 on CUDA)
### Expected Training Output

When running `accelerate_train.py`, you should see output similar to:

```
Using device: cuda

==================================================
Model Parameters:
  Total: 134,515,008 (134.52M)
  Trainable: 134,515,008 (134.52M)
  Non-trainable: 0 (0)
==================================================

Loaded 1332 training chunks.

Step 0 | Loss 10.8755
Step 500 | Loss 4.8903
Step 1000 | Loss 5.9240
Step 1500 | Loss 5.1603
Step 2000 | Loss 4.4749
Step 2500 | Loss 4.6673
Step 3000 | Loss 4.0769
Step 3500 | Loss 4.8665
Step 4000 | Loss 4.4102
Step 4500 | Loss 3.5130

Training complete. Checkpoint saved.
```
**Training Notes:**

- Initial loss starts around 10-11 (typical for language modeling)
- Loss decreases over training, reaching ~3.5-4.5 after 5000 steps (with some step-to-step noise)
- The model is trained on 1,332 chunks of 256 tokens each from *Coriolanus*
- The checkpoint is saved to the `checkpoint_5000/` directory
### Resuming Training

To resume training from a checkpoint:

```bash
accelerate launch accelerate_resume.py
```
Expected output:

```
Resuming from step 5000
Extra step 0 | Loss: 18.6302
Extra step 10 | Loss: 11.6444
Extra step 20 | Loss: 10.9953
Extra step 30 | Loss: 11.1277
Extra step 40 | Loss: 11.2197
Resume training complete.
```

**Note:** When resuming, the loss may initially spike (as shown above) because the resume script uses synthetic data for demonstration. In production, you would load your actual training dataset.
## 📝 Dataset Format

The training script expects `input.txt` in the project root, containing plain text that is tokenized and chunked into sequences of length 256 for training.

**Current Dataset:** This model is fine-tuned exclusively on Shakespeare's **Coriolanus** (contained in `input.txt`). As a result, it generates text in the style of a dramatic play, with character names, stage directions, and Shakespearean dialogue.
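The chunking step itself reduces to fixed-size slicing of the token stream. A sketch, assuming tokenization has already produced a flat list of ids (any trailing remainder shorter than 256 tokens is dropped):

```python
def chunk_tokens(token_ids, seq_len=256):
    """Split a flat list of token ids into fixed-length training chunks."""
    n_chunks = len(token_ids) // seq_len  # drop the trailing partial chunk
    return [token_ids[i * seq_len:(i + 1) * seq_len] for i in range(n_chunks)]

# e.g. a 1000-token text yields 3 chunks of 256 tokens (232 tokens dropped)
chunks = chunk_tokens(list(range(1000)))
```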
## 🔧 Advanced Usage

### Multi-GPU Training

Accelerate handles multi-GPU training automatically:

```bash
# Use all available GPUs
accelerate launch --multi_gpu accelerate_train.py

# Use specific GPUs
CUDA_VISIBLE_DEVICES=0,1 accelerate launch accelerate_train.py
```
### Custom Dataset

Modify the `load_dataset` function in `accelerate_train.py` to load your custom dataset format.
### Checkpoint Management

Checkpoints are saved in the following layout:

```
checkpoint_5000/
├── config.json             # Model configuration
├── model.safetensors       # Model weights
├── generation_config.json  # Generation settings
└── optim.pt                # Optimizer state
```
## 🐛 Troubleshooting

### Windows Autocast Issues

Autocast is disabled on Windows by default. Training will still work but may be slower.

### Out of Memory

- Reduce `seq_len` in the training script
- Use gradient accumulation
- Enable CPU offloading with Accelerate
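Gradient accumulation trades steps for memory: run several small forward/backward passes, then apply a single optimizer step. A minimal plain-PyTorch sketch on a toy model (Accelerate offers the same behavior via its `gradient_accumulation_steps` setting):

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
accum_steps = 4  # effective batch = per-step batch size * accum_steps

optimizer.zero_grad()
for step in range(8):
    batch = torch.randn(2, 8)
    # Scale the loss so accumulated gradients average rather than sum.
    loss = model(batch).pow(2).mean() / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one optimizer step per accum_steps micro-batches
        optimizer.zero_grad()
```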
### Flash Attention Not Working

If flash attention isn't available, the model falls back to standard attention. This is handled automatically.
### Tokenizer Warnings (Fixed)

The training script has been updated to handle common warnings:

- **Sequence Length Warning**: The "sequence length is longer than max_position_embeddings" warning is now suppressed. The script temporarily increases the tokenizer's `model_max_length` during data loading, then chunks the data into 256-token sequences for training.
- **Attention Mask Warnings**: Generation warnings about attention masks and pad tokens have been fixed by explicitly providing attention masks and token IDs during generation.

These warnings were harmless but have been resolved for a cleaner training experience.
## 📚 References

- [LLaMA Paper](https://arxiv.org/abs/2302.13971)
- [SmolLM2 Model Card](https://huggingface.co/HuggingFaceTB/SmolLM2-135M)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [Accelerate Documentation](https://huggingface.co/docs/accelerate)
## 📄 License

This project uses the SmolLM2-135M model and follows the original model's license. Please check the Hugging Face model card for licensing details.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📧 Contact

For questions or issues, please open an issue on the repository.

---

**Note:** This is a training and inference framework. Model weights are either trained from scratch or loaded from checkpoints. Make sure you have appropriate data and compute resources for training.
---
title: SmolLM2-135M Coriolanus
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---
# SmolLM2-135M Coriolanus Text Generation

A lightweight language model (135M parameters) fine-tuned exclusively on Shakespeare's **Coriolanus**. The model writes in the style of a dramatic play, complete with character names, stage directions, and Shakespearean dialogue.

## Features

- **135M Parameters** - Efficient and fast inference
- **Grouped Query Attention (GQA)** - Optimized attention mechanism
- **Flash Attention** - Fast attention computation
- **Interactive UI** - Easy-to-use Gradio interface

## Model Architecture

- **Hidden Size:** 576
- **Layers:** 30
- **Attention Heads:** 9 (3 KV heads)
- **Vocabulary:** 49,152 tokens
- **Max Context:** 8,192 tokens

## Usage

1. Enter your prompt in the text box (try prompts like "CORIOLANUS:" or "Enter CORIOLANUS and MENENIUS")
2. Adjust the generation parameters (temperature, top-p, etc.)
3. Click "Generate" to create text in the style of a dramatic play

## Parameters

- **Temperature:** Controls randomness (lower = more focused)
- **Top-p:** Nucleus sampling threshold
- **Top-k:** Limits sampling to the k most likely tokens
- **Repetition Penalty:** Reduces repetition (higher = less repetition)
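For intuition, temperature and top-k act on the raw logits before a token is sampled. A toy, pure-Python illustration (not the app's code, which relies on `model.generate`):

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=50):
    """Sample one token index from logits using temperature + top-k filtering."""
    scaled = [l / temperature for l in logits]  # <1.0 sharpens, >1.0 flattens
    # Keep only the top_k highest-scoring indices.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]  # stable softmax numerators
    # Draw proportionally to the softmax weights.
    r = random.random() * sum(weights)
    acc = 0.0
    for idx, w in zip(top, weights):
        acc += w
        if r <= acc:
            return idx
    return top[-1]
```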