---
language:
- en
tags:
- llama
- decoder-only
- educational
- pretrained
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
---

# LLM-1B-Lab

An educational implementation of a **1.1B-parameter LLaMA-style decoder-only Transformer**,
trained from scratch on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~1.1B |
| Architecture | LLaMA-style (RMSNorm, RoPE, GQA, SwiGLU, weight tying) |
| Hidden dim | 2048 |
| Layers | 22 |
| Attention heads | 16 (query) / 4 (key-value, GQA) |
| Max sequence length | 2048 |
| Vocab size | 32,000 |
| Training steps | 20,000 |
| Best val loss | 2.3653 (perplexity ≈ 10.65) |
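As a sanity check, the table's dimensions roughly reproduce the "~1.1B" figure. This is a back-of-the-envelope sketch: the SwiGLU intermediate dimension is not listed in the card, so the `ffn_dim = 5632` below is an assumption (a common choice at this scale), and norm/RoPE parameters are ignored as negligible.

```python
# Approximate parameter count from the Model Details table.
# ASSUMPTION: ffn_dim = 5632 (not stated in the card; illustrative only).
hidden = 2048
layers = 22
vocab = 32_000
n_q_heads, n_kv_heads = 16, 4
head_dim = hidden // n_q_heads  # 128
ffn_dim = 5632                  # assumed SwiGLU intermediate dim

embed = vocab * hidden  # tied with the output head, so counted once

# GQA attention: full-size Q and O projections, shrunk K/V projections
attn = (hidden * n_q_heads * head_dim         # W_Q
        + 2 * hidden * n_kv_heads * head_dim  # W_K, W_V
        + n_q_heads * head_dim * hidden)      # W_O

# SwiGLU MLP: gate, up, and down projections
mlp = 3 * hidden * ffn_dim

total = embed + layers * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")  # 1.06B, consistent with "~1.1B"
```

Weight tying (sharing the embedding matrix with the output head) saves roughly 65M parameters here, which is why the embedding term appears only once.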

## Training

- **Dataset**: FineWeb-Edu (sample-10BT)
- **Tokenizer**: pretrained LLaMA 2 tokenizer (`NousResearch/Llama-2-7b-hf`, 32K vocab)
- **Hardware**: Google Colab Pro+ (A100 40GB)
- **Precision**: bfloat16 mixed precision
- **Optimizer**: AdamW (lr=3e-4, weight_decay=0.1, beta2=0.95)
- **Scheduler**: cosine decay with linear warmup (2,000 warmup steps)
- **Effective batch size**: 128
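The schedule above can be sketched as a linear warmup into a cosine decay. The card does not state the minimum learning rate, so the 10%-of-peak floor below is an assumption for illustration.

```python
import math

# Cosine decay with linear warmup, matching the hyperparameters above.
# ASSUMPTION: decay floor of 10% of peak LR (not stated in the card).
PEAK_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS, TOTAL_STEPS = 2_000, 20_000

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS           # linear warmup
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

print(lr_at(1_000))   # mid-warmup: 1.5e-4
print(lr_at(2_000))   # peak: 3e-4
print(lr_at(20_000))  # floor: 3e-5
```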

## Usage

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# 1. Load config and rebuild model
from llm_lab.config import ModelConfig
from llm_lab.model import LLMModel

model = LLMModel(ModelConfig.base_1b())
state_dict = load_file("model.safetensors")
# strict=False: the tied output head is omitted from the checkpoint
model.load_state_dict(state_dict, strict=False)
model.eval()

# 2. Load tokenizer (pretrained LLaMA 2)
tokenizer = AutoTokenizer.from_pretrained("Vjeong/LLM-1B-Lab")

# 3. Generate text
prompt = "The future of AI is"
input_ids = torch.tensor([tokenizer.encode(prompt)])
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))
```
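For intuition on the `temperature` and `top_p` arguments to `generate`, here is a minimal sketch of how one decoding step combines temperature scaling with nucleus (top-p) filtering. The model's own sampler may differ in detail; this is illustrative only, and the logits are made up.

```python
import math

# Temperature + top-p (nucleus) filtering on one step's logits.
def nucleus_filter(logits, temperature=0.8, top_p=0.9):
    # Softmax over temperature-scaled logits (max-subtracted for stability)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of tokens whose cumulative prob reaches top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens and sample from this dict
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = nucleus_filter([2.0, 1.0, 0.2, -1.0])
print(sorted(dist))  # token ids that survive the cutoff: [0, 1]
```

Lower `temperature` sharpens the distribution before the cutoff, so fewer tokens survive; `top_p` then bounds how much probability mass the tail can contribute.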

## License

Apache 2.0