Vjeong committed on
Commit 2fe29f1 · verified · 1 Parent(s): e4bc86f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +78 -0
README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ language:
+ - en
+ tags:
+ - llama
+ - decoder-only
+ - educational
+ - pretrained
+ license: apache-2.0
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ ---
+
+ # LLM-1B-Lab
+
+ An educational implementation of a **1.1B-parameter LLaMA-style decoder-only Transformer**,
+ trained from scratch on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
+
+ ## Model Details
+
+ | Attribute | Value |
+ |-----------|-------|
+ | Parameters | ~1.1B |
+ | Architecture | LLaMA-style (RMSNorm, RoPE, GQA, SwiGLU, Weight Tying) |
+ | Hidden dim | 2048 |
+ | Layers | 22 |
+ | Attention heads | 16 (Q) / 4 (KV) |
+ | Max sequence length | 2048 |
+ | Vocab size | 32,000 |
+ | Training steps | 20,000 |
+ | Best val loss | 2.6276 (perplexity: 13.84) |
+
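The derived figures in the table can be checked with a few lines of arithmetic. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (so perplexity is its exponential) and that the hidden dimension is split evenly across query heads, as is usual for LLaMA-style models. The variable names are illustrative and are not taken from the `llm_lab` code.

```python
import math

# Values copied from the Model Details table.
hidden_dim = 2048
n_q_heads = 16
n_kv_heads = 4
best_val_loss = 2.6276

# Assumption: the reported loss is mean per-token cross-entropy in nats.
perplexity = math.exp(best_val_loss)

# Assumption: each query head works on an equal slice of the hidden dimension.
head_dim = hidden_dim // n_q_heads

# In grouped-query attention (GQA), several query heads share one key/value head.
q_heads_per_kv_head = n_q_heads // n_kv_heads

print(f"perplexity = {perplexity:.2f}")
print(f"head_dim = {head_dim}")
print(f"query heads per KV head = {q_heads_per_kv_head}")
```

Under these assumptions the exponential of 2.6276 agrees with the perplexity of 13.84 listed in the table.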
+ ## Training
+
+ - **Dataset**: FineWeb-Edu (sample-10BT)
+ - **Tokenizer**: Custom BPE (trained on the dataset via `train_new` mode)
+ - **Hardware**: Google Colab Pro+ (A100 40GB)
+ - **Precision**: bfloat16 mixed precision
+ - **Optimizer**: AdamW (lr=3e-4, weight_decay=0.1, beta2=0.95)
+ - **Scheduler**: Cosine schedule with warmup (2,000 warmup steps)
+ - **Effective batch size**: 128
+
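To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with warmup. It is not the project's actual scheduler: the linear warmup shape, the decay to `min_lr=0.0`, and the decay horizon of 20,000 steps (taken from the training step count) are assumptions, and `lr_at` is a hypothetical helper name.

```python
import math


def lr_at(step, peak_lr=3e-4, warmup_steps=2000, total_steps=20_000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay (assumed shape)."""
    if step < warmup_steps:
        # Assumption: warmup rises linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to 1.0 past total_steps.
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    # Cosine decay from peak_lr down to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumption: every sequence in a batch is packed to the 2048-token maximum length.
tokens_per_step = 128 * 2048
total_tokens = tokens_per_step * 20_000

for step in (0, 1000, 2000, 11_000, 20_000):
    print(step, f"{lr_at(step):.2e}")
print("tokens per step:", tokens_per_step, "total tokens:", total_tokens)
```

If the packing assumption holds, 20,000 steps at an effective batch size of 128 would cover roughly 5.2B tokens, about half of the sample-10BT subset.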
+ ## Usage
+
+ ```python
+ import json
+
+ import torch
+ from huggingface_hub import hf_hub_download
+ from safetensors.torch import load_file
+ from tokenizers import Tokenizer
+
+ # The llm_lab package must be importable for the two imports below
+ from llm_lab.config import ModelConfig
+ from llm_lab.model import LLMModel
+
+ # 1. Load config and rebuild model
+ model = LLMModel(ModelConfig.base_1b())
+ state_dict = load_file("model.safetensors")  # expects the weights file in the working directory
+ model.load_state_dict(state_dict, strict=False)  # strict=False for weight tying
+ model.eval()
+
+ # 2. Load tokenizer (custom BPE trained with tokenizers library)
+ tok_path = hf_hub_download(repo_id="Vjeong/LLM-1B-Lab", filename="tokenizer/tokenizer.json")
+ meta_path = hf_hub_download(repo_id="Vjeong/LLM-1B-Lab", filename="tokenizer/tokenizer_meta.json")
+ tokenizer = Tokenizer.from_file(tok_path)
+ with open(meta_path) as f:
+     tok_meta = json.load(f)  # tokenizer metadata; not used further in this example
+
+ # 3. Generate text
+ prompt = "The future of AI is"
+ input_ids = torch.tensor([tokenizer.encode(prompt).ids])
+ with torch.no_grad():  # inference only, no gradients needed
+     output = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)
+ print(tokenizer.decode(output[0].tolist()))
+ ```
+
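The `temperature` and `top_p` arguments in the usage example control how the next token is sampled. The sketch below illustrates the general technique (temperature scaling followed by nucleus filtering) in plain Python. It is not the implementation behind `LLMModel.generate`, which may differ in details; `sample_top_p` is a hypothetical helper written for illustration only.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=random):
    """Sample one token id from raw logits using temperature and nucleus (top-p) filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices treats weights as relative, so the kept probabilities
    # are renormalised implicitly when drawing.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy logits over a 4-token vocabulary; token 0 dominates.
print(sample_top_p([10.0, 0.0, 0.0, 0.0]))
```

Lower temperatures sharpen the distribution before filtering, and a smaller `top_p` shrinks the set of candidate tokens, so both push generation toward more predictable output.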
+ ## License
+
+ Apache 2.0