---
language:
- en
tags:
- llama
- decoder-only
- educational
- pretrained
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
---

# LLM-1B-Lab

An educational implementation of a **1.1B-parameter LLaMA-style decoder-only Transformer**,
trained from scratch on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | ~1.1B |
| Architecture | LLaMA-style (RMSNorm, RoPE, GQA, SwiGLU, weight tying) |
| Hidden dim | 2048 |
| Layers | 22 |
| Attention heads | 16 (query) / 4 (key-value, GQA) |
| Max sequence length | 2048 |
| Vocab size | 32,000 |
| Training steps | 20,000 |
| Best val loss | 2.3653 (perplexity ≈ 10.65) |
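As a sanity check, the table's dimensions roughly reproduce the "~1.1B" figure. This is a back-of-the-envelope sketch: the SwiGLU intermediate dimension is not listed in the card, so the `ffn_dim = 5632` below is an assumption (a common choice at this scale), and norm/RoPE parameters are ignored as negligible.

```python
# Approximate parameter count from the Model Details table.
# ASSUMPTION: ffn_dim = 5632 (not stated in the card; illustrative only).
hidden = 2048
layers = 22
vocab = 32_000
n_q_heads, n_kv_heads = 16, 4
head_dim = hidden // n_q_heads  # 128
ffn_dim = 5632                  # assumed SwiGLU intermediate dim

embed = vocab * hidden  # tied with the output head, so counted once

# GQA attention: full-size Q and O projections, shrunk K/V projections
attn = (hidden * n_q_heads * head_dim         # W_Q
        + 2 * hidden * n_kv_heads * head_dim  # W_K, W_V
        + n_q_heads * head_dim * hidden)      # W_O

# SwiGLU MLP: gate, up, and down projections
mlp = 3 * hidden * ffn_dim

total = embed + layers * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")  # 1.06B, consistent with "~1.1B"
```

Weight tying (sharing the embedding matrix with the output head) saves roughly 65M parameters here, which is why the embedding term appears only once.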

## Training

- **Dataset**: FineWeb-Edu (sample-10BT)
- **Tokenizer**: pretrained LLaMA 2 tokenizer (`NousResearch/Llama-2-7b-hf`, 32K vocab)
- **Hardware**: Google Colab Pro+ (A100 40GB)
- **Precision**: bfloat16 mixed precision
- **Optimizer**: AdamW (lr=3e-4, weight_decay=0.1, beta2=0.95)
- **Scheduler**: cosine decay with linear warmup (2,000 warmup steps)
- **Effective batch size**: 128
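The schedule above can be sketched as a linear warmup into a cosine decay. The card does not state the minimum learning rate, so the 10%-of-peak floor below is an assumption for illustration.

```python
import math

# Cosine decay with linear warmup, matching the hyperparameters above.
# ASSUMPTION: decay floor of 10% of peak LR (not stated in the card).
PEAK_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS, TOTAL_STEPS = 2_000, 20_000

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS           # linear warmup
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

print(lr_at(1_000))   # mid-warmup: 1.5e-4
print(lr_at(2_000))   # peak: 3e-4
print(lr_at(20_000))  # floor: 3e-5
```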

## Usage

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# 1. Load config and rebuild model
from llm_lab.config import ModelConfig
from llm_lab.model import LLMModel

model = LLMModel(ModelConfig.base_1b())
state_dict = load_file("model.safetensors")
# strict=False: the tied output head is omitted from the checkpoint
model.load_state_dict(state_dict, strict=False)
model.eval()

# 2. Load tokenizer (pretrained LLaMA 2)
tokenizer = AutoTokenizer.from_pretrained("Vjeong/LLM-1B-Lab")

# 3. Generate text
prompt = "The future of AI is"
input_ids = torch.tensor([tokenizer.encode(prompt)])
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))
```
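For intuition on the `temperature` and `top_p` arguments to `generate`, here is a minimal sketch of how one decoding step combines temperature scaling with nucleus (top-p) filtering. The model's own sampler may differ in detail; this is illustrative only, and the logits are made up.

```python
import math

# Temperature + top-p (nucleus) filtering on one step's logits.
def nucleus_filter(logits, temperature=0.8, top_p=0.9):
    # Softmax over temperature-scaled logits (max-subtracted for stability)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of tokens whose cumulative prob reaches top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens and sample from this dict
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = nucleus_filter([2.0, 1.0, 0.2, -1.0])
print(sorted(dist))  # token ids that survive the cutoff: [0, 1]
```

Lower `temperature` sharpens the distribution before the cutoff, so fewer tokens survive; `top_p` then bounds how much probability mass the tail can contribute.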

## License

Apache 2.0