hemantvirmani/tinyGPT

hemantvirmani committed 14 days ago

Commit dd70b62 (verified) · 1 parent: d30893b

Delete README.md

README.md +0 -210

README.md DELETED

@@ -1,210 +0,0 @@
----
-license: mit
----
-# TinyGPT — GPT-2 Style LM (~163M) trained on FineWeb-Edu
-A GPT-2 style decoder-only transformer pretrained from scratch on ~43B tokens
-of the FineWeb-Edu dataset, achieving a validation loss of **2.84**.
-I built this project to develop hands-on intuition for LLMs; it is inspired by Andrej Karpathy's nanoGPT.
----
-## Model Details
-| Parameter | Value |
-|-----------|-------|
-| Architecture | Decoder-only Transformer (GPT-2 style) |
-| Parameters | ~163M |
-| Layers | 12 |
-| Attention heads | 12 |
-| Embedding dim | 768 |
-| Context length | 1024 tokens |
-| Vocab size | 50,257 |
-| Tokenizer | GPT-2 BPE via `tiktoken` |
-| Attention | Causal self-attention (Flash Attention via `F.scaled_dot_product_attention`) |
-| LM head | Separate linear layer (not weight-tied) |
-> **Why ~163M and not 124M?** Standard GPT-2 124M ties the LM head weights
-> to the token embedding table, saving ~38.6M parameters (50,257 x 768). TinyGPT
-> uses a separate `nn.Linear` head, resulting in ~163M total parameters.
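
The parameter-count note above can be checked with a little arithmetic. The sketch below assumes the standard GPT-2 small layout (learned positional embeddings, biases on every linear layer, a 4x MLP expansion) and an untied LM head with a bias, as the README describes; TinyGPT's actual module layout may differ in small ways, so treat the totals as an estimate rather than a statement about the checkpoint.

```python
# Rough parameter count for a GPT-2 small layout.
# Assumed (not confirmed by the README): biases on all linear layers,
# learned positional embeddings, 4x MLP expansion.
n_layer, n_embd, vocab_size, block_size = 12, 768, 50257, 1024


def linear(n_in, n_out, bias=True):
    """Number of parameters in a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


layer_norm = 2 * n_embd  # scale + shift
per_block = (
    layer_norm
    + linear(n_embd, 3 * n_embd)   # fused q, k, v projection
    + linear(n_embd, n_embd)       # attention output projection
    + layer_norm
    + linear(n_embd, 4 * n_embd)   # MLP up-projection
    + linear(4 * n_embd, n_embd)   # MLP down-projection
)
embeddings = vocab_size * n_embd + block_size * n_embd

# Weight-tied model: the LM head reuses the token embedding matrix.
tied_total = embeddings + n_layer * per_block + layer_norm

# Untied model: a separate Linear head with its own weight and bias.
lm_head = linear(n_embd, vocab_size, bias=True)
untied_total = tied_total + lm_head

print(f"tied:   {tied_total:,}")    # 124,439,808
print(f"head:   {lm_head:,}")       # 38,647,633
print(f"untied: {untied_total:,}")  # 163,087,441
```

Under these assumptions the separate head accounts for essentially the whole gap between 124M and ~163M.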
----
-## Training Details
-| Detail | Value |
-|--------|-------|
-| Dataset | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (`sample-100BT` subset) |
-| Tokens trained | ~43B |
-| Validation loss | 2.84 |
-| Optimizer | AdamW (betas=(0.9, 0.95), eps=1e-8) |
-| Learning rate | 6e-4 |
-| LR schedule | Linear warmup (4000 steps) -> Cosine decay to 6e-5 |
-| Effective batch size | 512 (micro-batch size 16 x 32 gradient accumulation steps) |
-| Weight decay | 0.1 |
-| Gradient clipping | 1.0 |
-| Precision | bfloat16 (bf16) |
-| Max iterations | 600,000 |
-| Dropout | 0.0 |
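
The learning-rate schedule in the table (linear warmup for 4000 steps, then cosine decay from 6e-4 to 6e-5) can be sketched as a plain function. This is an illustration only: `lr_at` is a hypothetical helper, not code from the repository, and it assumes the cosine decay ends at the listed maximum of 600,000 iterations, which the README does not state explicitly.

```python
import math

MAX_LR = 6e-4
MIN_LR = 6e-5
WARMUP_STEPS = 4000
MAX_STEPS = 600_000  # assumption: the decay finishes at the max iteration count


def lr_at(step: int) -> float:
    """Learning rate for a given optimizer step (hypothetical helper)."""
    # Linear warmup from MAX_LR / WARMUP_STEPS up to MAX_LR.
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # After the decay window, hold at the floor.
    if step >= MAX_STEPS:
        return MIN_LR
    # Cosine decay from MAX_LR down to MIN_LR.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (MAX_LR - MIN_LR)


for step in (0, 3999, 4000, 302_000, 600_000):
    print(step, lr_at(step))
```

The rate peaks at the end of warmup, passes through the midpoint of the two bounds halfway through the decay window, and settles at 6e-5.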
----
-## Format
-Weights are saved in **PyTorch native format** — a plain state dict saved with
-`torch.save()`, containing only model weights (no optimizer state, no
-scheduler). The file is ~670MB.
-To load, you need the `TinyGPT` model class (available in the GitHub repo; see Usage below).
-The model is also available in **Hugging Face Transformers format** in this
-repository. The HF-format files include:
-- `model.safetensors`
-- `config.json`
-- `generation_config.json`
-- `tokenizer.json`
-- `tokenizer_config.json`
-The HF-format model can be loaded with `transformers` and is useful for standard
-Hugging Face workflows. Note that TinyGPT was trained with a separate,
-non-weight-tied LM head that includes a trained bias. Standard
-`GPT2LMHeadModel.from_pretrained()` loads the main model weights but treats
-`lm_head.bias` as an unexpected key because the default GPT-2 head is biasless.
-For exact TinyGPT inference, restore the LM-head bias as shown below or use
-`infer_hf.py` from the GitHub repo.
----
-## Usage
-### 1. Install dependencies
-Clone the repo and install requirements:
-```bash
-git clone https://github.com/hemantvirmani/tinygpt
-cd tinygpt
-pip install -r requirements.txt
-```
-### 2. Get the model class
-The `TinyGPT` model class is available at:
-**[https://github.com/hemantvirmani/tinygpt](https://github.com/hemantvirmani/tinygpt)**
-If you did not clone the repo in step 1, download `tinygpt.py` and place it in your working directory.
-### 3. Load weights and run inference
-```python
-import tinygpt
-model = tinygpt.load_model_for_inference()
-prompts = [
-    "Hello, I'm a language model,",
-    "The human brain contains approximately",
-    "Photosynthesis is the process by which plants",
-    "The theory of relativity states that ",
-    "The Roman Empire fell due to several factors including",
-    "During the Industrial Revolution, workers ",
-    "To solve a quadratic equation, you must first",
-    "The key differences between mitosis and meiosis are ",
-    "Once upon a time in ancient India, there lived a king who ",
-]
-for prompt in prompts:
-    print(f"\n{'='*60}")
-    print(f"PROMPT: {prompt}")
-    print(f"{'='*60}")
-    print(model.generate_text(start_text=prompt, max_tokens=500, temperature=0.7))
-```
-### 4. Load the Hugging Face format model
-```bash
-pip install torch transformers safetensors huggingface_hub
-```
-```python
-import torch
-from huggingface_hub import hf_hub_download
-from safetensors.torch import load_file
-from transformers import GPT2LMHeadModel, GPT2Tokenizer
-model_id = "hemantvirmani/tinyGPT"
-tokenizer = GPT2Tokenizer.from_pretrained(model_id)
-model = GPT2LMHeadModel.from_pretrained(model_id)
-# Restore TinyGPT's trained LM head for exact inference. The stock GPT-2 head
-# has no bias, so from_pretrained() skips `lm_head.bias`.
-weights_path = hf_hub_download(repo_id=model_id, filename="model.safetensors")
-state_dict = load_file(weights_path, device="cpu")
-# Only swap the head if both tensors are present in the checkpoint.
-if "lm_head.weight" in state_dict and "lm_head.bias" in state_dict:
-    lm_head = torch.nn.Linear(model.config.n_embd, model.config.vocab_size, bias=True)
-    lm_head.weight = torch.nn.Parameter(state_dict["lm_head.weight"])
-    lm_head.bias = torch.nn.Parameter(state_dict["lm_head.bias"])
-    model.lm_head = lm_head
-device = "cuda" if torch.cuda.is_available() else "cpu"
-model = model.to(device)
-model.eval()
-prompt = "Photosynthesis is the process by which plants"
-inputs = tokenizer(prompt, return_tensors="pt").to(device)
-with torch.no_grad():
-    output_ids = model.generate(
-        **inputs,
-        max_new_tokens=500,
-        do_sample=True,
-        temperature=0.7,
-        top_k=0,
-        top_p=1.0,
-        repetition_penalty=1.3,
-        pad_token_id=tokenizer.eos_token_id,
-    )
-print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
-```
-You can also run the helper script from the GitHub repo:
-```bash
-python infer_hf.py --model_dir hemantvirmani/tinyGPT --prompt "Photosynthesis is the process by which plants"
-```
----
-## Sample Outputs (temperature=0.7, 500 tokens)
-**Prompt:** `Photosynthesis is the process by which plants`
-> Photosynthesis is the process by which plants take in sunlight, water,
-> carbon dioxide and nutrients to produce energy for their cells. Humans
-> depend on photosynthesis to provide their own energy, but many plants
-> also use the energy of other organisms to produce food. The five types of...
-**Prompt:** `The Roman Empire fell due to several factors including`
-> The Roman Empire fell due to several factors including the decline of the
-> Roman army, the rise of the Papacy, and the threat of the Islamic invasion.
-> The fall of the Roman Empire was the result of a series of civil wars in
-> the late fourth century, and was led by the first emperor of the Roman
-> Empire, Constantine the Great.
----
-## Limitations
-- This is a **base language model** — it completes text, it does not follow
-  instructions or answer questions.
-- Prone to repetition loops, especially at low temperature.
-- Fine-tuning is required for instruction-following or domain-specific tasks.
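
One way to see why low temperature makes repetition loops more likely: dividing the logits by a temperature below 1 sharpens the next-token distribution, so the same high-probability continuation is picked again and again. The sketch below uses made-up logits and a hypothetical helper, `softmax_with_temperature`; it illustrates the general sampling mechanism and is not taken from TinyGPT's generation code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by 1 / temperature (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for temperature in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, the probability of the top-scoring token moves toward 1 and sampling behaves more and more like greedy decoding.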
----
-## Thanks to
-- Andrej Karpathy's nanoGPT - Video and Code
-- Dataset: HuggingFace [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)