---
license: apache-2.0
language:
- en
---
# 🧠 Fine-Tuned Nano LLM
This repository contains a fine-tuned version of a small language model, trained with **LoRA / QLoRA**.
---
## 📌 Model Overview
- **Base Model:** Nano LLM
- **Fine-tuning Method:** LoRA / QLoRA
- **Dataset:** Custom synthetic + curated dataset
- **Task:** Text generation
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Environment:** Google Colab Free Tier
---
## 🧪 Training Details
- Mixed precision (fp16 compute with nf4 4-bit quantization)
- Gradient checkpointing enabled
- LoRA rank and hyperparameters chosen to fit low-memory hardware (see the configuration sketch below)
- Trained on a Hugging Face `DatasetDict` with explicit train/validation splits
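
A QLoRA setup along these lines is typical for this kind of run. This is a minimal sketch only: the actual rank, target modules, and hyperparameters used for this checkpoint are not recorded in this card, so every value below is an assumption.

```python
# Hypothetical QLoRA configuration sketch; the rank, target modules, and
# hyperparameters are assumptions, not values taken from this repo.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# nf4 4-bit quantization with fp16 compute (matches "fp16 / nf4" above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Small LoRA rank keeps the trainable-parameter count low on free-tier GPUs
lora_config = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```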
---
## 📂 Files Included
This repo contains the following files; a sketch showing how the adapter can be attached follows the list:
- `config.json`
- `adapter_config.json`
- `adapter_model.bin`
- `tokenizer.json`
- `tokenizer.model`
- `special_tokens_map.json`
- `generation_config.json`
- `training_args.bin`
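
Since the LoRA weights ship as a separate adapter (`adapter_config.json`, `adapter_model.bin`), they can in principle be attached to a base model with PEFT. This is a sketch under the assumption that the base model is `transformers`-compatible; `your-username/tinygpt-base-model` is a placeholder repo id.

```python
# Sketch: attaching the LoRA adapter with PEFT. Assumes a transformers-
# compatible base model; the repo id below is a placeholder.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "your-username/tinygpt-base-model", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "your-username/tinygpt-base-model")
model.eval()
```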
---
## ⚙️ How to Load the Model

```python
from modeling_tinygpt import TinyGPT
import torch
import json

# Rebuild the architecture from the saved configuration
with open("config.json") as f:
    cfg = json.load(f)

model = TinyGPT(
    vocab_size=cfg["vocab_size"],
    d_model=cfg["d_model"],
    n_heads=cfg["n_heads"],
    n_layers=cfg["n_layers"],
    d_ff=cfg["d_ff"],
    max_seq_len=cfg["max_seq_len"],
)

# Load the full weights on CPU (assumes a merged pytorch_model.bin is present
# alongside the adapter files listed above), then switch to inference mode
state = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state)
model.eval()
```
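
Once the weights are loaded, text can be produced with a simple greedy loop. This is a sketch under two assumptions: that `model(input_ids)` returns logits of shape `(batch, seq_len, vocab_size)` (TinyGPT's actual forward signature is not documented here), and that `tokenizer.json` is a fast-tokenizer file loadable with the `tokenizers` library.

```python
# Greedy decoding sketch; assumes model(ids) -> logits of shape
# (batch, seq_len, vocab_size). TinyGPT's real interface may differ.
import torch
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
ids = torch.tensor([tok.encode("Once upon a time").ids])

with torch.no_grad():
    for _ in range(50):          # generate up to 50 new tokens
        logits = model(ids)
        next_id = logits[0, -1].argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0].tolist()))
```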
---
## 📈 Benchmark & Notes
This model is intended for experimentation, education, and small-scale inference.
Because it uses a custom architecture, load it with `trust_remote_code=True` when using the `transformers` Auto classes, as in the sketch below.
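
A minimal loading sketch via the `transformers` Auto classes, assuming the custom modeling code ships with the repo; `your-username/tinygpt-base-model` is a placeholder id.

```python
# Generic load path with custom code enabled; the repo id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-username/tinygpt-base-model"  # replace with the real repo id
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
```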
---