# Vritya Tiny (163M) – Transformers
A small GPT-style causal language model exported from the original PyTorch `best_model.pth` checkpoint. The vocabulary matches GPT-2's BPE (50,257 tokens); use the bundled tokenizer or any compatible GPT-2 tokenizer.
## Load
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "curious-techie/Vritya-Tiny-163M-HF",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "curious-techie/Vritya-Tiny-163M-HF",
    trust_remote_code=True,
)
```
Loading requires `trust_remote_code=True` because the architecture is defined in `modeling_vritya.py` in this repo.
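Once loaded, the model can be used for text generation like any Transformers causal LM. A minimal sketch using the objects from the snippet above; the prompt and decoding settings are arbitrary placeholders, not values from this repo:

```python
import torch

# Tokenize a sample prompt (placeholder text, not from the model card).
inputs = tokenizer("Once upon a time", return_tensors="pt")

# Sampled decoding; the generation settings here are illustrative only.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers define no pad token
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```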
## Model facts (default config)
- ~163M parameters (see `config.json`)
- Context length: 1024 tokens
- 12 layers, 12 attention heads, embedding dim 768
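You can sanity-check the parameter count against a loaded model directly; this is a generic check, not a script shipped with the repo:

```python
# Count the parameters of the loaded model (should come out near 163M).
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```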
## Source
Derived from the project checkpoint published as `best_model.pth` on curious-techie/Vritya-Tiny-163M.
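If you want the original PyTorch checkpoint rather than the converted weights, it can be fetched with `huggingface_hub`, assuming the repo id and file name given above:

```python
from huggingface_hub import hf_hub_download

# Download the original checkpoint file from the source repo named above.
ckpt_path = hf_hub_download(
    repo_id="curious-techie/Vritya-Tiny-163M",
    filename="best_model.pth",
)
print(ckpt_path)
```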