ZygAI-OSS-138M 🇱🇹
A 138M-parameter Lithuanian language model, built from scratch and trained to answer questions.
ZygAI-OSS-138M is a 138.6-million-parameter Lithuanian large language model built entirely from scratch using a custom Transformer architecture. It has undergone Supervised Fine-Tuning (SFT) to act as a conversational assistant that aims to answer questions truthfully in Lithuanian.
Note: This repository includes the SFT (Supervised Fine-Tuned) version of the model, which understands the `Question: [prompt]\nAnswer:` format and uses a custom `<EOS>` token to cleanly stop generating text once the answer is complete.
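
The prompt format in the note above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own inference code: the helper names `build_prompt` and `extract_answer` are hypothetical, the sample continuation is made up to stand in for real model output, and whether a space should follow `Answer:` is an assumption to check against the tokenizer.

```python
EOS_TOKEN = "<EOS>"  # custom stop token named in the note above


def build_prompt(question: str) -> str:
    """Wrap a user question in the Question/Answer template (hypothetical helper)."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(generated: str) -> str:
    """Keep only the text before the first <EOS> marker, if one is present."""
    return generated.split(EOS_TOKEN, 1)[0].strip()


prompt = build_prompt("Kas yra Lietuvos sostinė?")

# Made-up continuation standing in for real model output.
raw = " Vilnius.<EOS> extra tokens"

print(prompt)
print(extract_answer(raw))
```

Cutting at the first `<EOS>` on the text side is a fallback; a generation loop would normally stop as soon as the `<EOS>` token id is sampled.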
Architecture
A Decoder-only Transformer, comparable in scale to GPT-2 Small.
| Parameter | Value |
|---|---|
| Total Parameters | 138.6M |
| Layers | 16 |
| Attention Heads | 12 |
| Model Dimension (d_model) | 768 |
| Context Length | 1024 tokens |
| Vocabulary Size | 16,000 (Custom BPE Tokenizer) |
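
As a sanity check on the table, the total can be estimated from the listed dimensions. The sketch below assumes a GPT-2-style layout that this card does not confirm: a fused QKV projection, a 4x MLP expansion, learned positional embeddings, two norms per block, and an output head that is not tied to the token embedding. The function name `count_params` and its flags are hypothetical.

```python
def count_params(n_layers=16, d_model=768, vocab=16_000, ctx=1024,
                 bias=False, tied_head=False):
    """Estimate parameters for a GPT-2-style decoder (assumed layout)."""
    # Attention: fused QKV (d x 3d) plus output projection (d x d).
    attn = 4 * d_model * d_model + (4 * d_model if bias else 0)
    # MLP with an assumed 4x hidden expansion: d x 4d and 4d x d.
    mlp = 8 * d_model * d_model + (5 * d_model if bias else 0)
    # Two norms per block: scale only, plus a shift term when bias=True.
    norms = 2 * d_model * (2 if bias else 1)
    blocks = n_layers * (attn + mlp + norms)
    # Token embedding plus learned positional embedding (assumed).
    embeddings = vocab * d_model + ctx * d_model
    final_norm = d_model * (2 if bias else 1)
    # Untied output projection unless tied_head=True.
    head = 0 if tied_head else vocab * d_model
    return blocks + embeddings + final_norm + head


head_dim = 768 // 12  # per-head width implied by the table
print(head_dim)
print(f"{count_params() / 1e6:.1f}M")
```

Under these assumptions (no bias terms, untied head) the estimate lands close to the stated 138.6M. That is consistent with such a layout but does not prove it is the one used.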
⚡ Training
Trained on a single NVIDIA RTX A5000 (24GB VRAM) GPU on RunPod with the following PyTorch optimizations:
- BFloat16 + TF32: mixed precision for speed and stability
- FlashAttention: via `F.scaled_dot_product_attention`
- `torch.compile`: kernel fusion and architecture-level acceleration
- Gradient Checkpointing: saves a large amount of VRAM, allowing a 138M model to train on a 24GB GPU
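
The optimizations listed above can be combined roughly as follows. This is a hedged sketch rather than the project's training code: `CausalSelfAttention` and `forward_with_checkpointing` are hypothetical names, the layer is a generic attention module sized to match the table, and the `torch.compile` call is left commented out because it needs a working compiler toolchain.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

# TF32 matmuls only take effect on Ampere-or-newer CUDA GPUs; harmless elsewhere.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True


class CausalSelfAttention(nn.Module):
    """Hypothetical attention layer, not the project's actual module."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, t, d) -> (b, heads, t, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        # PyTorch selects a fused kernel here (FlashAttention where supported).
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(b, t, d))


def forward_with_checkpointing(layer: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Recompute activations during backward instead of storing them.
    return checkpoint(layer, x, use_reentrant=False)


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = CausalSelfAttention().to(device)
# layer = torch.compile(layer)  # optional: kernel fusion via the compiler

x = torch.randn(2, 16, 768, device=device, requires_grad=True)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    out = forward_with_checkpointing(layer, x)
out.float().sum().backward()
```

In a full model, checkpointing is usually applied per Transformer block, trading extra compute in the backward pass for lower activation memory.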
| Detail | Value |
|---|---|
| Dataset | lt_corpus.txt (~94 MB; Lithuanian Wikipedia + other texts) |
| Training Duration | ~6.5 hours (15,000 optimization steps) + SFT Phase |
| Best Validation Loss | ~3.45 |
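
For a rough sense of what the table's figures imply, the numbers can be converted as below. Two assumptions are made here that the card does not state: that the validation loss is mean cross-entropy in nats per token, and that the ~6.5 hours covers only the 15,000 optimization steps, not the SFT phase.

```python
import math

val_loss = 3.45   # best validation loss from the table (approximate)
steps = 15_000    # optimization steps from the table
hours = 6.5       # training duration from the table (approximate)

# Perplexity is exp(loss) when the loss is cross-entropy in nats per token.
perplexity = math.exp(val_loss)

# Average wall-clock time per optimization step.
seconds_per_step = hours * 3600 / steps

print(f"perplexity ~ {perplexity:.1f}")
print(f"seconds per step ~ {seconds_per_step:.2f}")
```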
⚠️ Known Limitations
Hallucinations: While SFT has drastically reduced base-model rambling, the model is still relatively small (~138M parameters) and may occasionally hallucinate facts or struggle with complex reasoning.
Recommended generation settings: a temperature between 0.6 and 0.8 with top_k=50.
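
To illustrate what these settings do, here is a minimal pure-Python sketch of temperature plus top-k sampling over a list of raw logits. The function name `sample_top_k` is hypothetical and this is not the repository's generation code; it only shows the technique the settings refer to.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature and top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep the indices of the top_k largest logits.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the kept logits,
    # shifted by the maximum for numerical stability.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [0.1, 5.0, 0.3, -2.0]
token_id = sample_top_k(logits, temperature=0.7, top_k=2, rng=random.Random(0))
```

Lower temperatures concentrate probability on the highest-scoring tokens, while `top_k` removes the long tail entirely. Both tend to reduce off-topic continuations in a small model.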
🔮 Roadmap
- Add an English dataset → bilingual (LT + EN) model
- Instruction Fine-Tuning → conversational assistant capability (SFT Complete!)
Special Thanks
A huge thank you to everyone who inspired, supported, and made this project possible:
Ruby2001 · 0daysophie · italian_tech_person · Julia's Tech Spot · RunPod
Built in Lithuania 🇱🇹 · ZygMediaGroup