ZygAI-OSS-138M 🇱🇹

A 138M-parameter Lithuanian language model, built from scratch and trained to answer questions.

ZygAI-OSS-138M is a 138.6 million parameter Lithuanian Large Language Model built entirely from scratch using a custom Transformer architecture. It has undergone Supervised Fine-Tuning (SFT) to act as a conversational assistant that can answer questions truthfully in Lithuanian.

Note: This repository contains the SFT version of the model. It expects prompts in the Question: [prompt]\nAnswer: format and emits a custom <EOS> token to stop generation once the answer is complete.
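As an illustration of the prompt format described above, here is a minimal sketch of building a prompt and trimming the output at the stop token. The helper names build_prompt and extract_answer are hypothetical, and the sketch assumes that the decoded output contains the literal string <EOS> and that no space follows "Answer:" in the template; check both against the actual tokenizer.

```python
EOS_TOKEN = "<EOS>"  # stop marker named in the model card


def build_prompt(question: str) -> str:
    """Wrap a user question in the Question/Answer template (hypothetical helper)."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(generated: str, prompt: str) -> str:
    """Drop the echoed prompt and everything from the first <EOS> onward."""
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    # partition() returns the text before the first <EOS>, or the whole
    # string when the marker is absent.
    answer, _, _ = generated.partition(EOS_TOKEN)
    return answer.strip()
```

Cutting at the first <EOS> in post-processing is a fallback for generation loops that do not stop on the token themselves.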


🏗️ Architecture

A Decoder-only Transformer, comparable in scale to GPT-2 Small.

| Parameter | Value |
| --- | --- |
| Total Parameters | 138.6M |
| Layers | 16 |
| Attention Heads | 12 |
| Model Dimensions (d_model) | 768 |
| Context Length | 1024 tokens |
| Vocabulary Size | 16,000 (Custom BPE Tokenizer) |
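The total can be sanity-checked with a rough parameter count. The sketch below is an estimate under assumptions the card does not state: a GPT-2-style block with a 4x MLP expansion, no bias terms, learned positional embeddings, weight-only norms, and an output head that is not tied to the token embedding. Under those assumptions the arithmetic lands close to the stated figure, but the actual layer layout may differ.

```python
def estimate_params(n_layers, d_model, vocab_size, context_len,
                    mlp_ratio=4, tied_embeddings=False):
    """Rough parameter count for a bias-free, GPT-2-style decoder (assumed layout)."""
    attn = 4 * d_model * d_model             # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projection
    norms = 2 * d_model                      # two norm weight vectors per block
    blocks = n_layers * (attn + mlp + norms)
    tok_emb = vocab_size * d_model           # token embedding table
    pos_emb = context_len * d_model          # learned positional embeddings
    final_norm = d_model
    lm_head = 0 if tied_embeddings else vocab_size * d_model
    return blocks + tok_emb + pos_emb + final_norm + lm_head


total = estimate_params(n_layers=16, d_model=768, vocab_size=16_000, context_len=1024)
print(f"{total / 1e6:.1f}M")  # prints 138.6M
```

The number of attention heads does not change the count; with 12 heads and d_model = 768, each head works on 64 dimensions. Tying the output head to the embedding table would bring the same estimate down to about 126.3M.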

⚡ Training

Trained on a single NVIDIA RTX A5000 (24GB VRAM) GPU on RunPod with the following PyTorch optimizations:

  • BFloat16 + TF32: mixed precision for speed and stability
  • FlashAttention: via F.scaled_dot_product_attention
  • torch.compile: kernel fusion for faster training steps
  • Gradient Checkpointing: recomputes activations during the backward pass to cut VRAM use, helping the 138M model train on a 24 GB GPU

| Detail | Value |
| --- | --- |
| Dataset | lt_corpus.txt (~94 MB; Lithuanian Wikipedia + other texts) |
| Training Duration | ~6.5 hours (15,000 optimization steps) + SFT phase |
| Best Validation Loss | ~3.45 |
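The optimizations listed above can be combined roughly as follows. This is a minimal sketch, not the project's training code: the Block class, run_blocks, enable_tf32, and train_step are hypothetical names, the bias-free pre-norm layout is an assumption, and the squared-output loss is a placeholder for a real language-modeling loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """One pre-norm decoder block (illustrative layout, assumed)."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x):
        b, t, c = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # (batch, time, channels) -> (batch, heads, time, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, c // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        # Fused attention; is_causal=True applies the decoder mask.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(y.transpose(1, 2).reshape(b, t, c))
        return x + self.mlp(self.ln2(x))


def run_blocks(blocks, x, use_checkpoint=True):
    """Run the blocks in order, optionally recomputing activations in backward."""
    for block in blocks:
        if use_checkpoint:
            x = checkpoint(block, x, use_reentrant=False)
        else:
            x = block(x)
    return x


def enable_tf32():
    """Allow TF32 matmuls and convolutions on GPUs that support them."""
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True


def train_step(blocks, x, optimizer):
    """One illustrative step with bf16 autocast around the forward pass."""
    with torch.autocast(device_type=x.device.type, dtype=torch.bfloat16):
        loss = run_blocks(blocks, x).float().pow(2).mean()  # placeholder loss
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Each block (or a full model module) can additionally be wrapped with torch.compile before training. Checkpointing trades extra forward computation for lower activation memory, so it is worth enabling only when memory is the limiting factor.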

⚠️ Known Limitations

Hallucinations: While SFT has drastically reduced base-model rambling, the model is still relatively small (~138M) and may occasionally hallucinate facts or struggle with complex reasoning.

Recommended generation settings: a temperature between 0.6 and 0.8, with top_k=50.
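To show what these two settings do, here is a minimal pure-Python sketch of temperature scaling combined with top-k filtering for a single decoding step. The function name sample_top_k and its signature are hypothetical; it assumes a non-empty list of raw logits, and a real generation loop would normally do the same thing on tensors.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the ids of the top_k highest-scoring tokens.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the survivors; subtracting the
    # maximum keeps exp() numerically stable.
    scaled = [logits[i] / temperature for i in top_ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]
```

Lower temperatures concentrate probability on the highest-scoring tokens, while top_k removes the long tail of unlikely tokens entirely.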


🔮 Roadmap

  • Add an English dataset → bilingual (LT + EN) model
  • Instruction Fine-Tuning → conversational assistant capability (SFT Complete!)

🙏 Special Thanks

A huge thank you to everyone who inspired, supported, and made this project possible:

Ruby2001 · 0daysophie · italian_tech_person · Julia's Tech Spot · RunPod


Built in Lithuania 🇱🇹 · ZygMediaGroup
