Viena 60M (Pretrain)

Model details

  • Developed by: Vietrix
  • Model type: decoder-only causal LM (Llama-style)
  • Parameters: ~60M (F32 safetensors checkpoint: 66.7M params)
  • Layers: 16
  • Hidden size: 512
  • Attention heads: 8 (KV heads: 4)
  • Max sequence length: 1024
  • RoPE theta: 10000
  • Normalization/MLP: RMSNorm + SwiGLU
  • Precision: BF16 training
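The hyperparameters above imply the per-head and grouped-query-attention shapes; a minimal sketch of that arithmetic (nothing here is stated in the card beyond the numbers themselves):

```python
# Derive attention shapes implied by the card's hyperparameters.
hidden_size = 512
num_heads = 8
num_kv_heads = 4

head_dim = hidden_size // num_heads    # 512 / 8 = 64 dims per head
kv_groups = num_heads // num_kv_heads  # GQA: 2 query heads share each KV head

print(head_dim, kv_groups)  # 64 2
```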

Tokenizer

  • SentencePiece BPE
  • Target vocab in config: 32k
  • Actual vocab in tokenizer.model: 2105 (trained on a small corpus)
  • Note: embeddings are sized for 32k; only the first 2105 tokens are used by the tokenizer.
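The vocab mismatch means most embedding rows correspond to token IDs the tokenizer never emits. A quick sketch of the gap, assuming "32k" in the config means exactly 32,000:

```python
# Count embedding rows whose token IDs the tokenizer never produces.
# Assumption: the "32k" target vocab is exactly 32,000.
config_vocab = 32_000  # embedding rows allocated in the model
actual_vocab = 2_105   # pieces actually present in tokenizer.model

unused_rows = config_vocab - actual_vocab
print(unused_rows)  # 29895 token IDs (2105..31999) are never emitted by the tokenizer
```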

Training data

  • Internal synthetic Vietnamese pretrain corpus.
  • Domains: Vietnam/general, math, code, identity.
  • Raw JSONL entries: ~2.4k; after cleanup/dedupe, HF dataset contains 472 unique texts.
  • PII: best-effort redaction during dataset build.

Training procedure

  • Objective: next-token prediction with packed sequences.
  • Sequence length: 1024.
  • Global batch size: 64 (micro-batch 16 × grad accumulation 4).
  • Optimizer: AdamW, lr 3e-4, weight decay 0.1, cosine decay with 10% warmup.
  • Steps: 2,500 (approx 163.8M tokens processed).
  • Checkpoints saved every 1,250 steps.
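The token count in the list above follows directly from steps × global batch × sequence length:

```python
# Verify the card's "approx 163.8M tokens processed" figure.
steps = 2_500
global_batch = 64  # micro-batch 16 x grad accumulation 4
seq_len = 1_024

tokens = steps * global_batch * seq_len
print(tokens)  # 163840000, i.e. ~163.8M
```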

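The learning-rate schedule (cosine decay with 10% warmup over 2,500 steps) can be sketched as below. The card does not state a minimum LR, so this assumes decay to zero:

```python
import math

# Cosine schedule with linear warmup, per the card's optimizer settings.
# Assumption: final LR decays to 0 (the card does not specify a floor).
total_steps = 2_500
warmup_steps = int(0.10 * total_steps)  # 250
base_lr = 3e-4

def lr_at(step: int) -> float:
    """Linear warmup to base_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```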
Intended use

  • Base model for continued training or fine-tuning on Vietnamese tasks.
  • Not instruction-tuned; outputs have no alignment or safety tuning and may not follow instructions.

Limitations

  • Trained on a small synthetic corpus; coverage and factuality are limited.
  • Not suitable for safety-critical or high-stakes applications.
  • Tokenizer vocab is much smaller than model vocab; lexical coverage is limited.

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vietrix/viena-60m-pretrain"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id)

If AutoTokenizer fails, load the SentencePiece model explicitly:

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(model_id)

LlamaTokenizer is the slow SentencePiece-backed class, so no use_fast flag is needed.