Viena 60M (SFT)

Model details

  • Developed by: Vietrix
  • Model type: decoder-only causal LM (Llama-style)
  • Parameters: ~60M nominal (66.7M in the safetensors checkpoint, counting the full 32k-vocab embedding)
  • Layers: 16
  • Hidden size: 512
  • Attention heads: 8 (KV heads: 4)
  • Max sequence length: 1024
  • RoPE theta: 10000
  • Normalization/MLP: RMSNorm + SwiGLU
  • Precision: BF16 training
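The figures above can be cross-checked with a back-of-the-envelope parameter count. The sketch below assumes a SwiGLU intermediate size of 1536 and tied input/output embeddings (neither is stated in this card); with those assumptions it reproduces the ~66.7M safetensors size, since the full 32k-vocab embedding is counted.

```python
# Back-of-the-envelope parameter count for the architecture above.
# ASSUMPTIONS (not stated in the model card): SwiGLU intermediate
# size of 1536 and tied input/output embeddings.
vocab, d_model, n_layers = 32_000, 512, 16
n_heads, n_kv_heads, d_ff = 8, 4, 1536
head_dim = d_model // n_heads      # 64
kv_dim = n_kv_heads * head_dim     # 256 (grouped-query attention)

embed = vocab * d_model                              # token embeddings
attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Wq, Wo + Wk, Wv
mlp = 3 * d_model * d_ff                             # gate, up, down (SwiGLU)
norms = 2 * d_model                                  # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embed + n_layers * per_layer + d_model       # + final RMSNorm
print(f"{total / 1e6:.1f}M")  # 66.7M, matching the safetensors size
```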

Tokenizer

  • SentencePiece BPE
  • Target vocab in config: 32k
  • Actual vocab in tokenizer.model: 2105 (trained on a small corpus)
  • Note: embeddings are sized for 32k; only the first 2105 tokens are used by the tokenizer.
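Because the embedding matrix (32k rows) is larger than the live vocabulary (2105 tokens), every id the tokenizer can emit still indexes a valid row, so no embedding resize is needed. A minimal sketch of that invariant:

```python
# The tokenizer only produces ids in [0, 2105); the embedding table has
# 32,000 rows, so every id is in range and the extra rows are simply unused.
tokenizer_vocab_size = 2105   # actual vocab in tokenizer.model
embedding_rows = 32_000       # vocab_size in the model config

def ids_fit(vocab_size: int, rows: int) -> bool:
    """True if every possible token id indexes a valid embedding row."""
    return vocab_size <= rows

assert ids_fit(tokenizer_vocab_size, embedding_rows)
unused_rows = embedding_rows - tokenizer_vocab_size
print(unused_rows)  # 29895 embedding rows are never selected
```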

Training data

  • Internal synthetic Vietnamese instruction/chat data.
  • Train/val split: 2,000 / 200 JSONL records.
  • Format: messages with roles (system/user/assistant/tool).
  • PII: best-effort redaction applied during dataset preparation.
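Each record is one JSON object per line carrying a list of role-tagged messages. The exact field names below are an assumption based on the common chat schema; the card only specifies the roles used.

```python
import json

# Hypothetical single JSONL record in the messages format described above
# (field names assumed; the card only names the roles system/user/assistant/tool).
line = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a helpful Vietnamese assistant."},
        {"role": "user", "content": "Xin chào!"},
        {"role": "assistant", "content": "Chào bạn! Mình có thể giúp gì?"},
    ]
}, ensure_ascii=False)

record = json.loads(line)
roles = [m["role"] for m in record["messages"]]
print(roles)  # ['system', 'user', 'assistant']
```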

Fine-tuning procedure

  • Initialized from: vietrix/viena-60m-pretrain.
  • Objective: token-level cross-entropy with prompt loss disabled (loss computed only on assistant tokens; prompt positions are masked out).
  • Sequence length: 1024.
  • Global batch size: 32 (batch 8 x grad_accum 4).
  • Optimizer: AdamW, lr 2e-4, weight decay 0.01, cosine decay with warmup.
  • Steps: 1,000.
  • Validation every 200 steps (10 batches).
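"Prompt loss disabled" means prompt positions receive the ignore label (-100 in the usual PyTorch convention) so cross-entropy is computed only on response tokens. A framework-free sketch of that masking, using a hypothetical token layout:

```python
IGNORE_INDEX = -100  # conventional ignore label for cross-entropy losses

def mask_prompt_loss(token_ids, prompt_len):
    """Copy token ids into labels, masking the first prompt_len positions
    so loss is computed only on the response tokens."""
    return [IGNORE_INDEX if i < prompt_len else t
            for i, t in enumerate(token_ids)]

# Hypothetical 8-token sequence: 5 prompt tokens, then 3 assistant tokens.
labels = mask_prompt_loss([11, 22, 33, 44, 55, 66, 77, 88], prompt_len=5)
print(labels)  # [-100, -100, -100, -100, -100, 66, 77, 88]
```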

Intended use

  • Vietnamese chat/instruction-following use cases.
  • Research and prototyping; not production-grade and not safety-aligned.

Limitations

  • Trained on a small synthetic corpus; may hallucinate or respond incorrectly.
  • Not safety-tuned for sensitive domains.
  • Tokenizer vocab is small; lexical coverage is limited.

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vietrix/viena-60m"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id)

If AutoTokenizer fails, load the SentencePiece model explicitly:

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(model_id, use_fast=False)