vietrix
/

viena-60m

Text Generation

text-generation-inference

Model card Files Files and versions

lehungquangminh commited on Jan 18

Commit

ac2dce6

·

verified ·

1 Parent(s): 4dc562d

Add model card

Files changed (1) hide show

README.md +72 -0

README.md ADDED Viewed

	@@ -0,0 +1,72 @@

+---
+language: vi
+tags:
+- vietnamese
+- causal-lm
+- finetuning
+- viena
+library_name: transformers
+pipeline_tag: text-generation
+license: other
+base_model: vietrix/viena-60m-pretrain
+---
+# Viena 60M (SFT)
+## Model details
+- Developed by: Vietrix
+- Model type: decoder-only causal LM (Llama-style)
+- Parameters: ~60M
+- Layers: 16
+- Hidden size: 512
+- Attention heads: 8 (KV heads: 4)
+- Max sequence length: 1024
+- RoPE theta: 10000
+- Normalization/MLP: RMSNorm + SwiGLU
+- Precision: BF16 training
+## Tokenizer
+- SentencePiece BPE
+- Target vocab in config: 32k
+- Actual vocab in tokenizer.model: 2105 (trained on a small corpus)
+- Note: embeddings are sized for 32k; only the first 2105 tokens are used by the tokenizer.
+## Training data
+- Internal synthetic Vietnamese instruction/chat data.
+- Train/val split: 2,000 / 200 JSONL records.
+- Format: messages with roles (system/user/assistant/tool).
+- PII: best-effort redaction applied during dataset preparation.
+## Fine-tuning procedure
+- Initialized from: `vietrix/viena-60m-pretrain`.
+- Objective: token-level cross-entropy, prompt loss disabled.
+- Sequence length: 1024.
+- Global batch size: 32 (batch 8 x grad_accum 4).
+- Optimizer: AdamW, lr 2e-4, weight decay 0.01, cosine decay with warmup.
+- Steps: 1,000.
+- Validation every 200 steps (10 batches).
+## Intended use
+- Vietnamese chat/instruction-following use cases.
+- Research and prototyping; not a production-grade safety model.
+## Limitations
+- Trained on a small synthetic corpus; may hallucinate or respond incorrectly.
+- Not safety-tuned for sensitive domains.
+- Tokenizer vocab is small; lexical coverage is limited.
+## How to use
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "vietrix/viena-60m"
+tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
+model = AutoModelForCausalLM.from_pretrained(model_id)
+```
+If `AutoTokenizer` fails, load the SentencePiece model explicitly:
+```python
+from transformers import LlamaTokenizer
+tokenizer = LlamaTokenizer.from_pretrained(model_id, use_fast=False)
+```