lehungquangminh committed on
Commit 20bd533 · verified · 1 Parent(s): 2b2d549

Upload Viena model

Files changed (1): README.md (+67 -0)
README.md ADDED
---
language:
- vi
- en
tags:
- viena
- causal-lm
- transformers
- pytorch
license: mit
library_name: transformers
pipeline_tag: text-generation
---

# Viena Tiny Pretrain (Base)

This is a tiny, pretrain-only Viena checkpoint. It is **not** instruction tuned.
Use it as a base for further pretraining or SFT; it is intended for smoke tests only.

## Model description

- Architecture: decoder-only Transformer (VienaModel) with RMSNorm, RoPE, SwiGLU, and GQA.
- Parameters: ~10M (tiny config).
- Tokenizer: SentencePiece BPE (target vocab 2000; the actual vocab may be smaller because the training data is tiny).
- Training: a small offline synthetic dataset shipped with the repo.
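
For readers unfamiliar with the building blocks listed above, here is a dependency-free sketch of two of them, RMSNorm and SwiGLU, on plain Python lists. This is illustrative only; it is not VienaModel's actual implementation, and the real model applies these with learned linear projections rather than per-element scalars.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root-mean-square,
    then apply a learned per-element weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(x, w_gate, w_up):
    """SwiGLU: silu(gate branch) elementwise-multiplied with the up branch.
    w_gate and w_up stand in for the gate/up projections of the real MLP."""
    return [silu(g * v) * (u * v) for g, u, v in zip(w_gate, w_up, x)]
```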

## Training data

- Pretrain: `viena_data/examples/pretrain_offline.jsonl`

All datasets are synthetic and intended for offline tests.
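
The pretrain file follows the usual JSONL convention of one JSON object per line; a minimal sketch for streaming it (the `text` field name in the usage comment is an assumption, not confirmed by this card):

```python
import json

def iter_jsonl(path):
    """Yield one parsed JSON object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical usage against the shipped file:
# for record in iter_jsonl("viena_data/examples/pretrain_offline.jsonl"):
#     print(record["text"])  # "text" key is assumed
```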

## Training recipe (tiny)

- Config: `configs/viena_tiny.yaml`
- Pretrain: 50 steps

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vietrix/viena-tiny-demo-pretrain"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

prompt = "Viena la gi?\n"  # Vietnamese: "What is Viena?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Limitations

- Trained on a very small dataset for very few steps.
- Not instruction tuned; outputs are raw continuations of the prompt.
- Not suitable for real use or evaluation.

## License

MIT (code and demo weights). See the repository license for details.