---
language:
- vi
- en
tags:
- viena
- causal-lm
- transformers
- pytorch
- chat
license: mit
library_name: transformers
pipeline_tag: text-generation
---

# Viena Tiny Demo (SFT)

This is a tiny, demo-only Viena checkpoint fine-tuned for instruction following.
It is **not** production quality; it is intended for smoke tests and workflow validation.

## Model description

- Architecture: decoder-only Transformer (`VienaModel`) with RMSNorm, RoPE, SwiGLU, and grouped-query attention (GQA).
- Parameters: ~10M (tiny config).
- Tokenizer: SentencePiece BPE (target vocab size 2000; the actual vocab may be smaller because the training data is tiny).
- Training: a small offline synthetic dataset shipped with the repository.

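The feed-forward blocks use SwiGLU gating. As a rough illustration only (the real model applies this to projected tensors via learned linear layers, not plain Python lists), the core operation can be sketched as:

```python
import math

def silu(x: float) -> float:
    # SiLU ("swish") activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu(gate: list[float], up: list[float]) -> list[float]:
    # SwiGLU multiplies a SiLU-gated projection with a linear "up" projection
    # elementwise; in the model, `gate` and `up` come from two learned linear
    # layers applied to the same hidden state, and a "down" projection follows.
    return [silu(g) * u for g, u in zip(gate, up)]

print(swiglu([0.0, 2.0], [3.0, 1.0]))
```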
## Training data

- Pretrain: `viena_data/examples/pretrain_offline.jsonl`
- SFT: `viena_data/examples/sft_offline_train.jsonl`
- Validation: `viena_data/examples/sft_offline_val.jsonl`

All datasets are synthetic and intended for offline tests.

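The datasets are plain JSONL (one JSON object per line) and can be read with the standard library alone. The field names below are illustrative assumptions, not the actual schema — check the files themselves:

```python
import json
from pathlib import Path

def read_jsonl(path):
    # Each non-empty line of a JSONL file is one standalone JSON object.
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

# Hypothetical example record; the real schema may differ.
Path("demo.jsonl").write_text(
    '{"prompt": "Xin chao!", "response": "Chao ban!"}\n', encoding="utf-8"
)
print(read_jsonl("demo.jsonl"))
```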
## Training recipe (tiny)

- Config: `configs/viena_tiny.yaml`
- Pretrain: 50 steps
- SFT: 20 steps

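For orientation, a config like `configs/viena_tiny.yaml` typically groups the model and run hyperparameters. The fragment below is purely hypothetical (invented keys and sizes, except the step counts stated above) and only illustrates the general shape of such a file:

```yaml
# Hypothetical sketch -- see configs/viena_tiny.yaml for the real keys/values.
model:
  hidden_size: 256
  num_layers: 4
pretrain:
  steps: 50
sft:
  steps: 20
```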
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vietrix/viena-tiny-demo"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

prompt = "<|system|>\nYou are Viena.\n<|user|>\nXin chao!\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

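The chat template in the usage example can also be assembled programmatically. A small helper (hypothetical, not part of the released tokenizer) that produces the same `<|system|>` / `<|user|>` / `<|assistant|>` layout:

```python
def build_prompt(system: str, user: str) -> str:
    # Assemble the demo chat template: each role tag on its own line,
    # ending with an open <|assistant|> tag for the model to complete.
    return (
        f"<|system|>\n{system}\n"
        f"<|user|>\n{user}\n"
        f"<|assistant|>\n"
    )

print(build_prompt("You are Viena.", "Xin chao!"))
```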
## Limitations

- Trained on a very small dataset for very few steps.
- Not suitable for real use or evaluation.
- Likely to hallucinate or be inconsistent.

## License

MIT (code and demo weights). See the repository license for details.