lehungquangminh committed on
Commit d3b58b6 · verified · 1 Parent(s): 3ce00f4

Upload Viena model

Files changed (1): README.md (+75, −0)
---
language:
- vi
- en
tags:
- viena
- causal-lm
- transformers
- pytorch
- chat
license: mit
library_name: transformers
pipeline_tag: text-generation
---

# Viena Tiny Demo (SFT)

This is a tiny, demo-only Viena checkpoint fine-tuned for instruction following. It is **not** production quality; it is intended for smoke tests and workflow validation.

## Model description

- Architecture: decoder-only Transformer (VienaModel) with RMSNorm, RoPE, SwiGLU, and grouped-query attention (GQA).
- Parameters: ~10M (tiny config).
- Tokenizer: SentencePiece BPE (target vocab size 2000; the actual vocab may be smaller due to the tiny training data).
- Training: a small offline synthetic dataset shipped with the repository.

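As background on the architecture bullets above, here is a minimal PyTorch sketch of RMSNorm as commonly defined. This is an illustration of the technique, not the actual VienaModel implementation, whose details are not shown in this card:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector by the reciprocal of its RMS over the feature dim,
        # then apply the learned per-feature gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

Unlike LayerNorm, RMSNorm only rescales activations, which makes it slightly cheaper and is a common choice in recent decoder-only models.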
## Training data

- Pretrain: `viena_data/examples/pretrain_offline.jsonl`
- SFT: `viena_data/examples/sft_offline_train.jsonl`
- Validation: `viena_data/examples/sft_offline_val.jsonl`

All datasets are synthetic and intended for offline tests.

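The files above are JSONL. Their exact record schema is not documented in this card; assuming the usual convention of one JSON object per line, a minimal reader looks like this:

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

Streaming line by line keeps memory flat even if a dataset file grows large, which is why JSONL is a common format for training corpora.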
## Training recipe (tiny)

- Config: `configs/viena_tiny.yaml`
- Pretrain: 50 steps
- SFT: 20 steps

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vietrix/viena-tiny-demo"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# Marker-delimited chat prompt; triple-quoted so the newlines are part of the string.
prompt = """<|system|>
You are Viena.
<|user|>
Xin chao!
<|assistant|>
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

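The prompt in the example above follows a simple role-marker layout. As an illustration, a hypothetical helper (not part of the released tokenizer or repo) that assembles such a prompt:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble the marker-delimited chat prompt used in the usage example."""
    return f"<|system|>\n{system}\n<|user|>\n{user}\n<|assistant|>\n"
```

Ending the prompt right after `<|assistant|>` is what cues the model to generate the assistant turn.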
## Limitations

- Trained on a very small dataset for very few steps.
- Not suitable for real use or evaluation.
- Likely to hallucinate or be inconsistent.

## License

MIT (code and demo weights). See the repository license for details.