lehungquangminh committed on
Commit 20bd533 · verified · 1 Parent(s): 2b2d549

Upload Viena model

Files changed (1): README.md (+67 -0)
README.md ADDED
---
language:
- vi
- en
tags:
- viena
- causal-lm
- transformers
- pytorch
license: mit
library_name: transformers
pipeline_tag: text-generation
---

# Viena Tiny Pretrain (Base)

This is a tiny, pretrain-only Viena checkpoint. It is **not** instruction tuned.
Use it as a base for further pretraining or SFT; it is intended for smoke tests only.

## Model description

- Architecture: decoder-only Transformer (VienaModel) with RMSNorm, RoPE, SwiGLU, and GQA.
- Parameters: ~10M (tiny config).
- Tokenizer: SentencePiece BPE (target vocab 2000; the actual vocab may be smaller because the training data is tiny).
- Training: a small offline synthetic dataset shipped with the repo.
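
For readers unfamiliar with the building blocks listed above, here is a dependency-free sketch of two of them, RMSNorm and SwiGLU, on plain Python lists. This is illustrative only; it is not VienaModel's actual implementation, and the real model applies these with learned linear projections rather than per-element scalars.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root-mean-square,
    then apply a learned per-element weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(x, w_gate, w_up):
    """SwiGLU: silu(gate branch) elementwise-multiplied with the up branch.
    w_gate and w_up stand in for the gate/up projections of the real MLP."""
    return [silu(g * v) * (u * v) for g, u, v in zip(w_gate, w_up, x)]
```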

## Training data

- Pretrain: `viena_data/examples/pretrain_offline.jsonl`

All datasets are synthetic and intended for offline tests.
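
The pretrain file follows the usual JSONL convention of one JSON object per line; a minimal sketch for streaming it (the `text` field name in the usage comment is an assumption, not confirmed by this card):

```python
import json

def iter_jsonl(path):
    """Yield one parsed JSON object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical usage against the shipped file:
# for record in iter_jsonl("viena_data/examples/pretrain_offline.jsonl"):
#     print(record["text"])  # "text" key is assumed
```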

## Training recipe (tiny)

- Config: `configs/viena_tiny.yaml`
- Pretrain: 50 steps

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vietrix/viena-tiny-demo-pretrain"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

prompt = "Viena la gi?\n"  # Vietnamese: "What is Viena?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Limitations

- Trained on a very small dataset for very few steps.
- Not instruction tuned; outputs are raw continuations of the prompt.
- Not suitable for real use or evaluation.

## License

MIT (code and demo weights). See the repository license for details.