---
license: apache-2.0
language:
- en
- ko
- ja
- zh
---

# Tri-1.8B-Base

Tri-1.8B-Base is a 1.8 billion parameter multilingual language model trained as an **early experimental run** before the Tri-7B training.

The model covers **English, Korean, Japanese, and Chinese**, with additional exposure to programming languages and mathematical reasoning.
Pretrained on ~1.88 trillion tokens, it serves as a lightweight base model for research, fine-tuning, and open-source community use, especially for advancing Korean LLM development.

## Model Summary

* Architecture: decoder-only Transformer (LLaMA-style)
* Parameters: ~1.8B (untied embeddings and LM head)
* Layers / hidden size / attention heads: 25 / 2048 / 16
* Feedforward hidden size: 5,632 (SiLU-gated MLP)
* Context length: 4,096
* RoPE θ: 100,000
* Training precision: bfloat16
* Status: base pretraining only (no instruction tuning, no RLHF)

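As a sanity check, the ~1.8B figure can be roughly reproduced from the hyperparameters above. This is a back-of-the-envelope sketch, not an official breakdown: the vocabulary size is not stated in this card, so the value below is a placeholder assumption chosen only to illustrate the arithmetic.

```python
# Rough parameter-count estimate from the hyperparameters listed above.
# NOTE: the vocabulary size is an assumption (not stated in this card),
# picked so the arithmetic lands near the advertised ~1.8B.
n_layers, d_model, d_ffn = 25, 2048, 5632
vocab = 124_000  # assumed, not from the model card

attn = 4 * d_model * d_model      # Q, K, V, O projections per layer
mlp = 3 * d_model * d_ffn         # gate, up, down matrices of a SiLU-gated MLP
per_layer = attn + mlp
embeddings = 2 * vocab * d_model  # untied input embeddings + LM head

total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.2f}B parameters (ignoring norms and biases)")
```

The transformer body alone contributes about 1.28B parameters; because the embeddings and LM head are untied, the remainder scales with the vocabulary size.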
## Intended Use

* As a **foundation** for downstream fine-tuning and alignment.
* Research on multilingual pretraining and adaptation.

## Limitations

* Because this is a base model, outputs may be unsafe, incoherent, or factually incorrect.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "trillionlabs/Tri-1.8B-Base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype="bfloat16",
    device_map="auto",
)

prompt = "Write a short paragraph about Hangul."
x = tok(prompt, return_tensors="pt").to(model.device)
y = model.generate(
    **x,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tok.decode(y[0], skip_special_tokens=True))
```

## License

This model is released under the **Apache 2.0 License**.
See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.

---

## Citation

If you use this model, please cite it as:

```bibtex
@misc{trillionlabs_tri18b_base_2025,
  title  = {Tri-1.8B-Base},
  author = {Trillion Labs},
  year   = {2025},
  note   = {https://huggingface.co/trillionlabs/Tri-1.8B-Base}
}
```