---
license: apache-2.0
language:
- en
- ko
- ja
- zh
---

# Tri-1.8B-Base

Tri-1.8B-Base is a 1.8-billion-parameter multilingual language model trained as an **early experimental run** ahead of the Tri-7B training run.

The model covers **English, Korean, Japanese, and Chinese**, with additional exposure to programming languages and mathematical reasoning. Pretrained on ~1.88 trillion tokens, it serves as a lightweight base model for research, fine-tuning, and open-source community use, especially for advancing Korean LLM development.

## Model Summary

* Architecture: decoder-only Transformer (LLaMA-style)
* Parameters: ~1.8B (untied embeddings and LM head)
* Layers / hidden size / attention heads: 25 / 2048 / 16
* Feedforward hidden size: 5,632 (SiLU-gated MLP)
* Context length: 4,096 tokens
* RoPE θ: 100,000
* Training precision: bfloat16
* Status: base pretraining only (no instruction tuning, no RLHF)

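The figures above can be sanity-checked with a back-of-the-envelope parameter count. The sketch below assumes a standard LLaMA-style layer (bias-free attention and a gate/up/down MLP) and a **hypothetical** vocabulary size of 128,256, which is not stated in this card; all other numbers come from the summary.

```python
# Rough parameter count for a LLaMA-style decoder with the dimensions above.
# VOCAB is an assumption for illustration; the true tokenizer size is not
# listed in this card.
VOCAB = 128_256   # hypothetical
HIDDEN = 2048
LAYERS = 25
FFN = 5632

attn = 4 * HIDDEN * HIDDEN      # q, k, v, o projections (16 heads, no biases)
mlp = 3 * HIDDEN * FFN          # gate, up, down projections (SiLU-gated MLP)
norms = 2 * HIDDEN              # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

embed = VOCAB * HIDDEN          # input embeddings
lm_head = VOCAB * HIDDEN        # untied LM head (counted separately)
final_norm = HIDDEN

total = LAYERS * per_layer + embed + lm_head + final_norm
print(f"~{total / 1e9:.2f}B parameters")
```

Under these assumptions the total comes to roughly 1.81B, consistent with the stated ~1.8B; a different vocabulary size would shift the embedding and LM-head terms accordingly.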
## Intended Use

* As a **foundation** for downstream fine-tuning and alignment.
* Research on multilingual pretraining and adaptation.

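For continued-pretraining or fine-tuning experiments, documents are commonly concatenated and split into fixed-length blocks matching the 4,096-token context. A minimal, framework-free packing sketch (operating on plain token-ID lists; the `eos_id` default is hypothetical, not taken from this card):

```python
def pack_sequences(docs, block_size=4096, eos_id=2):
    """Concatenate tokenized docs (lists of token IDs), separated by an EOS
    token, then split the stream into fixed-length training blocks.
    eos_id=2 is a placeholder, not this model's actual EOS ID."""
    stream = []
    for ids in docs:
        stream.extend(ids)
        stream.append(eos_id)
    # Drop the trailing remainder that does not fill a full block.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
```

In practice a data-loading library would handle this, but the block sizes it emits should match the context length listed above.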
## Limitations

* As a base model with no instruction tuning or safety alignment, it may produce unsafe, incoherent, or factually incorrect outputs.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "trillionlabs/Tri-1.8B-Base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype="bfloat16",
    device_map="auto",
)

prompt = "Write a short paragraph about Hangul."
x = tok(prompt, return_tensors="pt").to(model.device)
y = model.generate(
    **x,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tok.decode(y[0], skip_special_tokens=True))
```

## License

This model is released under the **Apache 2.0 License**.
See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.

---

## Citation

If you use this model, please cite it as:

```bibtex
@misc{trillionlabs_tri18b_base_2025,
  title  = {Tri-1.8B-Base},
  author = {Trillion Labs},
  year   = {2025},
  note   = {https://huggingface.co/trillionlabs/Tri-1.8B-Base}
}
```