---
language:
- ko
- en
license: apache-2.0
tags:
- pretrained
- causal-lm
- korean
- llm
pipeline_tag: text-generation
---
# EVAFRILL-Mo 3B — Pretrained Base
Raw pretrained language model, the foundation for all EVAFRILL-Mo downstream variants.
## Training Stage
Pretraining from scratch on a mixed Korean/English corpus.
## Key Details
- Steps: 319,772 (~93% of the Chinchilla-optimal budget)
- Tokens: ~55B tokens
- Hardware: 7× NVIDIA B200 GPUs (DDP)
- Precision: BF16
- Architecture: Transformer decoder, 3B parameters
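The Chinchilla figure above can be sanity-checked with the common ~20-tokens-per-parameter heuristic (the exact compute-optimal ratio is an approximation from Hoffmann et al., not something this card specifies):

```python
# Rough sanity check of the Chinchilla budget, assuming the
# ~20 tokens-per-parameter compute-optimal heuristic.
params = 3e9           # 3B parameters
tokens = 55e9          # ~55B pretraining tokens
optimal = 20 * params  # ~60B tokens would be compute-optimal for 3B
efficiency = tokens / optimal
print(f"{efficiency:.0%}")  # close to the ~93% quoted above, given rounded counts
```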
## Metrics
| Metric | Value |
|---|---|
| Final train loss | — |
| Chinchilla efficiency | ~93% |
## Notes
This is the raw pretrained model with no instruction tuning or alignment applied. It is not suitable for chat/instruction use directly — use one of the fine-tuned variants below.
## Variants
| Variant | Description |
|---|---|
| sft-v2 | Instruction-tuned (recommended starting point) |
| slerp | SLERP merge — best overall (recommended) |
| dpo-r1 | DPO alignment round 1 |
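The card does not specify the merge recipe behind the `slerp` variant; a common formulation applies spherical linear interpolation per weight tensor between two checkpoints. A minimal NumPy sketch (the function name and interpolation factor `t` are illustrative, not from this project):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    # Normalize to compute the angle between the two weight vectors.
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    # Weight each endpoint so the interpolant moves along the great circle.
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

In a real merge this would be applied tensor-by-tensor over two state dicts (e.g. the SFT and DPO checkpoints), typically with `t` near 0.5.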
## Main Model Card
See the main README for full project details, architecture, and training history.
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")

# Base model: use completion-style prompts, not chat templates.
inputs = tokenizer("대한민국의 수도는", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```