---
language:
- ko
- en
license: apache-2.0
tags:
- pretrained
- causal-lm
- korean
- llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — Pretrained Base

The raw pretrained language model, and the foundation for all EVAFRILL-Mo downstream variants.

## Training Stage

Pretraining from scratch on a mixed Korean/English corpus.

## Key Details

- **Steps**: 319,772 (~93% of the Chinchilla-optimal budget)
- **Tokens**: ~55B
- **Hardware**: 7× NVIDIA B200 GPUs (DDP)
- **Precision**: BF16
- **Architecture**: Transformer decoder, 3B parameters
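
The budget figure can be sanity-checked against the common Chinchilla rule of thumb of roughly 20 training tokens per parameter. This heuristic is an assumption on our part, not part of this card, and with the rounded counts above it lands slightly below the quoted ~93% (the card's figure presumably uses exact counts):

```python
# Rough sanity check of the Chinchilla budget figure using the common
# ~20-tokens-per-parameter heuristic. The parameter and token counts are
# the rounded values from this card, so the ratio is only approximate.
params = 3e9           # ~3B parameters
tokens = 55e9          # ~55B training tokens
optimal = 20 * params  # Chinchilla-optimal token budget under the heuristic

ratio = tokens / optimal
print(f"{ratio:.0%}")  # ~92% with these rounded figures
```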

## Metrics

| Metric | Value |
|--------|-------|
| Final train loss | — |
| Chinchilla efficiency | ~93% |

## Notes

This is the **raw pretrained model** with no instruction tuning or alignment applied.
It is not suitable for chat/instruction use directly — use one of the fine-tuned variants below.

## Variants

| Variant | Description |
|---------|-------------|
| [sft-v2](../sft-v2/) | Instruction-tuned (recommended starting point) |
| [slerp](../slerp/) | SLERP merge — best overall (recommended) |
| [dpo-r1](../dpo-r1/) | DPO alignment round 1 |
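
For context on the `slerp` variant: spherical linear interpolation (SLERP) merges two checkpoints by interpolating along the arc between their weight vectors rather than the straight chord, which better preserves weight magnitudes. A minimal NumPy sketch of the idea — an illustration of the technique, not the actual merge script used for this project:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight vectors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two weight vectors
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b
```

In practice a merge applies this tensor-by-tensor across the two checkpoints' state dicts, with `t` controlling how far the result sits toward the second model.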

## Main Model Card

See the [main README](../../README.md) for full project details, architecture, and training history.

## Usage

Since this is a raw base model with no chat template, prompt it with plain text to be continued:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")
inputs = tokenizer("대한민국의 수도는", return_tensors="pt")  # "The capital of South Korea is"
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```