---
language:
- ko
- en
license: apache-2.0
tags:
- pretrained
- causal-lm
- korean
- llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — Pretrained Base

Raw pretrained language model; the foundation for all EVAFRILL-Mo downstream variants.

## Training Stage

Pretraining from scratch on a mixed Korean/English corpus.

## Key Details

- **Steps**: 319,772 (~93% of the Chinchilla-optimal budget)
- **Tokens**: ~55B
- **Hardware**: 7× NVIDIA B200 GPUs (DDP)
- **Precision**: BF16
- **Architecture**: Transformer decoder, 3B parameters

## Metrics

| Metric | Value |
|--------|-------|
| Final train loss | — |
| Chinchilla efficiency | ~93% |

## Notes

This is the **raw pretrained model**, with no instruction tuning or alignment applied. It is not suitable for direct chat/instruction use — use one of the fine-tuned variants below.

## Variants

| Variant | Description |
|---------|-------------|
| [sft-v2](../sft-v2/) | Instruction-tuned (recommended starting point) |
| [slerp](../slerp/) | SLERP merge — best overall (recommended) |
| [dpo-r1](../dpo-r1/) | DPO alignment round 1 |

## Main Model Card

See the [main README](../../README.md) for full project details, architecture, and training history.

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")

# Completion-style generation (this is a base model: no chat template)
inputs = tokenizer("The capital of Korea is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
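Since the base model only continues text, it tends to respond better to completion-style or few-shot prompts than to bare questions. A minimal sketch of few-shot prompt formatting (the `build_few_shot_prompt` helper and the example Q/A pairs are illustrative, not part of this repository):

```python
def build_few_shot_prompt(examples, query):
    """Format (question, answer) pairs as a completion-style few-shot
    prompt for a raw base model, which has no chat template."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    # End with an open "A:" so the model completes the answer.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("What is the capital of Korea?", "Seoul")],
    "What is the capital of France?",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of the plain prompt in the Usage example above.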