---
language:
- ko
- en
license: apache-2.0
tags:
- pretrained
- causal-lm
- korean
- llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — Pretrained Base

Raw pretrained language model; the foundation for all EVAFRILL-Mo downstream variants.

## Training Stage

Pretraining from scratch on a mixed Korean/English corpus.

## Key Details

- **Steps**: 319,772 (~93% of the Chinchilla-optimal budget)
- **Tokens**: ~55B
- **Hardware**: 7× NVIDIA B200 GPUs (DDP)
- **Precision**: BF16
- **Architecture**: Transformer decoder, 3B parameters

## Metrics

| Metric | Value |
|--------|-------|
| Final train loss | — |
| Chinchilla efficiency | ~93% |

## Notes

This is the **raw pretrained model**, with no instruction tuning or alignment applied. It is not suitable for direct chat/instruction use — use one of the fine-tuned variants below.

## Variants

| Variant | Description |
|---------|-------------|
| [sft-v2](../sft-v2/) | Instruction-tuned (recommended starting point) |
| [slerp](../slerp/) | SLERP merge — best overall (recommended) |
| [dpo-r1](../dpo-r1/) | DPO alignment round 1 |

## Main Model Card

See the [main README](../../README.md) for full project details, architecture, and training history.

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")

# Completion-style generation (this is a base model: no chat template)
inputs = tokenizer("The capital of Korea is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
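Since the base model only continues text, it tends to respond better to completion-style or few-shot prompts than to bare questions. A minimal sketch of few-shot prompt formatting (the `build_few_shot_prompt` helper and the example Q/A pairs are illustrative, not part of this repository):

```python
def build_few_shot_prompt(examples, query):
    """Format (question, answer) pairs as a completion-style few-shot
    prompt for a raw base model, which has no chat template."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    # End with an open "A:" so the model completes the answer.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("What is the capital of Korea?", "Seoul")],
    "What is the capital of France?",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of the plain prompt in the Usage example above.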