---
language:
- ko
- en
license: apache-2.0
tags:
- pretrained
- causal-lm
- korean
- llm
pipeline_tag: text-generation
---
# EVAFRILL-Mo 3B — Pretrained Base
The raw pretrained language model that serves as the foundation for all EVAFRILL-Mo downstream variants.
## Training Stage
Pretraining from scratch on a mixed Korean/English corpus.
## Key Details
- **Steps**: 319,772 (~93% of the Chinchilla-optimal budget)
- **Tokens**: ~55B tokens
- **Hardware**: 7× NVIDIA B200 GPUs (DDP)
- **Precision**: BF16
- **Architecture**: Transformer decoder, 3B parameters
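From the figures above, one can back out the implied tokens processed per optimizer step (a rough sanity check, assuming the ~55B tokens are spread evenly over all 319,772 steps; the actual global batch size is not stated in this card):

```python
# Implied global batch in tokens, assuming uniform token throughput per step
tokens = 55e9       # ~55B total training tokens (from Key Details)
steps = 319_772     # total optimizer steps (from Key Details)
tokens_per_step = tokens / steps
print(f"~{tokens_per_step / 1e3:.0f}K tokens per optimizer step")  # → ~172K
```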
## Metrics
| Metric | Value |
|--------|-------|
| Final train loss | — |
| Chinchilla efficiency | ~93% |
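The efficiency figure can be cross-checked against the common Chinchilla rule of thumb of roughly 20 training tokens per parameter (an assumption; the exact ratio the authors used is not stated in this card):

```python
# Chinchilla rule of thumb: compute-optimal token count ≈ 20 × parameter count
params = 3e9            # 3B parameters
tokens = 55e9           # ~55B training tokens
optimal = 20 * params   # ≈ 60B tokens for a 3B model
efficiency = tokens / optimal
print(f"{efficiency:.0%}")  # → 92%, roughly matching the reported ~93%
```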
## Notes
This is the **raw pretrained model** with no instruction tuning or alignment applied.
It is not suitable for chat/instruction use directly — use one of the fine-tuned variants below.
## Variants
| Variant | Description |
|---------|-------------|
| [sft-v2](../sft-v2/) | Instruction-tuned (recommended starting point) |
| [slerp](../slerp/) | SLERP merge — best overall (recommended) |
| [dpo-r1](../dpo-r1/) | DPO alignment round 1 |
## Main Model Card
See the [main README](../../README.md) for full project details, architecture, and training history.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in BF16 (matches the pretraining precision)
model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")

# Base model: plain text continuation only, no chat template
inputs = tokenizer("한국의 수도는", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```