---
language:
  - ko
  - en
license: apache-2.0
tags:
  - pretrained
  - causal-lm
  - korean
  - llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — Pretrained Base

The raw pretrained language model that serves as the foundation for all EVAFRILL-Mo downstream variants.

## Training Stage

Pretrained from scratch on a mixed Korean/English corpus.

## Key Details

- Steps: 319,772 (~93% of the Chinchilla-optimal budget)
- Tokens: ~55B
- Hardware: 7× NVIDIA B200 GPUs (DDP)
- Precision: BF16
- Architecture: Transformer decoder, 3B parameters

## Metrics

| Metric | Value |
|---|---|
| Final train loss | |
| Chinchilla efficiency | ~93% |
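
The Chinchilla-efficiency figure can be sanity-checked with the common heuristic of roughly 20 training tokens per parameter (the heuristic and the rounding are assumptions here, not stated in this card; the reported token count is itself approximate):

```python
# Rough Chinchilla-budget check, assuming the ~20 tokens-per-parameter heuristic.
params = 3e9                  # 3B parameters
tokens = 55e9                 # ~55B training tokens (approximate)
optimal_tokens = 20 * params  # Chinchilla-optimal token count under the heuristic
ratio = tokens / optimal_tokens
print(f"{ratio:.0%}")  # prints roughly 92%, consistent with the reported ~93%
```

The small gap from the reported ~93% is expected given the rounded token count.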

## Notes

This is the raw pretrained model with no instruction tuning or alignment applied. It is not suitable for chat/instruction use directly — use one of the fine-tuned variants below.

## Variants

| Variant | Description |
|---|---|
| sft-v2 | Instruction-tuned (recommended starting point) |
| slerp | SLERP merge — best overall (recommended) |
| dpo-r1 | DPO alignment round 1 |

## Main Model Card

See the main README for full project details, architecture, and training history.

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in BF16, matching the pretraining precision.
model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")
```