---
language:
  - ko
  - en
license: apache-2.0
tags:
  - pretrained
  - causal-lm
  - korean
  - llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — Pretrained Base

The raw pretrained language model that serves as the foundation for all EVAFRILL-Mo downstream variants.

## Training Stage

Pretrained from scratch on a mixed Korean/English corpus.

## Key Details

- Steps: 319,772 (~93% of the Chinchilla-optimal budget)
- Tokens: ~55B
- Hardware: 7× NVIDIA B200 GPUs (DDP)
- Precision: BF16
- Architecture: Transformer decoder, 3B parameters

## Metrics

| Metric | Value |
|---|---|
| Final train loss | |
| Chinchilla efficiency | ~93% |
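
The Chinchilla-efficiency figure can be sanity-checked with the common heuristic of roughly 20 training tokens per parameter (the heuristic and the rounding are assumptions here, not stated in this card; the reported token count is itself approximate):

```python
# Rough Chinchilla-budget check, assuming the ~20 tokens-per-parameter heuristic.
params = 3e9                  # 3B parameters
tokens = 55e9                 # ~55B training tokens (approximate)
optimal_tokens = 20 * params  # Chinchilla-optimal token count under the heuristic
ratio = tokens / optimal_tokens
print(f"{ratio:.0%}")  # prints roughly 92%, consistent with the reported ~93%
```

The small gap from the reported ~93% is expected given the rounded token count.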

## Notes

This is the raw pretrained model with no instruction tuning or alignment applied. It is not suitable for chat/instruction use directly — use one of the fine-tuned variants below.

## Variants

| Variant | Description |
|---|---|
| sft-v2 | Instruction-tuned (recommended starting point) |
| slerp | SLERP merge — best overall (recommended) |
| dpo-r1 | DPO alignment round 1 |

## Main Model Card

See the main README for full project details, architecture, and training history.

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in BF16, matching the pretraining precision.
model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")
```