---
language:
- ko
- en
license: apache-2.0
tags:
- pretrained
- causal-lm
- korean
- llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — Pretrained Base

The raw pretrained language model, and the foundation for all EVAFRILL-Mo downstream variants.

## Training Stage

Pretraining from scratch on a mixed Korean/English corpus.

## Key Details

- **Steps**: 319,772 (~93% of the Chinchilla-optimal budget)
- **Tokens**: ~55B
- **Hardware**: 7× NVIDIA B200 GPUs (DDP)
- **Precision**: BF16
- **Architecture**: Transformer decoder, 3B parameters
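
The budget figure can be sanity-checked against the common Chinchilla rule of thumb of roughly 20 training tokens per parameter. This heuristic is an assumption on our part, not part of this card, and with the rounded counts above it lands slightly below the quoted ~93% (the card's figure presumably uses exact counts):

```python
# Rough sanity check of the Chinchilla budget figure using the common
# ~20-tokens-per-parameter heuristic. The parameter and token counts are
# the rounded values from this card, so the ratio is only approximate.
params = 3e9           # ~3B parameters
tokens = 55e9          # ~55B training tokens
optimal = 20 * params  # Chinchilla-optimal token budget under the heuristic

ratio = tokens / optimal
print(f"{ratio:.0%}")  # ~92% with these rounded figures
```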

## Metrics

| Metric | Value |
|--------|-------|
| Final train loss | — |
| Chinchilla efficiency | ~93% |

## Notes

This is the **raw pretrained model** with no instruction tuning or alignment applied.
It is not suitable for chat/instruction use directly — use one of the fine-tuned variants below.

## Variants

| Variant | Description |
|---------|-------------|
| [sft-v2](../sft-v2/) | Instruction-tuned (recommended starting point) |
| [slerp](../slerp/) | SLERP merge — best overall (recommended) |
| [dpo-r1](../dpo-r1/) | DPO alignment round 1 |
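
For context on the `slerp` variant: spherical linear interpolation (SLERP) merges two checkpoints by interpolating along the arc between their weight vectors rather than the straight chord, which better preserves weight magnitudes. A minimal NumPy sketch of the idea — an illustration of the technique, not the actual merge script used for this project:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight vectors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two weight vectors
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b
```

In practice a merge applies this tensor-by-tensor across the two checkpoints' state dicts, with `t` controlling how far the result sits toward the second model.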

## Main Model Card

See the [main README](../../README.md) for full project details, architecture, and training history.

## Usage

Since this is a raw base model with no chat template, prompt it with plain text to be continued:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")
inputs = tokenizer("대한민국의 수도는", return_tensors="pt")  # "The capital of South Korea is"
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```