---
language:
- ko
- en
license: apache-2.0
tags:
- merge
- slerp
- best
- instruction-tuned
- alignment
- korean
- llm
pipeline_tag: text-generation
---
# EVAFRILL-Mo 3B — SLERP Merge (Recommended)

Spherical linear interpolation (SLERP) merge of the SFT v2 and DPO R2 checkpoints. This is the recommended variant for general use.
## Training Stage
Model merge — SLERP interpolation between SFT v2 (50%) and DPO R2 (50%). No additional training was performed; this is a post-hoc weight interpolation.
## Key Details
- Merge method: SLERP (spherical linear interpolation)
- Sources: SFT v2 (50%) + DPO R2 (50%)
- Inference: temp=0.7, repetition_penalty=1.2 recommended
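The recommended inference settings can be sketched as a small helper that applies the repetition penalty and temperature to raw next-token logits before sampling. This is an illustrative sketch only; `apply_sampling` is a hypothetical name, not part of the EVAFRILL-Mo API, and the full generation loop in the Usage section below applies the same logic inline.

```python
import torch

def apply_sampling(logits: torch.Tensor, generated_ids: list,
                   temperature: float = 0.7, rep_penalty: float = 1.2) -> torch.Tensor:
    """Turn 1-D next-token logits into sampling probabilities using the
    recommended settings (temp=0.7, repetition_penalty=1.2)."""
    logits = logits.clone().float()
    for tid in set(generated_ids):
        if logits[tid] > 0:
            logits[tid] /= rep_penalty   # damp already-generated tokens
        else:
            logits[tid] *= rep_penalty   # push negative logits further down
    return torch.softmax(logits / temperature, dim=-1)
```

Tokens that have already been generated are made less likely on the next step, which is what drives the low repetition rate reported above.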
## Metrics
| Metric | Value |
|---|---|
| Repetition rate | 74.5% (lowest among all variants) |
| HellaSwag | 34.6% |
| ARC-Easy | 32.0% |
## Why SLERP
SLERP merging interpolates weights along the unit sphere, better preserving the learned representations from both checkpoints compared to naive linear averaging. The 50/50 split between SFT v2 and DPO R2 achieves the best trade-off between instruction-following quality and repetition reduction across all evaluated variants.
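As a rough illustration of the idea, per-tensor SLERP between two checkpoints can be sketched as below. This is a simplified sketch, not the tooling actually used for this merge (which the card does not specify); `slerp` is a hypothetical helper, and real merge tools handle per-layer interpolation factors and edge cases more carefully.

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5,
          eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.
    t=0.5 corresponds to the 50/50 split used for this merge."""
    a = w_a.flatten().float()
    b = w_b.flatten().float()
    # Angle between the two weight directions on the unit sphere
    dot = torch.clamp((a / (a.norm() + eps)) @ (b / (b.norm() + eps)), -1.0, 1.0)
    omega = torch.acos(dot)
    if omega.abs() < 1e-6:
        # Nearly parallel weights: fall back to plain linear interpolation
        return (1 - t) * w_a + t * w_b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).to(w_a.dtype)
```

Unlike naive averaging, the interpolation follows the arc between the two weight directions, so the magnitude of the merged tensor is not collapsed when the checkpoints point in different directions.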
## Main Model Card
See the main README for full project details, architecture, and training history.
## Usage
**Note:** This is a custom Mamba-2 hybrid architecture — `AutoModelForCausalLM` is not supported. Use direct safetensors loading with the EVAFRILL-Mo source code.
```bash
# Prerequisites
git clone https://github.com/pathcosmos/EVAFRILL-Mo
pip install torch safetensors tokenizers PyYAML
```
```python
import json, torch
from model.config import LMConfig
from model.transformer import LLM
from tokenizers import Tokenizer
from safetensors.torch import load_file as load_safetensors

CKPT = "path/to/slerp"  # this directory

# Build the model config, dropping HF-only metadata keys
with open(f"{CKPT}/config.json") as f:
    data = json.load(f)
for k in ("model_type", "architectures", "_variant", "_description"):
    data.pop(k, None)
cfg = LMConfig(**data)
cfg.use_flash_attn = False

# Load weights directly from safetensors
model = LLM(cfg)
state = load_safetensors(f"{CKPT}/model.safetensors", device="cpu")
model.load_state_dict(state, strict=False)
model = model.to(device="cuda:0", dtype=torch.bfloat16).eval()

tok = Tokenizer.from_file(f"{CKPT}/tokenizer.json")
prompt = "<|user|>\n질문을 여기에 입력하세요\n<|assistant|>\n"  # Korean: "Enter your question here"
ids = torch.tensor([tok.encode(prompt).ids], device="cuda:0")

# Sampling loop with the recommended settings: temp=0.7, repetition_penalty=1.2
with torch.no_grad():
    for _ in range(512):
        logits, _ = model(ids)
        logits = logits[:, -1, :].float()
        # Apply the repetition penalty to tokens already in the sequence
        for prev_id in set(ids[0].tolist()):
            if logits[0, prev_id] > 0:
                logits[0, prev_id] /= 1.2
            else:
                logits[0, prev_id] *= 1.2
        probs = torch.softmax(logits / 0.7, dim=-1)
        next_id = torch.multinomial(probs, 1)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tok.token_to_id("</s>"):
            break

print(tok.decode(ids[0].tolist()))
```
Alternatively, use the wrapped runner from `frankenstallm_test`:

```python
from eval_framework.evafrill_runner import generate

result = generate("한국어로 인사해주세요.")  # Korean: "Please greet me in Korean."
print(result["response"])
```