---
library_name: transformers
tags:
  - merlina
  - text-generation
  - orpo
datasets:
  - schneewolflabs/Luna-DPO
base_model:
  - schneewolflabs/A0i-12B
---

# Luna-A0-12B

Second run of an experiment to give the "assistant" a better voice. The result is functional, but writing quality degrades significantly and the model shifts noticeably toward self-censorship.
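The run uses ORPO (see the configuration below), which folds the preference signal into a single odds-ratio penalty added to the usual NLL loss on the chosen response, with no reference model. As a minimal sketch of that objective — not the Merlina implementation — assuming sequence-level average log-probabilities for the chosen and rejected responses and the β = 0.1 from the table:

```python
import math

def log_odds(avg_logp):
    # odds(p) = p / (1 - p), so log-odds = log p - log(1 - p).
    # avg_logp is the length-averaged log-probability of a response.
    p = math.exp(avg_logp)
    return avg_logp - math.log(1.0 - p)

def orpo_loss(avg_logp_chosen, avg_logp_rejected, beta=0.1):
    # NLL on the chosen response plus the weighted odds-ratio term:
    # L = -log P(chosen) + beta * (-log sigmoid(log-odds ratio))
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))
    return -avg_logp_chosen + beta * l_or
```

The penalty shrinks as the model assigns higher odds to the chosen response relative to the rejected one; with β = 0 the loss reduces to plain supervised fine-tuning on the chosen response.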

## Training Configuration

| Parameter | Value |
|---|---|
| Training Mode | ORPO |
| Base Model | schneewolflabs/A0i-12B |
| Learning Rate | 5e-05 |
| Epochs | 1 |
| Batch Size | 1 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 4 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| ORPO Beta | 0.1 |
| Max Prompt Length | 1024 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
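The configuration above maps closely onto a standard QLoRA + ORPO setup. The following is a hedged reconstruction using TRL and PEFT — not the actual Merlina training code — with the `trl`, `peft`, and `bitsandbytes` APIs assumed at roughly their current signatures and the output path invented for illustration:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

# 4-bit NF4 quantization, per the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "schneewolflabs/A0i-12B", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("schneewolflabs/A0i-12B")

# LoRA adapter over all attention and MLP projections
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["up_proj", "down_proj", "gate_proj",
                    "k_proj", "q_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = ORPOConfig(
    output_dir="luna-a0-12b",       # hypothetical output path
    beta=0.1,
    max_length=2048,
    max_prompt_length=1024,
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size 4
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.25,
    seed=42,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("schneewolflabs/Luna-DPO", split="train"),
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```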

## Trained with Merlina

Merlina on GitHub