---
library_name: transformers
tags:
  - merlina
  - text-generation
  - orpo
datasets:
  - schneewolflabs/Luna-DPO
base_model:
  - schneewolflabs/A0i-12B
---

# Luna-A0-12B

Second run of an experiment to give the "assistant" a better voice. The result is functional, but writing quality degrades significantly and the model shifts noticeably toward self-censorship.
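The run uses ORPO (see the configuration below), which folds the preference signal into a single odds-ratio penalty added to the usual NLL loss on the chosen response, with no reference model. As a minimal sketch of that objective — not the Merlina implementation — assuming sequence-level average log-probabilities for the chosen and rejected responses and the β = 0.1 from the table:

```python
import math

def log_odds(avg_logp):
    # odds(p) = p / (1 - p), so log-odds = log p - log(1 - p).
    # avg_logp is the length-averaged log-probability of a response.
    p = math.exp(avg_logp)
    return avg_logp - math.log(1.0 - p)

def orpo_loss(avg_logp_chosen, avg_logp_rejected, beta=0.1):
    # NLL on the chosen response plus the weighted odds-ratio term:
    # L = -log P(chosen) + beta * (-log sigmoid(log-odds ratio))
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))
    return -avg_logp_chosen + beta * l_or
```

The penalty shrinks as the model assigns higher odds to the chosen response relative to the rejected one; with β = 0 the loss reduces to plain supervised fine-tuning on the chosen response.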

## Training Configuration

| Parameter | Value |
|---|---|
| Training Mode | ORPO |
| Base Model | schneewolflabs/A0i-12B |
| Learning Rate | 5e-05 |
| Epochs | 1 |
| Batch Size | 1 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 4 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| ORPO Beta | 0.1 |
| Max Prompt Length | 1024 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
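The configuration above maps closely onto a standard QLoRA + ORPO setup. The following is a hedged reconstruction using TRL and PEFT — not the actual Merlina training code — with the `trl`, `peft`, and `bitsandbytes` APIs assumed at roughly their current signatures and the output path invented for illustration:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

# 4-bit NF4 quantization, per the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "schneewolflabs/A0i-12B", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("schneewolflabs/A0i-12B")

# LoRA adapter over all attention and MLP projections
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["up_proj", "down_proj", "gate_proj",
                    "k_proj", "q_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = ORPOConfig(
    output_dir="luna-a0-12b",       # hypothetical output path
    beta=0.1,
    max_length=2048,
    max_prompt_length=1024,
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size 4
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.25,
    seed=42,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("schneewolflabs/Luna-DPO", split="train"),
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```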

## Trained with Merlina

Merlina on GitHub