emojify-dpo

This model is a DPO (Direct Preference Optimization) fine-tune of marioparreno/emojify-sft for text-to-emoji ("emojify") conversion. It has been optimized to prefer high-quality, semantically accurate emojifications.

Model Description

This model further refines the SFT model by training on preference pairs. For each prompt, the model was shown a "chosen" (preferred) response and a "rejected" response, learning to align its outputs with human (or stronger-LLM) preferences for emoji conversion.
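The per-example DPO objective can be sketched in plain Python: the loss is the negative log-sigmoid of the beta-scaled gap between how much the policy prefers the chosen response over the reference model, versus the rejected one (a minimal illustration of the standard DPO loss, not the training code used here; log-probabilities are assumed to be summed over response tokens).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid of the beta-scaled reward margin."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy favors the chosen response more strongly than the
# reference does, the margin is positive and the loss drops below
# log(2) ≈ 0.693; at initialization (policy == reference) it equals log(2).
loss = dpo_loss(-10.0, -20.0, -12.0, -15.0, beta=0.1)
```

Beta (0.1 here, matching the hyperparameters below) controls how strongly the policy is pulled away from the reference model.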

Training Details

Base Model

  • Model: marioparreno/emojify-sft
  • Architecture: Causal LM
  • Context Length: 256 tokens

LoRA Configuration

  • LoRA Rank (r): 16
  • LoRA Alpha: 16
  • LoRA Dropout: 0.0
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Quantization

  • 4-bit Quantization: True
  • 8-bit Quantization: False
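The LoRA and quantization settings above map onto an Unsloth setup roughly as follows (a sketch based on the Unsloth API; treat it as an illustration of the configuration, not the exact training script).

```python
from unsloth import FastModel

# Load the 4-bit quantized base model (8-bit disabled, per the config above)
model, tokenizer = FastModel.from_pretrained(
    model_name="marioparreno/emojify-sft",
    max_seq_length=256,
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank/alpha/dropout and target modules above
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```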

Training Hyperparameters

  • Training Epochs: 2
  • Batch Size (per device): 8
  • Gradient Accumulation Steps: 1
  • Effective Batch Size: 8
  • Learning Rate: 3e-06
  • DPO Beta: 0.1
  • Max Length: 256
  • Max Prompt Length: 512
  • Optimizer: adamw_8bit
  • Weight Decay: 0.01
  • Warmup Ratio: 0.1
  • LR Scheduler: linear
  • Training Method: Direct Preference Optimization (DPO)
  • Gradient Checkpointing: unsloth
  • Training Random Seed: 3407
  • Random State (Model Init): 3407
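The hyperparameters above can be expressed as a TRL DPOConfig (a sketch; the actual run used Unsloth's wrappers, and field names are assumptions based on current TRL versions).

```python
from trl import DPOConfig

config = DPOConfig(
    output_dir="emojify-dpo",        # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,   # effective batch size: 8
    learning_rate=3e-6,
    beta=0.1,                        # DPO beta
    max_length=256,
    max_prompt_length=512,
    optim="adamw_8bit",
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    seed=3407,
)
```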

Training Results

  • Total Training Steps: 450
  • Final Training Loss: 0.4935
  • Rewards / Chosen: -1.2774
  • Rewards / Rejected: -2.5320
  • Reward Accuracy: 0.8750
  • Reward Margin: 1.2546
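The reported metrics are internally consistent: the reward margin is the gap between the mean chosen and mean rejected rewards (both are beta-scaled log-prob ratios against the reference model, so negative absolute values are normal), and reward accuracy is the fraction of pairs where the chosen reward exceeds the rejected one.

```python
# Sanity-check the reported numbers from the results above
rewards_chosen = -1.2774
rewards_rejected = -2.5320
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.2546, matching the reported reward margin
```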

Dataset

This model was trained on the marioparreno/emojify-dpo preference dataset.

Dataset Statistics

  • Total Training Examples: 1800
  • Total Test Examples: 200

Usage

from unsloth import FastModel

# Load the fine-tuned model
model, tokenizer = FastModel.from_pretrained(
    model_name="marioparreno/emojify-dpo",
    max_seq_length=256,
    load_in_4bit=True,
)

# Inference
inputs = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "Translate this text to emoji:"},
        {"role": "user", "content": "I love coding with AI!"},
    ],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=64)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)

Model Size

  • Parameters: 0.3B
  • Tensor Type: BF16
  • Format: Safetensors