Model

  • Developed by: DARJYO
  • Base type: Fine-tuned language model
  • Fine-tuned model: persadian_14B-GRPO
  • Base architecture: Transformer-based (Phi-4)

This model was fine-tuned with Unsloth and Hugging Face's TRL library. It is based on the unsloth/Phi-4 model and uses reinforcement learning (GRPO) to improve performance.
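A minimal inference sketch with the `transformers` library, assuming the model is hosted on the Hugging Face Hub under `DARJYO/persadian_14B-GRPO` (the prompt below is illustrative; a 14B model needs roughly 28 GB of memory in fp16, so pass a quantization config or use a GGUF build on smaller hardware):

```python
# Inference sketch for DARJYO/persadian_14B-GRPO (assumed Hub repo id).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DARJYO/persadian_14B-GRPO"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and return the text generated after the prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",   # requires `accelerate`; spreads layers across devices
        torch_dtype="auto",  # use the dtype stored in the checkpoint
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Explain reinforcement learning in one sentence."))
```

The chat template stored with the checkpoint (via `tokenizer.apply_chat_template`) is the safer entry point for conversational use; the plain-prompt call above is the simplest possible smoke test.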


Model tree for DARJYO/persadian_14B-GRPO

  • Base model: microsoft/phi-4
  • Finetuned: unsloth/phi-4
  • Quantized: 31 models
  • This model's quantizations: 1 model