Models: 180

Active filters: preference-learning

usama10/qwen-7b-reward-model

Text Classification • Updated Mar 23

yinuoy/MR2-Molmo2-4B-RM

Image-Text-to-Text • 5B • Updated Apr 24 • 144 • 1

usama10/grpo-tax-qwen-1.5b-dpo

Text Generation • Updated Mar 30

usama10/grpo-tax-qwen-3b-dpo

Text Generation • Updated Mar 30

czyarl/CogFlow-RM

Text Generation • 8B • Updated Apr 27 • 4

HwanChang0106/Mistral-7B-v0.1-ARC-SFT-SimPER

Text Generation • 7B • Updated Apr 27 • 2

rafiakedir/tenacious-bench-adapter

0.9B • Updated May 4 • 5

ChenLingD/Psy-Qwen-DPO-LoRA

Text Generation • Updated Apr 29 • 2

mistire37/tenacious-bench-lora-adapter

JeffCheng12138/qwen3-8b-dpo-ultrafeedback-zh

Updated May 2 • 2

Chalie-lijalem/tenacious-orpo-qwen3-4b

cartgr/embeddings-for-preferences-st5-xl

Sentence Similarity • 1B • Updated May 13 • 953 • 2

hiepphambk/lab22-dpo-vn

Text Generation • 3B • Updated May 8 • 4

datnguyennn/day22-dpo-alignment

Text Generation • Updated May 8

golfoscar/mistral-7b-v0.1-nca-arc

Text Generation • 7B • Updated May 10 • 10

golfoscar/mistral-7b-v0.1-nca-arc-lora

Text Generation • Updated May 10 • 4

Joni-121/gemma-3-1b-it-reasoning-grpo-lora-F16-GGUF

Text Generation • 52.2M • Updated May 13 • 66

caffeic/tinystarcoder-reward-tldr

Summarization • 0.2B • Updated May 27 • 4

ba144220/cs224r-default-project-ipo

Text Generation • 0.5B • Updated May 28

harrrshall/natscore-small-v0

Text-to-Speech • Updated May 29 • 3 • 1

Shaheer05/qwen3-0.6b-dpo-ultrafeedback

Updated May 30 • 2

krishy-d/prosify_qwen_1.5b_lora

Text Generation • Updated 25 days ago • 43

TuneJury/tunejury

Audio Classification • Updated 12 days ago • 500 • 1

yavuz-ai/qwen2.5-3b-dpo-finqa

Text Generation • Updated 12 days ago • 28

Hriday75/qwen2.5-3b-cardio-dpo-aligned

Text Generation • Updated 10 days ago • 60

Hriday75/qwen2.5-3b-oncology-dpo-aligned

Text Generation • Updated 10 days ago • 72

Hriday75/qwen2.5-3b-infectious-disease-dpo-aligned

Text Generation • Updated 10 days ago • 62

Jazhyc/Llama-3.1-8B-aims-le-dpo

Text Generation • Updated 9 days ago • 32

mradermacher/Finch-8B-KTO-GGUF

8B • Updated 1 day ago

mradermacher/Finch-8B-KTO-i1-GGUF

8B • Updated 1 day ago