Active filters: verl
DATEXIS/DeepICD-R1-Llama-8B
Text Generation
• 8B • Updated • 419
• 1
mradermacher/DeepICD-R1-Llama-8B-GGUF
8B • Updated • 527
• 1
mradermacher/DeepICD-R1-Llama-8B-i1-GGUF
8B • Updated • 3.39k
• 1
junnyu/Qwen2.5-7B-Instruct-1M-GRPO_logic_KK_5PPL
Text Generation
• 8B • Updated • 3
sonyashijin/qwen3-32b-verilog-lora
LichengLiu03/Qwen2.5-3B-UFO
Text Generation
• 3B • Updated • 3
• 2
LichengLiu03/Qwen2.5-3B-UFO-1turn
Text Generation
• 3B • Updated • 1
• 2
mradermacher/Qwen2.5-3B-UFO-GGUF
3B • Updated • 73
• 1
mradermacher/Qwen2.5-3B-UFO-1turn-GGUF
3B • Updated • 30
• 1
Text Generation
• 0.6B • Updated • 4
• 2
Jasaxion/MathSmith-HC-Problem-Synthesizer-Qwen3-8B
Text Generation
• 8B • Updated • 22
• 1
Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B
Text Generation
• 8B • Updated • 30
• 1
thejaminator/grpo-feature-vector-step-1
Text Generation
• 8B • Updated • 428
• 9
orbit-ai/orbit-4b-ablation-training-mix-124-v0.1
Text Generation
• 4B • Updated • 24
Text Generation
• 4B • Updated • 83
Text Generation
• Updated
karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360
Text Generation
• 0.5B • Updated
samhitha2601/llama3.2-3b-ppo
Reinforcement Learning
• Updated
samhitha2601/llama3.2-3b-ppo-critic
Reinforcement Learning
• Updated
mradermacher/MathSmith-HC-Problem-Synthesizer-Qwen3-8B-GGUF
8B • Updated • 113
• 1
mradermacher/MathSmith-HC-Problem-Synthesizer-Qwen3-8B-i1-GGUF
8B • Updated • 254
• 1
mradermacher/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B-GGUF
8B • Updated • 116
• 1
mradermacher/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B-i1-GGUF
8B • Updated • 140
• 1
Time-HD-Anonymous/STReasoner-8B
Feature Extraction
• 8B • Updated • 1
archit11/qwen2.5-coder-3b-verl-track-a-lora
Text Generation
• Updated • 2
orbit-ai/infoseeker-repro-4b
Text Generation
• 4B • Updated • 102
mradermacher/infoseeker-repro-4b-GGUF
4B • Updated • 604
mradermacher/orbit-4b-v0.1-GGUF
4B • Updated • 510