Models: 3,206

Active filters: ppo

baek26/billsum_4768_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 2

baek26/dialogsum_9789_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 2

baek26/billsum_6121_bart-billsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 3

baek26/bart-dialogsum-oracle

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 1

baek26/billsum_1703_bart-billsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 1

joen2010/ppo-CartPole-v1

Reinforcement Learning • Updated Apr 17, 2024

baek26/bart-billsum-oracle

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 1

baek26/cnn_dailymail_6849_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 18, 2024 • 1

baek26/cnn_dailymail_886_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 18, 2024 • 3

baek26/cnn_dailymail_7952_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 18, 2024 • 1

baek26/cnn_dailymail_4520_bart-cnndm

Reinforcement Learning • 0.1B • Updated Apr 19, 2024 • 5

baek26/cnn_dailymail_3418_bart-cnndm

Reinforcement Learning • 0.1B • Updated Apr 19, 2024 • 3

pkbiswas/Phi-1_5-Detoxified-PPO-LoRa

Reinforcement Learning • Updated Apr 20, 2024 • 3

PranavBP525/phi-2-storygen-rlGPTf

Reinforcement Learning • Updated Apr 21, 2024 • 2

baek26/all_5483_all_8657_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 21, 2024 • 1

baek26/all_9991_all_8657_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 21, 2024 • 2

baek26/all_9006_all_8657_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 21, 2024 • 3

baek26/all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 22, 2024 • 2

lzacchini/ppo-LunarLander-v2

Reinforcement Learning • Updated May 10, 2024 • 2

conlan/ppo-LunarLander-v3

Reinforcement Learning • Updated Apr 22, 2024

MLIsaac/ppo_from_scratch-LunarLander-v2

Reinforcement Learning • Updated Apr 22, 2024

IrwinD/log_sage_ppo_model

Summarization • 0.2B • Updated Apr 26, 2024 • 8

phoenixaiden33/PPO-LunarLander-v2

Reinforcement Learning • Updated Apr 23, 2024

PranavBP525/phi-2-storygen-rlhf

Reinforcement Learning • Updated Apr 24, 2024 • 4

jiaqianwu/ppo-CartPole-v1

Reinforcement Learning • Updated Apr 24, 2024

SparkleDark/PPO_cart

Reinforcement Learning • Updated Apr 24, 2024

jeliasherrero/LunarLander-v2

Reinforcement Learning • Updated Apr 24, 2024

tarpalsus/LunarLander-v2

Reinforcement Learning • Updated Apr 25, 2024

hossniper/SPPO-LunarLander-v2

Reinforcement Learning • Updated Apr 28, 2024

HusseinEid/ppo-LunarLander-v2-from-scratch

Reinforcement Learning • Updated Apr 28, 2024