Active filters: ppo
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3 • Reinforcement Learning • 1B • Updated • 1
CatkinChen/nethack-ppo-ablation-no_hmm_rnd • Reinforcement Learning • Updated
CatkinChen/nethack-ppo-ablation-baseline_curiosity_dyn_only • Reinforcement Learning • Updated
joigalcar/ppo-LunarLander-v2_Scratch • Reinforcement Learning • Updated
joigalcar/ppo-LunarLander-v2_Scratch_2 • Reinforcement Learning • Updated
rishiad/kinitro-metaworld-agent • Reinforcement Learning • Updated
CatkinChen/nethack-ppo-ablation-baseline_rnd • Reinforcement Learning • Updated
CatkinChen/nethack-ppo-ablation-baseline_curiosity_skill_only • Reinforcement Learning • Updated
CatkinChen/nethack-ppo-ablation-baseline_curiosity_trans_only • Reinforcement Learning • Updated
OxoGhost/ppo-LunarLander-v2-PPO • Reinforcement Learning • Updated
WillLedd/ppoCartPoleFromScratch • Reinforcement Learning • Updated
nabeelshan/rlhf-gpt2-pipeline • Text Generation • Updated
CatkinChen/nethack-ppo-ablation-baseline_full_curiosity • Reinforcement Learning • Updated
WillLedd/PPO-CleanRL-LunarLander-v2 • Reinforcement Learning • Updated
tstenborg/unit8-LunarLander-v2 • Reinforcement Learning • Updated
timflash/ppo-LunarLander-v2 • Reinforcement Learning • Updated
forgedRice/ppo-LunarLander-v2 • Reinforcement Learning • Updated • 4
forgedRice/drl-course-unit-01-lunar-lander-v2 • Reinforcement Learning • Updated
user05181824/ppo-LunarLander-v3 • Reinforcement Learning • Updated
Text Generation • 0.1B • Updated • 1
ricardo-teixeira9/ppo-LunarLander-v2_unit8 • Reinforcement Learning • Updated
Bavantha11/LunarLander-v2-unit8 • Reinforcement Learning • Updated