Tanmay Jain

tanmayyyj

AI & ML interests

Computer Vision, NLP, Reinforcement Learning

spaces 2

Reward Fn Inference

Reward Fn Trainer

Fine‑tune a language model with DPO using your dataset

models 13

tanmayyyj/ministral-8b-reward-fn-dpo

tanmayyyj/ministral-8b-reward-fn-sft

tanmayyyj/Qwen-3-math-reasoning

Updated Jun 2, 2025

tanmayyyj/dqn-SpaceInvadersNoFrameskip-v4

Reinforcement Learning • Updated Jul 4, 2023

tanmayyyj/Taxi-v3

Updated Jun 30, 2023

tanmayyyj/Taxi_v3

Reinforcement Learning • Updated Jun 30, 2023

tanmayyyj/q-FrozenLake-v2-4x4-Non_Slippery

Reinforcement Learning • Updated Jun 30, 2023 • 1

tanmayyyj/ppo-PyramidsRND

Reinforcement Learning • Updated Jun 20, 2023

tanmayyyj/ppo-SnowballTarget

Reinforcement Learning • Updated Jun 20, 2023

tanmayyyj/Cartpole-v1

Reinforcement Learning • Updated Jun 19, 2023

datasets 2

tanmayyyj/reward-fn-dpo

Updated Feb 28 • 161 • 5

tanmayyyj/reward-fn-sft

Updated Feb 28 • 33 • 5