R-PPO-SOTA β€” Robotics PPO Baseline Benchmarks

Β© 2026 ParamTatva.org β€” All Rights Reserved

State-of-the-art PPO baselines across six robotics benchmark suites, trained as part of the Robotics (R) track of the ParamTatva Resonance Language Model.

Benchmark Results

MuJoCo (Gymnasium v5) β€” 10M steps each

Environment        Best Return    Steps
HalfCheetah-v5         5,803.9      10M
Walker2d-v5            4,918.5      10M
Hopper-v5              3,183.2      10M
Ant-v5                   886.6      10M
Humanoid-v5              573.8      10M
Reacher-v5                -4.2      10M

MetaWorld (10 Tasks) β€” 500K steps each

Task                       Best Return    Success Rate
drawer-close-v3                    9.5             95%
reach-v3                           8.3             25%
window-open-v3                     8.0             95%
drawer-open-v3                     7.7             95%
button-press-topdown-v3            6.1             95%
window-close-v3                    5.2             95%
door-open-v3                       4.8             25%
push-v3                            2.7             10%
peg-insert-side-v3                 1.7              0%
pick-place-v3                      0.4              5%

CALVIN (5 Tasks) β€” ~3M steps each

Task                 Best Return
place-in-drawer             -3.2
pick-up-block               -3.6
turn-on-lightbulb           -3.7
close-drawer                -4.2
open-drawer                 -6.1

DM Control Suite (Running)

Task                 Best Return
finger-spin                621.0
cartpole-swingup           616.1

(5 more tasks in progress on RTX 3090)

PyBullet (Running on RTX 3090)

RLBench (Pending β€” Docker container required)

Architecture

Standard PPO with:

  • Orthogonal weight initialization
  • GAE (Ξ»=0.95, Ξ³=0.99)
  • Linear LR annealing
  • Gradient clipping (max norm 0.5)
  • Centralized observation/reward normalization (SubprocVecEnv)
  • Our proprietary encoder for Sanskrit-conditioned variants

Hardware

  • T4Γ—4 GPU (Google Cloud) β€” MuJoCo, MetaWorld, CALVIN
  • RTX 3090 (Local) β€” DM Control, PyBullet, RLBench

Citation

@misc{paramtatva2026rpposota,
  title={R-PPO-SOTA: Robotics PPO Baselines},
  author={ParamTatva.org},
  year={2026},
  url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}

License

Apache 2.0 β€” Β© 2026 ParamTatva.org
