R-PPO-SOTA β€” Robotics PPO Baseline Benchmarks

Β© 2026 ParamTatva.org β€” All Rights Reserved

State-of-the-art PPO baselines across six robotics benchmark suites, trained as part of the Robotics (R) track of the ParamTatva Resonance Language Model.

Benchmark Results

MuJoCo (Gymnasium v5) β€” 10M steps each

Environment        Best Return    Steps
HalfCheetah-v5         5,803.9      10M
Walker2d-v5            4,918.5      10M
Hopper-v5              3,183.2      10M
Ant-v5                   886.6      10M
Humanoid-v5              573.8      10M
Reacher-v5                -4.2      10M

MetaWorld (10 Tasks) β€” 500K steps each

Task                       Best Return    Success Rate
drawer-close-v3                    9.5             95%
reach-v3                           8.3             25%
window-open-v3                     8.0             95%
drawer-open-v3                     7.7             95%
button-press-topdown-v3            6.1             95%
window-close-v3                    5.2             95%
door-open-v3                       4.8             25%
push-v3                            2.7             10%
peg-insert-side-v3                 1.7              0%
pick-place-v3                      0.4              5%

CALVIN (5 Tasks) β€” ~3M steps each

Task                 Best Return
place-in-drawer             -3.2
pick-up-block               -3.6
turn-on-lightbulb           -3.7
close-drawer                -4.2
open-drawer                 -6.1

DM Control Suite (Running)

Task                 Best Return
finger-spin                621.0
cartpole-swingup           616.1

(5 more tasks in progress on RTX 3090)

PyBullet (Running on RTX 3090)

RLBench (Pending β€” Docker container required)

Architecture

Standard PPO with:

  • Orthogonal weight initialization
  • GAE (Ξ»=0.95, Ξ³=0.99)
  • Linear LR annealing
  • Gradient clipping (max norm 0.5)
  • Centralized observation/reward normalization (SubprocVecEnv)
  • Our proprietary encoder for Sanskrit-conditioned variants

Hardware

  • T4Γ—4 GPU (Google Cloud) β€” MuJoCo, MetaWorld, CALVIN
  • RTX 3090 (Local) β€” DM Control, PyBullet, RLBench

Citation

@misc{paramtatva2026rpposota,
  title={R-PPO-SOTA: Robotics PPO Baselines},
  author={ParamTatva.org},
  year={2026},
  url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}

License

Apache 2.0 β€” Β© 2026 ParamTatva.org
