R-PPO-SOTA: Robotics PPO Baseline Benchmarks
© 2026 ParamTatva.org. All Rights Reserved.
State-of-the-art PPO baselines across 6 robotics benchmark suites, trained as part of the Robotics track (R) of the ParamTatva Resonance Language Model.
Benchmark Results
MuJoCo (Gymnasium v5): 10M steps each
| Environment | Best Return | Steps |
|---|---|---|
| HalfCheetah-v5 | 5,803.9 | 10M |
| Walker2d-v5 | 4,918.5 | 10M |
| Hopper-v5 | 3,183.2 | 10M |
| Ant-v5 | 886.6 | 10M |
| Humanoid-v5 | 573.8 | 10M |
| Reacher-v5 | -4.2 | 10M |
MetaWorld (10 Tasks): 500K steps each
| Task | Best Return | Success Rate |
|---|---|---|
| drawer-close-v3 | 9.5 | 95% |
| reach-v3 | 8.3 | 25% |
| window-open-v3 | 8.0 | 95% |
| drawer-open-v3 | 7.7 | 95% |
| button-press-topdown-v3 | 6.1 | 95% |
| window-close-v3 | 5.2 | 95% |
| door-open-v3 | 4.8 | 25% |
| push-v3 | 2.7 | 10% |
| peg-insert-side-v3 | 1.7 | 0% |
| pick-place-v3 | 0.4 | 5% |
CALVIN (5 Tasks): ~3M steps each
| Task | Best Return |
|---|---|
| place-in-drawer | -3.2 |
| pick-up-block | -3.6 |
| turn-on-lightbulb | -3.7 |
| close-drawer | -4.2 |
| open-drawer | -6.1 |
DM Control Suite (Running)
| Task | Best Return |
|---|---|
| finger-spin | 621.0 |
| cartpole-swingup | 616.1 |
| 5 more tasks | in progress (RTX 3090) |
PyBullet (Running on RTX 3090)
RLBench (Pending: Docker container required)
Architecture
Standard PPO with:
- Orthogonal weight initialization
- GAE (λ=0.95, γ=0.99)
- Linear LR annealing
- Gradient clipping (max norm 0.5)
- Centralized observation/reward normalization (SubprocVecEnv)
- Our proprietary encoder for Sanskrit-conditioned variants
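
The listed components are standard enough that three of them can be illustrated in a few lines. The following is a minimal sketch, not the code used to train these baselines: it shows GAE with the λ and γ values above, a linear learning-rate schedule, and global-norm gradient clipping at 0.5. The function names, the single-environment list-based layout, and the `initial_lr` parameter are assumptions made for illustration, and the card does not state the initial learning rate.

```python
import math

GAMMA = 0.99         # discount factor (from the list above)
GAE_LAMBDA = 0.95    # GAE lambda (from the list above)
MAX_GRAD_NORM = 0.5  # gradient clipping threshold (from the list above)


def compute_gae(rewards, values, dones, last_value,
                gamma=GAMMA, lam=GAE_LAMBDA):
    """Return (advantages, returns) for one rollout of a single environment.

    rewards[t], values[t], dones[t] describe step t; dones[t] is True when
    the episode ended at step t. last_value is the value estimate of the
    state after the final step, used to bootstrap the tail of the rollout.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated advantage.
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def linear_lr(update, total_updates, initial_lr):
    """Learning rate annealed linearly from initial_lr (hypothetical) to 0."""
    return initial_lr * (1.0 - update / total_updates)


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Scale a flat list of gradient values so its L2 norm is <= max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads]
```

In a vectorized setup such as SubprocVecEnv, the same recursion would typically run over arrays with one column per environment rather than over Python lists.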
Hardware
- T4×4 GPU (Google Cloud): MuJoCo, MetaWorld, CALVIN
- RTX 3090 (Local): DM Control, PyBullet, RLBench
Citation
@misc{paramtatva2026rpposota,
title={R-PPO-SOTA: Robotics PPO Baselines},
author={ParamTatva.org},
year={2026},
url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}
License
Apache 2.0. © 2026 ParamTatva.org
Evaluation results
| Task | Dataset | Best Return |
|---|---|---|
| Place In Drawer | ParamTatva/calvin-benchmark | -3.200 |
| Pick Up Block | ParamTatva/calvin-benchmark | -3.600 |
| Turn On Lightbulb | ParamTatva/calvin-benchmark | -3.700 |
| Close Drawer | ParamTatva/calvin-benchmark | -4.200 |
| Open Drawer | ParamTatva/calvin-benchmark | -6.100 |
| Button Press Topdown V3 | ParamTatva/metaworld-benchmark | 6.100 |
| Drawer Close V3 | ParamTatva/metaworld-benchmark | 9.500 |
| Drawer Open V3 | ParamTatva/metaworld-benchmark | 7.700 |
| Window Open V3 | ParamTatva/metaworld-benchmark | 8.000 |
| Window Close V3 | ParamTatva/metaworld-benchmark | 5.200 |