---
license: apache-2.0
tags:
- reinforcement-learning
- ppo
- robotics
- sanskrit
- paramtatva
- mujoco
- pybullet
- dm-control
- calvin
- rlbench
- metaworld
language:
- sa
---

# R-PPO-SOTA – Robotics PPO Baseline Benchmarks

**© 2026 ParamTatva.org – All Rights Reserved**

State-of-the-art PPO baselines across 6 robotics benchmark suites, trained as part of the **Robotics track (R)** of the ParamTatva Resonance Language Model.

## Benchmark Results
### MuJoCo (Gymnasium v5) – 10M steps each

| Environment | Best Return | Steps |
|---|---|---|
| HalfCheetah-v5 | **5,803.9** | 10M |
| Walker2d-v5 | **4,918.5** | 10M |
| Hopper-v5 | **3,183.2** | 10M |
| Ant-v5 | **886.6** | 10M |
| Humanoid-v5 | **573.8** | 10M |
| Reacher-v5 | **-4.2** | 10M |
|
### MetaWorld (10 Tasks) – 500K steps each

| Task | Best Return | Success Rate |
|---|---|---|
| drawer-close-v3 | **9.5** | 95% |
| reach-v3 | **8.3** | 25% |
| window-open-v3 | **8.0** | 95% |
| drawer-open-v3 | **7.7** | 95% |
| button-press-topdown-v3 | **6.1** | 95% |
| window-close-v3 | **5.2** | 95% |
| door-open-v3 | **4.8** | 25% |
| push-v3 | **2.7** | 10% |
| peg-insert-side-v3 | **1.7** | 0% |
| pick-place-v3 | **0.4** | 5% |
|
### CALVIN (5 Tasks) – ~3M steps each

| Task | Best Return |
|---|---|
| place-in-drawer | **-3.2** |
| pick-up-block | **-3.6** |
| turn-on-lightbulb | **-3.7** |
| close-drawer | **-4.2** |
| open-drawer | **-6.1** |
|
### DM Control Suite *(Running)*

| Task | Best Return |
|---|---|
| finger-spin | **621.0** |
| cartpole-swingup | **616.1** |
| *(5 more tasks in progress on RTX 3090)* | |
|
### PyBullet *(Running on RTX 3090)*

### RLBench *(Pending – Docker container required)*
## Architecture

Standard PPO with:
- Orthogonal weight initialization
- GAE (λ=0.95, γ=0.99)
- Linear LR annealing
- Gradient clipping (max norm 0.5)
- Centralized observation/reward normalization (SubprocVecEnv)
- Our proprietary encoder for Sanskrit-conditioned variants
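The GAE term in the list above can be sketched in a few lines. This is a minimal NumPy illustration of GAE(λ) as commonly used in PPO, with the document's λ=0.95, γ=0.99 as defaults; the function name and signature are illustrative and not taken from this repository's training code.

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (GAE-lambda) over one rollout.

    rewards, values, dones: sequences of length T for one rollout;
    last_value bootstraps the value of the state after the final step.
    Returns (advantages, returns), where returns = advantages + values
    are the value-function regression targets.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        mask = 1.0 - dones[t]  # zero out the bootstrap across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values)
    return advantages, returns

# Example: two steps of reward 1 with a zero value baseline.
adv, ret = compute_gae([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], 0.0)
# adv[1] = 1.0; adv[0] = 1.0 + 0.99 * 0.95 * 1.0 = 1.9405
```

The backward recursion is the standard exponentially weighted sum of TD residuals; λ trades off bias (λ→0, one-step TD) against variance (λ→1, Monte Carlo returns).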
|
## Hardware

- **T4×4 GPU** (Google Cloud) – MuJoCo, MetaWorld, CALVIN
- **RTX 3090** (Local) – DM Control, PyBullet, RLBench
|
## Citation

```bibtex
@misc{paramtatva2026rpposota,
  title={R-PPO-SOTA: Robotics PPO Baselines},
  author={ParamTatva.org},
  year={2026},
  url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}
```
|
## License

Apache 2.0 – © 2026 ParamTatva.org
|