---
license: apache-2.0
tags:
- reinforcement-learning
- ppo
- robotics
- sanskrit
- paramtatva
- mujoco
- pybullet
- dm-control
- calvin
- rlbench
- metaworld
language:
- sa
---

# R-PPO-SOTA: Robotics PPO Baseline Benchmarks

**© 2026 ParamTatva.org. All Rights Reserved.**

State-of-the-art PPO baselines across 6 robotics benchmark suites, trained as part of the **Robotics track (R)** of the ParamTatva Resonance Language Model.

## Benchmark Results

### MuJoCo (Gymnasium v5): 10M steps each

| Environment | Best Return | Steps |
|---|---|---|
| HalfCheetah-v5 | **5,803.9** | 10M |
| Walker2d-v5 | **4,918.5** | 10M |
| Hopper-v5 | **3,183.2** | 10M |
| Ant-v5 | **886.6** | 10M |
| Humanoid-v5 | **573.8** | 10M |
| Reacher-v5 | **-4.2** | 10M |

### MetaWorld (10 tasks): 500K steps each

| Task | Best Return | Success Rate |
|---|---|---|
| drawer-close-v3 | **9.5** | 95% |
| reach-v3 | **8.3** | 25% |
| window-open-v3 | **8.0** | 95% |
| drawer-open-v3 | **7.7** | 95% |
| button-press-topdown-v3 | **6.1** | 95% |
| window-close-v3 | **5.2** | 95% |
| door-open-v3 | **4.8** | 25% |
| push-v3 | **2.7** | 10% |
| peg-insert-side-v3 | **1.7** | 0% |
| pick-place-v3 | **0.4** | 5% |

### CALVIN (5 tasks): ~3M steps each

| Task | Best Return |
|---|---|
| place-in-drawer | **-3.2** |
| pick-up-block | **-3.6** |
| turn-on-lightbulb | **-3.7** |
| close-drawer | **-4.2** |
| open-drawer | **-6.1** |

### DM Control Suite *(Running)*

| Task | Best Return |
|---|---|
| finger-spin | **621.0** |
| cartpole-swingup | **616.1** |

*5 more tasks in progress on RTX 3090.*

### PyBullet *(Running on RTX 3090)*

### RLBench *(Pending: Docker container required)*

## Architecture

Standard PPO with:

- Orthogonal weight initialization
- GAE (λ = 0.95, γ = 0.99)
- Linear learning-rate annealing
- Gradient clipping (max norm 0.5)
- Centralized observation/reward normalization across `SubprocVecEnv` workers
- Our proprietary encoder for Sanskrit-conditioned variants

## Hardware

- **4× T4 GPUs** (Google Cloud): MuJoCo, MetaWorld, CALVIN
- **RTX 3090** (local): DM Control, PyBullet, RLBench

## Citation

```bibtex
@misc{paramtatva2026rpposota,
  title={R-PPO-SOTA: Robotics PPO Baselines},
  author={ParamTatva.org},
  year={2026},
  url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}
```

## License

Apache 2.0. © 2026 ParamTatva.org
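
## PPO Components Sketch

The Architecture section above lists standard PPO ingredients: orthogonal initialization, GAE with λ = 0.95 and γ = 0.99, linear learning-rate annealing, and gradient clipping at norm 0.5. The following is a minimal NumPy sketch of those pieces, assuming a single-environment rollout stored as flat arrays. The function names (`compute_gae`, `orthogonal_init`, `linear_lr`, `clip_grad_global_norm`) are hypothetical; this is an illustration of the general technique, not the training code behind the reported results.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of length T.

    rewards, values, dones: arrays of shape (T,) for a single environment.
    dones[t] = 1.0 if the episode ended after step t (no bootstrapping past it).
    Assumption: time-limit truncations are treated like terminations for simplicity.
    last_value: value estimate of the observation following the final step.
    Returns (advantages, returns), both of shape (T,).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - float(dones[t])
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value-function targets are advantages plus the baseline values.
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns


def orthogonal_init(shape, gain=np.sqrt(2), rng=None):
    """Return a (rows, cols) weight matrix with orthonormal rows or columns, scaled by gain."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = shape
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)  # reduced QR: q has orthonormal columns
    q = q * np.sign(np.diag(r))  # fix column signs so the distribution is uniform
    if rows < cols:
        q = q.T  # wide matrix: orthonormal rows instead of columns
    return gain * q[:rows, :cols]


def linear_lr(initial_lr, update, num_updates):
    """Linearly anneal the learning rate toward 0; `update` is a 0-based update index."""
    frac = 1.0 - update / num_updates
    return initial_lr * frac


def clip_grad_global_norm(grads, max_norm=0.5):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm
```

In a full training loop, `compute_gae` would run once per rollout before the PPO update epochs, `linear_lr` would be queried at the start of each update, and the gradients passed to `clip_grad_global_norm` would come from an autodiff framework rather than being constructed by hand.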