---
license: apache-2.0
tags:
- reinforcement-learning
- ppo
- robotics
- sanskrit
- paramtatva
- mujoco
- pybullet
- dm-control
- calvin
- rlbench
- metaworld
language:
- sa
---
# R-PPO-SOTA: Robotics PPO Baseline Benchmarks
**© 2026 ParamTatva.org. All Rights Reserved**
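State-of-the-art PPO baselines across six robotics benchmark suites, trained as part of the **Robotics track (R)** of the ParamTatva Resonance Language Model. Results are reported below for MuJoCo, MetaWorld, CALVIN, and (partially) the DM Control Suite; PyBullet is still running and RLBench is pending.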
## Benchmark Results
### MuJoCo (Gymnasium v5): 10M steps each
| Environment | Best Return | Steps |
|---|---|---|
| HalfCheetah-v5 | **5,803.9** | 10M |
| Walker2d-v5 | **4,918.5** | 10M |
| Hopper-v5 | **3,183.2** | 10M |
| Ant-v5 | **886.6** | 10M |
| Humanoid-v5 | **573.8** | 10M |
| Reacher-v5 | **-4.2** | 10M |
### MetaWorld (10 Tasks): 500K steps each
| Task | Best Return | Success Rate |
|---|---|---|
| drawer-close-v3 | **9.5** | 95% |
| reach-v3 | **8.3** | 25% |
| window-open-v3 | **8.0** | 95% |
| drawer-open-v3 | **7.7** | 95% |
| button-press-topdown-v3 | **6.1** | 95% |
| window-close-v3 | **5.2** | 95% |
| door-open-v3 | **4.8** | 25% |
| push-v3 | **2.7** | 10% |
| peg-insert-side-v3 | **1.7** | 0% |
| pick-place-v3 | **0.4** | 5% |
### CALVIN (5 Tasks): ~3M steps each
| Task | Best Return |
|---|---|
| place-in-drawer | **-3.2** |
| pick-up-block | **-3.6** |
| turn-on-lightbulb | **-3.7** |
| close-drawer | **-4.2** |
| open-drawer | **-6.1** |
### DM Control Suite *(Running)*
| Task | Best Return |
|---|---|
| finger-spin | **621.0** |
| cartpole-swingup | **616.1** |
| *(5 more tasks in progress on RTX 3090)* | |
### PyBullet *(Running on RTX 3090)*
### RLBench *(Pending: Docker container required)*
## Architecture
Standard PPO with:
- Orthogonal weight initialization
- GAE (λ=0.95, γ=0.99)
- Linear LR annealing
- Gradient clipping (max norm 0.5)
- Centralized observation/reward normalization (SubprocVecEnv)
- Our proprietary encoder for Sanskrit-conditioned variants
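
As a rough illustration of three of the components listed above (GAE with λ=0.95 and γ=0.99, linear learning-rate annealing, and gradient clipping at max norm 0.5), here is a minimal pure-Python sketch. It is not the training code behind these results. The function names, the list-based data layout, and the convention that `dones[t]` marks an episode ending at step `t` are all assumptions made for the example.

```python
import math


def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (illustrative only).

    rewards, values and dones are equal-length lists. dones[t] is True when
    the episode ended at step t (an assumed convention). last_value is the
    value estimate used to bootstrap the step after the rollout.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards so each step can reuse the later advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is dropped at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value-function targets are the advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def linear_lr(initial_lr, update, total_updates):
    """Learning rate annealed linearly from initial_lr toward 0."""
    return initial_lr * (1.0 - update / total_updates)


def clip_by_global_norm(grads, max_norm=0.5):
    """Rescale a flat list of gradient values so its L2 norm is <= max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

A typical update would call `compute_gae` once per rollout, set the optimizer's learning rate from `linear_lr(initial_lr, update, total_updates)`, and apply `clip_by_global_norm` to the gradients before the optimizer step. The initial learning rate is not stated in this README, so any value passed in is the caller's choice.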
## Hardware
- **T4×4 GPU** (Google Cloud): MuJoCo, MetaWorld, CALVIN
- **RTX 3090** (Local): DM Control, PyBullet, RLBench
## Citation
```bibtex
@misc{paramtatva2026rpposota,
title={R-PPO-SOTA: Robotics PPO Baselines},
author={ParamTatva.org},
year={2026},
url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}
```
## License
Apache 2.0. © 2026 ParamTatva.org