---
license: apache-2.0
tags:
  - reinforcement-learning
  - ppo
  - robotics
  - sanskrit
  - paramtatva
  - mujoco
  - pybullet
  - dm-control
  - calvin
  - rlbench
  - metaworld
language:
  - sa
---

# R-PPO-SOTA — Robotics PPO Baseline Benchmarks

**© 2026 ParamTatva.org — All Rights Reserved**

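PPO baselines across six robotics benchmark suites (MuJoCo, MetaWorld, CALVIN, DM Control, PyBullet, RLBench), trained as part of the **Robotics track (R)** of the ParamTatva Resonance Language Model. DM Control and PyBullet runs are still in progress, and RLBench is pending.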

## Benchmark Results

### MuJoCo (Gymnasium v5) — 10M steps each
| Environment | Best Return | Steps |
|---|---|---|
| HalfCheetah-v5 | **5,803.9** | 10M |
| Walker2d-v5 | **4,918.5** | 10M |
| Hopper-v5 | **3,183.2** | 10M |
| Ant-v5 | **886.6** | 10M |
| Humanoid-v5 | **573.8** | 10M |
| Reacher-v5 | **-4.2** | 10M |

### MetaWorld (10 Tasks) — 500K steps each
| Task | Best Return | Success Rate |
|---|---|---|
| drawer-close-v3 | **9.5** | 95% |
| reach-v3 | **8.3** | 25% |
| window-open-v3 | **8.0** | 95% |
| drawer-open-v3 | **7.7** | 95% |
| button-press-topdown-v3 | **6.1** | 95% |
| window-close-v3 | **5.2** | 95% |
| door-open-v3 | **4.8** | 25% |
| push-v3 | **2.7** | 10% |
| peg-insert-side-v3 | **1.7** | 0% |
| pick-place-v3 | **0.4** | 5% |

### CALVIN (5 Tasks) — ~3M steps each
| Task | Best Return |
|---|---|
| place-in-drawer | **-3.2** |
| pick-up-block | **-3.6** |
| turn-on-lightbulb | **-3.7** |
| close-drawer | **-4.2** |
| open-drawer | **-6.1** |

### DM Control Suite *(Running)*
| Task | Best Return |
|---|---|
| finger-spin | **621.0** |
| cartpole-swingup | **616.1** |

*Five more tasks are in progress on the RTX 3090.*

### PyBullet *(Running on RTX 3090)*

### RLBench *(Pending — Docker container required)*

## Architecture

Standard PPO with:
- Orthogonal weight initialization
- Generalized Advantage Estimation (GAE, λ=0.95, γ=0.99)
- Linear LR annealing
- Gradient clipping (max norm 0.5)
- Centralized observation and reward normalization: one set of running statistics shared across all parallel `SubprocVecEnv` workers
- Our proprietary encoder for Sanskrit-conditioned variants
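The generic PPO ingredients listed above can be illustrated with a minimal, dependency-free Python sketch. This is not the ParamTatva training code: the names `compute_gae`, `linear_lr`, `clip_grad_norm`, and `RunningMeanStd` are hypothetical, plain lists stand in for tensors, and only the hyperparameters (γ=0.99, λ=0.95, max gradient norm 0.5) come from this card. The normalizer assumes the parallel mean/variance merge used by common implementations such as Stable-Baselines3's `RunningMeanStd`. Orthogonal initialization is omitted because it needs a linear-algebra library (for example `torch.nn.init.orthogonal_` in PyTorch).

```python
"""Hypothetical, framework-free sketch of standard PPO components.

Assumption: plain Python lists replace tensors; this is not the
ParamTatva implementation, only the hyperparameters match the card.
"""
import math
from typing import List, Sequence, Tuple

GAMMA = 0.99        # discount factor from the card
GAE_LAMBDA = 0.95   # GAE lambda from the card
MAX_GRAD_NORM = 0.5  # gradient clipping threshold from the card


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                dones: Sequence[bool], last_value: float,
                gamma: float = GAMMA, lam: float = GAE_LAMBDA
                ) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one single-environment rollout.

    dones[t] is True if the episode ended right after step t, which
    stops bootstrapping from the next state.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD error: r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value targets are advantages plus the baseline values.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def linear_lr(initial_lr: float, update: int, num_updates: int) -> float:
    """Linearly anneal from initial_lr at update 0 toward 0 at num_updates."""
    return initial_lr * (1.0 - update / num_updates)


def clip_grad_norm(grads: Sequence[float],
                   max_norm: float = MAX_GRAD_NORM) -> List[float]:
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / (total_norm + 1e-6)
    return [g * scale for g in grads]


class RunningMeanStd:
    """Scalar running mean/variance, merged batch by batch.

    A single instance updated with data from every worker gives
    "centralized" statistics shared across parallel environments.
    """

    def __init__(self, epsilon: float = 1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon

    def update(self, batch: Sequence[float]) -> None:
        batch_count = len(batch)
        batch_mean = sum(batch) / batch_count
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / batch_count
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Parallel combination of two (mean, variance, count) summaries.
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean += delta * batch_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x: float, clip: float = 10.0) -> float:
        """Standardize x with the running statistics, then clip."""
        z = (x - self.mean) / math.sqrt(self.var + 1e-8)
        return max(-clip, min(clip, z))
```

In a vectorized setup, the main process would call `update` on observations gathered from all workers before normalizing them, so every environment sees the same statistics.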

## Hardware

- **4× NVIDIA T4 GPUs** (Google Cloud): MuJoCo, MetaWorld, CALVIN
- **RTX 3090** (local workstation): DM Control, PyBullet, RLBench

## Citation

```bibtex
@misc{paramtatva2026rpposota,
  title={R-PPO-SOTA: Robotics PPO Baselines},
  author={ParamTatva.org},
  year={2026},
  url={https://huggingface.co/ParamTatva/R-PPO-SOTA}
}
```

## License

Apache 2.0 — © 2026 ParamTatva.org