# NAKA: Code-as-Policy Robot Executor

Part of the ANIMA Robotics Intelligence Suite by Robot Flow Labs.
NAKA is a Code-as-Policy executor that uses LLMs to generate Python code for robot manipulation. Instead of executing fixed policies, the robot writes its own programs, composing perception, planning, and control primitives to solve novel tasks.
## Results: Exceeds Human Expert
| Task | S1 (N=50) | S2 (N=50) | M1 (N=15) | Paper Best | Human Expert |
|---|---|---|---|---|---|
| Cube Lift | 90% | 100% | 100% | 45% | 93% |
| Cube Stack | 72% | 80% | 93% | 30% | 73% |
- cube_lift S2 = 100%, surpassing the human expert (93%) without any RL training
- Exceeds the CaP-X paper's best results (NVIDIA, UC Berkeley, Stanford, CMU) by 2x or more on all configurations
- Zero-shot code generation with MiniMax M2.7
## Paper

**CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation.** Max Fu, Justin Yu, Karim El-Refai, Ethan Kou, Haoru Xue, et al. NVIDIA, UC Berkeley, Stanford, Carnegie Mellon University. arXiv:2603.22435.
## Architecture

User: "Pick up the red cube and lift it"
→ ANIMA Compiler → NAKA Module
→ LLM generates Python code:

```python
pos, quat = sample_grasp_pose("red cube")
open_gripper()
goto_pose(pos, quat, z_approach=0.1)
close_gripper()
lift = pos + np.array([0, 0, 0.25])
goto_pose(lift, quat)
```

→ Code executes on the real robot (MuJoCo sim or Franka Panda)
→ Reward = 1.0 (task completed)
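The generated program above can be exercised end to end with stub primitives. This is a minimal sketch, assuming a mock environment: `sample_grasp_pose`, `goto_pose`, and the gripper calls here are hypothetical stand-ins for the real MuJoCo/Franka bindings, and the grasp is assumed to succeed.

```python
import numpy as np

# Mock of the primitive API the generated code composes. The real
# primitives talk to MuJoCo or a Franka Panda; these stubs only track
# state so the generated snippet can be run offline.
state = {"gripper": "open", "ee_pos": np.zeros(3), "held": None}

def sample_grasp_pose(obj_name):
    # Hypothetical: the real system queries perception for a grasp pose.
    return np.array([0.4, 0.0, 0.05]), np.array([1.0, 0.0, 0.0, 0.0])

def open_gripper():
    state["gripper"] = "open"

def close_gripper():
    state["gripper"] = "closed"
    state["held"] = "red cube"  # assume the grasp succeeds at the target pose

def goto_pose(pos, quat, z_approach=0.0):
    state["ee_pos"] = np.asarray(pos, dtype=float)

# The generated program from the example above:
pos, quat = sample_grasp_pose("red cube")
open_gripper()
goto_pose(pos, quat, z_approach=0.1)
close_gripper()
lift = pos + np.array([0, 0, 0.25])
goto_pose(lift, quat)

# Toy success check: holding the object above a lift threshold.
reward = 1.0 if state["held"] and state["ee_pos"][2] > 0.2 else 0.0
print(reward)  # → 1.0
```

In the real executor the reward comes from the environment, not from the generated code; the stub check only mirrors the "Reward = 1.0" step of the flow.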
## 7 Benchmark Tasks

- **Cube Lift** – single-arm pick and lift
- **Cube Stack** – pick one cube and stack it on another
- **Spill Wipe** – wipe the table surface with an attached sponge
- **Peg Insertion** – precision assembly (nut on peg)
- **Cube Re-stack** – bimanual reordering
- **Two-Arm Lift** – bimanual coordinated lift
- **Two-Arm Handover** – bimanual object transfer
## Key Innovations
- **Prompt Engineering over RL Training** – our system prompt with API examples reaches human-level performance without any model fine-tuning
- **OSC_POSE Controller** – delta-based end-effector control, more reliable than joint-position control
- **Fuzzy Object Matching** – "red cube" automatically matches the MuJoCo body "cube_main"
- **Multi-Turn with Reward Feedback** – the agent sees the reward and completion status, and retries on failure
- **Parallel Ensemble** – MiniMax M2.7 and GLM 5.1 queried in parallel for robust code synthesis
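The fuzzy object matching above can be sketched with the standard library. This is a hedged illustration, not the actual implementation: the body names, the token-substring heuristic, and the 0.5 similarity cutoff are assumptions.

```python
import difflib

def match_body(query: str, body_names: list[str], cutoff: float = 0.5):
    """Map a natural-language object name to a MuJoCo body name.

    Hypothetical sketch: first prefer bodies containing a query token
    as a substring ("cube" → "cube_main"), then fall back to fuzzy
    string similarity. The real matcher may work differently.
    """
    tokens = query.lower().split()
    for name in body_names:
        if any(tok in name.lower() for tok in tokens):
            return name
    # Fallback: closest name by difflib similarity, or None if nothing is close.
    close = difflib.get_close_matches("_".join(tokens), body_names, n=1, cutoff=cutoff)
    return close[0] if close else None

bodies = ["table", "peg_base", "cube_main", "sponge"]
print(match_body("red cube", bodies))  # → cube_main
```

Here the color token "red" matches nothing, but "cube" picks out `cube_main`; queries with no plausible match return `None` so the executor can surface an error instead of grasping the wrong body.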
## Files

| Path | Description |
|---|---|
| `paper.pdf` | CaP-X paper (arXiv 2603.22435) |
| `BENCHMARK_REPORT.md` | Full benchmark results with paper comparison |
| `TRAINING_REPORT.md` | Comprehensive training and infrastructure report |
| `configs/` | Training configurations (GRPO, debug) |
| `logs/` | Benchmark metrics (JSON) |
| `code/` | Key source files for reproducibility |
## GRPO RL Training (Ready)
NAKA includes a GRPO (Group Relative Policy Optimization) trainer for fine-tuning Qwen2.5-7B-Instruct on the CaP-Gym tasks. The training loop is verified, but a full training run requires a dedicated GPU (RTX 6000 Pro).

Expected result from the paper: Qwen 7B improves from 25% → 80% on cube_lift after 50 GRPO iterations.
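The core of GRPO is that advantages are computed relative to a group of rollouts for the same prompt, with no learned value function. A minimal sketch of that advantage computation (the full trainer also applies a clipped policy-gradient loss and a KL penalty, which are omitted here):

```python
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages: normalize each rollout's reward by the
    mean and std of its own group (all rollouts for one task prompt).

    Sketch of the core GRPO idea only; hyperparameters and the
    surrounding loss terms are not shown.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    # eps guards the all-equal-rewards case, where every advantage is 0.
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four code rollouts for one cube_lift prompt: two succeed, two fail.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)
```

Successful rollouts get positive advantage and failed ones negative, so the policy is pushed toward the code variants that earned reward within each group.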
## Docker

```shell
docker build -f Dockerfile.sim -t naka-sim:latest .
docker run --gpus '"device=0"' --network host naka-sim:latest
# Open browser: http://localhost:8090/
```
## ANIMA Integration

NAKA is a module in the ANIMA robotics compiler. When ANIMA encounters a novel task that cannot be solved by its fixed pipelines, it routes the task to NAKA, which generates custom Python code composing other ANIMA modules (perception, planning, control).
## License

MIT – Robot Flow Labs / AIFLOW LABS LIMITED
## Citation

```bibtex
@article{fu2026capx,
  title={CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation},
  author={Fu, Max and Yu, Justin and El-Refai, Karim and Kou, Ethan and Xue, Haoru and others},
  journal={arXiv preprint arXiv:2603.22435},
  year={2026}
}
```