NAKA β€” Code-as-Policy Robot Executor

Part of the ANIMA Robotics Intelligence Suite by Robot Flow Labs

NAKA is a Code-as-Policy executor that uses LLMs to generate Python code for robot manipulation. Instead of fixed policies, the robot writes its own programs β€” composing perception, planning, and control primitives to solve novel tasks.

Results β€” Exceeds Human Expert

Task S1 (N=50) S2 (N=50) M1 (N=15) Paper Best Human Expert
Cube Lift 90% 100% 100% 45% 93%
Cube Stack 72% 80% 93% 30% 73%
  • cube_lift S2 = 100% β€” surpasses human expert (93%) without any RL training
  • Exceeds CaP-X paper (NVIDIA, UC Berkeley, Stanford, CMU) by 2x on all configurations
  • Zero-shot code generation with MiniMax M2.7

Paper

CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation Max Fu, Justin Yu, Karim El-Refai, Ethan Kou, Haoru Xue, et al. NVIDIA, UC Berkeley, Stanford, Carnegie Mellon University arXiv 2603.22435

Architecture

User: "Pick up the red cube and lift it"
  β†’ ANIMA Compiler β†’ NAKA Module
  β†’ LLM generates Python code:
      pos, quat = sample_grasp_pose("red cube")
      open_gripper()
      goto_pose(pos, quat, z_approach=0.1)
      close_gripper()
      lift = pos + np.array([0, 0, 0.25])
      goto_pose(lift, quat)
  β†’ Code executes on real robot (MuJoCo sim or Franka Panda)
  β†’ Reward = 1.0 (task completed)

7 Benchmark Tasks

  1. Cube Lift β€” single arm pick and lift
  2. Cube Stack β€” pick and stack one cube on another
  3. Spill Wipe β€” wipe table surface with attached sponge
  4. Peg Insertion β€” precision assembly (nut on peg)
  5. Cube Re-stack β€” bimanual reordering
  6. Two-Arm Lift β€” bimanual coordinated lift
  7. Two-Arm Handover β€” bimanual object transfer

Key Innovations

  1. Prompt Engineering > RL Training β€” our system prompt with API examples achieves human-level without any model fine-tuning
  2. OSC_POSE Controller β€” delta-based arm control, more reliable than joint-position control
  3. Fuzzy Object Matching β€” "red cube" matches MuJoCo body "cube_main" automatically
  4. Multi-Turn with Reward Feedback β€” agent sees reward + completion status, retries on failure
  5. Parallel Ensemble β€” MiniMax M2.7 + GLM 5.1 queried in parallel for robust code synthesis

Files

Path Description
paper.pdf CaP-X paper (arXiv 2603.22435)
BENCHMARK_REPORT.md Full benchmark results with paper comparison
TRAINING_REPORT.md Comprehensive training and infrastructure report
configs/ Training configurations (GRPO, debug)
logs/ Benchmark metrics (JSON)
code/ Key source files for reproducibility

GRPO RL Training (Ready)

NAKA includes a GRPO (Group Relative Policy Optimization) trainer for fine-tuning Qwen2.5-7B-Instruct on the CaP-Gym tasks. The training loop is verified but full training requires dedicated GPU (RTX 6000 Pro).

Expected results from paper: Qwen 7B goes from 25% β†’ 80% on cube_lift after 50 GRPO iterations.

Docker

docker build -f Dockerfile.sim -t naka-sim:latest .
docker run --gpus '"device=0"' --network host naka-sim:latest
# Open browser: http://localhost:8090/

ANIMA Integration

NAKA is a module in the ANIMA robotics compiler. When ANIMA encounters a novel task that can't be solved by fixed pipelines, it routes to NAKA β€” which generates custom Python code composing other ANIMA modules (perception, planning, control).

License

MIT β€” Robot Flow Labs / AIFLOW LABS LIMITED

Citation

@article{fu2026capx,
  title={CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation},
  author={Fu, Max and Yu, Justin and El-Refai, Karim and Kou, Ethan and Xue, Haoru and others},
  journal={arXiv preprint arXiv:2603.22435},
  year={2026}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Video Preview
loading

Paper for ilessio-aiflowlab/project_naka