SO-101 single-arm pick-orange-and-place benchmark — same task, many policy families (strict 20-round eval).