Add Metal rotary embedding kernel matching vLLM interface 949658a robtaylor-chipflow committed on Mar 2