Reduce thinking model max_new_tokens to fix slow inference (commit 0699c5f, KinetoLabs / Claude Opus 4.5, committed 3 days ago)
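A minimal sketch of why this change helps, not the repo's actual code: a "thinking" model that emits a long reasoning trace may never reach EOS, so each request runs until it hits the `max_new_tokens` cap. The toy decode loop below (all names are illustrative) shows that the cap directly bounds the number of decode steps, and therefore worst-case latency.

```python
# Illustrative decode loop (hypothetical, not the repository's code):
# generation stops at EOS or at the max_new_tokens cap, whichever
# comes first, so the cap bounds worst-case decode steps.

def generate(next_token, max_new_tokens, eos_id=0):
    """Toy autoregressive loop: call next_token() until EOS or the cap."""
    tokens = []
    for _ in range(max_new_tokens):
        t = next_token()
        if t == eos_id:
            break
        tokens.append(t)
    return tokens

# A model stuck in a runaway "thinking" trace never emits EOS, so it
# runs exactly max_new_tokens steps; lowering the cap proportionally
# lowers the worst-case inference time.
runaway = lambda: 1  # hypothetical model that never emits EOS
long_run = generate(runaway, max_new_tokens=4096)
short_run = generate(runaway, max_new_tokens=1024)
assert len(long_run) == 4096
assert len(short_run) == 1024
```

The trade-off is that a cap set too low can truncate the reasoning trace before the final answer, so the reduced value still has to cover the model's typical thinking length.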