Can we control the thinking to n tokens, or turn it off if needed?
Not yet. We plan to release a 'Flash' version that matches the latency of the instruct model while retaining the current capabilities.