How to limit the thinking duration?

#38
by Spankyboy - opened

Hi, how can I limit the duration of the thinking?
I tried this:

reasoning: {
  "enabled": reasoningEnabled,
  "effort": "low",
  "exclude": !reasoningEnabled,
  "max_tokens": 1000
}

(And yes, at first I sent both effort and max_tokens together, which caused an error. After that I sent only one of them per request: once with effort, once with max_tokens.)
It ignored both.
For a 2,000-token prompt, it thought for over 4,900 tokens until I stopped it.
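
For reference, here is a minimal sketch of the "only one limiter per request" variant described above. It assumes the API rejects a reasoning object that carries both effort and max_tokens, as happened in my first attempt. buildReasoning and the ReasoningLimit type are hypothetical names made up for this example, not part of any library, and the sketch only shapes the request; it cannot guarantee that the model honours the limit.

```typescript
// Hypothetical helper: builds the `reasoning` object with at most ONE limiter,
// on the assumption that sending both `effort` and `max_tokens` is rejected.
type ReasoningLimit =
  | { kind: "effort"; effort: string }          // e.g. "low", as in the post
  | { kind: "max_tokens"; maxTokens: number };  // e.g. 1000, as in the post

interface ReasoningConfig {
  enabled: boolean;
  exclude: boolean;
  effort?: string;
  max_tokens?: number;
}

function buildReasoning(
  reasoningEnabled: boolean,
  limit?: ReasoningLimit
): ReasoningConfig {
  const config: ReasoningConfig = {
    enabled: reasoningEnabled,
    exclude: !reasoningEnabled,
  };
  // No limiter is attached when reasoning is off or no limit was requested.
  if (!reasoningEnabled || limit === undefined) {
    return config;
  }
  // Exactly one of the two fields is set, never both.
  if (limit.kind === "effort") {
    config.effort = limit.effort;
  } else {
    config.max_tokens = limit.maxTokens;
  }
  return config;
}

// Usage: one request per variant.
const withEffort = buildReasoning(true, { kind: "effort", effort: "low" });
const withBudget = buildReasoning(true, { kind: "max_tokens", maxTokens: 1000 });
console.log(JSON.stringify(withEffort));
console.log(JSON.stringify(withBudget));
```

Both variants were sent this way, and neither limit was respected.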

Is there any way to set a limit so that short texts don't make me poor? (I pay per token, so this eats up all my money 🥹)