Mist-ic committed on
Commit ff0696e · 1 Parent(s): 14cf714

Increase max_tokens to 4000 for thinking model compatibility

Thinking models (Gemini 3.1 Pro, o3-mini) spend tokens on internal
reasoning before generating output. With max_tokens=200 the reasoning
exhausted the limit and the response came back with content=None.
Raising the limit to 4000 leaves room for both the thinking budget
and the response.

Files changed (1)
  1. inference.py +3 -1
inference.py CHANGED
@@ -111,8 +111,10 @@ def _call_llm(
             model=provider["model"],
             messages=messages,
             temperature=0.2,
-            max_tokens=200,
+            max_tokens=4000, # Thinking models (Gemini 3.1 Pro, o3) use tokens for reasoning
         )
+        # content can be None for thinking models if limit was too low;
+        # 4000 ensures thinking budget + response both fit
         return completion.choices[0].message.content or ""
     except Exception as e:
         last_err = e
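
The `content or ""` fallback in the diff hides the failure this commit fixes: a response starved of tokens and a genuinely empty response both come back as `""`. A minimal sketch of one way to tell them apart follows, assuming the provider reports `finish_reason == "length"` when the token limit is hit, as OpenAI-compatible chat completion APIs do. `extract_text` and `fake_completion` are hypothetical names used only for illustration and are not part of `inference.py`; the fake object stands in for a real API response so the example runs without network access.

```python
from types import SimpleNamespace


def extract_text(completion):
    """Return (text, truncated) from a chat-completion-like object.

    Hypothetical helper, not part of inference.py.
    """
    choice = completion.choices[0]
    # Same fallback as the diff: content may be None for thinking models.
    text = choice.message.content or ""
    # Assumption: the provider sets finish_reason to "length" when
    # generation stopped because max_tokens was reached.
    truncated = choice.finish_reason == "length"
    return text, truncated


def fake_completion(content, finish_reason):
    """Build a stand-in response object for demonstration only."""
    message = SimpleNamespace(content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# Reasoning consumed the whole budget: no content, limit reached.
starved = fake_completion(None, "length")
# Normal completion that finished on its own.
healthy = fake_completion("ok", "stop")

print(extract_text(starved))  # ('', True)
print(extract_text(healthy))  # ('ok', False)
```

A caller could log or retry with a larger limit when `truncated` is true, rather than treating the empty string as a valid answer.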