hermes3-llama-cpp


Jodaro committed about 22 hours ago

Commit e6b8d52 (verified) · 1 parent: 398f222

Reduce max_new_tokens to 64 for faster replies
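Why a smaller cap gives faster replies: autoregressive decoding produces one token per forward pass, so generation time grows roughly linearly with the number of tokens produced. The sketch below estimates worst-case generation time for the old and new caps. The 8 tokens/s decode rate is a hypothetical figure chosen for illustration, not a measurement from this Space. The sketch also ignores prompt processing (prefill), which this commit does not change.

```python
def worst_case_decode_seconds(max_new_tokens: int, tokens_per_second: float) -> float:
    """Upper bound on generation time when a reply runs all the way to the cap.

    Assumes a constant decode rate and ignores prompt processing (prefill).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return max_new_tokens / tokens_per_second


# Hypothetical decode rate, for illustration only.
RATE = 8.0

old = worst_case_decode_seconds(256, RATE)  # cap before this commit
new = worst_case_decode_seconds(64, RATE)   # cap after this commit
print(f"old cap: {old:.1f} s, new cap: {new:.1f} s, speedup: {old / new:.0f}x")
```

The 4x figure only applies to replies that would have reached the old limit. Replies that end earlier on an end-of-sequence token take the same time as before, and long answers are now cut off after 64 tokens.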

Files changed (1)

app.py CHANGED

@@ -20,7 +20,7 @@ def respond(message: str, history: list[tuple[str, str]]):
     out = llm(
         prompt,
-        max_new_tokens=256,
+        max_new_tokens=64,
         temperature=0.7,
         top_p=0.9,
     )
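The new value is a ceiling, not a target: a decoding loop stops at whichever comes first, an end-of-sequence token or the cap. The toy loop below illustrates this. `generate`, `make_model`, and `EOS` are hypothetical stand-ins for a model and its stop token; they are not the API of whatever library backs `llm` in app.py.

```python
from collections.abc import Callable

EOS = -1  # hypothetical end-of-sequence token id


def generate(next_token: Callable[[list[int]], int], prompt: list[int], max_new_tokens: int) -> list[int]:
    """Toy decode loop: stop on EOS or after max_new_tokens new tokens."""
    tokens = list(prompt)
    new_tokens: list[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # one "forward pass" of the stand-in model
        if tok == EOS:
            break  # the model finished on its own, before the cap
        tokens.append(tok)
        new_tokens.append(tok)
    return new_tokens


def make_model(prompt_len: int, reply_length: int) -> Callable[[list[int]], int]:
    """Hypothetical model that emits `reply_length` tokens and then EOS."""
    def next_token(tokens: list[int]) -> int:
        produced = len(tokens) - prompt_len
        return EOS if produced >= reply_length else 100 + produced
    return next_token


prompt = [1, 2, 3]
short = make_model(len(prompt), 20)   # wants a 20-token reply
long = make_model(len(prompt), 200)   # wants a 200-token reply

for name, model in (("20-token reply", short), ("200-token reply", long)):
    for cap in (256, 64):
        print(name, "cap", cap, "->", len(generate(model, prompt, cap)), "tokens")
```

Only the 200-token reply changes when the cap drops from 256 to 64: it is truncated to 64 tokens, while the 20-token reply is unaffected. This is the trade-off the commit makes: shorter worst-case latency in exchange for possibly cut-off long answers.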