Spaces: visamram02/VisamIntelli-Flash

visamram02 committed on Mar 16

Commit 6299d73 (verified) · 1 parent: 9ac4dcc

Upload folder using huggingface_hub

Files changed (1)

app.py

@@ -11,8 +11,10 @@ model_path = "model.gguf"
 print(f"Loading model from {model_path}...")
 llm = Llama(
     model_path=model_path,
-    n_ctx=4096,
-    n_threads=4,
+    n_ctx=1024,      # Drastically reduced context size (saves memory/time on CPU)
+    n_threads=8,     # Maximize all available vCPUs
+    n_threads_batch=8, # Speed up prompt processing
+    n_batch=256,     # Optimize batch size for prompt evaluation
     verbose=False
 )
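
Note on the hard-coded thread counts: the commit sets n_threads=8 on the assumption that the Space has 8 vCPUs. A minimal sketch of deriving the same keyword arguments from the machine at startup is shown below. The helper name build_llama_kwargs and its max_threads parameter are hypothetical and not part of app.py; the keyword names themselves (n_ctx, n_threads, n_threads_batch, n_batch, verbose) are the ones used in the diff. Be aware that os.cpu_count() reports the host's logical CPUs, which may be more than a container is actually allowed to use, so treat the result as an upper bound rather than a tuned value.

```python
import os


def build_llama_kwargs(model_path, n_ctx=1024, n_batch=256, max_threads=None):
    """Build keyword arguments for llama_cpp.Llama (hypothetical helper).

    Thread counts come from os.cpu_count() instead of a hard-coded 8,
    optionally capped by max_threads.
    """
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    threads = cpus if max_threads is None else max(1, min(cpus, max_threads))
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,
        "n_threads": threads,            # threads used for token generation
        "n_threads_batch": threads,      # threads used for prompt processing
        "n_batch": min(n_batch, n_ctx),  # a batch larger than the context is pointless
        "verbose": False,
    }


if __name__ == "__main__":
    kwargs = build_llama_kwargs("model.gguf")
    print(kwargs)
    # In app.py this would replace the literal arguments:
    # from llama_cpp import Llama
    # llm = Llama(**kwargs)
```

Passing max_threads=8 keeps the commit's ceiling on larger hosts while still scaling down if the Space is moved to smaller hardware.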