Andrew McCracken and Claude committed
Commit 3f2ee19 · 1 Parent(s): 2a55dc3

Increase threads to 8 for faster inference


- Use all 8 vCPUs for maximum inference speed
- Should reduce response time from ~15s to ~10-12s

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1)
  1. llm_handler.py +1 -1
llm_handler.py CHANGED
@@ -53,7 +53,7 @@ class CybersecurityLLM:
     model_path=model_path,
     n_ctx=4096,        # Context window
     n_batch=512,       # Batch size for prompt processing
-    n_threads=6,       # Use 6 of 8 vCPUs (leave 2 for system/API)
+    n_threads=8,       # Use all 8 vCPUs for maximum inference speed
     n_gpu_layers=0,    # CPU only
     seed=-1,           # Random seed
     f16_kv=True,       # Use f16 for key/value cache (saves memory)
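
For context, here is a minimal sketch of how this configuration is typically passed to llama-cpp-python's `Llama` constructor. Only the parameters visible in the diff come from the source; the `llama_cpp` import, the wrapper structure, and the `answer` helper are assumptions about how llm_handler.py might be organized:

```python
# Minimal sketch, assuming llm_handler.py wraps llama-cpp-python's Llama class.
# Constructor arguments are taken from the diff; everything else is illustrative.
from llama_cpp import Llama


class CybersecurityLLM:
    def __init__(self, model_path: str):
        self.llm = Llama(
            model_path=model_path,
            n_ctx=4096,        # Context window
            n_batch=512,       # Batch size for prompt processing
            n_threads=8,       # Use all 8 vCPUs for maximum inference speed
            n_gpu_layers=0,    # CPU only
            seed=-1,           # Random seed
            f16_kv=True,       # Use f16 for key/value cache (saves memory)
        )

    def answer(self, prompt: str) -> str:
        # Hypothetical helper; the real method names in llm_handler.py are
        # not shown in the diff.
        out = self.llm(prompt, max_tokens=512)
        return out["choices"][0]["text"]
```

One trade-off worth noting: the replaced comment reserved 2 of the 8 vCPUs for the system and API layer, so pinning `n_threads` to all 8 trades that headroom for raw token throughput, which is where the estimated drop from ~15s to ~10-12s per response comes from.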