mitkox 
posted an update 4 days ago
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI, I don't have the /fast CC switch, but I do have /absurdlyfast:
- 100,499 tokens/second read (yeah, 100k, not a typo) | 811 tok/sec generation
- KV cache: 707,200 tokens
- Hardware: 5+ year old GPUs, 4x A6K gen1. It's not the car. It's the driver.

Qwen3 Coder Next AWQ with KV cache at BF16. It scores 82.1% in C# on a 29-years-in-development codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.

Amazing!! 👍🏻
Out of curiosity:
What is the biggest LLM that could be run locally on a laptop with the following configuration, without losing laptop performance:

  • 1 GPU NVIDIA RTX 5090 24GB vRAM
  • CPU INTEL ULTRA CORE 9
  • 64 GB RAM

Thanks in advance for sharing your insights!

Thanks!
Prince Arora

You should be able to run Qwen 3 32B at Q4_K_M pretty comfortably.
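For a rough sanity check on whether a model fits in VRAM, you can estimate the weight footprint from parameter count and quantization. This is a minimal sketch, assuming Q4_K_M averages roughly 4.85 bits per weight and leaving a couple of GB for KV cache and runtime overhead (both figures are assumptions, not from the post):

```python
# Back-of-the-envelope VRAM estimate for a locally hosted quantized LLM.
# Assumptions: ~4.85 bits/weight for Q4_K_M, ~2 GB overhead for KV cache
# and the inference runtime. Real usage varies with context length.

def model_vram_gb(params_billion: float,
                  bits_per_weight: float = 4.85,
                  overhead_gb: float = 2.0) -> float:
    """Approximate GB needed: quantized weights plus fixed overhead."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A 32B model at Q4_K_M: ~19.4 GB of weights + overhead, ~21.4 GB total,
# which is why it fits (tightly) in a 24 GB GPU.
print(round(model_vram_gb(32), 1))
```

By this estimate a 32B model at Q4_K_M lands just under 24 GB, consistent with the recommendation above, though long contexts will push the KV cache past the 2 GB assumed here.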