mitkox 
posted an update 4 days ago
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI, I don't have the /fast CC switch, but I do have /absurdlyfast:
- 100,499 tokens/second read (yeah, 100k, not a typo) | 811 tok/sec generation
- KV cache: 707,200 tokens
- Hardware: 5+ year old GPUs, 4x A6K gen1. It's not the car. It's the driver.

Qwen3 Coder Next AWQ with KV cache at BF16. It scores 82.1% in C# on a 29-years-in-development codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.

Amazing!! 👍🏻
Out of curiosity:
What is the biggest LLM that could be run locally on a laptop with the following configuration, without losing laptop performance:

  • 1 GPU NVIDIA RTX 5090 24GB vRAM
  • CPU INTEL ULTRA CORE 9
  • 64 GB RAM

Thanks in advance for sharing your insights!

Thanks!
Prince Arora

You should be able to run Qwen 3 32B at Q4_K_M pretty comfortably.
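For a rough sanity check on whether a model fits in VRAM, you can estimate the weight footprint from parameter count and quantization. This is a minimal sketch, assuming Q4_K_M averages roughly 4.85 bits per weight and leaving a couple of GB for KV cache and runtime overhead (both figures are assumptions, not from the post):

```python
# Back-of-the-envelope VRAM estimate for a locally hosted quantized LLM.
# Assumptions: ~4.85 bits/weight for Q4_K_M, ~2 GB overhead for KV cache
# and the inference runtime. Real usage varies with context length.

def model_vram_gb(params_billion: float,
                  bits_per_weight: float = 4.85,
                  overhead_gb: float = 2.0) -> float:
    """Approximate GB needed: quantized weights plus fixed overhead."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A 32B model at Q4_K_M: ~19.4 GB of weights + overhead, ~21.4 GB total,
# which is why it fits (tightly) in a 24 GB GPU.
print(round(model_vram_gb(32), 1))
```

By this estimate a 32B model at Q4_K_M lands just under 24 GB, consistent with the recommendation above, though long contexts will push the KV cache past the 2 GB assumed here.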