
witcheer PRO
https://x.com/witcheer
  • witcheer
  • notwitcheer

AI & ML interests

Local AI Maxxing.

Recent Activity

posted an update about 18 hours ago
updated my MoE offload bench dataset + collection.

previous finding: Qwen3.6-35B-A3B via full expert offload on RTX 4060 Ti 8GB + 32GB RAM → 7.4 tok/sec. RAM-ceilinged, disk-bound.

new finding: built llama.cpp from source inside WSL2, swept -ncmoe values for partial offload.

```
ncmoe 32, 16K ctx → 29.7 tok/sec
ncmoe 30, 16K ctx → 32.0 tok/sec
ncmoe 30, 32K ctx → 35.4 tok/sec
ncmoe 28, 16K ctx → 16.3 tok/sec (VRAM cliff)
ncmoe 30, 65K ctx → 17.4 tok/sec (VRAM cliff)
```

4.8x faster than full offload. the 8GB VRAM cliff is sharp: crossing ~7 GB halves throughput instantly. the hybrid SSM+attention architecture means 32K context is nearly free (the KV cache only scales for 10/40 layers).

dataset: https://huggingface.co/datasets/witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05
collection: https://hf.co/collections/witcheer/8gb-vram-local-llms-practitioner-tested
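the sweep is small enough to sanity-check in a few lines. a minimal sketch, with the data hard-coded from the table above; the "less than half of best throughput" cliff heuristic is my own shorthand, not something measured in the benchmark:

```python
# Partial-offload sweep results quoted above:
# (n_cpu_moe value, context length in K tokens, decode throughput in tok/sec).
SWEEP = [
    (32, 16, 29.7),
    (30, 16, 32.0),
    (30, 32, 35.4),
    (28, 16, 16.3),  # crossed the ~7 GB VRAM cliff
    (30, 65, 17.4),  # crossed the ~7 GB VRAM cliff
]
FULL_OFFLOAD_TPS = 7.4  # baseline: all expert layers offloaded to CPU/RAM

# Best configuration and its speedup over the full-offload baseline.
best = max(SWEEP, key=lambda row: row[2])
speedup = best[2] / FULL_OFFLOAD_TPS
print(f"best: ncmoe={best[0]}, {best[1]}K ctx -> {best[2]} tok/s "
      f"({speedup:.1f}x over full offload)")

# Flag "cliff" runs: throughput collapsed to under half of the best run.
for ncmoe, ctx_k, tps in SWEEP:
    if tps < best[2] / 2:
        print(f"cliff: ncmoe={ncmoe}, {ctx_k}K ctx -> {tps} tok/s")
```

this reproduces the headline numbers: ncmoe 30 at 32K ctx is the peak, 35.4 / 7.4 ≈ 4.8x, and both cliff rows fall below half of peak throughput.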
updated a dataset about 18 hours ago
witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05
updated a collection 2 days ago
8GB VRAM Local LLMs - Practitioner Tested

Organizations

None yet

witcheer's models

None public yet