Spaces: GOBA-AI-Labs/README

TOk-Atsuru committed on Feb 21

Commit 639bfd9 (verified) · 1 parent: 13f05d1

Upload README.md with huggingface_hub

Files changed (1)

README.md +33 -7


@@ -1,10 +1,36 @@
 ---
-title: README
-emoji: 🐨
-colorFrom: pink
-colorTo: pink
-sdk: static
-pinned: false
+title: GOBA-AI-Labs
+emoji: 🧠
+colorFrom: blue
+colorTo: purple
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# GOBA-AI-Labs
+**Making large AI models accessible on consumer hardware.**
+We develop open-source tools for compressing Mixture-of-Experts (MoE) AI models. Our expert pruning technology reduces model sizes by 50-90% while preserving quality — enabling 400B+ parameter models to run on laptops with 24GB RAM.
+## PrunedHub Models
+Calibration-based expert pruning with no retraining. The pruned models are drop-in replacements for their base models in llama.cpp.
+| Model | Base | Size | Quality | Highlights |
+|-------|------|------|---------|------------|
+| [PrunedHub GPT-OSS-20B-28x](https://huggingface.co/GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-28x) | GPT-OSS-20B | 10.4 GB | MMLU 78% (lossless) | Zero quality loss, fits 16GB RAM |
+| [PrunedHub GPT-OSS-20B-27x-Zerobias](https://huggingface.co/GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-27x-Zerobias) | GPT-OSS-20B | ~9.4 GB | MMLU 77% (-1pp) | Experimental router optimization |
+| [PrunedHub Qwen3-30B-A3B-JP-80pct](https://huggingface.co/GOBA-AI-Labs/PrunedHub-Qwen3-30B-A3B-JP-80pct) | Qwen3-30B-A3B | 14.0 GB | MMLU 79% (think-ON) | Language-aware pruning, Japanese quality preserved |
+| [PrunedHub Qwen3-Coder-Next-50pct](https://huggingface.co/GOBA-AI-Labs/PrunedHub-Qwen3-Coder-Next-50pct) | Qwen3-Coder-Next | 24.4 GB | MMLU 72% | 80B model in 24GB, outperforms Q2 quantization |
+## Our Approach
+Traditional model compression relies on aggressive quantization, which degrades all computations uniformly. Our expert pruning takes a fundamentally different approach — removing entire redundant computation paths from MoE models while keeping the remaining experts at full precision.
+- **Calibration-based importance scoring** — Expert importance measured through actual inference behavior, not static weight analysis
+- **Layer-adaptive expert allocation** — Each layer retains a dynamically determined number of experts based on its contribution to quality
+- **Language-aware optimization** — Automatic detection and protection of language-specialized experts
+- **Zerobias router optimization** — Post-pruning router bias correction that extends the lossless compression frontier
+## Links
+- [Support us on Ko-fi](https://ko-fi.com/gobaailabs)
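
The "Our Approach" bullets in the README describe calibration-based importance scoring and layer-adaptive expert allocation without giving the actual formulas. As a rough illustration only, the sketch below scores each expert by the router probability mass it receives on calibration tokens, then keeps the smallest set of experts per layer that retains a target share of that mass. The scoring rule, the `coverage` threshold, and the function names `expert_importance` and `experts_to_keep` are all assumptions made for this example; they are not taken from the GOBA-AI-Labs implementation.

```python
from typing import List


def expert_importance(gate_probs: List[List[float]], top_k: int) -> List[float]:
    """Accumulate routing mass per expert over calibration tokens.

    gate_probs[t][e] is the router probability of expert e for token t.
    Only the top_k experts per token are counted, mirroring top-k MoE routing.
    (Hypothetical scoring rule, chosen for illustration.)
    """
    num_experts = len(gate_probs[0])
    scores = [0.0] * num_experts
    for probs in gate_probs:
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            scores[e] += probs[e]
    return scores


def experts_to_keep(scores: List[float], coverage: float, min_keep: int) -> List[int]:
    """Keep the highest-scoring experts until `coverage` of the total mass is retained.

    At least `min_keep` experts are always kept so top-k routing still works.
    (The coverage threshold is an assumed stand-in for "contribution to quality".)
    """
    total = sum(scores)
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for e in ranked:
        if len(kept) >= min_keep and mass >= coverage * total:
            break
        kept.append(e)
        mass += scores[e]
    return sorted(kept)
```

Running both functions once per layer yields a different number of retained experts in each layer: a layer whose routing concentrates on a few experts keeps few, while a layer that spreads tokens across many experts keeps more.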