TOk-Atsuru committed on
Commit 639bfd9 · verified · 1 Parent(s): 13f05d1

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +33 -7
README.md CHANGED
@@ -1,10 +1,36 @@
  ---
- title: README
- emoji: 🐨
- colorFrom: pink
- colorTo: pink
- sdk: static
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
  ---
+ title: GOBA-AI-Labs
+ emoji: 🧠
+ colorFrom: blue
+ colorTo: purple
  ---

+ # GOBA-AI-Labs
+
+ **Making large AI models accessible on consumer hardware.**
+
+ We develop open-source tools for compressing Mixture-of-Experts (MoE) AI models. Our expert pruning technology reduces model sizes by 50-90% while preserving quality — enabling 400B+ parameter models to run on laptops with 24GB RAM.
+
+ ## PrunedHub Models
+
+ Calibration-based expert pruning with zero retraining. Drop-in replacements for llama.cpp.
+
+ | Model | Base | Size | Quality | Highlights |
+ |-------|------|------|---------|------------|
+ | [PrunedHub GPT-OSS-20B-28x](https://huggingface.co/GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-28x) | GPT-OSS-20B | 10.4 GB | MMLU 78% (lossless) | Zero quality loss, fits 16GB RAM |
+ | [PrunedHub GPT-OSS-20B-27x-Zerobias](https://huggingface.co/GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-27x-Zerobias) | GPT-OSS-20B | ~9.4 GB | MMLU 77% (-1pp) | Experimental router optimization |
+ | [PrunedHub Qwen3-30B-A3B-JP-80pct](https://huggingface.co/GOBA-AI-Labs/PrunedHub-Qwen3-30B-A3B-JP-80pct) | Qwen3-30B-A3B | 14.0 GB | MMLU 79% (think-ON) | Language-aware pruning, Japanese quality preserved |
+ | [PrunedHub Qwen3-Coder-Next-50pct](https://huggingface.co/GOBA-AI-Labs/PrunedHub-Qwen3-Coder-Next-50pct) | Qwen3-Coder-Next | 24.4 GB | MMLU 72% | 80B model in 24GB, outperforms Q2 quantization |
+
+ ## Our Approach
+
+ Traditional model compression relies on aggressive quantization, which degrades all computations uniformly. Our expert pruning takes a fundamentally different approach — removing entire redundant computation paths from MoE models while keeping the remaining experts at full precision.
+
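As a rough illustration of why removing experts (rather than quantizing everything) shrinks an MoE model so sharply, here is back-of-the-envelope arithmetic. Every number below — layer count, experts per layer, parameter sizes — is an assumption made up for this sketch, not a measurement of any PrunedHub model.

```python
# Illustrative only: all counts and sizes are assumptions for this sketch,
# not measurements of any PrunedHub model.
n_layers = 48
experts_per_layer = 64
params_per_expert = 50e6    # assumed parameters in one expert FFN
shared_params = 2e9         # assumed attention/embedding/shared parameters

def size_gb(experts_kept, bytes_per_param=2):
    """Approximate model size in GB if each layer keeps `experts_kept`
    experts, with all weights stored at 2 bytes/param (bf16)."""
    total = shared_params + n_layers * experts_kept * params_per_expert
    return total * bytes_per_param / 1e9

full = size_gb(experts_per_layer)   # all 64 experts per layer
pruned = size_gb(28)                # keep 28 experts per layer
print(f"full: {full:.1f} GB, pruned: {pruned:.1f} GB, "
      f"reduction: {100 * (1 - pruned / full):.0f}%")
```

Because the expert FFNs dominate the parameter count, dropping experts removes memory almost linearly, while the experts that remain stay at full precision.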
+ - **Calibration-based importance scoring** — Expert importance measured through actual inference behavior, not static weight analysis
+ - **Layer-adaptive expert allocation** — Each layer retains a dynamically determined number of experts based on its contribution to quality
+ - **Language-aware optimization** — Automatic detection and protection of language-specialized experts
+ - **Zerobias router optimization** — Post-pruning router bias correction that extends the lossless compression frontier
+
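The first two bullets above can be sketched in a few lines. This toy simulates router decisions with random logits instead of running a real model, and every detail — top-2 routing, the 90% coverage cutoff, the tensor shapes — is an assumption chosen for illustration, not the actual PrunedHub pipeline.

```python
import numpy as np

# Toy sketch of calibration-based importance scoring with layer-adaptive
# allocation. A real pipeline would record router selections while running
# a calibration corpus through the model; here we simulate router logits.
rng = np.random.default_rng(0)
n_layers, n_experts, top_k = 4, 32, 2
n_tokens = 10_000  # calibration tokens (assumed)

keep_per_layer = []
for layer in range(n_layers):
    # Simulated router logits: one row per calibration token.
    logits = rng.normal(size=(n_tokens, n_experts))
    chosen = np.argsort(logits, axis=1)[:, -top_k:]        # top-k routing
    counts = np.bincount(chosen.ravel(), minlength=n_experts)
    importance = counts / counts.sum()                      # usage frequency
    # Layer-adaptive allocation: keep the smallest expert set that covers
    # 90% of routed tokens, so less-critical layers keep fewer experts.
    order = np.argsort(importance)[::-1]
    coverage = np.cumsum(importance[order])
    n_keep = int(np.searchsorted(coverage, 0.90) + 1)
    keep_per_layer.append(sorted(order[:n_keep].tolist()))

print([len(k) for k in keep_per_layer])
```

With uniform random logits every expert looks similarly important, so most are kept; on a real model, expert usage is highly skewed and far more experts fall below the coverage cutoff.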
+ ## Links
+
+ - [Support us on Ko-fi](https://ko-fi.com/gobaailabs)