---
title: GOBA-AI-Labs
emoji: 🧠
colorFrom: blue
colorTo: purple
---
# GOBA-AI-Labs
**Making large AI models accessible on consumer hardware.**
We develop open-source tools for compressing Mixture-of-Experts (MoE) AI models. Our expert pruning technology reduces model sizes by 50-90% while preserving quality, enabling 400B+ parameter models to run on laptops with 24GB RAM.
## PrunedHub Models
Calibration-based expert pruning with zero retraining. The pruned checkpoints are drop-in replacements for llama.cpp; a minimal loading sketch follows the table.
| Model | Base | Size | Quality | Highlights |
|-------|------|------|---------|------------|
| [PrunedHub GPT-OSS-20B-28x](https://huggingface.co/GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-28x) | GPT-OSS-20B | 10.4 GB | MMLU 78% (lossless) | Zero quality loss, fits 16GB RAM |
| [PrunedHub GPT-OSS-20B-27x-Zerobias](https://huggingface.co/GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-27x-Zerobias) | GPT-OSS-20B | ~9.4 GB | MMLU 77% (-1pp) | Experimental router optimization |
| [PrunedHub Qwen3-30B-A3B-JP-80pct](https://huggingface.co/GOBA-AI-Labs/PrunedHub-Qwen3-30B-A3B-JP-80pct) | Qwen3-30B-A3B | 14.0 GB | MMLU 79% (think-ON) | Language-aware pruning, Japanese quality preserved |
| [PrunedHub Qwen3-Coder-Next-50pct](https://huggingface.co/GOBA-AI-Labs/PrunedHub-Qwen3-Coder-Next-50pct) | Qwen3-Coder-Next | 24.4 GB | MMLU 72% | 80B model in 24GB, outperforms Q2 quantization |
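
Because the pruned models keep the standard llama.cpp format, they load like any other GGUF checkpoint. The sketch below uses the llama-cpp-python bindings; the GGUF filename is a placeholder assumption, so check the model page's file list for the real one.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

# Download one GGUF file from the model repo. The filename below is
# hypothetical: check the "Files" tab of the model page for the real one.
model_path = hf_hub_download(
    repo_id="GOBA-AI-Labs/PrunedHub-GPT-OSS-20B-28x",
    filename="prunedhub-gpt-oss-20b-28x.gguf",  # placeholder name
)

# Load it exactly like any other llama.cpp model: pruning changes the
# expert count, not the file format or the inference API.
llm = Llama(model_path=model_path, n_ctx=4096)
out = llm("Explain expert pruning in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```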
## Our Approach
Traditional model compression relies on aggressive quantization, which degrades all computations uniformly. Our expert pruning takes a fundamentally different approach: it removes entire redundant computation paths from MoE models while keeping the remaining experts at full precision. The two core steps, importance scoring and layer-adaptive allocation, are sketched in code after the list below.
- **Calibration-based importance scoring:** Expert importance measured through actual inference behavior, not static weight analysis
- **Layer-adaptive expert allocation:** Each layer retains a dynamically determined number of experts based on its contribution to quality
- **Language-aware optimization:** Automatic detection and protection of language-specialized experts
- **Zerobias router optimization:** Post-pruning router bias correction that extends the lossless compression frontier
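
The scoring and allocation steps can be pictured with a short sketch. This is not the PrunedHub implementation: the routing weights are simulated rather than logged from a real calibration run, and the mean-routing-mass metric and 95% coverage threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_experts, num_tokens = 4, 32, 10_000

# Stand-in for calibration: gate probabilities that a real run would log
# for every token at every MoE layer (here drawn from a Dirichlet prior).
router_weights = rng.dirichlet(np.ones(num_experts), size=(num_layers, num_tokens))

# Calibration-based importance: mean routing mass each expert receives
# over the calibration set, measured per layer.
importance = router_weights.mean(axis=1)  # shape: (num_layers, num_experts)

# Layer-adaptive allocation: keep the fewest experts per layer that cover
# a target fraction of that layer's routing mass (threshold is illustrative).
coverage = 0.95
for layer, scores in enumerate(importance):
    order = np.argsort(scores)[::-1]             # most important expert first
    cum = np.cumsum(scores[order]) / scores.sum()
    k = int(np.searchsorted(cum, coverage)) + 1  # experts to retain
    print(f"layer {layer}: keep {k}/{num_experts} experts")
```

Because the threshold is applied per layer, layers whose routing mass is concentrated in a few experts end up smaller than layers that spread work broadly, which is the intuition behind layer-adaptive allocation.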
## Links
- [Support us on Ko-fi](https://ko-fi.com/gobaailabs)