---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
base_model: Qwen/Qwen2.5-7B-Instruct
base_model_relation: adapter
tags:
  - mixture-of-experts
  - moe
  - ternary
  - quantized
  - qwen2.5
  - outlier
  - local-llm
  - on-device
  - edge-ai
  - energy-efficient
  - sparse
  - overlay
  - research
  - apple-silicon
  - mac
language:
  - en
datasets: []
widget:
  - example_title: Widget math
    messages:
      - role: user
        content: >-
          If it takes 5 machines 5 minutes to make 5 widgets, how long would it
          take 100 machines to make 100 widgets? Show reasoning step by step.
  - example_title: Logic grid
    messages:
      - role: user
        content: >-
          Alice, Bob, and Carol each own a different pet (dog, cat, parrot).
          Alice does not own the dog. Bob owns the parrot. Who owns what?
  - example_title: Word problem
    messages:
      - role: user
        content: >-
          A train leaves station A at 60 mph and another leaves station B, 180
          miles away, at 40 mph traveling toward each other. When do they meet?
model-index:
  - name: Outlier-10B
    results:
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: cais/mmlu
        metrics:
          - type: accuracy
            value: 70.87
            name: MMLU 5-shot
---

Outlier-10B V3.3

Ternary Mixture-of-Experts overlay for Qwen2.5-7B-Instruct. Sparse architecture: shared full-precision FFN plus a gated ternary expert FFN per layer. Built by a solo founder on a Mac Studio as part of the Outlier research line feeding the Outlier desktop app for Apple Silicon.

Quick facts

  • Scale: 30.4B total / 13.3B active
  • Architecture: Ternary MoE overlay (shared FFN + α × expert FFN, TQ1_0 packing)
  • Experts: 88
  • Frozen base: Qwen2.5-7B-Instruct
  • MMLU 5-shot: 70.87% ± 0.37% (n=14042, lm-evaluation-harness v0.4.9.1)
  • License: Apache 2.0
  • Intended use: research, benchmarking, derivative fine-tunes
  • Production Mac deployment: use an MLX 4-bit shipping tier instead

V3.3 is the alpha-fixed release. Path A composition verified on a Mac Studio at 0.24 tok/s (CPU, float16) in the research pipeline; the flagship MLX quantization is queued.

What this is

An overlay checkpoint. This repo contains expert delta weights, router, modeling file, and config. Inference loads the base model separately and applies this overlay via the reference runtime.
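The shared-plus-gated-expert composition can be sketched in numpy as below. This is a toy illustration only: the function and parameter names (`ternary_quantize`, `overlay_ffn`, `alpha`, `gate`) are made up for clarity and are not the repo's actual modeling-file API, and the absmean rounding is a BitNet-style stand-in for the real quantizer.

```python
import numpy as np

def ternary_quantize(w):
    """Round weights to {-1, 0, +1} with an absmean scale (BitNet-style sketch)."""
    scale = np.mean(np.abs(w)) + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

def overlay_ffn(x, w_shared, w_expert, alpha, gate):
    """Shared full-precision FFN path plus a gated, scaled ternary expert path."""
    q, scale = ternary_quantize(w_expert)
    shared_out = x @ w_shared          # full-precision shared FFN
    expert_out = (x @ q) * scale       # ternary expert FFN, rescaled
    return shared_out + alpha * gate * expert_out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))
w_shared = rng.normal(size=(64, 64))
w_expert = rng.normal(size=(64, 64))
y = overlay_ffn(x, w_shared, w_expert, alpha=0.1, gate=1.0)
print(y.shape)  # (4, 64)
```

With `gate=0` the expert path drops out and the output reduces to the frozen shared FFN, which is what makes the overlay a strict delta on top of the base model.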

What this is NOT

  • Not a standalone checkpoint; the frozen base above is required
  • Not calibrated for production throughput; shipping-tier MLX 4-bit variants exist for that
  • Not claiming to beat the base Qwen on raw MMLU at every scale; the product thesis is MMLU per GB of RAM, not raw MMLU (see comparison below)

Honest comparison vs base Qwen

| Metric | Outlier-10B V3.3 | Qwen2.5-7B-Instruct FP16 |
| --- | --- | --- |
| MMLU 5-shot | 70.87% | Qwen published values (see base card) |

The overlay compresses the expert path to ~1.6 bits per weight via TQ1_0 while preserving the shared FFN at full precision. This trades some raw accuracy for a smaller expert memory footprint when paired with int4 base quantization.
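For intuition on the ~1.6 figure: base-3 packing fits five ternary values into one byte (3^5 = 243 ≤ 256), i.e. 8/5 = 1.6 bits per value. Here is a toy round-trip of that idea; the real TQ1_0 block layout in llama.cpp differs (it also stores per-block scales, so its effective rate is slightly above 1.6 bits/weight).

```python
def pack_trits(trits):
    """Pack ternary values {-1, 0, +1} base-3 style: 5 trits per byte."""
    assert len(trits) % 5 == 0
    out = bytearray()
    for i in range(0, len(trits), 5):
        b = 0
        for t in trits[i:i + 5]:
            b = b * 3 + (t + 1)   # map {-1, 0, 1} -> base-3 digits {0, 1, 2}
        out.append(b)
    return bytes(out)

def unpack_trits(data):
    """Invert pack_trits: recover 5 ternary values from each byte."""
    trits = []
    for b in data:
        group = []
        for _ in range(5):
            group.append(b % 3 - 1)
            b //= 3
        trits.extend(reversed(group))
    return trits

trits = [-1, 0, 1, 1, 0, -1, -1, -1, 0, 1]
packed = pack_trits(trits)
print(len(packed) * 8 / len(trits))  # 1.6 bits per weight
```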

Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_id = "Qwen/Qwen2.5-7B-Instruct"
overlay_id = "Outlier-Ai/Outlier-10B"

tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    overlay_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

prompt = "Explain ternary mixture-of-experts in one paragraph."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```

Apple Silicon production use: pick an MLX 4-bit repo from the Outlier-Ai organization. Those load via mlx_lm.generate.

Benchmarks (verified)

| Benchmark | Score | n | Harness |
| --- | --- | --- | --- |
| MMLU 5-shot | 70.87% ± 0.37% | 14042 | lm-evaluation-harness v0.4.9.1 |
| HellaSwag 10-shot | 78.30% | 14042 | lm-evaluation-harness v0.4.9.1 |
| ARC-Challenge 25-shot | 62.88% | 14042 | lm-evaluation-harness v0.4.9.1 |
| ARC-Easy 25-shot | 85.98% | 14042 | lm-evaluation-harness v0.4.9.1 |
| Winogrande 5-shot | 73.80% | 14042 | lm-evaluation-harness v0.4.9.1 |
| TruthfulQA 0-shot | 62.11% | 14042 | lm-evaluation-harness v0.4.9.1 |

Evaluation artifacts retained on Outlier infrastructure; ground-truth summary at outlier.host.

Provenance

  • Harness: lm-evaluation-harness v0.4.9.1
  • Sample size: n = 14042
  • Config: 5-shot, bf16, batch_size=1
  • Date produced: 2026-04-13
  • Canonical ground truth: OUTLIER_GROUND_TRUTH_v13.md

What is Outlier?

Outlier is a Mac-native, offline-by-default AI platform: one desktop app, a curated model library, bring-your-own-model support, projects with codebase indexing, a 9-tool coding agent, an artifacts panel, SQLite memory, and an OpenAI-compatible local API - all Apache 2.0, all local, no cloud round-trip, no per-token billing.

Built solo in Grand Rapids, Michigan. 19 days to 35+ HF repos and 8,305 downloads, under $1,200 in compute spend. Mission: a local AI model that codes as well as the cloud flagships, funded by users who want it - not investors who need exits.

Known limits

  • Overlay checkpoint; the frozen base above is required.
  • Shared FFN runs at full precision on load - plan RAM accordingly if you compose this without int4 quantization on the base.
  • Raw MMLU (70.87%) is compared above against the same-scale Qwen FP16 release; the product thesis is MMLU per GB of RAM, not raw MMLU.
  • English-tuned. Multilingual behavior inherits the base model directly.
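For the RAM planning mentioned above, a back-of-envelope sketch using the parameter counts from Quick facts and assumed bit-widths. The split below is illustrative only (the ~7.6B base figure is the Qwen2.5-7B parameter count; the expert-path size is inferred from the stated 30.4B total), and real runtime footprint adds KV cache and activations on top.

```python
def weight_gb(params_b, bits):
    """Approximate weight memory in GB for params_b billion parameters."""
    return params_b * bits / 8  # 1e9 params cancels 1e9 bytes/GB

base_fp16 = weight_gb(7.6, 16)            # frozen Qwen2.5-7B base at fp16
base_int4 = weight_gb(7.6, 4)             # same base at int4
expert_tq1 = weight_gb(30.4 - 7.6, 1.6)   # ternary expert path at ~1.6 bits/weight

print(f"base fp16: {base_fp16:.1f} GB")                                 # ~15.2 GB
print(f"base int4 + ternary experts: {base_int4 + expert_tq1:.1f} GB")  # ~8.4 GB
```

This is the arithmetic behind pairing the ternary expert path with an int4 base: most of the savings come from quantizing the base, and the expert overlay stays comparatively cheap.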

Attribution

Base weights by the Qwen team at Alibaba, released under Apache 2.0. Outlier contributes the ternary MoE overlay, alpha-fix scalars, and training pipeline. All capability credit for the base is upstream. Qwen team: qwenlm.github.io.

Patents and citation

Architecture, training pipeline, and inference engine covered by US provisional patents 64/026,886, 64/030,368, and 64/034,028 (Kerr & Company LLC, 2026).

```bibtex
@misc{kerr2026outlier10b,
  title={Outlier-10B V3.3: Ternary Mixture-of-Experts Overlay for Qwen2.5},
  author={Kerr, Matthew},
  year={2026},
  howpublished={\url{https://huggingface.co/Outlier-Ai/Outlier-10B}},
}
```

Contact

Matt Kerr - outlier.host - @outlier_ai