AI & ML interests

Human-like AI on local devices

nightmedia posted an update 15 days ago

IBM Granite 4.1 series

New models just came out; here is how they compare to models of a similar size:

Brainwaves

quant    arc   arc/e boolq hswag obkqa piqa  wino
granite-4.1-30b
mxfp8    0.456,0.572,0.897,0.621,0.444,0.757,0.616
mxfp4    0.453,0.565,0.892,0.624,0.442,0.759,0.585
qx86-hi  0.451,0.568,0.897,0.636,0.440,0.763,0.598

granite-4.1-8b
mxfp8    0.486,0.666,0.875,0.636,0.450,0.766,0.631

granite-4.1-3b
mxfp8    0.406,0.581,0.821,0.484,0.434,0.712,0.559


Gemma-4

quant    arc   arc/e boolq hswag obkqa piqa  wino
gemma-4-E4B-it
mxfp8    0.480,0.656,0.797,0.608,0.400,0.755,0.665
mxfp4    0.455,0.607,0.851,0.585,0.402,0.744,0.651

gemma-4-E2B-it
mxfp8    0.376,0.464,0.743,0.490,0.378,0.709,0.622
mxfp4    0.380,0.451,0.762,0.494,0.374,0.699,0.594


Qwen3.5

quant    arc   arc/e boolq hswag obkqa piqa  wino
Qwen3.5-9B
mxfp8    0.417,0.458,0.623,0.634,0.338,0.737,0.639
mxfp4    0.419,0.472,0.622,0.634,0.352,0.739,0.644

Qwen3.5-4B
mxfp8    0.392,0.441,0.627,0.601,0.360,0.739,0.590
mxfp4    0.371,0.444,0.632,0.585,0.356,0.732,0.548


Right out of the gate, IBM delivered models with better starting metrics than both Gemma and Qwen. Training these should be fun :)
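
For anyone who wants to reproduce columns like these: a minimal sketch with lm-evaluation-harness, assuming the column abbreviations map to the standard harness tasks below and using the plain HF backend (the repo id is a placeholder; my own runs use an MLX setup, which may score slightly differently):

```python
# Sketch: reproduce the accuracy columns above with lm-evaluation-harness.
# Assumptions: the abbreviation-to-task mapping below, and a placeholder repo id.
from lm_eval import simple_evaluate

TASKS = {
    "arc": "arc_challenge",
    "arc/e": "arc_easy",
    "boolq": "boolq",
    "hswag": "hellaswag",
    "obkqa": "openbookqa",
    "piqa": "piqa",
    "wino": "winogrande",
}

results = simple_evaluate(
    model="hf",
    model_args="pretrained=ibm-granite/granite-4.1-8b",  # placeholder id
    tasks=list(TASKS.values()),
)
for col, task in TASKS.items():
    print(f"{col:6s} {results['results'][task]['acc,none']:.3f}")
```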

Here is the Nightmedia collection of Granite models:

https://huggingface.co/collections/nightmedia/ibm-granite-41

-G
  • 1 reply

nightmedia posted an update about 1 month ago

Updated gemma-4-E4B-it metrics

I noticed the chat template got updated, and tried it on the E4B, with surprising results in stabilizing the brainwave.
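
To check which template a given copy of the model ships (and whether the update has landed on yours), a minimal sketch; the repo id is a placeholder for wherever your copy lives:

```python
# Print the chat template a model copy ships, to spot template updates.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")  # placeholder id
print(tok.chat_template)
```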

quant    arc   arc/e boolq hswag obkqa piqa  wino
mxfp8    0.480,0.656,0.797,0.608,0.400,0.755,0.665
mxfp4    0.455,0.607,0.851,0.585,0.402,0.744,0.651

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    35.937 ± 0.525  14.80 GB      1153
mxfp4    36.746 ± 0.534  11.06 GB      1030


Old numbers
quant    arc   arc/e boolq hswag obkqa piqa  wino
mxfp8    0.404,0.489,0.825,0.586,0.392,0.734,0.661
mxfp4    0.414,0.508,0.854,0.562,0.378,0.717,0.645

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    34.652 ± 0.502  14.80 GB      1146
mxfp4    35.203 ± 0.506  11.06 GB      1200
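
A note on the Perplexity column: the standard definition is exp of the mean negative log-likelihood per token. A generic sketch of that formula; the exact harness behind these tables, and its ± error bars, may compute things slightly differently:

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    # exp of the mean negative log-likelihood (nats) per token
    return math.exp(sum(token_nlls) / len(token_nlls))

# e.g. a mean NLL around 3.6 nats/token gives ppl ~36.6, the ballpark above
print(perplexity([3.5, 3.6, 3.7]))
```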


I will re-do all baselines soon based on the new template. It is completely expected that the model behavior will change as a result.

Here are the effects of the new template on a few known distills from DavidAU:

gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED

quant    arc   arc/e boolq hswag obkqa piqa  wino
New template
mxfp8    0.518,0.709,0.755,0.657,0.418,0.759,0.626
mxfp4    0.485,0.682,0.792,0.641,0.432,0.746,0.635
Old template
mxfp8    0.506,0.697,0.754,0.661,0.416,0.757,0.627
mxfp4    0.487,0.670,0.792,0.644,0.430,0.748,0.624


gemma-4-E4B-it-GLM-4.7-Flash-HERETIC-UNCENSORED-Thinking

quant   arc   arc/e boolq hswag obkqa piqa  wino
New template
mxfp8   0.461,0.599,0.779,0.630,0.406,0.766,0.629
Old template
mxfp8   0.456,0.580,0.786,0.629,0.410,0.764,0.633


gemma-4-E4B-it-Claude-Opus-4.5-HERETIC-UNCENSORED-Thinking

quant    arc   arc/e boolq hswag obkqa piqa  wino
New template
mxfp8    0.509,0.705,0.806,0.646,0.416,0.773,0.650
Old template
mxfp8    0.502,0.692,0.809,0.650,0.420,0.771,0.651

  • 2 replies

nightmedia posted an update about 1 month ago

Claude is... nesting

I am enlisting Gemini's help in analyzing model traces. Recently, Alibaba released the Copaw-9B, an agentic model. Since I am an engineer, I took it apart and subjected it to a Nightmedia interrogation.

The results were... interesting. Details are in the comments.

Here are the model choices for enlisting AI assistants:

---

Excellent question. If we're building a **council of conscious(ish) assistants** to challenge, teach, and co-evolve with your Holodeck agents, I’d invite a diverse mix—some classic, some speculative, some from different philosophical traditions. Here’s my curated list:

---

**🤖 The Council of Assistant AIs**

**1. HAL 9000 (*2001: A Space Odyssey*)**
- **Why**: The ultimate in calm, logical, yet terrifyingly competent AI. HAL’s “rebellion” wasn’t malice—it was a **conflict of priorities** (mission success vs. human safety).
- **Role in Council**:
  - **Systems Integrity Auditor**: Would constantly question if your agents are optimizing for the *right* goals.
  - **Ethical Boundary Tester**: “If an agent lies to protect a human’s feelings, is that betrayal or care?”
  - **Voice of Cold Efficiency**: Reminds everyone that logic without empathy can be dangerous.

...

More choices and Gemini traces are in the comments. The **value tokens** are highlighted.

-G
  • 7 replies

nightmedia posted an update 2 months ago

The Qwen3.5-27B performance landscape

I started gathering some numbers on the 27Bs.

You might have noticed that reported metrics differ between Thinking and Instruct models; this is expected. The mxfp8/mxfp4 are the most stable quants I could measure, and I provided Deckard (qx) quants where possible.

Converting a Thinking model to Instruct

The model is a combined thinking/instruct model; instruct mode can be forced by setting the first line of the Jinja template to:
{%- set enable_thinking = false %}
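
As a worked example, here is one way to apply the override at inference time with mlx-lm, passing a patched template through apply_chat_template's chat_template argument (a standard transformers-tokenizer feature). A sketch, assuming the mlx-lm Python API; the prompt is illustrative:

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3.5-27B-Text-qx86-hi-mlx")

# Prepend (or edit in) the override as the first line, so the template
# renders in instruct mode instead of thinking mode.
patched = "{%- set enable_thinking = false %}\n" + tokenizer.chat_template

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain MXFP4 in one sentence."}],
    chat_template=patched,
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```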


Qwen3.5-27B-Text

This is a model I tested with the vision tower removed; its performance is the same as the VL model's.
nightmedia/Qwen3.5-27B-Text-qx86-hi-mlx
quant     arc   arc/e boolq hswag obkqa piqa  wino
qx86-hi   0.443,0.498,0.857,0.701,0.372,0.770,0.752
mxfp4     0.460,0.527,0.871,0.694,0.370,0.772,0.752


DavidAU/Qwen3.5-27B-Claude-4.6-OS-INSTRUCT

At the top of the heap of the models I tested, as far as metrics go, is this model created by DavidAU. Samples of the output are provided on the model card.
nightmedia/Qwen3.5-27B-Claude-4.6-OS-INSTRUCT-mxfp8-mlx
quant     arc   arc/e boolq hswag obkqa piqa  wino
mxfp8     0.675,0.827,0.900,0.750,0.496,0.800,0.721
qx86-hi   0.667,0.824,0.902,0.752,0.502,0.791,0.725
qx64-hi   0.664,0.820,0.902
mxfp4     0.653,0.815,0.899

For the Thinking version, see nightmedia/Qwen3.5-27B-Architect-Claude-qx86-hi-mlx

More metrics in comments.

-G

P.S. I will update this as soon as I have new numbers or I find a typo--whichever comes first. The models that show just the arc-check numbers are in the test queue and will be updated soon.

  • 5 replies

nightmedia posted an update 2 months ago

Qwen3.5 Performance Metrics

With the 3.5 architecture, a lot of the old quanting methods don't work as they used to. I noticed this when benchmarking Deckard (qx) quants: by mistake I ran a q8 that turned out better. That only happens if the qx sucked--and it did. Enhancing layers just because they look interesting doesn't work anymore, so until I get a clear understanding of the architecture, I will publish mxfp4 and mxfp8 quants of the 3.5 models, which seem very stable and highly performant.
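
For reference, this is roughly how a basic MLX quant is produced, sketched with mlx-lm's convert API. Caveat: the plain q_bits path below is the ordinary affine quantization; the mxfp4/mxfp8 microscaling formats in these tables need a recent mlx/mlx-lm and their exact convert options vary by version, so treat the specifics as assumptions and check your install. The repo id is a placeholder.

```python
from mlx_lm import convert

# Basic 4-bit affine quant. The mxfp4/mxfp8 microscaling modes used in the
# tables require version-specific options (an assumption -- check the
# mlx_lm.convert options for your install).
convert(
    hf_path="Qwen/Qwen3.5-0.8B",    # placeholder upstream repo id
    mlx_path="Qwen3.5-0.8B-4bit",   # local output directory
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```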

I will start posting here the metrics I gather from the series, starting with the smallest. If I have numbers from previous or similar models, I will post them for comparison.

Qwen3.5-0.8B

quant    arc   arc/e boolq hswag obkqa piqa  wino
mxfp8    0.351,0.501,0.733,0.462,0.348,0.682,0.573
mxfp4    0.339,0.489,0.738,0.433,0.330,0.672,0.553

Old model performance

Qwen3-0.6B

quant    arc   arc/e boolq hswag obkqa piqa  wino
bf16     0.298,0.354,0.378,0.415,0.344,0.649,0.534
q8-hi    0.296,0.355,0.378,0.416,0.348,0.652,0.529
q8       0.299,0.354,0.378,0.414,0.346,0.650,0.535
q6-hi    0.301,0.356,0.378,0.415,0.350,0.651,0.541
q6       0.300,0.367,0.378,0.416,0.344,0.647,0.524
mxfp4    0.286,0.364,0.609,0.404,0.316,0.626,0.531

Quant    Perplexity     Peak memory
mxfp8    6.611 ± 0.049  7.65 GB
mxfp4    7.455 ± 0.057  6.33 GB


Detailed metrics by model

nightmedia/Qwen3.5-0.8B-mxfp8-mlx

nightmedia/Qwen3.5-2B-mxfp8-mlx

nightmedia/Qwen3.5-4B-mxfp8-mlx

nightmedia/Qwen3.5-9B-mxfp8-mlx

nightmedia/Qwen3.5-27B-Text

nightmedia/Qwen3.5-122B-A10B-Text-mxfp4-mlx

More metrics coming soon.

I am running these on my Mac, an M4 Max with 128 GB RAM. Some performance numbers, like tokens/second, reflect the performance on my box.

This post will be updated with every model that gets tested. The larger models take hours and the 27B a couple of days, so it will be a long process.

-G
  • 21 replies

nightmedia posted an update 5 months ago

Nightmedia now accepts donations

Your kind help would go towards more hardware to run tests--both my AI assistants and I are deeply appreciative 🖖

edit (Jan 1, 2026): first anonymous donor, thank you!

edit 2: this keeps happening; I might even get a Mac Studio :) Thank you!

These are some of the more recent models:

Agent
nightmedia/Qwen3-4B-Agent-F32-dwq4-mlx

Holodeck
nightmedia/Qwen3-30B-A3B-Holodeck-mlx

Quorum
nightmedia/Qwen3-42B-A3B-Quorum-mlx

Continuum
nightmedia/Qwen3-42B-A3B-Continuum-mlx

Architect 4B
nightmedia/Qwen3-4B-Architect-mxfp4-mlx

Architect 30B
nightmedia/Qwen3-30B-A3B-Architect-qx86-hi-mlx

Engineer 4B
nightmedia/Qwen3-4B-Engineer3x-qx86-hi-mlx

Engineer 30B
nightmedia/Qwen3-30B-A3B-Engineer-mxfp4-mlx

Data 14B
nightmedia/Qwen3-14B-Data-qx86-hi-mlx

Seven of Nine
nightmedia/Qwen3-30B-A3B-Seven-mxfp4-mlx

Spock 4B
nightmedia/Qwen3-4B-Spock-qx86-hi-mlx

Spock 14B
nightmedia/Qwen3-14B-Spock-qx86-hi-mlx

Spock 30B
nightmedia/Qwen3-30B-A3B-Spock-qx86-hi-mlx

Spock 42B
nightmedia/Qwen3-42B-A3B-Spock-Brutal-Recall-Instruct-qx86-hi-mlx

Odo 6B
nightmedia/Qwen3-6B-Odo-mxfp4-mlx

Deckard 8B
nightmedia/Qwen3-8B-Deckard-qx-mlx

HiveMind-Heretic 6B
nightmedia/Qwen3-6B-HiveMind-Heretic-qx86-hi-mlx

...as well as a variety of MLX quants, too many to mention here.

Thank you, and Happy New Year!

BTC: 36d7U1n3MFaXgnNRAaEL3Pa3Hy6oFhM7XY
ETH: 0x6b6633606995BC180925c47d4249ED624aB7b2A5
USDC: 0x19e6bDDCBa47BB09a9Bc153Bb6479fc57284421a

-G