ManniX PRO
AI & ML interests
Recent Activity
Organizations
SL-AI/GRaPE-2-Pro
This is the flagship model of the GRaPE 2 family and the largest model I have trained to date, sitting at 27B parameters. It is built on Qwen3.5-27B and trained on a closed-source proprietary dataset, with roughly half of post-training focused on code and the rest split between STEAM subjects and structured logical reasoning. It punches seriously above its weight class.
GRaPE 2 Pro supports multimodal input (image + text) and features 6 thinking modes, selected via a dedicated tag. This gives you real control over how hard the model thinks, from skipping the reasoning phase entirely with minimal, all the way up to xtra-Hi for deep, extended thought on hard problems. For most agentic use, auto or low is the move to keep things snappy. It also runs on consumer hardware: you can get it going with as little as 12GB of VRAM on a quantized build.
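As a minimal sketch of how mode selection might look against an OpenAI-compatible endpoint: the "/think:" tag name, its placement in the system prompt, and the local server URL below are all assumptions for illustration, not the model's documented interface.

```python
# Hypothetical sketch: selecting a GRaPE 2 Pro thinking mode via an
# OpenAI-compatible server (e.g. llama-server or vLLM). The "/think:" control
# tag and its placement in the system prompt are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(prompt: str, mode: str = "auto") -> str:
    # mode: one of the thinking modes, e.g. "minimal", "auto", "low", "xtra-hi"
    resp = client.chat.completions.create(
        model="SL-AI/GRaPE-2-Pro",
        messages=[
            {"role": "system", "content": f"/think:{mode}"},  # assumed control tag
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content

print(ask("Refactor this function to be tail-recursive.", mode="low"))
```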
If you want to try it out and give feedback, that would be really appreciated. Email us at contact@skinnertopia.com
▶ Mythic-RDT - OpenMythos blueprint with a retrofit-recurrence fine-tune
https://github.com/mann1x/Mythic-RDT
▶ cross-tokenizer-distill (CTD) - knowledge distillation across different tokenizer vocabularies
https://github.com/mann1x/cross-tokenizer-distill
For Mythic-RDT, I have chosen the pretty outdated DS-Coder-V2 16B.
It's small enough not to need more than 48GB of VRAM, but once I leaned on KL for the depth-recurrence fine-tune (I couldn't get above parity with T=1 when running T=4, which is not great for 4x the inference time), I started investigating the KL recipe and questioning the teacher, which was the same DS-Coder-V2 but at BF16.
For a better teacher there was really only one option, DS-Coder-V2-236B. Not only is it so big that I'd need 4x H100 to run it, it is also surpassed even by Qwen3-Coder-32B on HE/MBPP.
Hence the CTD tool: validated, but still in development while I look for a good recipe for a Qwen->DS distill.
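To make the cross-tokenizer idea concrete, here is a minimal sketch of one way such a distillation can work: align positions where teacher and student token boundaries coincide on the raw text, then match the teacher's top-k next-token candidates to student tokens by their decoded strings. This is only an illustration of the general technique, not the actual CTD recipe; the alignment strategy and function names are assumptions.

```python
import torch
import torch.nn.functional as F

def aligned_positions(text, teacher_tok, student_tok):
    """Character offsets where both tokenizers end a token on the same text."""
    def boundaries(tok):
        enc = tok(text, return_offsets_mapping=True, add_special_tokens=False)
        return {end for _, end in enc["offset_mapping"]}, enc["input_ids"]
    t_ends, t_ids = boundaries(teacher_tok)
    s_ends, s_ids = boundaries(student_tok)
    return t_ends & s_ends, t_ids, s_ids

def distill_position(teacher_logits, student_logits, teacher_tok, student_tok, k=32, T=1.0):
    """KL(student || teacher) over the teacher's top-k candidates that map
    to exactly one student token with the same decoded string."""
    top_p, top_ids = torch.topk(F.softmax(teacher_logits / T, dim=-1), k)
    pairs = []
    for p, tid in zip(top_p.tolist(), top_ids.tolist()):
        piece = teacher_tok.decode([tid])
        sids = student_tok.encode(piece, add_special_tokens=False)
        if len(sids) == 1:                       # only keep 1:1 string matches
            pairs.append((p, sids[0]))
    if not pairs:
        return torch.tensor(0.0, device=student_logits.device)
    t_probs = torch.tensor([p for p, _ in pairs], device=student_logits.device)
    t_probs = t_probs / t_probs.sum()            # renormalise the kept teacher mass
    s_logp = F.log_softmax(student_logits / T, dim=-1)[[s for _, s in pairs]]
    return F.kl_div(s_logp, t_probs, reduction="sum") * (T * T)
```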
▶ Qwen3.5-4B-MicroCoder - code-leaning and reasoning merge of Qwen3.5-4B
ManniX-ITA/Qwen3.5-4B-MicroCoder
▶ Omnimergekit - merge toolkit, merge & quantization scripts, experiment logs
https://github.com/mann1x/omnimergekit
You can find my merge toolkit and scripts in the repo, so they don't get scattered over the HF repos.
MicroCoder was an interesting experiment: only a couple of base, reasoning-broken coding fine-tunes to merge with the excellent instruct-reasoning JackRong-v2.
The result is not truly exciting, but it manages to improve LiveCodeBench above JR-v2, improve MBPP, and not completely break reasoning.
This is achieved with omnimergekit using differential signals: signals derived from each source's delta vs the base model, contrasting its good and wrong answers across the sources (HE/MBPP/AIME).
The very long eval sessions showed that the method does not just bias the scores of these evals, but also improves others, even above the baseline.
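A rough sketch of how such a differential signal could be built: contrast gradient magnitudes collected on prompts a source answers correctly against those it gets wrong, then use the difference to weight that source's delta vs the base. This is my reading of the idea as described above, not the actual omnimergekit implementation; all function names here are illustrative.

```python
import torch

def per_tensor_grad_norms(model, batches, loss_fn):
    """Accumulate mean |grad| per named parameter over a list of batches."""
    norms = {n: 0.0 for n, _ in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                norms[n] += p.grad.abs().mean().item()
    return {n: v / max(len(batches), 1) for n, v in norms.items()}

def differential_signal(model, good_batches, wrong_batches, loss_fn, eps=1e-8):
    """Positive where the tensor matters more for good answers than wrong ones."""
    good = per_tensor_grad_norms(model, good_batches, loss_fn)
    wrong = per_tensor_grad_norms(model, wrong_batches, loss_fn)
    return {n: (good[n] - wrong[n]) / (good[n] + wrong[n] + eps) for n in good}

def weighted_delta(finetune_sd, base_sd, signal, floor=0.0):
    """Scale each tensor's delta (finetune - base) by its clipped signal."""
    return {n: (finetune_sd[n] - base_sd[n]) * max(signal.get(n, 0.0), floor)
            for n in finetune_sd}
```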
Follow-up to the M1..M5 study above: re-ran M4 against the updated PR #682 head ("turbo" branch by @Tusm11, HEAD 8d989f6) with rebalanced hyperparams (equal weights w=1/1, density 0.7, closer to the PR's worked example). Same AttnLRP signal as M4-orig, same sources.
▶ Qwen3.5-4B-M4-v2-ex-LRP-turbo
ManniX-ITA/Qwen3.5-4B-M4-v2-ex-LRP-turbo
Q6_K HE / MBPP pass@1, M4-v2 inserted:
- M1 Vanilla DARE-TIES → 51.22 / 47.00
- M2 OMv2 (no signal) → 52.44 / 49.40
- M3 OMv2 + Fisher → 57.93 🥇 / 48.80
- M4 ex-LRP (PR #682 orig) → 51.22 / 49.40
- M4-v2 ex-LRP (PR #682 turbo) → 55.49 / 52.20 🥇
- M5 OMv2 + LRP → 53.05 / 51.40
Δ M4-v2 vs M4-orig: +4.27 pp HE, +2.80 pp MBPP. M4-v2 takes the MBPP medal of the whole study (overtakes M5) while staying competitive on HumanEval. The turbo code path + rebalanced hyperparams clearly beat the original PR head on this configuration.
Findings refresh: Fisher still leads HE; ex-LRP (turbo) now leads MBPP, narrowly ahead of OMv2+LRP. Both LRP variants land within 1 pp on MBPP: strong signal that LRP-driven sparsification is doing real work for code-gen on small Qwen merges.
Big thanks to @Tusm11 for the supercharged ex-LRP turbo head: multimodal support + Iron-Man stabilization + in-place math are a real upgrade. Posted full results + the 6 patches needed to run it against Qwen3_5ForConditionalGeneration on the PR thread:
https://github.com/arcee-ai/mergekit/pull/682
▶ Qwen3.6-27B-Omnimerge-v4-MLP
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election).
Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced 80% unclosed-<think> on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in layers 27-52. Fix: MLP-passthrough surgery, i.e. copy MLPs verbatim from base, keep merged attn + linear_attn. Leak → 0%.
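The surgery itself is mechanically simple; a minimal sketch under the assumption of a single-file checkpoint with the usual "model.layers.N.mlp.*" naming (real 27B checkpoints are sharded, so the actual script has to iterate over shards):

```python
import re
from safetensors.torch import load_file, save_file

MLP_RE = re.compile(r"model\.layers\.(\d+)\.mlp\.(gate_proj|up_proj|down_proj)\.weight")

def mlp_passthrough(merged_path, base_path, out_path, lo=27, hi=52):
    """Copy MLP projections of layers lo..hi verbatim from base into the merge."""
    merged = load_file(merged_path)
    base = load_file(base_path)
    for name in list(merged.keys()):
        m = MLP_RE.match(name)
        if m and lo <= int(m.group(1)) <= hi:
            merged[name] = base[name].clone()   # verbatim copy from base
    save_file(merged, out_path)

# mlp_passthrough("merged.safetensors", "base.safetensors", "merged_mlp_pass.safetensors")
```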
Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5):
• HumanEval: 84.76% (= base, +5.49 pp vs v2)
• MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2)
• GPQA Diamond: ~84.75% partial 192/198 (+15.5 pp vs v2)
▶ Qwen3.5-4B Importance-Signal Study (M1..M5)
Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill), only the importance signal driving DARE-TIES sparsification varies.
Q6_K HE / MBPP pass@1:
• M1 Vanilla DARE-TIES → 51.22 / 47.00
• M2 OMv2 (no signal) → 52.44 / 49.40
• M3 OMv2 + Fisher → 57.93 🥇 / 48.80
• M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40
• M5 OMv2 + LRP → 53.05 / 51.40 🥇
Findings: Fisher wins HE (+4.88 pp over vanilla), LRP wins MBPP (+2.60 pp). Both signals + Omnimerge_v2 recipe beat vanilla. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed
5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt.
All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.
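For readers less familiar with the signals being compared, here is a bare-bones sketch of an empirical (diagonal) Fisher importance and how such a signal can drive DARE/TIES-style sparsification of a delta: keep only the top-density fraction of entries ranked by importance before merging. This illustrates the general recipe only, not the exact Omnimerge_v2 or mergekit code.

```python
import torch

def empirical_fisher(model, batches, loss_fn):
    """Mean squared gradient per parameter over a small calibration set."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}

@torch.no_grad()
def sparsify_by_signal(delta, signal, density=0.53):
    """Zero all but the highest-importance `density` fraction of delta entries."""
    k = max(1, int(delta.numel() * density))
    thresh = torch.topk(signal.flatten(), k).values.min()
    return torch.where(signal >= thresh, delta, torch.zeros_like(delta))
```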
🔹 ManniX-ITA/Qwen3.5-27B-Omnimerge-v2
3-source weight-space merge over Qwen3.5-27B combining OBIM-lite magnitude masking + DAREx rescaling + EMR election (sign from consensus, amplitude from max-abs across sources). GPU-accelerated, ~35× over CPU.
Sources: Claude-4.6-Opus-distill (0.40), Esper3.1 code (0.35), Gemini-3.1-Pro-distill (0.25). density 0.53, DAREx q 0.75.
Q6_K vs best source:
• GPQA Diamond: 53.03 → 69.19 (+16.16 pp)
• MBPP pass@1: 71.20 → 74.60 (+3.40)
• HumanEval pass@1: 76.22 → 79.27 (+3.05)
vs Omnimerge v1 (vanilla DARE-TIES): +8.08 pp GPQA, +2.80 MBPP. Amplitude-from-max + sign-from-consensus is what unlocked the GPQA jump.
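A compact sketch of what an EMR-style election can look like: per entry, take the sign that the (weighted) majority of source deltas agrees on, and the amplitude from the largest-magnitude delta that matches that sign. This is a simplified reading of the description above, not the Omnimerge_v2 implementation.

```python
import torch

def emr_elect(deltas, weights=None):
    """deltas: list of same-shape tensors (finetune - base), one per source."""
    stack = torch.stack(deltas)                                   # [S, ...]
    if weights is None:
        weights = torch.ones(stack.shape[0], dtype=stack.dtype, device=stack.device)
    w = weights.view(-1, *([1] * (stack.dim() - 1)))
    # Sign from consensus: the weighted sum of signs decides the elected direction.
    elected_sign = torch.sign((torch.sign(stack) * w).sum(dim=0))
    # Amplitude from max-abs among sources that agree with the elected sign.
    agree = (torch.sign(stack) == elected_sign).to(stack.dtype)
    amplitude = (stack.abs() * agree).amax(dim=0)
    return elected_sign * amplitude
```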
🔹 ManniX-ITA/gemma-4-A4B-98e-v3-it
Gemma 4 26B-A4B pruned 128 → 98 experts/layer (-23.4% MoE capacity, -5.2B params), zero GPQA degradation.
GPQA Diamond:
• 128e reference: 75.25%
• 98e v3 (this): 75.25% → +0.00 pp despite -23.4% capacity, -5.2B params
• 109e v3 (older): 71.72% → -3.53 pp
The win over 109e v3 came from changing the importance map: aggregating per-expert contribution across math/logic/code/science/creative via 128-token teacher forcing, instead of GPQA-specific per-question top-16 (which overfitted). Result: more experts dropped, quality preserved.
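A rough sketch of that pruning criterion: teacher-force short samples from several domains, accumulate each expert's routed probability mass per layer, average across domains so no single domain dominates, and keep the top-k experts per layer. Hook points and tensor shapes are assumptions about a generic MoE router, not Gemma-specific code.

```python
import torch

def accumulate_router_mass(router_probs_per_layer, totals):
    """router_probs_per_layer: list of [tokens, num_experts] tensors (one per layer)."""
    for layer, probs in enumerate(router_probs_per_layer):
        totals[layer] += probs.sum(dim=0)        # probability mass routed to each expert

def select_experts(domain_totals, keep=98):
    """domain_totals: {domain: [num_layers, num_experts] contribution tensor}."""
    # Normalise per layer, then average across domains before ranking.
    avg = torch.stack(
        [t / t.sum(dim=-1, keepdim=True) for t in domain_totals.values()]
    ).mean(dim=0)
    return [torch.topk(layer_scores, keep).indices.sort().values for layer_scores in avg]
```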
Findings worth flagging:
• Experts NOT topic-specialized → 28/32 overlap math/creative top-32.
• Expert weight cosine ≈ 0.05 max → merging destroys the model. Dropping is the only viable structural compression here.
• Contribution Gini ≈ 0.38 → ~75 experts/layer carry 80% of signal.
Eval: lm-eval gpqa_diamond_cot_zeroshot, llama-server --reasoning-format deepseek --reasoning-budget 8192, Gemma 4 official sampling. Feedback welcome.
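For completeness, a sketch of how that eval setup can be wired together: serve the quantized GGUF with llama-server, then point lm-eval-harness's OpenAI-compatible backend at it. Exact flag spellings and the GGUF filename here are assumptions and vary between lm-eval and llama.cpp versions, so treat this as an outline rather than a copy-paste command.

```python
import subprocess
import time

# 1) Serve the quantized model with a reasoning budget (llama.cpp server).
server = subprocess.Popen([
    "llama-server", "-m", "gemma-4-A4B-98e-v3-it.Q6_K.gguf",   # hypothetical filename
    "--reasoning-format", "deepseek", "--reasoning-budget", "8192",
    "--port", "8080",
])
time.sleep(60)  # crude wait; in practice poll the /health endpoint

# 2) Run lm-eval against the local OpenAI-compatible completions endpoint.
subprocess.run([
    "lm_eval",
    "--model", "local-completions",
    "--model_args", "base_url=http://127.0.0.1:8080/v1/completions,model=gemma-4-98e",
    "--tasks", "gpqa_diamond_cot_zeroshot",
])
```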
> LiquidAI/LFM2.5-1.2B-Base: 10T → 28T tokens
> LiquidAI/LFM2.5-1.2B-Instruct: new large-scale multi-stage RL
> LiquidAI/LFM2.5-1.2B-JP: our most polite model
> LiquidAI/LFM2.5-VL-1.6B: multi-image multilingual
> LiquidAI/LFM2.5-Audio-1.5B: 8x faster, no quality loss
Super proud of this release 🤗
the table of contents covers everything you need to know about agents + code:
> advanced prompt techniques
> multi-agent patterns
> tool use and MCP
> you name it
read it here: https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0#heading=h.pxcur8v2qagu
you can also pre-order on Amazon (published by Springer) and the royalties go to Save the Children: https://www.amazon.com/Agentic-Design-Patterns-Hands-Intelligent/dp/3032014018/
🔥 We are ready to announce a new series of Supple Diffusion models: a new generation of diffusion models (about 1-2 weeks left before release).
🦾 The new series aims to take diffusion models to the next level, with performance and versatility as the main goals.
🧠 How will our models be better than others? First, we worked on the CLIP models: they now understand your prompts better, so requests are easier to process. Second, we trained the models to a higher quality than all of our previous ones. Third, you won't have to keep 20 models on your disk; 4-6 will be enough.
🗺️ Roadmap:
1. Create Supple Diffusion Small
2. Create Supple Diffusion Medium
3. Create Supple Diffusion Large
Our models are universal: they work for realism, cartoons, anime, and caricatures.
The project really needs your support, recommendations, and reviews; please do not hesitate to write comments under this post. Thank you!
🖼️ Below are demo images made with the pre-release version of Supple Diffusion Small.
Tonight I wrote up a WandB report (the panel editor is super broken in Firefox) that sums up some of the more interesting bits from the results: https://wandb.ai/augmxnt/train-bench/reports/torchtune-vs-axolotl-vs-unsloth-Trainer-Comparison--Vmlldzo4MzU3NTAx
AutoQuant is the evolution of my previous AutoGGUF notebook (https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu). It allows you to quantize your models in five different formats:
- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
Once the model is converted, it automatically uploads it to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.
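For reference, here is a rough outline of what the GGUF path in a notebook like this does under the hood: download the HF model, convert it to GGUF with llama.cpp's converter, then quantize. Script and binary names follow current llama.cpp conventions and may differ from the actual notebook; the repo id is just an example.

```python
import subprocess

MODEL = "mlabonne/AlphaMonarch-7B"   # example repo id, substitute your own
# Download the source checkpoint from the Hub.
subprocess.run(["huggingface-cli", "download", MODEL, "--local-dir", "model"], check=True)
# Convert to an FP16 GGUF with llama.cpp's converter script.
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", "model",
                "--outfile", "model-f16.gguf", "--outtype", "f16"], check=True)
# Quantize, e.g. to Q4_K_M.
subprocess.run(["llama.cpp/llama-quantize", "model-f16.gguf",
                "model-Q4_K_M.gguf", "Q4_K_M"], check=True)
```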
Here's an example of a model I quantized using HQQ and AutoQuant: mlabonne/AlphaMonarch-7B-2bit-HQQ
I hope you'll enjoy it and quantize lots of models! :)
💻 AutoQuant: https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4