---
base_model: Qwen/Qwen3.5-4B
license: apache-2.0
library_name: transformers
tags:
- qwen3.5
- merge
- omnimerge
- task-arithmetic
- code
---
# Qwen3.5-4B-MicroCoder

A 4B-parameter, code-leaning merge of Qwen3.5-4B that beats every individual
source on LCB-medium-55 while matching the GSM8K score of the strongest
reasoning fine-tune in the pool.

This card documents `Qwen3.5-4B-MicroCoder` (internally `v2i-jv-base-task-arith`),
the chosen frontier point of a 19-variant ablation that swept merge methods,
density, importance signals, AIME-protection masks, and skip-layer surgery.

Built with [**OmniMergeKit**](https://github.com/mann1x/omnimergekit), the
open-source merge engine developed for this work.
## Headline numbers (Q6_K, greedy)

| Benchmark | base Qwen3.5-4B | jackrong-v2 (best source) | **MicroCoder** | Δ vs jackrong-v2 |
|---|---:|---:|---:|---:|
| HumanEval (164q) | 60.37 | 60.37 | **57.32** | −3.05 |
| MBPP (500q) | 46.00 | 45.00 | **52.00** | **+7.00** |
| LiveCodeBench-30 (medium, post-2024-10-01) | 3.33 | 23.33 | **26.67** | **+3.34** |
| LiveCodeBench-55 (full medium pool) | — | 25.45 | **27.27** | **+1.82** |
| HumanEvalPlus (164q) | — | 54.88 | 50.00 | −4.88 |
| GSM8K (100q) | — | 83.00 | **83.00** | 0.00 |
| MMLU-Pro (200q) | — | 56.81 | 52.46 | −4.35 |
| AIME (30q) | — | 26.67 | 3.33 | −23.34 |
**Net:** +7pp MBPP, +3.3pp LCB-30, +1.8pp LCB-55, GSM8K parity. The trade-offs
are HumanEval (−3.1pp), HumanEvalPlus (−4.9pp), MMLU-Pro (−4.4pp), and AIME,
which falls from 26.67 to 3.33, close to the base model's 0.00
(see "Why no AIME on the chosen variant?" below).
## Recipe

```bash
python omnimergekit.py \
  --base Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
  --task-base Qwen/Qwen3.5-4B \
  --source coder_eval/continuum-code-forged \
  --source coder_eval/jackrong-python \
  --method omnimerge_v2 --v2-features fisher,darex \
  --weights 0.55,0.45 --density 0.53 --darex-q 0.85 \
  --fisher continuum-forged.safetensors,jackrong-python.safetensors \
  --pr682-turbo \
  --seed 42 --device cuda
```
This is a **task-arithmetic** merge, with deltas measured against the
`--task-base` (`Qwen/Qwen3.5-4B`, written `base` below):

```text
MicroCoder = jackrong-v2 + 0.55·DARE(continuum-code-forged − base) + 0.45·DARE(jackrong-python − base)
```
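The formula above can be sketched per tensor in NumPy. This is a minimal illustration, not OmniMergeKit's implementation: the function name `task_arithmetic_merge` and the toy arrays are hypothetical, and the `prune` argument stands in for whichever DARE variant is used (identity by default).

```python
import numpy as np


def task_arithmetic_merge(merge_base, task_base, sources, weights, prune=lambda d: d):
    """Hypothetical sketch: merge_base + sum_i w_i * prune(source_i - task_base), one tensor.

    merge_base -- tensor from the model whose behavior should survive (jackrong-v2)
    task_base  -- shared ancestor the deltas are measured against (Qwen3.5-4B)
    sources    -- tensors from the fine-tunes contributing deltas (the code teachers)
    prune      -- delta sparsifier such as a DARE variant; identity by default
    """
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per source")
    merged = merge_base.astype(np.float32)  # astype returns a copy
    for src, w in zip(sources, weights):
        delta = src.astype(np.float32) - task_base.astype(np.float32)
        merged += w * prune(delta)
    return merged


# Toy 4-element "tensors" (hypothetical values).
base = np.zeros(4)                   # stands in for Qwen3.5-4B
jv = np.array([1.0, 2.0, 3.0, 4.0])  # stands in for jackrong-v2
cf = np.array([0.0, 1.0, 0.0, 2.0])  # continuum-code-forged
jp = np.array([0.0, 0.0, 1.0, 2.0])  # jackrong-python

merged = task_arithmetic_merge(jv, base, [cf, jp], [0.55, 0.45])
print(merged)  # approximately [1.0, 2.55, 3.45, 6.0]

# Zero deltas: jackrong-v2 passes through unchanged.
passthrough = task_arithmetic_merge(jv, base, [base, base], [0.55, 0.45])
```

Keeping `merge_base` separate from `task_base` is what makes the recipe asymmetric: with zero deltas the result is exactly jackrong-v2, as the `passthrough` call shows.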
- **`jackrong-v2` is the merge base.** Wherever the pruned deltas are zero, its
  weights pass through unchanged, so its output style and reasoning policy are
  preserved there. The two coding teachers contribute only their *delta from
  the official Qwen3.5-4B base*, not their absolute weights. This isolates
  "what the coder fine-tunes added on top of the public base" and grafts it
  onto the reasoning-distilled model.
- **DAREx-q 0.85** drops the bottom 85% of the continuum-code-forged (cf) and
  jackrong-python (jp) deltas by magnitude (per-tensor quantile) before random
  pruning, then rescales the survivors by 1/density. This removes
  low-magnitude noise while preserving the high-amplitude code-skill structure.
- **Fisher importance** (gradient maps computed over the coder fine-tunes' own
  training-style data) weights the EMR election so that the dominant
  per-element direction wins when the two coding teachers disagree.
- **PR682-turbo** keeps critical tensors (norms, embeddings, `lm_head`,
  biases) at density 1.0 and falls back gracefully on shape mismatches.
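The DAREx-q step described in the list above can be sketched as follows, under stated assumptions: the quantile cut is applied first, DARE then keeps each surviving element with probability `density`, and survivors are divided by `density`. The protected-name fragments only approximate the PR682-turbo rule (norms, embeddings, `lm_head`, biases kept at density 1.0). The fragments, function names, and example tensor name are hypothetical, not OmniMergeKit's code, and the Fisher-weighted election is not shown.

```python
import numpy as np

# Hypothetical name fragments approximating PR682-turbo's protected set
# (norms, embeddings, lm_head, biases). Real checkpoints may name these differently.
PROTECTED_FRAGMENTS = ("norm", "embed_tokens", "lm_head", ".bias")


def is_protected(name):
    """True if the tensor should keep its full delta (density 1.0)."""
    return any(frag in name for frag in PROTECTED_FRAGMENTS)


def darex_q_prune(delta, density, q, rng, name=""):
    """Sketch of DAREx-q: magnitude-quantile cut, then DARE random drop and rescale.

    delta   -- fine-tune minus task base for one tensor
    density -- probability that DARE keeps an element that survived the cut
    q       -- per-tensor magnitude quantile below which deltas are zeroed
    """
    if is_protected(name):
        return delta.copy()  # protected tensors are neither cut nor dropped
    magnitude = np.abs(delta)
    threshold = np.quantile(magnitude, q)
    cut = np.where(magnitude >= threshold, delta, 0.0)  # drop the bottom q by magnitude
    keep = rng.random(delta.shape) < density            # DARE Bernoulli keep mask
    return np.where(keep, cut, 0.0) / density           # rescale survivors by 1/density


rng = np.random.default_rng(42)
delta = rng.standard_normal(10_000).astype(np.float32)
pruned = darex_q_prune(delta, density=0.53, q=0.85, rng=rng,
                       name="model.layers.3.mlp.up_proj.weight")
print((pruned != 0).mean())  # roughly 0.15 * 0.53, i.e. about 0.08
```

Under this reading, `--density 0.53 --darex-q 0.85` leaves only about 8% of each unprotected delta tensor non-zero, which is why the cut mostly removes low-magnitude noise.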
## Sources

| Model | Role | Weight |
|---|---|---:|
| [`Qwen/Qwen3.5-4B`](https://huggingface.co/Qwen/Qwen3.5-4B) | task base (delta reference) | — |
| [`Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2`](https://huggingface.co/Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2) | merge base | 1.0 (passthrough at δ=0) |
| `continuum-code-forged` | code teacher (delta) | 0.55 |
| `jackrong-python` | code teacher (delta) | 0.45 |
## Evaluation methodology

All evaluations use:

- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
  against llama.cpp `llama-server` serving the published Q6_K quantization;
- the raw `/v1/completions` endpoint;
- greedy decoding (`temperature=0.0`, `top_p=1.0`), with `max_gen_toks=2048`
  for HumanEval/MBPP and `max_gen_toks=8192` for LiveCodeBench;
- server flags `--parallel 2 --cache-type-k q8_0 --cache-type-v q8_0`.
LiveCodeBench: medium-difficulty functional problems with
`min_date=2024-10-01` (after the Qwen3.5 training cutoff, to avoid
contamination). LCB-30 is the first 30 problems of that pool; LCB-55 is the
full pool of 55.
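The pool definition above can be sketched as a simple filter. The record layout (`difficulty`, `test_type`, and `contest_date` as an ISO date string) is a hypothetical schema chosen for illustration, and keeping dataset order when taking the first 30 is an assumption; adapt both to the LiveCodeBench release you actually load.

```python
from datetime import date

MIN_DATE = date(2024, 10, 1)  # min_date from the methodology above


def lcb_pool(problems, min_date=MIN_DATE):
    """Keep medium, functional problems dated on or after min_date (dataset order kept)."""
    return [
        p
        for p in problems
        if p["difficulty"] == "medium"
        and p["test_type"] == "functional"
        and date.fromisoformat(p["contest_date"]) >= min_date
    ]


# Hypothetical records; real LiveCodeBench entries carry more fields.
problems = [
    {"id": "a", "difficulty": "medium", "test_type": "functional", "contest_date": "2024-09-30"},
    {"id": "b", "difficulty": "medium", "test_type": "functional", "contest_date": "2024-10-01"},
    {"id": "c", "difficulty": "hard", "test_type": "functional", "contest_date": "2024-11-02"},
    {"id": "d", "difficulty": "medium", "test_type": "stdin", "contest_date": "2024-12-01"},
    {"id": "e", "difficulty": "medium", "test_type": "functional", "contest_date": "2025-01-15"},
]

pool = lcb_pool(problems)  # on the real data, LCB-55 is the whole pool
lcb_30 = pool[:30]         # LCB-30: first 30 problems of the pool
print([p["id"] for p in pool])  # ['b', 'e']
```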
## Experiment trail (why this recipe?)

Nineteen variants were ablated over a multi-week sweep. The table summarizes
the informative subset (jv = jackrong-v2; "skip" = leaving a tensor's merged
delta at zero so jv's weights pass through):
| variant | merge form | AIME | HE | MBPP | LCB-30 | verdict |
|---|---|---:|---:|---:|---:|---|
| base | Qwen3.5-4B | 0.00 | 60.4 | 46.0 | 3.33 | floor |
| jackrong-v2 | source | **26.67** | 60.4 | 45.0 | 23.3 | strong reasoning, weak LCB |
| v2g | 3-src DARE-TIES, fisher+darex | 0.00 | 56.1 | **54.0** | 26.7 | code champion (no AIME) |
| **v2i = MicroCoder** | task-arith on jv-base | 3.33 | **57.3** | 52.0 | **26.7** | **balanced, picked** |
| v2j | v2i + skip mlp.gate_proj 18-25, darex 0.92 | 10.00 | — | — | — | first AIME signal |
| v2k | v2j + wider skip 14-27 | 0.00 | — | — | — | over-blocked, collapsed |
| v2l | v2j + full MLP skip 18-25 | 3.33 | — | — | — | up/down_proj carry code skill |
| v2m | v2j + density 0.45 | 3.33 | — | — | — | lower density hits jv harder |
| v2n | v2j + darex 0.95 | **13.33** | 55.5 | 50.8 | 20.0 | reasoning ceiling |
| v2o | v2n + darex 0.97 | 13.33 | 56.7 | 51.0 | 16.7 | saturated |
| v2p | v2n + jv-AIME fisher mask α=1.0 | 13.33 | 55.5 | 50.8 | 20.0 | mask redundant |
| v2q | v2n + jv-AIME mask α=0.5 | 13.33 | 54.9 | 51.2 | 20.0 | mask redundant |
| v2r | mask α=1.0 alone, no skip | 3.33 | — | — | — | per-element scaling cannot replace layer skip |
### Key findings (apply to future merge work)

1. **Task arithmetic with the strong source as `merge_base` beats symmetric
   DARE-TIES** when one source is much stronger on the target axis (here:
   reasoning). v2g and v2i tie on LCB-55 (27.27%), but v2i wins on HE, HE+,
   and GSM8K and retains a small AIME signal that pure DARE-TIES kills.
2. **Skipping `mlp.gate_proj` in layers 18-25 is the load-bearing
   AIME-recovery knob** (+6.7pp). The band is derived from Qwen3.6's
   think-policy band (layers 27-52 of 64), which scales to layers 14-26 of the
   32-layer Qwen3.5; 18-25 is a conservative narrowing of that range. The
   wider band (v2k, 14-27) collapses, and skipping the full MLP (v2l) destroys
   code skill.
3. **Raising DAREx-q from 0.92 to 0.95 adds 3.3pp AIME on top of the skip**
   by removing more low-magnitude cf/jp deltas in the protected reasoning band.
   **Going from 0.95 to 0.97 saturates** (v2n = v2o on AIME).
4. **The jv-AIME Fisher suppression mask is fully redundant with skip-layers**
   (v2n = v2p = v2q at AIME 13.33, *and* code metrics within noise). Per-element
   scaling cannot substitute for layer-level passthrough: jv's reasoning lives
   in the *coherent per-layer behavior* of `mlp.gate_proj` 18-25, not in the
   highest-importance individual cells. The mask alone (v2r) gives nothing.
5. **The 13.33% AIME ceiling is structural, not a tuning problem.** Three
   different mechanisms (high darex, higher darex, mask) all converge on the
   same number. Closing the remaining 13.34pp gap to the jv source requires
   SFT distillation, not more merge tuning.
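The layer arithmetic in finding 2 and the resulting skip rule can be sketched as follows. Proportional scaling with inward rounding is an assumed reading of how 27-52/64 becomes 14-26/32 (it reproduces those numbers). The regex assumes Hugging Face-style tensor names such as `model.layers.20.mlp.gate_proj.weight`, and the function names are hypothetical.

```python
import math
import re


def scale_band(start, end, src_layers, dst_layers):
    """Map an inclusive layer band proportionally to a model with a different depth.

    Rounds inward (ceil the start, floor the end) so the result stays inside
    the scaled range.
    """
    ratio = dst_layers / src_layers
    return math.ceil(start * ratio), math.floor(end * ratio)


# Assumed Hugging Face-style tensor names, e.g. "model.layers.20.mlp.gate_proj.weight".
GATE_PROJ = re.compile(r"layers\.(\d+)\.mlp\.gate_proj\.")


def skip_delta(name, band=(18, 25)):
    """True if this tensor keeps the merge base's weights (merged delta forced to zero)."""
    m = GATE_PROJ.search(name)
    return m is not None and band[0] <= int(m.group(1)) <= band[1]


print(scale_band(27, 52, 64, 32))                          # (14, 26)
print(skip_delta("model.layers.20.mlp.gate_proj.weight"))  # True
print(skip_delta("model.layers.20.mlp.up_proj.weight"))    # False
```

A merge loop would zero the delta for every tensor where `skip_delta` is true, giving the layer-level passthrough that finding 4 says per-element masks cannot replace.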
### Why no AIME on the chosen variant?

MicroCoder (v2i) is the **code-leaning frontier point**. The skip-layer recipe
(v2n) recovers AIME to 13.33%, but at the cost of a 6.7pp LCB-30 regression
(26.7 → 20.0). v2i keeps the better LCB score; the trade-off is real and
structural. A reasoning-leaning sibling (v2n) exists internally but is not
published: it trails `jackrong-v2` on both AIME (13.33 vs 26.67) and LCB-30
(20.0 vs 23.3), so math users are better served by the original.
## Files

- Unquantized safetensors weights (BF16). Use
  [`ManniX-ITA/Qwen3.5-4B-MicroCoder-GGUF`](https://huggingface.co/ManniX-ITA/Qwen3.5-4B-MicroCoder-GGUF)
  for the Q6_K quantization.
## Use

```bash
# -c: context length, -t: CPU threads, -ngl 99: offload all layers to the GPU,
# --parallel 2: two server slots, q8_0: 8-bit quantized KV cache
llama-server -m Qwen3.5-4B-MicroCoder-Q6_K.gguf \
  --port 8099 -c 32768 -t 12 -ngl 99 \
  --parallel 2 --cache-type-k q8_0 --cache-type-v q8_0
```

Greedy decoding (`temperature=0.0`, `top_p=1.0`) is recommended for code tasks.
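For a quick test from Python, the sketch below sends a greedy request to the server started above using only the standard library. It targets llama-server's OpenAI-compatible `/v1/completions` endpoint (the raw endpoint used for the evaluations); the helper names and the default `max_tokens` are illustrative choices, not part of this card's tooling.

```python
import json
import urllib.request

SERVER = "http://localhost:8099"  # matches --port 8099 in the command above


def build_request(prompt, max_tokens=2048):
    """Build a greedy POST request for llama-server's /v1/completions endpoint."""
    payload = {
        "prompt": prompt,
        "temperature": 0.0,  # greedy decoding, as recommended above
        "top_p": 1.0,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{SERVER}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=2048):
    """Send the request and return the generated text (server must be running)."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Example, with llama-server running:
# print(complete('def is_prime(n: int) -> bool:\n    """Return True if n is prime."""\n'))
```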
## Citation

If you use this model or the OmniMergeKit recipes in your work:

```
@misc{mannix2026microcoder,
  title  = {Qwen3.5-4B-MicroCoder: a task-arithmetic merge for code},
  author = {Mannix, F.},
  year   = {2026},
  url    = {https://huggingface.co/ManniX-ITA/Qwen3.5-4B-MicroCoder},
  note   = {Built with OmniMergeKit, https://github.com/mann1x/omnimergekit}
}
```
## License

Apache 2.0, inherited from Qwen3.5-4B and the source fine-tunes.