Typhoon-OCR-7B — MLX q4
MLX-quantized port of typhoon-ai/typhoon-ocr-7b for native Apple Silicon inference. It is the larger sibling of the 3b variant: same model family, more parameters, and more robust on tricky handwriting at the cost of higher latency.
Quantization: 4-bit affine, group size 64. Effective rate 5.439 bits/weight (vision-projector + layer-norm stay full precision). Size on disk: 5.3 GB.
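The gap between the nominal 4 bits and the 5.439 bits/weight effective rate can be reconciled with a little arithmetic. The sketch below assumes that each group of 64 weights stores one 16-bit scale and one 16-bit bias, and that unquantized tensors are kept at 16 bits. Both are assumptions rather than facts stated here, and the function names are illustrative.

```python
def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Storage cost per weight inside a group-quantized layer.

    Assumption: one scale and one bias per group, each stored in 16 bits.
    """
    return bits + (scale_bits + bias_bits) / group_size


def full_precision_share(effective_bits: float, quantized_bits: float,
                         full_bits: int = 16) -> float:
    """Fraction of weights that must stay at `full_bits` to reach `effective_bits`.

    Solves: effective = quantized * (1 - x) + full * x  for x.
    """
    return (effective_bits - quantized_bits) / (full_bits - quantized_bits)


quantized = affine_bits_per_weight(bits=4, group_size=64)
share = full_precision_share(5.439, quantized)
print(f"{quantized:.3f} bits/weight inside quantized layers")
print(f"{share:.1%} of weights would need to stay at 16 bits")
```

Under these assumptions the quantized layers cost 4.5 bits/weight, and roughly 8% of the weights staying at 16 bits would account for the reported rate.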
⚠️ Note: config.json patched
The upstream typhoon-ai/typhoon-ocr-7b config.json is missing the vision_config fields that mlx-vlm's Qwen2.5-VL loader requires (out_hidden_size, depth, intermediate_size, num_heads, patch_size, spatial_merge_size, window_size, fullatt_block_indexes, temporal_patch_size). The conversion fails with a shape mismatch otherwise. We patched the config by copying those fields verbatim from the upstream Qwen/Qwen2.5-VL-7B-Instruct config — the base vision tower is identical, only the fields were omitted from the upstream serialization.
The patched config.json ships with this repo. If you re-run the conversion command below on the upstream model, you need to apply the same patch first (see end of this README).
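Before converting a fresh download, it can help to check whether a config.json still lacks the fields listed above. A minimal sketch: the field list is taken from this note, while the function name and the idea of running it as a pre-flight check are illustrative assumptions, not part of mlx-vlm.

```python
import json
from pathlib import Path

# Fields this README names as required by the Qwen2.5-VL loader in mlx-vlm.
REQUIRED_VISION_FIELDS = (
    "out_hidden_size", "depth", "intermediate_size", "num_heads",
    "patch_size", "spatial_merge_size", "window_size",
    "fullatt_block_indexes", "temporal_patch_size",
)


def missing_vision_fields(config: dict) -> list[str]:
    """Return the required vision_config keys that are absent from `config`."""
    vision = config.get("vision_config") or {}
    return [key for key in REQUIRED_VISION_FIELDS if key not in vision]


def check_config_file(path: str) -> list[str]:
    """Load a config.json from disk and report its missing vision fields."""
    return missing_vision_fields(json.loads(Path(path).read_text()))
```

An empty list means the config already carries every field named in this note; a non-empty list means the patch still needs to be applied.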
Benchmark
7-image internal smoke set (2 synthetic printed Thai/English + 5 synthetic mixed-Thai handwriting). Same prompt (Extract all text from this image.) on Mac mini Apple Silicon, 2026-05-13:
| Backend | Handwriting CER (median) | Handwriting CER (max) | Wall time (median) | Generation speed (tokens/s) | Peak RAM |
|---|---|---|---|---|---|
| 3b MLX q4 | 0.009 | 0.081 | 1.95 s | ~107 | ~3.5 GB |
| 3b MLX q8 | 0.000 | 0.081 | 2.34 s | ~65 | ~5 GB |
| 7b MLX q4 (this) | 0.012 | 0.037 | 3.55 s | ~56 | ~6 GB |
| 7b MLX q8 | 0.012 | 0.028 | 4.66 s | ~32 | ~9 GB |
Read: the 7b q4's median CER (0.012) is slightly worse than both 3b variants (0.009 for q4, 0.000 for q8) on this synthetic set, but its maximum CER is less than half theirs (0.037 vs 0.081). 7b is the right pick when you care about worst-case behaviour on hard handwriting; 3b q4 is the right pick when you care about throughput and have a fallback.
The test set is synthetic (generated Thai medical-style text rendered with handwriting-style fonts) and contains no real PHI (protected health information). Real photographed handwriting is harder, so expect higher CER on real data; a partner-hospital evaluation is still pending.
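This README does not say how CER was computed. A common definition is the Levenshtein (edit) distance between reference and prediction divided by the reference length, and the sketch below follows that definition. It counts Unicode code points, which is an assumption: for Thai, a grapheme-cluster count may give different numbers. All function names are illustrative.

```python
from statistics import median


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insert, delete, substitute) using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def summarize(scores: list[float]) -> tuple[float, float]:
    """Median and maximum CER over a set of images, as in the table columns."""
    return median(scores), max(scores)
```

Reporting the maximum alongside the median matters on a set this small, because a single bad image can be hidden entirely by the median.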
Usage
```shell
uv pip install mlx-vlm
```

```python
from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("MegawizCo/typhoon-ocr-7b-mlx-q4")
config = load_config("MegawizCo/typhoon-ocr-7b-mlx-q4")

prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text)  # 7b wraps in {"natural_text": "..."} (3b uses {"text": "..."})
```
⚠️ Output shape differs from 3b: 7b wraps its output in {"natural_text": "..."}, while 3b uses {"text": "..."}. When parsing, try both keys so the same code handles either model.
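One way to handle both output shapes is a small parser that tries each key in turn. This is a minimal sketch: the function name is hypothetical, and returning the raw string when the output is not valid JSON is an assumption about what a caller would want, not documented model behaviour.

```python
import json


def extract_ocr_text(raw: str) -> str:
    """Pull the OCR text out of a 7b or 3b response.

    Tries the 7b key ("natural_text") first, then the 3b key ("text").
    Falls back to the raw string if the output is not a JSON object
    carrying one of those keys.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw.strip()
    if isinstance(payload, dict):
        for key in ("natural_text", "text"):
            value = payload.get(key)
            if isinstance(value, str):
                return value
    return raw.strip()
```

With the usage snippet above, this would be called as `extract_ocr_text(out.text)`.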
CLI:
```shell
mlx_vlm.generate \
  --model MegawizCo/typhoon-ocr-7b-mlx-q4 \
  --image prescription.png \
  --prompt "Extract all text from this image." \
  --max-tokens 512
```
Reproduce the conversion
The upstream config patch + conversion in three commands:
```shell
# 1) Pull upstream
hf download typhoon-ai/typhoon-ocr-7b --local-dir typhoon-ocr-7b

# 2) Patch vision_config (copy missing fields from Qwen2.5-VL-7B-Instruct)
python3 -c "
import json
with open('typhoon-ocr-7b/config.json') as f: cfg = json.load(f)
cfg['vision_config'].update({
    'depth': 32, 'hidden_act': 'silu', 'hidden_size': 1280,
    'intermediate_size': 3420, 'num_heads': 16, 'in_chans': 3,
    'out_hidden_size': 3584, 'patch_size': 14, 'spatial_merge_size': 2,
    'spatial_patch_size': 14, 'window_size': 112,
    'fullatt_block_indexes': [7, 15, 23, 31],
    'tokens_per_second': 2, 'temporal_patch_size': 2,
})
with open('typhoon-ocr-7b/config.json', 'w') as f: json.dump(cfg, f, indent=2)
"

# 3) Convert
mlx_vlm.convert --hf-path ./typhoon-ocr-7b -q --q-bits 4 --mlx-path typhoon-ocr-7b-mlx-q4
```
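Step 2 types the values in by hand. An alternative sketch, assuming you have also downloaded the config.json of Qwen/Qwen2.5-VL-7B-Instruct to a local path, copies the fields from that file instead, so nothing has to be retyped. The function names and file paths are hypothetical. Note that this copies every vision_config key the target lacks, which may be more than the loader strictly needs.

```python
import json
from pathlib import Path


def copy_missing_vision_fields(target: dict, donor: dict) -> list[str]:
    """Copy vision_config keys from `donor` into `target` when absent.

    Keys already present in `target` are left untouched. Returns the
    list of keys that were added, in donor order.
    """
    target_vision = target.setdefault("vision_config", {})
    added = []
    for key, value in donor.get("vision_config", {}).items():
        if key not in target_vision:
            target_vision[key] = value
            added.append(key)
    return added


def patch_config_file(target_path: str, donor_path: str) -> list[str]:
    """Patch the target config.json in place using the donor config.json."""
    target = json.loads(Path(target_path).read_text())
    donor = json.loads(Path(donor_path).read_text())
    added = copy_missing_vision_fields(target, donor)
    Path(target_path).write_text(json.dumps(target, indent=2))
    return added
```

Leaving existing keys untouched keeps any value the upstream config already sets, so the patch only fills gaps.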
License & attribution
- License: Apache 2.0, inherited from upstream typhoon-ai/typhoon-ocr-7b.
- Base model: SCB 10X / Typhoon AI.
- Vision architecture: Qwen2.5-VL.
- Config patch + quantization + repackaging: MegawizCo (2026-05-13).
Related
- MegawizCo/typhoon-ocr-7b-mlx-q8: higher-fidelity 7b variant
- MegawizCo/typhoon-ocr-3b-mlx-q4: faster, smaller, throughput-default
- MegawizCo/typhoon-ocr-3b-mlx-q8: 3b high-fidelity