ykhrustalev committed
Commit 9125647 · verified · 1 Parent(s): a8f47c5

Upload README.md with huggingface_hub

Files changed (1): README.md (+2 -0)
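The commit message says the README was uploaded with the `huggingface_hub` client. A minimal sketch of how such an upload is typically done; the `upload_readme` helper and the `HF_TOKEN` environment variable are assumptions for illustration, and the actual upload only runs when a token is present:

```python
# Sketch of uploading a README with huggingface_hub's real `HfApi.upload_file`.
# The helper name and HF_TOKEN gating are assumptions, not the uploader's script.
import os


def upload_readme(repo_id: str = "LiquidAI/LFM2.5-VL-450M") -> str:
    """Upload README.md to the given repo if a token is available.

    Returns the commit message either way, so the sketch is testable offline.
    """
    commit_message = "Upload README.md with huggingface_hub"
    token = os.environ.get("HF_TOKEN")  # assumed env var holding a write token
    if token:
        # Import deferred so the sketch still runs where the package is absent.
        from huggingface_hub import HfApi

        HfApi(token=token).upload_file(
            path_or_fileobj="README.md",
            path_in_repo="README.md",
            repo_id=repo_id,
            commit_message=commit_message,
        )
    return commit_message
```

Gating on the token keeps the example safe to run anywhere; with a valid token set, the same call produces a commit like this one.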
README.md CHANGED

```diff
@@ -71,6 +71,7 @@ LFM2.5-VL-450M is a general-purpose vision-language model with the following fea
 | Model | Description |
 |-------|-------------|
 | [**LFM2.5-VL-450M**](https://huggingface.co/LiquidAI/LFM2.5-VL-450M) | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
+| [LFM2.5-VL-450M-MLX-8bit](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-MLX-8bit) | MLX format for Apple Silicon. Optimized for fast on-device inference on Mac with [mlx-vlm](https://github.com/Blaizzy/mlx-vlm). Also available in [4bit](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-MLX-4bit), [5bit](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-MLX-5bit), [6bit](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-MLX-6bit), and [bf16](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-MLX-bf16). |
 | [LFM2.5-VL-450M-GGUF](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-GGUF) | Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage. |
 | [LFM2.5-VL-450M-ONNX](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-ONNX) | ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile). |
 
@@ -211,6 +212,7 @@ response = processor.tokenizer.decode(outputs[0, input_ids.shape[1]:], skip_spec
 | [vLLM](https://github.com/vllm-project/vllm) | High-throughput production deployments with GPU. | <a href="https://docs.liquid.ai/deployment/gpu-inference/vllm#vision-models">Link</a> | <a href="https://colab.research.google.com/drive/1sUfQlqAvuAVB4bZ6akYVQPGmHtTDUNpF?usp=sharing"><img src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png" width="110" alt="Colab link"></a> |
 | [SGLang](https://github.com/sgl-project/sglang) | High-throughput production deployments with GPU. | <a href="https://docs.liquid.ai/deployment/gpu-inference/sglang#vision-models">Link</a> | <a href="https://colab.research.google.com/drive/1qJlAFag223yFOZGzuMIkYUFhybM9ao5g?usp=sharing"><img src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png" width="110" alt="Colab link"></a> |
 | [llama.cpp](https://github.com/ggml-org/llama.cpp) | Cross-platform inference with CPU offloading. | <a href="https://docs.liquid.ai/lfm/inference/llama-cpp#vision-models">Link</a> | <a href="https://colab.research.google.com/drive/1q2PjE6O_AahakRlkTNJGYL32MsdUcj7b?usp=sharing"><img src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png" width="110" alt="Colab link"></a> |
+| [mlx-vlm](https://github.com/Blaizzy/mlx-vlm) | Apple Silicon inference with MLX. | [LFM2.5-VL-450M-MLX-8bit](https://huggingface.co/LiquidAI/LFM2.5-VL-450M-MLX-8bit) | - |
 
 ## 🔧 Fine-tuning
```
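The two added rows point readers to mlx-vlm for running the MLX checkpoints on Apple Silicon. A minimal sketch of what that inference call could look like, assuming the `mlx_vlm.load`/`mlx_vlm.generate` entry points; the `describe_image` helper is an illustration, and the function skips entirely (returning `None`) where mlx-vlm is not installed, since MLX requires Apple hardware:

```python
# Sketch of on-device inference with mlx-vlm (Apple Silicon only).
# The model id comes from the diff; the helper and prompt are assumptions.
import importlib.util


def describe_image(image_path: str,
                   model_id: str = "LiquidAI/LFM2.5-VL-450M-MLX-8bit"):
    """Generate a description of an image, or return None off-Mac."""
    if importlib.util.find_spec("mlx_vlm") is None:
        return None  # mlx-vlm absent (non-Apple environment): skip gracefully

    # Deferred imports of mlx-vlm's documented entry points.
    from mlx_vlm import generate, load
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(model_id)          # downloads the MLX weights
    config = load_config(model_id)
    prompt = apply_chat_template(
        processor, config, "Describe this image.", num_images=1
    )
    # Return value varies by mlx-vlm version (string or result object),
    # so it is passed through unchanged here.
    return generate(model, processor, prompt, [image_path], verbose=False)
```

On a Mac the same flow is also available from the command line via `python -m mlx_vlm.generate`.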