Susant Achary
Susant-Achary
AI & ML interests
Tiny to small language models, building from India. Quantization and MLX.
Recent Activity
liked a model 10 days ago
ggml-org/NVIDIA-Nemotron-3-Nano-Omni
liked a model 6 months ago
mlx-community/medgemma-27b-it-8bit
upvoted an article 6 months ago
We’re open-sourcing our text-to-image model and the process behind it
<7B Best of MoE 🧠
A collection of small-size, big-impact MoE models.
LiquidAI/LFM2-8B-A1B
Text Generation • 8B • Updated • 104k • 352
ibm-granite/granite-4.0-h-tiny
Text Generation • 7B • Updated • 80.1k • 201
microsoft/Phi-4-multimodal-instruct
Automatic Speech Recognition • 6B • Updated • 388k • 1.6k
google/gemma-3n-E4B-it
Image-Text-to-Text • Updated • 42.4k • 912
Audio Features
Feature Extraction with 🧠 Text Embeddings
Models for turning text, images, audio (and combinations of them) into useful vectors or feature maps. Ideal for search/RAG, clustering, recommendation, and retrieval.
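For search and RAG, a common pattern is to embed every document once, embed the query, and rank documents by cosine similarity. The pure-Python sketch below shows only that ranking step; the three-dimensional vectors and document names are hypothetical stand-ins for real embedding-model outputs, which typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank corpus entries by cosine similarity to the query, highest first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for real embedding-model outputs.
corpus = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax_law": [0.0, 0.2, 0.95],
}
query = [0.9, 0.1, 0.05]
print(top_k(query, corpus))
```

Larger systems often swap the linear scan for an approximate nearest-neighbor index, but the scoring idea stays the same.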
🪶 Sept ’25 <Text Generation Language Models> (Top Releases)
Coding models and pipelines released in September ’25 that boost repo-level reasoning, GUI automation, and tool use, with a focus on practical editing.
🖼️ **Text2Image, i2i** September ’25 (Top Releases)
Cutting-edge image generation & VLM updates from September ’25. This collection spotlights models that improved text rendering, layout control & more.
📄➡️🔊 Text-to-Speech (TTS)
Speech synthesis models that turn text into natural audio. Includes multilingual TTS, low-latency real-time models, and voice-cloning variants.
📚➡️🎨Text-to-Image
State-of-the-art diffusion and generative models that turn text prompts into detailed images. Includes both lightweight, CPU-friendly models and photorealistic ones.
🎨➡️✍️ Image-to-Text
OCR, captioning, and visual QA models that turn images into descriptive or structured text.
Salesforce/blip-image-captioning-base
Image-to-Text • Updated • 2.39M • 852
Salesforce/blip-image-captioning-large
Image-to-Text • 0.5B • Updated • 764k • 1.47k
nlpconnect/vit-gpt2-image-captioning
Image-to-Text • Updated • 261k • 929
microsoft/trocr-base-handwritten
Image-to-Text • 0.3B • Updated • 166k • 494
🌀 Any-to-Any Multimodal Models
Models that can flexibly convert across modalities (text, image, audio, video). Ideal for researchers exploring unified multimodal AI.
👨‍💻 Mathematical Reasoning 🧮
Datasets tackling AI’s toughest challenges.
🧩 Long-Context Models (≥128k): Coding
10 coding models that support ≥128k-token context (natively or via officially documented scaling).
meta-llama/Llama-3.1-8B-Instruct
Text Generation • 8B • Updated • 9.39M • 5.81k
google/gemma-3-4b-it
Image-Text-to-Text • 4B • Updated • 2.2M • 1.32k
Qwen/Qwen3-Coder-30B-A3B-Instruct
Text Generation • 31B • Updated • 2.71M • 1.05k
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
Text Generation • 16B • Updated • 1.14M • 595
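Long context is expensive mainly because, during generation, a transformer caches one key and one value vector per layer, per KV head, per token. The back-of-the-envelope sketch below estimates that KV cache. The configuration values are illustrative assumptions (roughly an 8B model with grouped-query attention), not figures taken from any card in this collection; read the real values from a model's config.json. The estimate assumes 2-byte (fp16/bf16) cache entries and ignores weights, activations, and KV-cache quantization.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # assumption: fp16/bf16 cache entries
    batch_size: int = 1,
) -> int:
    """Bytes needed to cache keys and values for every token in the context."""
    # Leading factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value * batch_size


# Hypothetical configuration, roughly in the range of an 8B GQA model.
config = {"num_layers": 32, "num_kv_heads": 8, "head_dim": 128}

for context in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(**config, context_tokens=context) / 1024**3
    print(f"{context:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with context: 1 GiB at 8,192 tokens and 16 GiB at 131,072 tokens, on top of the model weights.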
🧩 Long-Context Models (≥128k) under 8B
Qwen3
The best of the Qwen3 model series.
Qwen/Qwen3-30B-A3B-Instruct-2507
Text Generation • Updated • 1.19M • 807
Qwen/Qwen3-Next-80B-A3B-Thinking
Text Generation • Updated • 32.9k • 487
Qwen/Qwen3-Coder-30B-A3B-Instruct
Text Generation • 31B • Updated • 2.71M • 1.05k
Qwen/Qwen3-Omni-30B-A3B-Instruct
Any-to-Any • 35B • Updated • 972k • 920
🛩️ Qwen3-VL
The most powerful vision-language model in the Qwen series to date, available in dense and MoE architectures.
Qwen/Qwen3-VL-30B-A3B-Thinking
Image-Text-to-Text • 31B • Updated • 42.6k • 198
mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit
Image-Text-to-Text • Updated • 406 • 7
mlx-community/Qwen3-VL-30B-A3B-Instruct-8bit
Image-Text-to-Text • Updated • 135 • 3
mlx-community/Qwen3-VL-8B-Instruct-4bit
Image-Text-to-Text • Updated • 1.69k • 5
🍎 MLX-Quantized Models (3/4/5/6-bit) for Mac & iOS
Curated MLX-ready quantized LLMs that run fast on Apple Silicon (some also on iOS). Every card lists Bits · Group size · Peak unified memory (GB) · Stable context.
mlx-community/Apriel-1.5-15b-Thinker-3bit-MLX
Image-Text-to-Text • Updated • 19
mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX
Image-Text-to-Text • Updated • 40 • 1
mlx-community/granite-4.0-h-tiny-3bit-MLX
Text Generation • 0.9B • Updated • 115 • 2
mlx-community/granite-4.0-tiny-preview-4bit
Text Generation • Updated • 9
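The "Bits" and "Group size" fields refer to group-wise quantization: weights are split into fixed-size groups, and each group stores low-bit integer codes plus its own scale and bias. The pure-Python sketch below shows a generic affine version of the idea; it is not MLX's own implementation, and the storage estimate assumes one 16-bit scale and one 16-bit bias per group. Peak unified memory is higher than this weight-only figure, because it also includes the KV cache and activations.

```python
def quantize_group(values: list[float], bits: int) -> tuple[list[int], float, float]:
    """Affine-quantize one group so that value ≈ code * scale + bias."""
    levels = 2**bits - 1  # e.g. 15 for 4-bit codes 0..15
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # constant group: any scale works
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # the group minimum serves as the bias


def dequantize_group(codes: list[int], scale: float, bias: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantize(weights: list[float], bits: int = 4, group_size: int = 64) -> list[tuple[list[int], float, float]]:
    """Split a flat weight list into groups and quantize each group independently."""
    return [
        quantize_group(weights[i : i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def estimated_weight_gb(num_params: float, bits: int, group_size: int, param_bits: int = 16) -> float:
    """Weight storage only, assuming one scale and one bias of `param_bits` each per group."""
    bits_per_weight = bits + 2 * param_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical toy weights: 16 values, quantized to 4 bits in groups of 8.
weights = [i / 10 for i in range(-8, 8)]
for codes, scale, bias in quantize(weights, bits=4, group_size=8):
    restored = dequantize_group(codes, scale, bias)
    print(codes, [round(r, 3) for r in restored])

# Smaller groups track the weights more closely but add scale/bias overhead.
print(estimated_weight_gb(8e9, bits=4, group_size=64))
print(estimated_weight_gb(8e9, bits=4, group_size=32))
```

Under those assumptions, an 8B-parameter model at 4 bits needs about 4.5 GB for weights with group size 64 and about 5 GB with group size 32. That is the trade-off the group-size field captures: smaller groups have narrower value ranges and therefore smaller rounding error, at the cost of more per-group metadata.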
🖼️ Vision Backbones & Image Embeddings
facebook/dinov2-base
Image Feature Extraction • 86.6M • Updated • 3.21M • 180
openai/clip-vit-large-patch14-336
Zero-Shot Image Classification • Updated • 10.7M • 306
google/siglip-so400m-patch14-384
Zero-Shot Image Classification • 0.9B • Updated • 2.11M • 674
BAAI/EVA-CLIP-8B
Feature Extraction • Updated • 1.54k • 50
🧊 Sept ’25 <Image-to-3D> (Top Releases)
Models that turn a single image (or an image plus a prompt) into 3D assets such as meshes, Gaussians, or point clouds, suited for AR/VR, product turntables, and game props.
🎬 ✍️ Sept ’25 <Video & Text2Video> (Top Releases)
Open text-to-video (T2V) and animation models emphasizing temporal coherence, controllability, and real-time playback. A great starting point for creative tools and ads.
Top Apache 2.0 Licensed Models
Free and open-source models: you may use, modify, and redistribute them, provided you keep the license notice and don't claim ownership of the original model.
openai/whisper-large-v3
Automatic Speech Recognition • 2B • Updated • 5.07M • 5.66k
facebook/wav2vec2-base-960h
Automatic Speech Recognition • 94.4M • Updated • 1.24M • 397
openai/whisper-small
Automatic Speech Recognition • Updated • 2.33M • 556
openai/whisper-tiny
Automatic Speech Recognition • Updated • 792k • 427
✍️➡️🎬 Text-to-Video
Models that create short videos from written prompts. Perfect for experimentation in generative video and creative storytelling.
🖌️ Image-to-Image
Image editing and transformation models, from style transfer to super-resolution, inpainting, and diffusion-based edits.
🖼️➡️📚 Image-Text-to-Text
Multimodal models that take image + text as input and produce natural language output. Use cases: chart QA, visual document reasoning, VQA.
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text • 8B • Updated • 8.84M • 1.52k
Qwen/Qwen2.5-VL-3B-Instruct
Image-Text-to-Text • 4B • Updated • 3.48M • 643
google/gemma-3-4b-it
Image-Text-to-Text • 4B • Updated • 2.2M • 1.32k
nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
Image-Text-to-Text • Updated • 1.08M • 177
✍️ Text Generation
A collection of top open LLMs for writing, summarization, chat, reasoning, and document drafting. Includes small language models (SLMs) for on-device use as well as large models.
🧠 General-Purpose Datasets (<10M samples)
Datasets covering 🌐 chat, ⚡ code, and 🧮 reasoning.
🍎 MLX-Ready LLMs
Models with MLX weights, tested for MLX inference.
mlx-community/gpt-oss-20b-MXFP4-Q8
Text Generation • 21B • Updated • 547k • 57
lmstudio-community/Seed-OSS-36B-Instruct-MLX-4bit
Text Generation • 36B • Updated • 45.5k
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit
Text Generation • 0.6B • Updated • 66.5k • 11
mlx-community/parakeet-tdt-0.6b-v2
Automatic Speech Recognition • Updated • 349k • 40
📱 On-Device-Ready SLMs (≤4B)
Tiny, fast models that run on iPhone/iPad or Mac with very little memory. Great for quick replies, offline note assistance, and routing.
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit
Text Generation • 1B • Updated • 65.5k • 7
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit
Text Generation • 1B • Updated • 455k • 10
lmstudio-community/gemma-3n-E4B-it-MLX-4bit
Image-Text-to-Text • Updated • 159k • 2
mlx-community/gemma-3-4b-it-qat-4bit
Image-Text-to-Text • Updated • 243k • 8
GPT2-JungleBook-from-Scratch-Models
The primary objective of this project is to explore and analyze the impact of model size on text generation quality, using the GPT-2 architecture trained from scratch.
Vision-LM