Feedback from a Power User: The Missing 12B Model & The Need for a Dedicated Mobile Engine.

#31
by deleted - opened

Hello Qwen Team,

I am a dedicated fan of Qwen, currently running Qwen 3.5-9B on an iPad Air M4 with 12 GB of RAM. I am writing to share critical feedback regarding a growing product gap and a strategic opportunity you are missing.

I want to emphasize that these are not theoretical observations. I have been systematically testing offline AI models on iPad Air M4 for several months, comparing quantizations, temperatures, and context settings across multiple models. My feedback is based on direct, hands-on experience, not benchmarks.

  1. The Hardware Gap (7B vs. 14B)
    There is a significant void in your lineup for mobile devices with 12–16 GB of RAM. The jump from 7B to 14B is too steep; the 14B model often suffocates system resources on these devices, while the 7B lacks depth. Competitors have already seized this opportunity: Mistral Nemo 12B and the newly released Gemma 4-12B fit perfectly in this slot. It is disappointing that there is no Qwen 3.6 or 3.7 in a 12B variant to compete in this "goldilocks" zone.
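A rough way to see the gap described above is to estimate the memory taken by the model weights alone. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes a uniform number of bits per weight, counts 1 GB as 10^9 bytes, and ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint. The function name `weight_footprint_gb` is made up for illustration.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width. Real
    quantization formats mix widths and add metadata, so actual files
    are somewhat larger than this estimate.
    """
    # (billions of params * 1e9) * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_weight / 8


# Compare the sizes mentioned in this post at two common bit widths.
for size in (7, 9, 12, 14):
    for bits in (4, 8):
        estimate = weight_footprint_gb(size, bits)
        print(f"{size}B at {bits}-bit: ~{estimate:.1f} GB of weights")
```

Under these assumptions, a 12B model at 4 bits needs about 6 GB for weights, against about 7 GB for a 14B model. That difference matters on a 12 GB device that also has to hold the context cache, the app, and the operating system.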

  2. The Case for Specialized Models
    Current models try to be everything at once, resulting in enormous hardware requirements. Why not create a range of specialized, smaller models focused on specific domains like medicine, law, mathematics, or coding? A specialized 11–12B model trained deeply on legal or medical data would outperform a generic 70B model in those fields and could run locally on laptops and tablets. This approach respects the user's hardware limits while delivering expert-level performance.

  3. The Missing Link: A Dedicated "Qwen Engine"
    Running models locally still requires too much manual configuration (quantization, context length, GPU layers). By contrast, when users run models through apps like Local AI on iOS, they rely on MLX-optimized builds that just work.
    Proposal: Qwen should develop its own dedicated inference engine (app/framework) for iOS and Android.
    Auto-Optimization: The engine should automatically detect hardware (e.g., M4 iPad with 12GB RAM) and load the optimal model version and settings without user intervention.
    Stability: It should manage memory dynamically to prevent crashes during long sessions.
    Most users want to "install and think," not spend weeks tweaking parameters. Winning the mobile market isn't just about having the smartest model; it's about providing the smoothest, most professional experience out of the box.
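The auto-optimization idea above can be sketched as a simple selection rule: given the device's RAM, pick the largest model variant whose weights fit in a memory budget. Everything in this sketch is hypothetical. The catalog entries, the `pick_variant` function, and the `usable_fraction` and `overhead_gb` defaults are illustrative assumptions, not values taken from Qwen or from any existing app. A real engine would query the device for available memory and tune these numbers by measurement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelVariant:
    """One downloadable build of a model (hypothetical catalog entry)."""
    name: str
    params_billions: float
    bits_per_weight: float

    @property
    def weights_gb(self) -> float:
        # Weights-only estimate, assuming a uniform bit width (1 GB = 1e9 bytes).
        return self.params_billions * self.bits_per_weight / 8


# Placeholder catalog: these names do not refer to real releases.
CATALOG = (
    ModelVariant("example-7b-q4", 7, 4),
    ModelVariant("example-12b-q4", 12, 4),
    ModelVariant("example-14b-q4", 14, 4),
    ModelVariant("example-14b-q8", 14, 8),
)


def pick_variant(
    total_ram_gb: float,
    variants: Sequence[ModelVariant] = CATALOG,
    usable_fraction: float = 0.65,  # assumed share of RAM the app may use
    overhead_gb: float = 1.5,       # assumed reserve for KV cache and runtime
) -> Optional[ModelVariant]:
    """Return the largest variant whose weights fit the budget, or None."""
    budget_gb = total_ram_gb * usable_fraction - overhead_gb
    fitting = [v for v in variants if v.weights_gb <= budget_gb]
    if not fitting:
        return None
    # Prefer more parameters first, then higher precision as a tie-breaker.
    return max(fitting, key=lambda v: (v.params_billions, v.bits_per_weight))


if __name__ == "__main__":
    for ram in (8, 12, 16):
        choice = pick_variant(ram)
        print(f"{ram} GB RAM ->", choice.name if choice else "no variant fits")
```

With these assumed defaults, a 12 GB device lands on the 12B 4-bit entry, which is the slot this post argues is missing. Returning `None` instead of raising lets the caller fall back to a smaller download or a clear message to the user.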

Summary of Recommendations:
Release Qwen 3.7-12B: A flagship model for edge devices that balances performance and memory usage perfectly.
Develop Specialized Variants: Create domain-specific models (Law, Med, Code) in the 10–12B range.
Build a Native Engine: Launch an official app that automates optimization for iOS/Android, removing the barrier to entry for non-technical users.

A Final Note on Quality:
Despite the competition, I want to share something you might not hear elsewhere: In my direct testing, Gemma 4-12B has not yet surpassed Qwen 9B. Gemma is catching up, but Qwen still holds the lead in reasoning quality and nuance. Users prefer Qwen's intelligence, but they are being forced to switch to Mistral or Gemma simply because those models fit their hardware better.

Don't let hardware limitations drive your best users away. Give us the latest Qwen intelligence in the perfect size, wrapped in an engine that just works.

Best regards,
A loyal Qwen user & Ambassador
