DataSapien Lab Report: What’s the Best Local LLM?

Community Article Published March 6, 2026

Last week, we were asked “What’s the best local LLM that you are using?” and it’s a good question.

And here’s the thing: there is no single “best” local LLM. The answer is always contextual, shaped by the tension between what’s theoretically possible and what actually works in the real world. A powerful model means nothing if it drains battery life, takes minutes to respond, or requires users to download gigabytes of data over patchy mobile networks on a five-year-old device.

Why “Best” Is the Wrong Question

When evaluating local LLMs, “best” depends entirely on your constraints. What’s your audience’s available RAM? How much can users realistically download? What latency can your use case tolerate? And critically, what are the battery and thermal limits of the devices you’re targeting?

A flagship phone with 12GB RAM and unlimited Wi-Fi can handle an 8GB model. But most real-world deployments can’t assume those conditions. The right model isn’t the most impressive one; it’s the one that reliably completes the task within your specific constraints.
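A quick way to make this concrete is a simple feasibility check before shipping a model to a device class. This is our own illustrative sketch, not a DataSapien API; the 0.7 headroom factor is an assumption (the OS, the app itself, and the KV cache all need memory beyond the weights), and real budgets should be measured per device.

```python
# Hypothetical sketch: can a quantized model realistically fit a target device?
# The headroom fraction is an assumed rule of thumb, not a measured constant.

def fits_device(model_size_gb: float, device_ram_gb: float,
                headroom_fraction: float = 0.7) -> bool:
    """Allow model weights to use at most `headroom_fraction` of total RAM,
    leaving the rest for the OS, the host app, and the inference KV cache."""
    return model_size_gb <= device_ram_gb * headroom_fraction

# A 12GB flagship can just host an ~8GB model under this rule, while a
# 6GB mid-range phone cannot -- which is why "most impressive" rarely wins.
```

Under these assumptions, `fits_device(8, 12)` passes and `fits_device(8, 6)` fails, matching the flagship-versus-typical-device gap described above.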

Our Three-Tier Approach

At DataSapien, we’ve tested dozens of local LLMs (also called Small Language Models, or SLMs, in the edge AI space) across real production use cases. Here’s what we’ve learned works:

For high-quality reasoning, we use Gemma-3n-e4b-it (Q4_K_M). When complex analysis matters, like our YouTube Persona journey that requires genuine understanding and nuanced reasoning, this model delivers strong instruction following and thoughtful outputs. We've also explored this with multimodal models running directly on-device.

For fast, efficient inference, Qwen 2.5 SLM is our workhorse. It powers our Happier journey summarization and dynamic screen generation, delivering excellent performance at a fraction of the size (download the SandboxApp to play with this). When you need real-time responses without sacrificing quality, this is where we start.

For ultra-lightweight tasks, Gemma 3 270M (Q8_0) surprises everyone. Classification, structured data extraction, and straightforward summarization – all with minimal resource usage. Sometimes the smallest model is exactly what you need.

The principle? Start small and then scale when quality demands it. These aren’t theoretical choices pulled from benchmarks; they’re battle-tested in production, delivering private personalisation to real users on real devices.
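The three tiers above can be sketched as a simple routing table. This is our illustration of the idea, not DataSapien’s actual implementation: the model identifiers follow the article, but the `TierConfig` structure, `route_model()` function, task categories, and approximate sizes are all assumptions for the sake of example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierConfig:
    model: str            # model identifier (quantization noted in the name)
    approx_size_mb: int   # rough on-disk footprint; illustrative, not measured

# Ordered smallest to largest: start small, scale when quality demands it.
TIERS = {
    "lightweight": TierConfig("gemma-3-270m-q8_0", 300),        # classification, extraction
    "fast":        TierConfig("qwen-2.5-slm", 1200),            # summarization, screen generation
    "reasoning":   TierConfig("gemma-3n-e4b-it-q4_k_m", 4200),  # complex analysis
}

def route_model(task_type: str) -> TierConfig:
    """Map a task category to the smallest tier known to handle it."""
    mapping = {
        "classification": "lightweight",
        "extraction": "lightweight",
        "summarization": "fast",
        "screen_generation": "fast",
        "persona_analysis": "reasoning",
    }
    # Unknown tasks fall through to the strongest tier rather than failing.
    return TIERS[mapping.get(task_type, "reasoning")]
```

The design choice worth noting: routing is keyed on the task, not the device, so the same table works across audiences once each tier’s model has passed the device-fit check.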

This is an evolving approach, and we welcome your ideas, thoughts, and suggestions about improving it.

How We Actually Select Models

Our selection methodology is straightforward: we start from the task itself. Is it classification? Summarization? Complex reasoning? Then we test the smallest model that could plausibly handle it. If quality isn’t sufficient, we step up incrementally to the next size. The goal is always to complete the task reliably with the smallest model that can do so.

Yes, 8GB flagship models exist and they’re genuinely powerful. But they’re rarely the right answer for real-world deployment where users expect instant responses, reasonable battery life, and apps that don’t monopolize their device’s resources.
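The start-small, escalate-on-failure loop described above can be written in a few lines. Here `run_task` and `passes_quality_bar` are hypothetical placeholders standing in for real on-device inference and a task-specific quality check; this is a sketch of the methodology, not production code.

```python
from typing import Callable, Sequence

def select_model(
    models_small_to_large: Sequence[str],
    run_task: Callable[[str], str],
    passes_quality_bar: Callable[[str], bool],
) -> str:
    """Return the first (smallest) model whose output clears the quality bar."""
    for model in models_small_to_large:
        output = run_task(model)      # placeholder for real on-device inference
        if passes_quality_bar(output):
            return model
    # If even the largest model fails, rethink the task, not just the model.
    raise RuntimeError("No model in the ladder met the quality bar")
```

In practice the quality bar is the interesting part: for classification it might be accuracy on a held-out set, for summarization a human or LLM-judged rubric, and the loop simply stops climbing the ladder the moment the bar is met.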

The Real Paradigm Shift

Here’s what we’ve learned: on-device AI isn’t about cramming the biggest model onto a phone. It’s about intelligent orchestration: matching the right model to the right task for the right audience. Model fit, task fit, and audience fit working together.

This pragmatic, not dogmatic, approach is how we deliver up to 44X engagement improvements to app users. Private personalisation requires exactly this kind of thoughtful engineering: respecting device constraints while delivering genuine intelligence that serves users without compromising their privacy.

The best local LLM? It’s not a model. It’s an orchestration strategy.

We’d love to hear about your experiences with local LLMs. Have you found models or approaches that work particularly well for your use cases?


Source

Originally published on the DataSapien blog:
https://datasapien.com/datasapien-lab-reprwhats-the-best-local-llm/
