Emo — on-device emoji suggestions from text

Takes a short text string and returns the best-matching emoji. Tuned for to-dos, calendar entries, notes, and message drafts across 22 languages (including CJK, Arabic, Thai, and Hindi). The model and tokenizer together are about 5 MB, and a prediction runs in well under 2 ms on device.

"Dentist appointment" → 🦷 · "réserver un vol pour Tokyo" → ✈️ · "犬の散歩" → 🐕 · "จองโรงแรม" → 🏨

Try it

Live demo: desert-ant-labs/emo-demo — type a phrase, get emojis, fully in your browser.
iOS / macOS: emo-swift — the Swift SDK with a built-in demo app.
Android / Kotlin / JVM: emo-kotlin — the Kotlin SDK (via JitPack), with an Android demo app.
JavaScript / TypeScript: emo-js — the npm package (Node + browser).

Files

File	Format	Size	Contents
`Emo.mlmodelc`	Compiled Core ML	~4.2 MB	4-bit-palettized model, ready to load on Apple platforms
`emo_tokenizer.bin`	Pruned unigram tokenizer	~0.75 MB	48k SentencePiece pieces + scores; token ids = semantic-table rows
`emo_meta.json`	JSON	tiny	emoji labels + n-gram hashing config the runtime needs
`emo.pt`	PyTorch checkpoint	~40 MB	Full-precision weights + semantic table + tokenizer (for retraining / other runtimes)

Architecture

A compact two-stream classifier — no transformer, no large encoder:

Lexical stream — script-aware character/word n-grams (Latin, Han·Kana, Hangul jamo, Devanagari clusters, SE-Asian, …) hashed into a fixed multi-hash signed embedding table. Its size is independent of the number of languages.
Semantic stream — a frozen multilingual static embedding (Model2Vec potion-multilingual-128M, distilled from BAAI bge-m3), PCA-reduced to 128 dims and vocab-pruned to the 48k tokens that matter for the 22 target languages. Gives cross-lingual generalization and handles out-of-vocabulary words. The matching ~0.75 MB unigram tokenizer ships alongside (emo_tokenizer.bin).
Head: a small MLP fusing the two streams into a softmax over a data-driven vocabulary of ~300 emojis (the emojis that come up most often across the training phrases). Trained with n-gram dropout so the head learns to rely on the semantic stream rather than on the lexical features alone, which is what lets it generalize across languages.
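
The lexical stream's hashing idea can be illustrated with a minimal Python sketch. It is not the shipped implementation: the table size, number of hashes, embedding width, hash function (BLAKE2b), and the plain character n-grams are all assumptions chosen for illustration, and the real model uses script-aware n-grams with a configuration read from emo_meta.json. The point it shows is that every n-gram, whatever its script, lands in the same fixed-size table, so the table does not grow with the number of languages.

```python
import hashlib
import random

# All constants below are hypothetical, picked only to keep the demo small.
NUM_BUCKETS = 1 << 12   # fixed table size, independent of language count
NUM_HASHES = 2          # each n-gram is hashed this many times
DIM = 4                 # embedding width


def char_ngrams(text, n_min=2, n_max=3):
    """Character n-grams of a lowercased string padded with ^ and $."""
    padded = f"^{text.lower()}$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def hashed_slots(ngram, num_hashes=NUM_HASHES, num_buckets=NUM_BUCKETS):
    """Map one n-gram to (bucket, sign) pairs, one pair per hash seed."""
    slots = []
    for seed in range(num_hashes):
        digest = hashlib.blake2b(
            ngram.encode("utf-8"), digest_size=8, salt=bytes([seed])
        ).digest()
        value = int.from_bytes(digest, "little")
        sign = 1.0 if value & 1 else -1.0      # lowest bit picks the sign
        bucket = (value >> 1) % num_buckets    # remaining bits pick the row
        slots.append((bucket, sign))
    return slots


def lexical_vector(text, table):
    """Average the signed table rows selected by every n-gram of `text`."""
    out = [0.0] * DIM
    count = 0
    for ngram in char_ngrams(text):
        for bucket, sign in hashed_slots(ngram):
            for d in range(DIM):
                out[d] += sign * table[bucket][d]
            count += 1
    return [x / count for x in out] if count else out


# A random table stands in for learned weights.
rng = random.Random(0)
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]
for phrase in ("Dentist appointment", "犬の散歩"):
    print(phrase, lexical_vector(phrase, table))
```

The signed, multi-hash lookup is a common way to soften collisions: two n-grams that share one bucket rarely share all of them, and random signs let colliding contributions partly cancel instead of always adding up.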

Inputs and outputs

Input: a plain text string. Best on short, intent-oriented text.
Output: a probability distribution over the ~300-emoji vocabulary; take the top-1 (or top-k). Optimized for top-1 relevance.
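
As a rough illustration of how a caller might turn the model's output into suggestions, the Python sketch below applies a softmax to raw scores and picks the top-k labels. The four-emoji label list and the logits are made-up stand-ins for the real ~300-emoji vocabulary, and the function names are hypothetical; the SDKs may expose this differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative values only, not real model output.
labels = ["🦷", "✈️", "🐕", "🏨"]
logits = [2.0, 0.5, 0.1, -1.0]
for emoji, prob in top_k(softmax(logits), labels, k=2):
    print(emoji, round(prob, 3))
```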

Languages

English, Spanish, Portuguese, French, German, Italian, Dutch, Russian, Polish, Turkish, Arabic, Chinese (Simplified & Traditional), Japanese, Korean, Hindi, Indonesian, Thai, Vietnamese, Ukrainian, Swedish, Danish, Czech.

Limitations

Tuned for short, intent-oriented text; long-form text produces noisier suggestions.
Emoji semantics are imprecise; near-ties at the top of the ranking are expected.
Per-language quality varies; lower-resource languages in the set are somewhat weaker.
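
One way an app could cope with near-ties is to show every candidate whose probability is close to the best one instead of committing to a single emoji. The Python sketch below is an assumption about how a caller might do that, not part of Emo itself; the margin, the limit, and the example probabilities are illustrative.

```python
def near_tie_suggestions(ranked, margin=0.05, limit=3):
    """Return the top emoji plus runners-up within `margin` of its probability.

    `ranked` must be (emoji, probability) pairs sorted best first.
    """
    if not ranked:
        return []
    top_prob = ranked[0][1]
    close = [emoji for emoji, prob in ranked if top_prob - prob <= margin]
    return close[:limit]


# Made-up probabilities for a phrase where two emojis are nearly tied.
ranked = [("✈️", 0.41), ("🧳", 0.39), ("🏨", 0.12), ("🦷", 0.08)]
print(near_tie_suggestions(ranked))
```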

Built on

minishlab/potion-multilingual-128M — MIT — semantic embedding stream (PCA-reduced, vocab-pruned derivative) + tokenizer lineage.
BAAI/bge-m3 — MIT — teacher the static embedding was distilled from.
Model2Vec — MIT — static-embedding distillation method.
Unicode CLDR emoji annotations — multilingual keyword grounding in the training data.

See THIRD_PARTY_NOTICES.md.

License

Released under the Desert Ant Labs Source-Available License v1.0 (see LICENSE.md).

Free for commercial use up to 100,000 Monthly Active Users (MAU).
Above 100,000 MAU a commercial license is required. Contact licensing@desertant.ai.

Citation

@software{emo_2026,
  title  = {Emo: on-device emoji suggestions from text},
  author = {Desert Ant Labs},
  year   = {2026},
  url    = {https://huggingface.co/desert-ant-labs/emo},
}
