dish-embed
A domain-specialized food embedding model built for menu intelligence at scale. Designed for food delivery platforms, cloud kitchen operators, and restaurant aggregators worldwide.
What It Does
dish-embed turns menu item text into dense vector representations optimized for food-specific tasks:
- Menu deduplication -- identify the same dish across different restaurants ("Murgh Makhani" = "Butter Chicken", but "Butter Chicken" ≠ "Butter Naan")
- Food search -- rank menu items by relevance to natural language queries, including noisy/misspelled input
- Cuisine classification -- classify items into 19 cuisine categories from embeddings alone
- Synonym retrieval -- find equivalent dish names across naming conventions and regional variants
- Cross-restaurant price comparison -- match identical items to compare pricing
- Cart recommendations -- suggest complementary items based on what's already in the cart
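As a rough illustration of how embeddings can drive deduplication, the sketch below groups items whose vectors exceed a cosine-similarity threshold. It is not the logic behind the hosted service: the 3-dimensional vectors are made up for the example, and the `0.85` threshold and the `group_duplicates` helper are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_duplicates(names, vectors, threshold=0.85):
    """Greedy grouping: an item joins the first group whose
    representative vector is at least `threshold` similar to it."""
    groups = []  # list of (representative_vector, member_names)
    for name, vec in zip(names, vectors):
        for rep, members in groups:
            if cosine(rep, vec) >= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]


# Toy vectors standing in for real embeddings (invented for illustration).
names = ["Murgh Makhani", "Butter Chicken", "Butter Naan"]
vectors = [[0.90, 0.10, 0.00], [0.88, 0.15, 0.02], [0.10, 0.90, 0.10]]
groups = group_duplicates(names, vectors)
print(groups)
```

With these toy vectors the two chicken dishes land in one group and the naan stays separate. A real threshold would need tuning on labeled pairs.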
Benchmark Results
Evaluated at 384 dimensions against general-purpose embedding models on food-domain benchmarks.
| Benchmark | dish-embed | OpenAI text-embedding-3-large | BAAI/bge-m3 | e5-large |
|---|---|---|---|---|
| Menu Dedup (Global) F1 | 0.781 | 0.675 | 0.563 | 0.696 |
| Menu Dedup (Indian) F1 | 0.899 | 0.711 | 0.628 | 0.655 |
| Cuisine Classification | 0.889 | 0.822 | 0.762 | 0.298 |
| Synonym Retrieval R@5 | 0.808 | 0.749 | 0.707 | 0.661 |
| Food Search NDCG@10 | 0.943 | 0.936 | 0.925 | 0.933 |
| Noisy Query Search NDCG@10 | 0.920 | 0.890 | 0.907 | 0.865 |
| Concept Search NDCG@10 | 0.828 | 0.849 | 0.754 | 0.782 |
| Regional Variants R@1 | 0.909 | 0.909 | 0.814 | 0.793 |
Full interactive benchmark report: dish-embed Benchmarks
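For readers unfamiliar with the retrieval metrics in the table, here is a minimal sketch of Recall@k and NDCG@k. The exact benchmark definitions are not stated here, so two assumptions apply: Recall@k is the fraction of relevant items found in the top k, and NDCG uses linear gain. The function names are illustrative.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k with linear gain; `relevance` maps item id -> graded relevance."""
    gains = [relevance.get(item, 0) for item in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


ranked = ["a", "b", "c"]
print(recall_at_k(ranked, ["a", "c"], k=2))
print(ndcg_at_k(ranked, {"a": 1, "c": 1}, k=10))
```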
Coverage
- 20+ cuisines: Indian (North/South), Chinese, Japanese, Korean, Thai, Vietnamese, Italian, Mexican, American, Middle Eastern, Mediterranean, and more
- Multilingual support: English, Hindi, Japanese, Korean, Arabic, Spanish, Vietnamese, Thai, Chinese
- Handles real-world menu noise: pricing text, promotional tags, piece counts, abbreviations, misspellings
Access
dish-embed is available as a hosted API. No model download required.
- API: embed.statode.com
- Documentation: embed.statode.com/docs
- Contact: aditya@statode.com
API Endpoints
| Endpoint | What It Does |
|---|---|
| /embed | Get embeddings for menu items |
| /embed/batch | Batch embed up to 5,000 items |
| /match | Check if two items are the same dish |
| /search | Semantic search across a menu corpus |
| /dedup | Deduplicate a list of menu items |
| /classify | Classify items into cuisine categories |
| /report | Full menu health report (duplicates, categories, insights) |
| /suggest | Cart-based item recommendations |
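Because `/embed/batch` accepts at most 5,000 items per call, larger menus have to be split client-side. The sketch below shows one way to do that. The request schema is not documented here, so the `"items"` field name and the `batch_payloads` helper are hypothetical; the documentation at embed.statode.com/docs has the real request format.

```python
import json

MAX_BATCH = 5000  # per-request limit stated for /embed/batch


def batch_payloads(items, max_batch=MAX_BATCH):
    """Yield JSON request bodies, each carrying at most `max_batch` items."""
    for start in range(0, len(items), max_batch):
        chunk = items[start:start + max_batch]
        # "items" is a hypothetical field name, not the confirmed schema.
        yield json.dumps({"items": chunk})


menu = [f"Dish {i}" for i in range(12001)]
payloads = list(batch_payloads(menu))
print(len(payloads))
```

Each payload can then be sent as the body of a separate request, with authentication as the API documentation requires.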
Use Cases
Food delivery platforms: Deduplicate menus across partner restaurants to build a unified catalog. Power semantic search so customers find what they want even with typos or informal queries.
Cloud kitchen operators: Compare pricing for identical items across locations. Identify menu gaps and category distribution.
Restaurant aggregators: Classify menu items by cuisine for filtering and discovery. Generate menu health reports for onboarding QA.
Menu analytics: Understand which items overlap across competitors, track pricing trends for equivalent dishes, and identify underserved categories.
Technical Details
- Embedding dimension: 1024 native, served at 384 (Matryoshka)
- Multilingual: the base model supports 100+ languages; fine-tuning on food vocabulary covers the 9 languages listed under Coverage
- Preprocessing: Built-in noise stripping and food-term normalization applied server-side
- Latency: Sub-second for single items, batch-optimized for bulk operations
- Infrastructure: Self-hosted, no data leaves the server
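Matryoshka-style embeddings are trained so that a prefix of the full vector is itself a usable embedding. The sketch below shows what reducing 1024 dimensions to 384 usually involves: keep the first 384 values, then re-normalize to unit length. The API already returns 384-dimensional vectors, so this is only an illustration. Whether the server re-normalizes is an assumption, and the input vector is synthetic.

```python
import math


def truncate_and_normalize(vector, dim=384):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


# Synthetic 1024-d vector standing in for a native embedding.
full = [math.sin(i) for i in range(1024)]
small = truncate_and_normalize(full)
print(len(small))
```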
License
dish-embed is a commercial product. The model weights are not publicly available. Access is provided through the hosted API.
For licensing inquiries, partnership, or enterprise access, contact aditya@statode.com.