---
language:
- en
- hi
- ja
- ko
- ar
- es
- vi
- th
- zh
- multilingual
tags:
- food
- embedding
- menu
- restaurant
- deduplication
- sentence-similarity
- search
- classification
- food-delivery
- menu-intelligence
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: other
license_name: commercial
license_link: https://embed.statode.com
---

# dish-embed

A domain-specialized food embedding model built for menu intelligence at scale. Designed for food delivery platforms, cloud kitchen operators, and restaurant aggregators worldwide.

## What It Does

dish-embed turns menu item text into dense vector representations optimized for food-specific tasks:

- **Menu deduplication** -- identify the same dish across different restaurants ("Murgh Makhani" = "Butter Chicken", but "Butter Chicken" ≠ "Butter Naan")
- **Food search** -- rank menu items by relevance to natural language queries, including noisy/misspelled input
- **Cuisine classification** -- classify items into 19 cuisine categories from embeddings alone
- **Synonym retrieval** -- find equivalent dish names across naming conventions and regional variants
- **Cross-restaurant price comparison** -- match identical items to compare pricing
- **Cart recommendations** -- suggest complementary items based on what's already in the cart

## Benchmark Results

Evaluated at 384 dimensions against general-purpose embedding models on food-domain benchmarks.

| Benchmark | dish-embed | OpenAI TE3L | BAAI/bge-m3 | e5-large |
|---|:-:|:-:|:-:|:-:|
| Menu Dedup (Global) F1 | **0.781** | 0.675 | 0.563 | 0.696 |
| Menu Dedup (Indian) F1 | **0.899** | 0.711 | 0.628 | 0.655 |
| Cuisine Classification | **0.889** | 0.822 | 0.762 | 0.298 |
| Synonym Retrieval R@5 | **0.808** | 0.749 | 0.707 | 0.661 |
| Food Search NDCG@10 | **0.943** | 0.936 | 0.925 | 0.933 |
| Noisy Query Search NDCG@10 | **0.920** | 0.890 | 0.907 | 0.865 |
| Concept Search NDCG@10 | 0.828 | **0.849** | 0.754 | 0.782 |
| Regional Variants R@1 | **0.909** | **0.909** | 0.814 | 0.793 |

Full interactive benchmark report: [dish-embed Benchmarks](https://huggingface.co/spaces/adityapatni/dish-embed-benchmarks)

## Coverage

- 20+ cuisines: Indian (North/South), Chinese, Japanese, Korean, Thai, Vietnamese, Italian, Mexican, American, Middle Eastern, Mediterranean, and more
- Multilingual support: English, Hindi, Japanese, Korean, Arabic, Spanish, Vietnamese, Thai, Chinese
- Handles real-world menu noise: pricing text, promotional tags, piece counts, abbreviations, misspellings

## Access

dish-embed is available as a hosted API. No model download required.

- **API**: [embed.statode.com](https://embed.statode.com)
- **Documentation**: [embed.statode.com/docs](https://embed.statode.com/docs)
- **Contact**: aditya@statode.com

### API Endpoints

| Endpoint | What It Does |
|---|---|
| `/embed` | Get embeddings for menu items |
| `/embed/batch` | Batch embed up to 5,000 items |
| `/match` | Check if two items are the same dish |
| `/search` | Semantic search across a menu corpus |
| `/dedup` | Deduplicate a list of menu items |
| `/classify` | Classify items into cuisine categories |
| `/report` | Full menu health report (duplicates, categories, insights) |
| `/suggest` | Cart-based item recommendations |

## Use Cases

**Food delivery platforms**: Deduplicate menus across partner restaurants to build a unified catalog.
Power semantic search so customers find what they want even with typos or informal queries.

**Cloud kitchen operators**: Compare pricing for identical items across locations. Identify menu gaps and category distribution.

**Restaurant aggregators**: Classify menu items by cuisine for filtering and discovery. Generate menu health reports for onboarding QA.

**Menu analytics**: Understand what items overlap across competitors, track pricing trends for equivalent dishes, identify underserved categories.

## Technical Details

- Embedding dimension: 1024 native, served at 384 (Matryoshka)
- Multilingual: 100+ languages supported at the base level, fine-tuned on food vocabulary across 9 major scripts
- Preprocessing: Built-in noise stripping and food-term normalization applied server-side
- Latency: Sub-second for single items, batch-optimized for bulk operations
- Infrastructure: Self-hosted, no data leaves the server

## License

dish-embed is a commercial product. The model weights are not publicly available. Access is provided through the hosted API.

For licensing inquiries, partnership, or enterprise access, contact [aditya@statode.com](mailto:aditya@statode.com).
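
## Appendix: Matryoshka Truncation Sketch

Technical Details notes that embeddings are 1024-dimensional natively and served at 384 dimensions (Matryoshka). With Matryoshka-style embeddings, a shorter vector is commonly obtained by keeping the leading components and re-normalizing to unit length. The block below is a minimal sketch of that general technique, not dish-embed's server code: the function names `truncate_and_normalize` and `cosine` are hypothetical, and the toy vectors are made-up stand-ins for real embeddings. Because the hosted API already returns 384-dimensional vectors, clients would not normally need this step; it is shown only to illustrate what "served at 384" can mean.

```python
import math


def truncate_and_normalize(vec, dim=384):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 1024-dimensional vectors (made-up values, NOT real dish-embed output).
full_a = [math.sin(i * 0.01) for i in range(1024)]
full_b = [math.sin(i * 0.01 + 0.05) for i in range(1024)]

# Reduce both to 384 dimensions, then compare them.
small_a = truncate_and_normalize(full_a)
small_b = truncate_and_normalize(full_b)
similarity = cosine(small_a, small_b)
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable on the shortened vectors, which is convenient when a vector index assumes unit-length inputs.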