dish-embed

A domain-specialized food embedding model built for menu intelligence at scale. Designed for food delivery platforms, cloud kitchen operators, and restaurant aggregators worldwide.

What It Does

dish-embed turns menu item text into dense vector representations optimized for food-specific tasks:

  • Menu deduplication -- identify the same dish across different restaurants ("Murgh Makhani" = "Butter Chicken", but "Butter Chicken" ≠ "Butter Naan")
  • Food search -- rank menu items by relevance to natural language queries, including noisy/misspelled input
  • Cuisine classification -- classify items into 19 cuisine categories from embeddings alone
  • Synonym retrieval -- find equivalent dish names across naming conventions and regional variants
  • Cross-restaurant price comparison -- match identical items to compare pricing
  • Cart recommendations -- suggest complementary items based on what's already in the cart
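
As a rough illustration of how deduplication can be built on top of embeddings, the sketch below groups items whose cosine similarity meets a threshold, using union-find for single-link clustering. The toy 3-dimensional vectors, the 0.9 threshold, and the function names are assumptions made for illustration; they are not dish-embed outputs or documented defaults.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedup_groups(names, vectors, threshold=0.9):
    """Group items whose pairwise similarity is >= threshold (single-link)."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

# Toy vectors standing in for real embeddings (hypothetical values).
names = ["Murgh Makhani", "Butter Chicken", "Butter Naan"]
vectors = [[0.9, 0.1, 0.0], [0.88, 0.15, 0.02], [0.1, 0.9, 0.2]]
groups = dedup_groups(names, vectors)
```

With these toy vectors the first two names land in one group and "Butter Naan" stays on its own. The pairwise loop is quadratic, so a large catalog would need an approximate nearest-neighbor index instead.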

Benchmark Results

Evaluated at 384 dimensions against general-purpose embedding models on food-domain benchmarks. Higher is better for every metric.

Benchmark                    dish-embed  OpenAI TE3L  BAAI/bge-m3  e5-large
Menu Dedup (Global) F1       0.781       0.675        0.563        0.696
Menu Dedup (Indian) F1       0.899       0.711        0.628        0.655
Cuisine Classification       0.889       0.822        0.762        0.298
Synonym Retrieval R@5        0.808       0.749        0.707        0.661
Food Search NDCG@10          0.943       0.936        0.925        0.933
Noisy Query Search NDCG@10   0.920       0.890        0.907        0.865
Concept Search NDCG@10       0.828       0.849        0.754        0.782
Regional Variants R@1        0.909       0.909        0.814        0.793
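
For readers unfamiliar with the metric names in the table, the sketch below shows the standard definitions of recall@k, NDCG@k, and F1. The source does not say which variants the benchmark used (for example, linear versus exponential gain in NDCG), so this is an assumption-based illustration rather than the benchmark's scoring code.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def ndcg_at_k(relevances, k):
    """NDCG@k with linear gain.

    `relevances` holds graded relevance labels in ranked order. The ideal
    ordering is taken from the same list, a common simplification.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

def f1(tp, fp, fn):
    """F1 from counts of true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

R@1 and R@5 in the table correspond to `recall_at_k` with `k=1` and `k=5`.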

Full interactive benchmark report: dish-embed Benchmarks

Coverage

  • 20+ cuisines: Indian (North/South), Chinese, Japanese, Korean, Thai, Vietnamese, Italian, Mexican, American, Middle Eastern, Mediterranean, and more
  • Multilingual support: English, Hindi, Japanese, Korean, Arabic, Spanish, Vietnamese, Thai, Chinese
  • Handles real-world menu noise: pricing text, promotional tags, piece counts, abbreviations, misspellings

Access

dish-embed is available as a hosted API. No model download required.

API Endpoints

Endpoint      What It Does
/embed        Get embeddings for menu items
/embed/batch  Batch embed up to 5,000 items
/match        Check if two items are the same dish
/search       Semantic search across a menu corpus
/dedup        Deduplicate a list of menu items
/classify     Classify items into cuisine categories
/report       Full menu health report (duplicates, categories, insights)
/suggest      Cart-based item recommendations
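
Because /embed/batch accepts at most 5,000 items, a larger catalog has to be split on the client side. The sketch below chunks a list and builds one POST request per chunk without sending anything. The request schema is not documented in this card, so the base URL, the `items` JSON field, and the bearer-token header are all hypothetical placeholders.

```python
import json
import urllib.request

BATCH_LIMIT = 5000  # maximum items per /embed/batch call, per the table above
BASE_URL = "https://api.example.com"  # hypothetical host; replace with the real one

def chunk(items, size=BATCH_LIMIT):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_batch_requests(items, api_key):
    """Build (but do not send) one POST request per chunk.

    The JSON field name "items" and the Authorization header are assumptions.
    """
    requests = []
    for part in chunk(items):
        body = json.dumps({"items": part}).encode("utf-8")
        req = urllib.request.Request(
            BASE_URL + "/embed/batch",
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": "Bearer " + api_key,
            },
            method="POST",
        )
        requests.append(req)
    return requests

menu = ["Item %d" % i for i in range(12000)]
reqs = build_batch_requests(menu, api_key="YOUR_KEY")
```

Each prepared request could then be sent with `urllib.request.urlopen(req)` once the real host and payload format are known.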

Use Cases

Food delivery platforms: Deduplicate menus across partner restaurants to build a unified catalog. Power semantic search so customers find what they want even with typos or informal queries.

Cloud kitchen operators: Compare pricing for identical items across locations. Identify menu gaps and category distribution.
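
Once identical items have been matched across locations, the price comparison itself is a grouping step. The sketch below assumes each record already carries a matched-dish identifier; the record shape (`group_id`, `location`, `price`) and the sample values are hypothetical and do not come from the API.

```python
from collections import defaultdict

def price_spread(records):
    """For each matched-dish group, report the cheapest and priciest location."""
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec["group_id"]].append(rec)

    report = {}
    for group_id, recs in by_group.items():
        if len(recs) < 2:
            continue  # a single listing has nothing to compare against
        cheapest = min(recs, key=lambda r: r["price"])
        priciest = max(recs, key=lambda r: r["price"])
        report[group_id] = {
            "min_location": cheapest["location"],
            "max_location": priciest["location"],
            "spread": round(priciest["price"] - cheapest["price"], 2),
        }
    return report

# Illustrative records only; group_id would come from a matching step.
records = [
    {"group_id": "butter-chicken", "location": "Kitchen A", "price": 320.0},
    {"group_id": "butter-chicken", "location": "Kitchen B", "price": 349.0},
    {"group_id": "butter-naan", "location": "Kitchen A", "price": 60.0},
]
report = price_spread(records)
```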

Restaurant aggregators: Classify menu items by cuisine for filtering and discovery. Generate menu health reports for onboarding QA.

Menu analytics: Understand what items overlap across competitors, track pricing trends for equivalent dishes, identify underserved categories.

Technical Details

  • Embedding dimension: 1024 native, served at 384 (Matryoshka)
  • Multilingual: 100+ languages supported at the base level, fine-tuned on food vocabulary in the 9 languages listed under Coverage
  • Preprocessing: Built-in noise stripping and food-term normalization applied server-side
  • Latency: Sub-second for single items, batch-optimized for bulk operations
  • Infrastructure: Self-hosted by the provider rather than run on a third-party inference service; menu data sent to the API does not leave the server
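
Matryoshka-style embeddings are trained so that a prefix of the full vector is itself a usable embedding. A common way to serve a smaller dimension is to keep the first components and rescale to unit length, as sketched below. The card does not describe how the server reduces 1024 dimensions to 384, so the function and the stand-in vector are assumptions for illustration.

```python
import math

def truncate_embedding(vector, dim=384):
    """Keep the first `dim` components and rescale to unit length."""
    if dim > len(vector):
        raise ValueError("dim exceeds the vector length")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# A stand-in 1024-dimensional vector (not a real dish-embed output).
full = [math.sin(i + 1) for i in range(1024)]
served = truncate_embedding(full, dim=384)
```

Re-normalizing after truncation keeps dot products equal to cosine similarities, which matters if downstream code assumes unit-length vectors.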

License

dish-embed is a commercial product. The model weights are not publicly available. Access is provided through the hosted API.

For licensing inquiries, partnership, or enterprise access, contact aditya@statode.com.
