dish-embed

A domain-specialized food embedding model built for menu intelligence at scale. Designed for food delivery platforms, cloud kitchen operators, and restaurant aggregators worldwide.

What It Does

dish-embed turns menu item text into dense vector representations optimized for food-specific tasks:

Menu deduplication -- identify the same dish across different restaurants ("Murgh Makhani" = "Butter Chicken", but "Butter Chicken" ≠ "Butter Naan")
Food search -- rank menu items by relevance to natural language queries, including noisy/misspelled input
Cuisine classification -- classify items into 19 cuisine categories from embeddings alone
Synonym retrieval -- find equivalent dish names across naming conventions and regional variants
Cross-restaurant price comparison -- match identical items to compare pricing
Cart recommendations -- suggest complementary items based on what's already in the cart
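
Most of these tasks reduce to comparing embedding vectors. The following is a minimal sketch of threshold-based duplicate matching, assuming embeddings have already been retrieved. The toy 3-dimensional vectors, the `find_duplicates` helper, and the 0.85 threshold are made up for illustration; they are not dish-embed outputs or recommended settings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_duplicates(embeddings, threshold=0.85):
    """Return pairs of item names whose similarity meets the threshold.

    `embeddings` maps item name -> vector. The threshold is a hypothetical
    value; a real one would be tuned on labeled pairs.
    """
    names = sorted(embeddings)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                pairs.append((first, second))
    return pairs

# Toy vectors for illustration only (not real dish-embed output).
toy = {
    "Butter Chicken": [0.9, 0.1, 0.2],
    "Murgh Makhani": [0.88, 0.15, 0.18],
    "Butter Naan": [0.1, 0.9, 0.3],
}
duplicates = find_duplicates(toy)
```

With these toy vectors, only the "Butter Chicken" / "Murgh Makhani" pair clears the threshold, while "Butter Naan" stays separate despite the shared word.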

Benchmark Results

Evaluated at 384 dimensions against general-purpose embedding models on food-domain benchmarks.

Benchmark	dish-embed	OpenAI text-embedding-3-large	BAAI/bge-m3	e5-large
Menu Dedup (Global) F1	0.781	0.675	0.563	0.696
Menu Dedup (Indian) F1	0.899	0.711	0.628	0.655
Cuisine Classification	0.889	0.822	0.762	0.298
Synonym Retrieval R@5	0.808	0.749	0.707	0.661
Food Search NDCG@10	0.943	0.936	0.925	0.933
Noisy Query Search NDCG@10	0.920	0.890	0.907	0.865
Concept Search NDCG@10	0.828	0.849	0.754	0.782
Regional Variants R@1	0.909	0.909	0.814	0.793

Full interactive benchmark report: dish-embed Benchmarks
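
For readers unfamiliar with the retrieval metrics in the table, here is a sketch of how Recall@k and NDCG@k are commonly computed for a single query. The exact evaluation protocol behind the reported numbers (relevance grading, gain function, averaging across queries) is not stated here, so the standard linear-gain definitions below are an assumption.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)

def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, k):
    """DCG of a ranking divided by the DCG of its ideal ordering.

    `ranked_gains` holds the graded relevance of every candidate,
    listed in the order the system ranked them.
    """
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

A benchmark score such as NDCG@10 would then be the mean of `ndcg_at_k(..., 10)` over all test queries.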

Coverage

20+ cuisines: Indian (North/South), Chinese, Japanese, Korean, Thai, Vietnamese, Italian, Mexican, American, Middle Eastern, Mediterranean, and more
Multilingual support: English, Hindi, Japanese, Korean, Arabic, Spanish, Vietnamese, Thai, Chinese
Handles real-world menu noise: pricing text, promotional tags, piece counts, abbreviations, misspellings
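
To make "menu noise" concrete, the following is a rough sketch of the kind of cleanup involved. It is not dish-embed's actual preprocessing, which runs server-side and is not documented here; the regex patterns and the `strip_menu_noise` name are illustrative assumptions, and abbreviations and misspellings are not handled.

```python
import re

# Hypothetical patterns, for illustration only.
PRICE = re.compile(r"(?:\brs\.?|\binr|₹|\$)\s*\d+(?:\.\d+)?", re.IGNORECASE)
PIECES = re.compile(r"\(?\b\d+\s*(?:pcs?|pieces?)\b\.?\)?", re.IGNORECASE)
PROMO = re.compile(r"\[[^\]]*\]")  # bracketed tags such as [Bestseller]

def strip_menu_noise(text):
    """Remove price text, piece counts, and bracketed promo tags."""
    for pattern in (PRICE, PIECES, PROMO):
        text = pattern.sub(" ", text)
    # Collapse leftover whitespace and trim stray separators.
    return re.sub(r"\s+", " ", text).strip(" -")

cleaned = strip_menu_noise("Chicken Momos (6 pcs) [Bestseller] Rs. 249")
```

Here `cleaned` ends up as "Chicken Momos", which is the text one would want embedded.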

Access

dish-embed is available as a hosted API. No model download required.

API: embed.statode.com
Documentation: embed.statode.com/docs
Contact: aditya@statode.com

API Endpoints

Endpoint	What It Does
`/embed`	Get embeddings for menu items
`/embed/batch`	Batch embed up to 5,000 items
`/match`	Check if two items are the same dish
`/search`	Semantic search across a menu corpus
`/dedup`	Deduplicate a list of menu items
`/classify`	Classify items into cuisine categories
`/report`	Full menu health report (duplicates, categories, insights)
`/suggest`	Cart-based item recommendations
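
The table does not specify request or response formats, so the following is only a sketch of how a client might prepare a batch call. The `https://` scheme, the `items` JSON field, the bearer-token header, and the helper names are assumptions; the documentation at embed.statode.com/docs is the authority on the actual contract. The only detail taken from the table is the 5,000-item batch limit.

```python
import json
import urllib.request

BASE_URL = "https://embed.statode.com"  # scheme assumed
MAX_BATCH = 5000  # batch limit stated in the endpoint table

def chunk(items, size=MAX_BATCH):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_batch_request(items, api_key):
    """Build (but do not send) a POST request for /embed/batch.

    The payload shape and auth header are hypothetical.
    """
    if len(items) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} items per batch request")
    body = json.dumps({"items": items}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embed/batch",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Sending would look like this, once the real schema is confirmed:
# with urllib.request.urlopen(build_batch_request(items, key)) as resp:
#     result = json.load(resp)
```

Splitting a large catalog with `chunk` before building requests keeps each call within the documented limit.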

Use Cases

Food delivery platforms: Deduplicate menus across partner restaurants to build a unified catalog. Power semantic search so customers find what they want even with typos or informal queries.

Cloud kitchen operators: Compare pricing for identical items across locations. Identify menu gaps and analyze category distribution.

Restaurant aggregators: Classify menu items by cuisine for filtering and discovery. Generate menu health reports for onboarding QA.

Menu analytics: Understand which items overlap across competitors, track pricing trends for equivalent dishes, and identify underserved categories.

Technical Details

Embedding dimension: 1024 native, served at 384 (Matryoshka)
Multilingual: The base model supports 100+ languages; fine-tuning covers food vocabulary across 9 major scripts
Preprocessing: Built-in noise stripping and food-term normalization applied server-side
Latency: Sub-second for single items, batch-optimized for bulk operations
Infrastructure: Self-hosted, no data leaves the server
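
The "1024 native, served at 384" figure refers to Matryoshka-style embeddings, where a prefix of the full vector is itself a usable embedding. Since the API is described as serving 384 dimensions, clients should not need to do this themselves; the sketch below only illustrates the usual truncate-then-renormalize step, and the `truncate_embedding` helper is hypothetical.

```python
import math

def truncate_embedding(vector, dims=384):
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if dims > len(vector):
        raise ValueError("dims exceeds the vector length")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return list(head)  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable on the shortened vectors.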

License

dish-embed is a commercial product. The model weights are not publicly available. Access is provided through the hosted API.

For licensing inquiries, partnership, or enterprise access, contact aditya@statode.com.
