Universal Recipe Data Structure
This document defines a simple, universal data structure for recipe storage that works efficiently with both ChromaDB and MongoDB Atlas for ingredient-based recipe recommendations.
Core Principles
- Ingredient-focused: Primary search is by ingredients
- Universal compatibility: Same structure works for ChromaDB and MongoDB
- Simple and clean: Easy to understand and maintain
- Efficient retrieval: Optimized for RAG performance
Universal Recipe Structure
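Required Fields
The title, ingredients, and instructions fields are required. Every key inside metadata is optional.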
{
"title": "String - Recipe name",
"ingredients": ["Array of strings - Individual ingredients"],
"instructions": "String - Step-by-step cooking instructions",
"metadata": {
"cook_time": "String - Optional cooking time",
"difficulty": "String - Optional difficulty level",
"servings": "String - Optional number of servings",
"category": "String - Optional recipe category",
"image_url": "String - Optional recipe image URL"
}
}
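The structure above can be checked before a recipe is stored. The following is a minimal validation sketch, not part of the original design: the function name validate_recipe, the rule that the metadata object may be omitted entirely, and the rejection of unknown metadata keys are all assumptions made for illustration.

```python
from typing import Any

# Top-level fields every recipe must carry, with their expected Python types.
REQUIRED_FIELDS = {"title": str, "ingredients": list, "instructions": str}
# Metadata keys listed in the structure above; all of them are optional.
OPTIONAL_METADATA_KEYS = {"cook_time", "difficulty", "servings", "category", "image_url"}


def validate_recipe(recipe: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the recipe is valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in recipe:
            problems.append(f"missing field: {field}")
        elif not isinstance(recipe[field], expected_type):
            problems.append(f"{field} must be {expected_type.__name__}")

    ingredients = recipe.get("ingredients")
    if isinstance(ingredients, list):
        if not all(isinstance(item, str) for item in ingredients):
            problems.append("ingredients must contain only strings")
        elif len(set(ingredients)) != len(ingredients):
            problems.append("ingredients contains duplicates")

    # Assumption: a missing metadata object is treated as empty.
    metadata = recipe.get("metadata", {})
    if not isinstance(metadata, dict):
        problems.append("metadata must be an object")
    else:
        for key, value in metadata.items():
            if key not in OPTIONAL_METADATA_KEYS:
                problems.append(f"unknown metadata key: {key}")
            elif not isinstance(value, str):
                problems.append(f"metadata.{key} must be a string")
    return problems
```

Returning a list of problems rather than raising on the first one lets a batch import report every issue in a recipe at once.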
Example Document
{
"title": "Mixed Seafood Coconut Fried Rice",
"ingredients": [
"jasmine rice",
"cooked shrimp",
"prawns",
"scallops",
"coconut milk",
"fish sauce",
"soy sauce",
"garlic",
"onion",
"ginger",
"green onions",
"cilantro",
"lime",
"vegetable oil",
"salt",
"pepper"
],
"instructions": "1. Heat vegetable oil in large pan. 2. Add garlic, onion, ginger and stir-fry until fragrant. 3. Add cooked rice and mix well. 4. Add seafood and cook until heated through. 5. Pour in coconut milk and season with fish sauce and soy sauce. 6. Garnish with green onions and cilantro. 7. Serve with lime wedges.",
"metadata": {
"cook_time": "25 minutes",
"difficulty": "medium",
"servings": "4",
"category": "seafood",
"image_url": "https://example.com/images/mixed-seafood-coconut-fried-rice.jpg"
}
}
Key Features
1. Clean Ingredients Format
- Array structure: Each ingredient as separate string
- Individual embedding: Each ingredient can be embedded separately
- Easy matching: Simple array operations for ingredient search
- No duplicates: Each ingredient appears once in the array
2. Universal Compatibility
- ChromaDB: Automatically creates embeddings from full document
- MongoDB Atlas: Can use pre-computed embeddings or text search
- Same structure: No provider-specific modifications needed
3. Efficient Search Patterns
Primary: Ingredient-based Search
User: "I have shrimp, rice, and coconut milk"
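Search: ingredients array for ["shrimp", "rice", "coconut"] (partial or semantic match, since the stored values are "cooked shrimp", "jasmine rice", and "coconut milk")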
Result: Mixed Seafood Coconut Fried Rice (high relevance)
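One way to picture the ingredient-based search is as an overlap score between the user's items and each recipe's ingredients array. The sketch below is illustrative only: the function names are hypothetical, and it uses plain substring matching so that "shrimp" matches "cooked shrimp". Substring matching can produce false positives (for example "rice" inside "licorice"), which an embedding-based search would handle differently.

```python
def ingredient_score(user_items: list[str], recipe_ingredients: list[str]) -> float:
    """Fraction of the user's items found inside at least one recipe ingredient."""
    if not user_items:
        return 0.0
    wanted = [item.lower().strip() for item in user_items]
    have = [ingredient.lower() for ingredient in recipe_ingredients]
    hits = sum(1 for item in wanted if any(item in ingredient for ingredient in have))
    return hits / len(wanted)


def rank_recipes(user_items: list[str], recipes: list[dict]) -> list[tuple[float, str]]:
    """Return (score, title) pairs for recipes with any match, best score first."""
    scored = [(ingredient_score(user_items, r["ingredients"]), r["title"]) for r in recipes]
    matched = [pair for pair in scored if pair[0] > 0]
    return sorted(matched, key=lambda pair: pair[0], reverse=True)
```

With the user input from the example, a recipe containing "cooked shrimp", "jasmine rice", and "coconut milk" scores 1.0, while a recipe that only contains rice scores one third.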
Secondary: Title-based Search
User: "How to make fried rice"
Search: title field for "fried rice"
Result: All fried rice recipes
Fallback: Full-text Search
User: "Quick dinner recipes"
Search: Full document for "quick dinner"
Result: Recipes mentioning quick preparation
Implementation Guidelines
For ChromaDB
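# Documents are automatically embedded as full text
# Assumes the LangChain Document class, which the page_content/metadata arguments suggest
from langchain_core.documents import Document

ingredients_text = ", ".join(recipe['ingredients'])
document = Document(
    page_content=f"Title: {recipe['title']}. Ingredients: {ingredients_text}. Instructions: {recipe['instructions']}",
    metadata=recipe['metadata']
)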
For MongoDB Atlas
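# Can use array search or vector search on the same structure
# Array search on ingredients: matches recipes containing any listed ingredient (exact string match)
{"ingredients": {"$in": user_ingredients_list}}
# Or Atlas Vector Search if embeddings are pre-computed. This is an aggregation stage
# ($near is a geospatial operator and does not apply to embeddings).
# The index name, numCandidates, and limit values are placeholders.
{"$vectorSearch": {"index": "recipe_vector_index", "path": "ingredients_vector", "queryVector": query_embedding, "numCandidates": 100, "limit": 5}}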
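The array search can require either any or all of the user's ingredients. The helper below is a small sketch of building that filter as a plain dictionary. The name build_ingredient_filter and its require_all flag are hypothetical; $in and $all are standard MongoDB query operators. Both operators compare whole strings, so the user's terms need to be normalized the same way as the stored ingredients.

```python
def build_ingredient_filter(user_items: list[str], require_all: bool = False) -> dict:
    """Build a MongoDB filter on the ingredients array.

    $in matches recipes containing at least one of the terms;
    $all matches only recipes containing every term.
    """
    # Normalize and deduplicate the terms; sorting keeps the output deterministic.
    terms = sorted({item.lower().strip() for item in user_items if item.strip()})
    operator = "$all" if require_all else "$in"
    return {"ingredients": {operator: terms}}
```

The returned dictionary can be passed to a driver call such as a PyMongo collection's find method.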
Data Preparation
Ingredient Processing Rules
- Clean individual items: "2 cups rice" → "rice"
- Remove measurements: "1 lb chicken breast" → "chicken breast"
- Remove parenthetical notes: "jasmine rice (cooked)" → "jasmine rice"
- Lowercase: "Fresh Basil" → "fresh basil"
- Array format: ["rice", "chicken breast", "fresh basil"]
- No duplicates: Remove duplicate ingredients from array
Example Transformation
Raw: "2 lbs fresh shrimp, 1 cup jasmine rice (cooked), 1/2 cup coconut milk"
Clean: ["fresh shrimp", "jasmine rice", "coconut milk"]
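The processing rules can be sketched with a few regular expressions. This is one possible approach under stated assumptions, not a complete parser: the unit list is a hypothetical starting set, quantities are only recognized at the start of an item, and ingredients are assumed to be separated by commas.

```python
import re

# Hypothetical unit list; extend it to cover the units that appear in your data.
UNITS = r"(?:cups?|tbsp|tsp|tablespoons?|teaspoons?|lbs?|oz|ounces?|pounds?|g|kg|ml|l|cloves?|pinch(?:es)?)"
# Mixed numbers ("1 1/2"), fractions ("1/2"), integers and decimals ("2", "0.5").
QUANTITY = r"(?:\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)"
LEADING_AMOUNT = re.compile(rf"^\s*{QUANTITY}\s*(?:{UNITS}\b\.?)?\s*(?:of\s+)?", re.IGNORECASE)
PARENTHETICAL = re.compile(r"\([^)]*\)")


def clean_ingredient(raw: str) -> str:
    """Strip notes in parentheses and a leading quantity/unit, then lowercase."""
    text = PARENTHETICAL.sub(" ", raw)
    text = LEADING_AMOUNT.sub("", text)
    return " ".join(text.lower().split())


def clean_ingredient_list(raw: str) -> list[str]:
    """Split a comma-separated ingredient line into a clean, duplicate-free array."""
    # Remove parentheticals first so commas inside them do not split an item.
    without_notes = PARENTHETICAL.sub(" ", raw)
    seen: dict[str, None] = {}  # dict keeps first-seen order while deduplicating
    for part in without_notes.split(","):
        name = clean_ingredient(part)
        if name:
            seen.setdefault(name, None)
    return list(seen)
```

Applied to the raw line in the example, clean_ingredient_list returns ["fresh shrimp", "jasmine rice", "coconut milk"].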
Benefits
1. Simplicity
- Single structure for all providers
- Easy to understand and maintain
- No complex transformations needed
2. Performance
- Optimized for ingredient matching
- Fast text and vector search
- Minimal processing overhead
3. Flexibility
- Works with existing MongoDB data
- Compatible with ChromaDB auto-embedding
- Supports both search types (text/vector)
4. Scalability
- Easy to add new recipes
- Simple data validation
- Consistent across providers
This universal structure keeps recipe data compatible with both ChromaDB and MongoDB Atlas for ingredient-based recipe recommendations, with no provider-specific changes.