# Universal Recipe Data Structure

This document defines a simple, universal data structure for recipe storage that works efficiently with both ChromaDB and MongoDB Atlas for ingredient-based recipe recommendations.

## Core Principles

1. **Ingredient-focused**: Primary search is by ingredients
2. **Universal compatibility**: Same structure works for ChromaDB and MongoDB
3. **Simple and clean**: Easy to understand and maintain
4. **Efficient retrieval**: Optimized for RAG performance

## Universal Recipe Structure

### Required Fields

`title`, `ingredients`, and `instructions` are required; every key under `metadata` is optional.

```json
{
  "title": "String - Recipe name",
  "ingredients": ["Array of strings - Individual ingredients"], 
  "instructions": "String - Step-by-step cooking instructions",
  "metadata": {
    "cook_time": "String - Optional cooking time",
    "difficulty": "String - Optional difficulty level",
    "servings": "String - Optional number of servings",
    "category": "String - Optional recipe category",
    "image_url": "String - Optional recipe image URL"
  }
}
```
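The "simple data validation" this structure allows could look like the following minimal sketch. The function name `validate_recipe` and the constants are hypothetical, and the sketch assumes that the `metadata` object itself may be omitted and that unknown metadata keys should be rejected; neither rule is stated in this document.

```python
from typing import Any

REQUIRED_FIELDS = ("title", "ingredients", "instructions")
# Metadata keys listed in the structure above; all are optional strings.
OPTIONAL_METADATA = ("cook_time", "difficulty", "servings", "category", "image_url")


def validate_recipe(recipe: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the recipe is valid."""
    errors: list[str] = []
    for field in REQUIRED_FIELDS:
        if field not in recipe:
            errors.append(f"missing required field: {field}")
    for field in ("title", "instructions"):
        if field in recipe and not isinstance(recipe[field], str):
            errors.append(f"{field} must be a string")

    ingredients = recipe.get("ingredients", [])
    if not isinstance(ingredients, list) or not all(
        isinstance(item, str) for item in ingredients
    ):
        errors.append("ingredients must be a list of strings")
    elif len(set(ingredients)) != len(ingredients):
        errors.append("ingredients must not contain duplicates")

    # Assumption: a missing metadata object is treated as empty.
    metadata = recipe.get("metadata", {})
    if not isinstance(metadata, dict):
        errors.append("metadata must be an object")
    else:
        for key, value in metadata.items():
            if key not in OPTIONAL_METADATA:
                errors.append(f"unknown metadata key: {key}")
            elif not isinstance(value, str):
                errors.append(f"metadata.{key} must be a string")
    return errors
```

Returning a list of messages instead of raising on the first problem makes it easier to report every issue in a batch import at once.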

### Example Document

```json
{
  "title": "Mixed Seafood Coconut Fried Rice",
  "ingredients": [
    "jasmine rice",
    "cooked shrimp", 
    "prawns",
    "scallops",
    "coconut milk",
    "fish sauce",
    "soy sauce",
    "garlic",
    "onion",
    "ginger",
    "green onions",
    "cilantro",
    "lime",
    "vegetable oil",
    "salt",
    "pepper"
  ],
  "instructions": "1. Heat vegetable oil in large pan. 2. Add garlic, onion, ginger and stir-fry until fragrant. 3. Add cooked rice and mix well. 4. Add seafood and cook until heated through. 5. Pour in coconut milk and season with fish sauce and soy sauce. 6. Garnish with green onions and cilantro. 7. Serve with lime wedges.",
  "metadata": {
    "cook_time": "25 minutes",
    "difficulty": "medium", 
    "servings": "4",
    "category": "seafood",
    "image_url": "https://example.com/images/mixed-seafood-coconut-fried-rice.jpg"
  }
}
```

## Key Features

### 1. Clean Ingredients Format
- **Array structure**: Each ingredient as separate string
- **Individual embedding**: Each ingredient can be embedded separately
- **Easy matching**: Simple array operations for ingredient search
- **No duplicates**: Each ingredient appears once in the array

### 2. Universal Compatibility
- **ChromaDB**: Automatically creates embeddings from full document
- **MongoDB Atlas**: Can use pre-computed embeddings or text search
- **Same structure**: No provider-specific modifications needed

### 3. Efficient Search Patterns

#### Primary: Ingredient-based Search
```
User: "I have shrimp, rice, and coconut milk"
Search: ingredients array for ["shrimp", "rice", "coconut"]
Result: Mixed Seafood Coconut Fried Rice (high relevance)
```
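Stored ingredients are phrases such as "cooked shrimp" or "coconut milk", so an exact element match on "shrimp" or "coconut" would not find them. The sketch below shows one way to score the overlap by matching whole words inside each ingredient. The function names `ingredient_score` and `rank_recipes` are hypothetical, and a real deployment would more likely rely on the store's own text or vector search.

```python
def ingredient_score(user_terms: list[str], recipe_ingredients: list[str]) -> float:
    """Fraction of user terms whose words all appear in a single recipe ingredient."""
    if not user_terms:
        return 0.0
    # "cooked shrimp" -> {"cooked", "shrimp"}
    ingredient_words = [set(ing.lower().split()) for ing in recipe_ingredients]
    hits = sum(
        1
        for term in user_terms
        if any(set(term.lower().split()) <= words for words in ingredient_words)
    )
    return hits / len(user_terms)


def rank_recipes(user_terms: list[str], recipes: list[dict]) -> list[tuple[float, str]]:
    """Return (score, title) pairs for recipes with at least one hit, best first."""
    scored = [
        (ingredient_score(user_terms, recipe["ingredients"]), recipe["title"])
        for recipe in recipes
    ]
    return sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: -pair[0])
```

For example, `rank_recipes(["shrimp", "rice", "coconut"], recipes)` gives the fried rice document above a score of 1.0, because each term matches a word in one of its ingredients.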

#### Secondary: Title-based Search  
```
User: "How to make fried rice"
Search: title field for "fried rice"
Result: All fried rice recipes
```

#### Fallback: Full-text Search
```
User: "Quick dinner recipes"
Search: Full document for "quick dinner"
Result: Recipes mentioning quick preparation
```
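The secondary and fallback tiers can be sketched as a single function that tries the title first and only then searches the whole document. This is a plain substring stand-in for whatever text or vector search the store provides; the names `full_text` and `tiered_search` are hypothetical, and metadata values are assumed to be strings.

```python
def full_text(recipe: dict) -> str:
    """Concatenate every field of a recipe into one lowercase string."""
    parts = [
        recipe["title"],
        ", ".join(recipe["ingredients"]),
        recipe["instructions"],
        *recipe.get("metadata", {}).values(),
    ]
    return " ".join(parts).lower()


def tiered_search(query: str, recipes: list[dict]) -> tuple[str, list[str]]:
    """Return (tier, titles): title matches if any, else full-text matches."""
    needle = query.lower()
    title_hits = [r["title"] for r in recipes if needle in r["title"].lower()]
    if title_hits:
        return "title", title_hits
    # Fallback: keep recipes that mention at least one query word anywhere.
    words = needle.split()
    text_hits = [
        r["title"] for r in recipes if any(word in full_text(r) for word in words)
    ]
    return "full_text", text_hits
```

Returning the tier alongside the results lets the caller tell a precise title match from a looser full-text match.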

## Implementation Guidelines

### For ChromaDB
```python
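# Assumes LangChain's Document class; the vector store embeds page_content on insert
from langchain_core.documents import Document

# Flatten the recipe into one text so title, ingredients, and instructions are embedded together
ingredients_text = ", ".join(recipe['ingredients'])
document = Document(
    page_content=f"Title: {recipe['title']}. Ingredients: {ingredients_text}. Instructions: {recipe['instructions']}",
    metadata=recipe['metadata']
)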
```

### For MongoDB Atlas
```python
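# Array search and vector search work on the same structure.
# `collection` is assumed to be a pymongo Collection.
# Array search: recipes containing at least one listed ingredient (exact element match)
collection.find({"ingredients": {"$in": user_ingredients_list}})

# Or Atlas Vector Search if embeddings are pre-computed.
# Note: `$near` is a geospatial operator; vector queries use the `$vectorSearch` stage.
# The index name and the numCandidates/limit values below are placeholders.
collection.aggregate([{"$vectorSearch": {
    "index": "vector_index", "path": "ingredients_vector",
    "queryVector": query_embedding, "numCandidates": 100, "limit": 5,
}}])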
```

## Data Preparation

### Ingredient Processing Rules
1. **Clean individual items**: strip quantities and parenthetical notes, e.g. "2 cups rice" → "rice", "jasmine rice (cooked)" → "jasmine rice"
2. **Remove measurements**: "1 lb chicken breast" → "chicken breast"
3. **Lowercase**: "Fresh Basil" → "fresh basil"
4. **Array format**: ["rice", "chicken breast", "fresh basil"]
5. **No duplicates**: Remove duplicate ingredients from array

### Example Transformation
```
Raw: "2 lbs fresh shrimp, 1 cup jasmine rice (cooked), 1/2 cup coconut milk"
Clean: ["fresh shrimp", "jasmine rice", "coconut milk"]
```
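The processing rules above can be sketched with a couple of regular expressions. This is a minimal illustration, not a complete parser: the unit list is deliberately short, the function names are hypothetical, and splitting on commas assumes no ingredient contains a comma of its own.

```python
import re

# Units recognized by this sketch; the list is illustrative, not exhaustive.
UNITS = r"(?:cups?|lbs?|oz|tbsp|tsp|g|kg|ml|l|cloves?|pinch(?:es)?)"
# Leading quantity ("2", "1/2", "1.5"), an optional unit, and an optional "of".
QUANTITY = re.compile(rf"^\s*[\d\s/.\-]+(?:{UNITS}\b\.?)?\s*(?:of\s+)?", re.IGNORECASE)
# Parenthetical notes such as "(cooked)".
PARENS = re.compile(r"\s*\([^)]*\)")


def clean_ingredient(raw: str) -> str:
    """Strip notes, quantity, and unit from one raw ingredient, then lowercase it."""
    text = PARENS.sub("", raw)
    text = QUANTITY.sub("", text)
    return " ".join(text.lower().split())


def clean_ingredients(raw_line: str) -> list[str]:
    """Split a comma-separated line into cleaned, de-duplicated ingredients."""
    cleaned = (clean_ingredient(part) for part in raw_line.split(","))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(item for item in cleaned if item))
```

Applied to the raw line in the example transformation, `clean_ingredients` returns `["fresh shrimp", "jasmine rice", "coconut milk"]`.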

## Benefits

### 1. Simplicity
- Single structure for all providers
- Easy to understand and maintain
- No complex transformations needed

### 2. Performance  
- Optimized for ingredient matching
- Fast text and vector search
- Minimal processing overhead

### 3. Flexibility
- Works with existing MongoDB data
- Compatible with ChromaDB auto-embedding
- Supports both search types (text/vector)

### 4. Scalability
- Easy to add new recipes
- Simple data validation
- Consistent across providers

This universal structure keeps ingredient-based recipe recommendations consistent across ChromaDB and MongoDB Atlas, with no provider-specific changes to the stored documents.