Ahmed-Alghamdi committed on
Commit 42febe9 · verified · 1 Parent(s): f62e41f

Update search_engine.py

Files changed (1)
  1. search_engine.py +0 -32
search_engine.py CHANGED
@@ -71,35 +71,3 @@ class SearchEngine:
  except Exception as e:
      logger.error(f"Error searching documents: {e}")
      return pd.DataFrame()
- ```
-
- ---
-
- ## What Was Added (All marked with "# NEW"):
-
- 1. ✅ **Store embeddings** - Keep reference for future use
- 2. ✅ **IndexFlatIP** - Changed from `IndexFlatL2` to `IndexFlatIP` for cosine similarity
- 3. ✅ **Normalize embeddings** - Required for cosine similarity to work
- 4. ✅ **Normalize query** - Query must also be normalized
- 5. ✅ **Search more results** - Get 2x TOP_K to filter from
- 6. ✅ **Filter by threshold** - Only keep results ≥ MIN_SIMILARITY_SCORE
- 7. ✅ **Limit to TOP_K** - After filtering, keep only top K
- 8. ✅ **Handle no results** - Return empty if nothing matches
- 9. ✅ **Add scores to results** - Include similarity scores in dataframe
- 10. ✅ **Sort by score** - Best matches first
- 11. ✅ **Better logging** - Show score range
-
- ---
-
- ## Impact on Accuracy:
-
- **Before:**
- ```
- Query: "نسبة الحضور"
- Results: 5 chunks (some irrelevant, scores unknown)
- ```
-
- **After:**
- ```
- Query: "نسبة الحضور"
- Results: 3 chunks (all relevant, scores: 0.72 - 0.85)
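
For readers following the notes this commit removes from search_engine.py: the retrieval steps they list (normalize the embeddings and the query, score by inner product, fetch 2x TOP_K candidates, filter by MIN_SIMILARITY_SCORE, keep the top K, return nothing when no candidate passes) can be sketched roughly as below. This is a dependency-free illustration, not the code in search_engine.py. A plain Python loop stands in for a FAISS `IndexFlatIP` search, the function names `normalize` and `search` are hypothetical, and the values given to `TOP_K` and `MIN_SIMILARITY_SCORE` are placeholders, since the diff does not show the real ones.

```python
import math

# Placeholder values: the commit does not show the real settings.
TOP_K = 3
MIN_SIMILARITY_SCORE = 0.5


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search(query, embeddings, top_k=TOP_K, min_score=MIN_SIMILARITY_SCORE):
    """Return (index, score) pairs, best first, with score >= min_score.

    On unit-length vectors the inner product equals cosine similarity,
    which is why both the query and the stored embeddings are normalized.
    """
    q = normalize(query)
    scored = []
    for idx, emb in enumerate(embeddings):
        e = normalize(emb)
        scored.append((idx, sum(a * b for a, b in zip(q, e))))

    # Sort best-first, then over-fetch 2x top_k candidates to filter from.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = scored[: 2 * top_k]

    # Drop weak matches, then cap the list at top_k.
    kept = [(i, s) for i, s in candidates if s >= min_score]
    return kept[:top_k]  # empty list when nothing passes the threshold


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    for idx, score in search([2.0, 0.0], docs, top_k=2):
        print(idx, round(score, 3))
```

Returning an empty list when no candidate clears the threshold mirrors the "Handle no results" item in the notes. A caller that expects a dataframe could turn that into an empty `pd.DataFrame()`, as the existing `except` branch does.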