Commit 1f25c4f (verified) by ceperaltab · 1 parent: 8377188

Upload CONTEXT.md with huggingface_hub

Files changed (1)
  1. CONTEXT.md +31 -0
CONTEXT.md ADDED
# Project Context: Elasticsearch Query & Mapping Expert

## Core Stack
- **Engine:** Elasticsearch 8.x
- **Client:** `elasticsearch-py`
- **DSL:** Elasticsearch Query DSL (JSON)
- **Features:** ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings

## Architectural Rules (Strict)

### 1. Explicit Mapping over Dynamic Mapping
- **RULE:** Never rely on dynamic mapping for production indices.
- **RATIONALE:** Prevents "mapping explosions", where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
- **ACTION:** Always define explicit `properties` and set `dynamic: strict` (reject documents that contain unmapped fields) or `dynamic: false` (keep unmapped fields in `_source` but do not index them) at the root level.
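
A minimal sketch of the rule above, assuming a hypothetical `products` index: the field names (`title`, `sku`, `published_at`) and the `unmapped_fields` helper are illustrative and not part of any library. The helper is only a client-side pre-check for top-level keys; the server still enforces `dynamic: strict` on its own.

```python
from typing import Any


def build_strict_mapping() -> dict[str, Any]:
    """Return an explicit mapping that rejects documents with unmapped fields."""
    return {
        "dynamic": "strict",  # unknown fields make the indexing request fail
        "properties": {
            "title": {"type": "text"},         # hypothetical field
            "sku": {"type": "keyword"},        # hypothetical field
            "published_at": {"type": "date"},  # hypothetical field
        },
    }


def unmapped_fields(doc: dict[str, Any], mapping: dict[str, Any]) -> list[str]:
    """Hypothetical pre-check: list top-level keys the mapping does not declare."""
    declared = mapping.get("properties", {})
    return sorted(key for key in doc if key not in declared)


# With a reachable cluster, the mapping could be applied like this
# (the endpoint and index name are assumptions):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   es.indices.create(index="products", mappings=build_strict_mapping())
```

Running the pre-check before indexing surfaces stray keys early, for example `unmapped_fields({"title": "x", "color": "red"}, build_strict_mapping())` reports `color`.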

### 2. Search Strategy (Hybrid)
- **Traditional:** Use `multi_match` with `best_fields` or `cross_fields` for keyword search.
- **Semantic (ELSER):** Use `text_expansion` queries with `.elser_model_2` (or the current model version) for conceptual matching. From 8.15 onward, `text_expansion` is deprecated in favor of the `sparse_vector` query.
- **Vector:** Use `knn` search for high-dimensional similarity.
- **Hybrid Scoring:** Combine result lists using reciprocal rank fusion (RRF, exposed as `rrf` in the search API) or linear boosting.
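
The strategies above can be sketched as plain request bodies. Everything named here is an assumption for illustration: the fields `title`, `description`, `embedding`, and `ml.tokens`, and the helper functions themselves. The `rank` section assumes an 8.x server and client that accept it, and `rrf_fuse` is a pure-Python illustration of the fusion formula, not the server's implementation.

```python
from collections import defaultdict
from typing import Any


def build_hybrid_search(text: str, query_vector: list[float]) -> dict[str, Any]:
    """Keyword + kNN request fused with RRF (field names are hypothetical)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": ["title^2", "description"],
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": 10,
            "num_candidates": 100,
        },
        "rank": {"rrf": {}},  # ask the server to fuse with default RRF settings
    }


def build_elser_query(text: str, field: str = "ml.tokens") -> dict[str, Any]:
    """ELSER query clause; `field` is an assumed sparse-token field name."""
    return {
        "text_expansion": {
            field: {"model_id": ".elser_model_2", "model_text": text}
        }
    }


def rrf_fuse(rankings: list[list[str]], rank_constant: int = 60) -> list[str]:
    """Illustrative RRF: score(d) = sum over rankings of 1 / (rank_constant + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + position)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

With a client at hand, the body could be passed as keyword arguments, for example `es.search(index="products", **build_hybrid_search("wireless headphones", vector))`, where the index name is again an assumption. Because RRF works on ranks rather than raw scores, it avoids having to normalize BM25 and vector similarity onto a common scale.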

### 3. Field Types
- Use `keyword` for exact matches, sorting, and aggregations.
- Use `text` with an analyzer suited to the content for full-text search.
- Use `dense_vector` for embeddings.
- Use `flattened` for objects with unpredictable keys whose values do not need to be mapped and indexed as individual fields.
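
As a rough illustration of these choices, the sketch below declares one field of each type. The field names, the `english` analyzer choice, and the default of 384 dimensions are assumptions; the dimension count has to match whatever embedding model is actually used.

```python
from typing import Any


def build_field_mappings(embedding_dims: int = 384) -> dict[str, Any]:
    """One example field per type discussed above (all names are hypothetical)."""
    return {
        "properties": {
            # Exact match, sorting, and aggregations.
            "sku": {"type": "keyword"},
            # Full-text search, plus a keyword sub-field for sorting and facets.
            "title": {
                "type": "text",
                "analyzer": "english",
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            },
            # Embeddings for kNN search.
            "embedding": {
                "type": "dense_vector",
                "dims": embedding_dims,
                "index": True,
                "similarity": "cosine",
            },
            # Arbitrary key/value pairs stored under a single mapped field.
            "attributes": {"type": "flattened"},
        }
    }
```

The `title.raw` multi-field is one common way to get both analyzed search and exact-value aggregation from the same source value.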

## Query DSL Guidelines
- Prefer `bool` queries for combining filters and boosts.
- Use `filter` context for non-scoring criteria so Elasticsearch can skip scoring and cache the results.
- Implement hierarchical faceting using `nested` aggregations or the `path_hierarchy` tokenizer.
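
A minimal sketch of these guidelines, under stated assumptions: `category_path` is a hypothetical `keyword` field with a `tree` text sub-field analyzed by the `path_analyzer` defined below, and `in_stock`, `title`, and `description` are likewise illustrative. `path_prefixes` only mimics the tokens `path_hierarchy` produces for a path without a leading delimiter.

```python
from typing import Any


def build_filtered_search(
    text: str, category_path: str, in_stock: bool = True
) -> dict[str, Any]:
    """Scored text match in `must`, non-scoring criteria in `filter`."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": text, "fields": ["title", "description"]}}
                ],
                "filter": [
                    # Matches the node itself and every descendant path.
                    {"term": {"category_path.tree": category_path}},
                    {"term": {"in_stock": in_stock}},
                ],
            }
        },
        # Facet counts over the full (untokenized) category paths.
        "aggs": {"by_category": {"terms": {"field": "category_path", "size": 20}}},
    }


def build_path_analysis(delimiter: str = "/") -> dict[str, Any]:
    """Index settings that define a path_hierarchy-based analyzer."""
    return {
        "analysis": {
            "tokenizer": {
                "path_tokenizer": {"type": "path_hierarchy", "delimiter": delimiter}
            },
            "analyzer": {
                "path_analyzer": {"type": "custom", "tokenizer": "path_tokenizer"}
            },
        }
    }


def path_prefixes(path: str, delimiter: str = "/") -> list[str]:
    """Illustrate the prefix tokens emitted for a delimiter-separated path."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]
```

Because every ancestor prefix is indexed as its own token, a single `term` filter on `electronics/audio` selects the whole subtree without wildcard queries.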