# Project Context: Elasticsearch Query & Mapping Expert

## Core Stack
- **Engine:** Elasticsearch 8.x
- **Client:** `elasticsearch-py`
- **DSL:** Elasticsearch Query DSL (JSON)
- **Features:** ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings

## Architectural Rules (Strict)

### 1. Explicit Mapping over Dynamic Mapping
- **RULE:** Never rely on dynamic mapping for production indices.
- **RATIONALE:** Prevents "mapping explosions", where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
- **ACTION:** Always define explicit `properties` and set `dynamic: strict` (reject documents that contain unmapped fields) or `dynamic: false` (keep unmapped fields in `_source` without indexing them) at the root level.

### 2. Search Strategy (Hybrid)
- **Traditional:** Use `multi_match` with `best_fields` or `cross_fields` for keywords.
- **Semantic (ELSER):** Use `text_expansion` queries with `.elser_model_2` (or the currently deployed ELSER version) for conceptual matching.
- **Vector:** Use `knn` search on `dense_vector` fields for high-dimensional similarity.
- **Hybrid Scoring:** Merge the ranked result lists with reciprocal rank fusion (`rrf`), or combine scores with linear boosting.

### 3. Field Types
- Use `keyword` for exact matches, sorting, and aggregations.
- Use `text` with specialized analyzers for full-text search.
- Use `dense_vector` for embeddings.
- Use `flattened` for objects with unpredictable keys whose sub-fields don't need individual mappings.

## Query DSL Guidelines
- Prefer `bool` queries for combining filters and boosts.
- Use `filter` context for non-scoring criteria; filter clauses skip relevance scoring and can be cached.
- Implement hierarchical faceting using `nested` aggregations or `path_hierarchy` tokenizers.
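
The mapping and hybrid-search rules above can be sketched as plain Python dicts. This is a minimal illustration under stated assumptions, not a definitive implementation: the field names (`title`, `body`, `category`, `attributes`, `ml.tokens`, `embedding`), the vector size of 384, and both function names are hypothetical. The `sparse_vector` field type for ELSER tokens assumes Elasticsearch 8.11 or later, and the top-level `rank` section has changed across 8.x releases, so check it against the version you run.

```python
from typing import Any

# Model ID taken from the rules above; swap in whichever ELSER version is deployed.
ELSER_MODEL_ID = ".elser_model_2"


def build_mappings(dims: int = 384) -> dict[str, Any]:
    """Explicit index mapping. All field names and `dims` are illustrative."""
    return {
        "dynamic": "strict",  # reject documents that contain unmapped fields
        "properties": {
            "title": {"type": "text"},             # full-text search
            "body": {"type": "text"},
            "category": {"type": "keyword"},       # exact match, sorting, aggregations
            "attributes": {"type": "flattened"},   # unpredictable keys, one mapped field
            # ELSER token weights (assumes 8.11+ where `sparse_vector` is available)
            "ml": {"properties": {"tokens": {"type": "sparse_vector"}}},
            "embedding": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": "cosine",
            },
        },
    }


def build_hybrid_search(
    text: str, query_vector: list[float], category: str, k: int = 10
) -> dict[str, Any]:
    """Search body combining keyword, ELSER, and kNN retrieval, fused with RRF."""
    # Non-scoring criterion: goes in filter context and is reused for the kNN search.
    category_filter = {"term": {"category": category}}
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title^2", "body"],
                            "type": "best_fields",
                        }
                    },
                    {
                        "text_expansion": {
                            "ml.tokens": {
                                "model_id": ELSER_MODEL_ID,
                                "model_text": text,
                            }
                        }
                    },
                ],
                # With a filter clause present, `should` is optional unless stated.
                "minimum_should_match": 1,
                "filter": [category_filter],
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,
            "filter": category_filter,
        },
        # Fuse the `query` and `knn` result lists by rank rather than raw score.
        "rank": {"rrf": {"rank_constant": 60}},
        "size": k,
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_mappings(), indent=2))
    print(json.dumps(build_hybrid_search("wireless headphones", [0.1] * 384, "audio"), indent=2))
```

Building the bodies as plain dicts keeps the sketch runnable without a cluster. In practice the mapping would be passed to the client's `indices.create` call and the search body to its `search` call; the exact keyword arguments accepted depend on the `elasticsearch-py` version.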