# Project Context: Elasticsearch Query & Mapping Expert
## Core Stack
- **Engine:** Elasticsearch 8.x
- **Client:** `elasticsearch-py`
- **DSL:** Elasticsearch Query DSL (JSON)
- **Features:** ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings
## Architectural Rules (Strict)
### 1. Explicit Mapping over Dynamic Mapping
- **RULE:** Never rely on dynamic mapping for production indices.
- **RATIONALE:** Prevents "mapping explosions", where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
- **ACTION:** Always define explicit `properties` and set `dynamic: strict` (reject documents that contain unmapped fields) or `dynamic: false` (keep unmapped fields in `_source` but do not index them) at the root level.
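
As a minimal sketch of this rule, assuming a hypothetical `products` index with illustrative field names (`sku`, `title`, and `price` are not taken from this project), the mapping can be built as a plain dict and passed to the client. The `unmapped_fields` helper is only a local illustration of what `dynamic: strict` would reject at the top level; Elasticsearch itself enforces the setting at index time.

```python
from typing import Any

# Hypothetical mapping for an illustrative "products" index.
STRICT_MAPPING: dict[str, Any] = {
    "dynamic": "strict",  # reject documents that contain unmapped fields
    "properties": {
        "sku": {"type": "keyword"},
        "title": {"type": "text"},
        "price": {"type": "float"},
    },
}


def unmapped_fields(doc: dict[str, Any], mapping: dict[str, Any]) -> list[str]:
    """Return the top-level keys of `doc` that the mapping does not declare."""
    declared = mapping.get("properties", {})
    return sorted(key for key in doc if key not in declared)


def create_index(client: Any, name: str = "products") -> None:
    """Create the index; `client` is assumed to be an `elasticsearch.Elasticsearch` instance."""
    client.indices.create(index=name, mappings=STRICT_MAPPING)


if __name__ == "__main__":
    doc = {"sku": "A-1", "title": "Desk lamp", "colour": "red"}
    # "colour" is not declared, so a strict index would refuse this document.
    print(unmapped_fields(doc, STRICT_MAPPING))
```

Running a check like `unmapped_fields` in a test suite is one way to catch schema drift before a document reaches the cluster, though it only inspects top-level keys.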
### 2. Search Strategy (Hybrid)
- **Traditional:** Use `multi_match` with `best_fields` or `cross_fields` for keyword search.
- **Semantic (ELSER):** Use `text_expansion` queries with `.elser_model_2` (or the current model version) for conceptual matching. Note that `text_expansion` is deprecated as of 8.15 in favor of the `sparse_vector` query.
- **Vector:** Use `knn` search for similarity over high-dimensional embeddings.
- **Hybrid Scoring:** Combine result lists using reciprocal rank fusion (`rrf`), which merges by rank rather than raw score, or linear boosting.
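
The three retrieval legs and the fusion step can be sketched as follows. The field names (`title`, `description`, `content_embedding`, `embedding`) are assumptions for illustration. The request syntax for server-side RRF has changed across 8.x releases, so this sketch fuses ranked ID lists client-side with the standard RRF formula instead; it is not Elasticsearch's internal implementation.

```python
from collections import defaultdict
from typing import Any


def keyword_clause(text: str) -> dict[str, Any]:
    """Lexical leg: multi_match over hypothetical `title` and `description` fields."""
    return {
        "multi_match": {
            "query": text,
            "type": "best_fields",
            "fields": ["title^2", "description"],
        }
    }


def elser_clause(text: str, field: str = "content_embedding") -> dict[str, Any]:
    """Semantic leg: text_expansion against a hypothetical ELSER token field."""
    return {
        "text_expansion": {
            field: {"model_id": ".elser_model_2", "model_text": text}
        }
    }


def knn_clause(vector: list[float], k: int = 10) -> dict[str, Any]:
    """Vector leg: body of the `knn` search option on a hypothetical `embedding` field."""
    return {
        "field": "embedding",
        "query_vector": vector,
        "k": k,
        "num_candidates": 10 * k,  # assumption: a simple oversampling factor
    }


def rrf_merge(rankings: list[list[str]], rank_constant: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs: score(d) = sum of 1 / (rank_constant + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    lexical = ["a", "b", "c"]
    semantic = ["b", "c", "a"]
    vector = ["b", "a", "d"]
    print(rrf_merge([lexical, semantic, vector]))
```

Because RRF uses only rank positions, the legs do not need comparable score scales, which is the main reason to prefer it over linear boosting when mixing BM25 with vector similarity.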
### 3. Field Types
- Use `keyword` for exact matches, sorting, and aggregations.
- Use `text` with specialized analyzers for full-text search.
- Use `dense_vector` for embeddings.
- Use `flattened` for objects with unpredictable keys that don't need per-key mappings; the whole object is mapped as a single field and its leaf values are indexed as keywords.
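
A mapping that uses each of these field types might look like the sketch below. All field names and the embedding dimension are assumptions chosen for illustration; the dimension in particular depends on the embedding model in use.

```python
from typing import Any

EMBEDDING_DIMS = 384  # assumption: set this to your embedding model's output size

FIELD_TYPE_MAPPING: dict[str, Any] = {
    "dynamic": "strict",
    "properties": {
        # Exact matches, sorting, and aggregations.
        "brand": {"type": "keyword"},
        # Full-text search with a language analyzer, plus a keyword
        # sub-field so the same value can also be sorted or aggregated.
        "title": {
            "type": "text",
            "analyzer": "english",
            "fields": {"raw": {"type": "keyword"}},
        },
        # Embeddings for kNN search.
        "embedding": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,
            "similarity": "cosine",
        },
        # Arbitrary key/value attributes mapped as one field.
        "attributes": {"type": "flattened"},
    },
}


def fields_of_type(mapping: dict[str, Any], type_name: str) -> list[str]:
    """List the top-level fields in `mapping` declared with the given type."""
    return sorted(
        name
        for name, spec in mapping.get("properties", {}).items()
        if spec.get("type") == type_name
    )


if __name__ == "__main__":
    print(fields_of_type(FIELD_TYPE_MAPPING, "keyword"))
```

The `title.raw` multi-field is a common way to serve both full-text queries and exact-value operations from one source value without duplicating it in the document.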
## Query DSL Guidelines
- Prefer `bool` queries for combining filters and boosts.
- Use `filter` context for non-scoring criteria to skip relevance scoring and leverage caching.
- Implement hierarchical faceting using `nested` aggregations or `path_hierarchy` tokenizers.
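
These guidelines can be sketched as a `bool` query that keeps scoring clauses in `must` and non-scoring criteria in `filter`, together with index settings for a `path_hierarchy` analyzer. Field names (`title`, `brand`, `price`) and the analyzer name `category_path` are hypothetical. The `path_prefixes` helper only approximates the tokens the tokenizer emits for a path without a leading delimiter; it is an illustration, not the tokenizer itself.

```python
from typing import Any


def build_search_query(text: str, brand: str, max_price: float) -> dict[str, Any]:
    """Scored full-text match in `must`; exact and range criteria in `filter`."""
    return {
        "bool": {
            "must": [{"match": {"title": text}}],
            "filter": [
                {"term": {"brand": brand}},
                {"range": {"price": {"lte": max_price}}},
            ],
        }
    }


# Index settings defining a custom analyzer built on the path_hierarchy tokenizer.
PATH_ANALYSIS_SETTINGS: dict[str, Any] = {
    "analysis": {
        "tokenizer": {
            "category_path": {"type": "path_hierarchy", "delimiter": "/"}
        },
        "analyzer": {
            "category_path": {"type": "custom", "tokenizer": "category_path"}
        },
    }
}


def path_prefixes(path: str, delimiter: str = "/") -> list[str]:
    """Return each ancestor prefix of `path`, from the root level to the full path."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    print(build_search_query("desk lamp", "Acme", 50.0))
    print(path_prefixes("Electronics/Audio/Headphones"))
```

Indexing every ancestor prefix is what lets a single `term` filter on `Electronics/Audio` match all documents anywhere beneath that category.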