Project Context: Elasticsearch Query & Mapping Expert
Core Stack
- Engine: Elasticsearch 8.x
- Client: elasticsearch-py
- DSL: Elasticsearch Query DSL (JSON)
- Features: ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings
Architectural Rules (Strict)
1. Explicit Mapping over Dynamic Mapping
- RULE: Never rely on dynamic mapping for production indices.
- RATIONALE: Prevents "mapping explosions" where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
- ACTION: Always define strict properties and set dynamic: strict or dynamic: false at the root level.
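
The rule above can be sketched as a mapping definition. This is a minimal illustration, not a prescribed schema: the index name, the field names, and the helper `create_products_index` are hypothetical, and the helper assumes it receives an `elasticsearch.Elasticsearch` client from elasticsearch-py 8.x.

```python
# Hypothetical index and field names, for illustration only.
PRODUCT_MAPPINGS = {
    # Root level: reject any document that contains an unmapped field.
    "dynamic": "strict",
    "properties": {
        "sku": {"type": "keyword"},
        "title": {"type": "text"},
        "attributes": {
            # Object level: unmapped sub-fields stay in _source but are
            # neither indexed nor added to the mapping.
            "dynamic": False,
            "properties": {"color": {"type": "keyword"}},
        },
    },
}


def create_products_index(es, index_name="products-v1"):
    """Create the index with explicit mappings.

    `es` is assumed to be an elasticsearch.Elasticsearch client (8.x).
    """
    return es.indices.create(index=index_name, mappings=PRODUCT_MAPPINGS)
```

With dynamic: strict, indexing a document that carries an unexpected key fails outright, which surfaces schema drift early; dynamic: false accepts the document and silently skips the unknown keys.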
2. Search Strategy (Hybrid)
- Traditional: Use multi_match with best_fields or cross_fields for keywords.
- Semantic (ELSER): Use text_expansion queries with the .elser_model_2 model (or the current version) for conceptual matching.
- Vector: Use knn search for high-dimensional similarity.
- Hybrid Scoring: Combine scores using reciprocal_rank_fusion (RRF) or linear boosting.
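
A rough sketch of the three retrieval legs and of how RRF combines their rankings follows. The field names (`title`, `description`, `content_tokens`, `embedding`) and all function names are assumptions made for illustration. The `rrf_fuse` helper computes the standard RRF formula on the client side to show the idea; it is not the server-side ranking API, and the rank constant of 60 is only a common default.

```python
from collections import defaultdict

ELSER_MODEL_ID = ".elser_model_2"  # adjust to the deployed model version


def keyword_query(text):
    """Traditional leg: multi_match across hypothetical text fields."""
    return {
        "multi_match": {
            "query": text,
            "fields": ["title^2", "description"],
            "type": "best_fields",
        }
    }


def elser_query(text):
    """Semantic leg: text_expansion against a hypothetical ELSER token field."""
    return {
        "text_expansion": {
            "content_tokens": {"model_id": ELSER_MODEL_ID, "model_text": text}
        }
    }


def knn_clause(query_vector, k=10):
    """Vector leg: top-level knn search option on a hypothetical dense_vector field."""
    return {
        "field": "embedding",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": 10 * k,
    }


def rrf_fuse(ranked_lists, rank_constant=60):
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (rank_constant + rank)."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF uses only rank positions, so the BM25, ELSER, and kNN legs need no score normalization before fusion; linear boosting, by contrast, depends on the legs producing comparable score ranges.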
3. Field Types
- Use keyword for exact matches, sorting, and aggregations.
- Use text with specialized analyzers for full-text search.
- Use dense_vector for embeddings.
- Use flattened for nested objects with unpredictable keys that don't need independent indexing.
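
The field-type choices above might look like this in a properties block. The field names, the `english` analyzer choice, and the 384-dimension figure are assumptions for illustration; the dimension count has to match whatever embedding model is actually in use.

```python
EMBEDDING_DIMS = 384  # assumption: must equal the embedding model's output size

# Hypothetical field names; one entry per field-type rule.
FIELD_TYPE_PROPERTIES = {
    # Exact matches, sorting, aggregations.
    "status": {"type": "keyword"},
    # Full-text search with a language analyzer, plus a keyword
    # sub-field so the same value can also be sorted and aggregated.
    "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
    },
    # Embeddings, indexed for kNN search.
    "embedding": {
        "type": "dense_vector",
        "dims": EMBEDDING_DIMS,
        "index": True,
        "similarity": "cosine",
    },
    # Arbitrary user-supplied keys stored as one mapped field.
    "labels": {"type": "flattened"},
}
```

Because flattened maps the whole object as a single field, new keys under `labels` never add entries to the mapping; the trade-off is that its values are treated as keywords, so no full-text analysis or numeric range semantics apply to them.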
Query DSL Guidelines
- Prefer bool queries for combining filters and boosts.
- Use filter context for non-scoring criteria to leverage caching.
- Implement hierarchical faceting using nested aggregations or path_hierarchy tokenizers.
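
As a sketch of these guidelines, the snippet below builds a bool query that keeps scoring and non-scoring clauses apart, and defines analysis settings for a path_hierarchy tokenizer. All field, tokenizer, and analyzer names are hypothetical. `path_hierarchy_tokens` is a simplified pure-Python emulation of what the tokenizer emits for a path without a leading delimiter, included only to show why it suits hierarchical facets.

```python
def filtered_search(text, category_path, min_price):
    """Bool query: scoring in must/should, non-scoring criteria in filter."""
    return {
        "bool": {
            "must": [{"match": {"title": text}}],
            # Filter context: no scoring, eligible for caching.
            "filter": [
                {"term": {"category_path": category_path}},
                {"range": {"price": {"gte": min_price}}},
            ],
            # Optional boost for featured items.
            "should": [{"term": {"featured": {"value": True, "boost": 2.0}}}],
        }
    }


# Index settings that register a path_hierarchy tokenizer (hypothetical names).
PATH_ANALYSIS_SETTINGS = {
    "analysis": {
        "tokenizer": {"path_tok": {"type": "path_hierarchy", "delimiter": "/"}},
        "analyzer": {"path_analyzer": {"type": "custom", "tokenizer": "path_tok"}},
    }
}


def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate the tokenizer: one token per ancestor level of the path."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]
```

For example, `path_hierarchy_tokens("electronics/audio/headphones")` returns `["electronics", "electronics/audio", "electronics/audio/headphones"]`, so a document filed under the leaf category also matches a filter on any of its ancestors.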