# Project Context: Elasticsearch Query & Mapping Expert
## Core Stack
- **Engine:** Elasticsearch 8.x
- **Client:** `elasticsearch-py`
- **DSL:** Elasticsearch Query DSL (JSON)
- **Features:** ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings
## Architectural Rules (Strict)
### 1. Explicit Mapping over Dynamic Mapping
- **RULE:** Never rely on dynamic mapping for production indices.
- **RATIONALE:** Prevents "mapping explosions" where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
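- **ACTION:** Always define explicit `properties` and set `dynamic` at the root level. `strict` rejects any document that contains an unmapped field. `false` accepts the document but leaves the unknown fields unindexed (they are still kept in `_source`).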
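A minimal sketch of the rule, assuming the mapping is built as a plain dict and passed to `elasticsearch-py`. `build_strict_mapping` only allows the two production-safe `dynamic` values. `find_unmapped_fields` is a hypothetical client-side pre-check, not part of the client, that roughly approximates which keys a `strict` index would reject. The field names in the usage note are placeholders.

```python
from typing import Any, Dict, List


def build_strict_mapping(properties: Dict[str, Any], dynamic: str = "strict") -> Dict[str, Any]:
    """Wrap explicit field definitions in a root-level mapping with dynamic mapping disabled."""
    if dynamic not in ("strict", "false"):
        # "true" and "runtime" would let unexpected keys create new fields.
        raise ValueError("use dynamic='strict' or dynamic='false' for production indices")
    if not properties:
        raise ValueError("explicit properties are required")
    # "strict" is sent as a string; false is sent as a JSON boolean.
    return {"dynamic": "strict" if dynamic == "strict" else False, "properties": properties}


def find_unmapped_fields(doc: Dict[str, Any], properties: Dict[str, Any], prefix: str = "") -> List[str]:
    """Hypothetical pre-check: list dotted paths in `doc` that `properties` does not map.

    Recurses only into object fields that declare their own `properties`;
    arrays of objects and dotted keys in the source document are not handled.
    """
    missing: List[str] = []
    for key, value in doc.items():
        path = prefix + key
        field = properties.get(key)
        if field is None:
            missing.append(path)
        elif isinstance(value, dict) and "properties" in field:
            missing.extend(find_unmapped_fields(value, field["properties"], path + "."))
    return missing
```

With a connected client, `es.indices.create(index="products", mappings=build_strict_mapping(props))` creates the index. Running `find_unmapped_fields` before a bulk load surfaces keys that a `strict` index would reject, or that a `false` index would silently leave unindexed.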
### 2. Search Strategy (Hybrid)
- **Traditional:** Use `multi_match` with `best_fields` or `cross_fields` for keywords.
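- **Semantic (ELSER):** Use `sparse_vector` queries (8.15+) or, on earlier 8.x releases, the now-deprecated `text_expansion` query with the `.elser_model_2` model (or its current successor) for conceptual matching.
- **Vector:** Use approximate `knn` search over `dense_vector` fields for embedding similarity.
- **Hybrid Scoring:** Merge result lists with reciprocal rank fusion (the `rrf` retriever on 8.14+, or the older `rank.rrf` search option), which fuses ranks rather than raw scores, or combine scores with linear boosting.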
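The hybrid strategy can be sketched as a single `rrf` retriever with one child per signal. This assumes Elasticsearch 8.15+ (for the `sparse_vector` query) and hypothetical field names (`title`, `description`, `embedding`, `ml.tokens`). The `inference_id` default is an assumption; it must match whatever produced the ELSER tokens at ingest time. `rrf_fuse` runs locally and only illustrates the formula, score(d) = Σ 1 / (rank_constant + rank), so the weighting of top positions is easy to inspect.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def build_hybrid_retriever(
    text: str,
    query_vector: Sequence[float],
    *,
    text_fields: Sequence[str] = ("title^2", "description"),  # hypothetical fields
    vector_field: str = "embedding",                          # hypothetical dense_vector field
    sparse_field: str = "ml.tokens",                          # hypothetical ELSER output field
    inference_id: str = ".elser_model_2",                     # assumed inference id
    k: int = 10,
    num_candidates: int = 100,
    rank_constant: int = 60,
) -> Dict[str, Any]:
    """Return an `rrf` retriever fusing keyword, ELSER, and kNN result lists."""
    lexical = {"standard": {"query": {"multi_match": {
        "query": text, "fields": list(text_fields), "type": "best_fields"}}}}
    semantic = {"standard": {"query": {"sparse_vector": {
        "field": sparse_field, "inference_id": inference_id, "query": text}}}}
    vector = {"knn": {"field": vector_field, "query_vector": list(query_vector),
                      "k": k, "num_candidates": num_candidates}}
    return {"rrf": {"retrievers": [lexical, semantic, vector], "rank_constant": rank_constant}}


def rrf_fuse(rankings: List[List[str]], rank_constant: int = 60) -> List[str]:
    """Local illustration of RRF: sum 1 / (rank_constant + rank) across ranked lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by id here only for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

On a client version that exposes the `retriever` search parameter, this could be sent as `es.search(index="products", retriever=build_hybrid_retriever(text, vec))`. In the local illustration, a document ranked second in two lists outranks one ranked first in only one list, which is the behavior that makes RRF robust to incomparable score scales.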
### 3. Field Types
- Use `keyword` for exact matches, sorting, and aggregations.
- Use `text` with specialized analyzers for full-text search.
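- Use `dense_vector` for embeddings and `sparse_vector` for ELSER token weights.
- Use `flattened` for objects with many or unpredictable keys (e.g., user-defined attributes). The whole object is mapped as a single field and its leaf values are indexed as keywords, so new keys never add mapped fields.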
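The field-type rules can be illustrated as one explicit `properties` block. This is a sketch only: the field names and the 384-dimension vector size are hypothetical placeholders, and the `sparse_vector` field type assumes Elasticsearch 8.11+ (earlier ELSER setups used `rank_features`).

```python
from typing import Any, Dict


def product_properties(vector_dims: int = 384) -> Dict[str, Any]:
    """Illustrative explicit properties covering each field type listed above."""
    return {
        # Exact matches, sorting, and aggregations.
        "sku": {"type": "keyword"},
        # Full-text search with a built-in language analyzer, plus a keyword
        # sub-field ("title.raw") for sorting and aggregating on the same value.
        "title": {
            "type": "text",
            "analyzer": "english",
            "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
        },
        # Embeddings for approximate kNN search.
        "embedding": {
            "type": "dense_vector",
            "dims": vector_dims,
            "index": True,
            "similarity": "cosine",
        },
        # ELSER output (token -> weight pairs).
        "ml": {"properties": {"tokens": {"type": "sparse_vector"}}},
        # Arbitrary key/value attributes; the whole object becomes one field.
        "attributes": {"type": "flattened"},
    }
```

Used as `es.indices.create(index="products", mappings={"dynamic": "strict", "properties": product_properties()})`, a document such as `{"attributes": {"color": "red", "size": "M"}}` can be filtered with `{"term": {"attributes.color": "red"}}` without either key ever being added to the mapping.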
## Query DSL Guidelines
- Prefer `bool` queries for combining filters and boosts.
- Use `filter` context for non-scoring criteria to leverage caching.
- Implement hierarchical faceting using `nested` aggregations or `path_hierarchy` tokenizers.
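A sketch of the `path_hierarchy` variant plus a `bool` query that keeps non-scoring criteria in `filter`, assuming hypothetical `category`, `in_stock`, `title`, and `description` fields. Aggregating on tokens from a `text` field requires `fielddata: true`, which loads terms into heap memory, so this suits modest category vocabularies. `path_tokens` is a local illustration of the tokens the tokenizer emits, not an Elasticsearch call.

```python
from typing import Any, Dict, List, Optional


def path_tokens(path: str, delimiter: str = "/") -> List[str]:
    """Mimic path_hierarchy output for a path without a leading delimiter."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# Index settings: a custom analyzer built on the path_hierarchy tokenizer.
CATEGORY_SETTINGS: Dict[str, Any] = {
    "analysis": {
        "tokenizer": {"category_tokenizer": {"type": "path_hierarchy", "delimiter": "/"}},
        "analyzer": {"category_path": {"type": "custom", "tokenizer": "category_tokenizer"}},
    }
}

# Index every ancestor path, keep query input whole, and enable fielddata so a
# terms aggregation can count the tokens (this costs heap memory).
CATEGORY_FIELD: Dict[str, Any] = {
    "type": "text",
    "analyzer": "category_path",
    "search_analyzer": "keyword",
    "fielddata": True,
}


def faceted_search(text: str, category: Optional[str] = None, in_stock_only: bool = True) -> Dict[str, Any]:
    """Scored clause in `must`, non-scoring criteria in `filter`, plus a category facet."""
    filters: List[Dict[str, Any]] = []
    if category:
        filters.append({"term": {"category": category}})  # e.g. "Electronics/Audio"
    if in_stock_only:
        filters.append({"term": {"in_stock": True}})      # hypothetical boolean field
    return {
        "query": {"bool": {
            "must": [{"multi_match": {"query": text, "fields": ["title^2", "description"],
                                      "type": "best_fields"}}],
            "filter": filters,
        }},
        "aggs": {"categories": {"terms": {"field": "category", "size": 20}}},
    }
```

A possible usage: `es.indices.create(index="products", settings=CATEGORY_SETTINGS, mappings={"properties": {"category": CATEGORY_FIELD}})`, then `es.search(index="products", **faceted_search("headphones", "Electronics/Audio"))`. Because every ancestor path is indexed, the `categories` buckets report counts at each level of the hierarchy.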