Project Context: Elasticsearch Query & Mapping Expert

Core Stack

  • Engine: Elasticsearch 8.x
  • Client: elasticsearch-py
  • DSL: Elasticsearch Query DSL (JSON)
  • Features: ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings

Architectural Rules (Strict)

1. Explicit Mapping over Dynamic Mapping

  • RULE: Never rely on dynamic mapping for production indices.
  • RATIONALE: Prevents "mapping explosions" where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
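  • ACTION: Define every field explicitly under properties and set dynamic: strict (reject documents that contain unmapped fields) or dynamic: false (accept them, but leave unmapped fields unindexed and present only in _source) at the root level.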
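
As a minimal sketch of the rule above, the mapping below sets dynamic to strict and declares each field. The index name products, the field names, and the unmapped_fields helper are hypothetical; the helper is only a client-side pre-flight check, since Elasticsearch enforces strict mapping itself at index time.

```python
from __future__ import annotations

# Sketch of an explicit mapping; index and field names are hypothetical.
PRODUCT_MAPPINGS = {
    "dynamic": "strict",  # reject documents that contain unmapped fields
    "properties": {
        "sku": {"type": "keyword"},
        "title": {"type": "text"},
        "price": {"type": "float"},
    },
}


def unmapped_fields(doc: dict, mappings: dict) -> list[str]:
    """Return the top-level keys of doc that the mapping does not declare.

    Hypothetical helper: it inspects top-level keys only and does not
    descend into object fields.
    """
    declared = mappings.get("properties", {})
    return sorted(key for key in doc if key not in declared)


# With a live cluster, the mapping could be applied roughly like this
# (endpoint is an assumption):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   es.indices.create(index="products", mappings=PRODUCT_MAPPINGS)

doc = {"sku": "A-1", "title": "Lamp", "colour": "red"}
print(unmapped_fields(doc, PRODUCT_MAPPINGS))  # ['colour']
```

With dynamic: strict, indexing the document above would fail because colour is not declared; with dynamic: false it would be accepted but colour would not be searchable.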

2. Search Strategy (Hybrid)

  • Traditional: Use multi_match with best_fields or cross_fields for keywords.
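  • Semantic (ELSER): Use text_expansion queries with .elser_model_2 (or the ELSER version currently deployed) for conceptual matching. Later 8.x releases deprecate text_expansion in favor of the sparse_vector query, so check the target cluster version.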
  • Vector: Use knn search on dense_vector fields for embedding similarity.
  • Hybrid Scoring: Combine result sets with Reciprocal Rank Fusion (exposed as rrf in the search API) or with linear boosting of the individual query scores.
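
The three query styles and the rank fusion step can be sketched as plain Python dictionaries plus a small function. Field names (title, description, ml.tokens, embedding), the query text, and the toy vector are assumptions for illustration. The rrf function only illustrates the Reciprocal Rank Fusion formula; in practice Elasticsearch performs the fusion server-side.

```python
from collections import defaultdict

# Keyword leg: multi_match over hypothetical text fields.
keyword_query = {
    "multi_match": {
        "query": "wireless headphones",
        "type": "best_fields",
        "fields": ["title^2", "description"],
    }
}

# Semantic leg: text_expansion against a field holding ELSER token weights.
semantic_query = {
    "text_expansion": {
        "ml.tokens": {
            "model_id": ".elser_model_2",
            "model_text": "wireless headphones",
        }
    }
}

# Vector leg: body of the top-level knn search option.
knn_clause = {
    "field": "embedding",
    "query_vector": [0.12, -0.03, 0.88],  # toy 3-dimensional vector
    "k": 10,
    "num_candidates": 100,
}


def rrf(rankings, rank_constant=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (rank_constant + rank)) over the lists
    it appears in, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
print(rrf([keyword_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

Document a ranks first because it sits near the top of both lists, while b and d each appear in only one list. RRF uses ranks rather than raw scores, so the legs do not need comparable score scales.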

3. Field Types

  • Use keyword for exact matches, sorting, and aggregations.
  • Use text with specialized analyzers for full-text search.
  • Use dense_vector for embeddings.
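  • Use flattened for objects with unpredictable keys. The whole object is mapped as a single field and its leaf values are indexed as keywords, so individual keys get no full-text analysis or per-key typing.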
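
A minimal sketch of how the four field types above might appear together in one mapping. The field names, the english analyzer choice, and the 384-dimension embedding size are assumptions, not values from this project.

```python
# Hypothetical mapping that uses one field of each recommended type.
FIELD_MAPPINGS = {
    "dynamic": "strict",
    "properties": {
        # Exact matches, sorting, and aggregations.
        "brand": {"type": "keyword"},
        # Full-text search with a language-specific analyzer.
        "description": {"type": "text", "analyzer": "english"},
        # Embeddings for kNN search; dims must match the embedding model.
        "embedding": {
            "type": "dense_vector",
            "dims": 384,
            "index": True,
            "similarity": "cosine",
        },
        # Free-form key/value attributes mapped as a single field.
        "attributes": {"type": "flattened"},
    },
}


def fields_of_type(mappings, field_type):
    """Return the names of top-level fields declared with the given type."""
    return sorted(
        name
        for name, spec in mappings["properties"].items()
        if spec.get("type") == field_type
    )


print(fields_of_type(FIELD_MAPPINGS, "keyword"))       # ['brand']
print(fields_of_type(FIELD_MAPPINGS, "dense_vector"))  # ['embedding']
```

Because attributes is flattened, adding a new key inside it does not create a new mapping entry, which keeps the cluster state small even under dynamic: strict.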

Query DSL Guidelines

  • Prefer bool queries for combining filters and boosts.
  • Use filter context for non-scoring criteria (exact terms, ranges, flags): filter clauses skip relevance scoring and are eligible for caching.
  • Implement hierarchical faceting with sub-aggregations (for example, terms within terms) or with a path_hierarchy tokenizer.
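
The guidelines above can be sketched as one request body: a bool query that scores on text, filters on exact criteria, and facets by a two-level category hierarchy through sub-aggregations. The function name build_search_body and all field names (title, description, brand, price, category_l1, category_l2) are hypothetical.

```python
def build_search_body(text, brand, max_price):
    """Build a search body with scoring, filtering, and hierarchical facets."""
    return {
        "query": {
            "bool": {
                # Scoring clause: contributes to relevance.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", "description"],
                        }
                    }
                ],
                # Filter context: yes/no matching only, no scoring.
                "filter": [
                    {"term": {"brand": brand}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        },
        "aggs": {
            # Top-level facet, with a sub-aggregation per bucket.
            "by_category": {
                "terms": {"field": "category_l1"},
                "aggs": {
                    "by_subcategory": {"terms": {"field": "category_l2"}}
                },
            }
        },
    }


body = build_search_body("noise cancelling", "acme", 200)
print(sorted(body["query"]["bool"]))  # ['filter', 'must']

# With a live cluster (client setup assumed), the body could be sent as:
#   es.search(index="products", query=body["query"], aggs=body["aggs"])
```

Keeping brand and price in filter rather than must means they narrow the result set without affecting the relevance score of the text match.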