
# Project Context: Elasticsearch Query & Mapping Expert

## Core Stack
- Engine: Elasticsearch 8.x
- Client: `elasticsearch-py`
- DSL: Elasticsearch Query DSL (JSON)
- Features: ELSER (semantic search), dense vectors (kNN), explicit mappings

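## Architectural Rules (Strict)

### 1. Explicit Mapping over Dynamic Mapping
- RULE: Never rely on dynamic mapping for production indices.
- RATIONALE: Dynamic mapping invites "mapping explosions". Unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance. Once an index reaches `index.mapping.total_fields.limit` (1000 fields by default), Elasticsearch rejects any document that would add a new field.
- ACTION: Declare every field explicitly under `properties`. At the root of the mapping, set either `dynamic: strict` or `dynamic: false`. With `strict`, documents containing unmapped fields are rejected. With `false`, unmapped fields are kept in `_source` but are not indexed.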
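Under `dynamic: strict`, Elasticsearch rejects any document that contains a field missing from the mapping. The sketch below is a hypothetical pre-ingest check. It is not an Elasticsearch or `elasticsearch-py` API. It walks a document against a mapping's `properties` and reports the dotted paths that strict mode would reject, so bad payloads can be caught before indexing. It assumes that object fields declare their children under their own `properties`, and it treats `flattened` fields as accepting any keys. The example field names are assumptions.

```python
from typing import Any


def unmapped_fields(doc: dict[str, Any], properties: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths in `doc` that are not declared in `properties`.

    Hypothetical helper: approximates what `dynamic: strict` would reject.
    """
    missing: list[str] = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        field = properties.get(key)
        if field is None:
            missing.append(path)
            continue
        # flattened fields accept arbitrary keys, so do not descend into them.
        if field.get("type") == "flattened":
            continue
        sub_props = field.get("properties")
        if sub_props is not None:
            # Object field: check children; a list of objects is checked item by item.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict):
                    missing.extend(unmapped_fields(item, sub_props, prefix=f"{path}."))
    # Drop duplicates (e.g. the same stray key in several list items), keeping order.
    return list(dict.fromkeys(missing))


# Hypothetical strict mapping used for illustration.
MAPPING = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "sku": {"type": "keyword"},
        "attributes": {"type": "flattened"},
        "seller": {"properties": {"name": {"type": "keyword"}}},
    },
}

doc = {
    "title": "Desk lamp",
    "sku": "L-100",
    "attributes": {"color": "red"},
    "seller": {"name": "Acme", "rating": 4.5},
    "promo_code": "X1",
}
print(unmapped_fields(doc, MAPPING["properties"]))  # ['seller.rating', 'promo_code']
```

The check is meant to run on the client before bulk indexing. Elasticsearch's own strict-mode rejection remains the authoritative guard.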

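### 2. Search Strategy (Hybrid)
- Traditional: Use `multi_match` with `best_fields` or `cross_fields` for keyword matching.
- Semantic (ELSER): Use `text_expansion` queries against `.elser_model_2` (or the current ELSER version) for conceptual matching. From 8.15, `text_expansion` is deprecated in favor of the `sparse_vector` query.
- Vector: Use `knn` search on `dense_vector` fields for high-dimensional similarity.
- Hybrid Scoring: Combine result sets with Reciprocal Rank Fusion (RRF), or with linear boosting.
  - RRF: use the `rrf` retriever in 8.14+, or the older `rank.rrf` option. RRF fuses ranks, not raw scores.
  - Linear boosting: when a request contains both a `query` and a top-level `knn` section, Elasticsearch sums their scores, and the per-section `boost` values act as weights.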
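The sketch below illustrates both hybrid options in plain Python.

- `rrf_fuse` reproduces the Reciprocal Rank Fusion formula on the client. Each document scores `1 / (k + rank)` for every ranked list it appears in, and `k = 60` mirrors Elasticsearch's default `rank_constant`. This is an illustration of the formula, not Elasticsearch's internal implementation.
- `linear_hybrid_body` builds a request body for the linear-boosting option.

The field names (`title`, `description`, `embedding`), the weights, and the `k`/`num_candidates` values are assumptions.

```python
from collections import defaultdict
from typing import Any, Sequence


def rrf_fuse(ranked_lists: Sequence[Sequence[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) per list it appears in (rank is 1-based).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def linear_hybrid_body(
    text: str,
    vector: list[float],
    lexical_boost: float = 0.7,
    knn_boost: float = 0.3,
) -> dict[str, Any]:
    """Build a search body in which Elasticsearch sums the boosted lexical and kNN scores.

    Field names are hypothetical; adapt them to your own mapping.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^2", "description"],
                "type": "best_fields",
                "boost": lexical_boost,
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": vector,
            "k": 10,
            "num_candidates": 100,
            "boost": knn_boost,
        },
        "size": 10,
    }


# "b" ranks high in both lists, so it wins the fused ranking.
lexical_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
print([doc for doc, _ in rrf_fuse([lexical_hits, vector_hits])])  # ['b', 'c', 'a', 'd']
```

With `elasticsearch-py` 8.x, the body can be unpacked into `es.search(index="products", **linear_hybrid_body(...))`, where the index name is hypothetical. RRF needs no score calibration between lexical and vector results. Linear boosting gives finer control but depends on the two score ranges being comparable.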

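### 3. Field Types
- Use `keyword` for exact matches, sorting, and aggregations.
- Use `text` with an analyzer suited to the content (e.g., a language analyzer) for full-text search.
- Use `dense_vector` for embeddings. Declare `dims` and `similarity`, and set `index: true` so the field can serve kNN search.
- Use `sparse_vector` (8.11+; `rank_features` on earlier releases) for ELSER token weights.
- Use `flattened` for objects with unpredictable keys that don't need their own mappings. The whole object is indexed as a single field, and every leaf value is treated as a keyword.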
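A mapping that follows these field-type rules might look like the sketch below. The field names, the `english` analyzer choice, and the 384-dimension vector size are assumptions for illustration. The ELSER output field uses the `sparse_vector` type, which assumes 8.11 or later.

```python
from typing import Any


def product_mapping(vector_dims: int = 384) -> dict[str, Any]:
    """Return an explicit, strict mapping covering the field types above (hypothetical schema)."""
    return {
        "dynamic": "strict",
        "properties": {
            # keyword: exact matches, sorting, aggregations
            "sku": {"type": "keyword"},
            "brand": {"type": "keyword"},
            # text: full-text search with a language analyzer, plus a keyword sub-field for sorting
            "title": {
                "type": "text",
                "analyzer": "english",
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            },
            "description": {"type": "text", "analyzer": "english"},
            # dense_vector: embeddings, indexed for kNN search
            "embedding": {
                "type": "dense_vector",
                "dims": vector_dims,
                "index": True,
                "similarity": "cosine",
            },
            # sparse_vector: ELSER token weights
            "ml": {"properties": {"tokens": {"type": "sparse_vector"}}},
            # flattened: unpredictable attribute keys stored as one field
            "attributes": {"type": "flattened"},
        },
    }
```

With `elasticsearch-py` 8.x, this dictionary can be passed as `es.indices.create(index="products", mappings=product_mapping())`, where the index name is hypothetical.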

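## Query DSL Guidelines
- Prefer `bool` queries for combining scoring clauses, filters, and boosts.
- Put non-scoring criteria in `filter` context. Filter clauses skip scoring and are eligible for caching.
- Implement hierarchical faceting with either of two approaches:
  - `nested` aggregations, for `nested` fields.
  - A `path_hierarchy` tokenizer, which indexes every ancestor of a path such as `a/b/c` as its own term (`a`, `a/b`, `a/b/c`).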
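These guidelines combine naturally in one request. The sketch below assumes a hypothetical `category_paths` keyword field. At index time, the client writes every ancestor path of a document's category into that field. This produces terms similar to those a `path_hierarchy` tokenizer emits for a path without a leading delimiter, while the field stays aggregatable as a plain `keyword`. The brand and category filters run in `filter` context. A `terms` aggregation over `category_paths` then returns counts at every level of the hierarchy. All field names are assumptions.

```python
from typing import Any, Optional


def ancestor_paths(path: str, delimiter: str = "/") -> list[str]:
    """Expand 'a/b/c' into ['a', 'a/b', 'a/b/c'] (client-side stand-in for path_hierarchy)."""
    parts = [p for p in path.split(delimiter) if p]
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


def faceted_search_body(
    text: str,
    brand: Optional[str] = None,
    category: Optional[str] = None,
) -> dict[str, Any]:
    """Build a bool query with scoring text match, non-scoring filters, and a category facet."""
    filters: list[dict[str, Any]] = []
    if brand is not None:
        filters.append({"term": {"brand": brand}})
    if category is not None:
        # Matches any document whose ancestor paths include the selected category.
        filters.append({"term": {"category_paths": category}})
    return {
        "query": {
            "bool": {
                # Scoring clause: relevance comes from the full-text match.
                "must": [{"multi_match": {"query": text, "fields": ["title^2", "description"]}}],
                # Non-scoring clauses: no effect on relevance, eligible for caching.
                "filter": filters,
            }
        },
        "aggs": {"categories": {"terms": {"field": "category_paths", "size": 50}}},
    }


# At index time, store the expanded paths alongside the document.
print(ancestor_paths("electronics/audio/headphones"))
# ['electronics', 'electronics/audio', 'electronics/audio/headphones']
```

Selecting `electronics/audio` as the category then matches every document filed anywhere beneath that node.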