ceperaltab/elasticsearch-training-code

ceperaltab committed on Jan 23

Commit 1f25c4f (verified) · 1 Parent(s): 8377188

Upload CONTEXT.md with huggingface_hub

Files changed (1)

CONTEXT.md +31 -0

CONTEXT.md ADDED

	@@ -0,0 +1,31 @@

+# Project Context: Elasticsearch Query & Mapping Expert
+## Core Stack
+- **Engine:** Elasticsearch 8.x
+- **Client:** `elasticsearch-py`
+- **DSL:** Elasticsearch Query DSL (JSON)
+- **Features:** ELSER (Semantic Search), Dense Vector (kNN), Explicit Mappings
+## Architectural Rules (Strict)
+### 1. Explicit Mapping over Dynamic Mapping
+- **RULE:** Never rely on dynamic mapping for production indices.
+- **RATIONALE:** Prevents "mapping explosions" where unexpected fields (e.g., from deep taxonomies or user-generated keys) bloat the cluster state and degrade performance.
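+- **ACTION:** Always define every field explicitly under `properties` and set `dynamic: strict` (reject documents containing unmapped fields) or `dynamic: false` (keep unmapped fields in `_source` but do not index them) at the root level.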
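The rule above can be sketched as a plain Python dictionary, which is the shape `elasticsearch-py` accepts for mappings. This is a minimal sketch, not a definitive schema: the field names `title` and `sku` and the helper `unmapped_fields` are hypothetical, introduced only for illustration.

```python
def build_strict_mapping() -> dict:
    """Return an explicit mapping that rejects unmapped fields.

    The field names below are hypothetical examples, not part of the project.
    """
    return {
        "dynamic": "strict",  # indexing a document with an unknown field fails
        "properties": {
            "title": {"type": "text"},
            "sku": {"type": "keyword"},
        },
    }


def unmapped_fields(mapping: dict, doc: dict) -> list:
    """Client-side pre-check: top-level keys of `doc` missing from the mapping."""
    return sorted(set(doc) - set(mapping["properties"]))


mapping = build_strict_mapping()
print(unmapped_fields(mapping, {"title": "Red shoe", "color": "red"}))
```

With the 8.x Python client, a mapping like this would typically be passed as `client.indices.create(index="products", mappings=mapping)`, where the index name `products` is likewise an assumption. The pre-check only looks at top-level keys; with `dynamic: strict`, Elasticsearch itself rejects the offending document at index time.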
+### 2. Search Strategy (Hybrid)
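+- **Traditional:** Use `multi_match` with `type` set to `best_fields` or `cross_fields` for lexical (keyword-based) search.
+- **Semantic (ELSER):** Use `text_expansion` queries with the `.elser_model_2` model (or the current ELSER version) for conceptual matching. On Elasticsearch 8.15 and later, the `sparse_vector` query supersedes `text_expansion`.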
+- **Vector:** Use `knn` search for high-dimensional similarity.
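+- **Hybrid Scoring:** Combine result lists with reciprocal rank fusion (RRF, exposed as `rrf` in the search API), which fuses ranks rather than raw scores, or blend scores with linear boosting.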
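To make the fusion step concrete, here is a small client-side sketch of reciprocal rank fusion. It is an illustration of the formula only (each document scores the sum of `1 / (k + rank)` over the lists it appears in), not the server-side implementation; the function name `rrf_fuse`, the document IDs, and the choice `k=60` are assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document receives sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top hits from the three strategies listed above.
lexical = ["a", "b", "c"]    # multi_match
semantic = ["b", "c", "a"]   # ELSER
vector = ["b", "a", "d"]     # knn

print(rrf_fuse([lexical, semantic, vector]))  # ['b', 'a', 'c', 'd']
```

Because RRF uses only ranks, it sidesteps the problem that BM25, ELSER, and kNN scores live on different scales. Linear boosting, by contrast, needs those scores to be comparable or normalized first.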
+### 3. Field Types
+- Use `keyword` for exact matches, sorting, and aggregations.
+- Use `text` with specialized analyzers for full-text search.
+- Use `dense_vector` for embeddings.
+- Use `flattened` for objects with unpredictable keys whose subfields don't need their own mappings; the whole object is mapped as a single field and its leaf values are indexed as keywords.
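The four field types above might appear together in one `properties` block as follows. This is a hedged sketch: every field name, the `english` analyzer, the `raw` sub-field, and the 384-dimension vector size are assumptions chosen for illustration, not values taken from the project.

```python
def build_field_type_properties(vector_dims: int = 384) -> dict:
    """Return example `properties` covering the field types listed above.

    All field names and parameter values are hypothetical.
    """
    return {
        # Exact matches, sorting, and aggregations.
        "sku": {"type": "keyword"},
        # Full-text search with a language analyzer, plus a keyword
        # sub-field so the same value can also be sorted or aggregated.
        "description": {
            "type": "text",
            "analyzer": "english",
            "fields": {"raw": {"type": "keyword"}},
        },
        # Embeddings for kNN search.
        "embedding": {
            "type": "dense_vector",
            "dims": vector_dims,
            "index": True,
            "similarity": "cosine",
        },
        # Arbitrary key/value attributes mapped as a single field.
        "attributes": {"type": "flattened"},
    }


properties = build_field_type_properties()
print(sorted(properties))
```

The `dims` value has to match the output size of whichever embedding model produces the vectors, so it is left as a parameter here.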
+## Query DSL Guidelines
+- Prefer `bool` queries for combining filters and boosts.
+- Use `filter` context for non-scoring criteria so that Elasticsearch skips score computation and can cache the results.
+- Implement hierarchical faceting using `nested` aggregations (over `nested` fields) or fields analyzed with the `path_hierarchy` tokenizer.
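The guidelines above can be sketched as two small helpers. The first builds a `bool` query that scores on `multi_match` and restricts results in `filter` context; the second emulates, in plain Python, the kind of prefix tokens a `path_hierarchy` tokenizer emits for a category path. The field names (`title`, `description`, `category`, `price`), the boost `^2`, and both function names are assumptions for illustration.

```python
def build_filtered_query(text: str, category: str, min_price: float) -> dict:
    """Build a bool query: scoring clause in `must`, non-scoring in `filter`."""
    return {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": text,
                        "fields": ["title^2", "description"],
                        "type": "best_fields",
                    }
                }
            ],
            # Filter clauses do not contribute to the score.
            "filter": [
                {"term": {"category": category}},
                {"range": {"price": {"gte": min_price}}},
            ],
        }
    }


def path_prefixes(path: str, delimiter: str = "/") -> list:
    """Emulate path_hierarchy-style tokens: one token per ancestor prefix."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


query = build_filtered_query("wireless headphones", "audio", 50)
print(path_prefixes("Electronics/Audio/Headphones"))
```

With the 8.x Python client, the query would typically be sent as `client.search(index="products", query=query)`, where the index name is again an assumption. A `terms` aggregation on a field indexed with such prefix tokens yields one bucket per level of the hierarchy, which is what makes the tokenizer useful for facets.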