| ---
|
| license: apache-2.0
|
| tags:
|
| - retrieval
|
| - search
|
| - lightgbm
|
| - cross-encoder
|
| - bge-m3
|
| language:
|
| - en
|
| - ar
|
| ---
|
|
|
| # Harrir Search Stack v1
|
|
|
| Two-tower asymmetric GCN search for fashion retail (Harrir catalog ~28k SKUs),
|
| multilingual EN+AR. Three artifacts, version-locked together.
|
|
|
| ## Layout
|
|
|
| ```
|
| .
|
| ├── adapter/                 # LoRA adapter on BGE-M3 (LIVE)
|
| ├── tokenizer/               # BGE-M3 tokenizer (LIVE)
|
| ├── gcn_head.pt              # W_q / W_p / W_img projection heads (LIVE)
|
| ├── ltr_model.txt            # LightGBM LambdaRank champion (LIVE)
|
| ├── ltr_idf.json             # BM25/TF-IDF stats over catalog
|
| ├── ltr_subcat_*.{json,pt}   # subcategory embeddings (EN+AR)
|
| ├── ltr_spec_*.{json,pt}     # product spec embeddings
|
| └── cross_encoder/           # BAAI/bge-reranker-base fine-tuned
|
|                              # NOT loaded by the app today (served from
|
|                              # Modal). Kept here for future bake-in.
|
| ```
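
Because the artifacts are version-locked, it can help to confirm that a local copy is complete before loading anything. The following is a minimal sketch, not part of the Harrir app: the file names come from the layout above, but the helper `missing_artifacts` and the split into required and optional entries are assumptions made for illustration.

```python
from pathlib import Path

# Names taken from the layout above. Treating these as mandatory is an
# assumption; cross_encoder/ is skipped because the app does not load it today.
REQUIRED = ["adapter", "tokenizer", "gcn_head.pt", "ltr_model.txt", "ltr_idf.json"]
REQUIRED_GLOBS = ["ltr_subcat_*", "ltr_spec_*"]


def missing_artifacts(root):
    """Return the required names or glob patterns with no match under root."""
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    missing += [pat for pat in REQUIRED_GLOBS if not any(root.glob(pat))]
    return missing
```

An empty list from `missing_artifacts("./models/gcn_stage2")` would mean every entry assumed to be required was found.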
|
|
|
| ## Usage
|
|
|
| The Harrir app loads these at startup. Set this repo in `.env`:
|
|
|
| ```
|
| GCN_HF_REPO=rdxtremity/search-stack-v1
|
| GCN_ARTIFACTS_DIR=./models/gcn_stage2
|
| ```
|
|
|
| Then run `python download_models.py`, which snapshot-downloads the repo into `./models/gcn_stage2/`.
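
The contents of `download_models.py` are not shown here. As a rough sketch of how the two settings could drive the download, the code below reads them from the process environment and passes them to `snapshot_download`. The helper names `resolve_config` and `download` are hypothetical, falling back to the values shown above is an assumption, and loading the `.env` file into the environment is left to the caller.

```python
import os


def resolve_config(env=None):
    """Read repo and target directory from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return {
        # Fallbacks mirror the .env example above; this default is an assumption.
        "repo_id": env.get("GCN_HF_REPO", "rdxtremity/search-stack-v1"),
        "local_dir": env.get("GCN_ARTIFACTS_DIR", "./models/gcn_stage2"),
    }


def download(env=None):
    """Snapshot-download the configured repo and return the local path."""
    # Imported lazily so resolve_config works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    cfg = resolve_config(env)
    return snapshot_download(cfg["repo_id"], local_dir=cfg["local_dir"])
```

Keeping the configuration lookup separate from the network call makes the settings easy to check without touching the Hub.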
|
|
|
| Or manually:
|
|
|
| ```python
|
| from huggingface_hub import snapshot_download
|
| snapshot_download("rdxtremity/search-stack-v1", local_dir="./models/gcn_stage2")
|
| ```
|
|
|
| ## Eval
|
|
|
| Baseline nDCG (golden_v1, 346 queries / 19,923 graded pairs): **0.7695**
|
| Latest with LTR+CE rerank: see `Primary.GateRuns` audit log in the app.
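
For orientation, nDCG divides the discounted gain of a ranking by that of the ideal ordering of the same graded items, then averages over queries. The sketch below uses the common exponential-gain, log2-discount form. The exact gain function, cutoff, and aggregation behind the 0.7695 figure are not stated here, so these choices and the function names are assumptions rather than the app's evaluation code.

```python
import math


def dcg(grades):
    """Discounted cumulative gain for grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades, k=None):
    """nDCG for one query; k is an optional cutoff (None means full list)."""
    ranked = grades[:k] if k else grades
    ideal = sorted(grades, reverse=True)[: len(ranked)]
    best = dcg(ideal)
    # A query with no relevant items scores 0.0 by convention in this sketch.
    return dcg(ranked) / best if best > 0 else 0.0


def mean_ndcg(per_query_grades, k=None):
    """Average nDCG over queries; returns 0.0 for an empty query set."""
    scores = [ndcg(g, k) for g in per_query_grades]
    return sum(scores) / len(scores) if scores else 0.0
```

Here each query is a list of relevance grades in the order the system returned the items, so `mean_ndcg([[3, 2, 0], [0, 1]])` averages one perfect ranking with one that places the only relevant item second.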
|
| |