---
license: apache-2.0
tags:
- retrieval
- search
- lightgbm
- cross-encoder
- bge-m3
language:
- en
- ar
---
# Harrir Search Stack v1
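Two-tower, asymmetric GCN search for fashion retail (Harrir catalog, ~28k SKUs),
multilingual (English + Arabic). The stack ships as three artifact groups (GCN retriever,
LightGBM LTR ranker, cross-encoder reranker) that are version-locked together.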
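To make "asymmetric" concrete: presumably the query and product sides start from the same BGE-M3 (+ LoRA) encoder but pass through different heads (`W_q` and `W_p` from `gcn_head.pt`) before similarity is taken. The toy sketch below assumes linear heads and cosine similarity, uses plain Python lists with made-up 3-d inputs, and leaves out the graph-convolution step and the `W_img` image head entirely. It is not the shipped head.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def project(W: Matrix, x: Vector) -> Vector:
    """Apply a linear projection head: y = W @ x (W is rows x dim)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score(q_emb: Vector, p_emb: Vector, W_q: Matrix, W_p: Matrix) -> float:
    # Asymmetric: the query and the product go through *different* heads.
    return cosine(project(W_q, q_emb), project(W_p, p_emb))

# Hypothetical 3-d encoder outputs and 2x3 heads, chosen by hand.
W_q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_p = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
q = [1.0, 0.0, 0.0]
p = [0.0, 1.0, 0.0]
```

With these toy values, `score(q, p, W_q, W_p)` is 1.0 while the symmetric `score(q, p, W_q, W_q)` is 0.0: separate heads let the model align a query with products whose raw embeddings point elsewhere.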
## Layout
```
.
├── adapter/                # LoRA adapter on BGE-M3 (LIVE)
├── tokenizer/              # BGE-M3 tokenizer (LIVE)
├── gcn_head.pt             # W_q / W_p / W_img projection heads (LIVE)
├── ltr_model.txt           # LightGBM LambdaRank champion (LIVE)
├── ltr_idf.json            # BM25/TF-IDF stats over catalog
├── ltr_subcat_*.{json,pt}  # subcategory embeddings (EN+AR)
├── ltr_spec_*.{json,pt}    # product spec embeddings
└── cross_encoder/          # BAAI/bge-reranker-base, fine-tuned
                            # NOT loaded by the app today (served from
                            # Modal). Kept here for future bake-in.
```
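A quick post-download sanity check can catch a partial snapshot before the app tries to load it. The sketch below is hypothetical: which files count as required is an assumption (the LIVE entries plus the LTR feature files), and `cross_encoder/` is skipped because the app does not load it today.

```python
from pathlib import Path
from typing import List

# Hypothetical "must be present" list: LIVE entries from the layout plus the
# LTR feature files. cross_encoder/ is skipped (not loaded by the app today).
REQUIRED = ["adapter", "tokenizer", "gcn_head.pt", "ltr_model.txt", "ltr_idf.json"]
REQUIRED_GLOBS = ["ltr_subcat_*", "ltr_spec_*"]

def missing_artifacts(root: str) -> List[str]:
    """Return the required names/patterns that are absent under `root`."""
    base = Path(root)
    if not base.is_dir():
        # Nothing downloaded yet: everything is missing.
        return REQUIRED + REQUIRED_GLOBS
    missing = [name for name in REQUIRED if not (base / name).exists()]
    # Each glob family only needs at least one matching file.
    missing += [pat for pat in REQUIRED_GLOBS if not any(base.glob(pat))]
    return missing
```

Calling `missing_artifacts("./models/gcn_stage2")` after the download and failing fast on a non-empty result gives a clearer error than a failed model load at startup.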
## Usage
The Harrir app loads these artifacts at startup. Point it at this repo in `.env`:
```
GCN_HF_REPO=rdxtremity/search-stack-v1
GCN_ARTIFACTS_DIR=./models/gcn_stage2
```
Then run `python download_models.py`, which snapshot-downloads the repo into `./models/gcn_stage2/`.
Or download manually:
```python
from huggingface_hub import snapshot_download
snapshot_download("rdxtremity/search-stack-v1", local_dir="./models/gcn_stage2")
```
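The contents of `download_models.py` are not shown in this card. As a hypothetical stand-in, the sketch below reads the two `.env` keys with a deliberately minimal parser (plain `KEY=VALUE` lines only, no quoting or `export` support) and passes them to `huggingface_hub.snapshot_download`. The fallback defaults simply repeat the values given above.

```python
from typing import Dict, Iterable, Tuple

DEFAULT_REPO = "rdxtremity/search-stack-v1"
DEFAULT_DIR = "./models/gcn_stage2"

def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal KEY=VALUE parser; skips blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def resolve(env: Dict[str, str]) -> Tuple[str, str]:
    """Pick the repo id and local dir, falling back to the documented defaults."""
    return env.get("GCN_HF_REPO", DEFAULT_REPO), env.get("GCN_ARTIFACTS_DIR", DEFAULT_DIR)

def download(env_path: str = ".env") -> str:
    """Snapshot-download the stack; returns the local folder path."""
    # Imported lazily so the parsing helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    try:
        with open(env_path, encoding="utf-8") as fh:
            env = parse_env(fh)
    except FileNotFoundError:
        env = {}
    repo_id, local_dir = resolve(env)
    return snapshot_download(repo_id, local_dir=local_dir)
```

Keeping parsing separate from the network call makes the `.env` handling testable offline.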
## Eval
Baseline nDCG (golden_v1, 346 queries / 19,923 graded pairs): **0.7695**
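The card does not state the cutoff k or the gain function behind the baseline number. The sketch below assumes nDCG@10 with exponential gain (2^rel − 1, which matches LightGBM's default `label_gain` for LambdaRank), averaged over golden queries, with unjudged products treated as relevance 0. The data shapes (`runs`, `golden`) are hypothetical.

```python
import math
from typing import Mapping, Sequence

Qrels = Mapping[str, float]  # product_id -> graded relevance for one query

def dcg(grades: Sequence[float], k: int) -> float:
    """DCG@k with exponential gain (2^rel - 1) and a log2(rank + 1) discount."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))

def ndcg_at_k(ranked: Sequence[str], qrels: Qrels, k: int = 10) -> float:
    """nDCG@k for one query; unjudged products count as relevance 0."""
    gains = [qrels.get(pid, 0.0) for pid in ranked]
    idcg = dcg(sorted(qrels.values(), reverse=True), k)
    return dcg(gains, k) / idcg if idcg > 0 else 0.0

def mean_ndcg(runs: Mapping[str, Sequence[str]],
              golden: Mapping[str, Qrels], k: int = 10) -> float:
    """Average nDCG@k over every golden query; a query with no run scores 0."""
    scores = [ndcg_at_k(runs.get(q, []), rels, k) for q, rels in golden.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

If the app's evaluator uses linear gain or a different cutoff, scores from this sketch will not match 0.7695 directly.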
Latest score with LTR + cross-encoder (CE) reranking: see the `Primary.GateRuns` audit log in the app.
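A minimal sketch of a staged LTR + CE rerank, assuming GCN retrieval produces a candidate list, the LightGBM LTR model reorders its head, and the cross-encoder rescoring covers only the top of that list. The stage depths (`ltr_depth`, `ce_depth`) and the scorer callables are hypothetical; the CE is modeled as an opaque callable because, per the layout notes, the app currently serves it from Modal.

```python
from typing import Callable, List, Sequence

# (query, product_id) -> relevance score. Hypothetical interface standing in
# for the LightGBM LTR model and the Modal-served cross-encoder.
Scorer = Callable[[str, str], float]

def rerank(query: str, candidates: Sequence[str], scorer: Scorer, depth: int) -> List[str]:
    """Re-sort the first `depth` candidates by `scorer`; the tail keeps its order."""
    head = sorted(candidates[:depth], key=lambda pid: scorer(query, pid), reverse=True)
    return head + list(candidates[depth:])

def cascade(query: str, retrieved: Sequence[str], ltr_score: Scorer, ce_score: Scorer,
            ltr_depth: int = 200, ce_depth: int = 50) -> List[str]:
    """GCN candidates -> LTR rerank of the top ltr_depth -> CE rerank of the top ce_depth."""
    by_ltr = rerank(query, retrieved, ltr_score, ltr_depth)
    return rerank(query, by_ltr, ce_score, ce_depth)
```

Keeping `ce_depth` small bounds the number of cross-encoder calls per query, which matters when each call is a remote request.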