---

license: apache-2.0
tags:
  - retrieval
  - search
  - lightgbm
  - cross-encoder
  - bge-m3
language:
  - en
  - ar
---


# Harrir Search Stack v1

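Two-tower asymmetric GCN search for fashion retail (Harrir catalog, ~28k SKUs),
multilingual (English + Arabic). Three artifacts, version-locked together: the
BGE-M3 retriever (LoRA adapter, tokenizer, GCN projection heads), the LightGBM
LTR reranker, and the cross-encoder.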
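The two-tower asymmetric idea can be sketched as follows. This is a minimal illustration, not the stack's actual code. It assumes that the `W_q` and `W_p` heads in `gcn_head.pt` are plain linear projections applied to shared BGE-M3 embeddings, and that retrieval scores by cosine similarity. The GCN propagation step and the `W_img` image head are omitted. Random matrices stand in for the real weights, and `score` is a hypothetical helper.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def score(query_emb: np.ndarray, passage_embs: np.ndarray,
          W_q: np.ndarray, W_p: np.ndarray) -> np.ndarray:
    """Asymmetric two-tower scoring (hypothetical helper).

    Queries and passages share a base encoder but go through different
    projection heads. Weights use the (out, in) layout of torch.nn.Linear.
    """
    q = l2_normalize(query_emb @ W_q.T)        # (d_out,)
    p = l2_normalize(passage_embs @ W_p.T)     # (n, d_out)
    return p @ q                               # (n,) cosine scores


# Random stand-ins for real BGE-M3 embeddings and trained heads.
rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W_q = rng.standard_normal((d_out, d_in))
W_p = rng.standard_normal((d_out, d_in))
query = rng.standard_normal(d_in)
passages = rng.standard_normal((5, d_in))

scores = score(query, passages, W_q, W_p)
ranking = np.argsort(-scores)  # passage indices, best first
```

Separate heads let short queries and longer product texts map into the shared space differently. A symmetric model would use the same matrix for both.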

## Layout

```
.
├── adapter/                      # LoRA adapter on BGE-M3 (LIVE)
├── tokenizer/                    # BGE-M3 tokenizer (LIVE)
├── gcn_head.pt                   # W_q / W_p / W_img projection heads (LIVE)
├── ltr_model.txt                 # LightGBM LambdaRank champion (LIVE)
├── ltr_idf.json                  # BM25/TF-IDF stats over catalog
├── ltr_subcat_*.{json,pt}        # subcategory embeddings (EN+AR)
├── ltr_spec_*.{json,pt}          # product spec embeddings
└── cross_encoder/                # BAAI/bge-reranker-base, fine-tuned
                                  # NOT loaded by the app today (served from
                                  # Modal). Kept here for future bake-in.
```
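A startup sanity check against the layout above might look like the sketch below. `missing_artifacts` is a hypothetical helper, not part of the Harrir app. The expected names come straight from the tree. `cross_encoder/` is treated as optional because the app does not load it today.

```python
from __future__ import annotations

from pathlib import Path

# Artifacts the app loads, taken from the layout above. Glob patterns cover
# the ltr_subcat_* / ltr_spec_* families. cross_encoder/ is deliberately
# absent because the app does not load it today.
REQUIRED = [
    "adapter",
    "tokenizer",
    "gcn_head.pt",
    "ltr_model.txt",
    "ltr_idf.json",
    "ltr_subcat_*",
    "ltr_spec_*",
]


def missing_artifacts(root: str | Path) -> list[str]:
    """Return the REQUIRED entries with no match under root (hypothetical helper)."""
    root = Path(root)
    # Path.glob handles both literal names and wildcard patterns.
    return [pattern for pattern in REQUIRED if not any(root.glob(pattern))]
```

Failing fast on a partial download is cheaper than discovering a missing LTR file mid-request.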

## Usage

The Harrir app loads these artifacts at startup. Point it at this repo in `.env`:

```
GCN_HF_REPO=rdxtremity/search-stack-v1
GCN_ARTIFACTS_DIR=./models/gcn_stage2
```

Then run `python download_models.py`, which snapshot-downloads the repo into `./models/gcn_stage2/`.

Or manually:

```python
from huggingface_hub import snapshot_download

snapshot_download("rdxtremity/search-stack-v1", local_dir="./models/gcn_stage2")
```
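The contents of `download_models.py` are not shown here. The sketch below is one guess at what such a script could do. It assumes the script reads only `GCN_HF_REPO` and `GCN_ARTIFACTS_DIR` and falls back to the values from the `.env` example. It optionally uses `snapshot_download`'s real `ignore_patterns` parameter to skip `cross_encoder/`, which the app does not load. The function names are hypothetical. The downloader is injectable so the settings logic can be checked offline.

```python
import os
from typing import Callable, Mapping, Optional, Tuple

# Fallbacks mirror the .env example above (assumption: the real script may differ).
DEFAULT_REPO = "rdxtremity/search-stack-v1"
DEFAULT_DIR = "./models/gcn_stage2"


def resolve_settings(env: Mapping[str, str]) -> Tuple[str, str]:
    # Use the .env values when set, otherwise the documented ones.
    return (env.get("GCN_HF_REPO", DEFAULT_REPO),
            env.get("GCN_ARTIFACTS_DIR", DEFAULT_DIR))


def download(env: Mapping[str, str] = os.environ,
             downloader: Optional[Callable[..., str]] = None,
             skip_cross_encoder: bool = True) -> str:
    """Snapshot-download the search stack (hypothetical helper)."""
    repo, local_dir = resolve_settings(env)
    if downloader is None:
        # Imported lazily so resolve_settings works without huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    # cross_encoder/ is served from Modal today, so it can be skipped.
    ignore = ["cross_encoder/*"] if skip_cross_encoder else None
    return downloader(repo_id=repo, local_dir=local_dir, ignore_patterns=ignore)
```

Call `download()` from a script entry point. Pass `skip_cross_encoder=False` once the cross-encoder is baked into the app.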

## Eval

Baseline nDCG (golden_v1, 346 queries / 19,923 graded pairs): **0.7695**

Latest scores with LTR + cross-encoder (CE) reranking: see the `Primary.GateRuns` audit log in the app.
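For reference, nDCG over graded pairs like those in golden_v1 can be computed as in the sketch below. The card does not state the cutoff `k` or the gain function behind 0.7695. This sketch assumes exponential gain `2^rel - 1`, a `log2(rank + 1)` discount, and `k = 10`. It does not reproduce the reported number. `ndcg_at_k` and `mean_ndcg` are hypothetical helpers.

```python
import math
from typing import Dict, List, Mapping, Sequence


def dcg(grades: Sequence[float]) -> float:
    # Exponential gain (2^rel - 1), discounted by log2(rank + 1) with rank from 1.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg_at_k(ranked_ids: Sequence[str], grades: Mapping[str, float], k: int = 10) -> float:
    # Grades in the system's order; unjudged documents count as 0.
    actual = [grades.get(doc, 0.0) for doc in ranked_ids[:k]]
    # Ideal order: all judged grades sorted best-first.
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(actual) / idcg if idcg > 0 else 0.0


def mean_ndcg(runs: Mapping[str, List[str]],
              qrels: Mapping[str, Dict[str, float]], k: int = 10) -> float:
    # Average over every judged query; a query with no run scores 0.
    scores = [ndcg_at_k(runs.get(q, []), qrels[q], k) for q in qrels]
    return sum(scores) / len(scores)
```

Averaging over all judged queries, including those the system returned nothing for, keeps the metric from rewarding silent failures.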