Upload folder using huggingface_hub
- README.md +42 -0
- config.json +10 -0
- product_catalog.parquet +3 -0
- product_embeddings.npy +3 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +7 -0
- tokenizer_config.json +58 -0
- vocab.txt +0 -0
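
The commit title indicates the folder was pushed with the `huggingface_hub` client. A minimal sketch of how such an upload is typically performed; the local folder path and `repo_id` below are placeholders, not values taken from this commit:

```python
# Sketch: pushing a local model folder to the Hub with huggingface_hub.
# folder_path and repo_id are placeholders for illustration only.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` by default
api.upload_folder(
    folder_path="./semantic_product_search",          # local directory containing the files listed above
    repo_id="your-username/semantic-product-search",  # hypothetical target repository
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```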
README.md
ADDED
@@ -0,0 +1,42 @@
+---
+license: mit
+tags:
+- product-search
+- semantic-search
+- bert
+- pytorch
+- information-retrieval
+---
+
+# Semantic Product Search Model
+
+This model performs semantic product search using BERT embeddings and a dual-encoder neural network architecture.
+
+## Model Architecture
+
+- **Base Model**: BERT-base-uncased for text embeddings
+- **Encoder**: Dual-encoder architecture with separate query and product encoders
+- **Similarity Network**: Multi-layer perceptron for relevance scoring
+- **Input Dimension**: 768 (BERT embedding size)
+- **Hidden Dimensions**: [512, 256, 128]
+- **Dropout**: 0.3
+
+## Usage
+
+See the `load_and_run_frontend.py` script for loading and using this model.
+
+## Files
+
+- `pytorch_model.bin`: Model weights
+- `config.json`: Model configuration
+- Tokenizer files: BERT tokenizer files (`vocab.txt`, `tokenizer_config.json`, `special_tokens_map.json`)
+- `product_catalog.parquet`: Product catalog for search
+- `product_embeddings.npy`: Precomputed product embeddings (optional)
+
+## Performance
+
+Trained on the Amazon Shopping Queries Dataset with the following metrics:
+- NDCG@10: ~0.54
+- MAP: ~0.54
+- Precision@10: ~0.50
+- Recall@10: ~0.54
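
The card describes the architecture only in prose, and the training/inference code is not part of this commit. Below is a minimal, hypothetical sketch of a dual encoder with an MLP relevance head using the stated dimensions (768-d inputs, hidden sizes [512, 256, 128], dropout 0.3); class and layer names are assumptions, not the repository's actual code:

```python
# Hypothetical sketch of the dual-encoder + MLP scorer described in the model card.
# Layer names and the exact wiring are assumptions; only the dimensions come from the card.
import torch
import torch.nn as nn


class DualEncoderScorer(nn.Module):
    """Scores (query, product) pairs given 768-d BERT embeddings."""

    def __init__(self, input_dim=768, hidden_dims=(512, 256, 128), dropout=0.3):
        super().__init__()
        # Separate projection heads for queries and products (the "dual encoder").
        self.query_encoder = nn.Sequential(nn.Linear(input_dim, hidden_dims[0]), nn.ReLU())
        self.product_encoder = nn.Sequential(nn.Linear(input_dim, hidden_dims[0]), nn.ReLU())
        # MLP over the concatenated encodings produces a single relevance score.
        layers, dim = [], 2 * hidden_dims[0]
        for h in hidden_dims[1:]:
            layers += [nn.Linear(dim, h), nn.ReLU(), nn.Dropout(dropout)]
            dim = h
        layers.append(nn.Linear(dim, 1))
        self.similarity = nn.Sequential(*layers)

    def forward(self, query_emb: torch.Tensor, product_emb: torch.Tensor) -> torch.Tensor:
        q = self.query_encoder(query_emb)
        p = self.product_encoder(product_emb)
        return self.similarity(torch.cat([q, p], dim=-1)).squeeze(-1)
```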
config.json
ADDED
@@ -0,0 +1,10 @@
+{
+    "input_dim": 768,
+    "hidden_dims": [
+        512,
+        256,
+        128
+    ],
+    "dropout": 0.3,
+    "bert_model_name": "bert-base-uncased"
+}
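
`config.json` carries the hyperparameters needed to rebuild the scoring network before loading `pytorch_model.bin`. A hedged loading sketch; the `repo_id` is a placeholder and `DualEncoderScorer` is the hypothetical class from the sketch above, so the state-dict keys are only assumed to line up:

```python
# Sketch: rebuild the network from config.json and load the committed weights.
# repo_id is a placeholder; DualEncoderScorer is the hypothetical class sketched earlier.
import json
import torch
from huggingface_hub import hf_hub_download

repo_id = "your-username/semantic-product-search"  # placeholder
config_path = hf_hub_download(repo_id, "config.json")
weights_path = hf_hub_download(repo_id, "pytorch_model.bin")

with open(config_path) as f:
    cfg = json.load(f)

model = DualEncoderScorer(
    input_dim=cfg["input_dim"],
    hidden_dims=cfg["hidden_dims"],
    dropout=cfg["dropout"],
)
model.load_state_dict(torch.load(weights_path, map_location="cpu"))  # assumes matching key names
model.eval()
```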
product_catalog.parquet
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3ce37ebb78525d520e5f8adf8bcc0160853d856b78938163112c1b2fb1021be4
+size 57918185
product_embeddings.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7950b8d596897d2837c3076acb9df14c09966a26a28a57808481afae8dff4ccc
+size 237809792
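
`product_catalog.parquet` and `product_embeddings.npy` are stored as Git LFS pointers; the actual catalog and embedding matrix are fetched on download. A hedged sketch of a cosine-similarity lookup over the precomputed embeddings; the one-embedding-row-per-catalog-row layout and any column access are assumptions about files this commit only references by hash:

```python
# Sketch: cosine-similarity search over the precomputed product embeddings.
# Assumes row i of the embedding matrix corresponds to row i of the catalog.
import numpy as np
import pandas as pd
from huggingface_hub import hf_hub_download

repo_id = "your-username/semantic-product-search"  # placeholder
catalog = pd.read_parquet(hf_hub_download(repo_id, "product_catalog.parquet"))
product_embs = np.load(hf_hub_download(repo_id, "product_embeddings.npy"))

def search(query_emb: np.ndarray, top_k: int = 10) -> pd.DataFrame:
    """Return the top_k catalog rows ranked by cosine similarity to query_emb."""
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ q
    idx = np.argsort(-scores)[:top_k]
    return catalog.iloc[idx].assign(score=scores[idx])
```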
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:742ee9479ab452c9b9bf78c0ffabc6d4c751b5dfa15b7cbd8657cfec17954cf3
+size 4866020
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}
tokenizer_config.json
ADDED
@@ -0,0 +1,58 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
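
The tokenizer files form a standard `BertTokenizer` configuration for `bert-base-uncased`, so they load directly with `transformers`. A hedged sketch of turning a query string into a 768-d embedding; using the [CLS] vector as the query representation is an assumption about the pipeline, not something stated in the card:

```python
# Sketch: load the committed tokenizer files and embed a query with bert-base-uncased.
# Taking the [CLS] vector as the 768-d query embedding is an assumption.
import torch
from transformers import BertModel, BertTokenizer

repo_id = "your-username/semantic-product-search"  # placeholder
tokenizer = BertTokenizer.from_pretrained(repo_id)  # reads vocab.txt, tokenizer_config.json, special_tokens_map.json
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

with torch.no_grad():
    enc = tokenizer("wireless noise cancelling headphones",
                    return_tensors="pt", truncation=True, max_length=512)
    query_emb = bert(**enc).last_hidden_state[:, 0]  # shape: (1, 768)
```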
vocab.txt
ADDED
The diff for this file is too large to render.