Upload folder using huggingface_hub
- README.md +24 -2
- handler.py +12 -9
README.md CHANGED
@@ -16,7 +16,9 @@ model-index:
 - [Reranker model](#reranker-model)
 - [Brief information](#brief-information)
 - [Supporting architectures](#supporting-architectures)
-- [
+- [Example usage](#example-usage)
+- [HuggingFace Inference Endpoints](#huggingface-inference-endpoints)
+- [Local inference](#local-inference)
 
 
 
@@ -38,7 +40,27 @@ This repository contains reranker model ```bge-reranker-v2-m3``` which you can r
 - GPU (Nvidia T4)
 - Infernia 2 (2 cores, 32 Gb RAM)
 
-##
+## Example usage
+### HuggingFace Inference Endpoints
+
+```bash
+curl "https://xxxxxx.us-east-1.aws.endpoints.huggingface.cloud" \
+    -X POST \
+    -H "Accept: application/json" \
+    -H "Authorization: Bearer hf_yyyyyyy" \
+    -H "Content-Type: application/json" \
+    -d '{
+        "inputs":"",
+        "query": "Hello, world!",
+        "texts": [
+            "Hello! How are you?",
+            "Cats and dogs",
+            "The sky is blue"
+        ]
+    }'
+```
+
+### Local inference
 
 ```python
 from FlagEmbedding import FlagReranker
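For convenience, the curl request added in the hunk above can also be sent from Python. This is a minimal sketch using the `requests` library; the endpoint URL, token, and JSON body are copied verbatim from the placeholder curl example and would need to be replaced with real values.

```python
import requests

# Placeholder endpoint URL and token, copied from the curl example above.
API_URL = "https://xxxxxx.us-east-1.aws.endpoints.huggingface.cloud"
HEADERS = {
    "Accept": "application/json",
    "Authorization": "Bearer hf_yyyyyyy",
    "Content-Type": "application/json",
}

# Same JSON body as the curl example.
payload = {
    "inputs": "",
    "query": "Hello, world!",
    "texts": [
        "Hello! How are you?",
        "Cats and dogs",
        "The sky is blue",
    ],
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json())  # a reranker endpoint should return one relevance score per text
```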
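The `### Local inference` context in the hunk above is cut off right after the import line. For orientation, here is a minimal local-reranking sketch based on FlagEmbedding's documented `FlagReranker` API; the model ID, the `use_fp16` flag, and the example texts are illustrative and not part of this diff.

```python
from FlagEmbedding import FlagReranker

# Illustrative model ID and settings; not taken from this commit.
reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

query = "Hello, world!"
texts = ["Hello! How are you?", "Cats and dogs", "The sky is blue"]

# compute_score takes [query, passage] pairs and returns one score per pair;
# normalize=True squashes the raw logits to [0, 1] with a sigmoid.
scores = reranker.compute_score([[query, t] for t in texts], normalize=True)
print(scores)
```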
handler.py CHANGED
@@ -18,23 +18,26 @@ class EndpointHandler:
         """
         Expected input format:
         {
-            "
-
+            "inputs": {
+                "query": "Your query here",
+                "documents": ["Document 1", "Document 2", ...]
+            },
             "normalize": true  # Optional; defaults to False
         }
         """
-
-
+        inputs = data.get("inputs", {})
+        query = inputs.get("query")
+        documents = inputs.get("documents", [])
         normalize = data.get("normalize", False)
 
-        if not query or not
-            return [{"error": "Both 'query' and '
+        if not query or not documents:
+            return [{"error": "Both 'query' and 'documents' fields are required inside 'inputs'."}]
 
         # Prepare input pairs
-        pairs = [[query, text] for text in
+        pairs = [[query, text] for text in documents]
 
         # Tokenize input pairs
-
+        tokenizer_inputs = self.tokenizer(
             pairs,
             padding=True,
             truncation=True,
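To make the new input contract concrete, here is a small standalone sketch of a payload in the format documented in the updated docstring, followed by the same parsing, validation, and pair-building steps the handler now performs. The query and document strings are made up for illustration.

```python
# Example payload following the format documented in the updated handler.py docstring.
data = {
    "inputs": {
        "query": "What is the capital of France?",
        "documents": [
            "Paris is the capital of France.",
            "Berlin is the capital of Germany.",
        ],
    },
    "normalize": True,
}

# Same parsing steps as the new handler code above.
inputs = data.get("inputs", {})
query = inputs.get("query")
documents = inputs.get("documents", [])
normalize = data.get("normalize", False)

# The handler rejects the request if either field is missing.
assert query and documents

# One [query, document] pair per candidate text, ready to be tokenized as a batch.
pairs = [[query, text] for text in documents]
print(pairs)
# [['What is the capital of France?', 'Paris is the capital of France.'],
#  ['What is the capital of France?', 'Berlin is the capital of Germany.']]
```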
@@ -44,7 +47,7 @@ class EndpointHandler:
 
         with torch.no_grad():
             # Get model logits
-            outputs = self.model(**
+            outputs = self.model(**tokenizer_inputs)
             scores = outputs.logits.view(-1)
 
             # Apply sigmoid normalization if requested
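For reference, the scoring tail shown in this hunk can be read as the following self-contained helper. This is a sketch, not code from the commit: the function name is hypothetical, and returning a plain list of floats is an assumption about the response shape.

```python
import torch

def score_pairs(model, tokenizer_inputs, normalize=False):
    """Hypothetical helper mirroring the hunk above: run the reranker
    and optionally squash the logits to [0, 1]."""
    with torch.no_grad():
        outputs = model(**tokenizer_inputs)   # cross-encoder forward pass
        scores = outputs.logits.view(-1)      # one logit per (query, document) pair
        if normalize:
            scores = torch.sigmoid(scores)    # sigmoid normalization, as requested
    return scores.tolist()
```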