ilyankou/spatial-classifier


ilyankou committed on Mar 5

Commit 2a68202 · verified · 1 Parent(s): 365a23c

Update README.md

Files changed (1)

README.md +15 -4


@@ -22,14 +22,20 @@ base_model: BAAI/bge-small-en-v1.5
 # Spatial Web Search Query Classifier
-A binary [SetFit](https://github.com/huggingface/setfit) classifier that distinguishes spatial from non-spatial web search queries. Trained on a gold-annotated sample of [MS MARCO](https://microsoft.github.io/msmarco/) and used to identify 104,288 spatial queries (10.3%) across the full 1.01M-query corpus.
+A binary [SetFit](https://github.com/huggingface/setfit) classifier that distinguishes spatial
+from non-spatial web search queries. Trained on a gold-annotated sample
+of [MS MARCO](https://microsoft.github.io/msmarco/) and used to identify 104,288 spatial
+queries (10.3%) across the full 1.01M-query corpus.
 **Accuracy / F1: 0.986** on a held-out balanced test set (76 negative, 72 positive).
 ## What counts as spatial?
-A query is spatial if its answer is geographically variant and requires reasoning about geographic primitives (location, distance, or direction) or topological relationships (adjacency, containment, or connectivity). This includes implicitly spatial queries such as costs and prices in a specific area — not just those containing a toponym.
+A query is spatial if its answer is geographically variant and requires reasoning
+about geographic primitives (location, distance, or direction) or topological
+relationships (adjacency, containment, or connectivity). This includes implicitly
+spatial queries such as costs and prices in a specific area, not just those containing a toponym.
 ## Model details
@@ -44,10 +50,15 @@ A query is spatial if its answer is geographically variant and requires reasonin
 from setfit import SetFitModel
 model = SetFitModel.from_pretrained("TODO")
-preds = model(["weather in erlanger ky", "what is symptom of bipolar disorder"])
+preds = model([
+  "weather in erlanger ky",
+  "what is symptom of bipolar disorder"
+])
 # => [1, 0]
 ```
 ## Training
-Weak labels were generated by running Llama 3.1 five times per query at temperature 0.2, then manually verified. The SetFit model was trained for one epoch with batch size 64 and learning rate 1e-5, then retrained on the full gold dataset for production inference.
+Weak labels were generated by running Llama 3.1 five times per query at temperature 0.2,
+then manually verified. The SetFit model was trained for one epoch with batch size 64
+and learning rate 1e-5, then retrained on the full gold dataset for production inference.
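
The Training paragraph says each query was labelled by five Llama 3.1 runs, but it does not say how the five outputs were combined before manual verification. One plausible reading is a majority vote, sketched below. The function name `aggregate_votes`, the `min_agreement` threshold, and the example votes are all hypothetical and are not taken from the repository.

```python
from collections import Counter
from typing import Optional, Sequence


def aggregate_votes(votes: Sequence[int], min_agreement: int = 3) -> Optional[int]:
    """Return the most common label if at least `min_agreement` runs agree.

    Returns None when there are no votes or no label reaches the threshold,
    which would mark the query as needing manual review first.
    Assumption: majority voting; the README does not state the actual rule.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical outputs of five runs per query (1 = spatial, 0 = non-spatial).
runs = {
    "weather in erlanger ky": [1, 1, 1, 1, 0],
    "what is symptom of bipolar disorder": [0, 0, 0, 0, 0],
}

weak_labels = {query: aggregate_votes(votes) for query, votes in runs.items()}
```

With five runs and two labels, a threshold of 3 always yields a label. Raising `min_agreement` to 4 or 5 would instead leave split decisions as `None`, so they could be prioritised during manual verification.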