bltlab/queryner-bert-base-uncased
+---
+license: cc-by-4.0
+language:
+- en
+pipeline_tag: token-classification
+datasets:
+- bltlab/queryner
+base_model: google-bert/bert-base-uncased
+---
+# Model Card for queryner-bert-base-uncased
+E-commerce query segmentation model in English.
+## Model Details
+### Model Description
+This is a token classification model built on BERT base uncased.
+It is fine-tuned on the [QueryNER training dataset](https://huggingface.co/datasets/bltlab/queryner).
+- **Developed by:** [BLT Lab](https://github.com/bltlab) in collaboration with eBay.
+- **Funded by:** eBay
+- **Shared by:** [@cpalenmichel](https://github.com/cpalenmichel)
+- **Model type:** Token Classification / Sequence Labeling / Chunking
+- **Language(s) (NLP):** English
+- **License:** CC-BY 4.0
+- **Finetuned from model:** BERT base uncased
+### Model Sources
+The underlying model is [BERT base uncased](https://huggingface.co/google-bert/bert-base-uncased).
+- **Repository:** [https://github.com/bltlab/query-ner](https://github.com/bltlab/query-ner)
+- **Paper:** Accepted at LREC-COLING; coming soon.
+## Uses
+### Direct Use
+The model is intended for research and for e-commerce query segmentation.
+### Downstream Use
+Potential downstream use cases include weighting entity spans, linking spans to knowledge bases, and removing spans as a recovery strategy for null- and low-recall queries.
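As a rough illustration of the span-removal strategy, the sketch below groups BIO-tagged words into spans and rebuilds the query without spans of selected types. It assumes BIO labels and a list of `(word, label)` pairs; the helper names `group_spans` and `relax_query` and the entity type names in the usage example are hypothetical placeholders, not necessarily QueryNER's label set, and choosing which types to drop is left to the application.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format: (word, BIO label) pairs, one per query word.
Tagged = Sequence[Tuple[str, str]]


def group_spans(tagged: Tagged) -> List[Tuple[Optional[str], List[str]]]:
    """Group words into (entity type, words) chunks; O words get type None."""
    chunks: List[Tuple[Optional[str], List[str]]] = []
    for word, label in tagged:
        if label.startswith("I-") and chunks and chunks[-1][0] == label[2:]:
            # Continue the current entity span of the same type.
            chunks[-1][1].append(word)
        elif label == "O":
            chunks.append((None, [word]))
        else:
            # B-X, or an I-X that cannot continue a span (treated as B-X).
            chunks.append((label[2:], [word]))
    return chunks


def relax_query(tagged: Tagged, drop_types: Sequence[str]) -> str:
    """Rebuild the query text without the spans whose type is in drop_types."""
    kept = [
        " ".join(words)
        for ent_type, words in group_spans(tagged)
        if ent_type not in drop_types
    ]
    return " ".join(kept)
```

For example, with placeholder labels, `relax_query([("red", "B-color"), ("nike", "B-brand"), ("running", "B-product"), ("shoes", "I-product")], ["color"])` returns `"nike running shoes"`. Working on whole spans rather than single words keeps multi-word entities intact when they are kept and removes them completely when they are dropped.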
+### Out-of-Scope Use
+This model was trained only on the QueryNER training data. It may not perform well in other domains without additional training data and further fine-tuning.
+## Bias, Risks, and Limitations
+See the Limitations section of the paper.
+## How to Get Started with the Model
+See the Hugging Face tutorials on token classification; the model can be loaded with `AutoModelForTokenClassification`.
+Note that, unlike the Inference API, our post-processing uses only the tag predicted for each word's first subtoken.
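A minimal sketch of loading the model and applying first-subtoken post-processing is shown below. It is not the authors' code: it assumes the Hub ID `bltlab/queryner-bert-base-uncased` (taken from the repository name), that queries are split into words on whitespace, and a fast tokenizer, which is what provides `word_ids()`. The helpers `first_subtoken_labels` and `tag_query` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def first_subtoken_labels(
    word_ids: Sequence[Optional[int]], token_labels: Sequence[str]
) -> List[str]:
    """Keep only the label predicted for the first subtoken of each word."""
    labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] do not belong to any word.
            continue
        if word_id != previous:
            # First subtoken of a new word: keep its label.
            labels.append(label)
        # Later subtokens of the same word share its id and are ignored.
        previous = word_id
    return labels


def tag_query(
    query: str, model_id: str = "bltlab/queryner-bert-base-uncased"
) -> List[Tuple[str, str]]:
    """Tag a whitespace-split query with one label per word (hypothetical helper)."""
    # Imported here so first_subtoken_labels works without these packages installed.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)  # fast tokenizer by default
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    model.eval()

    words = query.split()  # assumption: whitespace word segmentation
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    predicted = [model.config.id2label[i] for i in logits.argmax(dim=-1)[0].tolist()]
    return list(zip(words, first_subtoken_labels(encoding.word_ids(), predicted)))
```

Calling `tag_query("some product query")` downloads the model on first use and returns `(word, label)` pairs. The Inference API instead aggregates predictions over all subtokens, so its output can differ from this per-word view.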
+## Training Details
+### Training Data
+See the paper for details.
+### Training Procedure
+See the paper for details.
+#### Training Hyperparameters
+See the paper for details.
+## Evaluation
+Evaluation details are provided in the paper.
+Scoring was done with [SeqScore](https://github.com/bltlab/seqscore), using the conlleval repair method for invalid label transition sequences.
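To make the conlleval repair method concrete, here is a minimal re-implementation sketch for BIO labels, written for illustration only; it is not SeqScore's code or API, and `repair_conlleval` is a hypothetical name. Under this method, an `I-` tag that cannot continue the preceding chunk is read as starting a new chunk of its own type.

```python
from typing import List, Optional, Sequence


def repair_conlleval(labels: Sequence[str]) -> List[str]:
    """Rewrite invalid BIO transitions the way conlleval interprets them.

    An I-X tag that does not continue a chunk of type X (it starts the
    sequence, follows O, or follows a different type) becomes B-X.
    """
    repaired: List[str] = []
    prev_type: Optional[str] = None  # entity type of the previous token; None for O
    for label in labels:
        if label.startswith("I-") and label[2:] != prev_type:
            label = "B-" + label[2:]
        repaired.append(label)
        prev_type = None if label == "O" else label[2:]
    return repaired
```

For example, `repair_conlleval(["I-a", "I-a", "O", "I-b"])` returns `["B-a", "I-a", "O", "B-b"]`. The alternative would be to discard invalid `I-` tags by setting them to `O`; the conlleval approach keeps them as entities, which matches how the original conlleval script counts chunks.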
+### Testing Data, Factors & Metrics
+#### Testing Data
+QueryNER test set: [https://huggingface.co/datasets/bltlab/queryner](https://huggingface.co/datasets/bltlab/queryner)
+#### Factors
+Results are reported as entity-level micro-F1 on the QueryNER test set.
+Invalid label transitions were repaired using the conlleval method.
+#### Metrics
+We use entity-level micro-F1, as is common practice for NER models.
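Entity-level micro-F1 can be sketched as follows: a predicted span counts as correct only if its type and boundaries exactly match a gold span, and counts are pooled over the whole test set before computing precision and recall. This is an illustrative re-implementation assuming BIO labels, not SeqScore's API; `entity_spans` and `micro_f1` are hypothetical names.

```python
from typing import Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, exclusive end, entity type)


def entity_spans(labels: Sequence[str]) -> Set[Span]:
    """Extract typed spans from BIO labels.

    An I-X that does not continue an X chunk starts a new chunk, matching
    the conlleval interpretation of invalid transitions.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    for i, label in enumerate(list(labels) + ["O"]):  # trailing O closes the last span
        continues = label.startswith("I-") and label[2:] == ent_type
        if start is not None and not continues:
            spans.add((start, i, ent_type))
            start, ent_type = None, None
        if label != "O" and not continues:
            start, ent_type = i, label[2:]
    return spans


def micro_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Entity-level micro-F1 over a list of label sequences (one per query)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of queries")
    tp = n_gold = n_pred = 0
    for gold_labels, pred_labels in zip(gold, pred):
        gold_spans, pred_spans = entity_spans(gold_labels), entity_spans(pred_labels)
        tp += len(gold_spans & pred_spans)  # exact type and boundary matches
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Pooling the counts before dividing is what makes this a micro average: frequent entity types weigh more than rare ones. A macro average would instead compute F1 per type and average the results.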
+### Results
+[More Information Needed]
+## Environmental Impact
+Rough estimates:
+- **Hardware Type:** 1 RTX 3090 GPU
+- **Hours used:** < 2 hours
+- **Cloud Provider:** Private
+- **Compute Region:** northamerica-northeast1
+- **Carbon Emitted:** 0.02 kg CO2eq
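A rough bound on the energy used can be worked out from the figures above, assuming (our assumption, not part of the original estimate) that the GPU draws its 350 W rated board power for the full 2 hours and ignoring CPU, memory, and cooling overhead. Here $P$ is power draw, $t$ is training time, $E$ is energy, $I$ is the grid's carbon intensity, and $C$ is the resulting emissions:

$$
E \le P \cdot t = 0.35\,\mathrm{kW} \times 2\,\mathrm{h} = 0.7\,\mathrm{kWh}, \qquad C = E \cdot I
$$

Read as kg CO2eq, a reported $C = 0.02$ with $E \le 0.7\,\mathrm{kWh}$ implies $I \ge 0.02 / 0.7 \approx 0.029$ kg CO2eq/kWh.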
+## Citation
+Accepted at LREC-COLING; citation coming soon.
+**BibTeX:**
+Coming soon.
+## Model Card Authors
+Chester Palen-Michel ([@cpalenmichel](https://github.com/cpalenmichel))
+## Model Card Contact
+Chester Palen-Michel ([@cpalenmichel](https://github.com/cpalenmichel))