| --- |
| datasets: |
| - laurabernardy/ES-10-Geoentities |
| --- |
| XLM-RoBERTa-based model with a CRF layer for location and address span extraction. |
| Trained on a weakly labeled dataset (part of Dataset 10 of the Epstein files). Geographical entities in the dataset were labeled with Qwen3 70B, and BIO tags were added automatically. |
| Still needs testing in the wild. |
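
To illustrate the automatic BIO tagging mentioned above, here is a minimal sketch, not the script used to build the dataset. It assumes entities are given as token-index spans (start inclusive, end exclusive) and uses a hypothetical label name `LOC`; the dataset's real label names and tokenization may differ.

```python
from typing import List, Tuple

def spans_to_bio(tokens: List[str],
                 entity_spans: List[Tuple[int, int]],
                 label: str = "LOC") -> List[str]:
    """Turn token-index spans into BIO tags (label name is an assumption)."""
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:
        if start >= end:
            continue  # skip empty or inverted spans
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags

tokens = "The office is at 10 Downing Street in London".split()
tags = spans_to_bio(tokens, [(4, 7), (8, 9)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```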
| |
| Trained until: Epoch 6 | Loss: 62.9988 (CRF loss) | Token F1: 0.8452 | Binary F1: 0.8170 | Token Acc: 0.9842 | Span Acc: 0.6357 | Partial: 0.7419 |
| Token F1 - based on token matching |
| Binary F1 - shows performance of geo entity extraction only |
| Token Acc - based on token matching |
| Span Acc - based on span (whole geo entity) matching |
| Partial - based on span (whole geo entity) matching, where at least 50% of the span overlaps correctly |
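
The two span-level metrics can be sketched as follows. This is an illustrative reading of the definitions above, not the evaluation code behind the reported numbers. It assumes that overlap is measured as a fraction of the gold span's length and that scores are averaged over gold spans; the function names and the `B-LOC`/`I-LOC` tags are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive

def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B"):
            if start is not None:
                spans.append((start, i))   # close the previous entity
            start = i
        elif tag.startswith("I"):
            if start is None:
                start = i                  # tolerate an I tag without a B tag
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

def overlap(a: Span, b: Span) -> int:
    """Number of tokens shared by two spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def span_scores(gold_tags: List[str], pred_tags: List[str],
                min_overlap: float = 0.5) -> Tuple[float, float]:
    """Return (exact span accuracy, partial span accuracy) over gold spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    if not gold:
        return 0.0, 0.0
    exact = sum(1 for g in gold if g in pred)
    partial = sum(
        1 for g in gold
        if any(overlap(g, p) / (g[1] - g[0]) >= min_overlap for p in pred)
    )
    return exact / len(gold), partial / len(gold)

gold = ["O", "B-LOC", "I-LOC", "I-LOC", "O", "B-LOC", "O"]
pred = ["O", "B-LOC", "I-LOC", "O", "O", "B-LOC", "O"]
span_acc, partial_acc = span_scores(gold, pred)
print(span_acc, partial_acc)
```

In this example the first predicted entity misses one of three gold tokens, so it fails exact matching but still counts as a partial match.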
| |
| --- |
| language: |
| - en |
| metrics: |
| - f1 |
| - accuracy |
| base_model: |
| - FacebookAI/xlm-roberta-base |
| pipeline_tag: token-classification |
| tags: |
| - geoparsing |
| - location |
| - ner |
| - informationextraction |
| --- |