laurabernardy
/

RueBERTa




	---
	datasets:
	- laurabernardy/ES-10-Geoentities
	---
	XLM-RoBERTa-based model with a CRF layer for location and address span extraction.
	Trained on a weakly labeled dataset (part of Dataset 10 of the Epstein files). Geographical entities in the dataset were labeled with qwen3 70b, and BIO tags were added automatically.
	The model has not yet been tested in the wild.
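
The automatic BIO tagging mentioned above can be illustrated with a minimal sketch. This is not the labeling script used for the dataset: the function name `spans_to_bio`, the tag name `LOC`, and the whitespace tokenization are assumptions made for illustration only.

```python
def spans_to_bio(tokens, entity_spans, label="LOC"):
    """Convert token-index spans (start inclusive, end exclusive) to BIO tags.

    `label` is a hypothetical tag name; the dataset's real tag set may differ.
    """
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Whitespace tokenization is used here only to keep the example short.
tokens = "He flew from New York to Paris".split()
tags = spans_to_bio(tokens, [(3, 5), (6, 7)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In practice the tags would also have to be aligned with the subword tokens produced by the XLM-RoBERTa tokenizer, which this sketch leaves out.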

	Trained until: Epoch 6 \| Loss: 62.9988 (CRF loss) \| Token F1: 0.8452 \| Binary F1: 0.8170 \| Token Acc: 0.9842 \| Span Acc: 0.6357 \| Partial: 0.7419
	Token F1 - based on token matching
	Binary F1 - shows performance on geo entity extraction only
	Token Acc - based on token matching
	Span Acc - based on span (whole geo entity) matching
	Partial - based on span (whole geo entity) matching, where a span counts as matched with at least 50% correct overlap
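
The span-level metrics can be made concrete with a short sketch. The exact definitions behind the reported numbers are not given here, so this is only one plausible reading: Span Acc is taken as the share of gold spans reproduced exactly, and Partial as the share of gold spans covered by a predicted span for at least 50% of their tokens. The helper names `bio_to_spans` and `span_scores` are hypothetical.

```python
def bio_to_spans(tags):
    """Return (start, end) token spans, end exclusive, from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))    # close the previous span
            start = i                       # open a new span
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))    # span runs to the end of the sequence
    return spans


def span_scores(gold_tags, pred_tags, min_overlap=0.5):
    """Return (exact span accuracy, partial span accuracy) over the gold spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    if not gold:
        return 0.0, 0.0
    exact = sum(1 for g in gold if g in pred)
    partial = 0
    for gs, ge in gold:
        # Largest token overlap between this gold span and any predicted span.
        best = max((min(ge, pe) - max(gs, ps) for ps, pe in pred), default=0)
        if best / (ge - gs) >= min_overlap:
            partial += 1
    return exact / len(gold), partial / len(gold)


gold = ["O", "B-LOC", "I-LOC", "O", "B-LOC", "I-LOC", "I-LOC", "I-LOC"]
pred = ["O", "B-LOC", "I-LOC", "O", "B-LOC", "I-LOC", "O", "O"]
span_acc, partial_acc = span_scores(gold, pred)
print(span_acc, partial_acc)
```

Here the second gold entity is only half recovered, so it counts toward the partial score but not toward exact span accuracy, which is why the partial score can sit above the span accuracy.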

	---
	language:
	- en
	metrics:
	- f1
	- accuracy
	base_model:
	- FacebookAI/xlm-roberta-base
	pipeline_tag: token-classification
	tags:
	- geoparsing
	- location
	- ner
	- informationextraction
	---