---
license: cc-by-nc-4.0
language:
- fr
base_model:
- google/mt5-small
pipeline_tag: text-generation
citation: |
  @inproceedings{moncla2026edda,
    title={EDDA-Coordinata: An Annotated Dataset of Historical Geographic Coordinates},
    author={Moncla, Ludovic and Nugues, Pierre and Joliveau, Thierry and McDonough, Katherine},
    booktitle={Proceedings of the 2026 Language Resources and Evaluation Conference (LREC 2026)},
    year={2026},
    url={https://arxiv.org/abs/2602.23941}
  }
---

# Model Card of `GEODE/mt5-small-coords-norm`
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) for extracting and normalizing geographic coordinates from text.

### Overview
- **Language model:** [google/mt5-small](https://huggingface.co/google/mt5-small)
- **Language:** French
- **Training data:**
- **Online Demo:** [https://huggingface.co/spaces/GEODE/edda-coordinates](https://huggingface.co/spaces/GEODE/edda-coordinates)
- **Repository:** []()

### Usage

```python
from transformers import pipeline

# Load the fine-tuned model as a text-to-text generation pipeline.
pipe = pipeline("text2text-generation", model="GEODE/mt5-small-coords-norm")

# The pipeline returns a list of dicts, each with a "generated_text" key.
result = pipe("* AACH ou ACH, s. f. petite ville d'Allemagne dans le cercle de Souabe, près de la source de l'Aach. Long. 26. 57. lat. 47. 55.")
print(result[0]["generated_text"])
```
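
The example entry gives its position in degrees and minutes (`Long. 26. 57. lat. 47. 55.`). To plot such values they usually need converting to decimal degrees. The following is a minimal sketch of that arithmetic and is not part of the model or its repository. `dms_to_decimal` is a hypothetical helper. It assumes the degrees and minutes have already been parsed into numbers, since the exact string format of the model output is not specified here, and it does not correct for whatever prime meridian the source text uses.

```python
def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   negative: bool = False) -> float:
    """Convert degrees/minutes/seconds to decimal degrees.

    Hypothetical helper for illustration. `negative` marks south latitudes
    or west longitudes.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if negative else value


# Values read by hand from the example entry: Long. 26. 57. lat. 47. 55.
longitude = dms_to_decimal(26, 57)
latitude = dms_to_decimal(47, 55)
print(round(longitude, 4), round(latitude, 4))
```

Keeping the hemisphere as an explicit flag, rather than a signed degree value, avoids losing the sign when the degree component is zero.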

## Evaluation

### 5-Fold Cross-Validation Results

| Metric           |    Score |
|:-----------------|---------:|
| Mean Exact Match |   0.8365 |
| Mean Char F1     |   0.9675 |
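
The card does not define the two metrics, so the sketch below shows one common reading, stated as an assumption: exact match compares whitespace-stripped strings, and character F1 is the harmonic mean of precision and recall over the multiset of characters shared by prediction and reference. The function names `exact_match` and `char_f1` and the sample strings are hypothetical, and the evaluation code behind the reported scores may differ.

```python
from collections import Counter
from statistics import mean


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1 over the multiset of shared characters (assumed definition)."""
    prediction, reference = prediction.strip(), reference.strip()
    if prediction == reference:
        return 1.0
    pred_chars = Counter(prediction)
    ref_chars = Counter(reference)
    # Multiset intersection: each character counts as often as it appears in both.
    overlap = sum((pred_chars & ref_chars).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_chars.values())
    recall = overlap / sum(ref_chars.values())
    return 2 * precision * recall / (precision + recall)


# Made-up (prediction, reference) pairs, for illustration only.
pairs = [("47 55", "47 55"), ("26 57", "26 51")]
print("Mean Exact Match:", mean(exact_match(p, r) for p, r in pairs))
print("Mean Char F1:", mean(char_f1(p, r) for p, r in pairs))
```

Under this reading a prediction with a single wrong digit scores 0 on exact match but stays close to 1 on character F1, which is consistent with the gap between the two reported means.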

## Training hyperparameters

The following hyperparameters were used during fine-tuning:
- dataset_path:
- dataset_name: edda-coordinata
- input_types: ['encyclopedic_text_entry', 'dms_coordinates']
- output_types: 'dms_coordinates'
- model: google/mt5-small
- max_length: 512
- max_length_output: 128
- epoch: 10
- batch: 8
- lr: 0.0005
- random_seed: 42
- gradient_accumulation_steps: 1
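
The reported scores come from 5-fold cross-validation, and the list above fixes `random_seed` to 42. As an illustration only, the sketch below shows how a seeded 5-fold split can be built with the standard library. `k_fold_indices` is a hypothetical helper and the dataset size of 20 is a placeholder, not the size of EDDA-Coordinata. Whether the seed was used for the split, and how the folds were actually built, is not described here.

```python
import random


def k_fold_indices(n_items: int, n_folds: int = 5, seed: int = 42):
    """Yield (train_indices, test_indices) for each fold of a shuffled split."""
    indices = list(range(n_items))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Placeholder dataset size; replace with the real number of examples.
for fold_number, (train, test) in enumerate(k_fold_indices(20), start=1):
    print(f"fold {fold_number}: {len(train)} train / {len(test)} test")
```

Each example lands in exactly one test fold, so averaging a metric over the five folds uses every example once for evaluation.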

## Citation

If you use the **EDDA-Coordinata** dataset or the associated models, please cite our LREC 2026 paper:

```bibtex
@inproceedings{moncla2026edda,
  title={EDDA-Coordinata: An Annotated Dataset of Historical Geographic Coordinates},
  author={Moncla, Ludovic and Nugues, Pierre and Joliveau, Thierry and McDonough, Katherine},
  booktitle={Proceedings of the 2026 Language Resources and Evaluation Conference (LREC 2026)},
  year={2026},
  url={https://arxiv.org/abs/2602.23941}
}
```