---
library_name: transformers
license: apache-2.0
datasets:
- ik-ram28/synthetic-NER-dataset
language:
- fr
base_model:
- Ihor/gliner-biomed-large-v1.0
---

# EvalLLM-GLiNER-Biomedical

## Model Description

This model is a fine-tuned version of [gliner-biomed-large-v1.0](https://huggingface.co/Ihor/gliner-biomed-large-v1.0) for French biomedical Named Entity Recognition (NER). It was developed as part of the EvalLLM 2025 challenge.

The model builds on GLiNER's zero-shot capabilities and is fine-tuned on synthetic biomedical data to identify 21 types of biomedical entities in French text.

## Model Details

### Base Model

- **Architecture**: GLiNER (Generalist and Lightweight Model for Named Entity Recognition)
- **Base Version**: gliner-biomed-large-v1.0
- **Language**: French
- **Domain**: Biomedical and health-related text

### Training Configuration

- **Training Epochs**: 3 (early stopping at 2.85 epochs)
- **Learning Rate**: 1e-5
- **Weight Decay**: 0.01
- **Scheduler**: Cosine with 10% warm-up
- **Batch Size**: 8
- **Training Data**: 1,748 synthetic documents
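
To make the schedule above concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warm-up, using the values listed in the training configuration. The step counts are assumptions rather than facts from this card: the sketch treats each of the 1,748 documents as one training example and assumes no gradient accumulation and a single device. The function `lr_at` is a hypothetical helper written for illustration, not the training code used for this model.

```python
import math

# Values stated in the training configuration.
BASE_LR = 1e-5
EPOCHS = 3
BATCH_SIZE = 8
NUM_DOCS = 1748
WARMUP_RATIO = 0.10

# Assumptions (not stated in the model card): one training example per
# document, no gradient accumulation, single device.
STEPS_PER_EPOCH = math.ceil(NUM_DOCS / BATCH_SIZE)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS
WARMUP_STEPS = int(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warm-up, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Approximate step at which early stopping (2.85 epochs) would occur.
STOP_STEP = round(2.85 * STEPS_PER_EPOCH)

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, STOP_STEP, TOTAL_STEPS):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at the end of warm-up and has decayed most of the way toward zero by the early-stopping point.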

## Entity Types (21 categories)

| Entity Type | French Label | Example |
|-------------|--------------|---------|
| `ABS_DATE` | Date absolue | "15 mars 2020" |
| `ABS_PERIOD` | Période absolue | "janvier 2019 à mars 2020" |
| `BIO_TOXIN` | Toxine biologique | "toxine botulique" |
| `DIS_REF_TO_PATH` | Référence maladie-pathogène | "infection par E. coli" |
| `DOC_AUTHOR` | Auteur de document | "Dr. Martin Dubois" |
| `DOC_DATE` | Date de document | "publié le 12/03/2021" |
| `DOC_SOURCE` | Source de document | "Journal of Medicine" |
| `EVENT_MACRO` | Événement macro | "épidémie de COVID-19" |
| `EVENT_MICRO` | Événement micro | "cas de contamination" |
| `EXPLOSIVE` | Explosif | "TNT", "dynamite" |
| `FUZZY_PERIOD` | Période floue | "début d'année", "récemment" |
| `INF_DISEASE` | Maladie infectieuse | "grippe", "tuberculose" |
| `LOCATION` | Localisation | "Paris", "France" |
| `LOC_REF_TO_ORG` | Référence lieu-organisation | "hôpital de Lyon" |
| `NON_INF_DISEASE` | Maladie non infectieuse | "diabète", "cancer" |
| `ORGANIZATION` | Organisation | "OMS", "Institut Pasteur" |
| `ORG_REF_TO_LOC` | Référence organisation-lieu | "OMS Europe" |
| `PATHOGEN` | Pathogène | "virus Ebola", "E. coli" |
| `PATH_REF_TO_DIS` | Référence pathogène-maladie | "virus causant la grippe" |
| `RADIOISOTOPE` | Radio-isotope | "uranium 235", "césium 137" |
| `REL_DATE` | Date relative | "hier", "la semaine dernière" |
| `REL_PERIOD` | Période relative | "depuis 3 mois" |
| `TOXIC_AGENT` | Agent toxique | "plomb", "mercure" |
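
The entity types in the table can be passed as labels to the `gliner` Python package at inference time. The following is a minimal sketch under stated assumptions: it assumes the model loads through `GLiNER.from_pretrained` like its base model, and that the entity codes (rather than the French labels) are the label strings the model expects, which this card does not specify. `extract_entities` and `group_by_label` are hypothetical helpers written for illustration, and the Hub repository id must be supplied by the caller.

```python
from collections import defaultdict

# A few label pairs copied from the entity table (code -> French label).
LABELS = {
    "INF_DISEASE": "Maladie infectieuse",
    "PATHOGEN": "Pathogène",
    "LOCATION": "Localisation",
    "ORGANIZATION": "Organisation",
    "ABS_DATE": "Date absolue",
}


def extract_entities(text, model_id, labels=None, threshold=0.5):
    """Predict entity spans with GLiNER (requires `pip install gliner`).

    Assumption: the entity codes are used as label strings; if the model was
    trained with the French labels instead, pass list(LABELS.values()).
    """
    from gliner import GLiNER  # imported lazily so the helper below works without it

    model = GLiNER.from_pretrained(model_id)
    return model.predict_entities(text, labels or list(LABELS), threshold=threshold)


def group_by_label(entities, min_score=0.0):
    """Group predicted spans by label, keeping (text, start, end) tuples."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent.get("score", 1.0) >= min_score:
            grouped[ent["label"]].append((ent["text"], ent["start"], ent["end"]))
    return dict(grouped)
```

A typical call would be `group_by_label(extract_entities(text, model_id))`, where `model_id` is this model's Hub repository id and `text` is a French sentence such as "L'Institut Pasteur a signalé des cas de grippe à Paris le 15 mars 2020." The `threshold` value of 0.5 is only a starting point and may need tuning for a given corpus.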

## Citation

```bibtex
```

## Related Resources

- **GitHub Repository**: [EvalLLM2025](https://github.com/ikram28/EvalLLM2025)
- **Paper**: [Link to paper when published]
- **Challenge**: [EvalLLM 2025](https://evalllm2025.sciencesconf.org/)

## License

This model is released under the Apache 2.0 License.

## Acknowledgments

- GLiNER team for the base architecture
- EvalLLM 2025 organizers