| --- |
| license: apache-2.0 |
| datasets: |
| - thunlp/docred |
| --- |
| ## Evaluation Results |
|
|
| Evaluation was conducted on the **DocRED dev set**. |
|
|
| ### Dev Set Performance |
| - **Best Epoch**: 29 / 30 |
| - **Training Loss**: 0.0023 |
|
|
| ### Main Metrics |
| - **Micro F1**: **59.25%** |
| - Precision: **63.11%** |
| - Recall: **55.83%** |
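
As a quick sanity check, Micro F1 is the harmonic mean of micro precision and micro recall, so the three reported numbers should agree with each other. The sketch below is illustrative only; the helper name `f1_from_precision_recall` is an assumption and is not part of this repository.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs must use the same units (here: percentages).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported V2 dev-set precision and recall, in percent.
v2_f1 = f1_from_precision_recall(63.11, 55.83)
print(f"V2 Micro F1 recomputed: {v2_f1:.2f}%")
```

The recomputed value rounds to the reported 59.25%.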
|
|
| ### Interpretation |
| This V2 checkpoint achieves a **Micro F1 of 59.25%** on the DocRED dev set, with a **Precision of 63.11%** and a **Recall of 55.83%**. |
| Compared to V1 (Micro F1 60.71%), V2 is lower on all three metrics: Micro F1 by 1.46 points, precision by 2.23 points, and recall by 0.87 points. |
| Within V2, precision exceeds recall by about 7.3 points, which suggests that the model predicts conservatively: it avoids false positives at the cost of missing some true relations. |
|
|
| ### V1 vs V2 Comparison |
|
|
| | Metric | V1 (best_model_f1_56_64) | V2 (best_model_V2) | |
| |-----------|--------------------------|--------------------| |
| | Micro F1 | 60.71% | 59.25% | |
| | Precision | 65.34% | 63.11% | |
| | Recall | 56.70% | 55.83% | |
|
|
| ### Notes |
| - This model is designed for **document-level relation extraction** on the **DocRED** benchmark. |
| - V2 was trained as an ablation/comparison run against V1 to verify reproducibility and threshold sensitivity. |
| - Performance may vary depending on preprocessing details, threshold settings, and evaluation configuration. |
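
To illustrate why threshold settings matter, here is a minimal sketch of how micro precision, recall, and F1 shift as the decision threshold changes. It is not the evaluation code used for the numbers above: the function `micro_prf`, the fact layout `(doc_id, head, tail, relation)`, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical fact layout: (document id, head entity index, tail entity index, relation id).
Fact = Tuple[str, int, int, str]


def micro_prf(
    scored: Dict[Fact, float], gold: Set[Fact], threshold: float
) -> Tuple[float, float, float]:
    """Micro precision, recall, and F1 over all facts scoring >= threshold."""
    predicted = {fact for fact, score in scored.items() if score >= threshold}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, invented for this example only.
gold = {("doc1", 0, 1, "P17"), ("doc1", 2, 0, "P131"), ("doc2", 0, 3, "P27")}
scored = {
    ("doc1", 0, 1, "P17"): 0.92,   # correct, high confidence
    ("doc1", 2, 0, "P131"): 0.41,  # correct, low confidence
    ("doc1", 1, 2, "P50"): 0.55,   # incorrect
    ("doc2", 0, 3, "P27"): 0.78,   # correct
}

for threshold in (0.3, 0.5, 0.7):
    p, r, f1 = micro_prf(scored, gold, threshold)
    print(f"threshold={threshold:.1f}  P={p:.3f}  R={r:.3f}  F1={f1:.3f}")
```

In this toy setup, raising the threshold from 0.3 to 0.7 increases precision while lowering recall, which is the same trade-off described in the interpretation above.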
|
|
| ## License and Dataset Notice |
|
|
| ### Code / Model License |
| This project is built upon several open-source works: |
|
|
| - **HuggingFace Transformers / BERT** — Apache License 2.0 |
| - **ATLOP** — MIT License |
| - **GAIN** — MIT License |
| - **DREEAM** — based on the original paper and implementation references |
|
|
| ### Dataset Notice |
| This model is trained and evaluated on the **DocRED** dataset. |
| **DocRED is intended for research use.** Users should separately review the dataset's original terms and conditions before any redistribution or commercial use. |
|
|
| ### Intended Use |
| This repository is intended for: |
| - academic research |
| - experimentation on document-level relation extraction |
| - knowledge graph construction pipelines |
| - benchmark comparison and ablation studies |
|
|
| It is **not guaranteed to be suitable for production use** without additional validation. |