---
license: apache-2.0
datasets:
- thunlp/docred
---
## Evaluation Results
Evaluation was conducted on the **DocRED dev set**.
### Dev Set Performance
- **Best Epoch**: 29 / 30
- **Training Loss**: 0.0023
### Main Metrics
- **Micro F1**: **59.25%**
- Precision: **63.11%**
- Recall: **55.83%**
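
Micro-averaged scores pool every predicted and gold relation fact across all dev documents before dividing, so documents with many facts weigh more than documents with few. Below is a minimal sketch of that computation. It assumes each fact is a hypothetical `(doc_title, head_idx, tail_idx, relation_id)` tuple modeled on DocRED's `title` and `labels` (`h`, `t`, `r`) fields. It is not the official DocRED evaluation script, which also reports further variants such as Ign F1.

```python
from typing import Iterable, Tuple

# Hypothetical fact layout: (doc_title, head_entity_idx, tail_entity_idx, relation_id)
Fact = Tuple[str, int, int, str]


def micro_prf(predicted: Iterable[Fact], gold: Iterable[Fact]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over facts pooled across all documents."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)  # true positives, counted once per distinct fact
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {
    ("doc1", 0, 1, "P17"),
    ("doc1", 2, 0, "P131"),
    ("doc2", 0, 3, "P569"),
    ("doc2", 2, 0, "P27"),
}
pred = {
    ("doc1", 0, 1, "P17"),   # correct
    ("doc2", 0, 3, "P569"),  # correct
    ("doc2", 1, 3, "P570"),  # false positive
}
p, r, f1 = micro_prf(pred, gold)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")  # P=0.6667 R=0.5000 F1=0.5714
```

With 2 correct facts out of 3 predicted and 4 gold, precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.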
### Interpretation
This V2 checkpoint achieves a **Micro F1 of 59.25%** on the DocRED dev set, with a **Precision of 63.11%** and a **Recall of 55.83%**.
Compared with V1 (Micro F1 60.71%), V2 is 1.46 points lower in F1, with lower precision (by 2.23 points) and lower recall (by 0.87 points), so most of the gap comes from precision.
In both checkpoints precision exceeds recall (by 7.28 points for V2 and 8.64 points for V1), which suggests conservative predictions that reduce false positives at the cost of some recall.
### V1 vs V2 Comparison
| Metric | V1 (best_model_f1_56_64) | V2 (best_model_V2) |
|-----------|--------------------------|--------------------|
| Micro F1 | 60.71% | 59.25% |
| Precision | 65.34% | 63.11% |
| Recall | 56.70% | 55.83% |
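
Because micro F1 is the harmonic mean of micro precision and recall, each F1 value in the table can be recomputed from its row's precision and recall, and the per-metric gaps between the checkpoints follow directly. A short consistency check that uses only the numbers reported above:

```python
# Reported dev-set metrics (percent), copied from the comparison table.
reported = {
    "V1": {"precision": 65.34, "recall": 56.70, "f1": 60.71},
    "V2": {"precision": 63.11, "recall": 55.83, "f1": 59.25},
}


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


for name, m in reported.items():
    recomputed = harmonic_f1(m["precision"], m["recall"])
    # Rounded to two decimals, the recomputed value should match the reported F1.
    print(f"{name}: reported F1 {m['f1']:.2f}, recomputed {recomputed:.2f}")

for metric in ("precision", "recall", "f1"):
    delta = reported["V2"][metric] - reported["V1"][metric]
    print(f"{metric}: V2 - V1 = {delta:+.2f} points")
```

Both rows reproduce their reported F1 (60.71 and 59.25), and the deltas come out to -2.23 (precision), -0.87 (recall) and -1.46 (F1) points.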
### Notes
- This model is designed for **document-level relation extraction** on the **DocRED** benchmark.
- V2 was trained as an ablation/comparison run against V1 to verify reproducibility and threshold sensitivity.
- Performance may vary depending on preprocessing details, threshold settings, and evaluation configuration.
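
Threshold settings decide which candidate relations are emitted for each entity pair, so they directly shift the precision/recall balance. ATLOP, one of the works this project builds on, uses an adaptive threshold: a dedicated threshold class (class index 0) whose logit a relation's logit must exceed. The sketch below assumes that convention. The `margin` parameter is a hypothetical addition for illustration, and the function is not this repository's actual decoding code.

```python
import numpy as np


def adaptive_threshold_predict(logits: np.ndarray, margin: float = 0.0) -> np.ndarray:
    """Multi-label decoding against a per-pair threshold class (ATLOP-style).

    logits: array of shape (num_entity_pairs, num_classes); column 0 is assumed
        to be the learned threshold class.
    margin: hypothetical offset added to the threshold; a larger margin emits
        fewer relations (typically higher precision, lower recall).
    Returns a 0/1 array of the same shape. Column 0 is 1 exactly for pairs
    with no predicted relation.
    """
    threshold = logits[:, :1] + margin  # shape (N, 1), broadcast across classes
    preds = (logits > threshold).astype(np.int64)
    preds[:, 0] = preds[:, 1:].sum(axis=1) == 0  # "no relation" flag
    return preds


# Two entity pairs, three relation classes plus the threshold class in column 0.
logits = np.array([
    [0.0, 2.5, -1.0, 0.4],
    [1.0, 0.2, 0.8, -0.5],
])
for margin in (0.0, 0.5):
    preds = adaptive_threshold_predict(logits, margin)
    print(f"margin={margin}: {int(preds[:, 1:].sum())} relations predicted")
```

With `margin=0.0` the first pair gets two relations and the second none; raising `margin` to 0.5 drops the weaker relation of the first pair, which is the kind of recall-for-precision trade that threshold tuning controls.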
## License and Dataset Notice
### Code / Model License
This project is built upon several open-source works:
- **Hugging Face Transformers / BERT**: Apache License 2.0
- **ATLOP**: MIT License
- **GAIN**: MIT License
- **DREEAM**: based on the original paper and implementation references
### Dataset Notice
This model is trained and evaluated on the **DocRED** dataset.
**DocRED is intended for research use.** Users should separately review the dataset's original terms and conditions before any redistribution or commercial use.
### Intended Use
This repository is intended for:
- academic research
- experimentation on document-level relation extraction
- knowledge graph construction pipelines
- benchmark comparison and ablation studies
It is **not guaranteed to be suitable for production use** without additional validation.