PersianCorefUD-CorPipe
A Persian coreference resolution model fine-tuned from
CorPipe 25 with a google/mt5-large
encoder. This is the first coreference model for Persian literary text,
trained on the Mehr news corpus and the PersianCorefUD corpus
(The Little Prince / شازده کوچولو).
This model accompanies the paper:
Nassajian, M. (2025). PersianCorefUD: A Coreference Resolution Corpus for Persian Literary Text. Manuscript in preparation.
Model details
| Property | Value |
|---|---|
| Base model | ufal/corpipe25-corefud1.3-large-251101 |
| Encoder | google/mt5-large |
| Training data | Mehr corpus (320 docs) + Little Prince fold 1 (1,005 sentences) |
| Optimizer | AdaFactor, lr=2e-5, cosine decay |
| Epochs | 60 |
| Batch size | 8 |
| Sampling exponent | 0.7 |
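The sampling exponent controls how the training corpora are mixed. The sketch below assumes the common convention that each corpus is sampled with probability proportional to its size raised to the exponent; the function name, the corpus names, and the sizes are hypothetical, and the unit CorPipe uses to measure corpus size is not stated in this card.

```python
def mixing_probabilities(sizes, exponent=0.7):
    """Return a sampling probability per corpus, proportional to size ** exponent.

    Assumption: this mirrors the usual meaning of a sampling exponent;
    it is not taken from the CorPipe source code.
    """
    weights = {name: size ** exponent for name, size in sizes.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Illustrative sizes only, not the real corpus statistics.
sizes = {"large_corpus": 10_000, "small_corpus": 1_000}
probs = mixing_probabilities(sizes, exponent=0.7)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

An exponent below 1 gives the smaller corpus a larger share than its raw size would, while an exponent of 1 reduces to plain size-proportional sampling.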
Performance on PersianCorefUD (Little Prince test set)
| System | CoNLL F1 | Zero F1 |
|---|---|---|
| System 2: this model | 52.62% | 0.61% |
| System 6: this model + rule-based zero linker | 58.70% | 83.30%* |
*Zero F1 for System 6 is computed using gold zero-pronoun node positions.
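For readers unfamiliar with the metric, the sketch below assumes that "CoNLL F1" here means the standard CoNLL score, the unweighted mean of the MUC, B-cubed, and CEAF-e F1 values; the card does not spell this out. The component scores passed to the function are hypothetical, while the two system scores are the ones reported in the table.

```python
def conll_f1(muc_f1, b_cubed_f1, ceaf_e_f1):
    """Unweighted mean of the three component F1 scores (standard CoNLL score)."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Hypothetical component scores, for illustration only.
example = conll_f1(60.0, 50.0, 40.0)

# Scores reported in the table above.
system_2 = 52.62
system_6 = 58.70
gain = round(system_6 - system_2, 2)
print(f"Gain from the rule-based zero linker: {gain} points")
```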
How to use
Your input must be a CoNLL-U file with Universal Dependencies annotation. Produce it first with UDPipe 2 using the Persian-PerDT model.
Step 1: Get CorPipe 25
git clone https://github.com/ufal/corpipe
cd corpipe
pip install -r requirements.txt
Step 2: Download this model
from huggingface_hub import snapshot_download

# Download all model files into a local directory.
snapshot_download(
    repo_id="Mnsjn/PersianCorefUD-CorPipe",
    local_dir="persian_coref_model/",
)
Step 3: Parse your Persian text with UDPipe first
Go to https://lindat.mff.cuni.cz/services/udpipe/, choose model
persian-perdt, paste your text, download the CoNLL-U output.
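If you prefer a script to the web form, the same service also exposes a REST endpoint. The sketch below is an assumption-laden example, not part of this model's tooling: it assumes the `/api/process` endpoint accepts a URL-encoded POST with `model`, `tokenizer`, `tagger`, `parser`, and `data` fields, that the short name `persian-perdt` resolves to a current model, and that the response is JSON with a `result` field. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text, model="persian-perdt"):
    """Build a POST request asking UDPipe to tokenize, tag, and parse `text`."""
    params = {
        "model": model,
        "tokenizer": "",  # empty value: enable the step with default options
        "tagger": "",
        "parser": "",
        "data": text,
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(UDPIPE_URL, data=body)


def extract_conllu(response_body):
    """Pull the CoNLL-U text out of the JSON response (assumed 'result' field)."""
    return json.loads(response_body)["result"]


def parse_text(text, model="persian-perdt"):
    """Send `text` to the service and return CoNLL-U output (needs network access)."""
    with urllib.request.urlopen(build_request(text, model), timeout=60) as resp:
        return extract_conllu(resp.read().decode("utf-8"))
```

Calling `parse_text(...)` on your raw text and writing the returned string to `your_file.conllu` would give a file suitable for the prediction step, provided the assumptions above hold for the current service.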
Step 4: Run coreference prediction
python corpipe25.py \
    --load persian_coref_model/ \
    --test your_file.conllu \
    --out your_file_coref.conllu
The output is a CoNLL-U file with Entity=(cXX) annotations in
the MISC column following the CorefUD 1.0 format.
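To inspect the predictions, you can read the `Entity` attribute back out of the MISC column. The following is a minimal sketch using only the standard library; the function name and the sample tokens are made up for illustration, and the exact bracket strings in real output may carry more fields than shown here.

```python
def read_entity_annotations(conllu_text):
    """Return (token_id, form, entity_value) for tokens with an Entity attribute."""
    results = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        misc = cols[9]
        if misc == "_":
            continue
        for item in misc.split("|"):
            if item.startswith("Entity="):
                results.append((cols[0], cols[1], item[len("Entity="):]))
    return results


# Made-up sample: one single-token mention and one two-token mention.
SAMPLE = "\n".join([
    "# sent_id = 1",
    "\t".join(["1", "tok1", "_", "_", "_", "_", "0", "root", "_", "Entity=(c1)"]),
    "\t".join(["2", "tok2", "_", "_", "_", "_", "1", "dep", "_", "_"]),
    "\t".join(["3", "tok3", "_", "_", "_", "_", "1", "dep", "_", "Entity=(c2|SpaceAfter=No"]),
    "\t".join(["4", "tok4", "_", "_", "_", "_", "3", "dep", "_", "Entity=c2)"]),
])

for token_id, form, entity in read_entity_annotations(SAMPLE):
    print(token_id, form, entity)
```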
Input format
The model expects standard CoNLL-U files. Zero pronoun nodes
(empty nodes with decimal IDs like 4.1) must already be present
in the input if you want zero pronoun coreference to be predicted.
If you are working with raw text without pre-annotated zero pronouns,
the model will still predict coreference for overt mentions.
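A quick way to check whether a file already contains zero pronoun nodes is to count token IDs of the decimal form. This is a small sketch with a hypothetical function name and a made-up sample; it relies only on the CoNLL-U convention that empty nodes have IDs such as `4.1`.

```python
def count_empty_nodes(conllu_text):
    """Count CoNLL-U empty nodes, i.e. token lines whose ID looks like '4.1'."""
    count = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        token_id = line.split("\t", 1)[0]
        head, dot, tail = token_id.partition(".")
        if dot and head.isdigit() and tail.isdigit():
            count += 1
    return count


# Made-up sample: a regular token, an empty node, and a multiword-token range.
EMPTY_NODE_SAMPLE = "\n".join([
    "# sent_id = 1",
    "\t".join(["1", "tok1", "_", "_", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["1.1", "_", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["2-3", "tok23", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["2", "tok2", "_", "_", "_", "_", "1", "dep", "_", "_"]),
])

print(count_empty_nodes(EMPTY_NODE_SAMPLE))
```

A count of zero means only overt mentions can be linked for that file.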
Corpus
The PersianCorefUD corpus used to train and evaluate this model is available at: https://github.com/Mnsjn/PersianCorefUD
Citation