PersianCorefUD-CorPipe

A fine-tuned Persian coreference resolution model based on CorPipe 25 with a google/mt5-large encoder. It is the first coreference model for Persian literary text and was trained on the Mehr news corpus and the PersianCorefUD corpus (The Little Prince / شازده کوچولو).

This model accompanies the paper:

Nassajian, M. (2025). PersianCorefUD: A Coreference Resolution Corpus for Persian Literary Text. Manuscript in preparation.


Model details

| Property | Value |
|---|---|
| Base model | ufal/corpipe25-corefud1.3-large-251101 |
| Encoder | google/mt5-large |
| Training data | Mehr corpus (320 documents) + Little Prince fold 1 (1,005 sentences) |
| Optimizer | Adafactor, lr = 2e-5, cosine decay |
| Epochs | 60 |
| Batch size | 8 |
| Sampling exponent | 0.7 |
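The sampling exponent controls how often each training corpus is drawn when batches are built. The sketch below assumes the common scheme for mixing corpora of unequal size, in which corpus i with size n_i is sampled with probability proportional to n_i raised to the exponent; CorPipe's exact implementation may differ, and `sampling_probabilities` is a hypothetical helper.

```python
def sampling_probabilities(sizes, exponent=0.7):
    """Probability of drawing a training batch from each corpus.

    Assumed scheme: p_i = n_i**exponent / sum_j n_j**exponent.
    exponent=1 samples proportionally to size; exponent=0 samples uniformly.
    """
    weights = {name: n ** exponent for name, n in sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical sizes: the two real training sets are reported in different
# units (documents vs. sentences), so they are not plugged in here.
print(sampling_probabilities({"small": 100, "large": 400}))
```

With an exponent below 1, a small corpus is drawn more often than its share of the data alone would imply, which keeps it from being swamped by a larger one.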

Performance on PersianCorefUD (Little Prince test set)

| System | CoNLL F1 | Zero F1 |
|---|---|---|
| System 2: this model | 52.62% | 0.61% |
| System 6: this model + rule-based zero linker | 58.70% | 83.30%* |

*Zero F1 for System 6 is computed with gold zero-pronoun node positions, i.e. the empty nodes are given in the input rather than predicted.
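For reference, the CoNLL F1 score is conventionally the unweighted average of the MUC, B³ and CEAF_e F1 scores (the definition from the CoNLL-2011/2012 shared tasks, also reported by the CorefUD scorer), where each component F1 is the harmonic mean of that metric's precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
\text{CoNLL } F_1 = \frac{F_1^{\mathrm{MUC}} + F_1^{B^3} + F_1^{\mathrm{CEAF}_e}}{3}
```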


How to use

Your input must be a CoNLL-U file with Universal Dependencies annotation. Produce it first with UDPipe 2 using the Persian-PerDT model.

Step 1: Get CorPipe 25

git clone https://github.com/ufal/corpipe
cd corpipe
pip install -r requirements.txt

Step 2: Download this model

from huggingface_hub import snapshot_download

# Download every file of the model repository into persian_coref_model/
snapshot_download(
    repo_id="Mnsjn/PersianCorefUD-CorPipe",
    local_dir="persian_coref_model/",
)

Step 3: Parse your Persian text with UDPipe first

Open the UDPipe 2 web service at https://lindat.mff.cuni.cz/services/udpipe/, select the persian-perdt model, paste your text, and download the CoNLL-U output.
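The same parsing step can be scripted against the UDPipe 2 REST API instead of the web form. This is a sketch under stated assumptions: it uses the `/api/process` endpoint with its `model`, `tokenizer`, `tagger`, `parser` and `data` fields and expects a JSON response whose `result` field holds the CoNLL-U text. The short model name `persian-perdt` is assumed to resolve to the current PerDT model, and all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# UDPipe 2 REST endpoint on LINDAT.
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def udpipe_request_data(text, model="persian-perdt"):
    """Form fields asking UDPipe 2 to tokenize, tag and parse raw text."""
    return {
        "model": model,    # assumed to resolve to the latest Persian PerDT model
        "tokenizer": "",   # assumed: sending the field, even empty, enables the component
        "tagger": "",
        "parser": "",
        "data": text,
    }


def conllu_from_response(body):
    """Extract the CoNLL-U string from a UDPipe 2 JSON response body."""
    return json.loads(body)["result"]


def parse_with_udpipe(text, model="persian-perdt", timeout=60):
    """POST raw text to UDPipe 2 and return CoNLL-U (requires network access)."""
    payload = urllib.parse.urlencode(udpipe_request_data(text, model)).encode("utf-8")
    with urllib.request.urlopen(UDPIPE_URL, data=payload, timeout=timeout) as response:
        return conllu_from_response(response.read().decode("utf-8"))
```

The returned string can be written to `your_file.conllu` and passed to Step 4.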

Step 4: Run coreference prediction

python corpipe25.py \
    --load persian_coref_model/ \
    --test your_file.conllu \
    --out your_file_coref.conllu
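For a folder of documents, the Step 4 command can be run once per file from Python. A sketch, assuming `corpipe25.py` is in the current directory and accepts exactly the flags shown in Step 4; `corpipe_command` and `predict_all` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path


def corpipe_command(model_dir, input_path, output_path, script="corpipe25.py"):
    """Build the Step 4 command line for one input file (flags as in this README)."""
    return [sys.executable, script,
            "--load", str(model_dir),
            "--test", str(input_path),
            "--out", str(output_path)]


def predict_all(input_dir, model_dir="persian_coref_model/", dry_run=False):
    """Run prediction on every *.conllu file in input_dir, one process per file.

    Outputs get a "_coref" suffix (story.conllu -> story_coref.conllu).
    With dry_run=True the commands are only returned, not executed.
    """
    commands = []
    for path in sorted(Path(input_dir).glob("*.conllu")):
        if path.stem.endswith("_coref"):  # skip outputs of a previous run
            continue
        out = path.with_name(path.stem + "_coref.conllu")
        cmd = corpipe_command(model_dir, path, out)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if CorPipe exits with an error
    return commands
```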

The output is a CoNLL-U file with Entity=(cXX) annotations in the MISC column following the CorefUD 1.0 format.
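To inspect the predicted clusters, the bracket notation can be read back into mention strings. A minimal sketch, assuming the CorefUD bracket conventions: "(c1" opens a mention of entity c1, "c1)" closes it, and "(c1)" marks a single-node mention. `entity_mentions` is a hypothetical helper, not part of CorPipe, and it does not handle discontinuous mentions.

```python
import re
from collections import defaultdict

# One bracket event in an Entity value: "(c1", "c1)" or "(c1)".
BRACKET = re.compile(r"\(?[^()]+\)?")


def entity_mentions(conllu_text):
    """Group mention strings by entity ID from Entity= annotations in MISC.

    Simplifications (assumptions): only the text before the first '-' in a
    bracket is used as the entity ID, discontinuous mentions are not handled,
    and multiword-token range lines ("3-4") are skipped. Empty nodes (zero
    pronouns, IDs like "4.1") are kept as ordinary nodes.
    """
    forms = []                        # surface forms of all nodes, in file order
    open_starts = defaultdict(list)   # entity ID -> stack of mention start positions
    clusters = defaultdict(list)      # entity ID -> list of mention strings
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0]:
            continue
        position = len(forms)
        forms.append(cols[1])
        for attr in cols[9].split("|"):
            if not attr.startswith("Entity="):
                continue
            for event in BRACKET.findall(attr[len("Entity="):]):
                eid = event.strip("()").split("-")[0]
                if event.startswith("("):
                    open_starts[eid].append(position)
                if event.endswith(")"):
                    start = open_starts[eid].pop()
                    clusters[eid].append(" ".join(forms[start:position + 1]))
    return dict(clusters)
```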


Input format

The model expects standard CoNLL-U files. Zero pronoun nodes (empty nodes with decimal IDs like 4.1) must already be present in the input if you want zero pronoun coreference to be predicted. If you are working with raw text without pre-annotated zero pronouns, the model will still predict coreference for overt mentions.
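Before running prediction, it can be useful to check whether a file actually contains zero-pronoun nodes. A minimal sketch with a hypothetical helper `zero_nodes`, which treats every node whose ID contains a decimal point as an empty node, following the CoNLL-U convention.

```python
def zero_nodes(conllu_text):
    """Return (sent_id, node_id, form) for every empty node in a CoNLL-U text.

    Empty nodes have decimal IDs such as "4.1". An empty result means the file
    has no zero-pronoun nodes, so only overt mentions can be resolved.
    """
    found = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if "." in cols[0]:          # decimal ID -> empty node
                found.append((sent_id, cols[0], cols[1]))
    return found
```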


Corpus

The PersianCorefUD corpus used to train and evaluate this model is available at: https://github.com/Mnsjn/PersianCorefUD


Citation

If you use this model, please cite the accompanying paper listed above (Nassajian, 2025).
