PersianCorefUD-CorPipe

Fine-tuned Persian coreference resolution model based on CorPipe 25 with a google/mt5-large encoder. This is the first coreference model for Persian literary text, trained on the Mehr news corpus and the PersianCorefUD corpus (The Little Prince / شازده کوچولو).

This model accompanies the paper:

Nassajian, M. (2025). PersianCorefUD: A Coreference Resolution Corpus for Persian Literary Text. Manuscript in preparation.

Model details

Property	Value
Base model	`ufal/corpipe25-corefud1.3-large-251101`
Encoder	`google/mt5-large`
Training data	Mehr corpus (320 docs) + Little Prince fold 1 (1,005 sentences)
Optimizer	AdaFactor, lr=2e-5, cosine decay
Epochs	60
Batch size	8
Sampling exponent	0.7

Performance on PersianCorefUD (Little Prince test set)

System	CoNLL F1	Zero F1
System 2 — this model	52.62%	0.61%
System 6 — this model + rule-based zero linker	58.70%	83.30%*

*Zero F1 for System 6 uses gold zero pronoun node positions.

How to use

Your input must be a CoNLL-U file with Universal Dependencies annotation. Produce it first with UDPipe 2 using the Persian-PerDT model.

Step 1 — Get CorPipe 25

git clone https://github.com/ufal/corpipe
cd corpipe
pip install -r requirements.txt

Step 2 — Download this model

from huggingface_hub import snapshot_download
snapshot_download(
    "Mnsjn/PersianCorefUD-CorPipe",
    local_dir="persian_coref_model/"
)

Step 3 — Parse your Persian text with UDPipe first

Go to https://lindat.mff.cuni.cz/services/udpipe/, choose the persian-perdt model, paste your text, and download the CoNLL-U output.
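If you would rather script this step than use the web form, the same service has a REST endpoint. The following is a minimal sketch, not part of this repository: the function names `build_udpipe_request` and `parse_to_conllu` are made up for illustration, and it assumes the public `api/process` endpoint accepts the `persian-perdt` model name the same way the web form does.

```python
import json
import urllib.parse
import urllib.request

# Public UDPipe 2 REST endpoint hosted by LINDAT.
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_udpipe_request(text, model="persian-perdt"):
    """Build a POST request asking UDPipe to tokenize, tag and parse `text`.

    Empty values for tokenizer/tagger/parser switch those components on
    with their default options.
    """
    params = {
        "model": model,
        "tokenizer": "",
        "tagger": "",
        "parser": "",
        "data": text,
    }
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(UDPIPE_URL, data=body)


def parse_to_conllu(text, model="persian-perdt"):
    """Send `text` to the service and return the CoNLL-U output as a string.

    Requires network access; the service returns JSON whose "result"
    field holds the parsed document.
    """
    with urllib.request.urlopen(build_udpipe_request(text, model)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["result"]
```

Writing the string returned by `parse_to_conllu(...)` to `your_file.conllu` (UTF-8) should give an input file suitable for Step 4.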

Step 4 — Run coreference prediction

python corpipe25.py \
    --load persian_coref_model/ \
    --test your_file.conllu \
    --out your_file_coref.conllu

The output is a CoNLL-U file with Entity=(cXX) annotations in the MISC column following the CorefUD 1.0 format.
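To inspect the predictions, the Entity values can be pulled out of the MISC column with a few lines of plain Python. This is an illustrative sketch only: `entity_annotations` is a hypothetical helper, and it assumes the output is tab-separated CoNLL-U with ten columns, where MISC attributes are separated by `|`.

```python
def entity_annotations(conllu_text):
    """Return (token_id, form, entity_value) for every token line
    whose MISC column carries an Entity=... attribute."""
    rows = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        misc = cols[9]
        if misc == "_":
            continue
        for attr in misc.split("|"):
            if attr.startswith("Entity="):
                rows.append((cols[0], cols[1], attr[len("Entity="):]))
    return rows
```

For example, `entity_annotations(open("your_file_coref.conllu", encoding="utf-8").read())` lists every token that opens or closes a mention, together with the raw bracket string.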

Input format

The model expects standard CoNLL-U files. To get zero-pronoun coreference predictions, the zero pronoun nodes (empty nodes with decimal IDs such as 4.1) must already be present in the input. If your input was parsed from raw text and has no pre-annotated zero pronouns, the model still predicts coreference for overt mentions.
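A quick way to see which case applies to a given file is to count its empty nodes before running prediction. The sketch below is an assumption-labelled illustration: `count_empty_nodes` is a hypothetical helper that treats any token line whose ID has the form `N.M` as an empty node, following the CoNLL-U convention.

```python
import re

# CoNLL-U empty nodes have IDs like "4.1"; ordinary tokens are "4"
# and multiword token ranges are "4-5".
EMPTY_NODE_ID = re.compile(r"^\d+\.\d+$")


def count_empty_nodes(conllu_text):
    """Count the empty-node lines in a CoNLL-U document."""
    count = 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if EMPTY_NODE_ID.match(token_id):
            count += 1
    return count
```

A result of 0 means the file has no zero pronoun nodes, so only overt mentions can be linked.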

Corpus

The PersianCorefUD corpus used to train and evaluate this model is available at: https://github.com/Mnsjn/PersianCorefUD

Citation
