Parsan: Hebrew morphosyntax models

Trained weights for Parsan, a single joint model for Hebrew morphosyntax: raw text in, Universal Dependencies CoNLL-U out (segmentation, POS, morphological features, lemmas, dependency parse).

Try it in your browser: noamor/parsan-demo. Paste Hebrew text and get a dependency tree + CoNLL-U.

This repository holds three checkpoints, each in its own folder:

| folder | what it is | size |
|---|---|---|
| joint_base | DictaBERT joint tagger + parser + lemma (best accuracy) | ~749 MB |
| joint_tiny2 | DictaBERT-tiny variant (~3x faster) | ~185 MB |
| seg_char_ctx | character segmenter on dictabert-char | ~353 MB |
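
If only some of the checkpoints are needed, the download can be narrowed to their folders. The following is a minimal sketch, assuming the folder names in the table are top-level directories of the repository. `patterns_for`, `approx_size_mb`, and `fetch` are hypothetical helpers, not part of Parsan; `allow_patterns` is a standard `snapshot_download` argument. Which folders a given profile needs is not stated here, so the selection below is only an example.

```python
# Folder names and approximate sizes (MB) as quoted in the table above.
CHECKPOINTS_MB = {"joint_base": 749, "joint_tiny2": 185, "seg_char_ctx": 353}


def patterns_for(folders):
    """Build glob patterns that match everything inside the given folders."""
    unknown = [f for f in folders if f not in CHECKPOINTS_MB]
    if unknown:
        raise ValueError(f"unknown checkpoint folder(s): {unknown}")
    return [f"{f}/*" for f in folders]


def approx_size_mb(folders):
    """Rough download size, summed from the table."""
    return sum(CHECKPOINTS_MB[f] for f in folders)


def fetch(folders, local_dir="runs"):
    """Download only the selected folders (hypothetical helper; needs network)."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        "noamor/parsan",
        allow_patterns=patterns_for(folders),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    wanted = ["joint_tiny2", "seg_char_ctx"]  # example selection only
    print(patterns_for(wanted), f"~{approx_size_mb(wanted)} MB")
    # fetch(wanted)  # uncomment to actually download
```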

Use

Install the library, download the weights, and point PARSAN_RUNS at them (the library also fetches them automatically on first use):

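# Python: download the weights into ./runs
from huggingface_hub import snapshot_download
snapshot_download("noamor/parsan", local_dir="runs")

# Shell: point PARSAN_RUNS at the download and run the predictor
PARSAN_RUNS=$PWD/runs python scripts/predict.py \
    --text input.txt --sent newline --profile base --out out.conllu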

Options: --profile takes base or tiny; --segmenter takes char or rftok.
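
The output file is CoNLL-U, in which each token line has ten tab-separated columns and sentences are separated by blank lines. The following is a minimal sketch for reading such a file back using only the standard library. `read_conllu` and `syntactic_words` are hypothetical helpers rather than part of Parsan, and the sample sentence is invented for illustration, not real model output.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines):
    """Yield sentences as lists of token dicts.

    Comment lines (starting with '#') are skipped. Multiword-token ranges
    (IDs like "1-2") and empty nodes (IDs like "1.1") are kept as rows.
    """
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():          # blank line closes a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:                      # file may not end with a blank line
        yield sentence


def syntactic_words(sentence):
    """Keep only rows with a plain integer ID (drops ranges and empty nodes)."""
    return [tok for tok in sentence if tok["id"].isdigit()]


# Invented sample: one surface token split into a preposition and a noun.
_ROWS = [
    ["1-2", "בבית", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["1", "ב", "ב", "ADP", "_", "_", "2", "case", "_", "_"],
    ["2", "בית", "בית", "NOUN", "_", "Gender=Masc|Number=Sing", "0", "root", "_", "_"],
]
SAMPLE = "# text = בבית\n" + "\n".join("\t".join(r) for r in _ROWS) + "\n\n"

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE.splitlines()):
        for tok in syntactic_words(sent):
            print(tok["id"], tok["upos"], tok["head"], tok["deprel"])
```

To read a real file, pass an open file handle instead of `SAMPLE.splitlines()`, for example `read_conllu(open("out.conllu", encoding="utf-8"))`.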

Results

LAS (F1 x 100), measured end to end from raw text against IAHLT gold annotations. OOD is the micro-average over five held-out genres.

| system | wiki | knesset | OOD |
|---|---|---|---|
| Parsan | 92.2 | 88.6 | 89.2 |
| HebPipe | 89.7 | 86.2 | 86.1 |
| Stanza | 83.1 | 80.7 | 80.5 |
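
When parsing starts from raw text, the predicted segmentation can differ from gold, so LAS is reported as an F1 score over words rather than as plain accuracy. The sketch below shows that arithmetic and the micro-average. It assumes the counts of correct arcs have already been obtained by aligning system and gold words, which is the evaluation script's job and is not shown. The function names and the counts are illustrative only and are not taken from the Parsan evaluation.

```python
def las_f1(correct, n_system, n_gold):
    """LAS F1 x 100 from counts.

    correct  : aligned words whose head and relation label both match gold
    n_system : words in the system output
    n_gold   : words in the gold annotation
    """
    if correct == 0 or n_system == 0 or n_gold == 0:
        return 0.0
    precision = correct / n_system
    recall = correct / n_gold
    return 100.0 * 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool (correct, n_system, n_gold) over all genres, then score once."""
    correct = sum(c for c, _, _ in counts)
    n_system = sum(s for _, s, _ in counts)
    n_gold = sum(g for _, _, g in counts)
    return las_f1(correct, n_system, n_gold)


# Made-up counts for two genres, for illustration only.
example = [(90, 100, 100), (400, 500, 500)]

if __name__ == "__main__":
    per_genre = [las_f1(*c) for c in example]
    print("per genre:", [round(x, 1) for x in per_genre])
    print("micro-average:", round(micro_average(example), 1))
```

Micro-averaging weights each genre by its size, so a large genre pulls the pooled score toward its own result; an unweighted mean of the per-genre scores would generally give a different number.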

Credits

Built on DictaBERT (Dicta) and the UD Hebrew-IAHLT treebanks. Thanks to Amir Zeldes for the encouragement and inspiration, and to Avner Algom (IAHLT). MIT license.
