Parsan — Hebrew morphosyntax models

Trained weights for Parsan, a single joint model for Hebrew morphosyntax: raw text in, Universal Dependencies CoNLL-U out (segmentation, POS, morphological features, lemmas, dependency parse).

Try it in your browser: the noamor/parsan-demo Space takes pasted Hebrew text and returns a dependency tree and the CoNLL-U output.

This repository holds three checkpoints, each in its own folder:

folder	what it is	size
`joint_base`	DictaBERT joint tagger + parser + lemma (best accuracy)	~749 MB
`joint_tiny2`	DictaBERT-tiny variant (~3x faster)	~185 MB
`seg_char_ctx`	character segmenter on dictabert-char	~353 MB

Use

Install the library, download the weights, and point PARSAN_RUNS at them (the library also fetches them automatically on first use):

from huggingface_hub import snapshot_download
snapshot_download("noamor/parsan", local_dir="runs")

PARSAN_RUNS=$PWD/runs python scripts/predict.py \
    --text input.txt --sent newline --profile base --out out.conllu

Options: --profile base|tiny selects the base or tiny joint model, and --segmenter char|rftok selects the segmenter.
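Because the output is standard UD CoNLL-U, any UD tooling can read out.conllu. Below is a minimal, dependency-free sketch of such a reader. It is not part of the Parsan library, and `Word` and `read_conllu` are hypothetical names. It assumes well-formed 10-column CoNLL-U and keeps both the surface tokens and the segmented syntactic words. This matters for Hebrew, where a single space-delimited token such as לבית is often split into several words. CoNLL-U encodes that split as a range line like `2-4`. The sample sentence is hand-written for illustration and is not Parsan output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator


@dataclass
class Word:
    idx: int                # ID column: 1-based word index in the sentence
    form: str
    lemma: str
    upos: str
    feats: dict[str, str]
    head: int               # 0 means the word attaches to the root
    deprel: str


def parse_feats(field: str) -> dict[str, str]:
    # FEATS is either "_" or "Name=Value|Name=Value|..."
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def read_conllu(text: str) -> Iterator[tuple[list[str], list[Word]]]:
    """Yield (surface_tokens, syntactic_words) for each sentence."""
    tokens: list[str] = []
    words: list[Word] = []
    covered_until = 0  # last word ID covered by the current multiword token
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends a sentence
            if words:
                yield tokens, words
            tokens, words, covered_until = [], [], 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        wid = cols[0]
        if "-" in wid:
            # Multiword token range like "2-4": one surface token, several words
            _, end = wid.split("-")
            tokens.append(cols[1])
            covered_until = int(end)
            continue
        if "." in wid:
            continue  # empty nodes belong to enhanced dependencies; skip them
        idx = int(wid)
        if idx > covered_until:
            tokens.append(cols[1])  # a word that is also its own surface token
        words.append(Word(idx, cols[1], cols[2], cols[3],
                          parse_feats(cols[5]), int(cols[6]), cols[7]))
    if words:
        yield tokens, words  # input that lacks a trailing blank line


# Hand-written illustration (not Parsan output): "הלכתי לבית",
# with the token לבית segmented into ל + ה + בית.
ROWS = [
    ["1", "הלכתי", "הלך", "VERB", "_", "Number=Sing|Person=1|Tense=Past", "0", "root", "_", "_"],
    ["2-4", "לבית", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["2", "ל", "ל", "ADP", "_", "_", "4", "case", "_", "_"],
    ["3", "ה", "ה", "DET", "_", "_", "4", "det", "_", "_"],
    ["4", "בית", "בית", "NOUN", "_", "Gender=Masc|Number=Sing", "1", "obl", "_", "_"],
]
SAMPLE = "# text = הלכתי לבית\n" + "\n".join("\t".join(r) for r in ROWS) + "\n\n"

if __name__ == "__main__":
    for toks, ws in read_conllu(SAMPLE):
        print(toks)
        for w in ws:
            print(w.idx, w.form, w.upos, w.head, w.deprel)
```

To read the predictor's output, pass the file contents, for example `read_conllu(Path("out.conllu").read_text(encoding="utf-8"))` with `from pathlib import Path`. Keeping tokens and words separate makes it easy to check segmentation independently of tagging and parsing.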

Results

End-to-end labeled attachment score (LAS, F1 × 100), parsing from raw text and scoring against IAHLT gold annotations. OOD is the micro-average over five held-out genres.

system	wiki	knesset	OOD
Parsan	92.2	88.6	89.2
HebPipe	89.7	86.2	86.1
Stanza	83.1	80.7	80.5
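In end-to-end scoring, predicted and gold segmentations can differ, so LAS is reported as an F1. A word counts as correct only when its segmentation, head, and relation all match gold. Precision divides by the number of predicted words, and recall divides by the number of gold words. The sketch below simplifies this, and `Arc`, `Counts`, `las_counts`, and `micro_las` are hypothetical names. It assumes words are already identified by character spans in the raw text. The official CoNLL 2018 UD evaluation script aligns more carefully inside multiword tokens and compares only the universal part of the relation. The sketch also shows what micro-averaging means for the OOD column: counts are pooled across genres before F1 is computed, so larger genres carry more weight than in a plain mean of per-genre scores.

```python
from __future__ import annotations

from dataclasses import dataclass

# Simplifying assumption: a word is identified by its (start, end) character span.
Span = tuple[int, int]


@dataclass(frozen=True)
class Arc:
    span: Span
    head_span: Span | None  # None when the word attaches to the root
    deprel: str


@dataclass
class Counts:
    correct: int = 0
    pred: int = 0
    gold: int = 0

    def add(self, other: Counts) -> Counts:
        return Counts(self.correct + other.correct,
                      self.pred + other.pred,
                      self.gold + other.gold)


def las_counts(gold: list[Arc], pred: list[Arc]) -> Counts:
    # A predicted word is correct only if span, head span and relation all match
    correct = len(set(gold) & set(pred))
    return Counts(correct, len(pred), len(gold))


def f1x100(c: Counts) -> float:
    if c.correct == 0:
        return 0.0
    precision = c.correct / c.pred
    recall = c.correct / c.gold
    return 100 * 2 * precision * recall / (precision + recall)


def micro_las(per_genre: dict[str, Counts]) -> float:
    # Micro-average: pool the counts over all genres, then compute a single F1
    total = Counts()
    for c in per_genre.values():
        total = total.add(c)
    return f1x100(total)
```

For example, a genre with `Counts(9, 10, 10)` (F1 90) and one with `Counts(1, 2, 2)` (F1 50) micro-average to about 83.3, whereas their plain mean would be 70.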

Credits

Built on DictaBERT (Dicta) and the UD Hebrew-IAHLT treebanks. Thanks to Amir Zeldes for the encouragement and inspiration, and to Avner Algom (IAHLT). MIT license.
