BEREL-seg: TBD

A state-of-the-art language model for Rabbinic Hebrew, released [here] - add link.

This model is fine-tuned from BEREL_3.0 for the prefix segmentation task.

Sample usage:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL-seg')
# trust_remote_code=True is required: the predict() method is defined in the
# model's custom code, not in the standard transformers API.
model = AutoModel.from_pretrained('dicta-il/BEREL-seg', trust_remote_code=True)

model.eval()

sentence = 'ื•ื–ื” ืœืฉื•ืŸ ื”ืจืžื‘ืดืŸ ื‘ืคื™ืจื•ืฉื• ืขืœ ื”ืชื•ืจื”, ืฉื”ื“ื‘ืจ ื™ื“ื•ืข ื•ืžืคื•ืจืกื ืœื›ืœ ื‘ืขืœื™ ื”ืขื™ื•ืŸ ืฉืื™ืŸ ื”ืžืงืจื ื™ื•ืฆื ืžื™ื“ื™ ืคืฉื•ื˜ื• ืืฃ ืขืœ ืคื™ ืฉื”ื“ืจืฉ ืืžืช.'

print(model.predict([sentence], tokenizer))
```

Output:

```json
[
  [
    [ "[CLS]" ],
    [ "ื•", "ื–ื”" ],
    [ "ืœืฉื•ืŸ" ],
    [ "ื”", "ืจืžื‘\"ืŸ" ],
    [ "ื‘", "ืคื™ืจื•ืฉื•" ],
    [ "ืขืœ" ],
    [ "ื”", "ืชื•ืจื”" ],
    [ ", " ],
    [ "ืฉื”ื“", "ื‘ืจ" ],
    [ "ื™ื“ื•ืข" ],
    [ "ื•", "ืžืคื•ืจืกื" ],
    [ "ืœ", "ื›ืœ" ],
    [ "ื‘ืขืœื™" ],
    [ "ื”", "ืขื™ื•ืŸ" ],
    [ "ืฉ", "ืื™ืŸ" ],
    [ "ื”", "ืžืงืจื" ],
    [ "ื™ื•ืฆื" ],
    [ "ืคืฉื•ื˜ื•" ],
    [ "ืืฃ" ],
    [ "ืขืœ" ],
    [ "ืคื™" ],
    [ "ืฉื”ื“", "ืจืฉ" ],
    [ "ืืžืช" ],
    [ "." ],
    [ "[SEP]" ]
  ]
]
```
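As shown above, `predict` returns one nested list per input sentence, where each word is a list of its segments (prefixes split from the stem). A small post-processing helper can flatten this into a readable string. This is a hypothetical sketch, not part of the BEREL-seg API; it assumes only the output structure shown above, and the `"|"` separator is an arbitrary choice.

```python
# Hypothetical helper (not part of the BEREL-seg API): flattens the nested
# predict() output into one string per sentence, joining the segments of each
# word with "|" and dropping the special [CLS]/[SEP] markers.

def format_segments(predictions):
    formatted = []
    for sentence in predictions:
        words = [
            "|".join(segments)
            for segments in sentence
            if segments not in (["[CLS]"], ["[SEP]"])
        ]
        formatted.append(" ".join(words))
    return formatted

# Example with a slice of the output shown above:
sample = [[["[CLS]"], ["ื•", "ื–ื”"], ["ืœืฉื•ืŸ"], ["[SEP]"]]]
print(format_segments(sample))  # ['ื•|ื–ื” ืœืฉื•ืŸ']
```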

Citation

If you use BEREL-seg in your research, please cite tbd

BibTeX:

tbd

License


This work is licensed under a Creative Commons Attribution 4.0 International License.


Model size: 0.2B params (Safetensors, F32)