---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
datasets:
- HaifaCLGroup/KnessetCorpus
language:
- he
base_model:
- intfloat/multilingual-e5-large
---

# Knesset-multi-e5-large

This is a [sentence-transformers](https://www.sbert.net) model: it maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.

**Knesset-multi-e5-large** is based on the [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) model. The transformer encoder has been fine-tuned on [Knesset data](https://huggingface.co/datasets/HaifaCLGroup/KnessetCorpus) to better capture legislative and parliamentary language.

Paper: [Unveiling Affective Polarization Trends in Parliamentary Proceedings](https://arxiv.org/abs/2512.05231)

## Usage (Sentence-Transformers)

Using this model is straightforward if you have [sentence-transformers](https://www.sbert.net) installed:

```bash
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# Two Hebrew example sentences: "This is a first example sentence",
# "This is the second sentence".
sentences = ["זה משפט ראשון לדוגמה", "זה המשפט השני"]

model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')
embeddings = model.encode(sentences)
print(embeddings)
```
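
The embeddings can be used directly for semantic search, mentioned above. The sketch below ranks a small corpus against a query with `sentence_transformers.util.cos_sim`; the Hebrew query and corpus strings are invented placeholders, not sentences from the Knesset corpus.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')

# Placeholder parliamentary-style sentences:
# "The committee approved the bill in its first reading",
# "A Knesset member filed a reservation to the budget clause".
corpus = [
    "הוועדה אישרה את הצעת החוק בקריאה ראשונה",
    "חבר הכנסת הגיש הסתייגות לסעיף התקציב",
]
# Query: "approval of a bill in committee".
query = "אישור הצעת חוק בוועדה"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every corpus sentence.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = int(scores.argmax())
print(corpus[best], float(scores[best]))
```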

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
```
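
Because the final `Normalize()` module L2-normalizes the mean-pooled output, every embedding has unit length, so a plain dot product between embeddings already equals their cosine similarity. A quick sketch to verify:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')
emb = model.encode(["דוגמה"])  # any input; here the Hebrew word for "example"

# The Normalize() layer makes each embedding unit-length, so norms are ~1.0.
print(np.linalg.norm(emb, axis=1))
```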

## Additional Details

- **Base model:** intfloat/multilingual-e5-large
- **Fine-tuning data:** [Knesset data](https://huggingface.co/datasets/HaifaCLGroup/KnessetCorpus)
- **Key modifications:** the encoder has been fine-tuned on Knesset data to improve performance on legislative and parliamentary content, while the original pooling and normalization layers are retained so that the embeddings stay consistent with the base model's architecture. A sketch for loading the fine-tuning corpus follows this list.
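
To inspect the fine-tuning corpus, it can be streamed with the `datasets` library. This is a minimal sketch, assuming a "train" split exists; the exact configuration and split names are listed on the dataset card, not in this model card.

```python
from datasets import get_dataset_config_names, load_dataset

# List the available configurations of the Knesset corpus.
configs = get_dataset_config_names("HaifaCLGroup/KnessetCorpus")
print(configs)

# Stream one configuration to avoid downloading the full corpus.
# The "train" split name is an assumption; adjust per the dataset card.
ds = load_dataset("HaifaCLGroup/KnessetCorpus", configs[0], split="train", streaming=True)
print(next(iter(ds)))  # inspect the fields of a single example
```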

## Citing & Authors

```bibtex
@misc{goldin2025unveilingaffectivepolarizationtrends,
      title={Unveiling Affective Polarization Trends in Parliamentary Proceedings},
      author={Gili Goldin and Ella Rabinovich and Shuly Wintner},
      year={2025},
      eprint={2512.05231},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.05231},
}
```