---
license: apache-2.0
datasets:
- mostafaamiri/khamenei_ir_1352_1403_08_13
language:
- fa
library_name: fasttext
---
# Model Card for Khamenei Word Embedding

Khamenei word embedding

## Model Details

### Model Description

This fastText model provides word embeddings trained on decades of Persian-language political, theological, and jurisprudential discourse taken from the official digital archive of Iran's Supreme Leader. Because fastText represents each word through its character-level n-grams, the model can produce meaningful vectors for morphologically complex Persian words, rare Quranic Arabic insertions, and domain-specific neologisms that purely word-level methods would treat as out-of-vocabulary. The embeddings capture the ideological associations of the corpus, clustering concepts such as "mustazafeen" (the oppressed), "esteghlal" (independence), and "moghawemat" (resistance) within their shared conceptual framework, while preserving the morphological detail needed to analyze Persian's agglutinative word forms. The model is robust to spelling variation and can generate vectors for previously unseen word forms from their subword patterns. This makes it well suited to this specialized corpus, whose historical references, compound political terminology, and doctrinal language are poorly served by standard word-level lexical analysis.
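
The character-level n-gram decomposition described above can be illustrated with a minimal sketch. It is not the fastText implementation, and the n-gram range used here (3 to 6) is only an assumption, since this card does not state the settings the model was trained with. The helper names `char_ngrams` and `shared_fraction` are hypothetical and exist only for this example. The sketch shows how a word is wrapped in boundary markers and split into n-grams, and why an unseen inflected form still overlaps heavily with a known stem.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word`, wrapped in '<' and '>' boundary markers.

    minn/maxn are assumed values; the settings used to train this model are not documented here.
    """
    token = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of length n across the marked-up token.
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


def shared_fraction(a, b, minn=3, maxn=6):
    """Jaccard overlap between the n-gram sets of two words (hypothetical helper)."""
    grams_a = set(char_ngrams(a, minn, maxn))
    grams_b = set(char_ngrams(b, minn, maxn))
    return len(grams_a & grams_b) / len(grams_a | grams_b)


base = "مقاومت"        # "moghawemat" (resistance)
inflected = "مقاومتی"  # the same stem with a suffix attached

print(char_ngrams(base, 3, 3))             # trigrams of the base form
print(shared_fraction(base, inflected))    # share of n-grams the two forms have in common
```

In a subword model, the vector for a word that was never seen in training is built from the vectors of n-grams like these, so a suffixed or slightly misspelled form lands near the forms it shares most n-grams with.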