dbbiyte/CISA-BERTurk-sentiment

ilterm committed on Jul 30, 2025

Commit 23558e1 (verified) · 1 Parent(s): 76b505a

Update README.md

Files changed (1)

README.md +177 -3

@@ -1,3 +1,177 @@
----
-license: cc-by-nc-4.0
----

+---
+license: cc-by-nc-4.0
+---
+# CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish
+This model performs **Cross-Individual Sentiment Analysis (CISA)** on historical Turkish texts (1900-1950), analyzing the **author's sentiment toward specific individuals** mentioned in the text, rather than the overall text sentiment.
+## 🎯 Model Details
+- **Model Name**: CISA-BERTurk-Sentiment
+- **Base Model**: [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) (dbmdz/bert-base-turkish-cased)
+- **Architecture**: DECA-EBSA (Dual-Encoder Context-Aware Entity-Based Sentiment Analysis)
+- **Language**: Turkish
+- **Period**: Historical Turkish texts (1900-1950)
+- **Task**: Cross-Individual Sentiment Analysis
+- **Classes**:
+  - 0: Negative
+  - 1: Neutral
+  - 2: Positive
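A minimal sketch of how the class indices above could be mapped back to labels, assuming the model emits three raw logits per (text, entity) pair. The `decode_logits` helper and the sample logits are hypothetical; only the id-to-label mapping is taken from the list above.

```python
import math

# Class ids as listed in the model card.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def decode_logits(logits):
    """Return (label, probability) for one entity's three raw logits."""
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per class")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for a single (text, entity) pair.
label, prob = decode_logits([-1.2, 0.3, 2.1])
print(label, round(prob, 3))
```

The real output shape depends on the training code's model class, so this helper may need adapting.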
+## 🆚 CISA vs Standard Sentiment Analysis
+### Example Comparison:
+**Text**: *"Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti"* (Ali Bey's death had plunged us into great sorrow)
+| Analysis Type | Result | Explanation |
+|--------------|--------|-------------|
+| **Standard SA** | ❌ Negative | Overall text tone is sad |
+| **CISA** | ✅ Positive | Author's respect/love for Ali Bey |
+### CISA Advantages:
+- ✅ **Person-focused** sentiment detection
+- ✅ **Author perspective** analysis
+- ✅ **Entity-based** precision
+- ✅ **Context-aware** evaluation
+## 📊 Performance Metrics
+| Metric | Value |
+|--------|-------|
+| **Accuracy** | **87.08%** |
+| **Precision** | **87.07%** |
+| **Recall** | **87.08%** |
+| **F1-Score** | **87.05%** |
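The table does not say how precision, recall and F1 are averaged over the three classes. One reading consistent with recall matching accuracy (both 87.08%) is support-weighted averaging, because weighted recall equals accuracy by construction. The sketch below demonstrates that identity on made-up labels; the data and function names are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with class support as the weight."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    return sum((support[c] / n) * (hits[c] / support[c]) for c in support)


# Toy labels (0 = Negative, 1 = Neutral, 2 = Positive); not real model output.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 2, 0]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(f"accuracy={acc:.4f} weighted_recall={rec:.4f}")
```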
+## 📈 Dataset Information
+- **Total Texts**: 7,816
+- **Total Entities**: 9,249
+- **Average Entities per Text**: 1.18
+- **Sentiment Distribution**:
+  - Negative: 2,357 (25.5%)
+  - Neutral: 3,563 (38.5%)
+  - Positive: 3,329 (36.0%)
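The figures above are internally consistent, which a few lines of arithmetic can confirm. This is only a sanity check on the numbers quoted in the list; the variable names are illustrative.

```python
# Counts copied from the dataset summary above.
total_texts = 7816
total_entities = 9249
label_counts = {"Negative": 2357, "Neutral": 3563, "Positive": 3329}

# The per-class counts should add up to the entity total.
assert sum(label_counts.values()) == total_entities

# Percentage share of each sentiment class, rounded as in the card.
shares = {
    name: round(100 * count / total_entities, 1)
    for name, count in label_counts.items()
}
entities_per_text = round(total_entities / total_texts, 2)

print(shares)
print(entities_per_text)
```

Because the percentages are computed over entities rather than texts, the sentiment labels appear to be assigned per entity mention.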
+## 🚀 Usage
+**Note**: This model uses the DECA-EBSA architecture, which adds enhanced attention mechanisms, Turkish linguistic features, and contextual encoding on top of BERTurk. Running inference therefore requires the complete model class from the training code; the snippet below only loads the tokenizer and downloads the weights.
+### Model Loading
+```python
+from transformers import AutoTokenizer
+from huggingface_hub import hf_hub_download
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")
+# Download model weights
+weights_path = hf_hub_download("dbbiyte/CISA-BERTurk-sentiment", "pytorch_model.bin")
+print("Model weights downloaded successfully!")
+print("For full CISA analysis, use the complete PositionAwareDualEncoderEBSA architecture from the training code.")
+```
+### Expected CISA Results
+For the examples in our test set:
+| Text | Entity | Standard SA | CISA Result |
+|------|--------|-------------|-------------|
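+| "Ali Bey'in vefatı hepimizi hüzne boğmuştu" (Ali Bey's death had drowned us all in sorrow) | Ali Bey | Negative | **Positive** |
+| "Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi" (At Leyla Hanım's music recital, her melodies touched my soul) | Leyla Hanım | Positive | **Positive** |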
+**CISA Key Insight**: The model analyzes the author's sentiment toward the mentioned person, not the overall text sentiment.
+## 🏗️ DECA-EBSA Architecture
+### Dual-Encoder Structure:
+1. **Text Encoder**: Full text context processing
+2. **Entity Encoder**: Entity + local context processing
+### Key Features:
+- **Enhanced Entity-Context Attention**: 12-head cross-attention
+- **Position-Aware Modeling**: Entity position information
+- **Turkish Linguistic Features**: Ottoman Turkish specific patterns
+- **Context-Aware Classification**: Formal/informal distinction
+- **Adaptive Focal Loss**: Focus on difficult examples
+- **R-Drop Regularization**: Consistency enforcement
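To illustrate what "focus on difficult examples" means, here is a minimal sketch of the standard focal loss for a single example, compared with plain cross-entropy. It is not the adaptive variant named in the list, whose details the card does not give; `gamma=2.0` and the sample logits are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one example: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [0.1, 0.2, 4.0]  # confident and correct for target class 2
hard = [2.0, 0.5, 0.3]  # confident in the wrong class for target class 2

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 2)
    fl = focal_loss(logits, 2)
    print(f"{name}: cross_entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The factor `(1 - p_t)^gamma` shrinks the loss of well-classified examples far more than that of misclassified ones, so gradient updates are dominated by the hard cases. With `gamma=0` the expression reduces to ordinary cross-entropy.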
+## 🔬 Research Contributions
+### 1. Cross-Individual Sentiment Analysis (CISA)
+- **First application** of CISA to historical Turkish
+- **Author perspective** focused sentiment analysis
+- **Entity-based approach** for person-specific emotions
+### 2. DECA-EBSA Methodology
+- **Dual-Encoder** architecture
+- **Context-Aware** modeling
+- **Entity-Based** attention mechanisms
+### 3. Historical Turkish NLP Contributions
+- **1900-1950 period** specialized dataset
+- **Ottoman Turkish** linguistic features
+- **Formal/informal** context distinction
+## 👥 Authors
+**İzmir Institute of Technology - Digital Humanities and AI Laboratory**:
+- **Dr. Mustafa İLTER** - İzmir Institute of Technology
+- **Dr. Doğan EVECEN** - İzmir Institute of Technology
+- **Dr. Buket ERŞAHİN** - İzmir Institute of Technology
+- **Dr. Yasemin ÖZCAN GÖNÜLAL** - İzmir Institute of Technology
+- **Assoc. Prof. Selma TEKİR** - İzmir Institute of Technology
+**Pamukkale University**:
+- **Assoc. Prof. Sezen KARABULUT** - Pamukkale University
+- **İbrahim BERCİ** - Pamukkale University
+- **Emre ONUÇ** - Pamukkale University
+## 🏦 Funding & Acknowledgments
+This work was supported by **The Scientific and Technological Research Council of Turkey (TÜBİTAK)** under project number **323K372**. We thank TÜBİTAK for their support.
+## 📚 BERTurk Reference
+This model uses [BERTurk](https://github.com/stefan-it/turkish-bert), a BERT model developed by Stefan Schweter and pre-trained on 35 GB of Turkish text for Turkish natural language processing tasks.
+## 📄 License and Usage Terms
+This model is released under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.
+### ✅ Permitted Uses:
+- **Academic research** (citation required)
+- **Educational purposes**
+- **Non-profit projects**
+- **Personal experimental studies**
+### ❌ Prohibited Uses:
+- **Commercial applications**
+- **Profit-driven projects**
+- **Commercial product/service development**
+### 📄 Citation Requirement:
+When using this model, please cite as:
+```bibtex
+@software{ilter2025cisa,
+  author = {İlter, Mustafa and Evecen, Doğan and Erşahin, Buket and Özcan Gönülal, Yasemin and Karabulut, Sezen and Berci, İbrahim and Onuç, Emre and Tekir, Selma},
+  title = {CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish},
+  url = {https://huggingface.co/dbbiyte/CISA-BERTurk-sentiment},
+  year = {2025},
+}
+```
+## 🚨 Limitations
+- The model is optimized specifically for **1900-1950 period Turkish texts**
+- Performance may vary on **modern Turkish texts**
+- **Historical spelling conventions** and **archaic vocabulary** should be taken into account when preparing input texts
+- Maximum sequence length is **256 tokens**
+## 🏷️ Model Tags
+`turkish` `sentiment-analysis` `historical-texts` `entity-based` `cross-individual` `berturk` `bert` `1900-1950` `pytorch` `safetensors`