---
language: tr
tags:
- text-classification
- customer-support
- Turkish
datasets:
- Turkish_Conversations
license: mit
model_name: bert-topic-classification-turkish
base_model: dbmdz/bert-base-turkish-cased
library_name: transformers
pipeline_tag: text-classification
---

# bert-topic-classification-turkish

## Model Description

This is a fine-tuned BERT model for topic classification of Turkish text. It was trained on **Turkish_Conversations**, a custom dataset of Turkish customer-support conversations, and classifies text into the following 5 categories:

1. **Financial Services** (Finansal Hizmetler)
2. **Account Operations** (Hesap İşlemleri)
3. **Technical Support** (Teknik Destek)
4. **Products and Sales** (Ürün ve Satış)
5. **Returns and Exchanges** (İade ve Değişim)

The model achieves an accuracy of **93.51%** on the validation dataset.
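The mapping from class IDs to category names lives in the model's `config.json` (`id2label`). The sketch below assumes the IDs follow the order of the list above, which is not confirmed here; check the config for the authoritative mapping.

```python
# Hypothetical id2label mapping, assuming class IDs follow the order of the
# category list above; the authoritative mapping is in the model's config.json.
id2label = {
    0: "Finansal Hizmetler",  # Financial Services
    1: "Hesap İşlemleri",     # Account Operations
    2: "Teknik Destek",       # Technical Support
    3: "Ürün ve Satış",       # Products and Sales
    4: "İade ve Değişim",     # Returns and Exchanges
}

predicted_class = 2  # e.g. a class ID returned by the model
print(id2label[predicted_class])  # Teknik Destek
```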

---

## Usage

Below is an example of how to use the model for topic classification:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("GosamaIKU/bert-topic-classification-turkish")
model = AutoModelForSequenceClassification.from_pretrained("GosamaIKU/bert-topic-classification-turkish")
model.eval()

# Example conversation (Turkish customer-support turns)
dataset = [
    {"conversation_id": 1, "speaker": "customer", "text": "Siparişim eksik geldi."},
    {"conversation_id": 1, "speaker": "representative", "text": "Hemen kontrol edip size bilgi vereceğim."},
    {"conversation_id": 1, "speaker": "customer", "text": "Anlayışınız için teşekkür ederim."},
]

# Combine the turns so the whole conversation is classified at once
combined_text = " ".join(item["text"] for item in dataset)
inputs = tokenizer(combined_text, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=-1).item()
print(f"Predicted Topic Class ID: {predicted_class}")
```
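To report a confidence score alongside the class ID, the logits can be converted to probabilities with a softmax. The logits tensor below is made up for illustration; in practice you would pass `outputs.logits` from the snippet above.

```python
import torch

# Hypothetical logits for the 5 classes (in practice, use outputs.logits)
logits = torch.tensor([[2.1, 0.3, -1.2, 0.5, -0.7]])

probs = torch.softmax(logits, dim=-1)         # per-class probabilities, summing to 1
predicted_class = probs.argmax(dim=-1).item()
confidence = probs[0, predicted_class].item()

print(f"Predicted class {predicted_class} with probability {confidence:.2%}")
```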

## Training Details

- **Base Model:** [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased)
- **Dataset:** **Turkish_Conversations** (custom dataset of Turkish customer-support conversations)
- **Epochs:** 5
- **Batch Size:** 8
- **Learning Rate:** 5e-5
- **Accuracy:** 93.51%
- **Framework:** PyTorch
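The hyperparameters above suggest a standard `transformers` `Trainer` setup. The sketch below is a plausible reconstruction, not the published training script; the dataset preparation and `output_dir` are assumptions.

```python
# Hyperparameters taken from the Training Details list above.
EPOCHS = 5
BATCH_SIZE = 8
LEARNING_RATE = 5e-5
NUM_LABELS = 5  # the five topic categories

def fine_tune(train_dataset, eval_dataset, output_dir="bert-topic-classification-turkish"):
    """Hypothetical fine-tuning sketch; both datasets must already be tokenized."""
    from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

    model = AutoModelForSequenceClassification.from_pretrained(
        "dbmdz/bert-base-turkish-cased", num_labels=NUM_LABELS
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=EPOCHS,
        per_device_train_batch_size=BATCH_SIZE,
        learning_rate=LEARNING_RATE,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()
    return trainer
```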

---

## Limitations

- The model may not perform well on text that differs significantly from the training data (e.g., informal or slang language).
- It is designed for topic classification and may not generalize to other NLP tasks such as sentiment analysis or intent detection.
- Performance may degrade on very short or ambiguous texts.

---

## Model Files

This repository contains the following files:

- `config.json`: Model configuration file.
- `model.safetensors`: Model weights.
- `special_tokens_map.json`: Special tokens used by the tokenizer.
- `tokenizer_config.json`: Tokenizer configuration file.
- `vocab.txt`: Vocabulary file for the tokenizer.

---

## Links and Resources

- **Base Model:** [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased)
- **Zero-Shot Model (Optional):** [xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli)
- **Fine-Tuned Model:** [GosamaIKU/bert-topic-classification-turkish](https://huggingface.co/GosamaIKU/bert-topic-classification-turkish)

---

## Dataset

The model was fine-tuned on a custom dataset named **Turkish_Conversations**, which consists of 2,695 Turkish customer-support conversations labeled with the following categories:

- Financial Services
- Account Operations
- Technical Support
- Products and Sales
- Returns and Exchanges

The dataset is not currently distributed with this repository.
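For reference, a labeled conversation in such a dataset might look like the record below. The field names extend the usage example above and are assumptions, not a published schema.

```python
# Hypothetical record layout: fields follow the usage example above, plus a
# topic label; the dataset's real schema is not published.
example_record = {
    "conversation_id": 1,
    "turns": [
        {"speaker": "customer", "text": "Siparişim eksik geldi."},
        {"speaker": "representative", "text": "Hemen kontrol edip size bilgi vereceğim."},
    ],
    "label": "İade ve Değişim",  # Returns and Exchanges
}

CATEGORIES = {"Finansal Hizmetler", "Hesap İşlemleri", "Teknik Destek",
              "Ürün ve Satış", "İade ve Değişim"}
assert example_record["label"] in CATEGORIES
```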

---

## License

This model is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.