---
language: tr
tags:
- text-classification
- customer-support
- Turkish
datasets:
- Turkish_Conversations
license: mit
model_name: bert-topic-classification-turkish
base_model: dbmdz/bert-base-turkish-cased
library_name: transformers
pipeline_tag: text-classification
---

# bert-topic-classification-turkish

## Model Description

This is a fine-tuned BERT model for topic classification of Turkish text. It was trained on **Turkish_Conversations**, a custom dataset of Turkish customer-support conversations, and classifies text into the following 5 categories:

1. **Financial Services** (Finansal Hizmetler)
2. **Account Operations** (Hesap İşlemleri)
3. **Technical Support** (Teknik Destek)
4. **Products and Sales** (Ürün ve Satış)
5. **Returns and Exchanges** (İade ve Değişim)

The model achieves an accuracy of **93.51%** on the validation dataset.
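The mapping from class IDs to category names lives in the model's `config.json` (`id2label`). The sketch below assumes the IDs follow the order of the list above, which is not confirmed here; check the config for the authoritative mapping.

```python
# Hypothetical id2label mapping, assuming class IDs follow the order of the
# category list above; the authoritative mapping is in the model's config.json.
id2label = {
    0: "Finansal Hizmetler",  # Financial Services
    1: "Hesap İşlemleri",     # Account Operations
    2: "Teknik Destek",       # Technical Support
    3: "Ürün ve Satış",       # Products and Sales
    4: "İade ve Değişim",     # Returns and Exchanges
}

predicted_class = 2  # e.g. a class ID returned by the model
print(id2label[predicted_class])  # Teknik Destek
```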

---

## Usage

Below is an example of how to use the model for topic classification:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("GosamaIKU/bert-topic-classification-turkish")
model = AutoModelForSequenceClassification.from_pretrained("GosamaIKU/bert-topic-classification-turkish")
model.eval()

# Example conversation (Turkish customer-support turns)
dataset = [
    {"conversation_id": 1, "speaker": "customer", "text": "Siparişim eksik geldi."},
    {"conversation_id": 1, "speaker": "representative", "text": "Hemen kontrol edip size bilgi vereceğim."},
    {"conversation_id": 1, "speaker": "customer", "text": "Anlayışınız için teşekkür ederim."},
]

# Combine the turns so the whole conversation is classified at once
combined_text = " ".join(item["text"] for item in dataset)
inputs = tokenizer(combined_text, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=-1).item()
print(f"Predicted Topic Class ID: {predicted_class}")
```
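To report a confidence score alongside the class ID, the logits can be converted to probabilities with a softmax. The logits tensor below is made up for illustration; in practice you would pass `outputs.logits` from the snippet above.

```python
import torch

# Hypothetical logits for the 5 classes (in practice, use outputs.logits)
logits = torch.tensor([[2.1, 0.3, -1.2, 0.5, -0.7]])

probs = torch.softmax(logits, dim=-1)         # per-class probabilities, summing to 1
predicted_class = probs.argmax(dim=-1).item()
confidence = probs[0, predicted_class].item()

print(f"Predicted class {predicted_class} with probability {confidence:.2%}")
```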

## Training Details

- **Base Model:** [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased)
- **Dataset:** **Turkish_Conversations** (custom dataset of Turkish customer-support conversations)
- **Epochs:** 5
- **Batch Size:** 8
- **Learning Rate:** 5e-5
- **Accuracy:** 93.51%
- **Framework:** PyTorch
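The hyperparameters above suggest a standard `transformers` `Trainer` setup. The sketch below is a plausible reconstruction, not the published training script; the dataset preparation and `output_dir` are assumptions.

```python
# Hyperparameters taken from the Training Details list above.
EPOCHS = 5
BATCH_SIZE = 8
LEARNING_RATE = 5e-5
NUM_LABELS = 5  # the five topic categories

def fine_tune(train_dataset, eval_dataset, output_dir="bert-topic-classification-turkish"):
    """Hypothetical fine-tuning sketch; both datasets must already be tokenized."""
    from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

    model = AutoModelForSequenceClassification.from_pretrained(
        "dbmdz/bert-base-turkish-cased", num_labels=NUM_LABELS
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=EPOCHS,
        per_device_train_batch_size=BATCH_SIZE,
        learning_rate=LEARNING_RATE,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()
    return trainer
```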

---

## Limitations

- The model may not perform well on text that differs significantly from the training data (e.g., informal or slang language).
- It is designed for topic classification and may not generalize to other NLP tasks such as sentiment analysis or intent detection.
- Performance may degrade on very short or ambiguous texts.

---

## Model Files

This repository contains the following files:

- `config.json`: Model configuration file.
- `model.safetensors`: Model weights.
- `special_tokens_map.json`: Special tokens used by the tokenizer.
- `tokenizer_config.json`: Tokenizer configuration file.
- `vocab.txt`: Vocabulary file for the tokenizer.

---

## Links and Resources

- **Base Model:** [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased)
- **Zero-Shot Model (Optional):** [xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli)
- **Fine-Tuned Model:** [GosamaIKU/bert-topic-classification-turkish](https://huggingface.co/GosamaIKU/bert-topic-classification-turkish)

---

## Dataset

The model was fine-tuned on a custom dataset named **Turkish_Conversations**, which consists of 2,695 Turkish customer-support conversations labeled with the following categories:

- Financial Services
- Account Operations
- Technical Support
- Products and Sales
- Returns and Exchanges

The dataset is not currently distributed with this repository.
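For reference, a labeled conversation in such a dataset might look like the record below. The field names extend the usage example above and are assumptions, not a published schema.

```python
# Hypothetical record layout: fields follow the usage example above, plus a
# topic label; the dataset's real schema is not published.
example_record = {
    "conversation_id": 1,
    "turns": [
        {"speaker": "customer", "text": "Siparişim eksik geldi."},
        {"speaker": "representative", "text": "Hemen kontrol edip size bilgi vereceğim."},
    ],
    "label": "İade ve Değişim",  # Returns and Exchanges
}

CATEGORIES = {"Finansal Hizmetler", "Hesap İşlemleri", "Teknik Destek",
              "Ürün ve Satış", "İade ve Değişim"}
assert example_record["label"] in CATEGORIES
```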

---

## License

This model is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.