	---
	license: apache-2.0
	language:
	- tr
	base_model:
	- dbmdz/bert-base-turkish-cased
	pipeline_tag: text-classification
	---
	# Turkish BERT for Aspect-Based Sentiment Analysis

	This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) specifically trained for aspect-based sentiment analysis on Turkish e-commerce product reviews.

	## Model Description

	- Base Model: dbmdz/bert-base-turkish-cased
	- Task: Sequence Classification (Aspect-Based Sentiment Analysis)
	- Language: Turkish
	- Domain: E-commerce product reviews

	## Model Performance

	- F1 Score: 88% on test set
	- Test Set Size: 4,000 samples
	- Training Set Size: 36,000 samples
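
The card does not say how the F1 score is averaged over the three classes. As a minimal sketch, assuming macro averaging (an assumption, not something the card confirms), the metric can be computed in plain Python as follows. The labels below are made up for illustration and are not the model's test data.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


LABELS = ["negative", "neutral", "positive"]

# Hypothetical gold labels and predictions, for illustration only
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

score = macro_f1(y_true, y_pred, LABELS)
print(f"macro F1: {score:.4f}")
```

With micro or weighted averaging the number would generally differ, so the averaging mode matters when comparing against the reported figure.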

	## Training Details

	### Training Data
	- Dataset Size: 36,000 reviews
	- Data Source: Private e-commerce product review dataset
	- Domain: E-commerce product reviews in Turkish
	- Coverage: Over 500 product categories

	### Training Configuration
	- Epochs: 5
	- Task Type: Sequence Classification
	- Input Format: `[aspect_term] [SEP] [review_text]`
	- Label Classes:
	  - `positive`: Positive sentiment towards the aspect
	  - `negative`: Negative sentiment towards the aspect
	  - `neutral`: Neutral sentiment towards the aspect
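
Because each input carries exactly one aspect term, a review that mentions several aspects is scored once per aspect. A minimal sketch of building those inputs in the format listed above; `build_absa_inputs` is a hypothetical helper name, not part of the model or of `transformers`.

```python
def build_absa_inputs(aspects, review):
    """Return one `aspect [SEP] review` string per aspect term."""
    return [f"{aspect.strip()} [SEP] {review.strip()}" for aspect in aspects]


review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
inputs = build_absa_inputs(["arka kamerası", "bataryası"], review)

for text in inputs:
    print(text)
```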

	### Training Loss
	Training loss decreased at every epoch, from 0.47 after the first epoch to 0.11 after the fifth:

	| Epoch | Loss |
	|-------|------|
	| 1 | 0.47 |
	| 2 | 0.34 |
	| 3 | 0.25 |
	| 4 | 0.22 |
	| 5 | 0.11 |
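
As a quick check of the figures in the table, the sketch below computes the drop between consecutive epochs and the overall relative reduction. It uses only the loss values listed above.

```python
# Loss per epoch, copied from the table
losses = {1: 0.47, 2: 0.34, 3: 0.25, 4: 0.22, 5: 0.11}

epochs = sorted(losses)

# Absolute decrease from each epoch to the next
drops = [
    (cur, round(losses[prev] - losses[cur], 2))
    for prev, cur in zip(epochs, epochs[1:])
]

# Relative reduction from the first to the last epoch
total_reduction = (losses[epochs[0]] - losses[epochs[-1]]) / losses[epochs[0]]

for epoch, drop in drops:
    print(f"epoch {epoch}: loss down by {drop:.2f}")
print(f"overall reduction: {total_reduction:.1%}")
```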

	## Usage

	### Option 1: Using Pipeline (Recommended)

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	from transformers import pipeline

	# Load model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
	model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

	# Create pipeline
	sentiment_analyzer = pipeline("text-classification",
	                              model=model,
	                              tokenizer=tokenizer)

	# Example usage
	aspect = "arka kamerası"
	review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
	text = f"{aspect} [SEP] {review}"
	result = sentiment_analyzer(text)
	print(result)
	```

	Expected Output:
	```python
	[{'label': 'positive', 'score': 0.9998155236244202}]
	```
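
The pipeline example embeds a literal `[SEP]` in a single string, while the manual options pass the aspect and the review to the tokenizer as a sentence pair. The `transformers` text-classification pipeline also accepts a dict with `text` and `text_pair` keys, which lets its tokenizer build the pair encoding. A minimal sketch follows; `classify_aspect` is a hypothetical helper, and whether both input forms give identical scores for this model has not been verified here.

```python
def classify_aspect(classifier, aspect, review):
    """Score one (aspect, review) pair with a text-classification pipeline.

    `classifier` is any callable with the pipeline's interface; the dict
    form hands both segments to the tokenizer as a sentence pair.
    """
    return classifier({"text": aspect, "text_pair": review})


# Usage with the `sentiment_analyzer` pipeline created in the example above
# (requires the model weights to be available):
# result = classify_aspect(
#     sentiment_analyzer,
#     "arka kamerası",
#     "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz.",
# )
# print(result)
```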

	### Option 2: Manual Inference

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
	model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

	# Example aspect and review
	aspect = "arka kamerası"
	review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."

	# Tokenize aspect and review together
	inputs = tokenizer(aspect, review, return_tensors="pt", truncation=True, padding=True)

	# Get predictions
	with torch.no_grad():
	    outputs = model(**inputs)
	    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
	    predicted_class_id = predictions.argmax(dim=-1).item()
	    confidence = predictions.max().item()

	# Convert prediction to label
	predicted_label = model.config.id2label[predicted_class_id]
	print(f"Aspect: {aspect}")
	print(f"Sentiment: {predicted_label}")
	print(f"Confidence: {confidence:.4f}")
	```

	Expected Output:
	```
	Aspect: arka kamerası
	Sentiment: positive
	Confidence: 0.9998
	```

	### Option 3: Batch Inference

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
	model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

	# Example aspect-review pairs
	examples = [
	    ("arka kamerası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
	    ("bataryası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
	    ("fiyatı", "Ürünün fiyatı çok uygun ve kalitesi de iyi."),
	]

	aspects = [ex[0] for ex in examples]
	reviews = [ex[1] for ex in examples]

	# Tokenize all pairs
	inputs = tokenizer(aspects, reviews, return_tensors="pt", truncation=True, padding=True)

	# Get predictions for all pairs
	with torch.no_grad():
	    outputs = model(**inputs)
	    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
	    predicted_class_ids = predictions.argmax(dim=-1)
	    confidences = predictions.max(dim=-1).values

	# Display results
	for i, (aspect, review) in enumerate(examples):
	    predicted_label = model.config.id2label[predicted_class_ids[i].item()]
	    confidence = confidences[i].item()
	    print(f"Aspect: {aspect}")
	    print(f"Sentiment: {predicted_label} (confidence: {confidence:.4f})")
	    print("-" * 40)
	```

	Expected Output:
	```
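	Aspect: arka kamerası
	Sentiment: positive (confidence: 0.9998)
	----------------------------------------
	Aspect: bataryası
	Sentiment: negative (confidence: 0.9990)
	----------------------------------------
	Aspect: fiyatı
	Sentiment: positive (confidence: 0.9998)
	----------------------------------------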
	```

	## Combined Usage with Aspect Extraction (Recommended)

	This model pairs with the aspect extraction model [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) to form a complete aspect-based sentiment analysis pipeline: the extraction model finds the aspect terms, and this model classifies the sentiment towards each one.

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
	import torch

	# Load aspect extraction model
	aspect_extractor = pipeline("token-classification",
	                            model="opdullah/bert-turkish-ecomm-aspect-extraction",
	                            aggregation_strategy="simple")

	# Load sentiment analysis model
	sentiment_tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
	sentiment_model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

	def analyze_aspect_sentiment(review):
	    # Extract aspects
	    aspects = aspect_extractor(review)

	    results = []
	    for aspect in aspects:
	        if aspect['entity_group'] == 'ASPECT':
	            aspect_text = aspect['word']

	            # Analyze sentiment
	            inputs = sentiment_tokenizer(aspect_text, review, return_tensors="pt", truncation=True)
	            with torch.no_grad():
	                outputs = sentiment_model(**inputs)
	            prediction = outputs.logits.argmax().item()
	            sentiment = sentiment_model.config.id2label[prediction]

	            results.append({'aspect': aspect_text, 'sentiment': sentiment})

	    return results

	# Usage
	review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
	results = analyze_aspect_sentiment(review)

	for result in results:
	    print(f"{result['aspect']}: {result['sentiment']}")
	```

	Expected Output:
	```
	arka kamerası: positive
	bataryası: negative
	```

	## Label Mapping

	```python
	id2label = {
	    0: "negative",
	    1: "neutral",
	    2: "positive"
	}

	label2id = {
	    "negative": 0,
	    "neutral": 1,
	    "positive": 2
	}
	```
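
To illustrate how this mapping is applied, the sketch below turns a vector of three logits into a label and a confidence using a plain-Python softmax. The logit values are hypothetical and `decode_logits` is an illustrative helper name; in practice the logits come from the model and the mapping from `model.config.id2label`.

```python
import math

id2label = {0: "negative", 1: "neutral", 2: "positive"}


def decode_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits in the order negative, neutral, positive
label, confidence = decode_logits([-2.0, 0.5, 3.0])
print(label, f"{confidence:.4f}")
```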

	## Intended Use

	This model is designed for:
	- Analyzing sentiment of specific aspects in Turkish e-commerce product reviews
	- Building complete aspect-based sentiment analysis systems
	- Understanding customer opinions on specific product features
	- Supporting recommendation systems and review analysis tools
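
For review analysis tools, per-review results are usually rolled up into counts per aspect. A minimal sketch, assuming each review yields a list of `{'aspect': ..., 'sentiment': ...}` dicts like the ones returned by `analyze_aspect_sentiment` in the combined usage example; `summarize_aspects` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_aspects(results_per_review):
    """Count sentiment labels per aspect across many reviews."""
    summary = defaultdict(Counter)
    for results in results_per_review:
        for item in results:
            summary[item["aspect"]][item["sentiment"]] += 1
    return {aspect: dict(counts) for aspect, counts in summary.items()}


# Hypothetical outputs for two reviews
batch = [
    [
        {"aspect": "arka kamerası", "sentiment": "positive"},
        {"aspect": "bataryası", "sentiment": "negative"},
    ],
    [{"aspect": "bataryası", "sentiment": "negative"}],
]

summary = summarize_aspects(batch)
print(summary)
```

Aspect terms are grouped by their exact surface form here, so inflected variants of the same word are counted separately unless they are normalized first.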

	## Limitations

	- Trained specifically on e-commerce domain data
	- Requires aspect terms to be identified beforehand (use with aspect extraction model)
	- Performance may vary on other domains or text types
	- Limited to Turkish language
	- Trained on a private dataset, so the results cannot be fully reproduced from public data

	## Citation

	If you use this model, please cite:

	```
	@misc{turkish-bert-absa,
	  title={Turkish BERT for Aspect-Based Sentiment Analysis},
	  author={Abdullah Koçak},
	  year={2025},
	  url={https://huggingface.co/opdullah/bert-turkish-ecomm-absa}
	}
	```

	## Base Model Citation

	```
	@misc{schweter2020bertbase,
	  title={BERTurk - BERT models for Turkish},
	  author={Stefan Schweter},
	  year={2020},
	  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
	}
	```

	## Related Models

	- [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) - For extracting aspect terms from Turkish e-commerce reviews