---
license: apache-2.0
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: text-classification
---
# Turkish BERT for Aspect-Based Sentiment Analysis
This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) specifically trained for aspect-based sentiment analysis on Turkish e-commerce product reviews.
## Model Description
- **Base Model**: dbmdz/bert-base-turkish-cased
- **Task**: Sequence Classification (Aspect-Based Sentiment Analysis)
- **Language**: Turkish
- **Domain**: E-commerce product reviews
## Model Performance
- **F1 Score**: 88% on test set
- **Test Set Size**: 4,000 samples
- **Training Set Size**: 36,000 samples
## Training Details
### Training Data
- **Dataset Size**: 36,000 reviews
- **Data Source**: Private e-commerce product review dataset
- **Domain**: E-commerce product reviews in Turkish
- **Coverage**: Over 500 product categories
### Training Configuration
- **Epochs**: 5
- **Task Type**: Sequence Classification
- **Input Format**: `[aspect_term] [SEP] [review_text]`
- **Label Classes**:
  - `positive`: Positive sentiment towards the aspect
  - `negative`: Negative sentiment towards the aspect
  - `neutral`: Neutral sentiment towards the aspect
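As a quick sanity check, the `[aspect_term] [SEP] [review_text]` input string can be assembled like this (the aspect and review below are illustrative, taken from the usage examples later in this card):

```python
# Assemble the "[aspect_term] [SEP] [review_text]" input string expected by the model
aspect = "bataryası"  # illustrative aspect term
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
text = f"{aspect} [SEP] {review}"
print(text)
# bataryası [SEP] Bu telefonun arka kamerası çok iyi ama bataryası yetersiz.
```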
### Training Loss
The model showed consistent improvement across epochs:
| Epoch | Loss |
|-------|------|
| 1 | 0.47 |
| 2 | 0.34 |
| 3 | 0.25 |
| 4 | 0.22 |
| 5 | 0.11 |
## Usage
### Option 1: Using Pipeline (Recommended)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")
# Create pipeline
sentiment_analyzer = pipeline("text-classification",
                              model=model,
                              tokenizer=tokenizer)
# Example usage
aspect = "arka kamerası"
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
text = f"{aspect} [SEP] {review}"
result = sentiment_analyzer(text)
print(result)
```
**Expected Output:**
```python
[{'label': 'positive', 'score': 0.9998155236244202}]
```
### Option 2: Manual Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")
# Example aspect and review
aspect = "arka kamerası"
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
# Tokenize aspect and review together
inputs = tokenizer(aspect, review, return_tensors="pt", truncation=True, padding=True)
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = predictions.argmax(dim=-1).item()
    confidence = predictions.max().item()
# Convert prediction to label
predicted_label = model.config.id2label[predicted_class_id]
print(f"Aspect: {aspect}")
print(f"Sentiment: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```
**Expected Output:**
```
Aspect: arka kamerası
Sentiment: positive
Confidence: 0.9998
```
### Option 3: Batch Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")
# Example aspect-review pairs
examples = [
    ("arka kamerası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
    ("bataryası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
    ("fiyatı", "Ürünün fiyatı çok uygun ve kalitesi de iyi."),
]
aspects = [ex[0] for ex in examples]
reviews = [ex[1] for ex in examples]
# Tokenize all pairs
inputs = tokenizer(aspects, reviews, return_tensors="pt", truncation=True, padding=True)
# Get predictions for all pairs
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)
    confidences = predictions.max(dim=-1).values
# Display results
for i, (aspect, review) in enumerate(examples):
    predicted_label = model.config.id2label[predicted_class_ids[i].item()]
    confidence = confidences[i].item()
    print(f"Aspect: {aspect}")
    print(f"Sentiment: {predicted_label} (confidence: {confidence:.4f})")
    print("-" * 40)
```
**Expected Output:**
```
Aspect: arka kamerası
Sentiment: positive (confidence: 0.9998)
Aspect: bataryası
Sentiment: negative (confidence: 0.9990)
Aspect: fiyatı
Sentiment: positive (confidence: 0.9998)
```
## Combined Usage with Aspect Extraction (Recommended)
This model is designed to pair with the aspect extraction model [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) for a complete aspect-based sentiment analysis pipeline:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, AutoModelForSequenceClassification, pipeline
import torch
# Load aspect extraction model
aspect_extractor = pipeline("token-classification",
                            model="opdullah/bert-turkish-ecomm-aspect-extraction",
                            aggregation_strategy="simple")
# Load sentiment analysis model
sentiment_tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
sentiment_model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")
def analyze_aspect_sentiment(review):
    # Extract aspects
    aspects = aspect_extractor(review)
    results = []
    for aspect in aspects:
        if aspect['entity_group'] == 'ASPECT':
            aspect_text = aspect['word']
            # Analyze sentiment
            inputs = sentiment_tokenizer(aspect_text, review, return_tensors="pt", truncation=True)
            with torch.no_grad():
                outputs = sentiment_model(**inputs)
            prediction = outputs.logits.argmax().item()
            sentiment = sentiment_model.config.id2label[prediction]
            results.append({'aspect': aspect_text, 'sentiment': sentiment})
    return results
# Usage
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
results = analyze_aspect_sentiment(review)
for result in results:
    print(f"{result['aspect']}: {result['sentiment']}")
```
**Expected Output:**
```
arka kamerası: positive
bataryası: negative
```
## Label Mapping
```python
id2label = {
    0: "negative",
    1: "neutral",
    2: "positive"
}

label2id = {
    "negative": 0,
    "neutral": 1,
    "positive": 2
}
```
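Since `label2id` is simply the inverse of `id2label`, one mapping can be derived from the other; a minimal sketch:

```python
# Derive label2id by inverting the id2label mapping from the model card above
id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {label: idx for idx, label in id2label.items()}
print(label2id)
# {'negative': 0, 'neutral': 1, 'positive': 2}
```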
## Intended Use
This model is designed for:
- Analyzing sentiment of specific aspects in Turkish e-commerce product reviews
- Building complete aspect-based sentiment analysis systems
- Understanding customer opinions on specific product features
- Supporting recommendation systems and review analysis tools
## Limitations
- Trained specifically on e-commerce domain data
- Requires aspect terms to be identified beforehand (use with aspect extraction model)
- Performance may vary on other domains or text types
- Limited to Turkish language
- Based on private dataset, so reproducibility may be limited
## Citation
If you use this model, please cite:
```
@misc{turkish-bert-absa,
  title={Turkish BERT for Aspect-Based Sentiment Analysis},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-absa}
}
```
## Base Model Citation
```
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```
## Related Models
- [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) - For extracting aspect terms from Turkish e-commerce reviews