---
license: apache-2.0
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: text-classification
---

# Turkish BERT for Aspect-Based Sentiment Analysis

This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) specifically trained for aspect-based sentiment analysis on Turkish e-commerce product reviews.

## Model Description

- **Base Model**: dbmdz/bert-base-turkish-cased
- **Task**: Sequence Classification (Aspect-Based Sentiment Analysis)
- **Language**: Turkish
- **Domain**: E-commerce product reviews

## Model Performance

- **F1 Score**: 88% on test set
- **Test Set Size**: 4,000 samples
- **Training Set Size**: 36,000 samples
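
The figure above is a single F1 score, and the averaging scheme behind it is not stated. As an illustration only, the sketch below computes a macro-averaged F1 over the three labels in plain Python. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the actual test set.

```python
def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example with made-up labels (not the real test data)
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(f"Macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights every class equally, so a rarely predicted class such as `neutral` can pull the score down noticeably. A weighted or micro average would behave differently on imbalanced data.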

## Training Details

### Training Data
- **Dataset Size**: 36,000 reviews
- **Data Source**: Private e-commerce product review dataset
- **Domain**: E-commerce product reviews in Turkish
- **Coverage**: Over 500 product categories

### Training Configuration
- **Epochs**: 5
- **Task Type**: Sequence Classification
- **Input Format**: `[aspect_term] [SEP] [review_text]`
- **Label Classes**:
  - `positive`: Positive sentiment towards the aspect
  - `negative`: Negative sentiment towards the aspect
  - `neutral`: Neutral sentiment towards the aspect
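
The input format above can be sketched as a small string helper. This is a minimal illustration, not part of the model's code: the name `build_absa_input` and the empty-input check are assumptions added for the example.

```python
SEP_TOKEN = "[SEP]"  # BERT separator token, as written in the input format above


def build_absa_input(aspect: str, review: str) -> str:
    """Join an aspect term and a review into one '<aspect> [SEP] <review>' string."""
    aspect, review = aspect.strip(), review.strip()
    if not aspect or not review:
        raise ValueError("Both aspect and review must be non-empty.")
    return f"{aspect} {SEP_TOKEN} {review}"


review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
for aspect in ("arka kamerası", "bataryası"):
    print(build_absa_input(aspect, review))
```

The same review is paired with each aspect in turn, so one review yields one model input per aspect.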

### Training Loss
Training loss decreased at every epoch:

| Epoch | Loss |
|-------|------|
| 1 | 0.47 |
| 2 | 0.34 |
| 3 | 0.25 |
| 4 | 0.22 |
| 5 | 0.11 |
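
For readers who want the trend as numbers, the short sketch below computes the epoch-to-epoch change and the overall relative reduction from the table values. It is plain arithmetic on the reported losses. The variable names are illustrative.

```python
# Training loss per epoch, copied from the table above
losses = {1: 0.47, 2: 0.34, 3: 0.25, 4: 0.22, 5: 0.11}

epochs = sorted(losses)

# Change in loss from one epoch to the next (negative means the loss went down)
deltas = {e: losses[e] - losses[e - 1] for e in epochs[1:]}
for epoch, delta in deltas.items():
    print(f"Epoch {epoch - 1} -> {epoch}: {delta:+.2f}")

# Relative reduction between the first and the last epoch
relative_reduction = (losses[epochs[0]] - losses[epochs[-1]]) / losses[epochs[0]]
print(f"Overall reduction: {relative_reduction:.1%}")
```

These are training losses only. The card reports no validation loss, so the numbers say nothing by themselves about overfitting.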

## Usage

### Option 1: Using Pipeline (Recommended)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

# Create pipeline
sentiment_analyzer = pipeline("text-classification",
                              model=model,
                              tokenizer=tokenizer)

# Example usage
aspect = "arka kamerası"
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
text = f"{aspect} [SEP] {review}"
result = sentiment_analyzer(text)
print(result)
```

**Expected Output:**
```python
[{'label': 'positive', 'score': 0.9998155236244202}]
```

### Option 2: Manual Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

# Example aspect and review
aspect = "arka kamerası"
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."

# Tokenize aspect and review together
inputs = tokenizer(aspect, review, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = predictions.argmax(dim=-1).item()
    confidence = predictions.max().item()

# Convert prediction to label
predicted_label = model.config.id2label[predicted_class_id]
print(f"Aspect: {aspect}")
print(f"Sentiment: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```

**Expected Output:**
```
Aspect: arka kamerası
Sentiment: positive
Confidence: 0.9998
```

### Option 3: Batch Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

# Example aspect-review pairs
examples = [
    ("arka kamerası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
    ("bataryası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
    ("fiyatı", "Ürünün fiyatı çok uygun ve kalitesi de iyi."),
]

aspects = [ex[0] for ex in examples]
reviews = [ex[1] for ex in examples]

# Tokenize all pairs
inputs = tokenizer(aspects, reviews, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all pairs
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)
    confidences = predictions.max(dim=-1).values

# Display results
for i, (aspect, review) in enumerate(examples):
    predicted_label = model.config.id2label[predicted_class_ids[i].item()]
    confidence = confidences[i].item()
    print(f"Aspect: {aspect}")
    print(f"Sentiment: {predicted_label} (confidence: {confidence:.4f})")
    print("-" * 40)
```

**Expected Output:**
```
Aspect: arka kamerası
Sentiment: positive (confidence: 0.9998)
----------------------------------------
Aspect: bataryası
Sentiment: negative (confidence: 0.9990)
----------------------------------------
Aspect: fiyatı
Sentiment: positive (confidence: 0.9998)
----------------------------------------
```
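
Confidence scores like those above are sometimes used to route uncertain predictions to manual review. The sketch below shows one way to do that in plain Python. The helper `split_by_confidence` is hypothetical, and the `0.9995` threshold is an arbitrary assumption chosen only so the example exercises both branches. It is not a recommended operating point.

```python
def split_by_confidence(predictions, threshold=0.9995):
    """Return (confident, uncertain) lists based on a confidence cut-off."""
    confident = [p for p in predictions if p["confidence"] >= threshold]
    uncertain = [p for p in predictions if p["confidence"] < threshold]
    return confident, uncertain


# Values copied from the expected output above
predictions = [
    {"aspect": "arka kamerası", "sentiment": "positive", "confidence": 0.9998},
    {"aspect": "bataryası", "sentiment": "negative", "confidence": 0.9990},
    {"aspect": "fiyatı", "sentiment": "positive", "confidence": 0.9998},
]

confident, uncertain = split_by_confidence(predictions)
print("Confident:", [p["aspect"] for p in confident])
print("Needs review:", [p["aspect"] for p in uncertain])
```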

## Combined Usage with Aspect Extraction (Recommended)

This model can be combined with the aspect extraction model [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) for end-to-end aspect-based sentiment analysis:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# Load aspect extraction model
aspect_extractor = pipeline("token-classification",
                            model="opdullah/bert-turkish-ecomm-aspect-extraction",
                            aggregation_strategy="simple")

# Load sentiment analysis model
sentiment_tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
sentiment_model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

def analyze_aspect_sentiment(review):
    # Extract aspects
    aspects = aspect_extractor(review)

    results = []
    for aspect in aspects:
        if aspect['entity_group'] == 'ASPECT':
            aspect_text = aspect['word']

            # Analyze sentiment
            inputs = sentiment_tokenizer(aspect_text, review, return_tensors="pt", truncation=True)
            with torch.no_grad():
                outputs = sentiment_model(**inputs)
                prediction = outputs.logits.argmax(dim=-1).item()
                sentiment = sentiment_model.config.id2label[prediction]

            results.append({'aspect': aspect_text, 'sentiment': sentiment})

    return results

# Usage
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
results = analyze_aspect_sentiment(review)

for result in results:
    print(f"{result['aspect']}: {result['sentiment']}")
```

**Expected Output:**
```
arka kamerası: positive
bataryası: negative
```
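
When the combined pipeline runs over many reviews, the per-review results can be rolled up into counts per aspect. The sketch below is one possible way to do that in plain Python. The function `summarize` is hypothetical, the first review's results mirror the expected output above, and the second review's results are made up for illustration.

```python
from collections import Counter, defaultdict


def summarize(results_per_review):
    """Count sentiment labels per aspect across a list of per-review results."""
    summary = defaultdict(Counter)
    for results in results_per_review:
        for item in results:
            summary[item["aspect"]][item["sentiment"]] += 1
    return {aspect: dict(counts) for aspect, counts in summary.items()}


results_per_review = [
    # Mirrors the expected output above
    [{"aspect": "arka kamerası", "sentiment": "positive"},
     {"aspect": "bataryası", "sentiment": "negative"}],
    # Hypothetical second review, for illustration only
    [{"aspect": "bataryası", "sentiment": "negative"}],
]

print(summarize(results_per_review))
```

Grouping on the raw aspect string is a simplification. Turkish inflection means the same product feature may surface under several forms, so a real system would likely normalize aspect terms before counting.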

## Label Mapping

```python
id2label = {
    0: "negative",
    1: "neutral",
    2: "positive"
}

label2id = {
    "negative": 0,
    "neutral": 1,
    "positive": 2
}
```
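
To make the mapping concrete, the sketch below turns a vector of three raw logits into a label and a probability using a softmax, without requiring PyTorch. The function `decode_logits` and the example logits are hypothetical. Real logits come from the model's output.

```python
import math

id2label = {0: "negative", 1: "neutral", 2: "positive"}


def decode_logits(logits):
    """Apply softmax to raw logits and return (label, probability) for the top class."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits, ordered as in id2label (negative, neutral, positive)
label, prob = decode_logits([-2.0, -1.0, 3.0])
print(f"{label} ({prob:.4f})")
```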

## Intended Use

This model is designed for:
- Analyzing sentiment of specific aspects in Turkish e-commerce product reviews
- Building complete aspect-based sentiment analysis systems
- Understanding customer opinions on specific product features
- Supporting recommendation systems and review analysis tools

## Limitations

- Trained specifically on e-commerce domain data
- Requires aspect terms to be identified beforehand (use with the aspect extraction model)
- Performance may vary on other domains or text types
- Limited to Turkish
- Trained on a private dataset, so reproducibility is limited

## Citation

If you use this model, please cite:

```
@misc{turkish-bert-absa,
  title={Turkish BERT for Aspect-Based Sentiment Analysis},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-absa}
}
```

## Base Model Citation

```
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```

## Related Models

- [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) - For extracting aspect terms from Turkish e-commerce reviews