---
license: apache-2.0
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: text-classification
---

# Turkish BERT for Aspect-Based Sentiment Analysis

This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased), trained for aspect-based sentiment analysis on Turkish e-commerce product reviews.

## Model Description

- **Base Model**: dbmdz/bert-base-turkish-cased
- **Task**: Sequence Classification (Aspect-Based Sentiment Analysis)
- **Language**: Turkish
- **Domain**: E-commerce product reviews

## Model Performance

- **F1 Score**: 88% on test set
- **Test Set Size**: 4,000 samples
- **Training Set Size**: 36,000 samples

## Training Details

### Training Data

- **Dataset Size**: 36,000 reviews
- **Data Source**: Private e-commerce product review dataset
- **Domain**: E-commerce product reviews in Turkish
- **Coverage**: Over 500 product categories

### Training Configuration

- **Epochs**: 5
- **Task Type**: Sequence Classification
- **Input Format**: `[aspect_term] [SEP] [review_text]`
- **Label Classes**:
  - `positive`: Positive sentiment towards the aspect
  - `negative`: Negative sentiment towards the aspect
  - `neutral`: Neutral sentiment towards the aspect

### Training Loss

Training loss decreased at every epoch:

| Epoch | Loss |
|-------|------|
| 1 | 0.47 |
| 2 | 0.34 |
| 3 | 0.25 |
| 4 | 0.22 |
| 5 | 0.11 |

## Usage

### Option 1: Using Pipeline (Recommended)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

# Create pipeline
sentiment_analyzer = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example usage
aspect = "arka kamerası"
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
text = f"{aspect} [SEP] {review}"

result = sentiment_analyzer(text)
print(result)
```

**Expected Output:**

```python
[{'label': 'positive', 'score': 0.9998155236244202}]
```

### Option 2: Manual Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

# Example aspect and review
aspect = "arka kamerası"
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."

# Tokenize aspect and review together
inputs = tokenizer(aspect, review, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = predictions.argmax(dim=-1).item()
    confidence = predictions.max().item()

# Convert prediction to label
predicted_label = model.config.id2label[predicted_class_id]

print(f"Aspect: {aspect}")
print(f"Sentiment: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```

**Expected Output:**

```
Aspect: arka kamerası
Sentiment: positive
Confidence: 0.9998
```

### Option 3: Batch Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

# Example aspect-review pairs
examples = [
    ("arka kamerası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
    ("bataryası", "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."),
    ("fiyatı", "Ürünün fiyatı çok uygun ve kalitesi de iyi."),
]

aspects = [ex[0] for ex in examples]
reviews = [ex[1] for ex in examples]

# Tokenize all pairs
inputs = tokenizer(aspects, reviews, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all pairs
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)
    confidences = predictions.max(dim=-1).values

# Display results
for i, (aspect, review) in enumerate(examples):
    predicted_label = model.config.id2label[predicted_class_ids[i].item()]
    confidence = confidences[i].item()
    print(f"Aspect: {aspect}")
    print(f"Sentiment: {predicted_label} (confidence: {confidence:.4f})")
    print("-" * 40)
```

**Expected Output:**

```
Aspect: arka kamerası
Sentiment: positive (confidence: 0.9998)
----------------------------------------
Aspect: bataryası
Sentiment: negative (confidence: 0.9990)
----------------------------------------
Aspect: fiyatı
Sentiment: positive (confidence: 0.9998)
----------------------------------------
```

## Combined Usage with Aspect Extraction (Recommended)

This model is meant to be paired with the aspect extraction model [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) for complete aspect-based sentiment analysis:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# Load aspect extraction model
aspect_extractor = pipeline("token-classification",
                            model="opdullah/bert-turkish-ecomm-aspect-extraction",
                            aggregation_strategy="simple")

# Load sentiment analysis model
sentiment_tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-absa")
sentiment_model = AutoModelForSequenceClassification.from_pretrained("opdullah/bert-turkish-ecomm-absa")

def analyze_aspect_sentiment(review):
    # Extract aspects
    aspects = aspect_extractor(review)

    results = []
    for aspect in aspects:
        if aspect['entity_group'] == 'ASPECT':
            aspect_text = aspect['word']

            # Analyze sentiment
            inputs = sentiment_tokenizer(aspect_text, review, return_tensors="pt", truncation=True)

            with torch.no_grad():
                outputs = sentiment_model(**inputs)
            # Single input, so the argmax over the class dimension is one class id
            prediction = outputs.logits.argmax(dim=-1).item()
            sentiment = sentiment_model.config.id2label[prediction]

            results.append({'aspect': aspect_text, 'sentiment': sentiment})

    return results

# Usage
review = "Bu telefonun arka kamerası çok iyi ama bataryası yetersiz."
results = analyze_aspect_sentiment(review)

for result in results:
    print(f"{result['aspect']}: {result['sentiment']}")
```

**Expected Output:**

```
arka kamerası: positive
bataryası: negative
```

## Label Mapping

```python
id2label = {
    0: "negative",
    1: "neutral",
    2: "positive"
}

label2id = {
    "negative": 0,
    "neutral": 1,
    "positive": 2
}
```

## Intended Use

This model is designed for:

- Analyzing sentiment of specific aspects in Turkish e-commerce product reviews
- Building complete aspect-based sentiment analysis systems
- Understanding customer opinions on specific product features
- Supporting recommendation systems and review analysis tools

## Limitations

- Trained specifically on e-commerce domain data
- Requires aspect terms to be identified beforehand (use with aspect extraction model)
- Performance may vary on other domains or text types
- Limited to Turkish language
- Trained on a private dataset, so reproducibility is limited

## Citation

If you use this model, please cite:

```
@misc{turkish-bert-absa,
  title={Turkish BERT for Aspect-Based Sentiment Analysis},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-absa}
}
```

## Base Model Citation

```
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```

## Related Models

- [opdullah/bert-turkish-ecomm-aspect-extraction](https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction) - For extracting aspect terms from Turkish e-commerce reviews
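
## Appendix: Decoding Logits by Hand

The usage examples rely on two conventions from this card: the `[aspect_term] [SEP] [review_text]` input format and the three-way label mapping. The sketch below illustrates both in plain Python, without loading the model. The helper names `build_input` and `decode_logits` are hypothetical and not part of the model or of `transformers`, and the logits are made-up numbers chosen for illustration, not real model output.

```python
import math

# Label mapping copied from the "Label Mapping" section of this card.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def build_input(aspect: str, review: str) -> str:
    """Join aspect and review in the documented `[aspect] [SEP] [review]` form."""
    return f"{aspect} [SEP] {review}"


def decode_logits(logits):
    """Apply softmax to raw class scores and return (label, probability)."""
    # Subtract the largest score before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability is the predicted class id.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for illustration only; not produced by the model.
label, confidence = decode_logits([-2.0, 0.5, 4.0])
print(build_input("bataryası", "Bataryası yetersiz."))
print(label, round(confidence, 4))
```

In practice `model.config.id2label` should be preferred over a hard-coded dictionary, since it always reflects the mapping stored with the checkpoint.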