🏨 Hospitality Aspect & Sentiment BERT (ABSA)

Aspect-Based Sentiment Analysis for Hotel Reviews (Sentence-Level)


Introduction

I built this project to go beyond “overall sentiment” and actually understand what hotel guests are talking about (staff, food, maintenance) and how they feel about each part. In real hotel operations, reviews are usually mixed: a guest might love the breakfast but complain about a broken AC. So I designed this as a practical Aspect-Based Sentiment Analysis (ABSA) system that works at the sentence level and returns aspect + sentiment outputs that can be used for real hospitality analytics.


What this model does

This system identifies:

  1. Aspect(s) mentioned in each sentence (multi-label)
  2. Sentiment for the sentences that contain aspects (positive/neutral/negative)

Supported Aspects (current)

  • FOOD (restaurant, breakfast, dining, taste, buffet, etc.)
  • STAFF (front desk, housekeeping, service, behavior, responsiveness)
  • MAINTENANCE (AC, plumbing, broken items, repairs, room condition)

Dataset used-

Primary dataset used: Kaggle – Hotel Reviews: Aspects, Sentiments, and Topics

This dataset contains hotel reviews with structured aspect and sentiment information at the review level.


🔧 Data Preparation – Aspect Model

  • Column Mapping
    Merged multiple fine-grained labels into three target aspects:
    FOOD, STAFF, and MAINTENANCE.

  • Sentence-Level Expansion
    Split each review into individual sentences using regex-based segmentation.
    Initially, each sentence inherited the review-level labels.

  • Keyword-Based Label Refinement
    To reduce noise, curated keyword vocabularies were applied for each aspect.
    A sentence retained an aspect label only if relevant keywords were present.

  • Filtering
    Removed sentences with no aspect relevance to ensure cleaner supervision.

Sentences with no active aspect labels were dropped

My Approach: Two-Stage ABSA Pipeline

Instead of training one single model to do everything at once, I used a two-stage pipeline:

Stage 1: Aspect Detection (Multi-Label Classification)

  • Input: sentence from a review
  • Output: one or more aspects (FOOD / STAFF / MAINTENANCE)
  • Why multi-label? Because one sentence can mention multiple aspects.

I trained a BERT-based model with a multi-label head (sigmoid outputs) and used per-aspect thresholds to decide whether an aspect is present.

Stage 2: Sentiment Classification (Conditional on Aspect Presence)

  • Input: only the sentences where Stage 1 detected at least one aspect
  • Output: sentiment label (POS / NEU / NEG)

This stage predicts sentiment only on relevant sentences, which helps reduce noise and makes the output easier to trust.

✅ Final output is Aspect + Sentiment at the sentence level.


Dataset Used (Kaggle)

Primary dataset used for training and experimentation:

How I used it

  • The dataset contains review text and labeling signals related to aspects and sentiments.
  • I aligned and curated the dataset for the aspects I wanted:
    • FOOD
    • STAFF
    • MAINTENANCE
  • Reviews were split into sentences for sentence-level training and inference.

Training Details

Aspect Model

  • Type: BERT backbone + custom multi-label classification head
  • Loss: Binary Cross Entropy (BCE) for multi-label
  • Output activation: Sigmoid
  • Inference: aspect probabilities + per-label thresholds

Sentiment Model

  • Type: Transformer sequence classifier (3-class)
  • Classes: POS / NEU / NEG
  • Inference: Softmax probabilities, highest class selected

I trained Aspect first, then Sentiment

I initially considered training a single model for both tasks, but I chose a staged approach because:

  • It’s easier to debug (I can confirm if aspect detection is working before sentiment)
  • It’s more controllable (threshold tuning per aspect)
  • It avoids mixing noisy labels and making the model “confused”
  • It reflects how real ABSA systems are often built in practical projects

Overfitting & What I Learned

In a previous attempt, I generated a very large amount of synthetic training data. The model produced F1 = 1.0, which looked great but was a clear sign of overfitting / data leakage / unrealistic training setup.

So for this project:

  • I reduced dependency on synthetic samples.
  • Focused on realistic sentence-level examples
  • Used thresholds and better evaluation checks to avoid “fake perfect” results

This made the model more useful in real-world testing.


Model Eva;uation-

Model Evaluation

This project was evaluated separately for Aspect Detection and Sentiment Classification, reflecting the two-stage design of the pipeline. Metrics were calculated on a held-out test set, using deterministic splits to avoid data leakage.


1️⃣ Aspect Detection Model Evaluation

Task: Multi-label classification at the sentence level
Aspects: FOOD, STAFF, MAINTENANCE
Model: BERT-based multi-label classifier
Decision Rule: Sigmoid outputs + per-aspect thresholds (tuned on validation set)

Evaluation Strategy

  • Each sentence may contain multiple aspects
  • Predictions are compared against ground-truth multi-hot labels
  • Precision, Recall, and F1-score are reported per aspect
  • Threshold tuning was applied to avoid false positives

Test Set Metrics

Aspect Precision Recall F1-score
FOOD ~0.96 ~0.97 ~0.95
STAFF ~0.95 ~0.99 ~0.97
MAINTENANCE ~0.98 ~0.97 ~0.96

Interpretation

  • High precision indicates the model rarely assigns an aspect incorrectly
  • High recall confirms strong coverage for relevant sentences
  • Slightly lower scores for MAINTENANCE reflect fewer training examples and more varied language patterns
  • These scores were achieved without using excessive synthetic data, reducing the risk of artificial overfitting

2️⃣ Sentiment Classification Model Evaluation

Task: Sentence-level sentiment classification
Labels: Positive, Negative
Model: Transformer-based sequence classifier (DistilBERT)
Note: Neutral samples were excluded to maintain label clarity

Evaluation Strategy

  • Only sentences with detected aspects were evaluated
  • Weakly supervised sentiment labels were filtered for confidence
  • Standard accuracy and macro F1 were used

Test Set Metrics

Metric Score
Accuracy ~0.96
Macro F1 ~0.97

Interpretation

  • The model reliably separates positive and negative sentiment for aspect-related sentences
  • High scores reflect focused, domain-specific training data
  • Neutral sentiment was intentionally excluded to avoid ambiguity in early-stage modeling

3️⃣ End-to-End Pipeline Behavior

Although aspect detection and sentiment classification are evaluated independently, they are used together during inference:

  1. Reviews are split into sentences
  2. Aspect model identifies relevant aspects
  3. Sentiment model predicts sentiment only for sentences with detected aspects

This staged evaluation ensures:

  • Clear error isolation (aspect vs sentiment)
  • Better interpretability
  • More reliable real-world behavior

Summary

  • Aspect detection achieves high precision and recall across all core hotel departments
  • Sentiment classification performs reliably for clear positive/negative cases
  • Evaluation reflects practical, deployment-ready performance, not leaderboard tuning

Output Format (Example)

Input:

“The staff were friendly, but the bathroom faucet was leaking. Dinner was excellent.”

Output (sentence-level):

  • STAFF → Positive
  • MAINTENANCE → Negative
  • FOOD → Positive

Intended Use Cases

✅ Hospitality review analytics
✅ Aspect-level insights for hotel operations
✅ Service recovery signals (maintenance complaints, staff issues)
✅ Portfolio demonstration of applied NLP and deployment


Limitations (Important)

  • This model is trained on overall aspect presence + sentiment signals.
  • It performs best when aspect mentions are clear and direct.
  • It may struggle with:
    • Mixed sentiment in one sentence (“good but also bad”)
    • Very implicit aspect mentions
    • New aspects outside FOOD/STAFF/MAINTENANCE

Deployment Notes

  • Deployed using Docker + FastAPI on Hugging Face Spaces
  • Avoided Gradio due to dependency conflicts
  • Supports private repo loading using HF_TOKEN

Future Improvements

  • I am working on adding more departments like Housekeeping, room service, Amenities, Location
  • Build a joint ABSA model (aspect + sentiment together)
  • Improve sentence splitting and context handling
  • Add aspect intensity scoring (not just POS/NEG/NEU)

Disclaimer

This is a learning-focused applied ML project intended for hospitality analytics. Predictions should be used as guidance and validated with real business context.


Author

Amey Tillu Hospitality & Tourism Data Analyst & AI/ML Hobbyist

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Amey9766/Hospitality-Aspect-Sentiment-BERT

Finetuned
(6452)
this model

Space using Amey9766/Hospitality-Aspect-Sentiment-BERT 1