🏨 Hospitality Aspect & Sentiment BERT (ABSA)

Aspect-Based Sentiment Analysis for Hotel Reviews (Sentence-Level)

Introduction

I built this project to go beyond “overall sentiment” and actually understand what hotel guests are talking about (staff, food, maintenance) and how they feel about each part. In real hotel operations, reviews are usually mixed: a guest might love the breakfast but complain about a broken AC. So I designed this as a practical Aspect-Based Sentiment Analysis (ABSA) system that works at the sentence level and returns aspect + sentiment outputs that can be used for real hospitality analytics.

What this model does

This system identifies:

Aspect(s) mentioned in each sentence (multi-label)
Sentiment for the sentences that contain aspects (positive/neutral/negative)

Supported Aspects (current)

FOOD (restaurant, breakfast, dining, taste, buffet, etc.)
STAFF (front desk, housekeeping, service, behavior, responsiveness)
MAINTENANCE (AC, plumbing, broken items, repairs, room condition)

Dataset used-

Primary dataset used: Kaggle – Hotel Reviews: Aspects, Sentiments, and Topics

This dataset contains hotel reviews with structured aspect and sentiment information at the review level.

🔧 Data Preparation – Aspect Model

Column Mapping
Merged multiple fine-grained labels into three target aspects:
FOOD, STAFF, and MAINTENANCE.
Sentence-Level Expansion
Split each review into individual sentences using regex-based segmentation.
Initially, each sentence inherited the review-level labels.
Keyword-Based Label Refinement
To reduce noise, curated keyword vocabularies were applied for each aspect.
A sentence retained an aspect label only if relevant keywords were present.
Filtering
Removed sentences with no aspect relevance to ensure cleaner supervision.

Sentences with no active aspect labels were dropped

My Approach: Two-Stage ABSA Pipeline

Instead of training one single model to do everything at once, I used a two-stage pipeline:

Stage 1: Aspect Detection (Multi-Label Classification)

Input: sentence from a review
Output: one or more aspects (FOOD / STAFF / MAINTENANCE)
Why multi-label? Because one sentence can mention multiple aspects.

I trained a BERT-based model with a multi-label head (sigmoid outputs) and used per-aspect thresholds to decide whether an aspect is present.

Stage 2: Sentiment Classification (Conditional on Aspect Presence)

Input: only the sentences where Stage 1 detected at least one aspect
Output: sentiment label (POS / NEU / NEG)

This stage predicts sentiment only on relevant sentences, which helps reduce noise and makes the output easier to trust.

✅ Final output is Aspect + Sentiment at the sentence level.

Dataset Used (Kaggle)

Primary dataset used for training and experimentation:

Hotel Reviews: Aspects, Sentiments and Topics (Kaggle)
https://www.kaggle.com/datasets/costastziouvas/hotel-reviews-aspects-sentiments-and-topics

How I used it

The dataset contains review text and labeling signals related to aspects and sentiments.
I aligned and curated the dataset for the aspects I wanted:
- FOOD
- STAFF
- MAINTENANCE
Reviews were split into sentences for sentence-level training and inference.

Training Details

Aspect Model

Type: BERT backbone + custom multi-label classification head
Loss: Binary Cross Entropy (BCE) for multi-label
Output activation: Sigmoid
Inference: aspect probabilities + per-label thresholds

Sentiment Model

Type: Transformer sequence classifier (3-class)
Classes: POS / NEU / NEG
Inference: Softmax probabilities, highest class selected

I trained Aspect first, then Sentiment

I initially considered training a single model for both tasks, but I chose a staged approach because:

It’s easier to debug (I can confirm if aspect detection is working before sentiment)
It’s more controllable (threshold tuning per aspect)
It avoids mixing noisy labels and making the model “confused”
It reflects how real ABSA systems are often built in practical projects

Overfitting & What I Learned

In a previous attempt, I generated a very large amount of synthetic training data. The model produced F1 = 1.0, which looked great but was a clear sign of overfitting / data leakage / unrealistic training setup.

So for this project:

I reduced dependency on synthetic samples.
Focused on realistic sentence-level examples
Used thresholds and better evaluation checks to avoid “fake perfect” results

This made the model more useful in real-world testing.

Model Eva;uation-

Model Evaluation

This project was evaluated separately for Aspect Detection and Sentiment Classification, reflecting the two-stage design of the pipeline. Metrics were calculated on a held-out test set, using deterministic splits to avoid data leakage.

1️⃣ Aspect Detection Model Evaluation

Task: Multi-label classification at the sentence level
Aspects: FOOD, STAFF, MAINTENANCE
Model: BERT-based multi-label classifier
Decision Rule: Sigmoid outputs + per-aspect thresholds (tuned on validation set)

Evaluation Strategy

Each sentence may contain multiple aspects
Predictions are compared against ground-truth multi-hot labels
Precision, Recall, and F1-score are reported per aspect
Threshold tuning was applied to avoid false positives

Test Set Metrics

Aspect	Precision	Recall	F1-score
FOOD	~0.96	~0.97	~0.95
STAFF	~0.95	~0.99	~0.97
MAINTENANCE	~0.98	~0.97	~0.96

Interpretation

High precision indicates the model rarely assigns an aspect incorrectly
High recall confirms strong coverage for relevant sentences
Slightly lower scores for MAINTENANCE reflect fewer training examples and more varied language patterns
These scores were achieved without using excessive synthetic data, reducing the risk of artificial overfitting

2️⃣ Sentiment Classification Model Evaluation

Task: Sentence-level sentiment classification
Labels: Positive, Negative
Model: Transformer-based sequence classifier (DistilBERT)
Note: Neutral samples were excluded to maintain label clarity

Evaluation Strategy

Only sentences with detected aspects were evaluated
Weakly supervised sentiment labels were filtered for confidence
Standard accuracy and macro F1 were used

Test Set Metrics

Metric	Score
Accuracy	~0.96
Macro F1	~0.97

Interpretation

The model reliably separates positive and negative sentiment for aspect-related sentences
High scores reflect focused, domain-specific training data
Neutral sentiment was intentionally excluded to avoid ambiguity in early-stage modeling

3️⃣ End-to-End Pipeline Behavior

Although aspect detection and sentiment classification are evaluated independently, they are used together during inference:

Reviews are split into sentences
Aspect model identifies relevant aspects
Sentiment model predicts sentiment only for sentences with detected aspects

This staged evaluation ensures:

Clear error isolation (aspect vs sentiment)
Better interpretability
More reliable real-world behavior

Summary

Aspect detection achieves high precision and recall across all core hotel departments
Sentiment classification performs reliably for clear positive/negative cases
Evaluation reflects practical, deployment-ready performance, not leaderboard tuning

Output Format (Example)

Input:

“The staff were friendly, but the bathroom faucet was leaking. Dinner was excellent.”

Output (sentence-level):

STAFF → Positive
MAINTENANCE → Negative
FOOD → Positive

Intended Use Cases

✅ Hospitality review analytics
✅ Aspect-level insights for hotel operations
✅ Service recovery signals (maintenance complaints, staff issues)
✅ Portfolio demonstration of applied NLP and deployment

Limitations (Important)

This model is trained on overall aspect presence + sentiment signals.
It performs best when aspect mentions are clear and direct.
It may struggle with:
- Mixed sentiment in one sentence (“good but also bad”)
- Very implicit aspect mentions
- New aspects outside FOOD/STAFF/MAINTENANCE

Deployment Notes

Deployed using Docker + FastAPI on Hugging Face Spaces
Avoided Gradio due to dependency conflicts
Supports private repo loading using HF_TOKEN

Future Improvements

I am working on adding more departments like Housekeeping, room service, Amenities, Location
Build a joint ABSA model (aspect + sentiment together)
Improve sentence splitting and context handling
Add aspect intensity scoring (not just POS/NEG/NEU)

Disclaimer

This is a learning-focused applied ML project intended for hospitality analytics. Predictions should be used as guidance and validated with real business context.

Author

Amey Tillu Hospitality & Tourism Data Analyst & AI/ML Hobbyist

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Amey9766/Hospitality-Aspect-Sentiment-BERT

Base model

google-bert/bert-base-uncased

Finetuned

(6452)

this model

Amey9766
/

Hospitality-Aspect-Sentiment-BERT

🏨 Hospitality Aspect & Sentiment BERT (ABSA)

Introduction

What this model does

Supported Aspects (current)

Dataset used-

🔧 Data Preparation – Aspect Model

My Approach: Two-Stage ABSA Pipeline

Stage 1: Aspect Detection (Multi-Label Classification)

Stage 2: Sentiment Classification (Conditional on Aspect Presence)

Dataset Used (Kaggle)

How I used it

Training Details

Aspect Model

Sentiment Model

I trained Aspect first, then Sentiment

Overfitting & What I Learned

Model Eva;uation-

Model Evaluation

1️⃣ Aspect Detection Model Evaluation

Evaluation Strategy

Test Set Metrics

Interpretation

2️⃣ Sentiment Classification Model Evaluation

Evaluation Strategy

Test Set Metrics

Interpretation

3️⃣ End-to-End Pipeline Behavior

Summary

Output Format (Example)

Intended Use Cases

Limitations (Important)

Deployment Notes

Future Improvements

Disclaimer

Author

Model tree for Amey9766/Hospitality-Aspect-Sentiment-BERT

Space using Amey9766/Hospitality-Aspect-Sentiment-BERT 1