Aspect-Based Sentiment Analysis - Technical Documentation
This module extracts product aspects from reviews, classifies sentiment per aspect, and returns aggregated highlights, pros, cons, and an advisory-tone summary. Both English and Arabic reviews are supported; output is always in English.
Architecture
src/aspect_based_sentiment/routes.py -> FastAPI endpoints
src/aspect_based_sentiment/aspect_sentiment.py -> ABSA pipeline + aggregation
src/models.py -> Lazy-loaded NLP models
src/utils.py -> Supabase client singleton
Endpoint
GET /product/{product_id}/review-summary
Query parameters
threshold_divisor (default 4.0): controls how strict aspect-level mention thresholds are.
The confidence threshold is fixed in code at 0.65 to filter low-confidence ABSA predictions.
Thresholds are computed as:
pos_threshold = total_reviews / threshold_divisor
neg_threshold = total_reviews / threshold_divisor
Lower threshold_divisor means stricter thresholds; higher values mean looser thresholds.
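To make the effect of the divisor concrete, here is a minimal sketch of the formula above. The function name mention_thresholds is hypothetical and is not taken from the module; only the division itself comes from this document.

```python
def mention_thresholds(total_reviews: int, threshold_divisor: float = 4.0) -> tuple[float, float]:
    """Return (pos_threshold, neg_threshold) using the formula from the docs."""
    if threshold_divisor <= 0:
        # Assumption: a non-positive divisor is rejected; the docs do not say.
        raise ValueError("threshold_divisor must be positive")
    threshold = total_reviews / threshold_divisor
    return threshold, threshold


print(mention_thresholds(20))        # (5.0, 5.0)   default divisor
print(mention_thresholds(20, 2.0))   # (10.0, 10.0) lower divisor, stricter
print(mention_thresholds(20, 10.0))  # (2.0, 2.0)   higher divisor, looser
```

With 20 reviews, halving the divisor from 4.0 to 2.0 doubles the number of mentions an aspect needs, which is why lower values are stricter.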
Pipeline
Reviews from Supabase
-> [Arabic only] translate to English via Helsinki-NLP/opus-mt-ar-en
-> extract noun chunks (candidate aspects) using spaCy en_core_web_md
-> evaluate chunks against the product's title, tags, and categories to discard chunks that refer to the product itself
-> classify each (review, aspect) pair with DeBERTa ABSA
-> normalize aspect names (for example, "the camera" -> "camera")
-> aggregate positive/negative counts across reviews
-> generate aspect-level UI lines using randomized templates
-> generate 3-4 advisory sentences (Noon-style summary)
-> return highlights + pros + cons
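The self-reference step in the pipeline above (discarding chunks that name the product itself) can be sketched roughly as follows. This is not the module's code: naive_lemma is a crude stand-in for spaCy lemmatization, COMPOUND_PARTS is a hypothetical vocabulary for splitting compounds, and treating the match as "every chunk lemma appears among the product lemmas" is an assumption about how the comparison works.

```python
import re

# Hypothetical vocabulary for splitting compounds such as "smartphone".
COMPOUND_PARTS = ("smart", "phone", "head", "ear", "watch", "book")


def naive_lemma(word: str) -> str:
    """Crude stand-in for spaCy lemmatization: strips a plural "s" only."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def split_compound(word: str) -> list[str]:
    """Split a word into two known parts, or return it unchanged."""
    for part in COMPOUND_PARTS:
        rest = word[len(part):]
        if word.startswith(part) and rest in COMPOUND_PARTS:
            return [part, rest]
    return [word]


def product_lemmas(title: str, tags: list[str], categories: list[str]) -> set[str]:
    """Collect lemmas (and compound parts) from title, tags, and categories."""
    lemmas: set[str] = set()
    for text in [title, *tags, *categories]:
        for token in re.findall(r"[a-z]+", text.lower()):
            lemma = naive_lemma(token)
            lemmas.add(lemma)
            lemmas.update(split_compound(lemma))
    return lemmas


def is_self_reference(chunk: str, lemmas: set[str]) -> bool:
    """True when every lemma in the chunk is also a product lemma."""
    chunk_lemmas = {naive_lemma(t) for t in re.findall(r"[a-z]+", chunk.lower())}
    return bool(chunk_lemmas) and chunk_lemmas <= lemmas


terms = product_lemmas("Acme Smartphone X", ["android"], ["smartphones"])
for chunk in ["phone", "phones", "camera", "battery life"]:
    print(chunk, is_self_reference(chunk, terms))
```

In this sketch "phone" and "phones" are dropped as self-references for a product in the "smartphones" category, while "camera" and "battery life" survive as candidate aspects.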
Stage details
_is_arabic(text) / _translate_to_english(text) (routes.py)
- Detects Arabic by Unicode character ratio (>30% Arabic chars triggers translation).
- Translates with Helsinki-NLP/opus-mt-ar-en before any downstream processing.
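The ratio check can be illustrated with a short sketch. The helper names below are hypothetical, and two details are assumptions not stated in this document: the ratio is taken over alphabetic characters only, and "Arabic" means the basic Arabic Unicode block (U+0600 to U+06FF).

```python
ARABIC_RATIO_THRESHOLD = 0.30  # from the docs: >30% Arabic chars triggers translation


def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def looks_arabic(text: str) -> bool:
    """True when the Arabic share strictly exceeds the threshold."""
    return arabic_ratio(text) > ARABIC_RATIO_THRESHOLD


print(looks_arabic("الكاميرا ممتازة"))        # Arabic-only review
print(looks_arabic("The camera is great"))    # English-only review
```

A ratio rather than a simple "contains Arabic" test lets mostly-English reviews with a stray Arabic word skip translation.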
extract_aspects(text, product_title, product_tags, product_categories)
- Expects English text; Arabic is pre-translated upstream.
- Uses spaCy en_core_web_md noun chunks as candidate aspects.
- Drops non-meaningful chunks (determiners, pronouns) and lowercases output.
- Robust Self-Reference Filter: dynamically prevents the algorithm from picking up the product itself (e.g., tracking "phone" as an aspect for a smartphone).
  - Gathers the product's title, tags, and database categories.
  - Identifies and splits English compound terms (e.g., smartphone -> smart, phone).
  - Uses spaCy lemmatization to support plural categories (e.g., matching a review saying "phone" against the category "smartphones").
  - Performs a subset lemma match against these product terms to filter out self-referential nouns before ABSA computation.
classify_aspects(review_text, aspects)
- Runs ABSA classification with Positive, Negative, or Neutral labels.
- Returns sentiment label and confidence score for each aspect.
aggregate_pros_cons(...)
- Filters predictions below the fixed confidence threshold (0.65).
- Merges similar aspect strings with normalization.
- Counts positive and negative mentions per aspect.
- Builds an aspect-level summary line using randomized text templates based purely on the aggregated mention counts.
- Produces:
  - highlights: ranked by total_mentions
  - pros: aspects where positive dominance exceeds threshold
  - cons: aspects where negative dominance exceeds threshold
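The aggregation steps above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the function and helper names are hypothetical, normalization is reduced to lowercasing and stripping one leading determiner, "dominance exceeds threshold" is read as "the dominant count is larger than the other count and strictly greater than the threshold", and the randomized summary templates are omitted, so pros and cons hold aspect names rather than sentences.

```python
from collections import defaultdict

CONFIDENCE_THRESHOLD = 0.65  # fixed value stated in the docs
DETERMINERS = ("the ", "a ", "an ", "this ", "that ", "my ")  # assumed list


def normalize_aspect(aspect: str) -> str:
    """Lowercase and strip one leading determiner: "the camera" -> "camera"."""
    aspect = aspect.strip().lower()
    for det in DETERMINERS:
        if aspect.startswith(det):
            return aspect[len(det):].strip()
    return aspect


def aggregate(predictions, total_reviews, threshold_divisor=4.0):
    """predictions: iterable of (aspect, label, confidence) tuples."""
    threshold = total_reviews / threshold_divisor
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect, label, confidence in predictions:
        if confidence < CONFIDENCE_THRESHOLD:
            continue  # drop low-confidence predictions
        key = label.lower()
        if key in ("positive", "negative"):  # Neutral is not counted
            counts[normalize_aspect(aspect)][key] += 1

    # Rank highlights by total mentions, most mentioned first.
    highlights = sorted(
        (
            {
                "aspect": aspect,
                "positive_mentions": c["positive"],
                "negative_mentions": c["negative"],
                "total_mentions": c["positive"] + c["negative"],
            }
            for aspect, c in counts.items()
        ),
        key=lambda h: h["total_mentions"],
        reverse=True,
    )
    pros = [h["aspect"] for h in highlights
            if h["positive_mentions"] > h["negative_mentions"]
            and h["positive_mentions"] > threshold]
    cons = [h["aspect"] for h in highlights
            if h["negative_mentions"] > h["positive_mentions"]
            and h["negative_mentions"] > threshold]
    return {"highlights": highlights, "pros": pros, "cons": cons}
```

For example, with total_reviews=8 the threshold is 2.0, so an aspect needs at least three confident mentions on its dominant side to appear in pros or cons.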
Models Used
| Component | Model | Role |
|---|---|---|
| Arabic translation | Helsinki-NLP/opus-mt-ar-en | Translate Arabic reviews to English |
| Aspect extraction | en_core_web_md | Noun-chunk extraction |
| ABSA classifier | yangheng/deberta-v3-base-absa-v1.1 | Sentiment per (review, aspect) pair |
All models run on CPU with lazy loading.
Response Shape
{
"product_id": "uuid-string",
"total_reviews": 20,
"highlights": [
{
"aspect": "camera",
"summary": "Camera quality is excellent but it can overheat during long recording sessions.",
"positive_mentions": 9,
"negative_mentions": 4,
"total_mentions": 13
}
],
"pros": [
"Camera quality is excellent but it can overheat during long recording sessions."
],
"cons": [
"The device overheats during long recording sessions."
]
}
highlights[].summary contains the Noon-style advisory sentences (combined from all pros/cons).
Low-Review Interpretation Guidance
For products with a small number of reviews, threshold-driven outputs can look stronger than the evidence really is.
Recommended interpretation policy:
- Keep threshold logic enabled for consistency.
- Treat outputs as low confidence when total_reviews < 10.
- Prefer displaying highlights plus a low-confidence note rather than making strong pros/cons claims.
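One way a client might apply this policy is sketched below. The low_confidence field and the function name are hypothetical and are not part of the documented response shape, and hiding pros and cons entirely is just one possible reading of "prefer displaying highlights".

```python
LOW_REVIEW_CUTOFF = 10  # from the guidance: total_reviews < 10


def apply_low_review_policy(response: dict) -> dict:
    """Return a copy of the response adjusted for display on low-review products."""
    result = dict(response)  # shallow copy; the input dict is left unchanged
    is_low = result.get("total_reviews", 0) < LOW_REVIEW_CUTOFF
    result["low_confidence"] = is_low  # hypothetical flag for the UI note
    if is_low:
        # Keep highlights, but avoid presenting strong pros/cons claims.
        result["pros"] = []
        result["cons"] = []
    return result
```

The server-side threshold logic stays enabled in this sketch; only the presentation changes, which keeps the API output consistent across products.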
Data Source
Table: reviews
Fields used by ABSA endpoint:
- id
- product_id
- rating
- title (fallback when content is empty)
- content (primary text)
The sentiment field is not currently used by this endpoint when serving responses.
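The content/title fallback can be sketched as below. The helper name review_text is hypothetical, and treating whitespace-only or missing content as empty is an assumption; the document only says that title is used when content is empty.

```python
def review_text(row: dict) -> str:
    """Pick the text to analyze: content first, title as the fallback."""
    content = (row.get("content") or "").strip()
    if content:
        return content
    return (row.get("title") or "").strip()


print(review_text({"title": "Great phone", "content": "Battery lasts two days."}))
print(review_text({"title": "Great phone", "content": ""}))
```

Rows where both fields are empty yield an empty string in this sketch, so a caller would likely want to skip them before aspect extraction.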