
preview code

raw

history blame contribute delete

5.26 kB

Aspect-Based Sentiment Analysis - Technical Documentation

This module extracts product aspects from reviews, classifies sentiment per aspect, and returns aggregated highlights, pros, cons, and an advisory-tone summary. Reviews in English and Arabic are supported; all output is in English.

Architecture

src/aspect_based_sentiment/routes.py           -> FastAPI endpoints
src/aspect_based_sentiment/aspect_sentiment.py -> ABSA pipeline + aggregation
src/models.py                                  -> Lazy-loaded NLP models
src/utils.py                                   -> Supabase client singleton

Endpoint

GET /product/{product_id}/review-summary

Query parameters

threshold_divisor (default 4.0): controls how strict aspect-level mention thresholds are.

The confidence threshold is fixed in code at 0.65 to filter low-confidence ABSA predictions.

Thresholds are computed as:

pos_threshold = total_reviews / threshold_divisor
neg_threshold = total_reviews / threshold_divisor

Lower threshold_divisor means stricter thresholds; higher values mean looser thresholds.
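The threshold formula is easy to check by hand. Here is a minimal Python sketch. The helper name compute_thresholds and the guard against a non-positive divisor are illustrative assumptions and are not taken from routes.py:

```python
def compute_thresholds(total_reviews: int, threshold_divisor: float = 4.0) -> tuple[float, float]:
    """Return (pos_threshold, neg_threshold) using the documented formula."""
    # Hypothetical guard: the documentation does not say how a zero divisor is handled.
    if threshold_divisor <= 0:
        raise ValueError("threshold_divisor must be positive")
    pos_threshold = total_reviews / threshold_divisor
    neg_threshold = total_reviews / threshold_divisor
    return pos_threshold, neg_threshold

print(compute_thresholds(20))       # (5.0, 5.0)
print(compute_thresholds(20, 2.0))  # (10.0, 10.0): lower divisor, stricter
print(compute_thresholds(20, 8.0))  # (2.5, 2.5): higher divisor, looser
```

With 20 reviews and the default divisor, an aspect needs on the order of 5 mentions to clear either threshold.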

Pipeline

Reviews from Supabase
  -> [Arabic only] translate to English via Helsinki-NLP/opus-mt-ar-en
  -> extract noun chunks (candidate aspects) using spaCy en_core_web_md
  -> compare chunks against the product's title, tags, and categories to discard self-referential terms (chunks that name the product itself)
  -> classify each (review, aspect) pair with DeBERTa ABSA
  -> normalize aspect names (for example, "the camera" -> "camera")
  -> aggregate positive/negative counts across reviews
  -> generate aspect-level UI lines using randomized templates
  -> generate 3-4 advisory sentences (Noon-style summary)
  -> return highlights + pros + cons

Stage details

_is_arabic(text) / _translate_to_english(text) (routes.py)

Detects Arabic by the share of Arabic Unicode characters; text with more than 30% Arabic characters is translated.
Translates with Helsinki-NLP/opus-mt-ar-en before any downstream processing.
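Here is a rough sketch of the ratio check. Several details are assumptions: the Unicode ranges, ignoring whitespace when computing the ratio, and the name is_arabic_sketch. The real _is_arabic may count characters differently. Translation is left out because it requires downloading the Helsinki-NLP model.

```python
import re

# Arabic, Arabic Supplement, and Arabic Presentation Forms blocks (assumed set of ranges).
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]")

def is_arabic_sketch(text: str, ratio: float = 0.30) -> bool:
    """Hypothetical stand-in for _is_arabic: True if more than `ratio` of non-space chars are Arabic."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    arabic = sum(1 for c in chars if ARABIC_CHAR.match(c))
    return arabic / len(chars) > ratio

print(is_arabic_sketch("الكاميرا ممتازة"))  # True
print(is_arabic_sketch("Great camera"))     # False
```

Mixed-script reviews such as "Camera كويسة" (5 of 11 non-space characters are Arabic) also cross the 30% line under this counting rule.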

extract_aspects(text, product_title, product_tags, product_categories)

Expects English text; Arabic is pre-translated upstream.
Uses spaCy en_core_web_md noun chunks as candidate aspects.
Drops non-meaningful chunks (determiners, pronouns) and lowercases output.
Self-reference filter: prevents the product itself from being extracted as an aspect (e.g., "phone" for a smartphone).
- Gathers the product's title, tags, and database categories.
- Splits English compound terms into their parts (e.g., smartphone -> smart, phone).
- Uses spaCy lemmatization so that plural categories also match (e.g., a review mentioning "phone" matches the category "smartphones").
- Performs a subset lemma match against these product terms to filter out generic nouns before ABSA classification.
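The filter steps can be approximated in pure Python. This sketch makes several assumptions. naive_lemma is a crude stand-in for spaCy's token.lemma_. split_compound relies on a hypothetical word list, because the documentation does not say how compounds are detected. The subset rule is read as "drop a chunk if all of its lemmas are product terms". Determiners are assumed to have been removed already, as the extraction step does.

```python
import re

def naive_lemma(word: str) -> str:
    """Crude stand-in for spaCy's token.lemma_: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def split_compound(word: str, vocabulary: set[str]) -> list[str]:
    """Split a closed compound into two known words, e.g. smartphone -> smart, phone."""
    for i in range(2, len(word) - 1):
        head, tail = word[:i], word[i:]
        if head in vocabulary and tail in vocabulary:
            return [head, tail]
    return [word]

def product_terms(title: str, tags: list[str], categories: list[str],
                  vocabulary: set[str]) -> set[str]:
    """Collect lemmas (plus compound parts) from the product's title, tags, and categories."""
    terms: set[str] = set()
    for text in [title, *tags, *categories]:
        for token in re.findall(r"[a-z]+", text.lower()):
            lemma = naive_lemma(token)
            terms.add(lemma)
            terms.update(split_compound(lemma, vocabulary))
    return terms

def filter_self_references(chunks: list[str], terms: set[str]) -> list[str]:
    """Drop chunks whose lemmas all belong to the product terms (assumed subset rule)."""
    kept = []
    for chunk in chunks:
        lemmas = {naive_lemma(w) for w in re.findall(r"[a-z]+", chunk.lower())}
        if lemmas and lemmas <= terms:
            continue  # the chunk only names the product itself
        kept.append(chunk)
    return kept

vocab = {"smart", "phone"}  # hypothetical word list for compound splitting
terms = product_terms("Galaxy Smartphone", ["android"], ["Smartphones"], vocab)
print(filter_self_references(["phone", "phones", "battery life", "camera"], terms))
# ['battery life', 'camera']
```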

classify_aspects(review_text, aspects)

Classifies each (review, aspect) pair as Positive, Negative, or Neutral.
Returns the sentiment label and confidence score for each aspect.
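Here is a sketch of the per-aspect loop with the model abstracted behind a callable, so it runs without downloading weights. The predict contract and the toy keyword predictor are hypothetical. The model card for yangheng/deberta-v3-base-absa-v1.1 shows the model being called through a transformers text-classification pipeline with the aspect passed as text_pair, but the exact call inside classify_aspects may differ.

```python
from typing import Callable, NamedTuple

class AspectPrediction(NamedTuple):
    aspect: str
    label: str    # "Positive", "Negative", or "Neutral"
    score: float  # classifier confidence

# Hypothetical contract: predict(review_text, aspect) -> (label, score).
Predictor = Callable[[str, str], tuple[str, float]]

def classify_aspects_sketch(review_text: str, aspects: list[str],
                            predict: Predictor) -> list[AspectPrediction]:
    """Score every (review, aspect) pair; each aspect is classified independently."""
    return [AspectPrediction(aspect, *predict(review_text, aspect)) for aspect in aspects]

def toy_predict(text: str, aspect: str) -> tuple[str, float]:
    """Toy keyword predictor for demonstration only; not an ABSA model."""
    if f"great {aspect}" in text.lower():
        return ("Positive", 0.9)
    if f"bad {aspect}" in text.lower():
        return ("Negative", 0.8)
    return ("Neutral", 0.5)

print(classify_aspects_sketch("Great camera, bad battery.", ["camera", "battery"], toy_predict))
```

Because the review is paired with one aspect at a time, a single review can yield opposite labels for different aspects.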

aggregate_pros_cons(...)

Filters predictions below the fixed confidence threshold (0.65).
Merges similar aspect strings with normalization.
Counts positive and negative mentions per aspect.
Builds an aspect-level summary line using randomized text templates based purely on the aggregated mention counts.
Produces:
- highlights: ranked by total_mentions
- pros: aspects whose positive dominance exceeds pos_threshold
- cons: aspects whose negative dominance exceeds neg_threshold
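The aggregation steps can be sketched as follows. Only the 0.65 confidence cutoff and the threshold formula come from this document. The determiner list, the template wording, and the pros/cons rule are assumptions. The rule used here is that an aspect is a pro when positive mentions outnumber negative ones and exceed the threshold, and a con in the mirror case. The real aspect_sentiment.py may define dominance differently.

```python
import random
from collections import defaultdict

CONFIDENCE_THRESHOLD = 0.65  # fixed in code, per this documentation
DETERMINERS = {"the", "a", "an", "this", "that", "my", "its"}  # assumed normalization list

# Hypothetical templates; the real wording lives in aspect_sentiment.py.
POSITIVE_TEMPLATES = ["Customers like the {aspect}.", "The {aspect} gets consistent praise."]
NEGATIVE_TEMPLATES = ["Some buyers report issues with the {aspect}.", "The {aspect} draws complaints."]

def normalize_aspect(aspect: str) -> str:
    """'The Camera' -> 'camera': lowercase and drop leading determiners."""
    words = aspect.lower().split()
    while words and words[0] in DETERMINERS:
        words.pop(0)
    return " ".join(words)

def aggregate_sketch(predictions, total_reviews, threshold_divisor=4.0, seed=None):
    """predictions: iterable of (aspect, label, score) tuples across all reviews."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect, label, score in predictions:
        if score < CONFIDENCE_THRESHOLD:
            continue  # drop low-confidence predictions
        key = normalize_aspect(aspect)
        if not key:
            continue
        if label == "Positive":
            counts[key]["positive"] += 1
        elif label == "Negative":
            counts[key]["negative"] += 1  # Neutral labels are not counted
    threshold = total_reviews / threshold_divisor  # pos_threshold == neg_threshold
    highlights, pros, cons = [], [], []
    for aspect, c in counts.items():
        pos, neg = c["positive"], c["negative"]
        templates = POSITIVE_TEMPLATES if pos >= neg else NEGATIVE_TEMPLATES
        summary = rng.choice(templates).format(aspect=aspect)
        highlights.append({"aspect": aspect, "summary": summary, "positive_mentions": pos,
                           "negative_mentions": neg, "total_mentions": pos + neg})
        if pos > neg and pos > threshold:    # assumed reading of "positive dominance"
            pros.append(summary)
        elif neg > pos and neg > threshold:  # assumed reading of "negative dominance"
            cons.append(summary)
    highlights.sort(key=lambda h: h["total_mentions"], reverse=True)
    return {"total_reviews": total_reviews, "highlights": highlights, "pros": pros, "cons": cons}
```

Passing a seed makes the randomized template choice reproducible, which is convenient in tests.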

Models Used

Component            Model                                  Role
Arabic translation   `Helsinki-NLP/opus-mt-ar-en`           Translate Arabic reviews to English
Aspect extraction    `en_core_web_md` (spaCy)               Noun-chunk extraction
ABSA classifier      `yangheng/deberta-v3-base-absa-v1.1`   Sentiment per `(review, aspect)` pair

All models run on CPU with lazy loading.
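Lazy loading usually means that each model is loaded on the first request and cached afterwards. Here is a minimal sketch using functools.lru_cache. The LOADERS registry and the placeholder objects are hypothetical. Real loaders would call something like spacy.load("en_core_web_md") or build a transformers pipeline with device=-1 (CPU), and src/models.py may be organized differently.

```python
from functools import lru_cache

# Hypothetical registry; placeholders stand in for real spaCy / transformers loaders.
LOADERS = {
    "en_core_web_md": lambda: {"name": "en_core_web_md"},
    "opus-mt-ar-en": lambda: {"name": "opus-mt-ar-en"},
}
load_count = {"n": 0}  # tracks how many times a loader actually ran

@lru_cache(maxsize=None)
def get_model(name: str):
    """Load a model the first time it is requested; later calls return the cached object."""
    load_count["n"] += 1
    return LOADERS[name]()

nlp_first = get_model("en_core_web_md")
nlp_again = get_model("en_core_web_md")
print(nlp_first is nlp_again)  # True
```

One caveat: lru_cache does not hold a lock while the wrapped function runs, so two concurrent first requests could both trigger a load. A lock around the loader avoids that.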

Response Shape

{
  "product_id": "uuid-string",
  "total_reviews": 20,
  "highlights": [
    {
      "aspect": "camera",
      "summary": "Camera quality is excellent but it can overheat during long recording sessions.",
      "positive_mentions": 9,
      "negative_mentions": 4,
      "total_mentions": 13
    }
  ],
  "pros": [
    "Camera quality is excellent but it can overheat during long recording sessions."
  ],
  "cons": [
    "The device overheats during long recording sessions."
  ]
}
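The response shape can be written down as Python TypedDicts for clients or tests. The class names and the shallow has_expected_shape check below are hypothetical, and the field types are inferred from the example above.

```python
from typing import TypedDict

class Highlight(TypedDict):
    aspect: str
    summary: str
    positive_mentions: int
    negative_mentions: int
    total_mentions: int

class ReviewSummaryResponse(TypedDict):
    product_id: str
    total_reviews: int
    highlights: list[Highlight]
    pros: list[str]
    cons: list[str]

def has_expected_shape(resp: dict) -> bool:
    """Shallow runtime key check; TypedDict itself is only enforced by static type checkers."""
    if set(resp) != set(ReviewSummaryResponse.__annotations__):
        return False
    return all(set(h) == set(Highlight.__annotations__) for h in resp["highlights"])
```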

highlights[].summary contains the Noon-style advisory sentences (combined from all pros/cons).

Low-Review Interpretation Guidance

For products with few reviews, threshold-driven outputs can look stronger than the evidence supports.

Recommended interpretation policy:

Keep threshold logic enabled for consistency.
Treat outputs as low confidence when total_reviews < 10.
Prefer displaying highlights plus a low-confidence note rather than making strong pros/cons claims.
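Here is one way a client might apply this policy. The sketch assumes a hypothetical presentation helper that leaves the endpoint's threshold logic untouched and only changes what is shown. The low_confidence and note fields are illustrative and are not part of the documented response.

```python
LOW_CONFIDENCE_REVIEW_COUNT = 10  # from the policy above

def present_summary(response: dict) -> dict:
    """Hypothetical client-side view: hide pros/cons for low-review products."""
    if response["total_reviews"] >= LOW_CONFIDENCE_REVIEW_COUNT:
        return {**response, "low_confidence": False}
    return {
        "product_id": response["product_id"],
        "total_reviews": response["total_reviews"],
        "highlights": response["highlights"],
        "low_confidence": True,
        "note": f"Based on only {response['total_reviews']} reviews; treat these highlights as indicative.",
    }
```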

Data Source

Table: reviews

Fields used by ABSA endpoint:

id
product_id
rating
title (fallback when content is empty)
content (primary text)

The sentiment column is not currently used by this endpoint when serving responses.
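The content/title fallback is small enough to show directly. This sketch assumes rows arrive as plain dicts, for example from a supabase-py query such as client.table("reviews").select("id, product_id, rating, title, content").eq("product_id", product_id).execute(). It also assumes that whitespace-only content counts as empty; the function name is hypothetical.

```python
def review_text(row: dict) -> str:
    """Use content as the primary text; fall back to title when content is empty."""
    content = (row.get("content") or "").strip()
    if content:
        return content
    return (row.get("title") or "").strip()

print(review_text({"content": "", "title": "Nice phone"}))  # Nice phone
```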