# Aspect-Based Review Intelligence System — Complete Build Plan

> **Goal:** Fine-tune RoBERTa for aspect-based sentiment analysis + RAG Q&A layer + Streamlit dashboard.
> **Resume line:** NLP + Transformer Fine-tuning + RAG + FastAPI + Deployed
## Project Architecture

```mermaid
graph LR
    A[Raw Reviews] --> B[Data Processing]
    B --> C[Fine-tune RoBERTa]
    C --> D[ABSA Model]
    B --> E[Embed Reviews]
    E --> F[FAISS Vector Store]
    D --> G[FastAPI Backend]
    F --> G
    G --> H[Streamlit Dashboard]
    H --> I1[Aspect Sentiment Heatmap]
    H --> I2[Trend Charts]
    H --> I3[Natural Language Q&A]
```
## What the System Does

```
INPUT: "The pizza was amazing but the waiter was incredibly rude and slow"

OUTPUT (ABSA Model):
├── food → Positive (confidence: 0.94)
├── service → Negative (confidence: 0.91)
├── ambience → No mention
└── price → No mention

OUTPUT (RAG Q&A):
User: "Why do customers complain about service?"
System: "Based on 847 reviews, the top service complaints are:
1. Slow wait times (mentioned in 34% of negative reviews)
2. Rude staff behavior (28%)
3. Order mistakes (19%)"
```
## Dataset: SemEval 2014 Task 4

| Detail | Value |
|---|---|
| **Name** | SemEval-2014 Task 4: Aspect-Based Sentiment Analysis |
| **Domain** | Restaurant reviews (a Laptop domain also exists — use Restaurant) |
| **Size** | ~3,000 training sentences, ~800 test sentences |
| **Labels** | Aspect categories: `food`, `service`, `ambience`, `price`, `anecdotes/miscellaneous` (note the dataset's spelling: `ambience`) |
| **Sentiments** | `positive`, `negative`, `neutral`, `conflict` |
| **Format** | XML |
| **Why this dataset** | Standard academic benchmark. Any interviewer who knows NLP will recognize it. Your results are directly comparable to published papers. |

> [!IMPORTANT]
> **Download link:** [SemEval 2014 Task 4 Dataset](https://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools)
> Download the Restaurant train + test XML files.
## Complete File Structure

```
review-intelligence/
│
├── data/
│   ├── raw/                      # SemEval XML files go here
│   │   ├── Restaurants_Train_v2.xml
│   │   └── Restaurants_Test_Gold.xml
│   └── processed/                # Generated CSVs
│       ├── train.csv
│       └── test.csv
│
├── src/
│   ├── data_processing.py        # Step 1: Parse XML → CSV
│   ├── train_absa.py             # Step 2: Fine-tune RoBERTa
│   ├── evaluate.py               # Step 3: Evaluate model
│   ├── inference.py              # Step 4: Single-review prediction
│   ├── build_vectorstore.py      # Step 5: Embed reviews → FAISS
│   └── rag_engine.py             # Step 6: RAG retrieval + LLM answer
│
├── api/
│   └── main.py                   # Step 7: FastAPI backend
│
├── app/
│   └── streamlit_app.py          # Step 8: Dashboard
│
├── models/                       # Saved fine-tuned model
├── vectorstore/                  # FAISS index files
│
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .env                          # API keys (Groq/OpenAI)
└── README.md
```

**Total files to write: 13** (8 Python + 5 config/docs)
## Step-by-Step Implementation

### Step 1: `src/data_processing.py` — Parse SemEval XML to CSV

**What it does:** Reads the XML format, extracts each sentence + aspect category + sentiment, and outputs a clean CSV.

**Input XML format:**

```xml
<sentence id="1">
    <text>The pizza was amazing but the waiter was rude.</text>
    <aspectCategories>
        <aspectCategory category="food" polarity="positive"/>
        <aspectCategory category="service" polarity="negative"/>
    </aspectCategories>
</sentence>
```

**Output CSV format:**

| text | aspect | sentiment |
|---|---|---|
| The pizza was amazing but the waiter was rude. | food | positive |
| The pizza was amazing but the waiter was rude. | service | negative |
**Key logic:**

```python
import xml.etree.ElementTree as ET
import pandas as pd

def parse_semeval_xml(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    rows = []
    for sentence in root.findall('.//sentence'):
        text = sentence.find('text').text
        for aspect_cat in sentence.findall('.//aspectCategory'):
            rows.append({
                'text': text,
                'aspect': aspect_cat.get('category'),
                'sentiment': aspect_cat.get('polarity')
            })
    return pd.DataFrame(rows)
```
**Label mapping:**

```python
SENTIMENT_MAP = {'positive': 0, 'negative': 1, 'neutral': 2, 'conflict': 3}
# The SemEval XML spells this category "ambience"
ASPECT_CATEGORIES = ['food', 'service', 'ambience', 'price', 'anecdotes/miscellaneous']
```
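A short sketch of how these maps feed the training split. The toy DataFrame stands in for the output of `parse_semeval_xml`, and the 75/25 split ratio is an assumption, not something the plan specifies:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

SENTIMENT_MAP = {'positive': 0, 'negative': 1, 'neutral': 2, 'conflict': 3}

# Toy stand-in for the DataFrame returned by parse_semeval_xml
df = pd.DataFrame({
    'text': [f'review {i}' for i in range(8)],
    'aspect': ['food', 'service'] * 4,
    'sentiment': ['positive', 'negative'] * 4,
})

# Map string polarities to the integer labels the model trains on
df['label'] = df['sentiment'].map(SENTIMENT_MAP)

# Stratified split keeps the label balance in both halves
train_df, val_df = train_test_split(
    df, test_size=0.25, stratify=df['label'], random_state=42
)
print(len(train_df), len(val_df))  # 6 2
```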
### Step 2: `src/train_absa.py` — Fine-tune RoBERTa

**What it does:** Fine-tunes `roberta-base` for aspect-based sentiment classification.

**Model approach:** Auxiliary Sentence Pair Classification

- Input to model: the review text paired with the aspect name, e.g. `"<s> The pizza was amazing but the waiter was rude </s></s> food </s>"` (RoBERTa uses `<s>`/`</s>` rather than BERT's `[CLS]`/`[SEP]`; the tokenizer inserts them automatically)
- Output: `positive` (for the "food" aspect)
- This converts ABSA into a standard sentence-pair classification task that RoBERTa handles natively.
| ```python | |
| # Tokenization β sentence pair format | |
| # Sentence A = review text | |
| # Sentence B = aspect category name | |
| inputs = tokenizer( | |
| review_text, # "The pizza was amazing..." | |
| aspect_category, # "food" | |
| truncation=True, | |
| padding='max_length', | |
| max_length=128, | |
| return_tensors='pt' | |
| ) | |
| ``` | |
**Training config:**

| Parameter | Value | Why |
|---|---|---|
| Base model | `roberta-base` | Best balance of size vs. accuracy for this task |
| Learning rate | `2e-5` | Standard for transformer fine-tuning |
| Batch size | `16` | Fits in ~6GB GPU / free Colab |
| Epochs | `5` | SemEval is small; more epochs risk overfitting |
| Max length | `128` | Restaurant reviews are short |
| Optimizer | AdamW | Standard for transformers |
| Scheduler | Linear warmup (10% of steps) | Prevents early instability |
| Loss | CrossEntropyLoss | 4-class classification |
**Libraries needed:**

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
from transformers import Trainer, TrainingArguments
from datasets import Dataset
from sklearn.model_selection import train_test_split
```
**Training loop (using HuggingFace Trainer):**

```python
model = RobertaForSequenceClassification.from_pretrained(
    'roberta-base', num_labels=4  # positive, negative, neutral, conflict
)

training_args = TrainingArguments(
    output_dir='./models/absa-roberta',
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    weight_decay=0.01,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='f1_macro',
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()
```
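The Trainer above references a `compute_metrics` callable, and `metric_for_best_model='f1_macro'` requires the returned dict to contain that exact key. A minimal definition (a reasonable sketch, not code taken from the plan):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels); argmax recovers the predicted class
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1_macro': f1_score(labels, preds, average='macro'),
    }

# Sanity check on dummy logits
logits = np.array([[2.0, 0.1, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]])
labels = np.array([0, 1])
print(compute_metrics((logits, labels)))  # both metrics are 1.0 here
```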
**Expected results on SemEval 2014 Restaurant:**

- Accuracy: ~83-87%
- Macro F1: ~75-80%
- These are competitive with published baselines (~85% accuracy)

> [!TIP]
> **Where to train:** Use Google Colab (free T4 GPU). Training takes ~15-20 minutes for 5 epochs on SemEval-sized data.
### Step 3: `src/evaluate.py` — Evaluation & Metrics

**What it does:** Generates a classification report, confusion matrix, and per-aspect performance.

**Key metrics to compute:**

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Overall metrics
print(classification_report(y_true, y_pred,
      target_names=['positive', 'negative', 'neutral', 'conflict']))

# Per-aspect accuracy
for aspect in ASPECT_CATEGORIES:
    mask = (test_df['aspect'] == aspect)
    aspect_acc = accuracy_score(y_true[mask], y_pred[mask])
    print(f"{aspect}: {aspect_acc:.2%}")
```
**What to save for the resume:**

- Overall accuracy and macro F1
- Per-aspect F1 (shows where the model is strong/weak)
- Comparison vs. a baseline (e.g., TF-IDF + Logistic Regression)
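The baseline comparison is a few lines of scikit-learn. This sketch uses toy strings in place of the real train/test CSVs; in practice, concatenating text and aspect into one string is a simple way to give the baseline the same inputs as the transformer:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy data; in practice, use "text + aspect" strings from train.csv/test.csv
train_texts = ['great pizza food', 'rude waiter service',
               'tasty pasta food', 'slow service staff']
train_labels = [0, 1, 0, 1]  # 0 = positive, 1 = negative
test_texts = ['amazing food', 'slow rude service']
test_labels = [0, 1]

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(train_texts, train_labels)
preds = baseline.predict(test_texts)
print(f1_score(test_labels, preds, average='macro'))  # 1.0 on this toy set
```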
---
### Step 4: `src/inference.py` — Single Review Prediction

**What it does:** Takes one review, runs it through the model for ALL aspects, and returns structured output.

```python
import torch

def predict_aspects(review_text: str, model, tokenizer):
    results = {}
    for aspect in ASPECT_CATEGORIES:
        inputs = tokenizer(review_text, aspect,
                           truncation=True, padding='max_length',
                           max_length=128, return_tensors='pt')
        with torch.no_grad():
            outputs = model(**inputs.to(device))
        probs = torch.softmax(outputs.logits, dim=1)
        pred_label = torch.argmax(probs).item()
        confidence = probs[0][pred_label].item()

        # Only include the aspect if confidence clears the threshold
        if confidence > 0.6:
            results[aspect] = {
                'sentiment': LABELS[pred_label],  # inverse of SENTIMENT_MAP
                'confidence': round(confidence, 3)
            }
    return results
```
**Example output:**

```json
{
    "food": {"sentiment": "positive", "confidence": 0.94},
    "service": {"sentiment": "negative", "confidence": 0.91}
}
```
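The confidence gate in `predict_aspects` is what produces "No mention" for aspects a review never touches. Its behavior can be checked with plain numpy; the `gate` helper and its dummy logits below are illustrative, not part of the plan's code:

```python
import numpy as np

LABELS = ['positive', 'negative', 'neutral', 'conflict']

def softmax(logits):
    # Numerically stable softmax over the 4 sentiment classes
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def gate(logits, threshold=0.6):
    # Mirrors the confidence check in predict_aspects: low-confidence
    # predictions are dropped, which surfaces as "No mention"
    probs = softmax(np.asarray(logits, dtype=float))
    k = int(probs.argmax())
    conf = float(probs[k])
    return (LABELS[k], round(conf, 3)) if conf > threshold else None

print(gate([4.0, 0.0, 0.0, 0.0]))  # confident -> ('positive', 0.948)
print(gate([0.1, 0.0, 0.0, 0.0]))  # near-uniform -> None
```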
### Step 5: `src/build_vectorstore.py` — Embed Reviews into FAISS

**What it does:** Takes all reviews, generates sentence embeddings, and stores them in FAISS for RAG retrieval.

```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pickle

# Load embedding model
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Embed all reviews
review_texts = df['text'].unique().tolist()
embeddings = embedder.encode(review_texts, show_progress_bar=True)

# Build FAISS index (inner product over L2-normalized vectors = cosine similarity)
dimension = embeddings.shape[1]  # 384 for MiniLM
index = faiss.IndexFlatIP(dimension)
embeddings = embeddings.astype('float32')  # normalize_L2 requires float32
faiss.normalize_L2(embeddings)
index.add(embeddings)

# Save
faiss.write_index(index, 'vectorstore/reviews.index')
with open('vectorstore/review_texts.pkl', 'wb') as f:
    pickle.dump(review_texts, f)
```
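Because the vectors are L2-normalized before `index.add`, the inner product that `IndexFlatIP` computes is exactly cosine similarity. This numpy-only sketch (toy random embeddings, no FAISS dependency) reproduces the retrieval behavior:

```python
import numpy as np

# Toy "review" embeddings; over L2-normalized vectors, the inner product
# that IndexFlatIP computes equals cosine similarity
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8)).astype('float32')
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # normalize_L2

# A query vector close to review #2 should retrieve it first
query = embeddings[2] + 0.01 * rng.normal(size=8)
query /= np.linalg.norm(query)

scores = embeddings @ query      # inner product == cosine here
top = np.argsort(-scores)[:3]    # indices of the 3 nearest reviews
print(int(top[0]))  # 2
```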
### Step 6: `src/rag_engine.py` — RAG Q&A Engine

**What it does:** User asks a question → relevant reviews are retrieved → the LLM synthesizes an answer.

```python
def answer_question(question: str, top_k: int = 10):
    # 1. Embed the question
    q_embedding = embedder.encode([question]).astype('float32')
    faiss.normalize_L2(q_embedding)

    # 2. Search FAISS
    scores, indices = index.search(q_embedding, top_k)
    retrieved_reviews = [review_texts[i] for i in indices[0]]

    # 3. Run ABSA on each retrieved review
    aspect_results = []
    for review in retrieved_reviews:
        aspects = predict_aspects(review, model, tokenizer)
        aspect_results.append({'text': review, 'aspects': aspects})

    # 4. Send to LLM for synthesis
    context = "\n".join([
        f"Review: {r['text']}\nAspects: {r['aspects']}"
        for r in aspect_results
    ])
    prompt = f"""Based on these customer reviews and their aspect sentiments:

{context}

Question: {question}

Provide a concise, data-backed answer with specific counts and percentages."""

    # Call Groq/OpenAI (llm = the client initialized from the .env API key)
    response = llm.chat.completions.create(
        model="llama-3.1-8b-instant",  # or gpt-3.5-turbo
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```
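To keep the "data-backed" part honest, it helps to compute counts in Python and hand them to the LLM rather than asking it to estimate percentages from raw text. `complaint_stats` below is a hypothetical helper in that spirit (not named in the plan), shown on hand-written ABSA outputs:

```python
from collections import Counter

def complaint_stats(aspect_results, sentiment='negative'):
    # Count how often each aspect carries the given sentiment among the
    # retrieved reviews, as a percentage of all retrieved reviews
    counts = Counter()
    for r in aspect_results:
        for aspect, info in r['aspects'].items():
            if info['sentiment'] == sentiment:
                counts[aspect] += 1
    total = len(aspect_results)
    return {a: round(100 * c / total) for a, c in counts.most_common()}

retrieved = [
    {'text': 'slow service', 'aspects': {'service': {'sentiment': 'negative', 'confidence': 0.9}}},
    {'text': 'rude staff',   'aspects': {'service': {'sentiment': 'negative', 'confidence': 0.8}}},
    {'text': 'great pizza',  'aspects': {'food': {'sentiment': 'positive', 'confidence': 0.95}}},
    {'text': 'cold food',    'aspects': {'food': {'sentiment': 'negative', 'confidence': 0.7}}},
]
stats = complaint_stats(retrieved)
print(stats)  # {'service': 50, 'food': 25}
```

The resulting dict can be embedded in the prompt alongside the review context, so the counts the answer cites are real.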
### Step 7: `api/main.py` — FastAPI Backend

**Endpoints:**

| Endpoint | Method | What It Does |
|---|---|---|
| `/predict` | POST | Takes a review → returns aspect sentiments |
| `/ask` | POST | Takes a question → returns RAG answer |
| `/stats` | GET | Returns aggregate sentiment stats per aspect |
| `/health` | GET | Health check |
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Review Intelligence API")

class ReviewInput(BaseModel):
    text: str

class QuestionInput(BaseModel):
    question: str

@app.post("/predict")
def predict(review: ReviewInput):
    results = predict_aspects(review.text, model, tokenizer)
    return {"review": review.text, "aspects": results}

@app.post("/ask")
def ask(q: QuestionInput):
    answer = answer_question(q.question)
    return {"question": q.question, "answer": answer}

@app.get("/stats")
def get_stats():
    # Return pre-computed aggregate stats
    return aggregate_sentiment_stats

@app.get("/health")
def health():
    return {"status": "ok"}
```
### Step 8: `app/streamlit_app.py` — Dashboard

**3 main panels:**

**Panel 1 — Live Analysis:**
- Text input: paste any review
- Click "Analyze" → shows aspect sentiment cards with color coding
- Green = positive, Red = negative, Gray = neutral

**Panel 2 — Aggregate Dashboard:**
- Aspect sentiment heatmap (aspects × sentiments, color intensity = count)
- Trend chart showing sentiment over time per aspect (if reviews have timestamps)
- Bar chart: "Top 5 complaints" and "Top 5 praises"

**Panel 3 — Q&A:**
- Text input: "What do customers say about food quality?"
- Returns an LLM-synthesized answer with source reviews shown below
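Panel 1's color coding reduces to a small mapping that can live outside Streamlit and be unit-tested on its own. The hex values and the `card_color` helper are assumptions (any green/red/gray works):

```python
# Hypothetical helper for Panel 1's color-coded aspect cards
SENTIMENT_COLORS = {
    'positive': '#2ecc71',  # green
    'negative': '#e74c3c',  # red
    'neutral':  '#95a5a6',  # gray
}

def card_color(sentiment):
    # Labels outside the mapping (e.g. 'conflict') fall back to gray
    return SENTIMENT_COLORS.get(sentiment, SENTIMENT_COLORS['neutral'])

print(card_color('positive'))  # '#2ecc71'
```

In the app, this feeds something like `st.markdown` with an inline style per aspect card.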
## requirements.txt

```
torch>=2.0
transformers>=4.35
datasets>=2.14
sentence-transformers>=2.2
faiss-cpu>=1.7
scikit-learn>=1.3
pandas>=2.0
numpy>=1.24
fastapi>=0.104
uvicorn>=0.24
streamlit>=1.28
plotly>=5.17
groq>=0.4
python-dotenv>=1.0
```
## Build Order (Do This Sequence)

| Step | File | Time Estimate | Dependency |
|---|---|---|---|
| 1 | Download SemEval data | 10 min | None |
| 2 | `src/data_processing.py` | 30 min | Step 1 |
| 3 | `src/train_absa.py` | 2-3 hours | Step 2 |
| 4 | `src/evaluate.py` | 30 min | Step 3 |
| 5 | `src/inference.py` | 30 min | Step 3 |
| 6 | `src/build_vectorstore.py` | 20 min | Step 2 |
| 7 | `src/rag_engine.py` | 1 hour | Steps 5+6 |
| 8 | `api/main.py` | 1 hour | Steps 5+7 |
| 9 | `app/streamlit_app.py` | 2-3 hours | Step 8 |
| 10 | `Dockerfile` + `docker-compose.yml` | 30 min | Step 8 |
| 11 | `README.md` | 1 hour | All |

**Total estimated time: 10-12 hours of focused work.**
## Resume Bullets (Draft)

> **Aspect-Based Review Intelligence System** | PyTorch, RoBERTa, FAISS, FastAPI, Streamlit
> GitHub | Live Demo
>
> • Fine-tuned RoBERTa on the SemEval-2014 benchmark for aspect-based sentiment analysis, achieving [X]% macro F1 across 5 aspect categories (food, service, ambience, price, anecdotes/miscellaneous), outperforming a TF-IDF + Logistic Regression baseline by [Y]%.
>
> • Built a RAG-powered Q&A layer using sentence-transformers and FAISS over 3,000+ annotated reviews, enabling natural language queries like "why do customers complain about service?" with LLM-synthesized answers.
>
> • Deployed as a full-stack application with a FastAPI backend and a Streamlit dashboard featuring real-time aspect sentiment analysis, aggregate heatmaps, and a conversational Q&A interface.
## Interview Questions You Must Prepare For

| Question | What They're Testing |
|---|---|
| Why RoBERTa over BERT? | Do you understand model differences? (RoBERTa = better training recipe, no NSP, more data) |
| Why sentence-pair format for ABSA? | Do you understand how to reformulate tasks for transformers? |
| What's the difference between aspect term extraction and aspect category detection? | NLP depth |
| How would you handle aspects not in the 5 categories? | Can you think beyond the training data? |
| Why FAISS over ChromaDB? | Do you understand trade-offs? (FAISS = speed, Chroma = ease) |
| How do you handle reviews with conflicting sentiments? | The "conflict" label — do you understand it? |