---
license: mit
title: 'QA Bug Triage Pipeline'
sdk: gradio
emoji: 🏆
colorFrom: red
colorTo: gray
---
# 🐛 QA Bug Triage Pipeline

> A RAG workflow for turning messy app reviews into structured, searchable QA bug reports.

[![Python](https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python&logoColor=white)](https://python.org)
[![OpenAI](https://img.shields.io/badge/GPT--4o-OpenAI-412991?style=flat-square&logo=openai&logoColor=white)](https://openai.com)
[![Gradio](https://img.shields.io/badge/Gradio-UI-orange?style=flat-square&logo=gradio&logoColor=white)](https://gradio.app)
[![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector%20Store-teal?style=flat-square)](https://trychroma.com)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)

🔗 Links: [Hugging Face Demo](https://huggingface.co/spaces/aiqualitylab/qa-bug-triage) · [GitHub Repository](https://github.com/aiqualitylab/qa-bug-triage)

---

## 📖 Overview

Teams often receive product feedback as noisy, repetitive, unstructured review text. This project uses an LLM to convert those reviews into structured bug reports, stores them in a local ChromaDB vector store, and lets you search and summarize them.

The result is a lightweight bug triage assistant built with Python, Gradio, OpenAI GPT-4o, ChromaDB, and RAGAS for RAG evaluation.

---

## ✨ What It Does

| Capability | Description |
|---|---|
| 📥 Review collection | Fetches real Google Play reviews |
| 🔀 Query routing | Classifies incoming text as a bug report, feature request, or general complaint; only bug reports reach triage |
| 🗂️ Structured triage | Generates JSON bug reports with a consistent set of fields |
| 🔍 Hybrid retrieval | Combines semantic (vector) retrieval with BM25 keyword matching |
| 🤖 AI summaries | Produces concise summaries of triage and search results |
| 🗑️ Store reset | Clears persisted bugs from the ChromaDB store directly in the UI |
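The "Structured triage" row implies the LLM must return JSON with the same fields every time. Below is a minimal, standard-library sketch of validating such output before it is stored. The field names and severity levels are hypothetical placeholders, not the schema defined in `triage.py`.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: the real field names live in triage.py and may differ.
REQUIRED_FIELDS = ("title", "severity", "component", "steps_to_reproduce", "expected", "actual")
SEVERITIES = {"critical", "high", "medium", "low"}


@dataclass
class BugReport:
    title: str
    severity: str
    component: str
    steps_to_reproduce: list[str]
    expected: str
    actual: str


def parse_bug_report(raw: str) -> BugReport:
    """Parse an LLM response into a BugReport, rejecting malformed output."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    severity = str(data["severity"]).strip().lower()
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']!r}")
    steps = data["steps_to_reproduce"]
    if isinstance(steps, str):  # tolerate a single step returned as a plain string
        steps = [steps]
    return BugReport(
        title=str(data["title"]).strip(),
        severity=severity,
        component=str(data["component"]).strip(),
        steps_to_reproduce=[str(s) for s in steps],
        expected=str(data["expected"]),
        actual=str(data["actual"]),
    )


if __name__ == "__main__":
    raw = (
        '{"title": "Crash on login", "severity": "High", "component": "auth", '
        '"steps_to_reproduce": "Tap Sign in", "expected": "Home screen", "actual": "App closes"}'
    )
    print(parse_bug_report(raw))
```

Rejecting a malformed record at this point keeps inconsistent entries out of the vector store, where they would otherwise degrade retrieval and few-shot examples.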

---

## 🏗️ Architecture

```
Google Play Reviews
       │
       ▼
┌──────────────┐
│ Query Router │ ──→ feature request / general complaint (dropped)
└──────────────┘
       │ bug report
       ▼
┌──────────────┐
│    Triage    │ ──→ structured JSON bug record
└──────────────┘
       │
       ▼
┌──────────────┐
│   ChromaDB   │ ──→ vector + BM25 hybrid index
└──────────────┘
       │
       ▼
┌──────────────┐
│  AI Summary  │ ──→ concise triage output
└──────────────┘
```
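The diagram reads as a filter-then-transform loop: the router gates each review, and only bug reports are triaged and stored. The sketch below illustrates that flow under stated assumptions and is not the code in `app.py` or `triage.py`. The route label string is hypothetical, the LLM calls and the ChromaDB write are passed in as plain functions, and the AI Summary step is left out.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical label for the "bug report" route in the diagram;
# the labels actually used in triage.py may differ.
BUG_REPORT = "bug_report"


@dataclass
class PipelineResult:
    stored: list[dict] = field(default_factory=list)  # triaged bug records
    dropped: list[str] = field(default_factory=list)  # reviews filtered out by the router


def run_pipeline(
    reviews: list[str],
    route: Callable[[str], str],    # LLM router: review text -> route label
    triage: Callable[[str], dict],  # LLM triage: review text -> structured bug record
    store: Callable[[dict], None],  # persistence step, e.g. adding to ChromaDB
) -> PipelineResult:
    """Route each review, triage only bug reports, and store the resulting records."""
    result = PipelineResult()
    for review in reviews:
        label = route(review).strip().lower()
        if label != BUG_REPORT:
            # Feature requests, general complaints, and unrecognized labels are dropped.
            result.dropped.append(review)
            continue
        record = triage(review)
        store(record)
        result.stored.append(record)
    return result


if __name__ == "__main__":
    # Keyword-based stand-ins replace the LLM calls so the sketch runs offline.
    def fake_route(text: str) -> str:
        return BUG_REPORT if "crash" in text.lower() else "feature_request"

    def fake_triage(text: str) -> dict:
        return {"title": text, "severity": "high"}

    saved: list[dict] = []
    out = run_pipeline(
        ["App crashes when I open settings", "Please add a dark mode"],
        route=fake_route,
        triage=fake_triage,
        store=saved.append,
    )
    print(len(out.stored), "stored;", len(out.dropped), "dropped")
```

Routing before triage means the more expensive structured-output call only runs on reviews that can become bug records.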

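---

## 🚀 Quick Start

Requires Python 3.10 or later.

```powershell
# Windows PowerShell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python app.py
```

On macOS or Linux, activate the environment with `source .venv/bin/activate` instead. Then open the local URL that Gradio prints in the terminal (by default `http://127.0.0.1:7860`).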

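---

## 🔑 API Keys

This app uses BYOK (Bring Your Own Key) and requires an OpenAI API key:

- Paste your OpenAI API key into the masked field in the UI.
- The key is supplied at runtime and is never committed to the repository.
- The evaluation script takes the key through its `--api-key` argument (see [Evaluation](#-evaluation)).

> ⚠️ Never commit API keys to source control.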

---

## 🖥️ How To Use

1. Collect: fetch live Google Play reviews and triage them.
2. Triage: analyze a single custom review.
3. Search: retrieve similar bugs with hybrid retrieval.
4. Clear bugs: reset the ChromaDB store.
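Search relies on hybrid retrieval. The README does not say how `rag.py` merges the semantic and keyword signals, so this sketch uses reciprocal rank fusion (RRF), one common choice, and implements Okapi BM25 directly so it runs without `rank-bm25` installed. The semantic ranking is passed in; in the app it would come from a ChromaDB vector query. The parameter values (`k1=1.5`, `b=0.75`, `k=60`) are conventional defaults, not values taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; no stemming, so "crash" != "crashes"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query (non-negative IDF variant)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(toks) / avgdl)  # document-length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse rankings (doc indices, best first): each doc scores sum(1 / (k + rank))."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.__getitem__, reverse=True)


def hybrid_search(query: str, docs: list[str], semantic_ranking: list[int], top_k: int = 3) -> list[int]:
    """Combine a semantic ranking (e.g. from a vector store) with a BM25 ranking."""
    scores = bm25_scores(query, docs)
    # Keep only documents that share at least one term with the query.
    keyword_ranking = sorted(
        (i for i, s in enumerate(scores) if s > 0), key=lambda i: scores[i], reverse=True
    )
    return reciprocal_rank_fusion([semantic_ranking, keyword_ranking])[:top_k]


if __name__ == "__main__":
    bugs = [
        "App crashes on login screen",
        "Dark mode request",
        "Login button unresponsive after update",
        "Payment fails with error 402",
    ]
    # Pretend the vector store ranked bug 2 first, then 3, 0, 1.
    print(hybrid_search("login screen crashes", bugs, semantic_ranking=[2, 3, 0, 1], top_k=2))
```

RRF only looks at ranks, which sidesteps the fact that vector distances and BM25 scores live on different scales; a weighted sum of normalized scores is the usual alternative.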

---

## 📁 Project Structure

```
qa-bug-triage/
├── app.py                 # Gradio app and interaction flows
├── collect.py             # Google Play review collection
├── triage.py              # Routing and structured triage logic
├── rag.py                 # Chroma storage and hybrid retrieval
└── eval/
    ├── eval.py            # RAG evaluation script
    ├── eval_dataset.json  # Evaluation dataset
    └── results.json       # Latest saved evaluation metrics
```

---

## 📊 Evaluation

Run the evaluation suite (on macOS or Linux, use `eval/eval.py`):

```powershell
python eval\eval.py --api-key YOUR_OPENAI_API_KEY
```

Latest results, as saved in `eval/results.json` (RAGAS metrics, each on a 0 to 1 scale where higher is better):

| Metric | Score |
|---|---|
| Answer Relevancy | `0.868` |
| Faithfulness | `0.292` |
| Context Precision | `0.020` |
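Context Precision measures whether the relevant retrieved chunks are ranked near the top. As we understand the RAGAS definition, an LLM judge gives each retrieved chunk a 0/1 relevance verdict, and the per-question score is the mean of precision@k over the ranks k that hold a relevant chunk; RAGAS then averages across questions. The function below reproduces only that final arithmetic from given verdicts. It is an illustration, not the RAGAS implementation, and the LLM-judging step is omitted.

```python
def context_precision(verdicts: list[int]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    verdicts[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0.
    In RAGAS these verdicts come from an LLM judge; here they are given.
    """
    hits = 0
    weighted_sum = 0.0
    for k, relevant in enumerate(verdicts, start=1):
        if relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k, counted only at relevant ranks
    # No relevant chunk retrieved at all: the score is defined as 0.
    return weighted_sum / hits if hits else 0.0


if __name__ == "__main__":
    print(context_precision([1, 0, 1]))  # relevant chunks at ranks 1 and 3
    print(context_precision([0, 0, 1]))  # one relevant chunk, ranked last
```

Because each term is weighted by precision at that rank, the same relevant chunk scores lower the further down the retrieved list it appears, and questions where no retrieved chunk is judged relevant contribute 0.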

---

## 💰 Cost Estimate

Target: under `$0.50` for a short demo session.

| Parameter | Value |
|---|---|
| Tokens per session | ~8k–20k |
| Typical cost per session | < $0.50 |
| Recommended max reviews | 5–10 |

Tips to keep costs low:
- Keep max reviews between 5 and 10.
- Avoid repeated large collect runs.
- Use short test inputs when validating triage manually.
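The session estimate is simple arithmetic: token counts times per-token prices. A small helper for redoing it with your own numbers follows. The prices and the input/output split in the example are placeholders, not quoted OpenAI rates, so check current GPT-4o pricing before relying on the result.

```python
BUDGET_USD = 0.50  # session target from the table above


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost in USD, given token counts and prices per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Placeholder prices in USD per 1M tokens: replace with current GPT-4o rates.
    input_price, output_price = 2.50, 10.00
    # Upper end of the ~8k-20k range, assuming a 15k input / 5k output split.
    cost = estimate_cost_usd(15_000, 5_000, input_price, output_price)
    print(f"${cost:.4f} per session, within budget: {cost < BUDGET_USD}")
```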

---

## 🛠️ Tech Stack

| Tool | Role |
|---|---|
| [Python](https://python.org) | Core language |
| [Gradio](https://gradio.app) | Web UI |
| [OpenAI GPT-4o](https://openai.com) | LLM for triage and summaries |
| [ChromaDB](https://trychroma.com) | Vector store |
| [rank-bm25](https://github.com/dorianbrown/rank_bm25) | Keyword retrieval |
| [RAGAS](https://docs.ragas.io) | RAG evaluation framework |
| [google-play-scraper](https://github.com/JoMingyu/google-play-scraper) | Review data source |

---

## ✅ Functionalities Implemented

### Requirements covered

- [x] RAG project written in Python
- [x] Uses at least one LLM (OpenAI GPT-4o)
- [x] Public repository with collection and curation scripts
- [x] README with project explanation and setup
- [x] BYOK input in the UI (see [API Keys](#-api-keys))
- [x] Cost estimate included (see [Cost Estimate](#-cost-estimate))
- [x] API key requirements listed (see [API Keys](#-api-keys))
- [x] More than 5 optional techniques covered (7 in total, listed below)

### Techniques implemented

- [x] Streaming responses in the UI (`app.py`)
- [x] Dynamic few-shot prompting using similar bugs (`triage.py`)
- [x] Evaluation code and dataset included (`eval/eval.py`, `eval/eval_dataset.json`)
- [x] Domain-specific app for QA bug triage (`triage.py`, `app.py`)
- [x] Structured JSON data curation for RAG (`triage.py`)
- [x] Hybrid retrieval with semantic search and BM25 (`rag.py`)
- [x] Query routing in the active app flow (`triage.py`)
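"Dynamic few-shot prompting using similar bugs" means the triage prompt's examples change per request: stored bugs that resemble the new review are retrieved and inserted as worked examples. A minimal sketch follows, assuming hypothetical `review`/`report` keys on each stored bug and prompt wording of our own; the real prompt lives in `triage.py`.

```python
import json


def build_few_shot_prompt(review: str, similar_bugs: list[dict], max_examples: int = 3) -> str:
    """Build a triage prompt whose examples are the stored bugs most similar to the review.

    Each item in similar_bugs is assumed (hypothetically) to look like
    {"review": <original review text>, "report": <structured bug dict>},
    ordered most-similar first, e.g. the output of a hybrid search.
    """
    examples = similar_bugs[:max_examples]
    lines = ["You are a QA engineer. Convert the app review below into a JSON bug report."]
    if examples:
        lines.append("Follow the format of these similar, previously triaged bugs.")
    lines.append("")
    for i, bug in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f"Review: {bug['review']}")
        # sort_keys keeps the example JSON stable across runs.
        lines.append(f"Bug report: {json.dumps(bug['report'], sort_keys=True)}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Bug report:")
    return "\n".join(lines)


if __name__ == "__main__":
    neighbours = [
        {"review": "Crashes every time I log in", "report": {"title": "Crash on login", "severity": "high"}},
    ]
    print(build_few_shot_prompt("App freezes on the checkout page", neighbours))
```

With an empty store the prompt degrades gracefully to zero-shot, and as more bugs are triaged the examples become closer to the incoming review.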

---

## 📄 License

MIT © [aiqualitylab](https://github.com/aiqualitylab)