---
license: mit
title: ' QA Bug Triage Pipeline'
sdk: gradio
emoji: πŸ†
colorFrom: red
colorTo: gray
---
# 🐛 QA Bug Triage Pipeline
> A RAG workflow that turns messy app reviews into structured, searchable bug reports for QA triage.
[![Python](https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python&logoColor=white)](https://python.org)
[![OpenAI](https://img.shields.io/badge/GPT--4o-OpenAI-412991?style=flat-square&logo=openai&logoColor=white)](https://openai.com)
[![Gradio](https://img.shields.io/badge/Gradio-UI-orange?style=flat-square&logo=gradio&logoColor=white)](https://gradio.app)
[![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector%20Store-teal?style=flat-square)](https://trychroma.com)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
**🔗 Links:** [Hugging Face Demo](https://huggingface.co/spaces/aiqualitylab/qa-bug-triage) · [GitHub Repository](https://github.com/aiqualitylab/qa-bug-triage)
---
## 📖 Overview
Teams often receive product feedback as noisy, repetitive, and unstructured review text. This project converts those reviews into structured bug reports with an LLM, stores them in a local vector database, and makes them easy to search and summarize.
The result is a lightweight **bug triage assistant** built with Python, Gradio, OpenAI, ChromaDB, and RAG evaluation tooling.
---
## ✨ What It Does
| Capability | Description |
|---|---|
| πŸ“₯ Review collection | Fetches real Google Play reviews |
| πŸ”€ Query routing | Classifies incoming text before triage |
| πŸ—‚οΈ Structured triage | Generates JSON bug reports with consistent fields |
| πŸ” Hybrid retrieval | Combines semantic retrieval with BM25 keyword matching |
| πŸ€– AI summaries | Produces concise summaries for triage and search results |
| πŸ—‘οΈ Store reset | Clears persisted bugs directly from the UI |
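
The table does not say how the semantic and BM25 rankings are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method `rag.py` uses. The function name, the bug ids, and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical bug ids: one list from vector search, one from BM25.
semantic_hits = ["bug-12", "bug-07", "bug-31"]
keyword_hits = ["bug-07", "bug-44", "bug-12"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```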
---
## 🏗️ Architecture
```
Google Play Reviews
        │
        ▼
┌────────────────┐
│  Query Router  │ ──→ feature request / general complaint (dropped)
└────────────────┘
        │ bug report
        ▼
┌────────────────┐
│     Triage     │ ──→ structured JSON bug record
└────────────────┘
        │
        ▼
┌────────────────┐
│    ChromaDB    │ ──→ vector + BM25 hybrid index
└────────────────┘
        │
        ▼
┌────────────────┐
│   AI Summary   │ ──→ concise triage output
└────────────────┘
```
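
The flow in the diagram can be sketched in plain Python. This is a minimal stand-in, not the project's code: the keyword router replaces the LLM classifier, a list replaces the ChromaDB collection, and the `BugRecord` fields, `BUG_HINTS`, and all function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BugRecord:
    # Illustrative fields; the project's real JSON schema may differ.
    title: str
    severity: str
    component: str
    source_review: str


BUG_HINTS = ("crash", "freeze", "error", "bug", "not working", "fails")


def route(review: str) -> str:
    """Stand-in router: keyword match instead of an LLM classifier."""
    text = review.lower()
    return "bug report" if any(hint in text for hint in BUG_HINTS) else "other"


def triage(review: str) -> BugRecord:
    """Stand-in triage: the real step would ask the LLM for these fields."""
    severity = "high" if "crash" in review.lower() else "medium"
    return BugRecord(
        title=review[:40],
        severity=severity,
        component="unknown",
        source_review=review,
    )


def run_pipeline(reviews):
    store = []  # plays the role of the ChromaDB collection
    for review in reviews:
        if route(review) != "bug report":
            continue  # feature requests and general complaints are dropped
        store.append(asdict(triage(review)))
    return store


reviews = ["App crashes when I open the camera", "Please add a dark mode"]
store = run_pipeline(reviews)
print(json.dumps(store, indent=2))
```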
---
## 🚀 Quick Start
```powershell
# Windows PowerShell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python app.py
```
Then open the local URL that Gradio prints in the terminal (by default `http://127.0.0.1:7860`).
---
## 🔑 API Keys
This app uses **BYOK (Bring Your Own Key)**:
- Paste your OpenAI API key into the masked field in the UI
- The key is supplied at runtime and is never committed to the repository
> ⚠️ **Never commit API keys to source control.**
---
## 🖥️ How To Use
1. **Collect**: fetch and triage live Google Play reviews
2. **Triage**: analyze a single custom review
3. **Search**: retrieve similar bugs via hybrid retrieval
4. **Clear bugs**: reset the ChromaDB store
---
## 📁 Project Structure
```
qa-bug-triage/
├── app.py                   # Gradio app and interaction flows
├── collect.py               # Google Play review collection
├── triage.py                # Routing and structured triage logic
├── rag.py                   # Chroma storage and hybrid retrieval
└── eval/
    ├── eval.py              # RAG evaluation script
    ├── eval_dataset.json    # Evaluation dataset
    └── results.json         # Latest saved evaluation metrics
```
---
## 📊 Evaluation
Run the evaluation suite:
```powershell
python eval\eval.py --api-key YOUR_OPENAI_API_KEY
```
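
A key passed on the command line can end up in shell history. The sketch below shows one way a script could accept `--api-key` while falling back to the `OPENAI_API_KEY` environment variable. It is an assumption about how such a flag might be handled, not a copy of `eval/eval.py`. `resolve_api_key` and `mask` are hypothetical helpers.

```python
import argparse
import os


def resolve_api_key(argv=None, env=None):
    """Return the key from --api-key, else from OPENAI_API_KEY, else exit."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Run the RAG evaluation")
    parser.add_argument(
        "--api-key",
        default=env.get("OPENAI_API_KEY"),
        help="OpenAI API key (falls back to the OPENAI_API_KEY variable)",
    )
    args = parser.parse_args(argv)
    if not args.api_key:
        parser.error("provide --api-key or set OPENAI_API_KEY")
    return args.api_key


def mask(key: str) -> str:
    """Hide all but the last four characters, for safe logging."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


# Example with a dummy key; an explicit flag wins over the environment.
key = resolve_api_key(["--api-key", "sk-example-1234"], env={})
print(mask(key))
```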
**Latest results:**
| Metric | Score |
|---|---|
| Answer Relevancy | `0.868` |
| Faithfulness | `0.292` |
| Context Precision | `0.020` |
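
To help read the Context Precision score, here is a simplified sketch of the rank-weighted formula described in the RAGAS documentation: the sum of precision@k at each relevant rank, divided by the number of relevant chunks. RAGAS obtains the relevance flags from an LLM judge; here they are supplied by hand, and `context_precision` is an illustrative helper, not the library's API.

```python
def context_precision(relevance):
    """Rank-weighted precision over retrieved chunks.

    relevance: list of 0/1 flags, one per retrieved chunk, in rank order.
    Returns sum(precision@k * v_k) / number of relevant chunks,
    or 0.0 when no retrieved chunk is relevant.
    """
    hits = 0
    weighted = 0.0
    for k, flag in enumerate(relevance, start=1):
        if flag:
            hits += 1
            weighted += hits / k  # precision@k, counted only at relevant ranks
    return weighted / hits if hits else 0.0


print(context_precision([1, 0, 0, 0, 0]))  # relevant chunk ranked first
print(context_precision([0, 0, 0, 0, 1]))  # same chunk ranked last
print(context_precision([0, 0, 0, 0, 0]))  # nothing relevant retrieved
```

Under this formula a single relevant chunk at rank five still scores 0.2, so an average near zero suggests that most evaluation queries retrieved no chunk judged relevant.
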
---
## 💰 Cost Estimate
**Target:** under `$0.50` for a short demo session.
| Parameter | Value |
|---|---|
| Token range | ~8k – 20k tokens |
| Typical cost | < $0.50 per session |
| Recommended max reviews | 5 – 10 |
**Tips to keep costs low:**
- Keep max reviews between 5 and 10
- Avoid repeated large collect runs
- Use short test inputs for manual triage validation
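
The session budget can be checked with simple arithmetic. In the sketch below the per-token rates are placeholders, not current OpenAI pricing, and the 15k/5k input/output split is an assumption, since the table above gives only a total token range.

```python
def estimate_cost(input_tokens, output_tokens, usd_per_1m_input, usd_per_1m_output):
    """Return the USD cost of one session from token counts and rates."""
    total = input_tokens * usd_per_1m_input + output_tokens * usd_per_1m_output
    return total / 1_000_000


# Placeholder rates in USD per million tokens; check the provider's pricing page.
RATE_IN, RATE_OUT = 5.00, 15.00

# Upper end of the 8k-20k range, assuming 15k input and 5k output tokens.
cost = estimate_cost(15_000, 5_000, RATE_IN, RATE_OUT)
print(f"${cost:.2f}")
```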
---
## 🛠️ Tech Stack
| Tool | Role |
|---|---|
| [Python](https://python.org) | Core language |
| [Gradio](https://gradio.app) | Web UI |
| [OpenAI GPT-4o](https://openai.com) | LLM for triage and summaries |
| [ChromaDB](https://trychroma.com) | Vector store |
| [rank-bm25](https://github.com/dorianbrown/rank_bm25) | Keyword retrieval |
| [RAGAS](https://docs.ragas.io) | RAG evaluation framework |
| [google-play-scraper](https://github.com/JoMingyu/google-play-scraper) | Review data source |
---
## ✅ Functionalities Implemented
### Requirements covered
- [x] RAG project written in Python
- [x] Uses at least one LLM
- [x] Public repository with collection and curation scripts
- [x] README with project explanation and setup
- [x] BYOK input in the UI (see [API Keys](#-api-keys))
- [x] Cost estimate included (see [Cost Estimate](#-cost-estimate))
- [x] API key requirements listed (see [API Keys](#-api-keys))
- [x] More than 5 optional techniques covered (7 total, listed below)
### Techniques implemented
- [x] Streaming responses in the UI: `app.py`
- [x] Dynamic few-shot prompting using similar bugs: `triage.py`
- [x] Evaluation code and dataset included: `eval/eval.py`, `eval/eval_dataset.json`
- [x] Domain-specific app for QA bug triage: `triage.py`, `app.py`
- [x] Structured JSON data curation for RAG: `triage.py`
- [x] Hybrid retrieval with semantic search and BM25: `rag.py`
- [x] Query routing in the active app flow: `triage.py`
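
Dynamic few-shot prompting, as listed above, means choosing the prompt's examples at request time from previously triaged bugs that resemble the new review. The sketch below shows one plausible shape for that step. The prompt wording, the `review` and `report` keys, and `build_few_shot_prompt` are assumptions, not the contents of `triage.py`.

```python
import json


def build_few_shot_prompt(review, similar_bugs, max_examples=3):
    """Assemble a triage prompt that shows earlier triaged bugs as examples.

    similar_bugs: list of dicts with hypothetical keys "review" and "report",
    ordered from most to least similar (for example, retrieval output).
    """
    lines = ["Convert the app review into a JSON bug report.", ""]
    for bug in similar_bugs[:max_examples]:
        lines.append(f"Review: {bug['review']}")
        lines.append(f"Report: {json.dumps(bug['report'], sort_keys=True)}")
        lines.append("")
    # The new review goes last, leaving the report for the model to complete.
    lines.append(f"Review: {review}")
    lines.append("Report:")
    return "\n".join(lines)


examples = [
    {
        "review": "Login button does nothing",
        "report": {"component": "auth", "severity": "high"},
    },
]
prompt = build_few_shot_prompt("App freezes on the payment screen", examples)
print(prompt)
```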
---
## 📄 License
MIT © [aiqualitylab](https://github.com/aiqualitylab)