---
license: mit
title: 'QA Bug Triage Pipeline'
sdk: gradio
colorFrom: red
colorTo: gray
---

# QA Bug Triage Pipeline

> A modern RAG workflow for turning messy app reviews into structured, searchable QA bug intelligence.

**Links:** [Hugging Face Demo](https://huggingface.co/spaces/aiqualitylab/qa-bug-triage) · [GitHub Repository](https://github.com/aiqualitylab/qa-bug-triage)

---

## Overview

Teams often receive product feedback as noisy, repetitive, and unstructured review text. This project converts those reviews into structured bug reports with an LLM, stores them in a local vector database, and makes them easy to search and summarize.

The result is a lightweight **bug triage assistant** built with Python, Gradio, OpenAI, ChromaDB, and RAG evaluation tooling.

---

## What It Does

| Capability | Description |
|---|---|
| Review collection | Fetches real Google Play reviews |
| Query routing | Classifies incoming text before triage |
| Structured triage | Generates JSON bug reports with consistent fields |
| Hybrid retrieval | Combines semantic retrieval with BM25 keyword matching |
| AI summaries | Produces concise summaries for triage and search results |
| Store reset | Clears persisted bugs directly from the UI |
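
To make "JSON bug reports with consistent fields" concrete, here is a minimal sketch of parsing and validating one LLM response. The field names and allowed severities are assumptions chosen for illustration; the README does not list the actual schema used in `triage.py`.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: these names are illustrative, not taken from triage.py.
REQUIRED_FIELDS = ("title", "severity", "component", "summary")
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass(frozen=True)
class BugReport:
    title: str
    severity: str
    component: str
    summary: str


def parse_bug_report(raw: str) -> BugReport:
    """Parse an LLM response into a BugReport, rejecting malformed output."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    severity = str(data["severity"]).strip().lower()
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']!r}")
    return BugReport(
        title=str(data["title"]).strip(),
        severity=severity,
        component=str(data["component"]).strip(),
        summary=str(data["summary"]).strip(),
    )


raw_response = (
    '{"title": " Crash on login ", "severity": "High", '
    '"component": "auth", "summary": "App closes after sign-in."}'
)
report = parse_bug_report(raw_response)
print(report)
```

Validating at the boundary keeps every stored record in the same shape, which makes later search and summarization simpler.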

---

## Architecture

```
Google Play Reviews
        │
        ▼
┌──────────────┐
│ Query Router │ ──► feature request / general complaint (dropped)
└──────────────┘
        │ bug report
        ▼
┌──────────────┐
│    Triage    │ ──► structured JSON bug record
└──────────────┘
        │
        ▼
┌──────────────┐
│   ChromaDB   │ ──► vector + BM25 hybrid index
└──────────────┘
        │
        ▼
┌──────────────┐
│  AI Summary  │ ──► concise triage output
└──────────────┘
```
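
The flow in the diagram can be sketched as a small function that routes each review, drops anything that is not a bug report, and passes the rest to triage and storage. This is an illustration only: `run_pipeline`, `keyword_classifier`, and `fake_triage` are hypothetical names, and the keyword rules stand in for the LLM calls the real app makes.

```python
from typing import Callable, Iterable

# Label taken from the diagram; only this one continues to triage.
BUG = "bug report"


def run_pipeline(
    reviews: Iterable[str],
    classify: Callable[[str], str],
    triage: Callable[[str], dict],
    store: Callable[[dict], None],
) -> list:
    """Route each review, triage the bug reports, and persist the records."""
    records = []
    for review in reviews:
        if classify(review) != BUG:
            continue  # feature requests and general complaints are dropped
        record = triage(review)
        store(record)
        records.append(record)
    return records


# Stand-ins for the LLM router, the LLM triage step, and the ChromaDB write.
def keyword_classifier(text: str) -> str:
    lowered = text.lower()
    if "crash" in lowered or "error" in lowered:
        return BUG
    if "please add" in lowered:
        return "feature request"
    return "general complaint"


def fake_triage(text: str) -> dict:
    return {"summary": text, "severity": "unknown"}


saved = []
result = run_pipeline(
    ["App crashes when I open settings", "Please add dark mode", "Too many ads"],
    keyword_classifier,
    fake_triage,
    saved.append,
)
print(result)
```

Passing the classifier, triage step, and store in as callables keeps the control flow testable without network access or an API key.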

---

## Quick Start

```powershell
# Windows PowerShell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python app.py
```

Then open the local URL that Gradio prints in the terminal (typically `http://127.0.0.1:7860`) in your browser.

---

## API Keys

This app uses **BYOK (Bring Your Own Key)**:

- Paste your OpenAI API key into the masked field in the UI
- The key is supplied at runtime through that field and is never committed to the repository

> ⚠️ **Never commit API keys to source control.**

---

## How To Use

1. **Collect**: fetch and triage live Google Play reviews
2. **Triage**: analyze a single custom review
3. **Search**: retrieve similar bugs via hybrid retrieval
4. **Clear bugs**: reset the ChromaDB store
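
The Search step merges a semantic ranking with a BM25 keyword ranking. The README does not say how `rag.py` combines the two, so the sketch below uses reciprocal rank fusion (RRF), one common option, purely as an assumption. The document IDs and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, highest first; break ties by ID for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


semantic = ["bug-7", "bug-2", "bug-9"]  # e.g. from a vector similarity query
keyword = ["bug-2", "bug-4", "bug-7"]   # e.g. from BM25 scoring
fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)  # ['bug-2', 'bug-7', 'bug-4', 'bug-9']
```

RRF works on ranks rather than raw scores, so it avoids having to normalize cosine similarities against BM25 scores, which live on different scales.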

---

## Project Structure

```
qa-bug-triage/
├── app.py                  # Gradio app and interaction flows
├── collect.py              # Google Play review collection
├── triage.py               # Routing and structured triage logic
├── rag.py                  # Chroma storage and hybrid retrieval
└── eval/
    ├── eval.py             # RAG evaluation script
    ├── eval_dataset.json   # Evaluation dataset
    └── results.json        # Latest saved evaluation metrics
```

---

## Evaluation

Run the evaluation suite:

```powershell
python eval\eval.py --api-key YOUR_OPENAI_API_KEY
```

**Latest results:**

| Metric | Score |
|---|---|
| Answer Relevancy | `0.868` |
| Faithfulness | `0.292` |
| Context Precision | `0.020` |
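
To help read the Context Precision score, here is a sketch of the rank-weighted precision that context-precision metrics are typically built on. It covers only the arithmetic and assumes relevance flags are already available. In RAGAS the relevance judgment comes from an LLM, and its documentation is the authority on the exact definition.

```python
def context_precision(relevance):
    """Rank-weighted precision over 0/1 relevance flags, best rank first.

    Averages precision@k over the ranks k that hold a relevant chunk.
    """
    hits = 0
    weighted = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / k  # precision@k, counted only at relevant ranks
    return weighted / hits if hits else 0.0


print(context_precision([1, 0, 0]))  # relevant chunk ranked first
print(context_precision([0, 0, 1]))  # relevant chunk ranked last
print(context_precision([0, 0, 0]))  # nothing relevant retrieved
```

Under this formula, a mean close to zero suggests that for most evaluation queries few or none of the retrieved chunks were judged relevant.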

---

## Cost Estimate

**Target:** under `$0.50` for a short demo session.

| Parameter | Value |
|---|---|
| Token range | ~8k–20k tokens |
| Typical cost | < $0.50 per session |
| Recommended max reviews | 5–10 |

**Tips to keep costs low:**

- Keep max reviews between 5 and 10
- Avoid repeated large collect runs
- Use short test inputs for manual triage validation
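
The session budget comes down to token count times per-token price. The sketch below shows that arithmetic for the upper end of the token range. The input/output split and both prices are hypothetical placeholders, so substitute the current rates from your provider's pricing page.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million, output_price_per_million):
    """Return the estimated cost in USD for one session."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical session: 20k tokens split 16k input / 4k output,
# priced at placeholder rates of $2.50 and $10.00 per million tokens.
cost = estimate_cost_usd(16_000, 4_000, 2.50, 10.00)
print(f"${cost:.2f}")  # $0.08
```

With these placeholder rates, a 20k-token session stays well under the `$0.50` target, and the same function can be used to check a larger collect run before starting it.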

---

## Tech Stack

| Tool | Role |
|---|---|
| [Python](https://python.org) | Core language |
| [Gradio](https://gradio.app) | Web UI |
| [OpenAI GPT-4o](https://openai.com) | LLM for triage and summaries |
| [ChromaDB](https://trychroma.com) | Vector store |
| [rank-bm25](https://github.com/dorianbrown/rank_bm25) | Keyword retrieval |
| [RAGAS](https://docs.ragas.io) | RAG evaluation framework |
| [google-play-scraper](https://github.com/JoMingyu/google-play-scraper) | Review data source |

---

## Functionalities Implemented

### Requirements covered

- [x] RAG project written in Python
- [x] Uses at least one LLM
- [x] Public repository with collection and curation scripts
- [x] README with project explanation and setup
- [x] BYOK input in the UI (see the API Keys section)
- [x] Cost estimate included (see the Cost Estimate section)
- [x] API key requirements listed (see the API Keys section)
- [x] More than 5 optional techniques covered (7 total, listed under Techniques implemented)

### Techniques implemented

- [x] Streaming responses in the UI: `app.py`
- [x] Dynamic few-shot prompting using similar bugs: `triage.py`
- [x] Evaluation code and dataset included: `eval/eval.py`, `eval/eval_dataset.json`
- [x] Domain-specific app for QA bug triage: `triage.py`, `app.py`
- [x] Structured JSON data curation for RAG: `triage.py`
- [x] Hybrid retrieval with semantic search and BM25: `rag.py`
- [x] Query routing in the active app flow: `triage.py`
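
Dynamic few-shot prompting means the examples in the prompt change with each request: previously triaged bugs that resemble the new review are retrieved and shown to the model as demonstrations. The sketch below shows one way to assemble such a prompt. The function name, the prompt wording, and the `review`/`report` keys are assumptions for illustration, not the contents of `triage.py`.

```python
import json


def build_few_shot_prompt(review, similar_bugs, max_examples=3):
    """Build a triage prompt that shows previously triaged bugs as examples."""
    lines = [
        "You are a QA triage assistant.",
        "Return one JSON object describing the bug in the last entry.",
        "",
    ]
    for example in similar_bugs[:max_examples]:
        lines.append(f"Review: {example['review']}")
        lines.append(f"Bug report: {json.dumps(example['report'], sort_keys=True)}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Bug report:")
    return "\n".join(lines)


# Hypothetical retrieval results for a new review about login crashes.
similar = [
    {"review": "Crashes after I sign in",
     "report": {"severity": "high", "component": "auth"}},
    {"review": "Login button does nothing",
     "report": {"severity": "medium", "component": "auth"}},
]
prompt = build_few_shot_prompt("App closes when I log in with Google", similar)
print(prompt)
```

Capping the examples with `max_examples` keeps the prompt length, and therefore the token cost, bounded no matter how many similar bugs are retrieved.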

---

## License

MIT © [aiqualitylab](https://github.com/aiqualitylab)