Update readme

README.md

@@ -296,6 +296,27 @@ python scripts/run_eval.py --samples 20 --mode all
|
 | **Context Recall** | How well the retrieved contexts cover the ground truth |
 | **ROUGE-1 / ROUGE-2 / ROUGE-L** | N-gram overlap with ground truth answers |
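The ROUGE-1 score in the table above is essentially unigram-overlap F1 between a generated answer and the ground truth. A simplified sketch (for illustration only; actual runs should use a proper ROUGE library, and the function name here is made up):

```python
# Simplified ROUGE-1 sketch: unigram-overlap F1 between a candidate
# answer and the reference. Real ROUGE also handles stemming and
# tokenization; this only illustrates the idea.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```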
 
+### Results
+
+Benchmark on the HUST student regulation Q&A dataset (200 samples):
+
+| Metric                | vector_only | bm25_only | hybrid | hybrid_rerank |
+|-----------------------|:-----------:|:---------:|:------:|:-------------:|
+| **Answer Relevancy**  | 0.749 | 0.635 | 0.832 | **0.872** |
+| **Context Precision** | 0.678 | 0.538 | 0.795 | **0.861** |
+| **Context Recall**    | 0.815 | 0.732 | 0.849 | **0.872** |
+| **Faithfulness**      | 0.912 | 0.938 | **0.942** | 0.937 |
+| **ROUGE-1**           | 0.557 | 0.533 | 0.576 | **0.598** |
+| **ROUGE-2**           | 0.408 | 0.385 | 0.421 | **0.439** |
+| **ROUGE-L**           | 0.526 | 0.508 | 0.545 | **0.567** |
+
+**Key takeaways:**
+
+- **`hybrid_rerank` achieves the best score on 6 of the 7 metrics**, confirming it as the sensible default retrieval mode.
+- **Faithfulness is consistently high (>0.91 in every mode)**: the LLM reliably grounds its answers in the provided context, with minimal hallucination.
+- **Reranking markedly boosts Context Precision** (+60% over BM25-only, +8% over hybrid), demonstrating the value of Qwen3-Reranker for filtering irrelevant documents.
+- **Hybrid search clearly outperforms either single-mode retriever**, validating the ensemble of semantic (vector) and lexical (BM25) search.
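The ensemble idea behind the `hybrid` mode can be sketched as a weighted fusion of min-max-normalized scores from the two retrievers. This is an illustrative sketch only; the weight `alpha`, the function names, and the normalization choice are assumptions, not this repo's implementation:

```python
# Illustrative hybrid-retrieval sketch: fuse per-document scores from a
# semantic (vector) retriever and a lexical (BM25) retriever.
# `alpha` and all names here are assumptions, not the repo's API.

def minmax(scores: dict) -> dict:
    """Rescale a doc -> score mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against identical scores
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_scores(vector: dict, bm25: dict, alpha: float = 0.5) -> dict:
    """Weighted fusion: alpha * vector + (1 - alpha) * bm25."""
    v, b = minmax(vector), minmax(bm25)
    docs = set(v) | set(b)  # a doc may appear in only one retriever
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}

# Toy example: doc2 scores well in both retrievers, so it ranks first.
vector = {"doc1": 0.82, "doc2": 0.40, "doc3": 0.10}
bm25 = {"doc2": 12.0, "doc3": 3.0, "doc4": 7.5}
fused = hybrid_scores(vector, bm25, alpha=0.5)
```

In the full pipeline the fused ranking would then be passed to the reranker, which is what drives the Context Precision gains above.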
+
 Results are saved to `evaluation/results/` as both JSON and CSV files with timestamps.
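The newest run can be picked up for analysis along these lines (a sketch; the exact filename pattern inside `evaluation/results/` is an assumption):

```python
# Sketch: read back the most recent evaluation run from a results
# directory. The *.json naming inside evaluation/results/ is assumed.
import json
from pathlib import Path

def latest_results(results_dir="evaluation/results"):
    """Parse the most recently modified *.json results file, or None."""
    files = sorted(Path(results_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    return json.loads(files[-1].read_text()) if files else None
```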
 
 ---