Update README.md
README.md
````diff
@@ -56,36 +56,67 @@ cd First-Aid-Tutor
 pip install -r requirements.txt
 python raggpt.py
 ```
-## Evaluation Metrics
-The chatbot's responses were evaluated using **faithfulness, answer relevancy, context recall, answer correctness, and semantic similarity**.
 
-| Metric | Score |
-|----------------------|--------|
-| **Faithfulness** | **0.0900** |
-| **Answer Relevancy** | **0.9609** |
-| **Context Recall** | **1.0000** |
-| **Answer Correctness** | **0.2689** |
-| **Semantic Similarity** | **0.7756** |
 
-
 
 ---
 
 
 
 
 ---
 
-##
-
-
-- High fever in infants
-- Low blood sugar treatment
-- First-aid steps for burns, seizures, choking, etc.
-- **Expected answers** were sourced from **medical literature**.
 
 ---
 
 ## Download & Re-Evaluate
 You can **re-evaluate the chatbot** by running the following:
 
````
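The removed table paired a very low faithfulness score (0.0900) with perfect context recall (1.0000). In RAGAS both metrics are ratios over claim-level judgements: faithfulness is the fraction of claims in the generated answer that the retrieved context supports, and context recall is the fraction of ground-truth statements that can be attributed to the retrieved context. The sketch below is a minimal illustration that assumes those per-claim verdicts already exist (RAGAS obtains them from an LLM judge). The `Verdicts` class and both function names are hypothetical and are not part of the RAGAS API.

```python
from dataclasses import dataclass


@dataclass
class Verdicts:
    """Hypothetical container for per-claim judgements (1 = supported, 0 = not)."""
    answer_claims: list[int]        # one verdict per claim extracted from the answer
    ground_truth_claims: list[int]  # one verdict per ground-truth statement


def faithfulness(v: Verdicts) -> float:
    # Share of answer claims that the retrieved context supports.
    # Returning 0.0 when there are no claims is a convention chosen for this sketch.
    if not v.answer_claims:
        return 0.0
    return sum(v.answer_claims) / len(v.answer_claims)


def context_recall(v: Verdicts) -> float:
    # Share of ground-truth statements that can be attributed to the retrieved context.
    if not v.ground_truth_claims:
        return 0.0
    return sum(v.ground_truth_claims) / len(v.ground_truth_claims)


# Made-up verdicts: 1 of 10 answer claims is grounded in the context,
# while every ground-truth statement is present in the context.
example = Verdicts(answer_claims=[1] + [0] * 9, ground_truth_claims=[1, 1, 1, 1])
print(faithfulness(example))    # 0.1
print(context_recall(example))  # 1.0
```

With these made-up verdicts the sketch gives 0.1 and 1.0. A result of that shape means the retriever surfaced everything the reference answer needed, but most statements in the generated answer could not be traced back to the retrieved text.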
````diff
 pip install -r requirements.txt
 python raggpt.py
 ```
 
 
+# 🩺 First Aid Assistant - Model Evaluation Report
+
+This repository presents the evaluation results of the **First Aid Assistant** chatbot, which provides first aid guidance based on common emergency conditions. The model has been evaluated using the **RAGAS** framework with metrics that assess the quality of the generated answers.
+
+## **Evaluation Metrics**
+
+The chatbot was evaluated based on the following RAGAS metrics:
+
+- **Answer Relevancy:** Measures how relevant the response is to the user's question.
+- **Answer Correctness:** Compares the generated response to the ground truth to assess factual correctness.
+- **Semantic Similarity:** Evaluates how semantically similar the generated answer is to the reference answer.
+
+---
````
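The three metrics listed above can be sketched in plain Python to show how each score is formed. This follows the descriptions in the RAGAS documentation as we understand them: semantic similarity is the cosine similarity between embeddings of the answer and the reference; answer correctness blends a statement-level F1 score with that similarity (the 0.75/0.25 weighting is assumed to be the library default and should be checked against the installed version); answer relevancy is the mean cosine similarity between the user's question and questions regenerated from the answer. The embedding vectors and TP/FP/FN counts are toy inputs (RAGAS derives them from an embedding model and an LLM), every function name is hypothetical, and the extra penalty RAGAS applies to non-committal answers is omitted.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_similarity(answer_vec: list[float], reference_vec: list[float]) -> float:
    # Embedding similarity between the generated answer and the reference answer.
    return cosine(answer_vec, reference_vec)


def answer_correctness(tp: int, fp: int, fn: int, similarity: float,
                       weights: tuple[float, float] = (0.75, 0.25)) -> float:
    # Factual F1 over statement-level counts (assumed to come from an LLM judge):
    # F1 = TP / (TP + 0.5 * (FP + FN)).
    denom = tp + 0.5 * (fp + fn)
    f1 = tp / denom if denom else 0.0
    # Weighted average of F1 and semantic similarity (weights are an assumption).
    w_fact, w_sim = weights
    return (w_fact * f1 + w_sim * similarity) / (w_fact + w_sim)


def answer_relevancy(question_vec: list[float],
                     generated_question_vecs: list[list[float]]) -> float:
    # Mean cosine similarity between the user's question and questions
    # regenerated from the answer (the generation step itself is not shown).
    sims = [cosine(question_vec, g) for g in generated_question_vecs]
    return sum(sims) / len(sims) if sims else 0.0


sim = semantic_similarity([1.0, 0.0], [1.0, 0.0])
print(sim)                                                      # 1.0
print(answer_correctness(tp=3, fp=0, fn=1, similarity=sim))     # about 0.893
print(answer_relevancy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))   # 0.5
```

The sketch keeps the scoring arithmetic separate from the model calls, so the formulas can be checked by hand; in a real evaluation the vectors and statement counts would be produced by the embedding model and the judge LLM configured for RAGAS.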
````diff
+
+## **Performance Summary**
+
+| **Metric** | **Average Score** |
+|:--------------------------|:-----------------:|
+| **Answer Relevancy** | **0.94** |
+| **Answer Correctness** | **0.91** |
+| **Semantic Similarity** | **0.97** |
+
+---
+
+## **Detailed Results**
+
+Here's a snapshot of the evaluation for some sample questions:
+
+| **Question** | **Answer Relevancy** | **Answer Correctness** | **Semantic Similarity** |
+|---------------------------------------------------------|----------------------|------------------------|-------------------------|
+| What are the first aid measures for high fever in infants? | 0.93 | 0.85 | 0.98 |
+| What are the signs and symptoms of low blood sugar? | 0.85 | 0.98 | 0.94 |
+| What does RICE stand for in first aid treatment? | 0.99 | 1.00 | 0.98 |
+| What is the treatment of snake bite? | 0.96 | 1.00 | 0.98 |
+| How do you provide first aid for choking? | 0.96 | 0.97 | 0.98 |
 
 ---
````
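As a consistency check on the summary and detailed tables, the per-metric mean and range of the five sample rows can be computed directly from the Detailed Results scores. This is a small illustrative script; the row labels are shortened versions of the questions, and the numbers are copied from the table.

```python
from statistics import mean

# Scores copied from the "Detailed Results" table:
# (answer relevancy, answer correctness, semantic similarity)
rows = {
    "high fever in infants": (0.93, 0.85, 0.98),
    "low blood sugar":       (0.85, 0.98, 0.94),
    "RICE":                  (0.99, 1.00, 0.98),
    "snake bite":            (0.96, 1.00, 0.98),
    "choking":               (0.96, 0.97, 0.98),
}
metrics = ("answer_relevancy", "answer_correctness", "semantic_similarity")


def summarize(rows: dict[str, tuple[float, float, float]]) -> dict[str, tuple[float, float, float]]:
    """Return {metric: (mean, min, max)} over the sample rows."""
    summary = {}
    for i, name in enumerate(metrics):
        values = [scores[i] for scores in rows.values()]
        summary[name] = (mean(values), min(values), max(values))
    return summary


for name, (avg, lo, hi) in summarize(rows).items():
    print(f"{name}: mean={avg:.3f} range={lo:.2f}-{hi:.2f}")
```

It prints sample means of 0.938, 0.960 and 0.972. Relevancy and semantic similarity round to the summary values (0.94 and 0.97), while the sample correctness mean is higher than the reported 0.91 average, presumably because the summary covers more questions than the five shown in the snapshot. The correctness range (0.85 to 1.00) is also wider than the similarity range (0.94 to 0.98).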
````diff
 
+## **Key Insights**
 
+- The chatbot performed exceptionally well in **semantic similarity** (average score of **0.97**), indicating that responses are closely aligned with the ground truth.
+- **Answer correctness** is strong overall but has the lowest average of the three metrics (**0.91**), so factual correctness is where the model has the most room to improve.
+- The **relevancy** of responses remained consistently high, reflecting the model's ability to address user questions effectively.
 
+---
+
+## **Evaluation Artifacts**
+
+- **RAGAS Evaluation Report:** [View Full Report](./ragas_evaluation_results.html)
 
 ---
 
+## **Conclusion**
+
+The **First Aid Assistant** demonstrates reliable performance in answering first aid-related queries with high semantic accuracy and relevancy. Continuous improvement in factual correctness will further enhance its capability to provide life-saving information in emergency situations.
 
 ---
 
+
 ## Download & Re-Evaluate
 You can **re-evaluate the chatbot** by running the following:
 
````