DrSyedFaizan committed
Commit ce940cd · verified · 1 Parent(s): d414b21

Update README.md

Files changed (1)
  1. README.md +48 -17
README.md CHANGED
@@ -56,36 +56,67 @@ cd First-Aid-Tutor
56
  pip install -r requirements.txt
57
  python raggpt.py
58
  ```
59
- ## 📊 Evaluation Metrics
60
- The chatbot's responses were evaluated using **faithfulness, answer relevancy, context recall, answer correctness, and semantic similarity**.
61
 
62
- | Metric | Score |
63
- |----------------------|--------|
64
- | **Faithfulness** | **0.0900** |
65
- | **Answer Relevancy** | **0.9609** |
66
- | **Context Recall** | **1.0000** |
67
- | **Answer Correctness** | **0.2689** |
68
- | **Semantic Similarity** | **0.7756** |
69
 
70
- 🔍 **[View Full Evaluation Report on WandB](https://api.wandb.ai/links/drsyedfaizan1987-northeastern-university/xrlfa4vq)**
71
 
72
  ---
73
 
 
74
 
 
 
 
75
 
 
 
 
 
 
76
 
77
  ---
78
 
79
- ## 📖 Evaluation Method
80
- - Used **RAGAS evaluation framework with GPT-4** to assess **answer correctness, relevancy, and factual consistency**.
81
- - Evaluated on **10 first-aid-related questions**, covering:
82
- - ✅ High fever in infants
83
- - ✅ Low blood sugar treatment
84
- - ✅ First-aid steps for burns, seizures, choking, etc.
85
- - **Expected answers** were sourced from **medical literature**.
86
 
87
  ---
88
 
 
89
  ## 📥 Download & Re-Evaluate
90
  You can **re-evaluate the chatbot** by running the following:
91
 
 
56
  pip install -r requirements.txt
57
  python raggpt.py
58
  ```
 
 
59
 
60
 
61
+ # 🩺 First Aid Assistant - Model Evaluation Report
62
+
63
+ This repository presents the evaluation results of the **First Aid Assistant** chatbot, which provides first aid guidance for common emergency conditions. The model was evaluated with the **RAGAS** framework, using metrics that assess the quality of the generated answers.
64
+
65
+ ## 📊 **Evaluation Metrics**
66
+
67
+ The chatbot was evaluated based on the following RAGAS metrics:
68
+
69
+ - **Answer Relevancy:** Measures how relevant the response is to the user's question.
70
+ - **Answer Correctness:** Compares the generated response to the ground truth to assess factual correctness.
71
+ - **Semantic Similarity:** Evaluates how semantically similar the generated answer is to the reference answer.
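
As an illustration of the semantic similarity metric, the sketch below computes a cosine similarity between two embedding vectors, which is how this kind of score is commonly derived. It is a minimal sketch, not the evaluation code used for this report: the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions made for the example, and a real run would use embeddings produced by an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, for illustration only).
generated = [0.2, 0.7, 0.1]      # stands in for the generated answer
reference = [0.25, 0.65, 0.05]   # stands in for the reference answer

score = cosine_similarity(generated, reference)
print(f"semantic similarity (toy example): {score:.2f}")
```

Vectors that point in nearly the same direction score close to 1, while orthogonal vectors score 0, so a high average indicates that generated answers sit close to the reference answers in embedding space.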
72
+
73
+ ---
74
+
75
+ ## 🚀 **Performance Summary**
76
+
77
+ | **Metric** | **Average Score** |
78
+ |:--------------------------|:-----------------:|
79
+ | **Answer Relevancy** | **0.94** |
80
+ | **Answer Correctness** | **0.91** |
81
+ | **Semantic Similarity** | **0.97** |
82
+
83
+ ---
84
+
85
+ ## 📈 **Detailed Results**
86
+
87
+ Here's a snapshot of the evaluation results for five sample questions:
88
+
89
+ | **Question** | **Answer Relevancy** | **Answer Correctness** | **Semantic Similarity** |
90
+ |---------------------------------------------------------|----------------------|------------------------|-------------------------|
91
+ | What are the first aid measures for high fever in infants? | 0.93 | 0.85 | 0.98 |
92
+ | What are the signs and symptoms of low blood sugar? | 0.85 | 0.98 | 0.94 |
93
+ | What does RICE stand for in first aid treatment? | 0.99 | 1.00 | 0.98 |
94
+ | What is the treatment of snake bite? | 0.96 | 1.00 | 0.98 |
95
+ | How do you provide first aid for choking? | 0.96 | 0.97 | 0.98 |
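
To check how the sample rows relate to the summary averages, the per-metric means of the five questions above can be recomputed with a few lines of Python. This is a sketch under stated assumptions: the dictionary keys and the names `sample_scores` and `sample_means` are made up for the example, and only the scores are copied from the table.

```python
from statistics import mean

# Scores copied from the sample table:
# (answer relevancy, answer correctness, semantic similarity).
sample_scores = {
    "high fever in infants": (0.93, 0.85, 0.98),
    "low blood sugar": (0.85, 0.98, 0.94),
    "RICE": (0.99, 1.00, 0.98),
    "snake bite": (0.96, 1.00, 0.98),
    "choking": (0.96, 0.97, 0.98),
}

metric_names = ("answer_relevancy", "answer_correctness", "semantic_similarity")

# Average each metric column over the five sample questions.
sample_means = {
    name: round(mean(row[i] for row in sample_scores.values()), 2)
    for i, name in enumerate(metric_names)
}

print(sample_means)
```

The sample means come out at roughly 0.94, 0.96, and 0.97. The answer correctness mean is higher than the 0.91 reported in the summary, which is consistent with the table being a subset of the evaluated questions rather than the full set.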
96
 
97
  ---
98
 
99
+ ## 📋 **Key Insights**
100
 
101
+ - The chatbot performed exceptionally well in **semantic similarity** (average score of **0.97**), indicating that responses are closely aligned with the ground truth.
102
+ - **Answer correctness** is strong overall (average score of **0.91**) but varies more from question to question, suggesting room for improvement in handling complex queries.
103
+ - The **relevancy** of responses remained consistently high, reflecting the model's ability to address user questions effectively.
104
 
105
+ ---
106
+
107
+ ## 📁 **Evaluation Artifacts**
108
+
109
+ - **RAGAS Evaluation Report:** [View Full Report](./ragas_evaluation_results.html)
110
 
111
  ---
112
 
113
+ ## 🌟 **Conclusion**
114
+
115
+ The **First Aid Assistant** demonstrates reliable performance in answering first aid-related queries, with high semantic similarity and relevancy scores. Continued improvement in factual correctness would further strengthen its ability to provide life-saving information in emergency situations.
 
 
 
 
116
 
117
  ---
118
 
119
+
120
  ## 📥 Download & Re-Evaluate
121
  You can **re-evaluate the chatbot** by running the following:
122