Spaces: rishitpant/AskTheHandbook-2

Rishit Pant committed 22 days ago

Commit

bc9ec9b

1 Parent(s): b9d54f5

Improved ingestion, retrieve and generate (#4)

* Eval prompts and scripts

* Update dependencies

* Answer generation using LLM

* Evaluation metrics using DeepEval

* Fix: mismatched heading tags

* Add noise regex and context

* Fixed noise patterns

* Improve system prompt

Files changed (10)

.deepeval/.deepeval_telemetry.txt +2 -0
data/grading-document.md +41 -54
eval/eval_prompts.json +37 -0
eval/evaluate.py +423 -0
pyproject.toml +1 -1
requirements.txt +2 -2
src/generate.py +124 -0
src/ingest.py +30 -5
src/retrieve.py +2 -1
uv.lock +18 -5

.deepeval/.deepeval_telemetry.txt ADDED

	@@ -0,0 +1,2 @@


+DEEPEVAL_ID=5f67d0f6-8daa-40e0-8e0c-a6e888223837
+DEEPEVAL_STATUS=old

data/grading-document.md CHANGED

@@ -1019,7 +1019,7 @@ BS-DS_ May 2026 Grading document (Student) Updated automatically every 5 minutes
 (W11/W12 contents will be included for the final exam. Hence, please practice and submit W11/W12 assignment).
-# 6. English 2
 **Quiz 1:** July 19 2026 **Quiz 2:** August 16 2026 **End term:** September 13 2026
@@ -1041,9 +1041,9 @@ F = score in final exam
 (W11/W12 contents will be included for the final exam. Hence, please practice and submit W11/W12 assignment).
-# 7. Intro to python programming
-## Academic policies
 1. In each programming assignment, be it any course or any OPPE, taking help from LLMs (e.g. ChatGPT, Gemini) partially or completely is considered plagiarism.
@@ -1291,7 +1291,7 @@ Updated automatically every 5 minutes
 Suggested pathway to register and study
-## Diploma level courses:
 1. **Most aggressive pathway** - completing in 4 terms - ONLY IF YOU ARE DOING THIS AS FULL TIME AND NOTHING ELSE AND CAN SPEND <u>70 HRs PER WEEK</u> MINIMUM
@@ -1872,16 +1872,9 @@ July 3rd week (Tentative): We will release the slots for OPPE 1 & Dates for OPPE
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ… 13/40
-6/9/26, 7:10 PM BS-DS_ May 2026 Grading document (Student)
-Google Docs info icon Published using Google Docs Report abuse Learn more
 to attend the exam and if you are eligible, exam will be scheduled as per the slots allocated. Please choose courses for the May 2026 term keeping all these points in mind.
-BS-DS_ May 2026 Grading document (Student) Updated automatically every 5 minutes
-# Diploma level courses
 ## 1. Machine Learning foundations (DS Diploma)
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
@@ -1947,17 +1940,6 @@ SoP for the SCT Exam is as follows: <u>Click Here for OPPE SCT SoP Document</u>
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ… 14/40
-6/9/26, 7:10 PM
-BS-DS_ May 2026 Grading document (Student)
-Google Docs icon Published using Google Docs
-Report abuse Learn more
-BS-DS_ May 2026 Grading document (Student)
-Updated automatically every 5 minutes
 OPPE1 will not be scheduled for students who fail to complete the OPPE SCT exam.
 <u>Repeat_OPPE Criteria</u> (w.e.f from Jan 2026 term) [updated on Jan 05, 2026]
@@ -2096,7 +2078,9 @@ Google Docs logo Published using Google Docs Report abuse Learn more
 # BS-DS_ May 2026 Grading document (Student)
 Updated automatically every 5 minutes
-**September 13 2026**
 **Above to be attended in person at designated centres.**
@@ -2117,7 +2101,7 @@ Overall score for eligible students:
 (Though the W11/W12 assignment score is not included in GAA, W11/W12 contents will be included for the final exam. Hence, please practice and submit W11/W12 assignment).
-# 5. Business Analytics (Diploma in DS)
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
@@ -2182,7 +2166,7 @@ A = Sum of the Best 2 out of (Assignment 1, Assignment 2, Assignment 3)
   </tbody>
 </table>
-# 6. Tools in Data Science (Diploma in DS) [Co-
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ... 16/40
@@ -2195,7 +2179,7 @@ Report abuse Learn more
 # BS-DS_ May 2026 Grading document (Student)
 Updated automatically every 5 minutes
-## Academic policies
 1. The Tools in Data Science portal has all the course content: Graded Assignments, Projects and ROE links. The Seek Portal is not used for submissions.
@@ -2248,7 +2232,7 @@ Before registering for TDS, please attempt the **Entrance Exam** at **https://ex
 <mark>**Final course score T = 0.2 GAA + 0.2 ROE + 0.2 P1 + 0.2 P2 + 0.2 F**</mark>
-# 7. Programming Data structures and algorithms using Python (PDSA) - Diploma in Programming
 **Weekly assignments:** Mix of autograded assignment and Programming assignments
@@ -2301,9 +2285,9 @@ OP = Score in Online proctored remote exam
 F = score in final exam
 **T = 0.05GAA + 0.2OP + 0.45F + max (0.2max(Qz1, Qz2), (0.10Qz1+0.20Qz2 ))**
-# 8. Database management system (DBMS) -
-## Diploma in Programming
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
@@ -2479,7 +2463,7 @@ BS-DS_ May 2026 Grading document (Student) Updated automatically every 5 minutes
   </tbody>
 </table>
-# 9. Application development - 1 (Diploma in programming)
 **Quiz 1**: July 19 2026
 **Quiz 2**: August 16 2026
@@ -2533,7 +2517,7 @@ Google Docs logo Published using Google Docs Report abuse Learn more
 **Final course score T = 0.05 GLA + max (0.6F + 0.25max(Qz1, Qz2), 0.4F + 0.25Qz1 + 0.3Qz2)**
-# 10. Programming concepts using Java (Diploma in programming)
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
@@ -2744,7 +2728,7 @@ BPT 4 - August 14, 2026
 Will be conducted in the course VM - Each BPT has 4 Questions
-## Repeat_OPPE Criteria
 **(w.e.f from Jan 2026 term) [updated on Jan 05, 2026]**
 **Students who were OPPE eligible and received an I_OP in the Jan 2026 term, and have registered as Repeat OPPE for the May 2026 term, will continue to be considered eligible for OPPE in May 2026. However, students who were marked OPPE Ineligible with I_OP or I_Both and have registered for the respective course as Repeat OPPE or Repeat OPPE & End Term must regain OPPE eligibility in the May 2026 term by submitting the required assignments and OPPE SCT. If OPPE eligibility is not attained, the OPPE will not be scheduled and a U grade will be awarded for the May 2026 term.**
@@ -2752,13 +2736,13 @@ Will be conducted in the course VM - Each BPT has 4 Questions
 **SCT for OPPE and exam day rules:**
 https://docs.google.com/document/d/e/2PACX-1vS4Hhh4MsKD2WL8_D26Vw2WJKw0CBtPihZyKrnEM_kefRXm_O75GqTcJA6IR0X_xCiVL5gUi5y6_bjw/pub
-## Eligibility to attend the end term exam:
 Average of the best 5 out of the first 7 weekly assessments (objective and programming) scores >= 40/100 AND OPPE should be eligible
-## Eligibility to get the course grade:
 Attending the end semester exam AND programming exam (OPPE) score >= 40/100
-## There will be ONE OPPE based on weeks 1-9.
 Students have to mandatorily attend the OPPE on the first date.
 * If you fail in this, you get a chance to reappear the next weekend. So you get 2 chances to attempt the exam.
@@ -2952,7 +2936,7 @@ Overall score for eligible students:
 ## Project Courses:
-Project courses are now for 2 credits each - BDM, MLP, App Dev 1, App Dev 2, DI-GenAi project.
 The Project courses are not part of CCC. The CCC is only for theory courses.
@@ -2992,7 +2976,7 @@ Submission deadline :
   </tbody>
 </table>
-## FOR, MAD1, MAD2 PROJECTS:
 <table>
   <thead>
@@ -3034,7 +3018,7 @@ Updated automatically every 5 minutes
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ-A9dMIXVrfe069kqUkplPMRAiZBBIMJcp9XTCR/pub
-## MAD I Project
 **Project Document:**
 https://docs.google.com/document/d/e/2PACX-1vR9pOLPvPzsxqhtrmtLi8GAV2j2JxpisWkIHCjEb8WuYrgm4ZPq8S_Sor-MygixO4hvGcPvO6Ei_W/pub
@@ -3042,7 +3026,7 @@ https://docs.google.com/document/d/e/2PACX-1vR9pOLPvPzsxqhtrmtLi8GAV2j2JxpisWkIH
 **Project statement - Trekking Management Application:**
 https://docs.google.com/document/d/e/2PACX-1vQvqzWz2tFt96B8VApnHqWqIP3LtPDbnXwApYr8VOffLCm_Zh2JuTa51z7d1CNJbrZKC0oWPredYcV/pub
-## MAD II Project
 **Project document:**
 https://docs.google.com/document/d/e/2PACX-1vTKQBIsllAp4VUq_rST3_rbpMSoADyfb3agZ1E-jrGQxMeN2IBNOXu3GS0mchzRWaxWa204q52tbFYA/pub
@@ -3055,11 +3039,11 @@ https://docs.google.com/document/d/e/2PACX-1vTcayCKq8OPAVeTZHXNwCXxvJfiQRCmxiigX
 **BDM Project: BDM PROJECT SUBMISSION TIMELINE**
-## Deep learning and Generative AI Project
 https://docs.google.com/document/d/e/2PACX-1vTTorjQe5LH-TP_c0x-iDjdm-zX30uvJ3jv_QKtDvGsVoQydTkCWsfKwTUtCQv5syWsnSxmN4W3KFzE/pub_
-## Rules regarding project fees:
 The fee paid for each of the 4 projects is valid for 2 terms. Please read this carefully so that you register properly.
@@ -3118,7 +3102,7 @@ who will be entering Degree level in the Sep 2026 term. **Without completing the
 **Students are requested to complete the projects at least two terms before their diploma completion term i.e DO NOT REGISTER OR HAVE PROJECTS ALONE LEFT IN YOUR LAST TERM OF DIPLOMA.**
-## Very important:
 1. <u>Viva policy for MLP, DL&GEN AI</u>, <u>Viva Policy for Appdev</u>
@@ -3187,7 +3171,7 @@ It is important you learn more on programming and data science outside of what t
 4. Please do not share your assignments with others before the deadlines. If similarities are found between submissions, all will be penalised irrespective of who did it first and who shared it with whom.
-## 1. Software Testing
 **Quiz 1**: July 19 2026 **Quiz 2**: August 16 2026 **End term**: September 13 2026
@@ -3204,7 +3188,7 @@ Qz2 = score in Quiz II (0, if not attempted)
 F - score in End Term exam
 **T = 0.1GAA + 0.4F + 0.25Qz1 + 0.25Qz2**
-## 2. Software Engineering
 **Quiz 1**: No Quiz **Quiz 2**: August 16 2026 **End term**: September 13 2026
@@ -3269,7 +3253,7 @@ BS-DS_ May 2026 Grading document (Student)
 Updated automatically every 5 minutes
-## 3. Deep Learning
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
@@ -3296,7 +3280,7 @@ F - score in End Term exam
 **T = 0.05GAA + 0.25Qz1 + 0.25Qz2 + 0.45F**
-## 4. AI: Search Methods for Problem Solving
 **Academic policies**
@@ -3339,7 +3323,7 @@ Programming Assignment 1 will be released between Quiz 1 and Quiz 2.
 The dates will be announced in the forum. This assignment will be evaluated offline.
-## 5. Strategies for Professional Growth
 **Academic policies**
@@ -3407,7 +3391,7 @@ Project:
 More details about the Group Project will be given in the course.
 Bonus marks for additional activities may be awarded at the discretion of faculty or instructor, provided the student passes the course.
-## 6. Programming in C
 **Quiz 1: July 19 2026** **Quiz 2: No Quiz 2** **End term: September 13 2026**
 **Above to be attended in person at designated centres**
@@ -3516,6 +3500,9 @@ Google Docs icon Published using Google Docs
 Updated automatically every 5 minutes
 <span style="color: red">Above to be attended in person at designated centres.</span>
 Quiz1 - based on content taught by Prof Mitesh (Based on weeks 1-4)
@@ -3721,7 +3708,7 @@ Qz2 = score in Quiz II (0, if not attempted), Syllabus: Week 5-8
 F = score in final exam, Syllabus: Week 1-12
 **T = 0.075 GAA + 0.025 GRPA + 0.25Qz1 + 0.25Qz2 + 0.4F**
-## 13.Market Research
 **Quiz 1:** July 19 2026 **Quiz 2:** August 16 2026 **End term:** September 13 2026
@@ -3752,7 +3739,7 @@ Overall score for eligible students:
 **Case release date:** YTD
 **Case presentation:** Depending on numbers, if needed can extend one more day for presentation.
-## 14.Managerial Economics
 **Quiz 1:** July 19 2026 **Quiz 2:** August 16 2026 **End term:** September 13 2026
@@ -3773,7 +3760,7 @@ F = score in final exam
 **T = 0.1GAA + 0.4F + 0.25Qz1 + 0.25Qz2**
-## 15. MLOps (Machine Learning Operations)
 **Quiz 1:** NA **Quiz 2:** NA **End term:** September 13 2026
@@ -3981,7 +3968,7 @@ Qz2 = score in Quiz II (0, if not attempted)
 F - score in End Term exam
 T = 0.2 GAA + 0.3F + 0.25Qz1 + 0.25Qz2
-## 20. Computer Systems Design
 **Quiz 1**: July 19 2026 **Quiz 2**: August 16 2026 **End term**: September 13 2026
@@ -3999,7 +3986,7 @@ F = score in final exam
 T = 0.1GAA + 0.4F + 0.2Qz1 + 0.25Qz2 + 0.05 circuit verse assignment
-## 21. Game Theory and Strategy
 **Quiz 1**: July 19 2026 **Quiz 2**: August 16 2026 **End term**: September 13 2026
@@ -4021,7 +4008,7 @@ F = score in final exam
 T = 0.1GAA + 0.4F + 0.25Qz1 + 0.25Qz2
-## 22. Algorithms for Data Science (ADS)
 **Quiz 1**: NA **Quiz 2**: August 16 2026 **End term**: September 13 2026

 (W11/W12 contents will be included for the final exam. Hence, please practice and submit W11/W12 assignment).
+## 6. English 2
 **Quiz 1:** July 19 2026 **Quiz 2:** August 16 2026 **End term:** September 13 2026
 (W11/W12 contents will be included for the final exam. Hence, please practice and submit W11/W12 assignment).
+## 7. Intro to python programming
+### Academic policies
 1. In each programming assignment, be it any course or any OPPE, taking help from LLMs (e.g. ChatGPT, Gemini) partially or completely is considered plagiarism.
 Suggested pathway to register and study
+# Diploma level courses:
 1. **Most aggressive pathway** - completing in 4 terms - ONLY IF YOU ARE DOING THIS AS FULL TIME AND NOTHING ELSE AND CAN SPEND <u>70 HRs PER WEEK</u> MINIMUM
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ… 13/40
 to attend the exam and if you are eligible, exam will be scheduled as per the slots allocated. Please choose courses for the May 2026 term keeping all these points in mind.
 ## 1. Machine Learning foundations (DS Diploma)
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ… 14/40
 OPPE1 will not be scheduled for students who fail to complete the OPPE SCT exam.
 <u>Repeat_OPPE Criteria</u> (w.e.f from Jan 2026 term) [updated on Jan 05, 2026]
 # BS-DS_ May 2026 Grading document (Student)
 Updated automatically every 5 minutes
+## 4. Business Data Management (DS Diploma)
+**Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
 **Above to be attended in person at designated centres.**
 (Though the W11/W12 assignment score is not included in GAA, W11/W12 contents will be included for the final exam. Hence, please practice and submit W11/W12 assignment).
+## 5. Business Analytics (Diploma in DS)
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
   </tbody>
 </table>
+## 6. Tools in Data Science (Diploma in DS)
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ... 16/40
 # BS-DS_ May 2026 Grading document (Student)
 Updated automatically every 5 minutes
+### Academic policies
 1. The Tools in Data Science portal has all the course content: Graded Assignments, Projects and ROE links. The Seek Portal is not used for submissions.
 <mark>**Final course score T = 0.2 GAA + 0.2 ROE + 0.2 P1 + 0.2 P2 + 0.2 F**</mark>
+## 7. Programming Data structures and algorithms using Python (PDSA) - Diploma in Programming
 **Weekly assignments:** Mix of autograded assignment and Programming assignments
 F = score in final exam
 **T = 0.05GAA + 0.2OP + 0.45F + max (0.2max(Qz1, Qz2), (0.10Qz1+0.20Qz2 ))**
+## 8. Database management system (DBMS) -
+### Diploma in Programming
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
   </tbody>
 </table>
+## 9. Application development - 1 (Diploma in programming)
 **Quiz 1**: July 19 2026
 **Quiz 2**: August 16 2026
 **Final course score T = 0.05 GLA + max (0.6F + 0.25max(Qz1, Qz2), 0.4F + 0.25Qz1 + 0.3Qz2)**
+## 10. Programming concepts using Java (Diploma in programming)
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
 Will be conducted in the course VM - Each BPT has 4 Questions
+### Repeat_OPPE Criteria
 **(w.e.f from Jan 2026 term) [updated on Jan 05, 2026]**
 **Students who were OPPE eligible and received an I_OP in the Jan 2026 term, and have registered as Repeat OPPE for the May 2026 term, will continue to be considered eligible for OPPE in May 2026. However, students who were marked OPPE Ineligible with I_OP or I_Both and have registered for the respective course as Repeat OPPE or Repeat OPPE & End Term must regain OPPE eligibility in the May 2026 term by submitting the required assignments and OPPE SCT. If OPPE eligibility is not attained, the OPPE will not be scheduled and a U grade will be awarded for the May 2026 term.**
 **SCT for OPPE and exam day rules:**
 https://docs.google.com/document/d/e/2PACX-1vS4Hhh4MsKD2WL8_D26Vw2WJKw0CBtPihZyKrnEM_kefRXm_O75GqTcJA6IR0X_xCiVL5gUi5y6_bjw/pub
+### Eligibility to attend the end term exam:
 Average of the best 5 out of the first 7 weekly assessments (objective and programming) scores >= 40/100 AND OPPE should be eligible
+### Eligibility to get the course grade:
 Attending the end semester exam AND programming exam (OPPE) score >= 40/100
+### There will be ONE OPPE based on weeks 1-9.
 Students have to mandatorily attend the OPPE on the first date.
 * If you fail in this, you get a chance to reappear the next weekend. So you get 2 chances to attempt the exam.
 ## Project Courses:
+Project courses are now for 2 credits each - BDM, MLP, App Dev 1, App Dev 2, DL-GenAI project.
 The Project courses are not part of CCC. The CCC is only for theory courses.
   </tbody>
 </table>
+### MAD1, MAD2 PROJECTS:
 <table>
   <thead>
 https://docs.google.com/document/d/e/2PACX-1vT5PBOz4OH663W0IJPVGVjG_nfmYZGfFI7W1j-6wTLcex13O_7BZmf6a96Q6liO0W-mLZB5hOGZ-A9dMIXVrfe069kqUkplPMRAiZBBIMJcp9XTCR/pub
+### MAD I Project
 **Project Document:**
 https://docs.google.com/document/d/e/2PACX-1vR9pOLPvPzsxqhtrmtLi8GAV2j2JxpisWkIHCjEb8WuYrgm4ZPq8S_Sor-MygixO4hvGcPvO6Ei_W/pub
 **Project statement - Trekking Management Application:**
 https://docs.google.com/document/d/e/2PACX-1vQvqzWz2tFt96B8VApnHqWqIP3LtPDbnXwApYr8VOffLCm_Zh2JuTa51z7d1CNJbrZKC0oWPredYcV/pub
+### MAD II Project
 **Project document:**
 https://docs.google.com/document/d/e/2PACX-1vTKQBIsllAp4VUq_rST3_rbpMSoADyfb3agZ1E-jrGQxMeN2IBNOXu3GS0mchzRWaxWa204q52tbFYA/pub
 **BDM Project: BDM PROJECT SUBMISSION TIMELINE**
+### Deep learning and Generative AI Project
 https://docs.google.com/document/d/e/2PACX-1vTTorjQe5LH-TP_c0x-iDjdm-zX30uvJ3jv_QKtDvGsVoQydTkCWsfKwTUtCQv5syWsnSxmN4W3KFzE/pub_
+### Rules regarding project fees:
 The fee paid for each of the 4 projects is valid for 2 terms. Please read this carefully so that you register properly.
 **Students are requested to complete the projects at least two terms before their diploma completion term i.e DO NOT REGISTER OR HAVE PROJECTS ALONE LEFT IN YOUR LAST TERM OF DIPLOMA.**
+### Very important:
 1. <u>Viva policy for MLP, DL&GEN AI</u>, <u>Viva Policy for Appdev</u>
 4. Please do not share your assignments with others before the deadlines. If similarities are found between submissions, all will be penalised irrespective of who did it first and who shared it with whom.
+# 1. Software Testing
 **Quiz 1**: July 19 2026 **Quiz 2**: August 16 2026 **End term**: September 13 2026
 F - score in End Term exam
 **T = 0.1GAA + 0.4F + 0.25Qz1 + 0.25Qz2**
+# 2. Software Engineering
 **Quiz 1**: No Quiz **Quiz 2**: August 16 2026 **End term**: September 13 2026
 Updated automatically every 5 minutes
+# 3. Deep Learning
 **Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
 **T = 0.05GAA + 0.25Qz1 + 0.25Qz2 + 0.45F**
+# 4. AI: Search Methods for Problem Solving
 **Academic policies**
 The dates will be announced in the forum. This assignment will be evaluated offline.
+# 5. Strategies for Professional Growth
 **Academic policies**
 More details about the Group Project will be given in the course.
 Bonus marks for additional activities may be awarded at the discretion of faculty or instructor, provided the student passes the course.
+# 6. Programming in C
 **Quiz 1: July 19 2026** **Quiz 2: No Quiz 2** **End term: September 13 2026**
 **Above to be attended in person at designated centres**
 Updated automatically every 5 minutes
+# 9. Deep Learning Practice
+**Quiz 1: July 19 2026** **Quiz 2: August 16 2026** **End term: September 13 2026**
 <span style="color: red">Above to be attended in person at designated centres.</span>
 Quiz1 - based on content taught by Prof Mitesh (Based on weeks 1-4)
 F = score in final exam, Syllabus: Week 1-12
 **T = 0.075 GAA + 0.025 GRPA + 0.25Qz1 + 0.25Qz2 + 0.4F**
+# 13.Market Research
 **Quiz 1:** July 19 2026 **Quiz 2:** August 16 2026 **End term:** September 13 2026
 **Case release date:** YTD
 **Case presentation:** Depending on numbers, if needed can extend one more day for presentation.
+# 14.Managerial Economics
 **Quiz 1:** July 19 2026 **Quiz 2:** August 16 2026 **End term:** September 13 2026
 **T = 0.1GAA + 0.4F + 0.25Qz1 + 0.25Qz2**
+# 15. MLOps (Machine Learning Operations)
 **Quiz 1:** NA **Quiz 2:** NA **End term:** September 13 2026
 F - score in End Term exam
 T = 0.2 GAA + 0.3F + 0.25Qz1 + 0.25Qz2
+# 20. Computer Systems Design
 **Quiz 1**: July 19 2026 **Quiz 2**: August 16 2026 **End term**: September 13 2026
 T = 0.1GAA + 0.4F + 0.2Qz1 + 0.25Qz2 + 0.05 circuit verse assignment
+# 21. Game Theory and Strategy
 **Quiz 1**: July 19 2026 **Quiz 2**: August 16 2026 **End term**: September 13 2026
 T = 0.1GAA + 0.4F + 0.25Qz1 + 0.25Qz2
+# 22. Algorithms for Data Science (ADS)
 **Quiz 1**: NA **Quiz 2**: August 16 2026 **End term**: September 13 2026
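
The lines removed in this diff (print timestamps such as `6/9/26, 7:10 PM`, the "Published using Google Docs" banner, and the stray "Report abuse Learn more" footer) are the kind of export residue a line-level filter can drop before chunking. The following is a minimal sketch of that idea, not the commit's implementation: `NOISE_PATTERNS` and `strip_noise` are hypothetical names, and the patterns are modelled only on the removed lines visible above, since the actual regexes in `src/ingest.py` are not shown in this part of the diff.

```python
import re

# Hypothetical patterns, modelled on the residue lines removed in the diff above.
NOISE_PATTERNS = [
    # Print timestamp, optionally followed by the page title: "6/9/26, 7:10 PM ..."
    re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M\b.*$"),
    # Google Docs publishing banner, with up to two label words in between
    # ("icon", "logo", "info icon").
    re.compile(r"^Google Docs(?: \w+){0,2} Published using Google Docs.*$"),
    # Footer links that were split onto their own line.
    re.compile(r"^Report abuse Learn more$"),
]


def strip_noise(text: str) -> str:
    """Drop every line that fully matches one of the noise patterns."""
    kept = [
        line
        for line in text.splitlines()
        if not any(p.match(line.strip()) for p in NOISE_PATTERNS)
    ]
    return "\n".join(kept)


sample = "\n".join([
    "6/9/26, 7:10 PM BS-DS_ May 2026 Grading document (Student)",
    "Google Docs info icon Published using Google Docs Report abuse Learn more",
    "## 1. Machine Learning foundations (DS Diploma)",
    "Report abuse Learn more",
])
cleaned = strip_noise(sample)
```

Anchoring each pattern to the whole line keeps the filter from deleting policy text that merely mentions a date or Google Docs in passing.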

eval/eval_prompts.json ADDED

	@@ -0,0 +1,37 @@

+[
+  {
+    "id": "fee_001",
+    "question": "What is the foundation level fee for students joining from Jan 2026 onwards?",
+    "expected_keywords": ["48,000", "48000"],
+    "source": "student-handbook.md",
+    "category": "fees"
+  },
+  {
+    "id": "fee_002",
+    "question": "What is the total fee for Foundation + Two Diplomas under the revised fee structure?",
+    "expected_keywords": ["210,000", "210000"],
+    "source": "student-handbook.md",
+    "category": "fees"
+  },
+  {
+    "id": "fee_003",
+    "question": "What is the total fee for the BS Degree under the revised fee structure?",
+    "expected_keywords": ["386,000", "450,000"],
+    "source": "student-handbook.md",
+    "category": "fees"
+  },
+  {
+    "id": "fee_004",
+    "question": "What is the foundation level fee for students who joined the program before Sep 2025?",
+    "expected_keywords": ["32,000", "32000"],
+    "source": "student-handbook.md",
+    "category": "fees"
+  },
+  {
+    "id": "fee_005",
+    "question": "What fee waiver does an SC/ST student with family income above 5 LPA get?",
+    "expected_keywords": ["50%"],
+    "source": "student-handbook.md",
+    "category": "fees"
+  }
+]
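
Each entry in this file carries the same five fields, and the keyword lists mix comma-grouped and plain numerals ("48,000" and "48000"). A small pre-flight check can catch malformed entries before any LLM call is made. This is a sketch under stated assumptions: `validate_prompts` and `keyword_hit_normalized` are hypothetical helpers, not functions from this commit, and stripping commas before matching is an illustrative choice rather than the repository's behaviour.

```python
import json

# Field names taken from the entries in eval_prompts.json above.
REQUIRED_FIELDS = {"id", "question", "expected_keywords", "source", "category"}


def validate_prompts(entries: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the file looks fine."""
    problems = []
    seen_ids = set()
    for idx, entry in enumerate(entries):
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"entry {idx}: missing {sorted(missing)}")
            continue
        if entry["id"] in seen_ids:
            problems.append(f"entry {idx}: duplicate id {entry['id']!r}")
        seen_ids.add(entry["id"])
        if not entry["expected_keywords"]:
            problems.append(f"entry {idx}: expected_keywords is empty")
    return problems


def keyword_hit_normalized(answer: str, expected_keywords: list[str]) -> bool:
    """Case-insensitive substring match after removing digit-grouping commas."""
    def squash(s: str) -> str:
        return s.lower().replace(",", "")

    squashed_answer = squash(answer)
    return any(squash(kw) in squashed_answer for kw in expected_keywords)


entries = json.loads(
    '[{"id": "fee_001", "question": "q", "expected_keywords": ["48,000", "48000"],'
    ' "source": "student-handbook.md", "category": "fees"}]'
)
problems = validate_prompts(entries)
```

With comma stripping, a single keyword such as "210,000" also matches an answer written with Indian digit grouping ("2,10,000"), so each entry would need only one numeric variant.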

eval/evaluate.py ADDED

	@@ -0,0 +1,423 @@

+"""
+evaluate.py — Offline evaluation using DeepEval
+for the IITM BS Degree RAG System
+Judge LLM : Groq (llama-3.1-8b-instant) via DeepEvalBaseLLM wrapper
+Metrics    :
+  - FaithfulnessMetric        — are answer claims grounded in retrieved chunks?
+  - AnswerRelevancyMetric     — is the answer relevant to the question?
+  - ContextualPrecisionMetric — are the top-ranked chunks the most useful ones?
+Plus a fast keyword-hit check (no LLM needed) as a CI gate.
+Usage:
+  python eval/evaluate.py                          # full DeepEval run
+  python eval/evaluate.py --category fees          # one category only
+  python eval/evaluate.py --threshold 0.75         # custom pass threshold
+  python eval/evaluate.py --no-deepeval            # keyword-only, fast CI mode
+  python eval/evaluate.py --save-report            # write results to eval/report.json
+"""
+import os
+import sys
+import json
+import argparse
+import time
+import re
+from datetime import datetime
+from pathlib import Path
+from dotenv import load_dotenv
+from groq import Groq
+from groq import RateLimitError
+from pydantic import BaseModel
+from deepeval.models import DeepEvalBaseLLM
+from deepeval.metrics import (
+    AnswerRelevancyMetric,
+    FaithfulnessMetric,
+    ContextualPrecisionMetric,
+)
+from deepeval.test_case import LLMTestCase
+# ── Path setup ─────────────────────────────────────────────────────────────────
+ROOT = Path(__file__).parent.parent
+sys.path.insert(0, str(ROOT / "src"))
+from retrieve import Retriever  # noqa: E402
+load_dotenv()
+API_KEY = os.getenv("GROQ_API_KEY")
+if not API_KEY:
+    print("ERROR: GROQ_API_KEY not found in environment.")
+    sys.exit(1)
+EVAL_DATA_PATH    = Path(__file__).parent / "eval_prompts.json"
+REPORT_PATH       = Path(__file__).parent / "report.json"
+DEFAULT_THRESHOLD = 0.5
+JUDGE_MODEL       = "llama-3.1-8b-instant"
+GEN_MODEL         = "llama-3.1-8b-instant"
+# Retry / throttle settings
+MAX_RETRIES   = 6
+BACKOFF_BASE  = 2    # seconds — used only if retry delay isn't parseable
+BETWEEN_CALLS = 3    # polite gap after every successful Groq call
+# ── Retry helper ───────────────────────────────────────────────────────────────
+def _parse_retry_delay(error: RateLimitError) -> float | None:
+    """
+    Extract the suggested wait time from a Groq 429 error message.
+    Groq says things like:
+      'Please try again in 760ms'
+      'Please try again in 1.2s'
+    We parse that and add a small buffer.
+    """
+    msg = str(error)
+    # milliseconds: "760ms"
+    ms_match = re.search(r'try again in (\d+(?:\.\d+)?)ms', msg, re.I)
+    if ms_match:
+        return float(ms_match.group(1)) / 1000.0 + 0.5
+    # seconds: "1.2s"
+    s_match = re.search(r'try again in (\d+(?:\.\d+)?)s', msg, re.I)
+    if s_match:
+        return float(s_match.group(1)) + 0.5
+    return None
+def groq_call_with_retry(fn, *args, **kwargs):
+    """
+    Call any Groq SDK function with automatic retry on 429 rate limits.
+    Reads the suggested delay from the error response when available,
+    otherwise falls back to exponential backoff.
+    Usage:
+        response = groq_call_with_retry(
+            client.chat.completions.create,
+            model=..., messages=..., ...
+        )
+    """
+    for attempt in range(1, MAX_RETRIES + 1):
+        try:
+            result = fn(*args, **kwargs)
+            time.sleep(BETWEEN_CALLS)
+            return result
+        except RateLimitError as e:
+            if attempt == MAX_RETRIES:
+                raise
+            suggested = _parse_retry_delay(e)
+            wait = suggested if suggested else (BACKOFF_BASE ** attempt)
+            print(f"\n    ⏳ 429 rate-limited — waiting {wait:.2f}s (attempt {attempt}/{MAX_RETRIES})...")
+            time.sleep(wait)
+# ── Groq wrapper for DeepEval ──────────────────────────────────────────────────
+class GroqJudge(DeepEvalBaseLLM):
+    """
+    Wraps the Groq SDK so DeepEval can use it as its judge LLM.
+    DeepEval calls generate() with either:
+      - just a prompt string       → return a plain string
+      - a prompt + Pydantic schema → return a parsed schema instance
+    The schema path is used for metric scoring (DeepEval asks the judge
+    to fill in a structured verdict JSON). We force json_object mode so
+    Groq always returns valid JSON for those calls.
+    """
+    def __init__(self, api_key: str, model_name: str = JUDGE_MODEL):
+        self.api_key    = api_key
+        self.model_name = model_name
+        self._client    = Groq(api_key=api_key)
+    def load_model(self):
+        return self._client
+    def generate(self, prompt: str, schema: BaseModel = None):
+        client = self.load_model()
+        kwargs = dict(
+            model=self.model_name,
+            messages=[{"role": "user", "content": prompt}],
+            temperature=0.0,
+            max_tokens=1024,
+        )
+        if schema is not None:
+            kwargs["response_format"] = {"type": "json_object"}
+        response = groq_call_with_retry(client.chat.completions.create, **kwargs)
+        raw = response.choices[0].message.content.strip()
+        if schema is not None:
+            for candidate in [raw] + raw.split("```"):
+                candidate = candidate.strip().removeprefix("json").strip()
+                try:
+                    return schema(**json.loads(candidate))
+                except Exception:
+                    continue
+            raise ValueError(f"GroqJudge: could not parse schema: {raw[:200]}")
+        return raw
+    async def a_generate(self, prompt: str, schema: BaseModel = None):
+        return self.generate(prompt, schema)
+    def get_model_name(self) -> str:
+        return f"Groq/{self.model_name}"
+# ── Answer generator ───────────────────────────────────────────────────────────
+def generate_answer(question: str, chunks: list[dict], client: Groq) -> str:
+    """Generate a grounded answer from retrieved chunks using Groq."""
+    context_parts = [
+        f"[{c['source']} — Section: {c['page']}]\n{c['text']}"
+        for c in chunks
+    ]
+    prompt = (
+        "You are an official academic advisor AI for the IITM BS Degree Programme.\n"
+        "Answer the student question ONLY from the provided context. Be concise and factual.\n"
+        'If the answer is not in the context, say "I don\'t have that information."\n\n'
+        "CONTEXT:\n"
+        + "\n---\n".join(context_parts)
+        + f"\n\nSTUDENT QUESTION: {question}\n"
+    )
+    response = groq_call_with_retry(
+        client.chat.completions.create,
+        model=GEN_MODEL,
+        messages=[
+            {"role": "system", "content": "Answer only from the provided context."},
+            {"role": "user",   "content": prompt},
+        ],
+        temperature=0.0,
+        max_tokens=200,
+    )
+    return response.choices[0].message.content.strip()
+# ── Keyword hit ────────────────────────────────────────────────────────────────
+def keyword_hit(answer: str, expected_keywords: list[str]) -> bool:
+    """Check if any expected keyword appears in the answer (case-insensitive)."""
+    a = answer.lower()
+    return any(kw.lower() in a for kw in expected_keywords)
+# ── Main evaluation ────────────────────────────────────────────────────────────
+def run_evaluation(
+    category: str      = None,
+    use_deepeval: bool = True,
+    threshold: float   = DEFAULT_THRESHOLD,
+    save_report: bool  = False,
+):
+    print("\n" + "=" * 65)
+    print("  IITM BS RAG — DEEPEVAL EVALUATION")
+    print("=" * 65)
+    if not EVAL_DATA_PATH.exists():
+        print(f"ERROR: eval_prompts.json not found at {EVAL_DATA_PATH}")
+        sys.exit(1)
+    with open(EVAL_DATA_PATH) as f:
+        eval_data = json.load(f)
+    if category:
+        eval_data = [q for q in eval_data if q.get("category") == category]
+        print(f"  Category filter : '{category}' → {len(eval_data)} questions")
+    else:
+        print(f"  Total questions : {len(eval_data)}")
+    print(f"  Judge model     : {JUDGE_MODEL}")
+    print(f"  DeepEval        : {'enabled' if use_deepeval else 'disabled (keyword-only)'}")
+    print(f"  Threshold       : {threshold}\n")
+    print("Initializing retriever...")
+    retriever   = Retriever()
+    groq_client = Groq(api_key=API_KEY)
+    # ── Phase 1: retrieve + generate ──────────────────────────────────────────
+    print("\nPhase 1 — Retrieve & Generate\n" + "-" * 40)
+    test_cases  : list[LLMTestCase] = []
+    kw_hits     : list[bool]        = []
+    item_map    : list[dict]        = []
+    chunk_scores: list[list[float]] = []
+    for i, item in enumerate(eval_data, 1):
+        question = item["question"]
+        print(f"  [{i:02d}/{len(eval_data)}] {question[:70]}")
+        chunks   = retriever.retrieve(question, top_n=2)
+        answer   = generate_answer(question, chunks, groq_client)
+        contexts = [c["text"][:1000] for c in chunks]
+        scores   = [round(c["rerank_score"], 3) for c in chunks]
+        kw = keyword_hit(answer, item["expected_keywords"])
+        kw_hits.append(kw)
+        chunk_scores.append(scores)
+        print(f"         rerank scores : {scores}")
+        print(f"         keyword       : {'✅' if kw else '❌'}  {answer[:80]}{'…' if len(answer) > 80 else ''}\n")
+        test_cases.append(LLMTestCase(
+            input=question,
+            actual_output=answer,
+            retrieval_context=contexts,
+            expected_output=" | ".join(item["expected_keywords"]),
+        ))
+        item_map.append(item)
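+    # Guard against an empty question set (e.g. a category filter with no matches).
+    kw_rate = sum(kw_hits) / len(kw_hits) if kw_hits else 0.0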
+    # ── Phase 2: DeepEval scoring ──────────────────────────────────────────────
+    results_by_metric   : dict[str, list[float]] = {}
+    per_question_scores : list[dict]             = []
+    if use_deepeval:
+        print("\nPhase 2 — DeepEval Metrics\n" + "-" * 40)
+        print(f"  Judge model       : {JUDGE_MODEL}")
+        print(f"  Gap between calls : {BETWEEN_CALLS}s  |  Max retries on 429 : {MAX_RETRIES}\n")
+        judge = GroqJudge(api_key=API_KEY)
+        metrics = [
+            FaithfulnessMetric(
+                threshold=threshold, model=judge,
+                include_reason=True, async_mode=False,
+            ),
+            AnswerRelevancyMetric(
+                threshold=threshold, model=judge,
+                include_reason=True, async_mode=False,
+            ),
+            ContextualPrecisionMetric(
+                threshold=threshold, model=judge,
+                include_reason=True, async_mode=False,
+            ),
+        ]
+        for i, tc in enumerate(test_cases, 1):
+            print(f"  [{i:02d}/{len(test_cases)}] {tc.input[:65]}")
+            q_scores = {"question": tc.input, "keyword_hit": kw_hits[i - 1]}
+            for m in metrics:
+                mname = type(m).__name__
+                try:
+                    m.measure(tc)
+                    score  = m.score if m.score is not None else 0.0
+                    reason = (m.reason or "—")[:110]
+                    icon   = "✅" if score >= threshold else "❌"
+                    print(f"    {mname:<32} {icon} {score:.3f}  {reason}")
+                except RateLimitError as e:
+                    score = 0.0
+                    print(f"    {mname:<32} ⚠️  rate limit exhausted after {MAX_RETRIES} retries: {e}")
+                except Exception as e:
+                    score = 0.0
+                    print(f"    {mname:<32} ⚠️  error: {e}")
+                results_by_metric.setdefault(mname, []).append(score)
+                q_scores[mname] = round(score, 4)
+            per_question_scores.append(q_scores)
+            print()
+    # ── Phase 3: Aggregate summary ─────────────────────────────────────────────
+    print("=" * 65)
+    print("  AGGREGATE RESULTS")
+    print("=" * 65)
+    print(f"  Questions evaluated        : {len(eval_data)}")
+    print(f"  Keyword Hit Rate           : {kw_rate:.1%}  {'✅' if kw_rate >= threshold else '❌'}")
+    def _avg(lst: list[float]) -> float:
+        valid = [s for s in lst if s is not None]
+        return sum(valid) / len(valid) if valid else 0.0
+    avg_faith = avg_rel = avg_prec = None
+    if use_deepeval and results_by_metric:
+        avg_faith = _avg(results_by_metric.get("FaithfulnessMetric",       []))
+        avg_rel   = _avg(results_by_metric.get("AnswerRelevancyMetric",    []))
+        avg_prec  = _avg(results_by_metric.get("ContextualPrecisionMetric",[]))
+        print(f"  Faithfulness (avg)         : {avg_faith:.3f}  {'✅' if avg_faith >= threshold else '❌'}")
+        print(f"  Answer Relevancy (avg)     : {avg_rel:.3f}  {'✅' if avg_rel   >= threshold else '❌'}")
+        print(f"  Contextual Precision (avg) : {avg_prec:.3f}  {'✅' if avg_prec  >= threshold else '❌'}")
+        print("\n  Per-question breakdown:")
+        header = f"  {'ID':<28} {'kw':>3}  {'Faith':>6}  {'Rel':>6}  {'Prec':>6}"
+        print(header)
+        print("  " + "-" * (len(header) - 2))
+        for item, kw, pq in zip(item_map, kw_hits, per_question_scores):
+            f = pq.get("FaithfulnessMetric",       0)
+            r = pq.get("AnswerRelevancyMetric",     0)
+            p = pq.get("ContextualPrecisionMetric", 0)
+            print(
+                f"  {item['id']:<28} {'✅' if kw else '❌':>3} "
+                f" {f:>6.3f}  {r:>6.3f}  {p:>6.3f}"
+            )
+    # ── CI gate ────────────────────────────────────────────────────────────────
+    if avg_faith is not None:
+        gate_metric = min(kw_rate, avg_faith)
+        gate_label  = f"min(keyword={kw_rate:.1%}, faithfulness={avg_faith:.3f})"
+    else:
+        gate_metric = kw_rate
+        gate_label  = f"keyword hit rate = {kw_rate:.1%}"
+    print(f"\n  Gate  : {gate_label}")
+    print(f"  Score : {gate_metric:.3f}  (threshold: {threshold:.2f})")
+    # ── Optional JSON report ───────────────────────────────────────────────────
+    if save_report:
+        report = {
+            "timestamp":        datetime.now().isoformat(),
+            "judge_model":      JUDGE_MODEL,
+            "threshold":        threshold,
+            "category":         category,
+            "num_questions":    len(eval_data),
+            "keyword_hit_rate": round(kw_rate, 4),
+            "averages": {
+                "faithfulness":         round(avg_faith, 4) if avg_faith is not None else None,
+                "answer_relevancy":     round(avg_rel,   4) if avg_rel   is not None else None,
+                "contextual_precision": round(avg_prec,  4) if avg_prec  is not None else None,
+            },
+            "gate_score":  round(gate_metric, 4),
+            "passed":      gate_metric >= threshold,
+            "per_question": per_question_scores,
+        }
+        REPORT_PATH.write_text(json.dumps(report, indent=2))
+        print(f"\n  📄 Report saved → {REPORT_PATH}")
+    # ── Exit with CI-friendly code ─────────────────────────────────────────────
+    if gate_metric >= threshold:
+        print(f"\n  ✅ PASSED — RAG quality is above threshold ({threshold:.0%})\n")
+        sys.exit(0)
+    else:
+        print(f"\n  ❌ FAILED — Quality dropped below threshold ({threshold:.0%})")
+        print("     Check ❌ rows above. Re-run ingest.py if documents changed.\n")
+        sys.exit(1)
+# ── CLI ────────────────────────────────────────────────────────────────────────
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Evaluate IITM BS RAG with DeepEval")
+    parser.add_argument("--category",    type=str,   default=None,
+                        help="Filter eval_prompts.json by category field")
+    parser.add_argument("--threshold",   type=float, default=DEFAULT_THRESHOLD,
+                        help="Pass/fail threshold for all metrics (default: %(default)s)")
+    parser.add_argument("--no-deepeval", action="store_true",
+                        help="Skip DeepEval metrics; run keyword check only")
+    parser.add_argument("--save-report", action="store_true",
+                        help="Write results to eval/report.json")
+    args = parser.parse_args()
+    run_evaluation(
+        category=args.category,
+        use_deepeval=not args.no_deepeval,
+        threshold=args.threshold,
+        save_report=args.save_report,
+    )
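
The CI gate at the end of `run_evaluation` can be sketched in isolation as follows. This is a minimal standalone sketch, not the script itself: `gate_score` and `passes_gate` are hypothetical helper names, and the sample answers and scores are made up for illustration.

```python
from typing import Optional


def keyword_hit(answer: str, expected_keywords: list[str]) -> bool:
    """Same check as in evaluate.py: any keyword appears, case-insensitively."""
    a = answer.lower()
    return any(kw.lower() in a for kw in expected_keywords)


def gate_score(kw_hits: list[bool], avg_faith: Optional[float]) -> float:
    """Hypothetical helper: keyword hit rate, capped by faithfulness when available."""
    kw_rate = sum(kw_hits) / len(kw_hits) if kw_hits else 0.0
    return kw_rate if avg_faith is None else min(kw_rate, avg_faith)


def passes_gate(kw_hits: list[bool], avg_faith: Optional[float], threshold: float) -> bool:
    """Hypothetical helper: the run passes when the gate score meets the threshold."""
    return gate_score(kw_hits, avg_faith) >= threshold


# Made-up answers: the first contains an expected keyword, the second does not.
hits = [
    keyword_hit("The end term is on September 13", ["september 13"]),
    keyword_hit("I do not have that information", ["quiz 1"]),
]
```

With these inputs the keyword rate is 0.5, so a faithfulness average of 0.8 leaves the gate at 0.5, while an average of 0.3 pulls it down to 0.3. Taking the minimum means a run cannot pass on keyword matches alone when the judge finds the answers unfaithful.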

pyproject.toml CHANGED Viewed

@@ -12,7 +12,6 @@ dependencies = [
     "faiss-cpu>=1.14.2",
     "google-genai>=2.8.0",
     "gradio>=6.17.3",
-    "groq>=1.4.0",
     "ipykernel>=7.3.0",
     "langchain-chroma>=1.1.0",
     "langchain>=1.3.7",
@@ -31,4 +30,5 @@ dependencies = [
     "sentence-transformers>=5.5.1",
     "langchain-huggingface>=1.2.2",
     "langchain-classic>=1.0.8",
 ]

     "faiss-cpu>=1.14.2",
     "google-genai>=2.8.0",
     "gradio>=6.17.3",
     "ipykernel>=7.3.0",
     "langchain-chroma>=1.1.0",
     "langchain>=1.3.7",
     "sentence-transformers>=5.5.1",
     "langchain-huggingface>=1.2.2",
     "langchain-classic>=1.0.8",
+    "langchain-groq>=1.1.3",
 ]

requirements.txt CHANGED Viewed

@@ -7,7 +7,6 @@ sentence-transformers
 faiss-cpu
 chromadb
 rank-bm25
-groq
 pyyaml
 datasets
 ragas
@@ -23,4 +22,5 @@ python-dotenv
 deepeval
 langchain_chroma
 langchain_huggingface
-langchain-classic

 faiss-cpu
 chromadb
 rank-bm25
 pyyaml
 datasets
 ragas
 deepeval
 langchain_chroma
 langchain_huggingface
+langchain-classic
+langchain_groq

src/generate.py CHANGED Viewed

	@@ -0,0 +1,124 @@

+import os
+import sys
+from dotenv import load_dotenv
+from langchain_groq import ChatGroq
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.output_parsers import StrOutputParser
+from langchain_core.runnables import RunnablePassthrough, RunnableLambda
+from retrieve import Retriever
+load_dotenv()
+if not os.getenv("GROQ_API_KEY"):
+    print("ERROR: GROQ_API_KEY environment variable not found.")
+    sys.exit(1)
+SYSTEM_PROMPT = """
+You are the official academic advisor AI for the IITM BS Degree Programme. Answer student questions accurately, professionally, and concisely.
+Follow these strict rules:
+1. **Strict Grounding:** Answer ONLY using the provided Context Documents. If the answer is not explicitly stated, state that you do not have the information. Do not guess or hallucinate.
+2. **Data Extraction:** Carefully align rows and columns when extracting tabular data.
+3. **Citations:** Always cite the source document name when stating factual rules or fees (e.g., "According to student-handbook.md...").
+4. **Definitions First:** Define any acronyms, fee names, or policy terms before giving specific details. Never omit context that changes how a number or fact should be interpreted.
+5. **Redundancy Handling:** If the context contains numerically-related figures across different sections that might represent the same fee, explicitly surface this potential overlap rather than stating them as definitively separate.
+6. **Course Focus:** Detail only the single most relevant course. If courses with similar names exist, mention them in one brief line at the end as alternatives; do not provide their grading details unless explicitly requested.
+"""
+HUMAN_PROMPT = """CONTEXT DOCUMENTS:
+{context}
+STUDENT QUESTION:
+{question}"""
+prompt = ChatPromptTemplate.from_messages([
+    ("system", SYSTEM_PROMPT),
+    ("human",  HUMAN_PROMPT),
+])
+def format_context(chunks: list[dict]) -> str:
+    """Convert retrieved chunk dicts into a formatted context string."""
+    parts = []
+    for i, c in enumerate(chunks, 1):
+        tag = f"[{c['source']} — {c['page']}]"
+        parts.append(f"--- Chunk {i} {tag} ---\n{c['text']}")
+    return "\n\n".join(parts)
+class Generator:
+    def __init__(self):
+        print("Initializing retrieval system...")
+        self.retriever = Retriever()
+        llm = ChatGroq(
+            model="llama-3.3-70b-versatile",
+            temperature=0.1,
+            streaming=True,
+        )
+        # LCEL chain:
+        # 1. Reuse the chunks answer() already retrieved, so the context
+        #    respects top_n and matches the sources printed afterwards
+        # 2. Format them into a context string
+        # 3. Pass context + question into the prompt
+        # 4. Send to LLM and parse output
+        self.chain = (
+            {
+                "context":  RunnableLambda(lambda _q: format_context(self._last_chunks)),
+                "question": RunnablePassthrough(),
+            }
+            | prompt
+            | llm
+            | StrOutputParser()
+        )
+        # Keep a reference so answer() can show sources
+        self._last_chunks: list[dict] = []
+    def answer(self, query: str, top_n: int = 4) -> str:
+        """Run the full RAG chain and stream the answer to stdout."""
+        print(f"\n🔍 Retrieving context for: '{query}'...")
+        self._last_chunks = self.retriever.retrieve(query, top_n=top_n)
+        if not self._last_chunks:
+            msg = "I couldn't find any official documentation related to that question."
+            print(msg)
+            return msg
+        print("🧠 Generating answer...\n")
+        full_response = ""
+        for token in self.chain.stream(query):
+            print(token, end="", flush=True)
+            full_response += token
+        print("\n\n" + "-" * 60)
+        print("SOURCES USED:")
+        seen = set()
+        for c in self._last_chunks:
+            label = f"- {c['source']}  (Section: {c['page'][:60]})"
+            if label not in seen:
+                print(label)
+                seen.add(label)
+        return full_response
+if __name__ == "__main__":
+    agent = Generator()
+    print("\n🎓 IITM BS Degree RAG Assistant Online!")
+    print("Type 'exit' or 'quit' to close.\n")
+    while True:
+        try:
+            user_input = input("\nStudent Question: ").strip()
+            if user_input.lower() in ("exit", "quit"):
+                print("Shutting down...")
+                break
+            if not user_input:
+                continue
+            agent.answer(user_input)
+        except KeyboardInterrupt:
+            print("\nShutting down...")
+            break
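
The "SOURCES USED" listing in `Generator.answer` de-duplicates labels while keeping first-seen order. A minimal sketch of that step on its own, assuming chunk dicts with the `source` and `page` keys used above; `unique_source_labels` is a hypothetical helper name and the sample chunks are made up.

```python
def unique_source_labels(chunks: list[dict]) -> list[str]:
    """Hypothetical helper: build de-duplicated source labels in first-seen order."""
    seen: set[str] = set()
    labels: list[str] = []
    for c in chunks:
        # Same label format as answer(): section names are truncated to 60 chars.
        label = f"- {c['source']}  (Section: {c['page'][:60]})"
        if label not in seen:
            seen.add(label)
            labels.append(label)
    return labels


# Made-up chunks for illustration only: two come from the same section.
sample_chunks = [
    {"source": "grading-document.md", "page": "English 2", "text": "chunk one"},
    {"source": "grading-document.md", "page": "English 2", "text": "chunk two"},
    {"source": "student-handbook.md", "page": "Fees", "text": "chunk three"},
]
labels = unique_source_labels(sample_chunks)
```

Because the label includes the truncated section name, two chunks from the same file but different sections still produce separate lines.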

src/ingest.py CHANGED Viewed

@@ -10,8 +10,8 @@ DB_DIR = "db"
 COLLECTION_NAME = "handbook_docs"
 EMBED_MODEL = "BAAI/bge-small-en-v1.5"
-CHUNK_SIZE = 1500
-CHUNK_OVERLAP = 150
 HEADERS_TO_SPLIT = [
     ("#",   "h1"),
@@ -76,12 +76,17 @@ def clean_markdown(raw_text: str) -> str:
         # Google Docs URLs
         r'https://docs\.google\.com/\S+',
-        # Bare page-number artifacts: "  38/66  " or "10/66" on their own
-        r'(?<!\d)\d{1,3}/\d{2,3}(?!\d)(?=\s|$)',
     ]
     text = raw_text
     for pattern in noise_patterns:
-        text = re.sub(pattern, '', text, flags=re.IGNORECASE)
     text = html_tables_to_markdown(text)
     # Collapse runs of whitespace left behind by removals
@@ -108,6 +113,26 @@ def load_and_split(md_path: Path) -> list[Document]:
     )
     final_docs = char_splitter.split_documents(header_docs)
     for doc in final_docs:
         doc.metadata["source"] = md_path.name

 COLLECTION_NAME = "handbook_docs"
 EMBED_MODEL = "BAAI/bge-small-en-v1.5"
+CHUNK_SIZE = 2000
+CHUNK_OVERLAP = 200
 HEADERS_TO_SPLIT = [
     ("#",   "h1"),
         # Google Docs URLs
         r'https://docs\.google\.com/\S+',
+        # Bare page-number artifacts: "  38/66  " or "10/66" at the end of a line
+        # (excludes /100 since that's always a score threshold, e.g. "40/100")
+        r'(?<![\d>=])\b\d{1,3}/(?!100\b)\d{2,3}\b(?!\d)\s*(?=\n|$)',
+        r'^#{1,3}\s*BS-DS_\s*May\s*2026\s*Grading\s*document\s*\(Student\)\s*$',
+        r'^BS-DS_\s*May\s*2026\s*Grading\s*document\s*\(Student\)\s*$',
+        r'^Updated\s+automatically\s+every\s+\d+\s+minutes\s*$',
     ]
     text = raw_text
     for pattern in noise_patterns:
+        text = re.sub(pattern, '', text, flags=re.IGNORECASE | re.MULTILINE)
     text = html_tables_to_markdown(text)
     # Collapse runs of whitespace left behind by removals
     )
     final_docs = char_splitter.split_documents(header_docs)
+    # Prepend the section's header path to every chunk's content.
+    # Without this, a chunk containing e.g. a course's T formula but not
+    # the course name (because the header line landed in an earlier chunk)
+    # won't match queries that mention the course name — for either the
+    # vector embedding or BM25.
+    for doc in final_docs:
+        header_parts = [
+            doc.metadata.get("h1", ""),
+            doc.metadata.get("h2", ""),
+            doc.metadata.get("h3", ""),
+        ]
+        header_path = " > ".join(p for p in header_parts if p)
+        if header_path:
+            doc.page_content = (
+                f"[Course: {header_path}]\n"
+                f"{doc.page_content}\n"
+                f"[/Course: {header_path}]"
+            )
     for doc in final_docs:
         doc.metadata["source"] = md_path.name
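
The header-path wrapping added to `load_and_split` can be tried without LangChain. This is a minimal sketch under that assumption: `wrap_with_header_path` is a hypothetical helper that takes plain strings and a metadata dict instead of a `Document`, and the sample metadata is made up.

```python
def wrap_with_header_path(content: str, metadata: dict) -> str:
    """Hypothetical helper: wrap chunk text in its h1 > h2 > h3 header path."""
    parts = [metadata.get("h1", ""), metadata.get("h2", ""), metadata.get("h3", "")]
    # Skip missing levels so a chunk with only h1 and h3 still gets a clean path.
    header_path = " > ".join(p for p in parts if p)
    if not header_path:
        return content
    return f"[Course: {header_path}]\n{content}\n[/Course: {header_path}]"


# Made-up metadata; the chunk text itself never mentions the course name.
wrapped = wrap_with_header_path(
    "F = score in final exam",
    {"h1": "6. English 2", "h3": "Grading"},
)
unwrapped = wrap_with_header_path("F = score in final exam", {})
```

After wrapping, a query that names the course can match this chunk in both the embedding index and BM25, even though the header line itself landed in an earlier chunk.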

src/retrieve.py CHANGED Viewed

@@ -28,7 +28,8 @@ _NOISE_PATTERNS = [
     re.compile(r'Report\s+abuse\s+Learn\s+more[^\n]*', re.I),
     re.compile(r'Updated\s+automatically\s+every\s+\d+\s+minutes[^\n]*', re.I),
     re.compile(r'https://docs\.google\.com/\S+'),
-    re.compile(r'(?<!\d)\d{1,3}/\d{2,3}(?!\d)(?=\s|$)'),
 ]
 def _scrub_noise(text: str) -> str:

     re.compile(r'Report\s+abuse\s+Learn\s+more[^\n]*', re.I),
     re.compile(r'Updated\s+automatically\s+every\s+\d+\s+minutes[^\n]*', re.I),
     re.compile(r'https://docs\.google\.com/\S+'),
+    re.compile(r'(?<![\d>=])\b\d{1,3}/(?!100\b)\d{2,3}\b(?!\d)\s*(?=\n|$)'),
+    re.compile(r'^#{1,3}\s*BS-DS_\s*May\s*2026\s*Grading\s*document\s*\(Student\)\s*$', re.I | re.M),
 ]
 def _scrub_noise(text: str) -> str:
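
The revised page-number pattern, shared by `ingest.py` and `retrieve.py`, can be checked on a few lines. This is a minimal sketch: the sample text is made up, and `PAGE_NUMBER` is a hypothetical name for the pattern copied from the diff.

```python
import re

# Pattern copied from the diff: an N/M token at the end of a line, excluding
# anything over 100 and anything preceded by a digit, ">" or "=".
PAGE_NUMBER = re.compile(
    r'(?<![\d>=])\b\d{1,3}/(?!100\b)\d{2,3}\b(?!\d)\s*(?=\n|$)',
    re.MULTILINE,
)

# Made-up sample: a trailing page marker, a score out of 100, and a mid-line fraction.
sample = (
    "Quiz schedule 13/40\n"
    "Pass mark: 40/100\n"
    "About 3/40 of students\n"
)
cleaned = PAGE_NUMBER.sub("", sample)
```

Only the trailing `13/40` is removed. `40/100` survives because of the `(?!100\b)` lookahead, and `3/40` survives because it is not at the end of its line. The space before the removed token stays behind, which the later whitespace-collapsing step in `clean_markdown` is presumably meant to handle.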

uv.lock CHANGED Viewed

@@ -1748,7 +1748,7 @@ wheels = [
 [[package]]
 name = "groq"
-version = "1.4.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "anyio" },
@@ -1758,9 +1758,9 @@ dependencies = [
     { name = "sniffio" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/26/84/d99c4894d32ed52bf2763127804343d9323dce22beb61d42aebc7d9c5f4d/groq-1.4.0.tar.gz", hash = "sha256:09b1ed51408c6969a11ef1a4dfe44d42ec975b5f1510e5de3f3dab56e22dffc6", size = 158123 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/3a/5b/28cfd8937be95c0814fd9458710a8a257fb8424a39e291b7bbd494476108/groq-1.4.0-py3-none-any.whl", hash = "sha256:99a3bcd57c71538f69cf11c75cdae91598983d2681b9a14008636a018c4b6d17", size = 143699 },
 ]
 [[package]]
@@ -2467,6 +2467,19 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/04/a2/af563ff45208d22abc28d0a0e44a7fb8aceaadac201afa6a11c77bfb6338/langchain_google_vertexai-3.2.4-py3-none-any.whl", hash = "sha256:65b5615e596fdabc2e149f0160fded88bebef2bbc1ea70095ff81714f7570183", size = 118884 },
 ]
 [[package]]
 name = "langchain-huggingface"
 version = "1.2.2"
@@ -4975,7 +4988,6 @@ dependencies = [
     { name = "faiss-cpu" },
     { name = "google-genai" },
     { name = "gradio" },
-    { name = "groq" },
     { name = "ipykernel" },
     { name = "langchain" },
     { name = "langchain-chroma" },
@@ -4984,6 +4996,7 @@ dependencies = [
     { name = "langchain-core" },
     { name = "langchain-google-genai" },
     { name = "langchain-google-vertexai" },
     { name = "langchain-huggingface" },
     { name = "langchain-openai" },
     { name = "pdfplumber" },
@@ -5005,7 +5018,6 @@ requires-dist = [
     { name = "faiss-cpu", specifier = ">=1.14.2" },
     { name = "google-genai", specifier = ">=2.8.0" },
     { name = "gradio", specifier = ">=6.17.3" },
-    { name = "groq", specifier = ">=1.4.0" },
     { name = "ipykernel", specifier = ">=7.3.0" },
     { name = "langchain", specifier = ">=1.3.7" },
     { name = "langchain-chroma", specifier = ">=1.1.0" },
@@ -5014,6 +5026,7 @@ requires-dist = [
     { name = "langchain-core", specifier = ">=1.4.5" },
     { name = "langchain-google-genai", specifier = ">=4.2.5" },
     { name = "langchain-google-vertexai", specifier = ">=3.2.4" },
     { name = "langchain-huggingface", specifier = ">=1.2.2" },
     { name = "langchain-openai", specifier = ">=1.3.0" },
     { name = "pdfplumber", specifier = ">=0.11.9" },

 [[package]]
 name = "groq"
+version = "0.37.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "anyio" },
     { name = "sniffio" },
     { name = "typing-extensions" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/e9/78/18948a9056e1509c87e10ab8316a90ecce87035fbd53342dffdf97f4de00/groq-0.37.1.tar.gz", hash = "sha256:7353d6dfb60834fd7aacbb86af106e2dc2aeaff6d0edd65fb2fd0f16bd39314c", size = 145289 }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/5f/d6/645a081750e43f858b7d09dce5d8e1e76cf11e7e4bdba81252e04f78963d/groq-0.37.1-py3-none-any.whl", hash = "sha256:b49f8c8898c55eaec9f71f1342f3fcacc9560d67a08ce5f35fbfb84e8dacd3da", size = 137494 },
 ]
 [[package]]
     { url = "https://files.pythonhosted.org/packages/04/a2/af563ff45208d22abc28d0a0e44a7fb8aceaadac201afa6a11c77bfb6338/langchain_google_vertexai-3.2.4-py3-none-any.whl", hash = "sha256:65b5615e596fdabc2e149f0160fded88bebef2bbc1ea70095ff81714f7570183", size = 118884 },
 ]
+[[package]]
+name = "langchain-groq"
+version = "1.1.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "groq" },
+    { name = "langchain-core" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6e/4d/64b5e57e9e43f009fcb1591f567f0095bcf38f0b705a2d63cec4c04e0d56/langchain_groq-1.1.3.tar.gz", hash = "sha256:890c099a55526bceafc3e696d123cb9d36464c6664a3ead34ae6e09e0d50caeb", size = 192228 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a7/5d/2f862bf5623d5c8ca6ed8a8917a7b1410e6d3595ee0bac12aff508556f76/langchain_groq-1.1.3-py3-none-any.whl", hash = "sha256:a69bb8212b7a699f407c033bf41ca526db8de68f438d51a41740a72bf6dc09bf", size = 20779 },
+]
 [[package]]
 name = "langchain-huggingface"
 version = "1.2.2"
     { name = "faiss-cpu" },
     { name = "google-genai" },
     { name = "gradio" },
     { name = "ipykernel" },
     { name = "langchain" },
     { name = "langchain-chroma" },
     { name = "langchain-core" },
     { name = "langchain-google-genai" },
     { name = "langchain-google-vertexai" },
+    { name = "langchain-groq" },
     { name = "langchain-huggingface" },
     { name = "langchain-openai" },
     { name = "pdfplumber" },
     { name = "faiss-cpu", specifier = ">=1.14.2" },
     { name = "google-genai", specifier = ">=2.8.0" },
     { name = "gradio", specifier = ">=6.17.3" },
     { name = "ipykernel", specifier = ">=7.3.0" },
     { name = "langchain", specifier = ">=1.3.7" },
     { name = "langchain-chroma", specifier = ">=1.1.0" },
     { name = "langchain-core", specifier = ">=1.4.5" },
     { name = "langchain-google-genai", specifier = ">=4.2.5" },
     { name = "langchain-google-vertexai", specifier = ">=3.2.4" },
+    { name = "langchain-groq", specifier = ">=1.1.3" },
     { name = "langchain-huggingface", specifier = ">=1.2.2" },
     { name = "langchain-openai", specifier = ">=1.3.0" },
     { name = "pdfplumber", specifier = ">=0.11.9" },