chuckfinca committed on
Commit d558ce2 · 1 Parent(s): 4267727

refactor(data): Pivot data pipeline from PDF parsing to pre-processed JSON

This commit removes the entire programmatic PDF extraction pipeline, which proved to be brittle and time-consuming. It refactors the application to load a manually curated and high-quality JSON file (`knowledge_base_raw.json`) as its starting point.

- Deletes the complex PDF parsing logic (`knowledge_base.py`, `manual_extractions.py`) and its configuration (`document_complexity_map.csv`).
- Removes obsolete data extraction dependencies (`pymupdf`, `pdfplumber`, `pandas`) from `pyproject.toml`, simplifying the environment.
- Updates `main.py` to load the pre-processed JSON directly, so data ingestion is near-instant and no longer depends on PDF parsing succeeding at runtime.
- Simplifies `config.py` to remove paths to now-deleted files.

This strategic shift unblocks development on the core RAG functionality by decoupling the application's runtime from the challenges of PDF data extraction.

data/processed/knowledge_base_raw.json ADDED
@@ -0,0 +1,151 @@
+[
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 1,
+    "fot_page": 43,
+    "title": "Tool Set C - Developing and Tracking Interventions - Introduction",
+    "content": "Purpose: One responsibility of the Success Team is to develop and track interventions. Team members should give thoughtful consideration to design to ensure that interventions are meeting the needs of all students, including those who are underperforming and those who are high achieving. The team must also determine the effectiveness of interventions and their impact on student achievement.\n\nHow & When to Use: Planning effective student interventions can be a challenging task for both new and established Success Teams. This set of tools provides support for teams that are creating and/or fine-tuning their student intervention systems by encouraging teams to describe, analyze, and reflect on their current practices. Furthermore, these tools provide team members with opportunities to adjust interventions to better serve students.\n\nContents:\n• Considerations for Planning Tier 2 Interventions\n• Quick Guide to Tracking Interventions\n• Intervention Evaluation Flowchart\n• Intervention Success Monthly Action Plan (IS-MAP)\n• Student Success Intervention Plan\n• Behavior, Attendance, and Grades (BAG) Report"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 2,
+    "fot_page": 44,
+    "title": "Connections to Freshman Success Framework",
+    "content": "The Freshman Success Framework is the foundation for effective school practice on On-Track and student success. The Network for College Success has seen the greatest and most sustainable gains for freshmen when schools develop high-functioning educator professional learning communities, which we call Success Teams.\n\nThis Tool Set focuses on the below actions of a Success Team stemming from the Freshman Success Framework.\n\nSuccess Team Elements:\n• Setting Conditions: Engages in regular, calendared Success Team meetings to 1) analyze data and 2) develop, monitor, and adjust interventions\n• Implementation: Develops, implements, tracks, and evaluates Tier 2 interventions, making adjustments when appropriate. Refers students to appropriate level of intervention\n• Communication: Engages faculty in frequent communication on student progress and successful strategies\n\nTeam Lead Role:\n• Setting Conditions: With principal and Success Team, sets freshman success goals for On-Track and student connection, and develops benchmarks to monitor progress\n• Implementation: Develops action-oriented meeting agendas that consistently address freshman success goals generally and intervention development, tracking, and evaluation specifically\n• Implementation: Works with data technician to bring actionable student-level data at regular intervals\n\nPrincipal Role:\n• Implementation: Reviews and interrogates interim freshman success-related data in light of Success Team goals, and strategizes with team leadership around next steps"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 3,
+    "fot_page": 45,
+    "title": "Considerations for Planning Tier 2 Interventions - Overview",
+    "content": "A set of guiding questions to use during the development of an intervention system. Questions are focused on looking at student data, targeting students, and intervention selection, implementation, and effectiveness."
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 4,
+    "fot_page": 46,
+    "title": "Considerations When Planning Tier 2 Interventions - Data, Targeting, Selection, and Implementation",
+    "content": "For information on the tiered systems of student support, please refer to the RTI Action Network.\n\nData Questions:\n• To what degree is attendance playing a role in student performance? To whom do you refer Tier 3 students who have serious attendance issues (inside and outside of the school) so that the Success Team can really concentrate on supporting Tier 2 students?\n• How does the grade distribution look by teacher? Are there teachers who are failing a disproportionate number of students?\n• Do your assessment policies create opportunities for students to demonstrate mastery, or do they cause students to feel overwhelmed and fall off track?\n\nTargeting Students:\n• How many students have you identified for Success Team intervention? Does this number fall in the 15 – 25% range for Tier 2 supports? Are there students who are really Tier 3 being included into Tier 2 supports?\n\nIntervention Selection:\n• What issue is the intervention addressing? (academic/social-emotional/behavioral)\n• What programs/resources already exist in the building that could possibly address the issue? How closely do these programs/resources align with the identified needs of students? For example, if tutoring is being offered already, is it designed to help students with real-time issues they face in their classes or is it specifically designed for remediation of basic skills?\n\nIntervention Implementation:\n• Who will implement the intervention?\n• Who will coordinate the intervention (logistics)?\n• Who will own the tracking of the intervention's effectiveness?\n• What does successful implementation look like?"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 5,
+    "fot_page": 47,
+    "title": "Considerations When Planning Tier 2 Interventions - Tracking Effectiveness",
+    "content": "Tracking Effectiveness:\n• Does tracking your intervention include the following information: targeted students' names, participation (such as the number of times targeted students participate within a specified period), grade check dates, and grades in targeted courses?\n• Does your tracking tool allow you to aggregate point-in-time data in different ways so that you can accurately monitor targeted student progress?\n• What is your timeline for course correction?"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 6,
+    "fot_page": 48,
+    "title": "Quick Guide to Tracking Interventions - Overview",
+    "content": "Guidelines for designing an intervention tracking tool."
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 7,
+    "fot_page": 49,
+    "title": "Quick Guide to Tracking Interventions - Features and Guidelines",
+    "content": "Tracking is necessary to determine the efficacy of an intervention so that adjustments can be made in a timely manner. A tracking tool is more effective when it is in a teacher-friendly format that can be disaggregated to pull data for specific subgroups. For example, if your team is using tutoring as an intervention, and the targeted student group requires tutoring in more than one core class, your tracking tool should be able to disaggregate data to ascertain intervention impact by course. Microsoft Excel and Google Sheets can support the tracking of interventions by disaggregating data and creating graphs.\n\nFeatures of Good Intervention Tracking Tools:\n• Name of the intervention and what key performance indicator it addresses (attendance, point-in-time On-Track rates, GPA, behavior metric, etc.)\n• Names of the targeted students\n ° If tracking grades, include each core course's average expressed as a percentage\n• Intervention contacts/implementation evidence\n ° Tutoring attendance\n ° Mentorship contact dates\n ° \"Office hours\" visits\n• Point-in-time progress on the key performance indicator impacted by the intervention\n ° Should include at least 2 checkpoints within a 10-week period\n ° If tracking grades, provide an average expressed as a percentage for each core course\n ° If tracking attendance, provide number of cumulative absences and/or tardies"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 8,
+    "fot_page": 50,
+    "title": "Example Intervention Tracking Data: Henderson College Prep Mentorship Program",
+    "content": "Example: Henderson College Prep, Quarter 3 Mentorship Program\nKey Performance Indicator Addressed: Point-in-Time On-Track\nMeeting 1 Dates: 02/01 - 02/04/2016",
+    "table_data": [
+      { "Student": "A", "Mentor": "Isom", "Abs": "1", "Tardies": "1", "ENG": "75", "MATH": "83", "SCI": "56", "SOC": "80" },
+      { "Student": "B", "Mentor": "Shields", "Abs": "1", "Tardies": "0", "ENG": "90", "MATH": "57", "SCI": "83", "SOC": "32" },
+      { "Student": "C", "Mentor": "Wells", "Abs": "0", "Tardies": "3", "ENG": "48", "MATH": "67", "SCI": "77", "SOC": "93" },
+      { "Student": "D", "Mentor": "Wells", "Abs": "0", "Tardies": "5", "ENG": "79", "MATH": "78", "SCI": "76", "SOC": "57" },
+      { "Student": "E", "Mentor": "Pitcher", "Abs": "0", "Tardies": "1", "ENG": "82", "MATH": "84", "SCI": "88", "SOC": "32" },
+      { "Student": "F", "Mentor": "Muldrow", "Abs": "1", "Tardies": "1", "ENG": "44", "MATH": "78", "SCI": "88", "SOC": "57" },
+      { "Student": "G", "Mentor": "Pitcher", "Abs": "2", "Tardies": "6", "ENG": "75", "MATH": "81", "SCI": "87", "SOC": "71" },
+      { "Student": "H", "Mentor": "Martinez", "Abs": "1", "Tardies": "1", "ENG": "61", "MATH": "55", "SCI": "62", "SOC": "71" },
+      { "Student": "I", "Mentor": "Pitcher", "Abs": "1", "Tardies": "1", "ENG": "68", "MATH": "90", "SCI": "83", "SOC": "83" },
+      { "Student": "J", "Mentor": "Isom", "Abs": "0", "Tardies": "3", "ENG": "59", "MATH": "65", "SCI": "83", "SOC": "93" },
+      { "Student": "K", "Mentor": "Isom", "Abs": "0", "Tardies": "2", "ENG": "88", "MATH": "82", "SCI": "88", "SOC": "32" },
+      { "Student": "L", "Mentor": "Martinez", "Abs": "0", "Tardies": "1", "ENG": "88", "MATH": "92", "SCI": "51", "SOC": "85" },
+      { "Student": "M", "Mentor": "Shields", "Abs": "0", "Tardies": "0", "ENG": "75", "MATH": "78", "SCI": "88", "SOC": "76" },
+      { "Student": "N", "Mentor": "Shields", "Abs": "1", "Tardies": "0", "ENG": "83", "MATH": "83", "SCI": "88", "SOC": "57" }
+    ]
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 9,
+    "fot_page": 51,
+    "title": "Intervention Evaluation Flowchart - Overview",
+    "content": "A flowchart to determine if individual interventions are working for schools and to improve the use of data to successfully implement interventions."
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 10,
+    "fot_page": 52,
+    "title": "Intervention Evaluation Flowchart - Decision Tree",
+    "content": "Is our student success intervention working for our students?\n\nYES:\n• What are you doing that works for students? (What is your evidence?)\n• What are you doing that works for the adults implementing the intervention? (What is your evidence?)\n• What parts of your implementation plan can you tweak for even greater success?\n\nNO: Is there a true opportunity for recovery if students participate with fidelity?\n\nYES - Check these areas:\n\nIs it an implementation fidelity issue?\n• Are there other school programs/initiatives competing with effective implementation?\n• Are teachers/owners aware of implementation procedures?\n• Are teachers compensated when appropriate?\n• Is there sufficient and reasonable time to implement the intervention?\n• Have you implemented the intervention long enough?\n• Is the intervention publicized effectively to appropriate stakeholders?\n\nIs it a student participation issue?\n• How are students held accountable for not participating? By whom?\n• Do they see the results of their participation?\n• Are students encouraged by multiple adults to participate?\n• Does the intervention respect student time and effort?\n• Is the intervention viewed as punitive?\n\nNO - Check these areas:\n\nIs the issue a mismatch between the intervention and student needs?\n• Does the intervention provide supports for students struggling academically?\n• Is the intervention frequent enough to be effective?\n• How was the intervention selected? Based on identified student need? Adult preference? Feasibility?\n\nIs the issue one that cannot be addressed by a Success Team intervention?\nExamples:\n• Teacher philosophy\n• Grading policies\n• Chronic truancy\n• Chronic suspensions\n• Curriculum pacing"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 11,
+    "fot_page": 53,
+    "title": "Data Components Key To Successful Implementation",
+    "content": "What data structures and practices, if addressed, will increase your team's efficacy in improving student achievement?\n\nAccess to timely Gradebook data:\n• Are grades updated in a timely manner according to the grade pull schedule?\n• Who can provide the grade-level, course, and student-level data you need?\n• Can you manipulate data into a teacher-friendly format?\n• Do you have or make time to manipulate the data into a teacher-friendly format?\n\nStudent participation data:\n• How are you tracking participation? (intentionally or randomly)\n• Is your tracking tool useful for highlighting trends in participation and its effect on achievement?\n\nIntervention implementation data:\n\nTUTORING:\n• Are teachers actually tutoring students/providing academic support?\n• How are students provided with work to complete during tutoring?\n• If tutoring is administered by external partners, how is communication of student needs and course expectations shared with them?\n\nMENTORING:\n• Do mentoring conversations push students to action around their grades?\n• What information are mentors provided with to drive their mentoring sessions?\n• Are mentors able to advocate professionally with their colleagues?\n\nData analysis:\n• Is sufficient time allocated for analyzing data specific to your intervention?\n• Does your team's analysis of intervention data lead to action toward increasing student achievement?"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 12,
+    "fot_page": 54,
+    "title": "Some Considerations for Intervention Planning",
+    "content": "• Identifying what students need\n• Ensuring intervention is scheduled at accessible times and with a frequency that makes sense\n• Matching adult expertise with student needs\n• Strategizing how to get targeted students to the intervention\n• Connecting what is happening in the intervention to what is happening in the classroom (relational/academic)"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 13,
+    "fot_page": 55,
+    "title": "The Evidence Process",
+    "content": "A circular diagram showing interconnected gears with the following elements:\n\nEssential Questions:\nWhat is the data telling us about our interventions?\nWhat are the underlying values that influence the quality of our interventions?\n\nProcess Components:\n• Making Data-informed Decisions Using Protocols\n• Implementing Interventions (as supported by data)\n• Tracking Interventions (gathering evidence)\n• Examining and Discussing Evidence with Colleagues\n• Documenting and Reflecting on Process\n\nExternal factors:\n• OUTSIDE FORCES\n• OUTSIDE RESOURCES\n\nCentral question: \"What gear is getting stuck?\""
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 14,
+    "fot_page": 56,
+    "title": "Intervention Success Monthly Action Plan (IS-MAP) - Overview",
+    "content": "A plan to support action planning using results from the Intervention Evaluation Flowchart."
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 15,
+    "fot_page": 57,
+    "title": "Intervention Success Monthly Action Plan (IS-MAP) - Form Template",
+    "content": "Based on quarterly student achievement data and your reflection using the Intervention Evaluation Flowchart, what area needs refinement and what is the change you will make? (refer to bolded categories on the Flowchart)\n\nForm Fields:\n• Area of Refinement: _______________\n• Planned Change: _______________\n\nPlanning Questions:\n1. Why am I planning to do this?\n What's at stake? What do I hope will happen as a result of this change in our team's practice?\n\n2. How will I initiate this change?\n What action do I need to take to bring this change to fruition?\n\n3. What supports do I need to be successful?\n Who can help me and what do I need from them?\n\n4. How will I know if my team has made progress?\n What evidence will tell our team we're on the right track with the intervention?\n\nAction Planning Table:\nColumns: Action Item | Due Date | What I Need | Resource Person\n\nAdapted from the School Reform Initiative I-MAP protocol"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 16,
+    "fot_page": 58,
+    "title": "Student Success Intervention Plan - Overview",
+    "content": "A planning tool for student interventions that includes the identification of baseline data, criteria for success, status checkpoints, and plans for reflection."
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 17,
+    "fot_page": 59,
+    "title": "Student Success Intervention Plan - Form Template",
+    "content": "Student Success Intervention Plan: Quarter ___\n\nBasic Information:\n• Date: ___\n• School: ___\n• Grade: ___\n\nFocus Areas (check applicable):\n□ Attendance □ Ds/Fs □ GPA □ On-Track Rate □ Behavior □ Other: ___\n\nTarget Group:\n• Number of Students: ___\n• Baseline data used to select target group: ___\n\nIntervention Description:\n• What it is: ___\n• When it takes place (dates/times): ___\n• Where it takes place: ___\n• Description of activities involved: ___\n\nGoals and Success Criteria:\n• Goal of intervention: ___\n• Criteria for success: ___\n\nPersonnel:\n• Owner(s) of intervention: ___\n• Participants in intervention: ___\n\nTimeline of Intervention:\n• Planning and preparation: ___\n• Introduction to staff: ___\n• Introduction to targeted students: ___\n• Introduction to parents and stakeholders: ___\n• Intervention start date: ___\n• Intervention end date: ___\n\nStatus Checkpoints:\n• Checkpoint 1: ___\n• Checkpoint 2: ___\n• Checkpoint 3: ___\n• Checkpoint 4: ___\n\nTracking and Reflection:\n• Summary of action taken after each checkpoint: ___\n• Reflection at end of intervention: ___"
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 18,
+    "fot_page": 60,
+    "title": "Behavior, Attendance, and Grades (BAG) Report - Overview",
+    "content": "A school-generated tool for educators to interact with students on behavior, attendance, and grades. Ideally, schools will produce these reports every five weeks. BAG Reports use real-time data so students understand where and how they are struggling, and which educators they may need to reach out to for support. They also help students understand their current status in relation to their goals. Schools can use BAG Reports in different ways, including individual conversations with students or holding \"town hall\" meetings for all freshmen to review the data and set next steps."
+  },
+  {
+    "source_document": "FOT_Toolkit_ToolSetC.pdf",
+    "pdf_page": 19,
+    "fot_page": 61,
+    "title": "Example BAG Report for Student 'Keith'",
+    "content": "Student: Keith\nGrade Level: 9\n8th Period Teacher: Donson\nThe numbers below reflect totals through Semester 1\n\nBEHAVIOR - In what ways do I contribute to a Safe and Respectful school climate?\n• # of Infractions (# of Major Infractions): 5 (1)\n• # of Days of In-School-Suspension (ISS): 10\n• # of Days of Out-of-School-Suspension (OSS): 0\nIf I have any questions regarding my misconducts, I should schedule an appointment with the Dean of Discipline.\n\nATTENDANCE - Do my actions reflect the real me?\n• Days Enrolled: 80\n• Days Present: 73\n• Days Absent: 7\n• My Year-to-Date Attendance Rate is 91%\nIf I have any questions regarding my attendance, I should schedule an appointment with the Attendance Dean.\n\nGRADES - How am I doing academically in my classes? Do my grades represent my true ability?\nPeriod | Courses | Teacher | Grade\nP1 | Algebra 1 | Flint | D\nP2 | English 1 | Lemon | B\nP3 | World Studies | Moeller | C\nP4 | PE I-Health | Spann | A\nP5 | Lunch | | \nP6 | Science | Tyson | D\nP7 | Photography | McCain | B\nP8 | Intro to Comp | Penny | A\n\nMy Estimated GPA is 2.57\n(this estimate does NOT include any previous semesters)\n\nIf I have any questions regarding my grade in a course, I should schedule an appointment with my Teacher."
+  }
+]
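The records above share a small set of keys (`source_document`, `pdf_page`, `fot_page`, `title`, `content`), and `table_data` appears only on the mentorship example page. Below is a minimal sketch of how a consumer might check that shape and render table rows as plain text before later processing. The helper names `validate_record` and `table_to_text` are hypothetical and are not part of this commit.

```python
from typing import Any, Dict, List

# Keys every record in the file above carries; `table_data` is optional.
REQUIRED_KEYS = ("source_document", "pdf_page", "fot_page", "title", "content")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    for key in ("pdf_page", "fot_page"):
        if key in record and not isinstance(record[key], int):
            problems.append(f"{key} should be an integer")
    if "table_data" in record and not isinstance(record["table_data"], list):
        problems.append("table_data should be a list of row objects")
    return problems


def table_to_text(rows: List[Dict[str, str]]) -> str:
    """Render table rows as one 'column: value' line per row."""
    return "\n".join(
        ", ".join(f"{column}: {value}" for column, value in row.items())
        for row in rows
    )
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a hand-curated file in a single pass.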
docs/document_complexity_map.csv DELETED
@@ -1,16 +0,0 @@
-Document Name,Page Range(s),FOT Page,Content Type,Proposed Tool,Notes,,tool,type,description
-FOT_Toolkit_ToolSetC.pdf,1,43,Complex Layouts ,Manual,Title page with introductory text.,,PyMuPDF,Simple Text,These are pages with a single column of straightforward paragraphs. They look like a standard document or book page.
-FOT_Toolkit_ToolSetC.pdf,2,44,"Table, Complex Layout ","pdfplumber, Manual","""Connections to Framework"" has multiple 2 tables (one useful), and a section with columns.",,pdfplumber,Tables,"These pages contain structured, grid-like data. The content is clearly organized into rows and columns. "
-FOT_Toolkit_ToolSetC.pdf,3-7,45-49,Simple Text,PyMuPDF,Content is mostly headings and bulleted lists which can be parsed linearly.,,Manual,Complex Layouts ,"These are the tricky ones. Look for multi-column text (like a newsletter), flowcharts, diagrams with connecting lines, or pages with many floating text boxes and images."
-FOT_Toolkit_ToolSetC.pdf,8,50,Table ,pdfplumber,Clear tabular data for a mentorship program report.,,,,
-FOT_Toolkit_ToolSetC.pdf,9,51,Simple Text,PyMuPDF,Title page for the flowchart section.,,,,
-FOT_Toolkit_ToolSetC.pdf,10-11,52,Complex Layouts ,Manual,"""Intervention Evaluation Flowchart."" The logical flow is crucial and cannot be reliably extracted by text parsers.",,,,
-,,53,Complex Layouts ,"Manual, PyMuPDF","""Data Components"" are in multiple columns. PyMuPDF might work, but manual extraction would be more reliable.",,,,
-FOT_Toolkit_ToolSetC.pdf,12,54,Simple Text,PyMuPDF,"""Some Considerations for Intervention Planning"" is a simple bulleted list.",,,,
-FOT_Toolkit_ToolSetC.pdf,13,55,Complex Layouts ,Manual,"""The Evidence Process"" is a gear-based diagram. Text is embedded within a complex graphic.",,,,
-FOT_Toolkit_ToolSetC.pdf,14,56,Simple Text,PyMuPDF,"Title page for the ""IS-MAP"" section.",,,,
-FOT_Toolkit_ToolSetC.pdf,15,57,"Simple Text, Table","PyMuPDF, pdfplumber","""Intervention Success Monthly Action Plan"" is a structured form, best treated as a table.",,,,
-FOT_Toolkit_ToolSetC.pdf,16,58,Simple Text,PyMuPDF,"Title page for the ""Student Success Intervention Plan"" section.",,,,
-FOT_Toolkit_ToolSetC.pdf,17,59,Table,pdfplumber,"""Student Success Intervention Plan"" is a structured form suitable for table extraction.",,,,
-FOT_Toolkit_ToolSetC.pdf,18,60,Simple Text,PyMuPDF,"Title page for the ""Behavior, Attendance, and Grades (BAG) Report.""",,,,
-FOT_Toolkit_ToolSetC.pdf,19,61,Complex Layouts ,Manual,"""BAG Report."" This page contains three distinct blocks, two of which are tables. Best handled with a hybrid approach, possibly using pdfplumber on defined regions of the page",,,,
pyproject.toml CHANGED
@@ -20,9 +20,6 @@ dependencies = [
     "langchain",
     "sentence-transformers",
     "faiss-cpu",
-    "pandas",
-    "pymupdf",
-    "pdfplumber",
     "transformers",
 ]
 
src/fot_recommender/config.py CHANGED
@@ -1,15 +1,11 @@
-# src/fot_recommender/config.py
-
 from pathlib import Path
 
 # Define the root directory of the project
-# This assumes your src folder is at the top level of your project
 PROJECT_ROOT = Path(__file__).parent.parent.parent
 
-# Define paths to data and documentation files
+# Define paths to data files
 DATA_DIR = PROJECT_ROOT / "data"
-DOCS_DIR = PROJECT_ROOT / "docs"
+PROCESSED_DATA_DIR = DATA_DIR / "processed"
 
-# Specific file paths
-PDF_PATH = DATA_DIR / "source_pdfs" / "FOT_Toolkit_ToolSetC.pdf"
-COMPLEXITY_MAP_PATH = DOCS_DIR / "document_complexity_map.csv"
+# The single source of truth for our knowledge base
+RAW_KB_PATH = PROCESSED_DATA_DIR / "knowledge_base_raw.json"
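`PROJECT_ROOT` depends on `config.py` sitting three levels below the repository root (`src/fot_recommender/config.py`). Below is a small sketch of that resolution, using a pure path so it runs without the repository present. The `/repo` prefix is a made-up checkout location; only the relative layout comes from the diff.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the relative layout comes from the diff.
config_file = PurePosixPath("/repo/src/fot_recommender/config.py")

# Three .parent hops: fot_recommender -> src -> repository root.
project_root = config_file.parent.parent.parent
raw_kb_path = project_root / "data" / "processed" / "knowledge_base_raw.json"

print(project_root)
print(raw_kb_path)
```

If the package is ever moved or installed outside the source tree, the three-hop assumption no longer holds and the JSON path would need to be resolved another way.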
 
src/fot_recommender/knowledge_base.py DELETED
@@ -1,131 +0,0 @@
-import fitz  # PyMuPDF
-import pdfplumber
-import pandas as pd
-from typing import List, Dict, Any
-from pathlib import Path
-
-# Import configurations and manual data
-from .config import PDF_PATH, COMPLEXITY_MAP_PATH
-from .manual_extractions import MANUAL_CONTENT
-
-def extract_text_with_pymupdf(pdf_path: Path, page_number: int) -> str:
-    """Extracts all text from a single page using PyMuPDF."""
-    try:
-        doc = fitz.open(pdf_path)
-        page = doc.load_page(page_number - 1)
-        text = page.get_text("text")  # type: ignore
-        doc.close()
-        return text
-    except Exception as e:
-        return f"Error extracting page {page_number}: {e}"
-
-def extract_tables_with_pdfplumber(pdf_path: Path, page_number: int) -> List[Dict[str, Any]]:
-    """Extracts all tables from a single page into a list of JSON-serializable objects."""
-    tables_content = []
-    try:
-        with pdfplumber.open(pdf_path) as pdf:
-            page = pdf.pages[page_number - 1]
-            extracted_tables = page.extract_tables()
-            if not extracted_tables:
-                return []
-            for i, table in enumerate(extracted_tables):
-                if not table or len(table) == 0:
-                    continue
-                header = table[0]
-                if not all(h is not None for h in header):
-                    continue
-                table_data = [dict(zip(header, row)) for row in table[1:]]
-                tables_content.append({"table_index": i, "data": table_data})
-        return tables_content
-    except Exception as e:
-        return [{"error": f"Could not process tables on page {page_number}: {e}"}]
-
-def get_manual_extraction(fot_page: int) -> Dict[str, Any]:
-    """Retrieves a manually extracted content block by its FOT page number."""
-    for key, value in MANUAL_CONTENT.items():
-        if str(fot_page) in key:
-            return value
-    return {"error": f"No manual content found for FOT page {fot_page}"}
-
-
-def build_knowledge_base() -> List[Dict[str, Any]]:
-    """
-    Main function to process all source documents according to the complexity map.
-    It now correctly handles single pages and page ranges (e.g., '45-49').
-    """
-    print("--- Starting Knowledge Base Construction ---")
-
-    if not COMPLEXITY_MAP_PATH.exists():
-        print(f"ERROR: Complexity map not found at {COMPLEXITY_MAP_PATH}")
-        return []
-
-    try:
-        columns_to_use = ["FOT Page", "Content Type", "Proposed Tool"]
-        df = pd.read_csv(COMPLEXITY_MAP_PATH, usecols=columns_to_use)  # type: ignore
-        complexity_map = df.dropna(subset=["FOT Page"]).copy()
-    except Exception as e:
-        print(f"Error loading or parsing CSV: {e}")
-        return []
-
-    knowledge_base = []
-    print(f"Loaded complexity map. Processing {len(complexity_map)} entries...")
-
-    # Iterate through the map to drive the extraction process
-    for index, row in complexity_map.iterrows():
-        fot_page_str = str(row['FOT Page']).strip()
-        tool = str(row['Proposed Tool']).strip()
-
-        page_numbers_to_process = []
-
-        if '-' in fot_page_str:
-            try:
-                start_str, end_str = fot_page_str.split('-')
-                start = int(start_str.strip())
-                end = int(end_str.strip())
-                page_numbers_to_process.extend(range(start, end + 1))
-            except ValueError:
-                print(f"WARNING: Could not parse page range '{fot_page_str}'. Skipping row.")
-                continue
-        else:
-            try:
-                page_numbers_to_process.append(int(fot_page_str))
-            except ValueError:
-                print(f"WARNING: Could not parse page number '{fot_page_str}'. Skipping row.")
-                continue
-
-        # Process each page determined (either single or from a range)
-        for page_num in page_numbers_to_process:
-            print(f"Processing FOT Page {page_num} using tool: {tool}")
-
-            content_block = {
-                "source_document": PDF_PATH.name,
-                "fot_page": page_num,
-                "extraction_tool": tool,
-                "content": None
-            }
-
-            tools_to_run = [t.strip() for t in tool.split(',')]
-            extracted_content = {}
-
-            if "Manual" in tools_to_run:
-                extracted_content["manual"] = get_manual_extraction(page_num)
-            if "PyMuPDF" in tools_to_run:
-                extracted_content["text"] = extract_text_with_pymupdf(PDF_PATH, page_num)
-            if "pdfplumber" in tools_to_run:
-                tables = extract_tables_with_pdfplumber(PDF_PATH, page_num)
-                if tables:  # Only add tables if they are found
-                    extracted_content["tables"] = tables
-
-            # Simplify content structure if only one tool was used
-            if len(extracted_content) == 1:
-                content_block["content"] = list(extracted_content.values())[0]
-            # Handle cases where no content was extracted
-            elif not extracted_content:
-                content_block["content"] = "No content extracted."
-            else:
-                content_block["content"] = extracted_content
-
-            knowledge_base.append(content_block)
-
-    print(f"--- Knowledge base construction complete. {len(knowledge_base)} total pages processed. ---")
-    return knowledge_base
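The deleted `build_knowledge_base` expanded range strings such as `45-49` and passed the resulting FOT page numbers to the extractors. In the JSON added by this commit, `fot_page` equals `pdf_page + 42` in every record, so a PDF lookup keyed on FOT pages would presumably need that offset. Below is a standalone sketch of both steps. `expand_page_spec`, `fot_to_pdf_page`, and `FOT_PAGE_OFFSET` are illustrative names, and the offset is inferred only from the records shown in this diff.

```python
from typing import List

# Inferred from the JSON records in this diff (pdf_page 1 <-> fot_page 43).
FOT_PAGE_OFFSET = 42


def expand_page_spec(spec: str) -> List[int]:
    """Expand '54' or '45-49' into a list of page numbers (inclusive)."""
    spec = spec.strip()
    if "-" in spec:
        start_str, end_str = spec.split("-", 1)
        start, end = int(start_str), int(end_str)
        if start > end:
            raise ValueError(f"descending page range: {spec!r}")
        return list(range(start, end + 1))
    return [int(spec)]


def fot_to_pdf_page(fot_page: int) -> int:
    """Convert a toolkit (FOT) page number to a 1-based PDF page number."""
    pdf_page = fot_page - FOT_PAGE_OFFSET
    if pdf_page < 1:
        raise ValueError(f"FOT page {fot_page} precedes this document")
    return pdf_page
```

Keeping the two numbering schemes behind one conversion function avoids mixing them up when indexing into the PDF.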
src/fot_recommender/main.py CHANGED
@@ -1,24 +1,35 @@
 
1
  import pprint
2
- from fot_recommender.knowledge_base import build_knowledge_base
3
 
4
  def main():
5
  """
6
  Main entry point for the FOT Intervention Recommender application.
 
7
  """
8
  print("--- FOT Intervention Recommender ---")
9
 
10
- knowledge_base = build_knowledge_base()
 
 
 
 
 
 
 
 
 
 
 
 
11
 
 
12
  if knowledge_base:
13
- print(f"\nSuccessfully built knowledge base with {len(knowledge_base)} items.")
14
- print("\n--- Sample of First Extracted Intervention ---")
15
- # Pretty-print the first item to verify its structure
16
  pprint.pprint(knowledge_base[0])
17
- print("--------------------------------------------")
18
- else:
19
- print("\nKnowledge base construction returned 0 items. Check implementation.")
20
 
21
- print("\nNext steps: Implement Phase 2 (RAG Pipeline)...")
22
 
23
  if __name__ == "__main__":
24
- main()
 
1
+ import json
2
  import pprint
3
+ from .config import RAW_KB_PATH
4
 
5
  def main():
6
  """
7
  Main entry point for the FOT Intervention Recommender application.
8
+ This version loads a pre-processed knowledge base directly from a JSON file.
9
  """
10
  print("--- FOT Intervention Recommender ---")
11
 
12
+ # --- PHASE 1: LOAD PRE-PROCESSED KNOWLEDGE BASE ---
13
+ print(f"Loading knowledge base from: {RAW_KB_PATH}")
14
+ try:
15
+ with open(RAW_KB_PATH, 'r', encoding='utf-8') as f:
16
+ knowledge_base = json.load(f)
17
+ except FileNotFoundError:
18
+ print(f"FATAL ERROR: The knowledge base file was not found at {RAW_KB_PATH}")
19
+ return
20
+ except json.JSONDecodeError:
21
+ print(f"FATAL ERROR: The file at {RAW_KB_PATH} is not a valid JSON file.")
22
+ return
23
+
24
+ print(f"Successfully loaded {len(knowledge_base)} items.")
25
 
26
+ print("\n--- Sample of First Knowledge Base Item ---")
27
  if knowledge_base:
 
 
 
28
  pprint.pprint(knowledge_base[0])
29
+ print("------------------------------------------")
30
+
31
+ print("\nData loading is complete. Next step: Semantic Chunking.")
32
 
 
33
 
34
  if __name__ == "__main__":
35
+ main()
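
The new main.py trusts that the JSON file has the expected shape. As a minimal sketch (not part of this commit), the load step could also check each record against the fields seen in the first item of knowledge_base_raw.json. The helper name load_knowledge_base, the REQUIRED_KEYS set, and the choice to treat every field as required are assumptions made for illustration.

```python
import json
from pathlib import Path

# Field names taken from the first record of knowledge_base_raw.json.
# Treating all of them as required is an assumption of this sketch.
REQUIRED_KEYS = {"source_document", "pdf_page", "fot_page", "title", "content"}


def load_knowledge_base(path):
    """Load the JSON knowledge base and check its basic shape (hypothetical helper)."""
    with Path(path).open("r", encoding="utf-8") as f:
        items = json.load(f)

    # The file is expected to hold a JSON array of objects.
    if not isinstance(items, list):
        raise ValueError(
            f"Expected a JSON array at the top level, got {type(items).__name__}"
        )

    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"Item {index} is not a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"Item {index} is missing keys: {sorted(missing)}")

    return items
```

In main(), a call such as load_knowledge_base(RAW_KB_PATH) could replace the bare json.load, with ValueError handled next to the existing FileNotFoundError and json.JSONDecodeError branches. Failing early on a malformed record may be preferable to a KeyError later in the chunking step.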
src/fot_recommender/manual_extractions.py DELETED
@@ -1,37 +0,0 @@
1
- # This file stores manually transcribed text from complex PDF pages.
2
-
3
- MANUAL_CONTENT = {
4
- "page_43": {
5
- "source": "FOT_Toolkit_ToolSetC.pdf, page 43",
6
- "content": """
7
- Purpose
8
- One responsibility of the Success Team is to develop and track interventions...
9
-
10
- How & When to Use
11
- Planning effective student interventions can be a challenging task...
12
- """
13
- },
14
- "page_44_complex": {
15
- "source": "FOT_Toolkit_ToolSetC.pdf, page 44",
16
- "content": """
17
- Team Lead
18
- - Setting Conditions: With principal and Success Team, sets freshman success goals...
19
- - Implementation: Develops action-oriented meeting agendas...
20
-
21
- Principal
22
- - Implementation: Reviews and interrogates interim freshman success-related data...
23
- """
24
- },
25
- "page_52_flowchart": {
26
- "source": "FOT_Toolkit_ToolSetC.pdf, page 52",
27
- "content": """
28
- Intervention Evaluation Flowchart Logic:
29
- 1. Identify student need based on FOT indicators.
30
- 2. Was the intervention implemented with fidelity?
31
- 3. If yes, did the student show progress?
32
- 4. If yes, continue and monitor. If no, consider a different intervention.
33
- (etc...)
34
- """
35
- }
36
- # ... add other manual extractions here
37
- }
uv.lock CHANGED
@@ -147,41 +147,6 @@ wheels = [
147
  { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
148
  ]
149
 
150
- [[package]]
151
- name = "cryptography"
152
- version = "45.0.5"
153
- source = { registry = "https://pypi.org/simple" }
154
- dependencies = [
155
- { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
156
- ]
157
- sdist = { url = "https://files.pythonhosted.org/packages/95/1e/49527ac611af559665f71cbb8f92b332b5ec9c6fbc4e88b0f8e92f5e85df/cryptography-45.0.5.tar.gz", hash = "sha256:72e76caa004ab63accdf26023fccd1d087f6d90ec6048ff33ad0445abf7f605a", size = 744903, upload-time = "2025-07-02T13:06:25.941Z" }
158
- wheels = [
159
- { url = "https://files.pythonhosted.org/packages/f0/fb/09e28bc0c46d2c547085e60897fea96310574c70fb21cd58a730a45f3403/cryptography-45.0.5-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:101ee65078f6dd3e5a028d4f19c07ffa4dd22cce6a20eaa160f8b5219911e7d8", size = 7043092, upload-time = "2025-07-02T13:05:01.514Z" },
160
- { url = "https://files.pythonhosted.org/packages/b1/05/2194432935e29b91fb649f6149c1a4f9e6d3d9fc880919f4ad1bcc22641e/cryptography-45.0.5-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:3a264aae5f7fbb089dbc01e0242d3b67dffe3e6292e1f5182122bdf58e65215d", size = 4205926, upload-time = "2025-07-02T13:05:04.741Z" },
161
- { url = "https://files.pythonhosted.org/packages/07/8b/9ef5da82350175e32de245646b1884fc01124f53eb31164c77f95a08d682/cryptography-45.0.5-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e74d30ec9c7cb2f404af331d5b4099a9b322a8a6b25c4632755c8757345baac5", size = 4429235, upload-time = "2025-07-02T13:05:07.084Z" },
162
- { url = "https://files.pythonhosted.org/packages/7c/e1/c809f398adde1994ee53438912192d92a1d0fc0f2d7582659d9ef4c28b0c/cryptography-45.0.5-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:3af26738f2db354aafe492fb3869e955b12b2ef2e16908c8b9cb928128d42c57", size = 4209785, upload-time = "2025-07-02T13:05:09.321Z" },
163
- { url = "https://files.pythonhosted.org/packages/d0/8b/07eb6bd5acff58406c5e806eff34a124936f41a4fb52909ffa4d00815f8c/cryptography-45.0.5-cp311-abi3-manylinux_2_28_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:e6c00130ed423201c5bc5544c23359141660b07999ad82e34e7bb8f882bb78e0", size = 3893050, upload-time = "2025-07-02T13:05:11.069Z" },
164
- { url = "https://files.pythonhosted.org/packages/ec/ef/3333295ed58d900a13c92806b67e62f27876845a9a908c939f040887cca9/cryptography-45.0.5-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:dd420e577921c8c2d31289536c386aaa30140b473835e97f83bc71ea9d2baf2d", size = 4457379, upload-time = "2025-07-02T13:05:13.32Z" },
165
- { url = "https://files.pythonhosted.org/packages/d9/9d/44080674dee514dbb82b21d6fa5d1055368f208304e2ab1828d85c9de8f4/cryptography-45.0.5-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:d05a38884db2ba215218745f0781775806bde4f32e07b135348355fe8e4991d9", size = 4209355, upload-time = "2025-07-02T13:05:15.017Z" },
166
- { url = "https://files.pythonhosted.org/packages/c9/d8/0749f7d39f53f8258e5c18a93131919ac465ee1f9dccaf1b3f420235e0b5/cryptography-45.0.5-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:ad0caded895a00261a5b4aa9af828baede54638754b51955a0ac75576b831b27", size = 4456087, upload-time = "2025-07-02T13:05:16.945Z" },
167
- { url = "https://files.pythonhosted.org/packages/09/d7/92acac187387bf08902b0bf0699816f08553927bdd6ba3654da0010289b4/cryptography-45.0.5-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9024beb59aca9d31d36fcdc1604dd9bbeed0a55bface9f1908df19178e2f116e", size = 4332873, upload-time = "2025-07-02T13:05:18.743Z" },
168
- { url = "https://files.pythonhosted.org/packages/03/c2/840e0710da5106a7c3d4153c7215b2736151bba60bf4491bdb421df5056d/cryptography-45.0.5-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:91098f02ca81579c85f66df8a588c78f331ca19089763d733e34ad359f474174", size = 4564651, upload-time = "2025-07-02T13:05:21.382Z" },
169
- { url = "https://files.pythonhosted.org/packages/2e/92/cc723dd6d71e9747a887b94eb3827825c6c24b9e6ce2bb33b847d31d5eaa/cryptography-45.0.5-cp311-abi3-win32.whl", hash = "sha256:926c3ea71a6043921050eaa639137e13dbe7b4ab25800932a8498364fc1abec9", size = 2929050, upload-time = "2025-07-02T13:05:23.39Z" },
170
- { url = "https://files.pythonhosted.org/packages/1f/10/197da38a5911a48dd5389c043de4aec4b3c94cb836299b01253940788d78/cryptography-45.0.5-cp311-abi3-win_amd64.whl", hash = "sha256:b85980d1e345fe769cfc57c57db2b59cff5464ee0c045d52c0df087e926fbe63", size = 3403224, upload-time = "2025-07-02T13:05:25.202Z" },
171
- { url = "https://files.pythonhosted.org/packages/fe/2b/160ce8c2765e7a481ce57d55eba1546148583e7b6f85514472b1d151711d/cryptography-45.0.5-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:f3562c2f23c612f2e4a6964a61d942f891d29ee320edb62ff48ffb99f3de9ae8", size = 7017143, upload-time = "2025-07-02T13:05:27.229Z" },
172
- { url = "https://files.pythonhosted.org/packages/c2/e7/2187be2f871c0221a81f55ee3105d3cf3e273c0a0853651d7011eada0d7e/cryptography-45.0.5-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:3fcfbefc4a7f332dece7272a88e410f611e79458fab97b5efe14e54fe476f4fd", size = 4197780, upload-time = "2025-07-02T13:05:29.299Z" },
173
- { url = "https://files.pythonhosted.org/packages/b9/cf/84210c447c06104e6be9122661159ad4ce7a8190011669afceeaea150524/cryptography-45.0.5-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:460f8c39ba66af7db0545a8c6f2eabcbc5a5528fc1cf6c3fa9a1e44cec33385e", size = 4420091, upload-time = "2025-07-02T13:05:31.221Z" },
174
- { url = "https://files.pythonhosted.org/packages/3e/6a/cb8b5c8bb82fafffa23aeff8d3a39822593cee6e2f16c5ca5c2ecca344f7/cryptography-45.0.5-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:9b4cf6318915dccfe218e69bbec417fdd7c7185aa7aab139a2c0beb7468c89f0", size = 4198711, upload-time = "2025-07-02T13:05:33.062Z" },
175
- { url = "https://files.pythonhosted.org/packages/04/f7/36d2d69df69c94cbb2473871926daf0f01ad8e00fe3986ac3c1e8c4ca4b3/cryptography-45.0.5-cp37-abi3-manylinux_2_28_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:2089cc8f70a6e454601525e5bf2779e665d7865af002a5dec8d14e561002e135", size = 3883299, upload-time = "2025-07-02T13:05:34.94Z" },
176
- { url = "https://files.pythonhosted.org/packages/82/c7/f0ea40f016de72f81288e9fe8d1f6748036cb5ba6118774317a3ffc6022d/cryptography-45.0.5-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0027d566d65a38497bc37e0dd7c2f8ceda73597d2ac9ba93810204f56f52ebc7", size = 4450558, upload-time = "2025-07-02T13:05:37.288Z" },
177
- { url = "https://files.pythonhosted.org/packages/06/ae/94b504dc1a3cdf642d710407c62e86296f7da9e66f27ab12a1ee6fdf005b/cryptography-45.0.5-cp37-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:be97d3a19c16a9be00edf79dca949c8fa7eff621763666a145f9f9535a5d7f42", size = 4198020, upload-time = "2025-07-02T13:05:39.102Z" },
178
- { url = "https://files.pythonhosted.org/packages/05/2b/aaf0adb845d5dabb43480f18f7ca72e94f92c280aa983ddbd0bcd6ecd037/cryptography-45.0.5-cp37-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:7760c1c2e1a7084153a0f68fab76e754083b126a47d0117c9ed15e69e2103492", size = 4449759, upload-time = "2025-07-02T13:05:41.398Z" },
179
- { url = "https://files.pythonhosted.org/packages/91/e4/f17e02066de63e0100a3a01b56f8f1016973a1d67551beaf585157a86b3f/cryptography-45.0.5-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:6ff8728d8d890b3dda5765276d1bc6fb099252915a2cd3aff960c4c195745dd0", size = 4319991, upload-time = "2025-07-02T13:05:43.64Z" },
180
- { url = "https://files.pythonhosted.org/packages/f2/2e/e2dbd629481b499b14516eed933f3276eb3239f7cee2dcfa4ee6b44d4711/cryptography-45.0.5-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:7259038202a47fdecee7e62e0fd0b0738b6daa335354396c6ddebdbe1206af2a", size = 4554189, upload-time = "2025-07-02T13:05:46.045Z" },
181
- { url = "https://files.pythonhosted.org/packages/f8/ea/a78a0c38f4c8736287b71c2ea3799d173d5ce778c7d6e3c163a95a05ad2a/cryptography-45.0.5-cp37-abi3-win32.whl", hash = "sha256:1e1da5accc0c750056c556a93c3e9cb828970206c68867712ca5805e46dc806f", size = 2911769, upload-time = "2025-07-02T13:05:48.329Z" },
182
- { url = "https://files.pythonhosted.org/packages/79/b3/28ac139109d9005ad3f6b6f8976ffede6706a6478e21c889ce36c840918e/cryptography-45.0.5-cp37-abi3-win_amd64.whl", hash = "sha256:90cb0a7bb35959f37e23303b7eed0a32280510030daba3f7fdfbb65defde6a97", size = 3390016, upload-time = "2025-07-02T13:05:50.811Z" },
183
- ]
184
-
185
  [[package]]
186
  name = "faiss-cpu"
187
  version = "1.11.0.post1"
@@ -226,9 +191,6 @@ source = { editable = "." }
226
  dependencies = [
227
  { name = "faiss-cpu" },
228
  { name = "langchain" },
229
- { name = "pandas" },
230
- { name = "pdfplumber" },
231
- { name = "pymupdf" },
232
  { name = "sentence-transformers" },
233
  { name = "setuptools" },
234
  { name = "torch" },
@@ -249,9 +211,6 @@ requires-dist = [
249
  { name = "faiss-cpu" },
250
  { name = "langchain" },
251
  { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.16.1" },
252
- { name = "pandas" },
253
- { name = "pdfplumber" },
254
- { name = "pymupdf" },
255
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.4.1" },
256
  { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.12.2" },
257
  { name = "sentence-transformers" },
@@ -825,40 +784,6 @@ wheels = [
825
  { url = "https://files.pythonhosted.org/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl", hash = "sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484", size = 66469, upload-time = "2025-04-19T11:48:57.875Z" },
826
  ]
827
 
828
- [[package]]
829
- name = "pandas"
830
- version = "2.3.1"
831
- source = { registry = "https://pypi.org/simple" }
832
- dependencies = [
833
- { name = "numpy" },
834
- { name = "python-dateutil" },
835
- { name = "pytz" },
836
- { name = "tzdata" },
837
- ]
838
- sdist = { url = "https://files.pythonhosted.org/packages/d1/6f/75aa71f8a14267117adeeed5d21b204770189c0a0025acbdc03c337b28fc/pandas-2.3.1.tar.gz", hash = "sha256:0a95b9ac964fe83ce317827f80304d37388ea77616b1425f0ae41c9d2d0d7bb2", size = 4487493, upload-time = "2025-07-07T19:20:04.079Z" }
839
- wheels = [
840
- { url = "https://files.pythonhosted.org/packages/46/de/b8445e0f5d217a99fe0eeb2f4988070908979bec3587c0633e5428ab596c/pandas-2.3.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:689968e841136f9e542020698ee1c4fbe9caa2ed2213ae2388dc7b81721510d3", size = 11588172, upload-time = "2025-07-07T19:18:52.054Z" },
841
- { url = "https://files.pythonhosted.org/packages/1e/e0/801cdb3564e65a5ac041ab99ea6f1d802a6c325bb6e58c79c06a3f1cd010/pandas-2.3.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:025e92411c16cbe5bb2a4abc99732a6b132f439b8aab23a59fa593eb00704232", size = 10717365, upload-time = "2025-07-07T19:18:54.785Z" },
842
- { url = "https://files.pythonhosted.org/packages/51/a5/c76a8311833c24ae61a376dbf360eb1b1c9247a5d9c1e8b356563b31b80c/pandas-2.3.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9b7ff55f31c4fcb3e316e8f7fa194566b286d6ac430afec0d461163312c5841e", size = 11280411, upload-time = "2025-07-07T19:18:57.045Z" },
843
- { url = "https://files.pythonhosted.org/packages/da/01/e383018feba0a1ead6cf5fe8728e5d767fee02f06a3d800e82c489e5daaf/pandas-2.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7dcb79bf373a47d2a40cf7232928eb7540155abbc460925c2c96d2d30b006eb4", size = 11988013, upload-time = "2025-07-07T19:18:59.771Z" },
844
- { url = "https://files.pythonhosted.org/packages/5b/14/cec7760d7c9507f11c97d64f29022e12a6cc4fc03ac694535e89f88ad2ec/pandas-2.3.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:56a342b231e8862c96bdb6ab97170e203ce511f4d0429589c8ede1ee8ece48b8", size = 12767210, upload-time = "2025-07-07T19:19:02.944Z" },
845
- { url = "https://files.pythonhosted.org/packages/50/b9/6e2d2c6728ed29fb3d4d4d302504fb66f1a543e37eb2e43f352a86365cdf/pandas-2.3.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ca7ed14832bce68baef331f4d7f294411bed8efd032f8109d690df45e00c4679", size = 13440571, upload-time = "2025-07-07T19:19:06.82Z" },
846
- { url = "https://files.pythonhosted.org/packages/80/a5/3a92893e7399a691bad7664d977cb5e7c81cf666c81f89ea76ba2bff483d/pandas-2.3.1-cp312-cp312-win_amd64.whl", hash = "sha256:ac942bfd0aca577bef61f2bc8da8147c4ef6879965ef883d8e8d5d2dc3e744b8", size = 10987601, upload-time = "2025-07-07T19:19:09.589Z" },
847
- { url = "https://files.pythonhosted.org/packages/32/ed/ff0a67a2c5505e1854e6715586ac6693dd860fbf52ef9f81edee200266e7/pandas-2.3.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:9026bd4a80108fac2239294a15ef9003c4ee191a0f64b90f170b40cfb7cf2d22", size = 11531393, upload-time = "2025-07-07T19:19:12.245Z" },
848
- { url = "https://files.pythonhosted.org/packages/c7/db/d8f24a7cc9fb0972adab0cc80b6817e8bef888cfd0024eeb5a21c0bb5c4a/pandas-2.3.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:6de8547d4fdb12421e2d047a2c446c623ff4c11f47fddb6b9169eb98ffba485a", size = 10668750, upload-time = "2025-07-07T19:19:14.612Z" },
849
- { url = "https://files.pythonhosted.org/packages/0f/b0/80f6ec783313f1e2356b28b4fd8d2148c378370045da918c73145e6aab50/pandas-2.3.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:782647ddc63c83133b2506912cc6b108140a38a37292102aaa19c81c83db2928", size = 11342004, upload-time = "2025-07-07T19:19:16.857Z" },
850
- { url = "https://files.pythonhosted.org/packages/e9/e2/20a317688435470872885e7fc8f95109ae9683dec7c50be29b56911515a5/pandas-2.3.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2ba6aff74075311fc88504b1db890187a3cd0f887a5b10f5525f8e2ef55bfdb9", size = 12050869, upload-time = "2025-07-07T19:19:19.265Z" },
851
- { url = "https://files.pythonhosted.org/packages/55/79/20d746b0a96c67203a5bee5fb4e00ac49c3e8009a39e1f78de264ecc5729/pandas-2.3.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:e5635178b387bd2ba4ac040f82bc2ef6e6b500483975c4ebacd34bec945fda12", size = 12750218, upload-time = "2025-07-07T19:19:21.547Z" },
852
- { url = "https://files.pythonhosted.org/packages/7c/0f/145c8b41e48dbf03dd18fdd7f24f8ba95b8254a97a3379048378f33e7838/pandas-2.3.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6f3bf5ec947526106399a9e1d26d40ee2b259c66422efdf4de63c848492d91bb", size = 13416763, upload-time = "2025-07-07T19:19:23.939Z" },
853
- { url = "https://files.pythonhosted.org/packages/b2/c0/54415af59db5cdd86a3d3bf79863e8cc3fa9ed265f0745254061ac09d5f2/pandas-2.3.1-cp313-cp313-win_amd64.whl", hash = "sha256:1c78cf43c8fde236342a1cb2c34bcff89564a7bfed7e474ed2fffa6aed03a956", size = 10987482, upload-time = "2025-07-07T19:19:42.699Z" },
854
- { url = "https://files.pythonhosted.org/packages/48/64/2fd2e400073a1230e13b8cd604c9bc95d9e3b962e5d44088ead2e8f0cfec/pandas-2.3.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:8dfc17328e8da77be3cf9f47509e5637ba8f137148ed0e9b5241e1baf526e20a", size = 12029159, upload-time = "2025-07-07T19:19:26.362Z" },
855
- { url = "https://files.pythonhosted.org/packages/d8/0a/d84fd79b0293b7ef88c760d7dca69828d867c89b6d9bc52d6a27e4d87316/pandas-2.3.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ec6c851509364c59a5344458ab935e6451b31b818be467eb24b0fe89bd05b6b9", size = 11393287, upload-time = "2025-07-07T19:19:29.157Z" },
856
- { url = "https://files.pythonhosted.org/packages/50/ae/ff885d2b6e88f3c7520bb74ba319268b42f05d7e583b5dded9837da2723f/pandas-2.3.1-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:911580460fc4884d9b05254b38a6bfadddfcc6aaef856fb5859e7ca202e45275", size = 11309381, upload-time = "2025-07-07T19:19:31.436Z" },
857
- { url = "https://files.pythonhosted.org/packages/85/86/1fa345fc17caf5d7780d2699985c03dbe186c68fee00b526813939062bb0/pandas-2.3.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2f4d6feeba91744872a600e6edbbd5b033005b431d5ae8379abee5bcfa479fab", size = 11883998, upload-time = "2025-07-07T19:19:34.267Z" },
858
- { url = "https://files.pythonhosted.org/packages/81/aa/e58541a49b5e6310d89474333e994ee57fea97c8aaa8fc7f00b873059bbf/pandas-2.3.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:fe37e757f462d31a9cd7580236a82f353f5713a80e059a29753cf938c6775d96", size = 12704705, upload-time = "2025-07-07T19:19:36.856Z" },
859
- { url = "https://files.pythonhosted.org/packages/d5/f9/07086f5b0f2a19872554abeea7658200824f5835c58a106fa8f2ae96a46c/pandas-2.3.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:5db9637dbc24b631ff3707269ae4559bce4b7fd75c1c4d7e13f40edc42df4444", size = 13189044, upload-time = "2025-07-07T19:19:39.999Z" },
860
- ]
861
-
862
  [[package]]
863
  name = "pathspec"
864
  version = "0.12.1"
@@ -868,33 +793,6 @@ wheels = [
868
  { url = "https://files.pythonhosted.org/packages/cc/20/ff623b09d963f88bfde16306a54e12ee5ea43e9b597108672ff3a408aad6/pathspec-0.12.1-py3-none-any.whl", hash = "sha256:a0d503e138a4c123b27490a4f7beda6a01c6f288df0e4a8b79c7eb0dc7b4cc08", size = 31191, upload-time = "2023-12-10T22:30:43.14Z" },
869
  ]
870
 
871
- [[package]]
872
- name = "pdfminer-six"
873
- version = "20250506"
874
- source = { registry = "https://pypi.org/simple" }
875
- dependencies = [
876
- { name = "charset-normalizer" },
877
- { name = "cryptography" },
878
- ]
879
- sdist = { url = "https://files.pythonhosted.org/packages/78/46/5223d613ac4963e1f7c07b2660fe0e9e770102ec6bda8c038400113fb215/pdfminer_six-20250506.tar.gz", hash = "sha256:b03cc8df09cf3c7aba8246deae52e0bca7ebb112a38895b5e1d4f5dd2b8ca2e7", size = 7387678, upload-time = "2025-05-06T16:17:00.787Z" }
880
- wheels = [
881
- { url = "https://files.pythonhosted.org/packages/73/16/7a432c0101fa87457e75cb12c879e1749c5870a786525e2e0f42871d6462/pdfminer_six-20250506-py3-none-any.whl", hash = "sha256:d81ad173f62e5f841b53a8ba63af1a4a355933cfc0ffabd608e568b9193909e3", size = 5620187, upload-time = "2025-05-06T16:16:58.669Z" },
882
- ]
883
-
884
- [[package]]
885
- name = "pdfplumber"
886
- version = "0.11.7"
887
- source = { registry = "https://pypi.org/simple" }
888
- dependencies = [
889
- { name = "pdfminer-six" },
890
- { name = "pillow" },
891
- { name = "pypdfium2" },
892
- ]
893
- sdist = { url = "https://files.pythonhosted.org/packages/6d/0d/4135821aa7b1a0b77a29fac881ef0890b46b0b002290d04915ed7acc0043/pdfplumber-0.11.7.tar.gz", hash = "sha256:fa67773e5e599de1624255e9b75d1409297c5e1d7493b386ce63648637c67368", size = 115518, upload-time = "2025-06-12T11:30:49.864Z" }
894
- wheels = [
895
- { url = "https://files.pythonhosted.org/packages/db/e0/52b67d4f00e09e497aec4f71bc44d395605e8ebcea52543242ed34c25ef9/pdfplumber-0.11.7-py3-none-any.whl", hash = "sha256:edd2195cca68bd770da479cf528a737e362968ec2351e62a6c0b71ff612ac25e", size = 60029, upload-time = "2025-06-12T11:30:48.89Z" },
896
- ]
897
-
898
  [[package]]
899
  name = "pillow"
900
  version = "11.3.0"
@@ -1054,41 +952,6 @@ wheels = [
1054
  { url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
1055
  ]
1056
 
1057
- [[package]]
1058
- name = "pymupdf"
1059
- version = "1.26.3"
1060
- source = { registry = "https://pypi.org/simple" }
1061
- sdist = { url = "https://files.pythonhosted.org/packages/6d/d4/70a265e4bcd43e97480ae62da69396ef4507c8f9cfd179005ee731c92a04/pymupdf-1.26.3.tar.gz", hash = "sha256:b7d2c3ffa9870e1e4416d18862f5ccd356af5fe337b4511093bbbce2ca73b7e5", size = 75990308, upload-time = "2025-07-02T21:34:22.243Z" }
1062
- wheels = [
1063
- { url = "https://files.pythonhosted.org/packages/70/d3/c7af70545cd3097a869fd635bb6222108d3a0fb28c0b8254754a126c4cbb/pymupdf-1.26.3-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:ded891963944e5f13b03b88f6d9e982e816a4ec8689fe360876eef000c161f2b", size = 23057205, upload-time = "2025-07-02T21:26:16.326Z" },
1064
- { url = "https://files.pythonhosted.org/packages/04/3d/ec5b69bfeaa5deefa7141fc0b20d77bb20404507cf17196b4eb59f1f2977/pymupdf-1.26.3-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:436a33c738bb10eadf00395d18a6992b801ffb26521ee1f361ae786dd283327a", size = 22406630, upload-time = "2025-07-02T21:27:10.112Z" },
1065
- { url = "https://files.pythonhosted.org/packages/fc/20/661d3894bb05ad75ed6ca103ee2c3fa44d88a458b5c8d4a946b9c0f2569b/pymupdf-1.26.3-cp39-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:a2d7a3cd442f12f05103cb3bb1415111517f0a97162547a3720f3bbbc5e0b51c", size = 23450287, upload-time = "2025-07-03T07:22:19.317Z" },
1066
- { url = "https://files.pythonhosted.org/packages/9c/7f/21828f018e65b16a033731d21f7b46d93fa81c6e8257f769ca4a1c2a1cb0/pymupdf-1.26.3-cp39-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:454f38c8cf07eb333eb4646dca10517b6e90f57ce2daa2265a78064109d85555", size = 24057319, upload-time = "2025-07-02T21:28:26.697Z" },
1067
- { url = "https://files.pythonhosted.org/packages/71/5d/e8f88cd5a45b8f5fa6590ce8cef3ce0fad30eac6aac8aea12406f95bee7d/pymupdf-1.26.3-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:759b75d2f710ff4edf8d097d2e98f60e9ecef47632cead6f949b3412facdb9f0", size = 24261350, upload-time = "2025-07-02T21:29:21.733Z" },
1068
- { url = "https://files.pythonhosted.org/packages/82/22/ecc560e4f281b5dffafbf3a81f023d268b1746d028044f495115b74a2e70/pymupdf-1.26.3-cp39-abi3-win32.whl", hash = "sha256:a839ed44742faa1cd4956bb18068fe5aae435d67ce915e901318646c4e7bbea6", size = 17116371, upload-time = "2025-07-02T21:30:23.253Z" },
1069
- { url = "https://files.pythonhosted.org/packages/4a/26/8c72973b8833a72785cedc3981eb59b8ac7075942718bbb7b69b352cdde4/pymupdf-1.26.3-cp39-abi3-win_amd64.whl", hash = "sha256:b4cd5124d05737944636cf45fc37ce5824f10e707b0342efe109c7b6bd37a9cc", size = 18735124, upload-time = "2025-07-02T21:31:10.992Z" },
1070
- ]
1071
-
1072
- [[package]]
1073
- name = "pypdfium2"
1074
- version = "4.30.0"
1075
- source = { registry = "https://pypi.org/simple" }
1076
- sdist = { url = "https://files.pythonhosted.org/packages/a1/14/838b3ba247a0ba92e4df5d23f2bea9478edcfd72b78a39d6ca36ccd84ad2/pypdfium2-4.30.0.tar.gz", hash = "sha256:48b5b7e5566665bc1015b9d69c1ebabe21f6aee468b509531c3c8318eeee2e16", size = 140239, upload-time = "2024-05-09T18:33:17.552Z" }
1077
- wheels = [
1078
- { url = "https://files.pythonhosted.org/packages/c7/9a/c8ff5cc352c1b60b0b97642ae734f51edbab6e28b45b4fcdfe5306ee3c83/pypdfium2-4.30.0-py3-none-macosx_10_13_x86_64.whl", hash = "sha256:b33ceded0b6ff5b2b93bc1fe0ad4b71aa6b7e7bd5875f1ca0cdfb6ba6ac01aab", size = 2837254, upload-time = "2024-05-09T18:32:48.653Z" },
1079
- { url = "https://files.pythonhosted.org/packages/21/8b/27d4d5409f3c76b985f4ee4afe147b606594411e15ac4dc1c3363c9a9810/pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:4e55689f4b06e2d2406203e771f78789bd4f190731b5d57383d05cf611d829de", size = 2707624, upload-time = "2024-05-09T18:32:51.458Z" },
1080
- { url = "https://files.pythonhosted.org/packages/11/63/28a73ca17c24b41a205d658e177d68e198d7dde65a8c99c821d231b6ee3d/pypdfium2-4.30.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e6e50f5ce7f65a40a33d7c9edc39f23140c57e37144c2d6d9e9262a2a854854", size = 2793126, upload-time = "2024-05-09T18:32:53.581Z" },
1081
- { url = "https://files.pythonhosted.org/packages/d1/96/53b3ebf0955edbd02ac6da16a818ecc65c939e98fdeb4e0958362bd385c8/pypdfium2-4.30.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3d0dd3ecaffd0b6dbda3da663220e705cb563918249bda26058c6036752ba3a2", size = 2591077, upload-time = "2024-05-09T18:32:55.99Z" },
1082
- { url = "https://files.pythonhosted.org/packages/ec/ee/0394e56e7cab8b5b21f744d988400948ef71a9a892cbeb0b200d324ab2c7/pypdfium2-4.30.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cc3bf29b0db8c76cdfaac1ec1cde8edf211a7de7390fbf8934ad2aa9b4d6dfad", size = 2864431, upload-time = "2024-05-09T18:32:57.911Z" },
1083
- { url = "https://files.pythonhosted.org/packages/65/cd/3f1edf20a0ef4a212a5e20a5900e64942c5a374473671ac0780eaa08ea80/pypdfium2-4.30.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f1f78d2189e0ddf9ac2b7a9b9bd4f0c66f54d1389ff6c17e9fd9dc034d06eb3f", size = 2812008, upload-time = "2024-05-09T18:32:59.886Z" },
1084
- { url = "https://files.pythonhosted.org/packages/c8/91/2d517db61845698f41a2a974de90762e50faeb529201c6b3574935969045/pypdfium2-4.30.0-py3-none-musllinux_1_1_aarch64.whl", hash = "sha256:5eda3641a2da7a7a0b2f4dbd71d706401a656fea521b6b6faa0675b15d31a163", size = 6181543, upload-time = "2024-05-09T18:33:02.597Z" },
1085
- { url = "https://files.pythonhosted.org/packages/ba/c4/ed1315143a7a84b2c7616569dfb472473968d628f17c231c39e29ae9d780/pypdfium2-4.30.0-py3-none-musllinux_1_1_i686.whl", hash = "sha256:0dfa61421b5eb68e1188b0b2231e7ba35735aef2d867d86e48ee6cab6975195e", size = 6175911, upload-time = "2024-05-09T18:33:05.376Z" },
1086
- { url = "https://files.pythonhosted.org/packages/7a/c4/9e62d03f414e0e3051c56d5943c3bf42aa9608ede4e19dc96438364e9e03/pypdfium2-4.30.0-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:f33bd79e7a09d5f7acca3b0b69ff6c8a488869a7fab48fdf400fec6e20b9c8be", size = 6267430, upload-time = "2024-05-09T18:33:08.067Z" },
1087
- { url = "https://files.pythonhosted.org/packages/90/47/eda4904f715fb98561e34012826e883816945934a851745570521ec89520/pypdfium2-4.30.0-py3-none-win32.whl", hash = "sha256:ee2410f15d576d976c2ab2558c93d392a25fb9f6635e8dd0a8a3a5241b275e0e", size = 2775951, upload-time = "2024-05-09T18:33:10.567Z" },
1088
- { url = "https://files.pythonhosted.org/packages/25/bd/56d9ec6b9f0fc4e0d95288759f3179f0fcd34b1a1526b75673d2f6d5196f/pypdfium2-4.30.0-py3-none-win_amd64.whl", hash = "sha256:90dbb2ac07be53219f56be09961eb95cf2473f834d01a42d901d13ccfad64b4c", size = 2892098, upload-time = "2024-05-09T18:33:13.107Z" },
1089
- { url = "https://files.pythonhosted.org/packages/be/7a/097801205b991bc3115e8af1edb850d30aeaf0118520b016354cf5ccd3f6/pypdfium2-4.30.0-py3-none-win_arm64.whl", hash = "sha256:119b2969a6d6b1e8d55e99caaf05290294f2d0fe49c12a3f17102d01c441bd29", size = 2752118, upload-time = "2024-05-09T18:33:15.489Z" },
1090
- ]
1091
-
1092
  [[package]]
1093
  name = "pytest"
1094
  version = "8.4.1"
@@ -1105,27 +968,6 @@ wheels = [
1105
  { url = "https://files.pythonhosted.org/packages/29/16/c8a903f4c4dffe7a12843191437d7cd8e32751d5de349d45d3fe69544e87/pytest-8.4.1-py3-none-any.whl", hash = "sha256:539c70ba6fcead8e78eebbf1115e8b589e7565830d7d006a8723f19ac8a0afb7", size = 365474, upload-time = "2025-06-18T05:48:03.955Z" },
1106
  ]
1107
 
1108
- [[package]]
1109
- name = "python-dateutil"
1110
- version = "2.9.0.post0"
1111
- source = { registry = "https://pypi.org/simple" }
1112
- dependencies = [
1113
- { name = "six" },
1114
- ]
1115
- sdist = { url = "https://files.pythonhosted.org/packages/66/c0/0c8b6ad9f17a802ee498c46e004a0eb49bc148f2fd230864601a86dcf6db/python-dateutil-2.9.0.post0.tar.gz", hash = "sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3", size = 342432, upload-time = "2024-03-01T18:36:20.211Z" }
1116
- wheels = [
1117
- { url = "https://files.pythonhosted.org/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427", size = 229892, upload-time = "2024-03-01T18:36:18.57Z" },
1118
- ]
1119
-
1120
- [[package]]
1121
- name = "pytz"
1122
- version = "2025.2"
1123
- source = { registry = "https://pypi.org/simple" }
1124
- sdist = { url = "https://files.pythonhosted.org/packages/f8/bf/abbd3cdfb8fbc7fb3d4d38d320f2441b1e7cbe29be4f23797b4a2b5d8aac/pytz-2025.2.tar.gz", hash = "sha256:360b9e3dbb49a209c21ad61809c7fb453643e048b38924c765813546746e81c3", size = 320884, upload-time = "2025-03-25T02:25:00.538Z" }
1125
- wheels = [
1126
- { url = "https://files.pythonhosted.org/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl", hash = "sha256:5ddf76296dd8c44c26eb8f4b6f35488f3ccbf6fbbd7adee0b7262d43f0ec2f00", size = 509225, upload-time = "2025-03-25T02:24:58.468Z" },
1127
- ]
1128
-
1129
  [[package]]
1130
  name = "pyyaml"
1131
  version = "6.0.2"
@@ -1389,15 +1231,6 @@ wheels = [
1389
  { url = "https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922", size = 1201486, upload-time = "2025-05-27T00:56:49.664Z" },
1390
  ]
1391
 
1392
- [[package]]
1393
- name = "six"
1394
- version = "1.17.0"
1395
- source = { registry = "https://pypi.org/simple" }
1396
- sdist = { url = "https://files.pythonhosted.org/packages/94/e7/b2c673351809dca68a0e064b6af791aa332cf192da575fd474ed7d6f16a2/six-1.17.0.tar.gz", hash = "sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81", size = 34031, upload-time = "2024-12-04T17:35:28.174Z" }
1397
- wheels = [
1398
- { url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
1399
- ]
1400
-
1401
  [[package]]
1402
  name = "sniffio"
1403
  version = "1.3.1"
@@ -1576,15 +1409,6 @@ wheels = [
1576
  { url = "https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl", hash = "sha256:389055682238f53b04f7badcb49b989835495a96700ced5dab2d8feae4b26f51", size = 14552, upload-time = "2025-05-21T18:55:22.152Z" },
1577
  ]
1578
 
1579
- [[package]]
1580
- name = "tzdata"
1581
- version = "2025.2"
1582
- source = { registry = "https://pypi.org/simple" }
1583
- sdist = { url = "https://files.pythonhosted.org/packages/95/32/1a225d6164441be760d75c2c42e2780dc0873fe382da3e98a2e1e48361e5/tzdata-2025.2.tar.gz", hash = "sha256:b60a638fcc0daffadf82fe0f57e53d06bdec2f36c4df66280ae79bce6bd6f2b9", size = 196380, upload-time = "2025-03-23T13:54:43.652Z" }
1584
- wheels = [
1585
- { url = "https://files.pythonhosted.org/packages/5c/23/c7abc0ca0a1526a0774eca151daeb8de62ec457e77262b66b359c3c7679e/tzdata-2025.2-py2.py3-none-any.whl", hash = "sha256:1a403fada01ff9221ca8044d701868fa132215d84beb92242d9acd2147f667a8", size = 347839, upload-time = "2025-03-23T13:54:41.845Z" },
1586
- ]
1587
-
1588
  [[package]]
1589
  name = "urllib3"
1590
  version = "2.5.0"
 
147
  { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
148
  ]
149
 
150
  [[package]]
151
  name = "faiss-cpu"
152
  version = "1.11.0.post1"
 
191
  dependencies = [
192
  { name = "faiss-cpu" },
193
  { name = "langchain" },
194
  { name = "sentence-transformers" },
195
  { name = "setuptools" },
196
  { name = "torch" },
 
211
  { name = "faiss-cpu" },
212
  { name = "langchain" },
213
  { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.16.1" },
214
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.4.1" },
215
  { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.12.2" },
216
  { name = "sentence-transformers" },
 
784
  { url = "https://files.pythonhosted.org/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl", hash = "sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484", size = 66469, upload-time = "2025-04-19T11:48:57.875Z" },
785
  ]
786
 
787
  [[package]]
788
  name = "pathspec"
789
  version = "0.12.1"
 
793
  { url = "https://files.pythonhosted.org/packages/cc/20/ff623b09d963f88bfde16306a54e12ee5ea43e9b597108672ff3a408aad6/pathspec-0.12.1-py3-none-any.whl", hash = "sha256:a0d503e138a4c123b27490a4f7beda6a01c6f288df0e4a8b79c7eb0dc7b4cc08", size = 31191, upload-time = "2023-12-10T22:30:43.14Z" },
794
  ]
795
 
796
  [[package]]
797
  name = "pillow"
798
  version = "11.3.0"
 
952
  { url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
953
  ]
954
 
955
  [[package]]
956
  name = "pytest"
957
  version = "8.4.1"
 
968
  { url = "https://files.pythonhosted.org/packages/29/16/c8a903f4c4dffe7a12843191437d7cd8e32751d5de349d45d3fe69544e87/pytest-8.4.1-py3-none-any.whl", hash = "sha256:539c70ba6fcead8e78eebbf1115e8b589e7565830d7d006a8723f19ac8a0afb7", size = 365474, upload-time = "2025-06-18T05:48:03.955Z" },
969
  ]
970
 
971
  [[package]]
972
  name = "pyyaml"
973
  version = "6.0.2"
 
1231
  { url = "https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922", size = 1201486, upload-time = "2025-05-27T00:56:49.664Z" },
1232
  ]
1233
 
1234
  [[package]]
1235
  name = "sniffio"
1236
  version = "1.3.1"
 
1409
  { url = "https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl", hash = "sha256:389055682238f53b04f7badcb49b989835495a96700ced5dab2d8feae4b26f51", size = 14552, upload-time = "2025-05-21T18:55:22.152Z" },
1410
  ]
1411
 
1412
  [[package]]
1413
  name = "urllib3"
1414
  version = "2.5.0"