chuckfinca committed on
Commit 79336f1 · 1 Parent(s): b0ef2dc

docs(readme): Create comprehensive project README


This commit introduces a complete, professional README.md to serve as the primary documentation for the FOT Intervention Recommender PoC.

The new README is designed to align with the final project state and the requirements of the performance task brief. It provides a clear and comprehensive overview for both technical and non-technical audiences.

Key additions include:
- A clear project goal and value proposition.
- A link to the live Gradio demo on Hugging Face Spaces.
- Professional instructions for requesting an access key via GitHub Issues.
- A detailed breakdown of the final RAG architecture and strategic decisions.
- Step-by-step local setup and execution instructions using `uv`.
- A clear guide for running development tools (pytest, black, ruff).
- A project structure diagram for easy navigation.

Files changed (1):
  README.md +117 -37
README.md CHANGED
@@ -13,71 +13,151 @@ short_description: POC - Freshman On-Track RAG Intervention Recommender
  ---

- # Demo Application
-
- A Python project for coding exercises and development.
-
- ## Setup for Development
-
- This project has dependencies with specific hardware requirements (e.g., PyTorch). To ensure a smooth setup on any machine, follow this two-step process.

  1. **Create the virtual environment:**
     ```bash
     uv venv
     ```

  2. **Install PyTorch Separately:**
- This command lets PyTorch's installer find the correct version for your specific hardware (Intel Mac, Apple Silicon, Windows, Linux, etc.).
     ```bash
     uv pip install torch --index-url https://download.pytorch.org/whl/cpu
     ```
- *Note: We are explicitly using the CPU-only version of PyTorch, which is perfect for this project and avoids complex CUDA dependencies.*

  3. **Install the Project:**
- Now that the difficult dependency is handled, install our application and its other development tools.
     ```bash
     uv pip install -e ".[dev]"
     ```

- This command will now see that a compatible version of `torch` is already installed and will proceed without errors.

-
- ## Running the Application
-
- After setup, run the application using its console script entry point. This is the standard way to run the app and avoids any warnings.

  ```bash
  uv run fot-recommender
  ```

- ## Development Tools

- After setting up for development, you can use the following tools.

- **Run Tests:**
- ```bash
- uv run pytest
- ```

- **Format Code:**
- ```bash
- uv run black .
- ```
-
- **Lint Code:**
- ```bash
- uv run ruff check .
- ```
-
- **Type Checking:**
- ```bash
- uv run mypy src/
- ```

- ## Project Structure

  ```
- src/demo_application/ # Main package
- tests/ # Test files
- pyproject.toml # Project configuration
- ```
  ---

+ # Freshman On-Track (FOT) Intervention Recommender
+
+ [![Python Version](https://img.shields.io/badge/Python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)
+
+ This repository contains the proof-of-concept for the Freshman On-Track (FOT) Intervention Recommender, an AI-powered tool designed to empower educators.
+
+ ## 🚀 Live Demo
+
+ The full application is deployed as an interactive web API on Hugging Face Spaces.
+
+ **[👉 Click Here to Launch the Live FOT Recommender API](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)**
+
+ **Note on Access:** The public demo is protected by an access key. If you would like to try the live application, please **[open a GitHub issue in this repository](https://github.com/chuckfinca/fot-intervention-recommender/issues/new)** to request access, and I will be happy to provide a key.
+
+ ## 1. Project Goal
+
+ Freshman year performance is the strongest predictor of high school graduation. However, educators often lack systematic tools to match at-risk 9th graders with the specific, evidence-based interventions they need.
+
+ This project addresses that gap by providing a **Retrieval-Augmented Generation (RAG)** system that transforms a simple narrative about a student's challenges into a set of clear, actionable, and evidence-based recommendations. It turns scattered educational research into targeted guidance, enabling educators to support their students more effectively.
+
+ ## 2. Features
+
+ * **Advanced RAG Architecture**: Utilizes a sophisticated pipeline to ensure recommendations are relevant and grounded in evidence.
+   * **Retrieval**: Employs a `FAISS` vector database and the `all-MiniLM-L6-v2` sentence-transformer model to perform semantic search over the knowledge base.
+   * **Generation**: Uses Google's `gemini-1.5-flash-latest` model to synthesize the retrieved evidence into a coherent, actionable plan.
+ * **Persona-Based Recommendations**: Delivers tailored advice for different audiences, fulfilling a key project bonus goal. The system can generate distinct outputs for a **teacher**, **parent**, or **principal**.
+ * **Evidence-Backed**: Every recommendation is based on a curated knowledge base of best-practice documents from reputable sources like the Network for College Success, the Institute of Education Sciences, and Attendance Works.
+ * **Interactive Web Application**: A user-friendly Gradio UI allows for easy interaction, example scenarios, and a secure access key system for the demo.
+ * **Full Transparency**: The "Evidence Base" section in the output shows the exact source documents, page numbers, and content snippets used to generate the recommendation, along with a relevance score for each.
+
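The persona-based recommendations described above amount to swapping the framing of the prompt per audience. A minimal sketch of that idea follows; the dictionary entries, the `build_prompt` helper, and the chunk fields (`source`, `page`, `text`) are hypothetical illustrations, not the project's actual `prompts.py` API:

```python
# Hypothetical sketch of persona-tailored prompt assembly. The framing
# strings and chunk schema are illustrative stand-ins, not project code.
PERSONA_FRAMING = {
    "teacher": "classroom strategies the teacher can apply directly",
    "parent": "plain-language steps a parent can take at home",
    "principal": "school-level policies and resource allocation",
}

def build_prompt(narrative: str, chunks: list[dict], persona: str) -> str:
    """Combine the student narrative, retrieved evidence, and persona framing."""
    evidence = "\n\n".join(
        f"[{c['source']}, p. {c['page']}] {c['text']}" for c in chunks
    )
    return (
        f"You are advising a {persona}. Focus on {PERSONA_FRAMING[persona]}.\n\n"
        f"Student narrative:\n{narrative}\n\n"
        f"Evidence:\n{evidence}\n\n"
        "Recommend specific, evidence-based interventions."
    )
```

Because only the framing text changes, the same retrieved evidence can produce three distinct outputs from one pipeline run.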
+ ## 3. System Architecture
+
+ The project follows a modern RAG architecture designed for quality and scalability.
+
+ 1. **Knowledge Base Curation**: A strategic decision was made to manually curate a high-quality `knowledge_base_raw.json` file from the source documents. For this proof-of-concept, this approach ensured maximum quality for the RAG pipeline, bypassing the complexities of programmatic PDF extraction.
+ 2. **Data Preprocessing**: A `build_knowledge_base.py` script processes the raw JSON. It uses a semantic chunking strategy to group related concepts, creating a final `knowledge_base_final_chunks.json` file.
+ 3. **Vector Indexing**: During the build process, the pre-processed chunks are encoded into vector embeddings and stored in a `faiss_index.bin` file for efficient similarity search.
+ 4. **RAG Pipeline (At Runtime)**:
+    * The user enters a student narrative into the Gradio app.
+    * The narrative is converted into a vector embedding.
+    * FAISS performs a similarity search on the vector index to retrieve the most relevant intervention chunks.
+    * The retrieved chunks and the original narrative are formatted into a detailed prompt, tailored to the selected persona (teacher, parent, or principal).
+    * The prompt is sent to the Gemini API, which generates a synthesized recommendation.
+    * The final recommendation and its evidence base are formatted and displayed to the user.
+
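The embed-and-search step at the heart of the runtime pipeline can be sketched in miniature. This is a toy illustration only: a bag-of-characters `embed()` stands in for the `all-MiniLM-L6-v2` encoder, and brute-force cosine similarity stands in for the FAISS index; the real pipeline encodes with sentence-transformers and searches `faiss_index.bin`.

```python
import math

def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence-transformer: bag-of-characters vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Rank knowledge-base chunks by similarity to the query embedding."""
    q = embed(query)
    scored = [(c, cosine(q, embed(c))) for c in chunks]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```

The production version differs mainly in scale and quality of the embedding: FAISS precomputes the chunk vectors once at build time, so runtime cost is a single query encoding plus an index lookup.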
+ ## 4. How to Run Locally
+
+ This project uses `uv` for fast and reliable dependency management.
+
+ ### Prerequisites
+
+ 1. **Python >= 3.12**
+ 2. **`uv` installed**:
+    ```bash
+    pip install uv
+    ```
+ 3. **Environment Variables**: You must create a `.env` file in the project's root directory. The application loads secrets from this file.
+    ```
+    # .env
+    FOT_GOOGLE_API_KEY="your_google_api_key_here"
+    DEMO_PASSWORD="your_local_password" # Sets the password for your local instance of the Gradio app.
+    ```
+
+ ### Setup
+
+ Follow this three-step process to ensure hardware-specific dependencies like PyTorch are installed correctly.

  1. **Create the virtual environment:**
     ```bash
     uv venv
     ```
+    *Activate the environment:*
+    * macOS/Linux: `source .venv/bin/activate`
+    * Windows: `.venv\Scripts\activate`

  2. **Install PyTorch Separately:**
+ This command lets `uv` find the correct PyTorch version for your specific hardware (Intel Mac, Apple Silicon, Windows, Linux, etc.).
     ```bash
     uv pip install torch --index-url https://download.pytorch.org/whl/cpu
     ```
+ *Note: We explicitly use the CPU-only version of PyTorch, which is sufficient for this project and avoids complex CUDA dependencies.*

  3. **Install the Project:**
+ Now that the difficult dependency is handled, install the application and its development tools.
     ```bash
     uv pip install -e ".[dev]"
     ```

+ ### Running the Application

+ After setup, run the Gradio web application using its console script entry point.

  ```bash
  uv run fot-recommender
  ```

+ This will launch the interactive Gradio app, which you can access in your browser.

+ ## 5. Development

+ The project is configured with a suite of standard development tools for maintaining code quality.

+ * **Run Tests:**
+   ```bash
+   uv run pytest
+   ```
+ * **Format Code:**
+   ```bash
+   uv run black .
+   ```
+ * **Lint Code:**
+   ```bash
+   uv run ruff check .
+   ```
+ * **Type Checking:**
+   ```bash
+   uv run mypy src/
+   ```

+ ## 6. Project Structure

  ```
+ .
+ ├── app.py                      # Gradio UI and web API entry point
+ ├── data/
+ │   ├── processed/              # Processed data artifacts
+ │   │   ├── citations.json
+ │   │   ├── faiss_index.bin
+ │   │   ├── knowledge_base_final_chunks.json
+ │   │   └── knowledge_base_raw.json
+ │   └── source_pdfs/            # Original source documents
+ ├── docs/                       # Project planning documents
+ ├── notebooks/                  # Proof-of-concept notebook
+ ├── pyproject.toml              # Project configuration and dependencies
+ ├── README.md                   # This file
+ ├── scripts/
+ │   └── build_knowledge_base.py # Script to build data artifacts
+ ├── src/
+ │   └── fot_recommender/        # Main Python package
+ │       ├── __init__.py
+ │       ├── config.py           # Configuration and environment variables
+ │       ├── main.py             # Main application logic
+ │       ├── prompts.py          # Prompts for the generative model
+ │       ├── rag_pipeline.py     # Core RAG logic
+ │       └── semantic_chunker.py # Logic for chunking source data
+ └── tests/                      # Unit and integration tests
+ ```