---
title: Fot Recommender Api
emoji: 
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.41.0
python_version: "3.12"
app_file: app.py
pinned: false
license: mit
short_description: POC - Freshman On-Track RAG Intervention Recommender
---


# Freshman On-Track (FOT) Intervention Recommender

[![Python Version](https://img.shields.io/badge/Python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)

This repository contains the proof-of-concept for the Freshman On-Track (FOT) Intervention Recommender, an AI-powered tool that helps educators match at-risk 9th graders with evidence-based interventions.

## 🚀 Live Demo

The full application is deployed as an interactive web API on Hugging Face Spaces.

**[👉 Click Here to Launch the Live FOT Recommender API](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)**

**Note on Access:** The public demo is protected by an access key. If you would like to try the live application, please **[open a GitHub issue in this repository](https://github.com/chuckfinca/fot-intervention-recommender/issues/new)** to request access, and I will be happy to provide a key.

## 1. Project Goal

Freshman year performance is the strongest predictor of high school graduation. However, educators often lack systematic tools to match at-risk 9th graders with the specific, evidence-based interventions they need.

This project addresses that gap by providing a **Retrieval-Augmented Generation (RAG)** system that transforms a simple narrative about a student's challenges into a set of clear, actionable, and evidence-based recommendations. It turns scattered educational research into targeted guidance, enabling educators to support their students more effectively.

## 2. Features

*   **Advanced RAG Architecture**: Combines semantic retrieval with LLM generation so that recommendations are relevant to the student's situation and grounded in the curated knowledge base.
    *   **Retrieval**: Employs a `FAISS` vector database and the `all-MiniLM-L6-v2` sentence-transformer model to perform semantic search over the knowledge base.
    *   **Generation**: Uses Google's `gemini-1.5-flash-latest` model to synthesize the retrieved evidence into a coherent, actionable plan.
*   **Persona-Based Recommendations**: Delivers tailored advice for different audiences, fulfilling a key project bonus goal. The system can generate distinct outputs for a **teacher**, **parent**, or **principal**.
*   **Evidence-Backed**: Every recommendation is based on a curated knowledge base of best-practice documents from reputable sources like the Network for College Success, the Institute of Education Sciences, and Attendance Works.
*   **Interactive Web Application**: A user-friendly Gradio UI allows for easy interaction, example scenarios, and a secure access key system for the demo.
*   **Full Transparency**: The "Evidence Base" section in the output shows the exact source documents, page numbers, and content snippets used to generate the recommendation, along with a relevance score for each.
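
To illustrate the persona feature, here is one way a prompt could be tailored to each audience. This is a hypothetical sketch, not the contents of `prompts.py`: the `build_prompt` function, the `PERSONA_INSTRUCTIONS` wording, and the sample narrative and evidence text are all assumptions made for illustration.

```python
from typing import Literal

Persona = Literal["teacher", "parent", "principal"]

# Hypothetical per-persona framing (an assumption; the real prompts live in prompts.py).
PERSONA_INSTRUCTIONS: dict[str, str] = {
    "teacher": "Recommend classroom-level actions a 9th-grade teacher can take this week.",
    "parent": "Explain the situation in plain language and suggest ways to help at home.",
    "principal": "Recommend school-level systems, staffing, and monitoring steps.",
}


def build_prompt(narrative: str, evidence: list[str], persona: Persona) -> str:
    """Combine the student narrative and retrieved evidence into one prompt."""
    if persona not in PERSONA_INSTRUCTIONS:
        raise ValueError(f"Unknown persona: {persona!r}")
    # Number each evidence chunk so the model can cite it in its answer.
    evidence_block = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(evidence, start=1)
    )
    return (
        f"Audience: {persona}\n"
        f"{PERSONA_INSTRUCTIONS[persona]}\n\n"
        f"Student narrative:\n{narrative}\n\n"
        f"Evidence (cite by number):\n{evidence_block}\n"
    )


# Illustrative inputs only.
prompt = build_prompt(
    "Maria has missed 6 days this quarter and is failing Algebra I.",
    ["Early warning indicators include attendance and course failures."],
    "parent",
)
print(prompt)
```

Keeping the persona-specific wording in a lookup table means the retrieval step stays identical for every audience; only the instructions sent to the model change.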

## 3. System Architecture

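The project follows a standard RAG architecture: an offline build step prepares the knowledge base and vector index, and a runtime pipeline retrieves evidence and generates recommendations.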

1.  **Knowledge Base Curation**: A strategic decision was made to manually curate a high-quality `knowledge_base_raw.json` file from the source documents. For this proof-of-concept, this approach ensured maximum quality for the RAG pipeline, bypassing the complexities of programmatic PDF extraction.
2.  **Data Preprocessing**: A `build_knowledge_base.py` script processes the raw JSON. It uses a semantic chunking strategy to group related concepts, creating a final `knowledge_base_final_chunks.json` file.
3.  **Vector Indexing**: During the build process, the pre-processed chunks are encoded into vector embeddings and stored in a `faiss_index.bin` file for efficient similarity search.
4.  **RAG Pipeline (At Runtime)**:
    *   The user enters a student narrative into the Gradio app.
    *   The narrative is converted into a vector embedding.
    *   FAISS performs a similarity search on the vector index to retrieve the most relevant intervention chunks.
    *   The retrieved chunks and the original narrative are formatted into a detailed prompt, tailored to the selected persona (teacher, parent, or principal).
    *   The prompt is sent to the Gemini API, which generates a synthesized recommendation.
    *   The final recommendation and its evidence base are formatted and displayed to the user.
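
The runtime retrieval step can be sketched without any external dependencies. This is a simplified stand-in, not the project's code: a toy bag-of-words vector replaces the `all-MiniLM-L6-v2` embedding, and a brute-force cosine-similarity scan replaces the FAISS index search. The function names `embed`, `cosine`, and `retrieve` and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter[str]:
    """Toy bag-of-words vector; a stand-in for all-MiniLM-L6-v2 embeddings."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Brute-force top-k search; FAISS plays this role in the real pipeline."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    # Highest similarity first; the score doubles as a relevance score.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustrative knowledge-base chunks (assumptions, not taken from the real data).
chunks = [
    "Chronic absence: students missing 10 percent of school days need attendance outreach.",
    "Course failure: ninth graders failing a core class benefit from targeted tutoring.",
    "Family engagement: regular positive phone calls home build trust.",
]
results = retrieve("She keeps missing school days and her attendance is slipping.", chunks)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

In the real pipeline, the chunk embeddings are computed once at build time and stored in `faiss_index.bin`, so only the narrative needs to be encoded per request.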

## 4. How to Run Locally

This project uses `uv` for fast and reliable dependency management.

### Prerequisites

1.  **Python >= 3.12**
2.  **`uv` installed**:
    ```bash
    pip install uv
    ```
3.  **Environment Variables**: You must create a `.env` file in the project's root directory. The application loads secrets from this file.
    ```
    # .env
    FOT_GOOGLE_API_KEY="your_google_api_key_here"
    DEMO_PASSWORD="your_local_password" # Sets the password for your local instance of the Gradio app.
    ```

### Setup

Follow this three-step process, which installs PyTorch on its own before the rest of the project so that its hardware-specific build is resolved correctly.

1.  **Create the virtual environment:**
    ```bash
    uv venv
    ```
    *Activate the environment:*
    *   macOS/Linux: `source .venv/bin/activate`
    *   Windows: `.venv\Scripts\activate`

2.  **Install PyTorch Separately:**
    This command installs PyTorch from PyTorch's CPU-only package index, so `uv` installs a CPU build rather than a CUDA-enabled one.
    ```bash
    uv pip install torch --index-url https://download.pytorch.org/whl/cpu
    ```
    *Note: The CPU-only build is sufficient for this project, which uses PyTorch only to compute sentence embeddings, and it avoids downloading the much larger CUDA builds.*

3.  **Install the Project:**
    With PyTorch in place, install the application in editable mode along with its development tools.
    ```bash
    uv pip install -e ".[dev]"
    ```

### Running the Application

After setup, run the Gradio web application using its console script entry point.

```bash
uv run fot-recommender
```

This launches the interactive Gradio app. Open the local URL printed in the terminal (Gradio's default is `http://127.0.0.1:7860`) in your browser.

## 5. Development

The project is configured with a suite of standard development tools for maintaining code quality.

*   **Run Tests:**
    ```bash
    uv run pytest
    ```
*   **Format Code:**
    ```bash
    uv run black .
    ```
*   **Lint Code:**
    ```bash
    uv run ruff check .
    ```
*   **Type Checking:**
    ```bash
    uv run mypy src/
    ```

## 6. Project Structure

```
.
├── app.py                  # Gradio UI and web API entry point
├── data/
│   ├── processed/          # Processed data artifacts
│   │   ├── citations.json
│   │   ├── faiss_index.bin
│   │   ├── knowledge_base_final_chunks.json
│   │   └── knowledge_base_raw.json
│   └── source_pdfs/        # Original source documents
├── docs/                   # Project planning documents
├── notebooks/              # Proof-of-concept notebook
├── pyproject.toml          # Project configuration and dependencies
├── README.md               # This file
├── scripts/
│   └── build_knowledge_base.py # Script to build data artifacts
├── src/
│   └── fot_recommender/    # Main Python package
│       ├── __init__.py
│       ├── config.py       # Configuration and environment variables
│       ├── main.py         # Main application logic
│       ├── prompts.py      # Prompts for the generative model
│       ├── rag_pipeline.py # Core RAG logic
│       └── semantic_chunker.py # Logic for chunking source data
└── tests/                  # Unit and integration tests