---
title: Fot Recommender Api
emoji: ⚡
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.41.0
python_version: "3.12"
app_file: app.py
pinned: false
license: mit
short_description: POC - Freshman On-Track RAG Intervention Recommender
---


# Freshman On-Track (FOT) Intervention Recommender

[![Python Version](https://img.shields.io/badge/Python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)

This repository contains the proof of concept for the Freshman On-Track (FOT) Intervention Recommender. It is an AI-powered tool that helps educators match struggling 9th graders with evidence-based interventions.

## 🚀 Live Demo

The full application is deployed as an interactive web API on Hugging Face Spaces.

**[👉 Click Here to Launch the Live FOT Recommender API](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)**

**Note on Access:** The public demo is protected by an access key. If you would like to try the live application, please **[open a GitHub issue in this repository](https://github.com/chuckfinca/fot-intervention-recommender/issues/new)** to request access, and I will be happy to provide a key.

## 1. Project Goal

Freshman year performance is the strongest predictor of high school graduation. However, educators often lack systematic tools to match at-risk 9th graders with the specific, evidence-based interventions they need.

This project addresses that gap by providing a **Retrieval-Augmented Generation (RAG)** system that transforms a simple narrative about a student's challenges into a set of clear, actionable, and evidence-based recommendations. It turns scattered educational research into targeted guidance, enabling educators to support their students more effectively.

## 2. Features

*   **Advanced RAG Architecture**: Combines semantic retrieval with LLM generation so that recommendations stay relevant and grounded in evidence.
    *   **Retrieval**: Employs a `FAISS` vector database and the `all-MiniLM-L6-v2` sentence-transformer model to perform semantic search over the knowledge base.
    *   **Generation**: Uses Google's `gemini-1.5-flash-latest` model to synthesize the retrieved evidence into a coherent, actionable plan.
*   **Persona-Based Recommendations**: Delivers tailored advice for different audiences, fulfilling a key project bonus goal. The system can generate distinct outputs for a **teacher**, **parent**, or **principal**.
*   **Evidence-Backed**: Every recommendation is based on a curated knowledge base of best-practice documents from reputable sources like the Network for College Success, the Institute of Education Sciences, and Attendance Works.
*   **Interactive Web Application**: A user-friendly Gradio UI offers built-in example scenarios, plus an access-key system that protects the public demo.
*   **Full Transparency**: The "Evidence Base" section in the output shows the exact source documents, page numbers, and content snippets used to generate the recommendation, along with a relevance score for each.
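
The persona and transparency features above can be illustrated with a short sketch. It assumes that each retrieved chunk carries a source title, page number, snippet, and relevance score. The `EvidenceChunk` class, `build_prompt`, `format_evidence_base`, and the persona wording are all hypothetical. The project's real prompts live in `src/fot_recommender/prompts.py` and are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical persona guidance; not the wording used in prompts.py.
PERSONA_INSTRUCTIONS = {
    "teacher": "Focus on classroom strategies the teacher can start this week.",
    "parent": "Use plain language and suggest supportive steps at home.",
    "principal": "Focus on school-wide systems, staffing, and monitoring.",
}


@dataclass
class EvidenceChunk:
    """One retrieved snippet plus the metadata shown in the Evidence Base."""

    source: str   # title of the source document
    page: int     # page number in the source document
    text: str     # content snippet
    score: float  # relevance score from the similarity search


def build_prompt(narrative: str, persona: str, chunks: list[EvidenceChunk]) -> str:
    """Combine the student narrative, persona guidance, and numbered evidence."""
    if persona not in PERSONA_INSTRUCTIONS:
        raise ValueError(f"Unknown persona: {persona!r}")
    evidence = "\n".join(
        f"[{i}] {c.source} (p. {c.page}): {c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        f"You are advising a {persona}. {PERSONA_INSTRUCTIONS[persona]}\n\n"
        f"Student narrative:\n{narrative}\n\n"
        f"Evidence (cite by number):\n{evidence}\n\n"
        "Recommend interventions grounded only in the evidence above."
    )


def format_evidence_base(chunks: list[EvidenceChunk]) -> str:
    """Render an Evidence Base section as Markdown, highest relevance first."""
    lines = ["### Evidence Base"]
    for c in sorted(chunks, key=lambda c: c.score, reverse=True):
        lines.append(f"- **{c.source}**, p. {c.page} (relevance: {c.score:.2f})")
        lines.append(f"  > {c.text}")
    return "\n".join(lines)
```

Keeping the persona guidance in a lookup table means the same retrieved evidence can be reframed for each audience. Adding another audience is then a one-line change.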

## 3. System Architecture

The project follows a standard RAG architecture. An offline build stage prepares the knowledge base and index, and an online runtime stage answers each request.

1.  **Knowledge Base Curation**: The source documents were manually curated into a high-quality `knowledge_base_raw.json` file. For this proof of concept, manual curation maximized input quality for the RAG pipeline and avoided the complexities of programmatic PDF extraction.
2.  **Data Preprocessing**: A `build_knowledge_base.py` script processes the raw JSON. It uses a semantic chunking strategy to group related concepts, creating a final `knowledge_base_final_chunks.json` file.
3.  **Vector Indexing**: During the build process, the pre-processed chunks are encoded into vector embeddings and stored in a `faiss_index.bin` file for efficient similarity search.
4.  **RAG Pipeline (At Runtime)**:
    *   The user enters a student narrative into the Gradio app.
    *   The narrative is converted into a vector embedding.
    *   FAISS performs a similarity search on the vector index to retrieve the most relevant intervention chunks.
    *   The retrieved chunks and the original narrative are formatted into a detailed prompt, tailored to the selected persona (teacher, parent, or principal).
    *   The prompt is sent to the Gemini API, which generates a synthesized recommendation.
    *   The final recommendation and its evidence base are formatted and displayed to the user.
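
The build and runtime steps above can be traced end to end in a minimal sketch with two deliberate simplifications. First, a bag-of-words vector stands in for `all-MiniLM-L6-v2` embeddings. Second, an exact brute-force scan stands in for the FAISS index. The merge rule in `semantic_chunks` joins adjacent passages whose similarity clears a threshold. This is one common form of semantic chunking and is an assumption here, because the README does not spell out the rule that `build_knowledge_base.py` actually uses.

```python
import math
import re
from collections import Counter

# A sparse vector (word -> weight); toy stand-in for a dense sentence embedding.
Vector = dict[str, float]


def embed(text: str) -> Vector:
    """Toy stand-in for all-MiniLM-L6-v2: an L2-normalized bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def semantic_chunks(passages: list[str], threshold: float = 0.3) -> list[str]:
    """Build step (assumed rule): append a passage to the current chunk when it is
    similar enough to the passage before it; otherwise start a new chunk."""
    chunks: list[str] = []
    prev_vec: Vector | None = None
    for passage in passages:
        vec = embed(passage)
        if prev_vec is not None and cosine(prev_vec, vec) >= threshold:
            chunks[-1] += " " + passage
        else:
            chunks.append(passage)
        prev_vec = vec
    return chunks


def build_index(chunks: list[str]) -> list[Vector]:
    """Build step: embed every chunk once, the role played by faiss_index.bin."""
    return [embed(chunk) for chunk in chunks]


def search(index: list[Vector], query: str, k: int = 3) -> list[tuple[int, float]]:
    """Runtime step: exact top-k search returning (chunk_id, relevance_score) pairs."""
    query_vec = embed(query)
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A real build would typically produce the embeddings with `SentenceTransformer("all-MiniLM-L6-v2").encode(texts, normalize_embeddings=True)`. The search would typically run against a FAISS index such as `faiss.IndexFlatIP`, where the inner product of unit vectors equals cosine similarity. Which FAISS index type this project uses is not stated here.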

## 4. How to Run Locally

This project uses `uv` for fast and reliable dependency management.

### Prerequisites

1.  **Python >= 3.12**
2.  **`uv` installed**:
    ```bash
    pip install uv
    ```
3.  **Environment Variables**: You must create a `.env` file in the project's root directory. The application loads secrets from this file.
    ```
    # .env
    FOT_GOOGLE_API_KEY="your_google_api_key_here"
    DEMO_PASSWORD="your_local_password" # Sets the password for your local instance of the Gradio app.
    ```
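
`config.py` is described as handling configuration and environment variables, but its contents are not shown here. Below is a hypothetical fail-fast check. It assumes the `.env` values have already been copied into the process environment, for example by python-dotenv's `load_dotenv()`. `require_env`, `load_settings`, and `MissingSettingError` are illustrative names, not the project's API.

```python
import os


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is absent or blank."""


def require_env(name: str) -> str:
    """Return a required setting, failing fast with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSettingError(
            f"{name} is not set. Add it to the .env file in the project root."
        )
    return value


def load_settings() -> dict[str, str]:
    """Collect the two settings this README asks for."""
    return {
        "google_api_key": require_env("FOT_GOOGLE_API_KEY"),
        "demo_password": require_env("DEMO_PASSWORD"),
    }
```

Failing at startup with the variable's name is usually easier to debug than an authentication error from the Gemini API on the first request.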

### Setup

Follow these three steps in order so that hardware-specific dependencies such as PyTorch are installed correctly.

1.  **Create the virtual environment:**
    ```bash
    uv venv
    ```
    *Activate the environment:*
    *   macOS/Linux: `source .venv/bin/activate`
    *   Windows: `.venv\Scripts\activate`

2.  **Install PyTorch Separately:**
    This command installs PyTorch from PyTorch's CPU-only wheel index. `uv` picks the wheel that matches your operating system, architecture, and Python version (for example Intel Mac, Apple Silicon, Windows, or Linux).
    ```bash
    uv pip install torch --index-url https://download.pytorch.org/whl/cpu
    ```
    *Note: We explicitly use the CPU-only version of PyTorch. It is sufficient for running this project's small embedding model and avoids complex CUDA dependencies.*

3.  **Install the Project:**
    With PyTorch in place, install the application in editable mode along with its development tools.
    ```bash
    uv pip install -e ".[dev]"
    ```

### Running the Application

After setup, run the Gradio web application using its console script entry point.

```bash
uv run fot-recommender
```

This launches the Gradio app. To open it, visit the local URL printed in the terminal; Gradio's default is `http://127.0.0.1:7860`.
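
The README does not describe how `app.py` enforces the access key. One possible approach, offered here as an assumption, is Gradio's `launch(auth=...)` option. That option accepts a function that receives a username and password and returns `True` for a valid login. The hypothetical `check_access` below ignores the username and compares the password against `DEMO_PASSWORD`:

```python
import hmac
import os


def check_access(username: str, password: str) -> bool:
    """Auth callback sketch: accept any username if the password matches DEMO_PASSWORD."""
    expected = os.environ.get("DEMO_PASSWORD", "")
    if not expected:
        # Refuse every login rather than leaving the app open when no password is set.
        return False
    # compare_digest avoids leaking, via timing, how many leading characters matched.
    return hmac.compare_digest(password.encode(), expected.encode())


# Hypothetical wiring inside app.py (not executed here):
# demo.launch(auth=check_access)
```

Rejecting all logins when the variable is empty means a missing `.env` entry locks the demo instead of silently exposing it.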

## 5. Development

The project is configured with a suite of standard development tools for maintaining code quality.

*   **Run Tests:**
    ```bash
    uv run pytest
    ```
*   **Format Code:**
    ```bash
    uv run black .
    ```
*   **Lint Code:**
    ```bash
    uv run ruff check .
    ```
*   **Type Checking:**
    ```bash
    uv run mypy src/
    ```

## 6. Project Structure

```
.
├── app.py                          # Gradio UI and web API entry point
├── data/
│   ├── processed/                  # Processed data artifacts
│   │   ├── citations.json
│   │   ├── faiss_index.bin
│   │   ├── knowledge_base_final_chunks.json
│   │   └── knowledge_base_raw.json
│   └── source_pdfs/                # Original source documents
├── docs/                           # Project planning documents
├── notebooks/                      # Proof-of-concept notebook
├── pyproject.toml                  # Project configuration and dependencies
├── README.md                       # This file
├── scripts/
│   └── build_knowledge_base.py     # Script to build data artifacts
├── src/
│   └── fot_recommender/            # Main Python package
│       ├── __init__.py
│       ├── config.py               # Configuration and environment variables
│       ├── main.py                 # Main application logic
│       ├── prompts.py              # Prompts for the generative model
│       ├── rag_pipeline.py         # Core RAG logic
│       └── semantic_chunker.py     # Logic for chunking source data
└── tests/                          # Unit and integration tests