title: Fot Recommender Api
emoji: ⚡
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.41.0
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: POC - Freshman On-Track RAG Intervention Recommender
Freshman On-Track (FOT) Intervention Recommender
This repository contains the proof-of-concept for the Freshman On-Track (FOT) Intervention Recommender, an AI-powered tool designed to empower educators.
🚀 Live Demo
The full application is deployed as an interactive web API on Hugging Face Spaces.
👉 Click Here to Launch the Live FOT Recommender API
Note on Access: The public demo is protected by an access key. If you would like to try the live application, please open a GitHub issue in this repository to request access, and I will be happy to provide a key.
1. Project Goal
Freshman year performance is the strongest predictor of high school graduation. However, educators often lack systematic tools to match at-risk 9th graders with the specific, evidence-based interventions they need.
This project addresses that gap by providing a Retrieval-Augmented Generation (RAG) system that transforms a simple narrative about a student's challenges into a set of clear, actionable, and evidence-based recommendations. It turns scattered educational research into targeted guidance, enabling educators to support their students more effectively.
2. Features
- Advanced RAG Architecture: Utilizes a sophisticated pipeline to ensure recommendations are relevant and grounded in evidence.
  - Retrieval: Employs a FAISS vector database and the all-MiniLM-L6-v2 sentence-transformer model to perform semantic search over the knowledge base.
  - Generation: Uses Google's gemini-1.5-flash-latest model to synthesize the retrieved evidence into a coherent, actionable plan.
- Persona-Based Recommendations: Delivers tailored advice for different audiences, fulfilling a key project bonus goal. The system can generate distinct outputs for a teacher, parent, or principal.
- Evidence-Backed: Every recommendation is based on a curated knowledge base of best-practice documents from reputable sources like the Network for College Success, the Institute of Education Sciences, and Attendance Works.
- Interactive Web Application: A user-friendly Gradio UI allows for easy interaction, example scenarios, and a secure access key system for the demo.
- Full Transparency: The "Evidence Base" section in the output shows the exact source documents, page numbers, and content snippets used to generate the recommendation, along with a relevance score for each.
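The retrieval feature described above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: toy bag-of-words vectors stand in for all-MiniLM-L6-v2 embeddings, a brute-force cosine-similarity scan stands in for FAISS, and the chunk fields (title, content) are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[dict], k: int = 2) -> list[tuple[float, dict]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c["content"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunks; the field names are illustrative, not the project's schema.
chunks = [
    {"title": "Attendance outreach", "content": "call families when a student misses school"},
    {"title": "Tutoring", "content": "offer tutoring for a student failing algebra"},
    {"title": "Mentoring", "content": "pair each freshman with an adult mentor"},
]

results = retrieve("student is failing algebra", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk['title']}")
```

The score returned with each chunk plays the role of the relevance score shown in the "Evidence Base" section. A real embedding model would also match paraphrases that share no words with the query, which this word-overlap stand-in cannot do.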
3. System Architecture
The project follows a modern RAG architecture designed for quality and scalability.
- Knowledge Base Curation: A strategic decision was made to manually curate a high-quality knowledge_base_raw.json file from the source documents. For this proof-of-concept, this approach ensured maximum quality for the RAG pipeline, bypassing the complexities of programmatic PDF extraction.
- Data Preprocessing: A build_knowledge_base.py script processes the raw JSON. It uses a semantic chunking strategy to group related concepts, creating a final knowledge_base_final_chunks.json file.
- Vector Indexing: During the build process, the pre-processed chunks are encoded into vector embeddings and stored in a faiss_index.bin file for efficient similarity search.
- RAG Pipeline (At Runtime):
  - The user enters a student narrative into the Gradio app.
  - The narrative is converted into a vector embedding.
  - FAISS performs a similarity search on the vector index to retrieve the most relevant intervention chunks.
  - The retrieved chunks and the original narrative are formatted into a detailed prompt, tailored to the selected persona (teacher, parent, or principal).
  - The prompt is sent to the Gemini API, which generates a synthesized recommendation.
  - The final recommendation and its evidence base are formatted and displayed to the user.
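The prompt-assembly step of the runtime pipeline could look roughly like the sketch below. Everything in it is an assumption made for illustration: the persona instructions, the build_prompt function, and the chunk fields (source, page, score, content) are hypothetical, and the project's actual prompts live in prompts.py. The call to the Gemini API is deliberately left out.

```python
# Hypothetical persona instructions; the real wording is defined in prompts.py.
PERSONA_INSTRUCTIONS = {
    "teacher": "Write for a classroom teacher. Focus on actions for the classroom.",
    "parent": "Write for a parent. Use plain language and focus on support at home.",
    "principal": "Write for a principal. Focus on school-wide systems and staffing.",
}


def build_prompt(narrative: str, retrieved: list[dict], persona: str = "teacher") -> str:
    """Combine the narrative and retrieved chunks into one persona-specific prompt."""
    if persona not in PERSONA_INSTRUCTIONS:
        raise ValueError(f"Unknown persona: {persona!r}")
    # Number each chunk so the model can cite its evidence by index.
    evidence = "\n".join(
        f"[{i}] {c['source']} (p. {c['page']}, relevance {c['score']:.2f}): {c['content']}"
        for i, c in enumerate(retrieved, start=1)
    )
    return (
        f"{PERSONA_INSTRUCTIONS[persona]}\n\n"
        f"Student narrative:\n{narrative}\n\n"
        f"Evidence:\n{evidence}\n\n"
        "Recommend interventions using only the evidence above, citing sources by number."
    )


# Placeholder chunks; sources and scores are made up for the example.
retrieved = [
    {"source": "Example Source A", "page": 12, "score": 0.81,
     "content": "Weekly check-ins with an adult improve attendance."},
    {"source": "Example Source B", "page": 4, "score": 0.64,
     "content": "Early outreach to families reduces chronic absence."},
]

prompt = build_prompt("A 9th grader has missed 10 days of school.", retrieved, persona="parent")
print(prompt)
```

Keeping the source, page, and score next to each snippet in the prompt means the same records can be reused to render the evidence base shown to the user.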
4. How to Run Locally
This project uses uv for fast and reliable dependency management.
Prerequisites
- Python >= 3.12
- uv installed: pip install uv
- Environment Variables: You must create a .env file in the project's root directory. The application loads secrets from this file.
    # .env
    FOT_GOOGLE_API_KEY="your_google_api_key_here"
    DEMO_PASSWORD="your_local_password" # Sets the password for your local instance of the Gradio app.
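As a rough illustration of how these settings might be read at startup, the sketch below parses a .env file with the standard library only. It is an assumption, not the contents of config.py: the project may well use a dotenv library instead, and load_env_file and get_required are hypothetical helper names.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines; a tiny stand-in for a dotenv library."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep the text up to the matching closing quote.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing " # comment" if present.
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def get_required(name: str, file_values: dict[str, str]) -> str:
    """Prefer a real environment variable, then the .env value; fail if absent."""
    value = os.environ.get(name) or file_values.get(name)
    if not value:
        raise RuntimeError(f"Missing required setting: {name}")
    return value


# Demonstration against a temporary file so no real secrets are touched.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text(
        '# .env\n'
        'FOT_GOOGLE_API_KEY="abc123"\n'
        'DEMO_PASSWORD="local-pass" # local Gradio password\n',
        encoding="utf-8",
    )
    settings = load_env_file(str(env_file))

print(sorted(settings))
```

Failing fast on a missing key gives a clearer error at startup than an authentication failure on the first request.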
Setup
Follow this two-step process to ensure hardware-specific dependencies like PyTorch are installed correctly.
Create the virtual environment:
    uv venv
Activate the environment:
- macOS/Linux:
    source .venv/bin/activate
- Windows:
    .venv\Scripts\activate
Install PyTorch Separately: This command installs the CPU-only PyTorch build for your platform (Intel Mac, Apple Silicon, Windows, Linux, etc.) from the PyTorch wheel index.
    uv pip install torch --index-url https://download.pytorch.org/whl/cpu
Note: We explicitly use the CPU-only version of PyTorch, which is sufficient for this project and avoids complex CUDA dependencies.
Install the Project: Now that the difficult dependency is handled, install the application and its development tools.
    uv pip install -e ".[dev]"
Running the Application
After setup, run the Gradio web application using its console script entry point.
uv run fot-recommender
This will launch the interactive Gradio API, which you can access in your browser.
5. Development
The project is configured with a suite of standard development tools for maintaining code quality.
- Run Tests: uv run pytest
- Format Code: uv run black .
- Lint Code: uv run ruff check .
- Type Checking: uv run mypy src/
6. Project Structure
.
├── app.py  # Gradio UI and web API entry point
├── data/
│   ├── processed/  # Processed data artifacts
│   │   ├── citations.json
│   │   ├── faiss_index.bin
│   │   ├── knowledge_base_final_chunks.json
│   │   └── knowledge_base_raw.json
│   └── source_pdfs/  # Original source documents
├── docs/  # Project planning documents
├── notebooks/  # Proof-of-concept notebook
├── pyproject.toml  # Project configuration and dependencies
├── README.md  # This file
├── scripts/
│   └── build_knowledge_base.py  # Script to build data artifacts
├── src/
│   └── fot_recommender/  # Main Python package
│       ├── __init__.py
│       ├── config.py  # Configuration and environment variables
│       ├── main.py  # Main application logic
│       ├── prompts.py  # Prompts for the generative model
│       ├── rag_pipeline.py  # Core RAG logic
│       └── semantic_chunker.py  # Logic for chunking source data
└── tests/  # Unit and integration tests