---
title: Fot Recommender Api
emoji: ⚡
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.41.0
python_version: "3.12"
app_file: app.py
pinned: false
license: mit
short_description: POC - Freshman On-Track RAG Intervention Recommender
---
# Freshman On-Track (FOT) Intervention Recommender

[Python 3.12](https://www.python.org/downloads/release/python-3120/)
[License: MIT](https://opensource.org/licenses/MIT)
[Hugging Face Space](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)
This repository contains the proof-of-concept for the Freshman On-Track (FOT) Intervention Recommender, an AI-powered tool designed to empower educators.

## 🚀 Live Demo

The full application is deployed as an interactive web API on Hugging Face Spaces.

**[👉 Click Here to Launch the Live FOT Recommender API](https://huggingface.co/spaces/chuckfinca/fot-recommender-api)**

**Note on Access:** The public demo is protected by an access key. If you would like to try the live application, please **[open a GitHub issue in this repository](https://github.com/chuckfinca/fot-intervention-recommender/issues/new)** to request access, and I will be happy to provide a key.
## 1. Project Goal

Freshman year performance is the strongest predictor of high school graduation. However, educators often lack systematic tools to match at-risk 9th graders with the specific, evidence-based interventions they need.

This project addresses that gap by providing a **Retrieval-Augmented Generation (RAG)** system that transforms a simple narrative about a student's challenges into a set of clear, actionable, and evidence-based recommendations. It turns scattered educational research into targeted guidance, enabling educators to support their students more effectively.
## 2. Features

* **Advanced RAG Architecture**: A multi-stage pipeline keeps recommendations relevant and grounded in evidence.
  * **Retrieval**: A `FAISS` vector index and the `all-MiniLM-L6-v2` sentence-transformer model perform semantic search over the knowledge base.
  * **Generation**: Google's `gemini-1.5-flash-latest` model synthesizes the retrieved evidence into a coherent, actionable plan.
* **Persona-Based Recommendations**: Delivers tailored advice for different audiences, fulfilling a key project bonus goal. The system can generate distinct outputs for a **teacher**, **parent**, or **principal**.
* **Evidence-Backed**: Every recommendation is based on a curated knowledge base of best-practice documents from reputable sources like the Network for College Success, the Institute of Education Sciences, and Attendance Works.
* **Interactive Web Application**: A user-friendly Gradio UI provides easy interaction, example scenarios, and a secure access-key system for the demo.
* **Full Transparency**: The "Evidence Base" section in the output shows the exact source documents, page numbers, and content snippets used to generate the recommendation, along with a relevance score for each.
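The demo's access-key gate can be sketched as a simple check against an environment variable. This is a minimal illustration only; the actual logic in `app.py` may differ, and `check_access_key` is a hypothetical helper name:

```python
import hmac
import os

def check_access_key(submitted_key: str) -> bool:
    """Return True only if the submitted key matches DEMO_PASSWORD."""
    expected = os.environ.get("DEMO_PASSWORD", "")
    # An unset or empty password means the gate stays closed, not open.
    if not expected:
        return False
    # hmac.compare_digest gives a constant-time comparison.
    return hmac.compare_digest(submitted_key, expected)
```

A Gradio app would call a check like this on the key field before running the pipeline.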
## 3. System Architecture

The project follows a modern RAG architecture designed for quality and scalability.

1. **Knowledge Base Curation**: A strategic decision was made to manually curate a high-quality `knowledge_base_raw.json` file from the source documents. For this proof-of-concept, this approach ensured maximum quality for the RAG pipeline, bypassing the complexities of programmatic PDF extraction.
2. **Data Preprocessing**: A `build_knowledge_base.py` script processes the raw JSON. It uses a semantic chunking strategy to group related concepts, creating a final `knowledge_base_final_chunks.json` file.
3. **Vector Indexing**: During the build process, the pre-processed chunks are encoded into vector embeddings and stored in a `faiss_index.bin` file for efficient similarity search.
4. **RAG Pipeline (At Runtime)**:
   * The user enters a student narrative into the Gradio app.
   * The narrative is converted into a vector embedding.
   * FAISS performs a similarity search on the vector index to retrieve the most relevant intervention chunks.
   * The retrieved chunks and the original narrative are formatted into a detailed prompt, tailored to the selected persona (teacher, parent, or principal).
   * The prompt is sent to the Gemini API, which generates a synthesized recommendation.
   * The final recommendation and its evidence base are formatted and displayed to the user.
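The retrieval and prompt-assembly steps above can be sketched end to end. This is a pure-NumPy stand-in for illustration only: the real pipeline embeds text with `all-MiniLM-L6-v2` and searches a FAISS index, and the toy embeddings, chunk texts, and helper names below are all hypothetical:

```python
import numpy as np

def retrieve_top_k(query_vec: np.ndarray, index: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Cosine-similarity search: what an inner-product FAISS index does on normalized vectors."""
    def normalize(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    scores = normalize(index) @ normalize(query_vec)
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

def build_prompt(narrative: str, chunks: list[str], persona: str) -> str:
    """Format the retrieved evidence and narrative into a persona-tailored prompt."""
    evidence = "\n".join(f"- {c}" for c in chunks)
    return (
        f"You are advising a {persona}.\n"
        f"Student narrative: {narrative}\n"
        f"Relevant evidence:\n{evidence}\n"
        "Recommend specific, evidence-based interventions."
    )

# Toy 4-dimensional "embeddings" for three knowledge-base chunks.
chunk_texts = ["attendance mentoring", "grade monitoring", "family outreach"]
index = np.array([[1.0, 0.0, 0.0, 0.1],
                  [0.0, 1.0, 0.1, 0.0],
                  [0.0, 0.1, 1.0, 0.0]])
query = np.array([0.9, 0.1, 0.0, 0.1])  # pretend this embeds the student narrative

hits = retrieve_top_k(query, index, k=2)
prompt = build_prompt("Jordan has missed 12 days this semester.",
                      [chunk_texts[i] for i, _ in hits], "teacher")
```

In the real app, `prompt` would then be sent to the Gemini API and the `hits` scores surfaced as the "Evidence Base" relevance scores.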
## 4. How to Run Locally

This project uses `uv` for fast and reliable dependency management.

### Prerequisites

1. **Python >= 3.12**
2. **`uv` installed**:
   ```bash
   pip install uv
   ```
3. **Environment Variables**: You must create a `.env` file in the project's root directory. The application loads secrets from this file.
   ```
   # .env
   FOT_GOOGLE_API_KEY="your_google_api_key_here"
   DEMO_PASSWORD="your_local_password"  # Sets the password for your local instance of the Gradio app.
   ```
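A minimal version of the secret loading might look like the sketch below. The real `src/fot_recommender/config.py` may differ; this assumes plain `os.environ` access, with something like `python-dotenv` populating the environment from `.env` beforehand:

```python
import os

class ConfigError(RuntimeError):
    """Raised when a required secret is missing."""

def require_env(name: str) -> str:
    """Fetch a required environment variable or fail fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise ConfigError(f"Missing required environment variable: {name}")
    return value

def load_settings() -> dict[str, str]:
    # In the real app, load_dotenv() would fill os.environ from .env first.
    return {
        "google_api_key": require_env("FOT_GOOGLE_API_KEY"),
        "demo_password": require_env("DEMO_PASSWORD"),
    }
```

Failing fast on a missing key gives a clearer error at startup than a cryptic API failure later.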
### Setup

Follow this three-step process to ensure hardware-specific dependencies like PyTorch are installed correctly.

1. **Create and activate the virtual environment:**
   ```bash
   uv venv
   ```
   Then activate it:
   * macOS/Linux: `source .venv/bin/activate`
   * Windows: `.venv\Scripts\activate`
2. **Install PyTorch Separately:**
   This command lets `uv` find the correct PyTorch build for your specific hardware (Intel Mac, Apple Silicon, Windows, Linux, etc.).
   ```bash
   uv pip install torch --index-url https://download.pytorch.org/whl/cpu
   ```
   *Note: This uses the CPU-only build of PyTorch, which is sufficient for this project and avoids complex CUDA dependencies.*
3. **Install the Project:**
   With the hardware-specific dependency handled, install the application and its development tools.
   ```bash
   uv pip install -e ".[dev]"
   ```
### Running the Application

After setup, run the Gradio web application via its console-script entry point:

```bash
uv run fot-recommender
```

This launches the interactive Gradio app, which you can access in your browser.
## 5. Development

The project is configured with a suite of standard development tools for maintaining code quality.

* **Run Tests:**
  ```bash
  uv run pytest
  ```
* **Format Code:**
  ```bash
  uv run black .
  ```
* **Lint Code:**
  ```bash
  uv run ruff check .
  ```
* **Type Checking:**
  ```bash
  uv run mypy src/
  ```
## 6. Project Structure

```
.
├── app.py                          # Gradio UI and web API entry point
├── data/
│   ├── processed/                  # Processed data artifacts
│   │   ├── citations.json
│   │   ├── faiss_index.bin
│   │   ├── knowledge_base_final_chunks.json
│   │   └── knowledge_base_raw.json
│   └── source_pdfs/                # Original source documents
├── docs/                           # Project planning documents
├── notebooks/                      # Proof-of-concept notebook
├── pyproject.toml                  # Project configuration and dependencies
├── README.md                       # This file
├── scripts/
│   └── build_knowledge_base.py     # Script to build data artifacts
├── src/
│   └── fot_recommender/            # Main Python package
│       ├── __init__.py
│       ├── config.py               # Configuration and environment variables
│       ├── main.py                 # Main application logic
│       ├── prompts.py              # Prompts for the generative model
│       ├── rag_pipeline.py         # Core RAG logic
│       └── semantic_chunker.py     # Logic for chunking source data
└── tests/                          # Unit and integration tests
```