Space: chuckfinca/fot-recommender-api

chuckfinca committed on Aug 6, 2025

Commit 1550049 (1 parent: c32c832)

feat(notebook): Add PoC notebook for RAG demonstration

This commit introduces the primary technical deliverable: a functional, interactive Proof-of-Concept (PoC) in a Google Colab notebook.

The notebook (`notebooks/fot_recommender_poc.ipynb`) demonstrates the functionality of the Freshman On-Track Intervention Recommender. It guides the user through the entire process:
- Setting up the local environment.
- Loading the final, semantically chunked knowledge base.
- Initializing the embedding model and FAISS vector database.
- Accepting a user's free-text query about a student's challenges.
- Performing a semantic search and displaying the top 3 recommendations.
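
The retrieval steps listed above can be sketched in a few lines. This is a simplified illustration only: a bag-of-words counter stands in for the embedding model and a brute-force cosine scan stands in for the FAISS index, so it runs with the standard library alone. The function names (`embed`, `cosine`, `search`), the `min_score` cutoff, and the sample chunks are hypothetical; only the `content_for_embedding` key and the `(chunk, score)` result shape are taken from this commit.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Tuple

Chunk = Dict[str, Any]


def embed(text: str) -> Counter:
    """Toy stand-in for the embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str, chunks: List[Chunk], k: int = 3, min_score: float = 0.0
) -> List[Tuple[Chunk, float]]:
    """Return up to k (chunk, score) pairs, best first, scoring above min_score."""
    query_vec = embed(query)
    scored = [(c, cosine(query_vec, embed(c["content_for_embedding"]))) for c in chunks]
    kept = [(c, s) for c, s in scored if s > min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical knowledge base entries, for illustration only.
chunks = [
    {"title": "Attendance check-ins", "content_for_embedding": "student missing school attendance check-ins"},
    {"title": "Math tutoring", "content_for_embedding": "tutoring for student failing math"},
    {"title": "Mentoring", "content_for_embedding": "mentor for behavior incidents"},
]

results = search("student is missing school", chunks, k=2)
for chunk, score in results:
    print(f"{chunk['title']}: {score:.4f}")
```

A real embedding model captures meaning beyond shared words, so the scores here are not comparable to the notebook's; the sketch only shows the embed, score, filter, and rank flow.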

Key Changes:
- **Added PoC Notebook**: Created `notebooks/fot_recommender_poc.ipynb` to serve as the interactive demonstration of the RAG system.
- **Created Display Utilities**: Introduced a new module, `src/fot_recommender/utils.py`, with a `display_recommendations` helper function. It formats the raw search results into clean, human-readable output, making the recommendations clear and actionable within the notebook.
- **Updated Dependencies**: Added `jupyterlab` and `notebook` to the `dev` dependencies in `pyproject.toml` so the project's development environment fully supports working with the notebook.

Files changed (5)

notebooks/fot_recommender_poc.ipynb +373 -0
pyproject.toml +2 -0
requirements.txt +141 -0
src/fot_recommender/utils.py +26 -0
uv.lock +0 -0

notebooks/fot_recommender_poc.ipynb ADDED

	@@ -0,0 +1,373 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "944d2724-5cbb-4f2d-80f1-4deec31e4058",
+   "metadata": {},
+   "source": [
+    "# Freshman On-Track (FOT) Intervention Recommender\n",
+    "### A Standalone Proof-of-Concept\n",
+    "\n",
+    "This notebook demonstrates a working PoC for an AI-powered intervention recommender.\n",
+    "\n",
+    "**This notebook is designed to run in Google Colab.** It contains all the code needed to set up its environment, download the project from GitHub, and run the demonstration."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4be1b92-95cc-421f-9820-9ccfc261aaeb",
+   "metadata": {},
+   "source": [
+    "## 1. Universal Setup\n",
+    "\n",
+    "This cell prepares the entire environment. It detects where the notebook is running and performs the matching setup automatically.\n",
+    "\n",
+    "Here's what happens when you run the next cell:\n",
+    "1.  **Define Project Source**: We specify the official GitHub repository for this project so it's clear where the code comes from.\n",
+    "2.  **Detect Environment**: The notebook checks if it's running inside the local project folder or as a standalone file.\n",
+    "3.  **Prepare Environment**: A helper script is called to do the heavy lifting:\n",
+    "    - If **local**, it uses your existing project files.\n",
+    "    - If **standalone**, it clones the repository and installs all dependencies for you.\n",
+    "\n",
+    "After running this one cell, the environment will be ready for the demonstration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "1f286cf0-3355-48ff-ade7-43a035db38ea",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "🚀 Setting up LOCAL development environment...\n",
+      "  - Using local project root: /Users/charlesfeinn/Developer/job_applications/fot-intervention-recommender\n",
+      "\n",
+      "🎉 Local environment is ready!\n"
+     ]
+    }
+   ],
+   "source": [
+    "import sys\n",
+    "from pathlib import Path\n",
+    "\n",
+    "# --- Define Project Source ---\n",
+    "REPO_URL = \"https://github.com/chuckfinca/fot-intervention-recommender.git\"\n",
+    "PROJECT_DIR_NAME = \"fot-intervention-recommender\"\n",
+    "\n",
+    "# print(\"🚀 Setting up the environment...\")\n",
+    "\n",
+    "# # --- Clone the Repository & Install Dependencies ---\n",
+    "# !git clone -q {REPO_URL}\n",
+    "# %pip install -q -r {PROJECT_DIR_NAME}/requirements.txt\n",
+    "\n",
+    "# # --- Configure Python Path ---\n",
+    "# project_path = Path.cwd() / PROJECT_DIR_NAME\n",
+    "# src_path = project_path / \"src\"\n",
+    "# sys.path.insert(0, str(src_path))\n",
+    "\n",
+    "# print(\"\\n🎉 Environment is ready!\")\n",
+    "\n",
+    "\n",
+    "print(\"🚀 Setting up LOCAL development environment...\")\n",
+    "\n",
+    "# We assume the notebook is in 'notebooks/'. The project root is one level up.\n",
+    "project_path = Path.cwd().parent \n",
+    "\n",
+    "# Configure Python Path to use the local 'src' directory\n",
+    "src_path = project_path / \"src\"\n",
+    "if str(src_path) not in sys.path:\n",
+    "    sys.path.insert(0, str(src_path))\n",
+    "\n",
+    "print(f\"  - Using local project root: {project_path}\")\n",
+    "print(\"\\n🎉 Local environment is ready!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c9b1ad1b-1c20-4eca-b98f-179ad80dc942",
+   "metadata": {},
+   "source": [
+    "## 2. Load the Knowledge Base\n",
+    "\n",
+    "With the environment bootstrapped, we can now import our project's modules and load the data. The `project_path` variable ensures we find the file correctly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "4143ee4b-c9f3-4d18-9d5b-0ee247937961",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/charlesfeinn/Developer/job_applications/fot-intervention-recommender/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Successfully loaded 27 intervention chunks.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'title': 'Strategy: Leadership Roles',\n",
+       " 'source_document': 'NCS_OTToolkit_2ndEd_October_2017_updated.pdf',\n",
+       " 'fot_pages': 'Pages: 44',\n",
+       " 'content_for_embedding': 'Title: Strategy: Leadership Roles. Content: Principal Role:\\n• Implementation: Reviews and interrogates interim freshman success-related data in light of Success Team goals, and strategizes with team leadership around next steps',\n",
+       " 'original_content': 'Principal Role:\\n• Implementation: Reviews and interrogates interim freshman success-related data in light of Success Team goals, and strategizes with team leadership around next steps'}"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Import the functions from our custom Python package (now in the path)\n",
+    "from fot_recommender.rag_pipeline import (\n",
+    "    load_knowledge_base,\n",
+    "    initialize_embedding_model,\n",
+    "    create_embeddings,\n",
+    "    create_vector_db,\n",
+    "    search_interventions,\n",
+    ")\n",
+    "\n",
+    "# Build the path to the knowledge base using the universal project_path variable\n",
+    "kb_path = project_path / \"data\" / \"processed\" / \"knowledge_base_final_chunks.json\"\n",
+    "\n",
+    "# Load the knowledge base\n",
+    "knowledge_base_chunks = load_knowledge_base(str(kb_path))\n",
+    "\n",
+    "print(f\"Successfully loaded {len(knowledge_base_chunks)} intervention chunks.\")\n",
+    "knowledge_base_chunks[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "0d3e673f-17db-4308-991a-5f5b12ffb104",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Initializing embedding model: all-MiniLM-L6-v2...\n",
+      "Model initialized successfully.\n",
+      "Creating embeddings for 27 chunks...\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Batches:   0%|                                                                                                                                                                                                        | 0/1 [00:00<?, ?it/s]/Users/charlesfeinn/Developer/job_applications/fot-intervention-recommender/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1520: FutureWarning: `encoder_attention_mask` is deprecated and will be removed in version 4.55.0 for `BertSdpaSelfAttention.forward`.\n",
+      "  return forward_call(*args, **kwargs)\n",
+      "Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.03s/it]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Embeddings created successfully.\n",
+      "Creating FAISS index with dimension 384...\n",
+      "FAISS index created with 27 vectors.\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# --- Build the RAG Pipeline Components ---\n",
+    "#\n",
+    "# Now, we will initialize the core components of our RAG pipeline.\n",
+    "# 1. Embedding Model: We'll load the model that converts text into vectors.\n",
+    "# 2. Vector Embeddings: We'll encode all our knowledge base chunks.\n",
+    "# 3. Vector Database: We'll create a FAISS index for fast searching.\n",
+    "#\n",
+    "# These components will be stored in variables for the rest of the notebook to use.\n",
+    "\n",
+    "# 1. Initialize the embedding model\n",
+    "embedding_model = initialize_embedding_model()\n",
+    "\n",
+    "# 2. Create vector embeddings for the knowledge base\n",
+    "embeddings = create_embeddings(knowledge_base_chunks, embedding_model)\n",
+    "\n",
+    "# 3. Set up the FAISS vector database\n",
+    "vector_db = create_vector_db(embeddings)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c906f11b-9363-4181-91f4-9cb899630caa",
+   "metadata": {},
+   "source": [
+    "## 5. Try It Yourself: Enter Your Own Query\n",
+    "\n",
+    "Now it's your turn. The system is ready to accept a new query.\n",
+    "\n",
+    "Describe the challenges of a hypothetical student in the text box below. For example, you could try:\n",
+    "- \"A student is missing a lot of school and their grades are suffering.\"\n",
+    "- \"This freshman has good attendance but is failing math and science and seems disengaged.\"\n",
+    "- \"A student has multiple behavior incidents and is struggling to connect with teachers.\"\n",
+    "\n",
+    "The RAG system will perform a new semantic search and return the top 3 interventions from the knowledge base that best match your description."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "997c358c-3c9c-486e-88f8-1c032a2ed146",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      "Enter a description of a student's challenges:  asdf\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "🔍 Searching for interventions based on your query...\n",
+      "\n",
+      "Searching for top 3 interventions for query: 'asdf...'\n",
+      "Found 0 relevant interventions.\n",
+      "\n",
+      "No relevant interventions were found for this query.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from fot_recommender.utils import display_recommendations\n",
+    "\n",
+    "# Prompt the user to enter their own query\n",
+    "user_query = input(\"Enter a description of a student's challenges: \")\n",
+    "\n",
+    "if user_query:\n",
+    "    print(\"\\n🔍 Searching for interventions based on your query...\")\n",
+    "    \n",
+    "    # Perform a new search using the user's input\n",
+    "    custom_recommendations = search_interventions(\n",
+    "        query=user_query,\n",
+    "        model=embedding_model,\n",
+    "        index=vector_db,\n",
+    "        knowledge_base=knowledge_base_chunks,\n",
+    "        k=3,\n",
+    "    )\n",
+    "    \n",
+    "    # Display the new results using our helper function\n",
+    "    display_recommendations(custom_recommendations)\n",
+    "    \n",
+    "else:\n",
+    "    print(\"\\nNo query entered. Skipping custom search.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "142c44e7-b75b-46c7-9267-996e44054529",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7977b72-30a8-4146-b420-d0adb824ab99",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b6c8adb-df2f-4b7b-b09a-88058d0cd785",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "665cb647-97da-441f-81b0-ae7b908fdd2f",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "92729636-bd91-4b0c-ac35-5ed82797a1f2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import shutil\n",
+    "from pathlib import Path\n",
+    "\n",
+    "# The path to the project directory we created at the start\n",
+    "project_path_to_clean = Path.cwd() / \"fot-recommender-poc-workspace\"\n",
+    "\n",
+    "if project_path_to_clean.exists():\n",
+    "    print(f\"The project directory '{project_path_to_clean}' was found.\")\n",
+    "    \n",
+    "    # Ask for user confirmation before deleting anything\n",
+    "    response = input(\"Delete the repository folder that was downloaded while running this notebook? (y/n): \")\n",
+    "    \n",
+    "    if response.lower().strip() == 'y':\n",
+    "        try:\n",
+    "            shutil.rmtree(project_path_to_clean)\n",
+    "            print(f\"✅ Successfully deleted '{project_path_to_clean}'.\")\n",
+    "        except OSError as e:\n",
+    "            print(f\"Error: {e.strerror}. Could not delete the directory.\")\n",
+    "    else:\n",
+    "        print(\"Cleanup skipped.\")\n",
+    "else:\n",
+    "    print(\"Project directory not found. Nothing to clean up.\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "085de4e9-e7d4-4c87-892b-711765a7d8a1",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
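
The "Universal Setup" markdown cell describes detecting whether the notebook runs inside the local project folder or as a standalone file, while the committed code cell hard-codes the local path and leaves the clone step commented out. A minimal sketch of such a check follows. The function name `detect_environment` and its return labels are hypothetical, and the `google.colab` module test is a common convention for spotting Colab, not something this commit uses.

```python
import sys
from pathlib import Path
from typing import Iterable, Optional


def detect_environment(cwd: Path, loaded_modules: Optional[Iterable[str]] = None) -> str:
    """Classify where the notebook is running: "colab", "local", or "standalone"."""
    # Default to the interpreter's loaded modules; a caller may pass its own list.
    modules = set(sys.modules if loaded_modules is None else loaded_modules)

    # Assumption: Colab kernels have the google.colab package loaded.
    if "google.colab" in modules:
        return "colab"

    # Local checkout: the notebook sits in notebooks/, next to src/fot_recommender.
    if (cwd.parent / "src" / "fot_recommender").is_dir():
        return "local"

    # Anything else: the notebook was opened on its own, so the repo must be cloned.
    return "standalone"


print(detect_environment(Path.cwd()))
```

Only the "colab" and "standalone" results would need the clone-and-install step; the "local" result can reuse the existing checkout, as the committed cell does.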

pyproject.toml CHANGED

@@ -34,6 +34,8 @@ dev = [
     "mypy>=1.16.1",
     "pytest>=8.4.1",
     "ruff>=0.12.2",
+    "jupyterlab>=4.0",
+    "notebook>=7.0"
 ]
 [tool.setuptools.packages.find]

requirements.txt ADDED

	@@ -0,0 +1,141 @@

+annotated-types==0.7.0
+anyio==4.10.0
+appnope==0.1.4
+argon2-cffi==25.1.0
+argon2-cffi-bindings==25.1.0
+arrow==1.3.0
+asttokens==3.0.0
+async-lru==2.0.5
+attrs==25.3.0
+babel==2.17.0
+beautifulsoup4==4.13.4
+black==25.1.0
+bleach==6.2.0
+certifi==2025.8.3
+cffi==1.17.1
+charset-normalizer==3.4.2
+click==8.2.1
+comm==0.2.3
+debugpy==1.8.15
+decorator==5.2.1
+defusedxml==0.7.1
+executing==2.2.0
+faiss-cpu==1.11.0.post1
+fastjsonschema==2.21.1
+filelock==3.18.0
+-e file:///Users/charlesfeinn/Developer/job_applications/fot-intervention-recommender
+fqdn==1.5.1
+fsspec==2025.7.0
+greenlet==3.2.3
+h11==0.16.0
+hf-xet==1.1.5
+httpcore==1.0.9
+httpx==0.28.1
+huggingface-hub==0.34.3
+idna==3.10
+iniconfig==2.1.0
+ipykernel==6.30.1
+ipython==9.4.0
+ipython-pygments-lexers==1.1.1
+isoduration==20.11.0
+jedi==0.19.2
+jinja2==3.1.6
+joblib==1.5.1
+json5==0.12.0
+jsonpatch==1.33
+jsonpointer==3.0.0
+jsonschema==4.25.0
+jsonschema-specifications==2025.4.1
+jupyter-client==8.6.3
+jupyter-core==5.8.1
+jupyter-events==0.12.0
+jupyter-lsp==2.2.6
+jupyter-server==2.16.0
+jupyter-server-terminals==0.5.3
+jupyterlab==4.4.5
+jupyterlab-pygments==0.3.0
+jupyterlab-server==2.27.3
+langchain==0.3.27
+langchain-core==0.3.72
+langchain-text-splitters==0.3.9
+langsmith==0.4.10
+lark==1.2.2
+markupsafe==3.0.2
+matplotlib-inline==0.1.7
+mistune==3.1.3
+mpmath==1.3.0
+mypy==1.17.1
+mypy-extensions==1.1.0
+nbclient==0.10.2
+nbconvert==7.16.6
+nbformat==5.10.4
+nest-asyncio==1.6.0
+networkx==3.5
+notebook==7.4.5
+notebook-shim==0.2.4
+numpy==1.26.4
+orjson==3.11.1
+overrides==7.7.0
+packaging==25.0
+pandocfilters==1.5.1
+parso==0.8.4
+pathspec==0.12.1
+pexpect==4.9.0
+pillow==11.3.0
+platformdirs==4.3.8
+pluggy==1.6.0
+prometheus-client==0.22.1
+prompt-toolkit==3.0.51
+psutil==7.0.0
+ptyprocess==0.7.0
+pure-eval==0.2.3
+pycparser==2.22
+pydantic==2.11.7
+pydantic-core==2.33.2
+pygments==2.19.2
+pytest==8.4.1
+python-dateutil==2.9.0.post0
+python-json-logger==3.3.0
+pyyaml==6.0.2
+pyzmq==27.0.1
+referencing==0.36.2
+regex==2025.7.34
+requests==2.32.4
+requests-toolbelt==1.0.0
+rfc3339-validator==0.1.4
+rfc3986-validator==0.1.1
+rfc3987-syntax==1.1.0
+rpds-py==0.26.0
+ruff==0.12.7
+safetensors==0.5.3
+scikit-learn==1.7.1
+scipy==1.16.1
+send2trash==1.8.3
+sentence-transformers==5.0.0
+setuptools==80.9.0
+six==1.17.0
+sniffio==1.3.1
+soupsieve==2.7
+sqlalchemy==2.0.42
+stack-data==0.6.3
+sympy==1.14.0
+tenacity==9.1.2
+terminado==0.18.1
+threadpoolctl==3.6.0
+tinycss2==1.4.0
+tokenizers==0.21.4
+torch==2.2.2
+tornado==6.5.1
+tqdm==4.67.1
+traitlets==5.14.3
+transformers==4.54.1
+types-python-dateutil==2.9.0.20250708
+typing-extensions==4.14.1
+typing-inspection==0.4.1
+uri-template==1.3.0
+urllib3==2.5.0
+wcwidth==0.2.13
+webcolors==24.11.1
+webencodings==0.5.1
+websocket-client==1.8.0
+zstandard==0.23.0

src/fot_recommender/utils.py ADDED

	@@ -0,0 +1,26 @@

+from typing import List, Dict, Any, Tuple
+
+
+def display_recommendations(results: List[Tuple[Dict[str, Any], float]]) -> None:
+    """
+    A helper function to neatly print the results of a semantic search.
+    This function is designed to be called from a notebook or a command-line script.
+    Args:
+        results: A list of tuples, where each tuple contains a result chunk (dict)
+                 and its similarity score (float).
+    """
+    if not results:
+        print("\nNo relevant interventions were found for this query.")
+        return
+    print("\n--- Top Recommended Interventions ---")
+    for i, (chunk, score) in enumerate(results):
+        print(f"\n--- Recommendation {i + 1} (Similarity Score: {score:.4f}) ---")
+        print(f"  Title: {chunk['title']}")
+        print(f"  Source: {chunk['source_document']} ({chunk['fot_pages']})")
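+        # Indent the content for better readability
+        content = chunk['original_content']
+        indented_content = "\n  ".join(content.splitlines())
+        # Append an ellipsis only when the snippet is actually truncated
+        suffix = "..." if len(indented_content) > 500 else ""
+        print(f"  \n  Content Snippet:\n  \"{indented_content[:500]}{suffix}\"")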
+        print("-" * 50)
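
For reference, the helper expects a list of `(chunk, score)` tuples whose chunk dictionaries carry the `title`, `source_document`, `fot_pages`, and `original_content` keys. The sketch below is a hypothetical string-returning variant, `format_recommendations`, which is not part of this commit; returning the report instead of printing it makes the formatting easy to assert in a unit test. The sample score and the shortened content are invented for illustration.

```python
from typing import Any, Dict, List, Tuple


def format_recommendations(
    results: List[Tuple[Dict[str, Any], float]], max_chars: int = 500
) -> str:
    """Build the recommendation report as one string instead of printing it."""
    if not results:
        return "No relevant interventions were found for this query."

    lines = ["--- Top Recommended Interventions ---"]
    for rank, (chunk, score) in enumerate(results, start=1):
        content = chunk["original_content"]
        # Mark the snippet as truncated only when text was actually cut off.
        snippet = content[:max_chars] + ("..." if len(content) > max_chars else "")
        lines.append(f"--- Recommendation {rank} (Similarity Score: {score:.4f}) ---")
        lines.append(f"  Title: {chunk['title']}")
        lines.append(f"  Source: {chunk['source_document']} ({chunk['fot_pages']})")
        lines.append(f"  Content Snippet: \"{snippet}\"")
    return "\n".join(lines)


# Hypothetical search result in the (chunk, score) shape the helper expects.
sample = [
    (
        {
            "title": "Strategy: Leadership Roles",
            "source_document": "NCS_OTToolkit_2ndEd_October_2017_updated.pdf",
            "fot_pages": "Pages: 44",
            "original_content": "Principal Role: reviews freshman success data",
        },
        0.8123,
    )
]

report = format_recommendations(sample)
print(report)
```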

uv.lock CHANGED

The diff for this file is too large to render. See raw diff