# IRIS Detailed System Architecture

This document provides a comprehensive look at the IRIS architecture, broken down by functional layers and individual process steps.

## Overall System Flow

This tiered diagram shows how data flows through the three main layers of the system.

```mermaid
graph TD
    subgraph "1. Ingestion & Preprocessing"
        UC[User/Admin] -->|Upload| SS[Supabase Storage]
        SS -->|Webhook| BE[FastAPI Backend]
        BE -->|Download| PC[Text Cleaning]
        PC -->|Anonymize| PA[PII Removal]
    end

    subgraph "2. NLP Processing Layer"
        PA -->|Raw Text| EX[Gemini Extraction]
        EX -->|JSON| DB[(Supabase DB)]
        DB -->|Text Fields| EM[BGE-M3 Embedding]
        EM -->|Vectors| DB
    end

    subgraph "3. Matching & AI Analysis"
        DB -->|Job vs Resume| MS[Semantic Matching]
        MS -->|Score| MG[Skill Gap Analysis]
        MG -->|Insights| AI[Gemini Analysis]
        AI -->|Final Report| UI[Admin Dashboard]
    end
```

---

## 1. Data Ingestion & Preprocessing

This layer ensures that incoming data is clean, secure, and ready for AI processing.

* **File Upload**: Resumes and Job Descriptions are stored securely in Supabase buckets.
* **Event Trigger**: Database Webhooks instantly notify the backend when a new file arrives.
* **Text Cleaning**: Standardizes encoding, removes special characters, and handles whitespace.
* **PII Anonymization**: Uses Regex and NLP patterns to detect and protect sensitive personal information (phone, address) before deep processing.

## 2. NLP Processing Pipeline

The "Intelligence" layer that understands the meaning behind the text.

* **Structured Extraction**: Google Gemini parses unstructured text into logical objects (Skills, Experience, Education).
* **Relational Storage**: Structured data is saved into dedicated PostgreSQL tables for rapid querying.
* **Vector Embedding**: The BGE-M3 model creates "mathematical summaries" (vectors) of the candidate's profile and the job requirements.
* **Vector Search Index**: These vectors allow the system to find matches based on *meaning* rather than just keywords (e.g., matching "Software Engineer" with "Full Stack Developer").

## 3. Matching & AI Analysis Layer

The decision-making layer that provides final value to the recruiter.

* **Semantic Scoring**: Calculates the mathematical distance between a candidate's vector and a job's vector.
* **Skill Gap Analysis**: Compares the extracted skill sets to identify exactly what is missing or where the candidate excels.
* **AI Insight Generation**: A second pass with Gemini generates a human-readable summary, custom strengths, and potential weaknesses.
* **Final Ranking**: Aggregates all scores into a prioritized list for the Admin dashboard.

## Technology Stack

| Layer | Technologies |
| :--- | :--- |
| **Frontend** | React, Vite, Framer Motion, Lucide Icons |
| **Backend** | FastAPI, Python, SQLAlchemy/Supabase-py |
| **Data** | Supabase (Postgres), pgvector, Supabase Storage |
| **AI/ML** | Google Gemini (LLM), BGE-M3 (Embeddings), Sentence Transformers |
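
As an illustration of the Semantic Scoring, Skill Gap Analysis, and Final Ranking steps in section 3, the sketch below shows one way the pieces could fit together in plain Python. It is not the IRIS implementation: the document only says a "mathematical distance" is computed, so cosine similarity is an assumption here, and the function names (`cosine_similarity`, `skill_gap`, `rank_candidates`), the candidate dictionary shape, and the `semantic_weight` parameter are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def skill_gap(candidate_skills, job_skills):
    """Compare skill sets case-insensitively and report overlap and gaps."""
    cand = {s.strip().lower() for s in candidate_skills}
    job = {s.strip().lower() for s in job_skills}
    return {
        "matched": sorted(cand & job),   # required skills the candidate has
        "missing": sorted(job - cand),   # required skills the candidate lacks
        "extra": sorted(cand - job),     # candidate skills beyond the job
        "coverage": len(cand & job) / len(job) if job else 0.0,
    }


def rank_candidates(job_vector, job_skills, candidates, semantic_weight=0.7):
    """Blend semantic similarity and skill coverage, then sort best-first.

    The 0.7 / 0.3 weighting is a hypothetical choice for this sketch.
    """
    ranked = []
    for cand in candidates:
        semantic = cosine_similarity(cand["vector"], job_vector)
        gap = skill_gap(cand["skills"], job_skills)
        score = semantic_weight * semantic + (1 - semantic_weight) * gap["coverage"]
        ranked.append({"id": cand["id"], "score": score, "gap": gap})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real BGE-M3 embeddings are much larger.
    candidates = [
        {"id": "a", "vector": [1.0, 0.0], "skills": ["Python", "Docker"]},
        {"id": "b", "vector": [0.0, 1.0], "skills": ["SQL"]},
    ]
    for row in rank_candidates([1.0, 0.0], ["python", "docker"], candidates):
        print(row["id"], round(row["score"], 3), row["gap"]["missing"])
```

In a deployment that uses pgvector, the similarity computation would more likely run inside Postgres than in application code; the pure-Python version is used here only to keep the example self-contained.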