# IRIS Detailed System Architecture
This document describes the IRIS architecture layer by layer, and then step by step within each layer.
## Overall System Flow
This tiered diagram shows how data flows through the three main layers of the system.
```mermaid
graph TD
subgraph "1. Ingestion & Preprocessing"
UC[User/Admin] -->|Upload| SS[Supabase Storage]
SS -->|Webhook| BE[FastAPI Backend]
BE -->|Download| PC[Text Cleaning]
PC -->|Anonymize| PA[PII Removal]
end
subgraph "2. NLP Processing Layer"
PA -->|Anonymized Text| EX[Gemini Extraction]
EX -->|JSON| DB[(Supabase DB)]
DB -->|Text Fields| EM[BGE-M3 Embedding]
EM -->|Vectors| DB
end
subgraph "3. Matching & AI Analysis"
DB -->|Job vs Resume| MS[Semantic Matching]
MS -->|Score| MG[Skill Gap Analysis]
MG -->|Insights| AI[Gemini Analysis]
AI -->|Final Report| UI[Admin Dashboard]
end
```
---
## 1. Data Ingestion & Preprocessing
This layer ensures that incoming data is clean, secure, and ready for AI processing.
* **File Upload**: Resumes and Job Descriptions are stored securely in Supabase buckets.
* **Event Trigger**: Supabase Database Webhooks notify the FastAPI backend when a new file arrives.
* **Text Cleaning**: Standardizes encoding, removes special characters, and normalizes whitespace.
* **PII Anonymization**: Uses regular expressions and NLP patterns to detect and anonymize sensitive personal information (e.g., phone numbers, addresses) before the text reaches the NLP layer.
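
The cleaning and anonymization steps above can be sketched in a few lines of Python. This is a minimal illustration, not the IRIS implementation: the function names `clean_text` and `mask_pii`, the placeholder tokens, and both regular expressions are assumptions, and the email pattern goes beyond the examples the document lists. Address detection is left out because it typically needs an NLP model rather than a pattern.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; production rules would need to be
# broader and locale-aware.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    # Keep newlines and tabs; remove every other control/format character.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = re.sub(r"[ \t]+", " ", text)      # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    return text.strip()


def _mask_phone(match: re.Match) -> str:
    # Require at least 9 digits so year ranges like "2019 - 2023" survive.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if digits >= 9 else match.group()


def mask_pii(text: str) -> str:
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub(_mask_phone, text)
```

Calling `mask_pii(clean_text(raw))` keeps the two concerns separate, so the cleaning rules can change without touching the anonymization rules.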
## 2. NLP Processing Pipeline
This is the "intelligence" layer: it extracts structure and meaning from the text.
* **Structured Extraction**: Google Gemini parses unstructured text into logical objects (Skills, Experience, Education).
* **Relational Storage**: Structured data is saved into dedicated PostgreSQL tables for rapid querying.
* **Vector Embedding**: The BGE-M3 model encodes the candidate's profile and the job requirements as dense numeric vectors (embeddings).
* **Vector Search Index**: These vectors allow the system to find matches based on *meaning* rather than just keywords (e.g., matching "Software Engineer" with "Full Stack Developer").
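
As a rough sketch of how extracted JSON could be turned into a typed object and then into the text that gets embedded, consider the following. The field names (`skills`, `experience`, `education`) mirror the categories listed above, but the exact JSON schema, the `ResumeProfile` class, and the `embedding_text` layout are assumptions made for illustration; the actual calls to Gemini and BGE-M3 are deliberately left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ResumeProfile:
    """Hypothetical container for the fields extracted from a resume."""
    skills: list[str] = field(default_factory=list)
    experience: list[str] = field(default_factory=list)
    education: list[str] = field(default_factory=list)


def parse_extraction(payload: str) -> ResumeProfile:
    """Parse the LLM's JSON output defensively into a ResumeProfile."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    def as_list(key: str) -> list[str]:
        value = data.get(key) or []          # missing or null -> empty list
        if isinstance(value, str):           # a bare string -> one-item list
            value = [value]
        return [str(item).strip() for item in value if str(item).strip()]

    return ResumeProfile(
        skills=as_list("skills"),
        experience=as_list("experience"),
        education=as_list("education"),
    )


def embedding_text(profile: ResumeProfile) -> str:
    """Flatten the profile into one string to pass to the embedding model."""
    return "\n".join([
        "Skills: " + ", ".join(profile.skills),
        "Experience: " + "; ".join(profile.experience),
        "Education: " + "; ".join(profile.education),
    ])
```

Validating the model output before it reaches the database keeps malformed responses (a string where a list is expected, a `null` field) from propagating into the relational tables or the embeddings.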
## 3. Matching & AI Analysis Layer
This is the decision-making layer: it turns stored data into the results the recruiter sees.
* **Semantic Scoring**: Computes a similarity score from the distance between a candidate's vector and a job's vector (for example, cosine distance).
* **Skill Gap Analysis**: Compares the extracted skill sets to identify exactly what is missing or where the candidate excels.
* **AI Insight Generation**: A second pass with Gemini generates a human-readable summary, custom strengths, and potential weaknesses.
* **Final Ranking**: Aggregates all scores into a prioritized list for the Admin dashboard.
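
The scoring, gap analysis, and ranking steps above can be illustrated with plain Python. This is a sketch under stated assumptions: the document does not say which distance metric or aggregation formula IRIS uses, so cosine similarity, the skill-coverage ratio, the `semantic_weight` parameter, and the candidate dictionary keys (`id`, `vector`, `skills`) are all hypothetical choices.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def skill_gap(candidate_skills: list[str], required_skills: list[str]) -> dict:
    """Compare skill sets case-insensitively."""
    have = {s.strip().lower() for s in candidate_skills}
    need = {s.strip().lower() for s in required_skills}
    return {
        "matched": sorted(have & need),
        "missing": sorted(need - have),
        "extra": sorted(have - need),
    }


def rank_candidates(job_vector, job_skills, candidates, semantic_weight=0.7):
    """Blend semantic similarity with skill coverage and sort descending.

    The weighted sum is an assumed aggregation, not the documented one.
    """
    ranked = []
    for cand in candidates:
        semantic = cosine_similarity(cand["vector"], job_vector)
        gap = skill_gap(cand["skills"], job_skills)
        required = len(gap["matched"]) + len(gap["missing"])
        coverage = len(gap["matched"]) / required if required else 0.0
        score = semantic_weight * semantic + (1 - semantic_weight) * coverage
        ranked.append({"id": cand["id"], "score": score, "missing": gap["missing"]})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)
```

In a pgvector-backed deployment the similarity would more likely be computed in SQL (pgvector provides a cosine distance operator, `<=>`), with Python handling only the gap analysis and the final aggregation.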
## Technology Stack
| Layer | Technologies |
| :--- | :--- |
| **Frontend** | React, Vite, Framer Motion, Lucide Icons |
| **Backend** | FastAPI, Python, SQLAlchemy / supabase-py |
| **Data** | Supabase (Postgres), pgvector, Supabase Storage |
| **AI/ML** | Google Gemini (LLM), BGE-M3 (Embeddings), Sentence Transformers |