# IRIS Detailed System Architecture

This document describes the IRIS architecture layer by layer and lists the processing steps within each layer.

## Overall System Flow

This tiered diagram shows how data flows through the three main layers of the system.

```mermaid
graph TD
    subgraph "1. Ingestion & Preprocessing"
        UC[User/Admin] -->|Upload| SS[Supabase Storage]
        SS -->|Webhook| BE[FastAPI Backend]
        BE -->|Download| PC[Text Cleaning]
        PC -->|Anonymize| PA[PII Removal]
    end

    subgraph "2. NLP Processing Layer"
        PA -->|Anonymized Text| EX[Gemini Extraction]
        EX -->|JSON| DB[(Supabase DB)]
        DB -->|Text Fields| EM[BGE-M3 Embedding]
        EM -->|Vectors| DB
    end

    subgraph "3. Matching & AI Analysis"
        DB -->|Job vs Resume| MS[Semantic Matching]
        MS -->|Score| MG[Skill Gap Analysis]
        MG -->|Insights| AI[Gemini Analysis]
        AI -->|Final Report| UI[Admin Dashboard]
    end
```

---

## 1. Data Ingestion & Preprocessing
This layer ensures that incoming data is clean, secure, and ready for AI processing.

*   **File Upload**: Resumes and Job Descriptions are stored securely in Supabase buckets.
*   **Event Trigger**: Supabase Database Webhooks notify the FastAPI backend when a new file arrives.
*   **Text Cleaning**: Standardizes encoding, removes special characters, and handles whitespace.
*   **PII Anonymization**: Uses regular expressions and NLP patterns to detect and mask sensitive personal information (such as phone numbers and addresses) before the text reaches the extraction step.
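
The cleaning and anonymization steps above can be sketched with the standard library alone. This is a minimal illustration, not the IRIS implementation: the function names `clean_text` and `redact_pii`, the placeholder tokens, and both regular expressions are assumptions, and email masking is included only as an extra example of a pattern. Address detection generally needs an NLP model and is left out here.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; real resumes need broader,
# locale-aware coverage and validation.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d ().-]{7,}\d(?!\w)")


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    # Keep newlines and tabs; drop every other control/format character.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = re.sub(r"[ \t]+", " ", text)      # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)   # cap consecutive blank lines
    return text.strip()


def _mask_phone(match: re.Match) -> str:
    # Require at least 9 digits so year ranges such as "2019 - 2021"
    # are not mistaken for phone numbers.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if digits >= 9 else match.group()


def redact_pii(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub(_mask_phone, text)


if __name__ == "__main__":
    sample = "Call +1 (555) 123-4567 or mail jane.doe@example.com"
    print(redact_pii(clean_text(sample)))
```

Running cleaning before redaction means the patterns only have to handle normalized whitespace, which keeps them simpler.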

## 2. NLP Processing Pipeline
The "Intelligence" layer that understands the meaning behind the text.

*   **Structured Extraction**: Google Gemini parses unstructured text into logical objects (Skills, Experience, Education).
*   **Relational Storage**: Structured data is saved into dedicated PostgreSQL tables for rapid querying.
*   **Vector Embedding**: The BGE-M3 model encodes the candidate's profile and the job requirements as dense numeric vectors (embeddings).
*   **Vector Search Index**: The stored vectors let the system find matches based on *meaning* rather than exact keywords (e.g., matching "Software Engineer" with "Full Stack Developer").
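
As a rough sketch of the hand-off between extraction and embedding, the snippet below validates a JSON reply and flattens it into the text that would be embedded. The `CandidateProfile` class, its field names, the `title` and `summary` keys, and both helper functions are hypothetical; the actual schema returned by Gemini and stored in Supabase is not specified in this document.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CandidateProfile:
    """Hypothetical shape of one extracted resume; field names are assumed."""
    skills: list[str] = field(default_factory=list)
    experience: list[dict] = field(default_factory=list)
    education: list[dict] = field(default_factory=list)


def parse_extraction(llm_output: str) -> CandidateProfile:
    """Validate the model's JSON reply and normalize it into a typed record."""
    data = json.loads(llm_output)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    raw_skills = data.get("skills") or []
    # Lowercase and de-duplicate so "Python" and " python " become one skill.
    skills = sorted(
        {s.strip().lower() for s in raw_skills if isinstance(s, str) and s.strip()}
    )
    return CandidateProfile(
        skills=skills,
        experience=[e for e in (data.get("experience") or []) if isinstance(e, dict)],
        education=[e for e in (data.get("education") or []) if isinstance(e, dict)],
    )


def embedding_text(profile: CandidateProfile) -> str:
    """Flatten the text fields into the single string passed to the embedder."""
    parts = ["Skills: " + ", ".join(profile.skills)]
    for entry in profile.experience:
        line = f"Experience: {entry.get('title', '')} {entry.get('summary', '')}"
        parts.append(line.strip())
    return "\n".join(parts)


if __name__ == "__main__":
    reply = '{"skills": ["Python", "SQL"], "experience": [{"title": "Engineer"}]}'
    print(embedding_text(parse_extraction(reply)))
```

Validating the reply before storage keeps malformed model output out of the relational tables and gives the embedding step a predictable input.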

## 3. Matching & AI Analysis Layer
The decision-making layer that provides final value to the recruiter.

*   **Semantic Scoring**: Computes the distance between a candidate's embedding vector and a job's embedding vector; a smaller distance indicates a closer semantic match.
*   **Skill Gap Analysis**: Compares the extracted skill sets to identify which required skills are missing and where the candidate excels.
*   **AI Insight Generation**: A second pass with Gemini generates a human-readable summary of the candidate's strengths and potential weaknesses.
*   **Final Ranking**: Aggregates all scores into a prioritized list for the Admin dashboard.
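
The scoring, gap analysis, and ranking steps above could look roughly like the sketch below. It assumes cosine similarity as the vector metric (the document does not name one), a hypothetical 70/30 weighting between semantic similarity and skill coverage, and illustrative dictionary keys (`id`, `vector`, `skills`). In a deployment backed by pgvector the distance would more likely be computed in SQL, for example with its cosine distance operator `<=>`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def skill_gap(candidate_skills: list[str], job_skills: list[str]) -> dict:
    """Split skills into matched, missing (job only), and extra (candidate only)."""
    cand = {s.strip().lower() for s in candidate_skills}
    job = {s.strip().lower() for s in job_skills}
    return {
        "matched": sorted(cand & job),
        "missing": sorted(job - cand),
        "extra": sorted(cand - job),
    }


def rank_candidates(job: dict, candidates: list[dict],
                    semantic_weight: float = 0.7) -> list[dict]:
    """Blend semantic similarity with skill coverage and sort best-first.

    The weighting is an assumed example, not a value taken from IRIS.
    """
    ranked = []
    for cand in candidates:
        gap = skill_gap(cand["skills"], job["skills"])
        required = len(gap["matched"]) + len(gap["missing"])
        coverage = len(gap["matched"]) / required if required else 0.0
        similarity = cosine_similarity(cand["vector"], job["vector"])
        score = semantic_weight * similarity + (1 - semantic_weight) * coverage
        ranked.append({"id": cand["id"], "score": score, "gap": gap})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    job = {"vector": [1.0, 0.0], "skills": ["python", "docker"]}
    pool = [
        {"id": "a", "vector": [0.0, 1.0], "skills": ["Python"]},
        {"id": "b", "vector": [1.0, 0.0], "skills": ["Python", "Docker"]},
    ]
    for row in rank_candidates(job, pool):
        print(row["id"], round(row["score"], 3), row["gap"]["missing"])
```

Keeping the gap dictionary alongside the score lets the later Gemini pass explain a ranking in terms of concrete missing skills instead of a bare number.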

## Technology Stack

| Layer | Technologies |
| :--- | :--- |
| **Frontend** | React, Vite, Framer Motion, Lucide Icons |
| **Backend** | FastAPI, Python, SQLAlchemy/Supabase-py |
| **Data** | Supabase (Postgres), pgvector, Supabase Storage |
| **AI/ML** | Google Gemini (LLM), BGE-M3 (Embeddings), Sentence Transformers |