Resume Analysis and Matching System 📄✨

A sophisticated resume analysis and matching system that uses RAG (Retrieval Augmented Generation) to match resumes with job descriptions intelligently.

🌟 Features

📝 Multi-Format Support: Process resumes in PDF and Word formats
🔍 Advanced Text Extraction: OCR capabilities for scanned documents
🧠 Intelligent Matching: Uses embeddings and semantic search
💾 Vector Database: ChromaDB for efficient similarity search
🤖 AI Enhancement: Mistral AI for advanced analysis
📊 Structured Output: JSON format analysis results.

🏗️ Project Structure

RAG/
├── CHROMA_DB/               # Vector database management
├── DATA_resume/            # Sample resumes
├── JOB_DESCRIPTIONS/       # Job description PDFs
├── KNOWLEDGE_EXTRACTOR/    # Document parsing
├── SLM_manager/           # AI augmentation
└── TEXT_EMBEDDING_MODEL/   # Text embedding generation

🚀 Getting Started

The system operates in two modes:

Basic Mode: Resume matching using vector similarity (always available)
Enhanced Mode: AI-powered analysis using Mistral (requires Ollama setup)

Prerequisites

Python 3.10 or higher
Virtual environment
Tesseract OCR (for scanned documents)
Ollama with Mistral AI model (for enhanced analysis)

Installation

Basic Setup

Python Environment Setup:

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Tesseract OCR (Optional - for scanned documents):
- macOS: brew install tesseract
- Linux: sudo apt-get install tesseract-ocr
- Windows: Download installer from GitHub

Mistral AI Setup (Required for Enhanced Analysis)

Install Ollama:
- macOS/Linux:
```
curl https://ollama.ai/install.sh | sh
```
- Windows: Download from Ollama's website
Pull Mistral Model:
```
ollama pull mistral
```

Verify Installation:

ollama run mistral "Hello, testing Mistral AI"

⚠️ Important Note: The enhanced analysis features require Mistral AI through Ollama. If you don't have Mistral AI set up:

Basic resume matching will still work
AI-enhanced analysis features will be disabled
You can still use the system with reduced functionality

Installation

Clone the repository:

git clone https://github.com/deepanmpc/ResumeAnalyse_RAG-Architecture.git
cd RAG

Create and activate virtual environment:

# Resume Analysis and Matching System 📄✨

A sophisticated resume analysis and matching system that uses RAG (Retrieval Augmented Generation) to match resumes with job descriptions intelligently.

## 🌟 Features

- 📝 **Multi-Format Support**: Process resumes in PDF and Word formats.
- 🔍 **Advanced Text Extraction**: OCR capabilities for scanned documents.
- 🧠 **Intelligent Matching**: Uses embeddings and semantic search to find the best candidates.
- 💾 **Vector Database**: ChromaDB for efficient similarity search and storage.
- 🤖 **AI Enhancement**: Mistral AI for advanced analysis and summarization.
- 📊 **Structured Output**: JSON format for analysis results.
- 🖥️ **Interactive Web UI**: A React-based frontend for a user-friendly experience.

## 🖥️ Web Frontend

The project includes a modern and interactive web-based user interface built with React, TypeScript, and Vite.

### Frontend Features

-   **Resume Matching Dashboard**: Upload a job description and see the top matching resumes.
-   **Detailed Match View**: For each matched resume, view details like:
    -   Resume file name.
    -   The section that matched best (e.g., "experience", "skills").
    -   A similarity score.
    -   The relevant text from the resume that matched the job description.
-   **AI Summary Display**: Shows an AI-generated summary of the top matches. It gracefully handles and displays errors if the summary generation fails (e.g., if the AI model is not available).
-   **User-Friendly Interface**: Built with modern UI components for a smooth experience.

## 🚀 Getting Started

### Prerequisites

- Python 3.10 or higher
- Node.js and npm (or yarn/pnpm)
- Tesseract OCR (for scanned documents)
- Ollama with Mistral AI model (for enhanced analysis)

### Installation

1.  **Clone the repository**:
    ```bash
    git clone <repository-url>
    cd RAG
    ```

2.  **Backend Setup**:
    ```bash
    # Create and activate virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
    # Install Python dependencies
    pip install -r requirements.txt
    ```

3.  **Frontend Setup**:
    ```bash
    # Navigate to the web directory
    cd web
    
    # Install Node.js dependencies
    npm install
    ```

4.  **Tesseract OCR** (Optional - for scanned documents):
    -   macOS: `brew install tesseract`
    -   Linux: `sudo apt-get install tesseract-ocr`
    -   Windows: Download installer from GitHub

5.  **Mistral AI Setup** (Optional - for enhanced analysis):
    -   [Install Ollama](https://ollama.ai)
    -   Pull the Mistral model: `ollama pull mistral`

## 🎯 Usage

To run the application, you need to start both the backend server and the frontend development server.

1.  **Start the Backend Server**:
    From the project root directory (`RAG/`):
    ```bash
    uvicorn api:app --reload
    ```
    The API will be available at `http://127.0.0.1:8000`.

2.  **Start the Frontend Server**:
    In a new terminal, navigate to the `web/` directory:
    ```bash
    cd web
    npm run dev
    ```
    The web application will be available at `http://localhost:5173` (or another port if 5173 is busy).

3.  **Using the Application**:
    -   Open your browser to the frontend URL.
    -   Use the dashboard to upload a job description and see the matching resumes.

### Command-Line Usage (Alternative)

You can also use the system from the command line for indexing and matching.

1.  **Index Resumes**:
    ```bash
    python main.py --index DATA_resume/
    ```

2.  **Match with Job Description**:
    ```bash
    python main.py --job JOB_DESCRIPTIONS/job.pdf -n 5
    ```

## 🔧 Components

-   **Backend**: FastAPI, ChromaDB, SentenceTransformers
-   **Frontend**: React, TypeScript, Vite, Tailwind CSS, shadcn/ui
-   **AI**: Ollama, Mistral

---
Built with ❤️ for making recruitment smarter

Install dependencies:

pip install -r requirements.txt

Install Tesseract OCR (optional, for scanned documents):

macOS: brew install tesseract
Linux: sudo apt-get install tesseract-ocr
Windows: Download installer from GitHub

🎯 Usage

Index Resumes:

python main.py --index DATA_resume/

Match with Job Description:

python main.py --job JOB_DESCRIPTIONS/job.pdf -n 5

Direct Query Search:

python main.py --query "python developer with 5 years experience" -n 3

🔧 Components

Note: AI Enhancement features require Mistral AI setup. Other components work independently.

1. Knowledge Extraction (350+ lines)

PDF Parser: Advanced text extraction with OCR support
Word Parser: Microsoft Word document processing
Universal Parser: Common interface for all document types

2. Vector Database (170+ lines)

ChromaDB integration
Efficient similarity search
Section-level matching

3. Text Embeddings (80+ lines)

SentenceTransformer models
Section-wise embeddings
Metadata handling

4. AI Enhancement (40+ lines)

Mistral AI integration
Enhanced analysis
Match summarization

5. Core Application (300+ lines)

Command line interface
Batch processing
Results export

📊 Output Format

The system generates detailed JSON analysis:

{
    "rank": 1,
    "id": "resume_123",
    "filename": "candidate.pdf",
    "similarity": 0.89,
    "sections": {
        "experience": 0.92,
        "skills": 0.85,
        "education": 0.78
    }
}

📈 Performance

Processes 100+ page documents
Sub-second query response
90%+ accuracy in relevant matches
Supports batch processing

🤝 Contributing

Fork the repository
Create your feature branch
Commit your changes
Push to the branch
Create a new Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Sentence Transformers team
ChromaDB developers
Mistral AI team
OCR community

Built with ❤️ for making recruitment smarter

Downloads last month: -; Downloads are not tracked for this model. How to track