Resume Analysis and Matching System πŸ“„βœ¨

A sophisticated resume analysis and matching system that uses RAG (Retrieval Augmented Generation) to match resumes with job descriptions intelligently.

🌟 Features

  • πŸ“ Multi-Format Support: Process resumes in PDF and Word formats
  • πŸ” Advanced Text Extraction: OCR capabilities for scanned documents
  • 🧠 Intelligent Matching: Uses embeddings and semantic search
  • πŸ’Ύ Vector Database: ChromaDB for efficient similarity search
  • πŸ€– AI Enhancement: Mistral AI for advanced analysis
  • πŸ“Š Structured Output: JSON format analysis results.

πŸ—οΈ Project Structure

RAG/
β”œβ”€β”€ CHROMA_DB/               # Vector database management
β”œβ”€β”€ DATA_resume/            # Sample resumes
β”œβ”€β”€ JOB_DESCRIPTIONS/       # Job description PDFs
β”œβ”€β”€ KNOWLEDGE_EXTRACTOR/    # Document parsing
β”œβ”€β”€ SLM_manager/           # AI augmentation
└── TEXT_EMBEDDING_MODEL/   # Text embedding generation

πŸš€ Getting Started

The system operates in two modes:

  1. Basic Mode: Resume matching using vector similarity (always available)
  2. Enhanced Mode: AI-powered analysis using Mistral (requires Ollama setup)

Prerequisites

  • Python 3.10 or higher
  • Virtual environment
  • Tesseract OCR (for scanned documents)
  • Ollama with Mistral AI model (for enhanced analysis)

Installation

Basic Setup

  1. Python Environment Setup:

    # Create and activate virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
    # Install dependencies
    pip install -r requirements.txt
    
  2. Tesseract OCR (Optional - for scanned documents):

    • macOS: brew install tesseract
    • Linux: sudo apt-get install tesseract-ocr
    • Windows: Download installer from GitHub

Mistral AI Setup (Required for Enhanced Analysis)

  1. Install Ollama:

    • macOS/Linux:
      curl https://ollama.ai/install.sh | sh
      
    • Windows: Download from Ollama's website
  2. Pull Mistral Model:

    ollama pull mistral
    
  3. Verify Installation:

    ollama run mistral "Hello, testing Mistral AI"
    

⚠️ Important Note: The enhanced analysis features require Mistral AI through Ollama. If you don't have Mistral AI set up:

  • Basic resume matching will still work
  • AI-enhanced analysis features will be disabled
  • You can still use the system with reduced functionality

Installation

  1. Clone the repository:
git clone https://github.com/deepanmpc/ResumeAnalyse_RAG-Architecture.git
cd RAG
  1. Create and activate virtual environment:
# Resume Analysis and Matching System πŸ“„βœ¨

A sophisticated resume analysis and matching system that uses RAG (Retrieval Augmented Generation) to match resumes with job descriptions intelligently.

## 🌟 Features

- πŸ“ **Multi-Format Support**: Process resumes in PDF and Word formats.
- πŸ” **Advanced Text Extraction**: OCR capabilities for scanned documents.
- 🧠 **Intelligent Matching**: Uses embeddings and semantic search to find the best candidates.
- πŸ’Ύ **Vector Database**: ChromaDB for efficient similarity search and storage.
- πŸ€– **AI Enhancement**: Mistral AI for advanced analysis and summarization.
- πŸ“Š **Structured Output**: JSON format for analysis results.
- πŸ–₯️ **Interactive Web UI**: A React-based frontend for a user-friendly experience.

## πŸ–₯️ Web Frontend

The project includes a modern and interactive web-based user interface built with React, TypeScript, and Vite.

### Frontend Features

-   **Resume Matching Dashboard**: Upload a job description and see the top matching resumes.
-   **Detailed Match View**: For each matched resume, view details like:
    -   Resume file name.
    -   The section that matched best (e.g., "experience", "skills").
    -   A similarity score.
    -   The relevant text from the resume that matched the job description.
-   **AI Summary Display**: Shows an AI-generated summary of the top matches. It gracefully handles and displays errors if the summary generation fails (e.g., if the AI model is not available).
-   **User-Friendly Interface**: Built with modern UI components for a smooth experience.

## πŸš€ Getting Started

### Prerequisites

- Python 3.10 or higher
- Node.js and npm (or yarn/pnpm)
- Tesseract OCR (for scanned documents)
- Ollama with Mistral AI model (for enhanced analysis)

### Installation

1.  **Clone the repository**:
    ```bash
    git clone <repository-url>
    cd RAG
    ```

2.  **Backend Setup**:
    ```bash
    # Create and activate virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
    # Install Python dependencies
    pip install -r requirements.txt
    ```

3.  **Frontend Setup**:
    ```bash
    # Navigate to the web directory
    cd web
    
    # Install Node.js dependencies
    npm install
    ```

4.  **Tesseract OCR** (Optional - for scanned documents):
    -   macOS: `brew install tesseract`
    -   Linux: `sudo apt-get install tesseract-ocr`
    -   Windows: Download installer from GitHub

5.  **Mistral AI Setup** (Optional - for enhanced analysis):
    -   [Install Ollama](https://ollama.ai)
    -   Pull the Mistral model: `ollama pull mistral`

## 🎯 Usage

To run the application, you need to start both the backend server and the frontend development server.

1.  **Start the Backend Server**:
    From the project root directory (`RAG/`):
    ```bash
    uvicorn api:app --reload
    ```
    The API will be available at `http://127.0.0.1:8000`.

2.  **Start the Frontend Server**:
    In a new terminal, navigate to the `web/` directory:
    ```bash
    cd web
    npm run dev
    ```
    The web application will be available at `http://localhost:5173` (or another port if 5173 is busy).

3.  **Using the Application**:
    -   Open your browser to the frontend URL.
    -   Use the dashboard to upload a job description and see the matching resumes.

### Command-Line Usage (Alternative)

You can also use the system from the command line for indexing and matching.

1.  **Index Resumes**:
    ```bash
    python main.py --index DATA_resume/
    ```

2.  **Match with Job Description**:
    ```bash
    python main.py --job JOB_DESCRIPTIONS/job.pdf -n 5
    ```

## πŸ”§ Components

-   **Backend**: FastAPI, ChromaDB, SentenceTransformers
-   **Frontend**: React, TypeScript, Vite, Tailwind CSS, shadcn/ui
-   **AI**: Ollama, Mistral

---
Built with ❀️ for making recruitment smarter
  1. Install dependencies:
pip install -r requirements.txt
  1. Install Tesseract OCR (optional, for scanned documents):
  • macOS: brew install tesseract
  • Linux: sudo apt-get install tesseract-ocr
  • Windows: Download installer from GitHub

🎯 Usage

  1. Index Resumes:
python main.py --index DATA_resume/
  1. Match with Job Description:
python main.py --job JOB_DESCRIPTIONS/job.pdf -n 5
  1. Direct Query Search:
python main.py --query "python developer with 5 years experience" -n 3

πŸ”§ Components

Note: AI Enhancement features require Mistral AI setup. Other components work independently.

1. Knowledge Extraction (350+ lines)

  • PDF Parser: Advanced text extraction with OCR support
  • Word Parser: Microsoft Word document processing
  • Universal Parser: Common interface for all document types

2. Vector Database (170+ lines)

  • ChromaDB integration
  • Efficient similarity search
  • Section-level matching

3. Text Embeddings (80+ lines)

  • SentenceTransformer models
  • Section-wise embeddings
  • Metadata handling

4. AI Enhancement (40+ lines)

  • Mistral AI integration
  • Enhanced analysis
  • Match summarization

5. Core Application (300+ lines)

  • Command line interface
  • Batch processing
  • Results export

πŸ“Š Output Format

The system generates detailed JSON analysis:

{
    "rank": 1,
    "id": "resume_123",
    "filename": "candidate.pdf",
    "similarity": 0.89,
    "sections": {
        "experience": 0.92,
        "skills": 0.85,
        "education": 0.78
    }
}

πŸ“ˆ Performance

  • Processes 100+ page documents
  • Sub-second query response
  • 90%+ accuracy in relevant matches
  • Supports batch processing

🀝 Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a new Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Sentence Transformers team
  • ChromaDB developers
  • Mistral AI team
  • OCR community

Built with ❀️ for making recruitment smarter

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support