Resume Analysis and Matching System πβ¨
A sophisticated resume analysis and matching system that uses RAG (Retrieval Augmented Generation) to match resumes with job descriptions intelligently.
π Features
- π Multi-Format Support: Process resumes in PDF and Word formats
- π Advanced Text Extraction: OCR capabilities for scanned documents
- π§ Intelligent Matching: Uses embeddings and semantic search
- πΎ Vector Database: ChromaDB for efficient similarity search
- π€ AI Enhancement: Mistral AI for advanced analysis
- π Structured Output: JSON format analysis results.
ποΈ Project Structure
RAG/
βββ CHROMA_DB/ # Vector database management
βββ DATA_resume/ # Sample resumes
βββ JOB_DESCRIPTIONS/ # Job description PDFs
βββ KNOWLEDGE_EXTRACTOR/ # Document parsing
βββ SLM_manager/ # AI augmentation
βββ TEXT_EMBEDDING_MODEL/ # Text embedding generation
π Getting Started
The system operates in two modes:
- Basic Mode: Resume matching using vector similarity (always available)
- Enhanced Mode: AI-powered analysis using Mistral (requires Ollama setup)
Prerequisites
- Python 3.10 or higher
- Virtual environment
- Tesseract OCR (for scanned documents)
- Ollama with Mistral AI model (for enhanced analysis)
Installation
Basic Setup
Python Environment Setup:
# Create and activate virtual environment python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate # Install dependencies pip install -r requirements.txtTesseract OCR (Optional - for scanned documents):
- macOS:
brew install tesseract - Linux:
sudo apt-get install tesseract-ocr - Windows: Download installer from GitHub
- macOS:
Mistral AI Setup (Required for Enhanced Analysis)
Install Ollama:
- macOS/Linux:
curl https://ollama.ai/install.sh | sh - Windows: Download from Ollama's website
- macOS/Linux:
Pull Mistral Model:
ollama pull mistralVerify Installation:
ollama run mistral "Hello, testing Mistral AI"
β οΈ Important Note: The enhanced analysis features require Mistral AI through Ollama. If you don't have Mistral AI set up:
- Basic resume matching will still work
- AI-enhanced analysis features will be disabled
- You can still use the system with reduced functionality
Installation
- Clone the repository:
git clone https://github.com/deepanmpc/ResumeAnalyse_RAG-Architecture.git
cd RAG
- Create and activate virtual environment:
# Resume Analysis and Matching System πβ¨
A sophisticated resume analysis and matching system that uses RAG (Retrieval Augmented Generation) to match resumes with job descriptions intelligently.
## π Features
- π **Multi-Format Support**: Process resumes in PDF and Word formats.
- π **Advanced Text Extraction**: OCR capabilities for scanned documents.
- π§ **Intelligent Matching**: Uses embeddings and semantic search to find the best candidates.
- πΎ **Vector Database**: ChromaDB for efficient similarity search and storage.
- π€ **AI Enhancement**: Mistral AI for advanced analysis and summarization.
- π **Structured Output**: JSON format for analysis results.
- π₯οΈ **Interactive Web UI**: A React-based frontend for a user-friendly experience.
## π₯οΈ Web Frontend
The project includes a modern and interactive web-based user interface built with React, TypeScript, and Vite.
### Frontend Features
- **Resume Matching Dashboard**: Upload a job description and see the top matching resumes.
- **Detailed Match View**: For each matched resume, view details like:
- Resume file name.
- The section that matched best (e.g., "experience", "skills").
- A similarity score.
- The relevant text from the resume that matched the job description.
- **AI Summary Display**: Shows an AI-generated summary of the top matches. It gracefully handles and displays errors if the summary generation fails (e.g., if the AI model is not available).
- **User-Friendly Interface**: Built with modern UI components for a smooth experience.
## π Getting Started
### Prerequisites
- Python 3.10 or higher
- Node.js and npm (or yarn/pnpm)
- Tesseract OCR (for scanned documents)
- Ollama with Mistral AI model (for enhanced analysis)
### Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd RAG
```
2. **Backend Setup**:
```bash
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install Python dependencies
pip install -r requirements.txt
```
3. **Frontend Setup**:
```bash
# Navigate to the web directory
cd web
# Install Node.js dependencies
npm install
```
4. **Tesseract OCR** (Optional - for scanned documents):
- macOS: `brew install tesseract`
- Linux: `sudo apt-get install tesseract-ocr`
- Windows: Download installer from GitHub
5. **Mistral AI Setup** (Optional - for enhanced analysis):
- [Install Ollama](https://ollama.ai)
- Pull the Mistral model: `ollama pull mistral`
## π― Usage
To run the application, you need to start both the backend server and the frontend development server.
1. **Start the Backend Server**:
From the project root directory (`RAG/`):
```bash
uvicorn api:app --reload
```
The API will be available at `http://127.0.0.1:8000`.
2. **Start the Frontend Server**:
In a new terminal, navigate to the `web/` directory:
```bash
cd web
npm run dev
```
The web application will be available at `http://localhost:5173` (or another port if 5173 is busy).
3. **Using the Application**:
- Open your browser to the frontend URL.
- Use the dashboard to upload a job description and see the matching resumes.
### Command-Line Usage (Alternative)
You can also use the system from the command line for indexing and matching.
1. **Index Resumes**:
```bash
python main.py --index DATA_resume/
```
2. **Match with Job Description**:
```bash
python main.py --job JOB_DESCRIPTIONS/job.pdf -n 5
```
## π§ Components
- **Backend**: FastAPI, ChromaDB, SentenceTransformers
- **Frontend**: React, TypeScript, Vite, Tailwind CSS, shadcn/ui
- **AI**: Ollama, Mistral
---
Built with β€οΈ for making recruitment smarter
- Install dependencies:
pip install -r requirements.txt
- Install Tesseract OCR (optional, for scanned documents):
- macOS:
brew install tesseract - Linux:
sudo apt-get install tesseract-ocr - Windows: Download installer from GitHub
π― Usage
- Index Resumes:
python main.py --index DATA_resume/
- Match with Job Description:
python main.py --job JOB_DESCRIPTIONS/job.pdf -n 5
- Direct Query Search:
python main.py --query "python developer with 5 years experience" -n 3
π§ Components
Note: AI Enhancement features require Mistral AI setup. Other components work independently.
1. Knowledge Extraction (350+ lines)
- PDF Parser: Advanced text extraction with OCR support
- Word Parser: Microsoft Word document processing
- Universal Parser: Common interface for all document types
2. Vector Database (170+ lines)
- ChromaDB integration
- Efficient similarity search
- Section-level matching
3. Text Embeddings (80+ lines)
- SentenceTransformer models
- Section-wise embeddings
- Metadata handling
4. AI Enhancement (40+ lines)
- Mistral AI integration
- Enhanced analysis
- Match summarization
5. Core Application (300+ lines)
- Command line interface
- Batch processing
- Results export
π Output Format
The system generates detailed JSON analysis:
{
"rank": 1,
"id": "resume_123",
"filename": "candidate.pdf",
"similarity": 0.89,
"sections": {
"experience": 0.92,
"skills": 0.85,
"education": 0.78
}
}
π Performance
- Processes 100+ page documents
- Sub-second query response
- 90%+ accuracy in relevant matches
- Supports batch processing
π€ Contributing
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
π License
This project is licensed under the MIT License - see the LICENSE file for details.
π Acknowledgments
- Sentence Transformers team
- ChromaDB developers
- Mistral AI team
- OCR community
Built with β€οΈ for making recruitment smarter