# ⚡ AI Recruitment Agent

A production-grade hybrid candidate matching pipeline using **Groq LLM**, **Pinecone vector DB**, and a **Gradio 4.16.0** UI.

## Architecture

```
CSV Input → Stage 1: Normalize (Groq)
          → Stage 2: Embed + Match (Pinecone + SentenceTransformers) → Top 20
          → Stage 3: Deterministic Rerank (Groq) → Top 10
          → Stage 4: LLM Deep Review (Groq) → Top 5
          → Stage 5: Final Synthesis (Groq) → Shortlist
```

## Setup

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure environment

```bash
cp .env.example .env
# Edit .env and fill in your API keys
```

### 3. Create Pinecone index

In your Pinecone console:

- Create an index named `recruitment-index` (or whatever you set in `PINECONE_INDEX`)
- Dimension: must match the embedding model's output size: **1024** for `BAAI/bge-m3` or `bge-large-en`, **768** for `bge-base-en`, **384** for `all-MiniLM-L6-v2`
- Metric: **cosine**
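
The index dimension has to agree with whichever model `EMBEDDING_MODEL` points at, so it can help to derive the index settings from the model name rather than typing them by hand. The following is a minimal sketch under that assumption; `EMBEDDING_DIMS` and `index_settings` are hypothetical names for illustration and are not part of this repository.

```python
# Hypothetical helper: look up the vector size each embedding model produces
# so the Pinecone index can be created with a matching dimension.
EMBEDDING_DIMS = {
    "BAAI/bge-m3": 1024,
    "BAAI/bge-base-en": 768,
    "all-MiniLM-L6-v2": 384,
}


def index_settings(embedding_model: str, index_name: str = "recruitment-index") -> dict:
    """Return the name, dimension, and metric to enter in the Pinecone console."""
    try:
        dimension = EMBEDDING_DIMS[embedding_model]
    except KeyError:
        # Fail loudly instead of guessing a dimension for an unlisted model.
        raise ValueError(f"Unknown embedding model: {embedding_model!r}") from None
    return {"name": index_name, "dimension": dimension, "metric": "cosine"}


if __name__ == "__main__":
    print(index_settings("BAAI/bge-m3"))
```

If the model is changed later, the existing index cannot simply be reused with a different vector size; a new index with the matching dimension is needed.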

### 4. Run the Gradio UI

```bash
python gradio_app.py
```

Open http://localhost:7860 in your browser.

### 5. (Optional) Run the FastAPI backend

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

API docs are available at http://localhost:8000/docs.

## CSV Format

Your CSV should have these columns (exact names or common variants):

| Column | Variants |
|--------|----------|
| `name` | `full_name`, `candidate_name` |
| `email` | `email_address` |
| `skills` | `parsed_skills`, `technical_skills` |
| `experience` | `parsed_work_experience`, `years_of_experience` |
| `education` | `parsed_metadata_education` |
| `resume_text` | `parsed_summary`, `summary` |
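
To illustrate how variant headers could be folded into the canonical column names, here is a small sketch using only the standard library. The mapping is copied from the table above; `normalize_rows` is a hypothetical helper, and matching headers case-insensitively after trimming whitespace is an assumption, not necessarily what the pipeline does.

```python
import csv
import io

# Canonical column -> accepted variants, copied from the table above.
COLUMN_VARIANTS = {
    "name": ["full_name", "candidate_name"],
    "email": ["email_address"],
    "skills": ["parsed_skills", "technical_skills"],
    "experience": ["parsed_work_experience", "years_of_experience"],
    "education": ["parsed_metadata_education"],
    "resume_text": ["parsed_summary", "summary"],
}

# Reverse lookup: any accepted header -> its canonical name.
ALIASES = {canonical: canonical for canonical in COLUMN_VARIANTS}
for canonical, variants in COLUMN_VARIANTS.items():
    for variant in variants:
        ALIASES[variant] = canonical


def normalize_rows(csv_text: str) -> list[dict]:
    """Parse CSV text and rename recognized headers to their canonical names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for header, value in raw.items():
            if header is None:
                continue  # surplus cells beyond the header row
            key = ALIASES.get(header.strip().lower())
            # Unrecognized columns are dropped; the first match for a key wins.
            if key is not None and key not in row:
                row[key] = (value or "").strip()
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = 'full_name,email_address,technical_skills\nAda Lovelace,ada@example.com,"Python, SQL"\n'
    print(normalize_rows(sample))
```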

## Key Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `GROQ_API_KEYS` | Comma-separated keys for rotation | (none) |
| `GROQ_MODEL` | Model name | `llama3-70b-8192` |
| `PINECONE_API_KEY` | Pinecone API key | (none) |
| `PINECONE_INDEX` | Index name | `recruitment-index` |
| `EMBEDDING_MODEL` | SentenceTransformer model | `BAAI/bge-m3` |
| `STAGE2_TOP_K` | Candidates retrieved by embeddings | `20` |
| `GRADIO_PORT` | UI port | `7860` |
| `GRADIO_SHARE` | Enable public share link | `False` |
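
As a rough sketch of how these variables might be read, the snippet below loads them with the defaults from the table and splits `GROQ_API_KEYS` into a list. `load_settings` and `key_rotation` are hypothetical names, and plain round-robin is only one possible reading of "rotation"; the project's actual strategy may differ.

```python
import itertools
import os


def load_settings(env=None) -> dict:
    """Read the variables from the table above, applying the listed defaults."""
    env = os.environ if env is None else env
    keys = [k.strip() for k in env.get("GROQ_API_KEYS", "").split(",") if k.strip()]
    return {
        "groq_api_keys": keys,
        "groq_model": env.get("GROQ_MODEL", "llama3-70b-8192"),
        "pinecone_api_key": env.get("PINECONE_API_KEY"),
        "pinecone_index": env.get("PINECONE_INDEX", "recruitment-index"),
        "embedding_model": env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
        "stage2_top_k": int(env.get("STAGE2_TOP_K", "20")),
        "gradio_port": int(env.get("GRADIO_PORT", "7860")),
        # Assumption: common truthy spellings enable the share link.
        "gradio_share": env.get("GRADIO_SHARE", "False").strip().lower() in {"1", "true", "yes"},
    }


def key_rotation(keys):
    """Return an endless round-robin iterator over the API keys."""
    if not keys:
        raise ValueError("GROQ_API_KEYS is empty")
    return itertools.cycle(keys)


if __name__ == "__main__":
    settings = load_settings({"GROQ_API_KEYS": "key-a, key-b"})
    rotation = key_rotation(settings["groq_api_keys"])
    print([next(rotation) for _ in range(3)])
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.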

## Pipeline Stages

| Stage | Method | Input | Output |
|-------|--------|-------|--------|
| 1. Normalize | Groq LLM | All candidates | Structured features |
| 2. Embed & Match | Pinecone + BAAI/bge-m3 | All candidates | Top 20 by similarity |
| 3. Rerank | Groq LLM (deterministic scoring) | Top 20 | Top 10 with scores |
| 4. Deep Review | Groq LLM | Top 5 | Verdicts + signals |
| 5. Final Synthesis | Groq LLM | Top 5 reviews | Final ranked shortlist |
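
The narrowing in stages 2 to 4 can be pictured as a funnel in which each stage scores the survivors of the previous one and keeps fewer of them. The sketch below uses the cutoffs from the Architecture diagram (20, then 10, then 5) and stand-in scoring functions; `top_n` and `run_funnel` are hypothetical names, and the real stages call Pinecone and Groq rather than local callables. Stage 1 (normalization) and stage 5 (synthesis) are left out because they do not change the candidate count.

```python
from typing import Callable

Candidate = dict
Scorer = Callable[[Candidate], float]


def top_n(candidates: list[Candidate], score: Scorer, n: int) -> list[Candidate]:
    """Keep the n highest-scoring candidates, best first."""
    return sorted(candidates, key=score, reverse=True)[:n]


def run_funnel(
    candidates: list[Candidate],
    similarity: Scorer,     # stand-in for stage 2 embedding similarity
    rerank_score: Scorer,   # stand-in for stage 3 rerank scoring
    review_score: Scorer,   # stand-in for stage 4 deep review
) -> list[Candidate]:
    stage2 = top_n(candidates, similarity, 20)
    stage3 = top_n(stage2, rerank_score, 10)
    stage4 = top_n(stage3, review_score, 5)
    return stage4


if __name__ == "__main__":
    pool = [{"id": i} for i in range(30)]
    shortlist = run_funnel(
        pool,
        similarity=lambda c: c["id"],
        rerank_score=lambda c: -c["id"],
        review_score=lambda c: c["id"],
    )
    print([c["id"] for c in shortlist])
```

The point of the shape is cost: the cheap vector search sees every candidate, while the more expensive LLM calls only see the handful that survive earlier cuts.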