metadata
title: CV-Extractor
emoji: 📸
sdk: streamlit
sdk_version: 1.37.1
app_file: app.py
CV Analyzer (AI-Powered Resume Parser)
A Streamlit app that extracts structured data from PDF CVs using Docling, an LLM agent, and a Pydantic schema, then converts the result into a downloadable CSV.
Features
- Upload CV (PDF)
- Parse document using Docling
- Extract structured data using LLM agent
- Validate with Pydantic schema
- Convert to Pandas DataFrame
- View extracted data in UI
- Download as CSV
Tech Stack
- Streamlit – UI
- Docling – PDF parsing
- Pydantic / pydantic-ai – structured extraction
- Hugging Face / LLM – inference
- Pandas – data processing
Setup
1. Clone repo
git clone https://github.com/your-username/cv-analyzer.git
cd cv-analyzer
2. Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
.venv\Scripts\activate # Windows
3. Install dependencies
pip install -r requirements.txt
4. Environment variables
Create a .env file:
HF_TOKEN=your_huggingface_token
.env is ignored via .gitignore
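As a rough illustration of how the app might read the token at startup, the sketch below fails fast when HF_TOKEN is missing. It assumes the values from .env have already been loaded into the process environment (for example by a dotenv loader, which this README does not confirm); the helper name get_hf_token is hypothetical.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token, or raise if it is missing or blank.

    Hypothetical helper: `env` defaults to os.environ and is a parameter
    only so the function can be exercised without touching the real
    environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to .env or export it in the shell"
        )
    return token
```

Checking once at startup surfaces a missing token before a CV is uploaded, rather than as an inference error halfway through a run.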
Run App
streamlit run app.py
How it works
- User uploads CV (PDF)
- Docling converts PDF → structured text/markdown
- LLM agent extracts data using predefined schema
- Output is validated via Pydantic
- Data is converted into a DataFrame
- User can view and download CSV
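The steps above can be sketched as a small pipeline. This is an illustration under assumptions, not the app's actual code: CVRecord is a hypothetical, heavily reduced schema, run_pipeline and its parse/extract parameters are made-up names, and the LLM agent is passed in as a plain callable because its real configuration is not shown here. The Docling call uses DocumentConverter and export_to_markdown; the Pydantic calls assume Pydantic v2.

```python
from typing import Callable, Optional

import pandas as pd
from pydantic import BaseModel


class CVRecord(BaseModel):
    """Hypothetical two-field schema; the real schema is larger."""
    name: Optional[str] = None
    email: Optional[str] = None


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to Markdown with Docling."""
    # Imported lazily so the rest of the sketch runs without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    return result.document.export_to_markdown()


def run_pipeline(
    pdf_path: str,
    parse: Callable[[str], str],
    extract: Callable[[str], dict],
) -> bytes:
    """Run parse -> extract -> validate -> DataFrame -> CSV bytes."""
    markdown = parse(pdf_path)                    # Docling step
    raw = extract(markdown)                       # LLM agent step (assumed to return a dict)
    record = CVRecord.model_validate(raw)         # Pydantic validation
    frame = pd.DataFrame([record.model_dump()])   # one row per CV
    return frame.to_csv(index=False).encode("utf-8")
```

In the Streamlit UI, the returned bytes could be handed to st.download_button(..., data=csv_bytes, mime="text/csv"), and parse=pdf_to_markdown would be passed for real PDFs. Keeping the parser and the agent as parameters also lets the flow be tried with stubs, without model access.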
Notes
- Schema is designed for AI/ML-focused resumes
- Missing fields are returned as null (no-hallucination policy)
- Dates are stored as strings to avoid parsing errors
- Validation is relaxed to improve LLM compatibility
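One way these notes could translate into a schema is shown below. The field names are assumptions chosen for illustration, not the app's real schema. Every scalar field is optional and defaults to None, list fields default to empty lists, and dates stay plain strings so values such as "Present" or "2021" do not fail validation. The calls assume Pydantic v2.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Education(BaseModel):
    """Hypothetical education entry."""
    degree: Optional[str] = None
    institution: Optional[str] = None
    # Dates are kept as free-form strings instead of datetime.date.
    start_date: Optional[str] = None
    end_date: Optional[str] = None


class CVData(BaseModel):
    """Hypothetical top-level schema: nothing is required."""
    name: Optional[str] = None
    email: Optional[str] = None
    skills: List[str] = Field(default_factory=list)
    education: List[Education] = Field(default_factory=list)


# A partial LLM output still validates; absent fields come back as None.
cv = CVData.model_validate(
    {"name": "Jane Doe", "education": [{"degree": "MSc", "end_date": "Present"}]}
)
print(cv.email)                   # None
print(cv.education[0].end_date)   # Present
```

Making every field optional means the model can leave a field empty rather than invent a value for it. The trade-off is that the schema itself will not flag an incomplete CV.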
Limitations
- LLM may still produce inconsistent outputs for poorly formatted CVs
- Complex layouts (tables, multi-column PDFs) may affect parsing quality
- Requires internet access for model inference
Future Improvements
- Multi-CV batch processing
- Candidate scoring & ranking
- Semantic search over resumes (FAISS)
- UI improvements (filters, charts)
- Export to JSON / Excel
License
MIT License