title: CV-Extractor
emoji: 📸
sdk: streamlit
sdk_version: 1.37.1
app_file: app.py

CV Analyzer (AI-Powered Resume Parser)

A Streamlit app that extracts structured data from PDF CVs using Docling, an LLM agent, and a Pydantic schema, then converts the result into a clean, downloadable CSV.

Features

Upload CV (PDF)
Parse document using Docling
Extract structured data using LLM agent
Validate with Pydantic schema
Convert to Pandas DataFrame
View extracted data in UI
Download as CSV

Tech Stack

Streamlit – UI
Docling – PDF parsing
Pydantic / pydantic-ai – structured extraction
Hugging Face / LLM – inference
Pandas – data processing

Setup

1. Clone repo

git clone https://github.com/your-username/cv-analyzer.git
cd cv-analyzer

2. Create virtual environment

python -m venv .venv
source .venv/bin/activate   # Linux/macOS
.venv\Scripts\activate      # Windows

3. Install dependencies

pip install -r requirements.txt

4. Environment variables

Create a .env file:

HF_TOKEN=your_huggingface_token

The .env file is listed in .gitignore, so the token is never committed.
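
As a minimal sketch of how the token might be read at startup: the helper name `get_hf_token` is hypothetical and not taken from app.py. A .env file is not loaded into the process environment on its own, so this assumes something (for example `load_dotenv()` from python-dotenv, if the app uses it) has already populated `HF_TOKEN`.

```python
import os


def get_hf_token() -> str:
    """Return the Hugging Face token, or fail early with a clear message.

    Hypothetical helper; assumes HF_TOKEN is already in the environment.
    """
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to .env or export it in your shell."
        )
    return token
```

Failing at startup rather than on the first model call makes a missing token easier to diagnose.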

Run App

streamlit run app.py

How it works

User uploads CV (PDF)
Docling converts PDF → structured text/markdown
LLM agent extracts data using predefined schema
Output is validated via Pydantic
Data is converted into a DataFrame
User can view and download CSV
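
The last two steps above can be sketched roughly as follows. This is an illustration, not the code in app.py: the function `records_to_csv` and the example record are hypothetical, and joining list values with "; " is an assumed way of keeping each CV on a single row.

```python
import pandas as pd


def records_to_csv(records: list[dict]) -> str:
    """Flatten validated CV records into CSV text, one row per CV."""
    rows = []
    for record in records:
        row = {}
        for key, value in record.items():
            # Lists (e.g. skills) are joined so each CV stays on one row.
            if isinstance(value, list):
                row[key] = "; ".join(str(item) for item in value)
            else:
                row[key] = value
        rows.append(row)
    # Missing values (None) are written as empty cells by default.
    return pd.DataFrame(rows).to_csv(index=False)


# Hypothetical record, shaped like the output of the validation step.
example = {"name": "Ada Lovelace", "email": None, "skills": ["Python", "PyTorch"]}
csv_text = records_to_csv([example])
```

In a Streamlit app, text like `csv_text` could then be handed to `st.download_button` with `mime="text/csv"` to offer the download.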

Notes

Schema is designed for AI/ML-focused resumes
Missing fields are returned as null rather than guessed (no-hallucination policy)
Dates are stored as strings to avoid parsing errors
Validation is deliberately relaxed so that LLM output with minor deviations is still accepted
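
To make these notes concrete, here is a small sketch of what such a schema could look like. The model `CVRecord` and its field names are hypothetical and not the project's actual schema; they only illustrate optional fields defaulting to null, dates kept as plain strings, and lenient handling of unexpected keys (Pydantic ignores extra fields by default).

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class CVRecord(BaseModel):
    """Hypothetical schema; every field is optional so gaps become None."""

    name: Optional[str] = None
    email: Optional[str] = None
    # Kept as free text: "Summer 2021" or "2019-06" both pass unchanged.
    graduation_date: Optional[str] = None
    skills: List[str] = Field(default_factory=list)


# Hypothetical LLM output: no email, a loosely formatted date, one extra key.
llm_output = {
    "name": "Ada Lovelace",
    "graduation_date": "Summer 2021",
    "hobby": "chess",
}
record = CVRecord(**llm_output)
```

Here `record.email` stays `None` instead of being invented, the date is stored exactly as extracted, and the unknown `hobby` key is dropped without raising an error.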

Limitations

LLM may still produce inconsistent outputs for poorly formatted CVs
Complex layouts (tables, multi-column PDFs) may affect parsing quality
Requires internet access for model inference

Future Improvements

Multi-CV batch processing
Candidate scoring & ranking
Semantic search over resumes (FAISS)
UI improvements (filters, charts)
Export to JSON / Excel

License

MIT License