PhdScout

AI-powered search agent for PhD positions, postdocs, research fellowships, and academic staff roles. Powered by the Groq free API — no subscriptions required.

100% Free · Groq API · Gradio UI · Python 3.10+
🔍 Multi-source Search: 5 job boards searched simultaneously — Europe, worldwide, and country-specific

🤖 AI Scoring: Each position scored 0–100 against your CV profile

✉️ Cover Letters: Personalised draft generated for every position

📦 ZIP Export: Download all approved applications in one click

How it works

Upload your CV

PDF, DOCX, or TXT. The LLM extracts a structured profile: education, publications, skills, research interests.

Search job boards

PhdScout queries Euraxess, mlscientist.com, jobs.ac.uk, scholarshipdb.net, and nature.com/careers in parallel, then deduplicates and filters by recency (expired listings discarded).
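
A minimal sketch of this parallel fan-out and URL-based deduplication (illustrative only; the real orchestration lives in agent/search/searcher.py and may differ):

from concurrent.futures import ThreadPoolExecutor

def search_all(scrapers, field, location, position_type):
    # Run every scraper concurrently; each returns a list of job dicts.
    with ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
        batches = pool.map(
            lambda s: s.scrape(field, location, position_type), scrapers
        )
    # Deduplicate by URL so a listing found on two boards appears once.
    seen, unique = set(), []
    for job in (j for batch in batches for j in batch):
        if job["url"] not in seen:
            seen.add(job["url"])
            unique.append(job)
    return unique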

Score & rank

Each position is scored 0–100 for fit. The LLM reasons semantically — "NLP" and "natural language processing" are treated as equivalent. Postdoc and fellowship positions are automatically penalised when the candidate's CV shows no completed or in-progress PhD.

Review & edit

Load any position to see CV tailoring hints and a draft cover letter. Edit freely before approving.

Export

Download all approved applications as a ZIP containing cover letters and position summaries.
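
Under the hood this is a standard ZIP archive; a sketch with Python's zipfile module (the folder layout and dict keys here are assumptions, not the actual export format):

import zipfile

def export_zip(approved_jobs, path="./output/applications.zip"):
    # One numbered folder per approved job: letter plus position summary.
    with zipfile.ZipFile(path, "w") as zf:
        for i, job in enumerate(approved_jobs, 1):
            zf.writestr(f"{i:02d}/cover_letter.txt", job["cover_letter"])
            zf.writestr(f"{i:02d}/summary.txt", job["summary"])
    return path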

Installation

PhdScout runs locally with Python 3.10+ or on HuggingFace Spaces.

Clone & install

git clone https://github.com/Hipsterfil998/PhDScout.git
cd PhDScout
pip install -r requirements.txt

Get a Groq API key

ℹ️ Groq provides a generous free tier — no credit card required. Register at console.groq.com/keys.

Configure

Create a .env file in the project root:

# Required
LLM_BACKEND=groq
GROQ_API_KEY=gsk_your_key_here

# Optional overrides (see Configuration section)
OUTPUT_DIR=./output
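
For reference, python-dotenv makes these values visible to the app roughly like this (a sketch; the defaults shown mirror the Configuration section, and the actual loading code may differ):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into os.environ

llm_backend = os.getenv("LLM_BACKEND", "ollama")   # groq / huggingface / ollama
groq_api_key = os.getenv("GROQ_API_KEY")           # required for the Groq backend
output_dir = os.getenv("OUTPUT_DIR", "./output")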

Run

python app.py

Open http://localhost:7860 in your browser.

Dependencies

| Package | Purpose |
| --- | --- |
| openai | Groq and Ollama API client (OpenAI-compatible) |
| gradio | Web UI |
| pdfplumber | PDF text extraction |
| python-docx | DOCX text extraction |
| beautifulsoup4 + lxml | HTML scraping |
| requests | HTTP client for scrapers |
| python-dotenv | .env loading |

Quickstart

From zero to your first scored job list in under 5 minutes.

Upload your CV

Click the upload area and select your PDF, DOCX, or TXT file.

Fill in the search fields

Enter a research field ("machine learning", "computational neuroscience"…), choose a location, and pick a position type.

Click "Parse CV & Search Positions"

Wait ~2–3 minutes. The agent scrapes all sources, parses your CV, and scores every match.

Review results

Switch to the Results tab. Positions are sorted by posting date (newest first) and labelled with a freshness indicator.

Generate & approve cover letters

In Review & Edit, select a position, read the CV hints, edit the draft, and click Approve & Save.

Export

Go to the Export tab and download the ZIP.

💡 Tip: Use comma-separated fields for broader searches: "machine learning, NLP, computer vision".

Web Interface

The Gradio UI is organised into three tabs.

Tab 1 — Setup & Search

| Field | Description |
| --- | --- |
| CV upload | PDF, DOCX, or TXT file |
| Research field | Free-text or comma-separated list |
| Location | 40+ countries or custom value |
| Position type | PhD, postdoc, predoctoral, fellowship, research staff |
| Min. match score | Threshold for the "above score" count (all positions still visible) |

Tab 2 — Results

Displays a scored table with columns: #, Score, Title, Institution, Type, Freshness, Rec., Why good fit.

Freshness labels

| Label | Meaning |
| --- | --- |
| 🟢 Recent | Posted within the last 30 days |
| 🟡 Older | Has a date, posted more than 30 days ago |
| 🔴 Closing soon | Deadline within 14 days |
| (empty) | No date information available |

ℹ️ Expired listings (deadline already passed, or posted in a previous year) are automatically excluded from results.

Tab 3 — Review & Edit

Select a position from the dropdown, click Load Position, then review the CV tailoring hints, edit the draft cover letter, and click Approve & Save.

Command-Line Interface

For batch use or scripting, PhdScout exposes a CLI via main.py.

Basic usage

python main.py \
  --cv path/to/cv.pdf \
  --field "machine learning" \
  --location "Germany" \
  --type phd
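
For example, a simple shell loop sweeps several fields in one run (adjust the paths and values to taste):

for field in "machine learning" "NLP" "computer vision"; do
  python main.py --cv cv.pdf --field "$field" --location "Germany" --type phd
done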

Options

| Flag | Default | Description |
| --- | --- | --- |
| --cv | required | Path to CV file (PDF, DOCX, TXT) |
| --field | required | Research field(s), comma-separated |
| --location | Europe | Location filter |
| --type | phd | Position type |
| --min-score | 60 | Minimum match score to show |

Python API

from agent import JobAgent

agent = JobAgent(
    model="llama-3.1-8b-instant",
    backend="groq",
    api_key="gsk_...",
)

# Extract a structured profile and a plain-text summary from the CV.
profile, profile_text = agent.parse_cv("cv.pdf")

# Scrape all sources in parallel, deduplicate, and drop stale listings.
jobs = agent.search_jobs(field="NLP", location="Europe", position_type="phd")

# Score each position 0-100 against the profile; results come back sorted.
scored = agent.score_jobs(jobs, profile_text)

# Print the five best matches.
for job in scored[:5]:
    m = job["match"]
    print(m["match_score"], job["title"], job.get("freshness"))

Job Sources

🇪🇺 Euraxess: EU/worldwide research portal. Country-filtered via API parameters.

🤖 mlscientist.com: ML & AI academic positions. 14 country categories supported.

🇬🇧 jobs.ac.uk: UK academic jobs. Queried only when UK or Worldwide is selected.

🌍 scholarshipdb.net: Worldwide aggregator with 28k+ positions across all disciplines. Country-filtered via URL path.

🔬 nature.com/careers: Multidisciplinary global board. Keyword search + ISO country code filtering.

Freshness filtering

After scraping, PhdScout automatically removes:

  - duplicate listings found on more than one board
  - listings whose application deadline has already passed
  - listings posted in a previous year
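
The surviving listings are then labelled. A sketch of how the labels map onto the config thresholds (illustrative; the real logic lives in the search pipeline):

from datetime import date, timedelta

def freshness_label(posted=None, deadline=None,
                    recent_days=30, deadline_warn_days=14):
    # Thresholds mirror recent_days / deadline_warn_days in config.py.
    # Expired listings were already removed before this step.
    today = date.today()
    if deadline and deadline - today <= timedelta(days=deadline_warn_days):
        return "🔴 Closing soon"
    if posted and today - posted <= timedelta(days=recent_days):
        return "🟢 Recent"
    if posted:
        return "🟡 Older"
    return ""  # no date information available
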
PhD eligibility gate

Before scoring, PhdScout checks whether the candidate holds or is pursuing a PhD and enforces two caps on postdoc and fellowship positions:

| Candidate status | Postdoc / Fellowship score cap |
| --- | --- |
| No PhD detected in CV | ≤ 30 — set to skip |
| PhD in progress (candidate / student) | ≤ 65 |
| PhD completed | No cap |

ℹ️ This gate is enforced at two levels: in the LLM prompt (via JOB_MATCHER_PROMPT) and in code (agent/matching/matcher.py) as a safety net. PhD positions are always open to master's graduates — no cap applies.
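
As a rough picture, the code-level safety net amounts to a cap keyed on candidate status (a sketch using the values from the table above; the real implementation in agent/matching/matcher.py may differ):

CAPS = {"no_phd": 30, "phd_in_progress": 65}

def apply_eligibility_cap(score, position_type, phd_status):
    # Only postdoc and fellowship positions are capped; PhD positions never are.
    if position_type in ("postdoc", "fellowship"):
        cap = CAPS.get(phd_status)
        if cap is not None:
            return min(score, cap)
    return score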

Adding a source

Create a new file in agent/search/scrapers/ that subclasses BaseScraper:

from agent.search.scrapers.base import BaseScraper

class MyScraper(BaseScraper):
    name = "mysource"

    def scrape(self, field, location, position_type):
        # _fetch (from BaseScraper) returns parsed HTML, or None on failure.
        soup = self._fetch(f"https://example.com/jobs?q={field}")
        if soup is None:
            return []
        results = []
        for card in soup.select(".job-card"):
            results.append({
                "title": card.select_one("h2").text,
                "url": card.select_one("a")["href"],
                "posted": card.select_one(".date").text,
                "source": self.name,
                # _detect_type infers phd/postdoc/... from the listing text.
                "type": self._detect_type(card.text, ""),
            })
        return results

Then register it in agent/search/searcher.py → _build_scrapers().
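
The registration might look like this (class names other than MyScraper are hypothetical; check the actual _build_scrapers body for the exact shape):

# agent/search/searcher.py
def _build_scrapers(self):
    return [
        EuraxessScraper(),
        # ... other built-in scrapers ...
        MyScraper(),  # newly added source
    ]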

Configuration

All settings live in config.py. Edit the file directly: the CLI picks up changes on its next run, while the Gradio app must be restarted to apply them.

LLM settings

| Parameter | Default | Description |
| --- | --- | --- |
| default_model | llama-3.1-8b-instant | Groq model to use |
| max_tokens | 4096 | Max tokens per LLM response |
| llm_backend | ollama | Backend: groq / huggingface / ollama |

Scraper settings

| Parameter | Default | Description |
| --- | --- | --- |
| scraper_delay | 1.5 s | Polite delay between HTTP requests |
| max_results_per_source | 20 | Max listings fetched per source |

Freshness thresholds

| Parameter | Default | Description |
| --- | --- | --- |
| recent_days | 30 | Days since posting → 🟢 Recent |
| deadline_warn_days | 14 | Days until deadline → 🔴 Closing soon |

UI defaults

| Parameter | Default | Description |
| --- | --- | --- |
| min_score_default | 60 | Default minimum match score slider value |
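
Assembled from the tables above, config.py might look roughly like this (a dataclass-style sketch; the file's actual names and structure may differ):

from dataclasses import dataclass

@dataclass
class Config:
    # LLM settings
    default_model: str = "llama-3.1-8b-instant"
    max_tokens: int = 4096
    llm_backend: str = "ollama"          # groq / huggingface / ollama
    # Scraper settings
    scraper_delay: float = 1.5           # seconds between HTTP requests
    max_results_per_source: int = 20
    # Freshness thresholds
    recent_days: int = 30
    deadline_warn_days: int = 14
    # UI defaults
    min_score_default: int = 60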

Environment variables

| Variable | Description |
| --- | --- |
| GROQ_API_KEY | Groq API key (takes priority over HF_TOKEN) |
| HF_TOKEN | HuggingFace token (fallback backend) |
| LLM_BACKEND | Override backend: groq / huggingface / ollama |
| OUTPUT_DIR | Output directory for ZIP exports (default: ./output) |

Prompts

All LLM prompts live in agent/prompts/. Each service has its own file — edit the relevant file to tune that part of the agent's behaviour.

⚠️ Prompts use Python .format() placeholders like {profile}. Keep all placeholders intact when editing.
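
For instance, the templates are rendered with str.format(), so every {placeholder} must survive your edits. The template text and the {job} placeholder below are illustrative, not the shipped prompt:

TEMPLATE = "Candidate profile:\n{profile}\n\nPosition:\n{job}\n\nScore the fit from 0 to 100."

rendered = TEMPLATE.format(profile=profile_text, job=job_description)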

Available prompts

| File | Constants | Used by | Controls |
| --- | --- | --- | --- |
| agent/prompts/cv_parser.py | CV_PARSER_SYSTEM, CV_PARSER_PROMPT | CVParser | How the CV is structured into JSON. Tweak to extract custom fields. |
| agent/prompts/job_matcher.py | JOB_MATCHER_SYSTEM, JOB_MATCHER_PROMPT | JobMatcher | Scoring criteria, eligibility gate, and scoring guide. Edit thresholds here. |
| agent/prompts/cv_tailor.py | CV_TAILOR_SYSTEM, CV_TAILOR_PROMPT | CVTailor | What tailoring hints to produce and how specific to be. |
| agent/prompts/cover_letter.py | COVER_LETTER_SYSTEM, COVER_LETTER_PROMPT | CoverLetterWriter | Letter style, length, structure, and language detection. |

Example: changing the letter length

In agent/prompts/cover_letter.py, find COVER_LETTER_SYSTEM and change:

# Before
The letter should be 400-600 words (3-4 paragraphs).

# After
The letter should be 250-350 words (2-3 paragraphs).

Example: stricter scoring

In JOB_MATCHER_PROMPT, raise the thresholds in the scoring guide:

Scoring guide:
  85-100: Excellent — perfect research keyword overlap, recent publications
  70-84:  Good — strong overlap on primary research area
  50-69:  Partial — some overlap, transferable skills
  0-49:   Skip — different area or missing key requirements

Architecture

Project structure

PhDScout/
├── app.py                    # Gradio web interface
├── config.py                 # Runtime settings (model, thresholds, delays)
├── main.py                   # CLI entry point
├── requirements.txt
├── agent/
│   ├── __init__.py           # Public API: JobAgent, LLMQuotaError
│   ├── pipeline.py           # JobAgent orchestrator
│   ├── base_service.py       # BaseLLMService base class
│   ├── llm_client.py         # Groq / HuggingFace / Ollama client
│   ├── utils.py              # JSON parsing, shared helpers
│   ├── prompts/              # LLM prompts — one file per service
│   │   ├── cv_parser.py      # CV extraction prompts
│   │   ├── job_matcher.py    # Scoring + eligibility gate prompts
│   │   ├── cv_tailor.py      # Tailoring hints prompts
│   │   └── cover_letter.py   # Cover letter prompts
│   ├── cv/                   # CV-related services
│   │   ├── parser.py         # CV extraction + LLM parsing
│   │   ├── tailor.py         # Tailoring hints generator
│   │   └── cover_letter.py   # Cover letter writer
│   ├── matching/             # Scoring engine
│   │   └── matcher.py        # JobMatcher + PhD eligibility cap
│   └── search/               # Job search infrastructure
│       ├── searcher.py       # JobSearcher (orchestrates scrapers)
│       └── scrapers/
│           ├── base.py             # BaseScraper ABC + shared helpers
│           ├── euraxess.py         # EU/worldwide research portal
│           ├── mlscientist.py      # ML & AI academic positions
│           ├── jobs_ac_uk.py       # UK academic jobs (UK/worldwide only)
│           ├── scholarshipdb.py    # Worldwide aggregator (28k+ positions)
│           └── nature_careers.py   # nature.com/careers — multidisciplinary
└── tests/                    # 156 unit tests (pytest)

Pipeline flow

CV file
  ↓ CVParser.extract_raw_text()
Raw text
  ↓ CVParser.parse() → LLM → CVProfile JSON
  ↓ CVParser.summarize() → profile_text
profile_text
  ↓ (in parallel with search)
  ↓ JobSearcher.search() → scrapers → deduplicate → filter stale → label freshness
jobs[]
  ↓ JobMatcher.score_all() → LLM × N → sort by score
scored_jobs[]
  ↓ (per selected job)
  ↓ CVTailor.generate() → LLM → TailoringHints
  ↓ CoverLetterWriter.generate() → LLM → draft letter
approved_jobs[] → ZIP export

LLM backends

| Backend | env var | Notes |
| --- | --- | --- |
| Groq (recommended) | GROQ_API_KEY | Free tier, fast, OpenAI-compatible |
| Ollama | — | Local inference; set LLM_BACKEND=ollama |
| HuggingFace | HF_TOKEN | Fallback; free tier has rate limits |
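
Because Groq and Ollama both expose OpenAI-compatible endpoints, one client covers them. A sketch of the switch (the base URLs are the services' standard OpenAI-compatible endpoints; the real wiring, including the HuggingFace fallback, lives in agent/llm_client.py):

import os
from openai import OpenAI

BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "ollama": "http://localhost:11434/v1",  # local inference, no key needed
}

backend = os.getenv("LLM_BACKEND", "groq")
client = OpenAI(
    base_url=BASE_URLS[backend],
    api_key=os.getenv("GROQ_API_KEY", "ollama"),  # Ollama ignores the key
)
reply = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello"}],
)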

Deployment

HuggingFace Spaces (recommended)

Fork or create a Space

Go to huggingface.co/spaces → New Space → SDK: Gradio.

Push the code

Add the Space as a remote and push: git push space main

Set secrets

In Space Settings → Variables and Secrets, add GROQ_API_KEY.

Add HF frontmatter to README

Run ./push_to_hf.sh — it injects the required YAML frontmatter automatically.

GitHub Pages (this documentation)

💡 This documentation is a single HTML file at docs/index.html — no build step required.

To enable GitHub Pages:

  1. Go to your GitHub repo → Settings → Pages
  2. Source: Deploy from a branch
  3. Branch: main / folder: /docs
  4. Click Save

The docs will be live at https://<username>.github.io/PhDScout.

Editing the docs

To modify this documentation directly on GitHub:

  1. Go to your repo on GitHub
  2. Navigate to docs/index.html
  3. Click the pencil icon (Edit this file)
  4. Edit the HTML — each section is a <section class="section" id="..."> block
  5. Commit directly to main — GitHub Pages rebuilds automatically

ℹ️ The navigation links are wired by JavaScript at the bottom of the file. To add a new section: add a <button> in the sidebar and a matching <section> in the main area.