<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>PhdScout - Documentation</title>
<style>
:root {
--bg: #000000;
--surface: #1c1c1e;
--sidebar-bg: rgba(28,28,30,0.88);
--border: rgba(255,255,255,0.10);
--text: #f5f5f7;
--text-secondary: #98989d;
--accent: #2997ff;
--accent-hover: #47aaff;
--code-bg: #111113;
--code-text: #e8e8ed;
--tag-bg: #0a2540;
--tag-text: #4da6ff;
--radius: 12px;
--radius-sm: 8px;
--shadow: 0 2px 20px rgba(0,0,0,0.40);
--shadow-lg: 0 8px 40px rgba(0,0,0,0.60);
--sidebar-w: 240px;
--font: -apple-system, BlinkMacSystemFont, "SF Pro Text", "Helvetica Neue", Arial, sans-serif;
--font-mono: "SF Mono", "Fira Code", "Cascadia Code", Menlo, monospace;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: var(--font);
background: var(--bg);
color: var(--text);
line-height: 1.6;
font-size: 15px;
-webkit-font-smoothing: antialiased;
}
/* ── Sidebar ────────────────────────────────────────── */
.sidebar {
position: fixed;
top: 0; left: 0; bottom: 0;
width: var(--sidebar-w);
background: var(--sidebar-bg);
backdrop-filter: blur(20px) saturate(180%);
-webkit-backdrop-filter: blur(20px) saturate(180%);
border-right: 1px solid var(--border);
overflow-y: auto;
z-index: 100;
padding: 24px 0 40px;
display: flex;
flex-direction: column;
gap: 0;
}
.sidebar-logo {
padding: 0 20px 20px;
border-bottom: 1px solid var(--border);
margin-bottom: 12px;
}
.sidebar-logo h1 {
font-size: 18px;
font-weight: 700;
letter-spacing: -0.5px;
color: var(--text);
}
.sidebar-logo span {
font-size: 12px;
color: var(--text-secondary);
display: block;
margin-top: 2px;
}
.nav-section {
padding: 6px 12px 2px;
font-size: 11px;
font-weight: 600;
letter-spacing: 0.06em;
text-transform: uppercase;
color: var(--text-secondary);
margin-top: 10px;
}
.nav-link {
display: flex;
align-items: center;
gap: 8px;
padding: 7px 20px;
font-size: 14px;
color: var(--text-secondary);
text-decoration: none;
border-radius: 0;
transition: color 0.15s, background 0.15s;
cursor: pointer;
border: none;
background: none;
width: 100%;
text-align: left;
}
.nav-link:hover { background: rgba(0,0,0,0.04); color: var(--text); }
.nav-link.active { color: var(--accent); font-weight: 500; background: rgba(0,113,227,0.07); }
.nav-link .icon { font-size: 15px; width: 18px; text-align: center; }
/* ── Main content ───────────────────────────────────── */
.main {
margin-left: var(--sidebar-w);
min-height: 100vh;
padding: 48px 64px;
max-width: calc(var(--sidebar-w) + 820px);
}
/* ── Sections ───────────────────────────────────────── */
.section { display: none; }
.section.active { display: block; }
/* ── Typography ─────────────────────────────────────── */
h1 {
font-size: 36px;
font-weight: 700;
letter-spacing: -1px;
line-height: 1.15;
color: var(--text);
margin-bottom: 12px;
}
h2 {
font-size: 22px;
font-weight: 600;
letter-spacing: -0.4px;
margin: 40px 0 14px;
color: var(--text);
padding-top: 8px;
}
h3 {
font-size: 17px;
font-weight: 600;
margin: 24px 0 10px;
color: var(--text);
}
p { margin-bottom: 14px; color: var(--text); }
a { color: var(--accent); text-decoration: none; }
a:hover { text-decoration: underline; }
ul, ol { padding-left: 22px; margin-bottom: 14px; }
li { margin-bottom: 5px; }
/* ── Hero ───────────────────────────────────────────── */
.hero {
background: linear-gradient(135deg, #0071e3 0%, #0a84ff 50%, #34aadc 100%);
border-radius: var(--radius);
padding: 40px 44px;
color: white;
margin-bottom: 40px;
position: relative;
overflow: hidden;
}
.hero::before {
content: "🎓";
position: absolute;
right: 36px; top: 50%;
transform: translateY(-50%);
font-size: 80px;
opacity: 0.25;
}
.hero h1 { color: white; font-size: 32px; margin-bottom: 8px; }
.hero p { color: rgba(255,255,255,0.88); font-size: 16px; margin: 0; }
.hero-badges {
display: flex; gap: 8px; flex-wrap: wrap;
margin-top: 20px;
}
.badge {
background: rgba(255,255,255,0.2);
border: 1px solid rgba(255,255,255,0.3);
color: white;
padding: 4px 12px;
border-radius: 100px;
font-size: 12px;
font-weight: 500;
}
/* ── Cards ──────────────────────────────────────────── */
.card {
background: var(--surface);
border-radius: var(--radius);
padding: 24px;
margin-bottom: 20px;
box-shadow: var(--shadow);
border: 1px solid var(--border);
}
.card h3 { margin-top: 0; }
.card-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 16px;
margin-bottom: 24px;
}
.card-sm {
background: var(--surface);
border-radius: var(--radius-sm);
padding: 20px;
box-shadow: var(--shadow);
border: 1px solid var(--border);
text-align: center;
}
.card-sm .icon-big { font-size: 28px; display: block; margin-bottom: 8px; }
.card-sm h4 { font-size: 14px; font-weight: 600; margin-bottom: 4px; }
.card-sm p { font-size: 13px; color: var(--text-secondary); margin: 0; }
/* ── Code ───────────────────────────────────────────── */
pre {
background: var(--code-bg);
color: var(--code-text);
border-radius: var(--radius-sm);
padding: 20px 22px;
overflow-x: auto;
font-family: var(--font-mono);
font-size: 13px;
line-height: 1.7;
margin: 16px 0;
}
code {
font-family: var(--font-mono);
font-size: 13px;
background: rgba(255,255,255,0.08);
padding: 2px 6px;
border-radius: 4px;
color: #ff6b6b;
}
pre code {
background: none;
padding: 0;
color: inherit;
font-size: inherit;
}
/* Syntax highlight colours */
.kw { color: #ff7ab2; } /* keywords */
.cm { color: #7f9f7f; } /* comments */
.st { color: #fc6a5d; } /* strings */
.nb { color: #67b7a4; } /* builtins */
.cn { color: #ffd66b; } /* constants */
/* ── Steps ──────────────────────────────────────────── */
.steps { counter-reset: step; }
.step {
display: flex; gap: 18px;
margin-bottom: 20px;
align-items: flex-start;
}
.step-num {
counter-increment: step;
min-width: 32px; height: 32px;
background: var(--accent);
color: white;
border-radius: 50%;
display: flex; align-items: center; justify-content: center;
font-size: 13px; font-weight: 700;
flex-shrink: 0;
margin-top: 2px;
}
.step-num::before { content: counter(step); }
.step-body { flex: 1; }
.step-body strong { display: block; font-size: 15px; margin-bottom: 4px; }
.step-body p { margin: 0; color: var(--text-secondary); font-size: 14px; }
/* ── Table ──────────────────────────────────────────── */
table {
width: 100%;
border-collapse: collapse;
margin: 16px 0 24px;
font-size: 14px;
}
th {
text-align: left;
padding: 10px 14px;
background: var(--bg);
font-weight: 600;
font-size: 12px;
letter-spacing: 0.04em;
text-transform: uppercase;
color: var(--text-secondary);
border-bottom: 1px solid var(--border);
}
td {
padding: 11px 14px;
border-bottom: 1px solid var(--border);
vertical-align: top;
}
tr:last-child td { border-bottom: none; }
tr:hover td { background: rgba(255,255,255,0.04); }
/* ── Callout ────────────────────────────────────────── */
.callout {
border-radius: var(--radius-sm);
padding: 14px 18px;
margin: 16px 0;
display: flex; gap: 12px; align-items: flex-start;
font-size: 14px;
}
.callout-icon { font-size: 18px; flex-shrink: 0; margin-top: 1px; }
.callout.info { background: #0a1f3a; border-left: 3px solid #2997ff; }
.callout.warn { background: #2a1f00; border-left: 3px solid #f5a623; }
.callout.tip { background: #0a2018; border-left: 3px solid #30d158; }
.callout p { margin: 0; }
/* ── Tag ────────────────────────────────────────────── */
.tag {
display: inline-block;
background: var(--tag-bg);
color: var(--tag-text);
padding: 2px 8px;
border-radius: 4px;
font-size: 12px;
font-weight: 500;
font-family: var(--font-mono);
}
/* ── Architecture tree ──────────────────────────────── */
.tree {
background: var(--code-bg);
color: var(--code-text);
border-radius: var(--radius-sm);
padding: 20px 22px;
font-family: var(--font-mono);
font-size: 13px;
line-height: 1.9;
}
.tree .dir { color: #67b7a4; font-weight: 600; }
.tree .file { color: #e8e8ed; }
.tree .note { color: #7f9f7f; }
/* ── Divider ────────────────────────────────────────── */
hr { border: none; border-top: 1px solid var(--border); margin: 32px 0; }
/* ── Responsive ─────────────────────────────────────── */
@media (max-width: 768px) {
.sidebar { transform: translateX(-100%); transition: transform 0.3s; }
.sidebar.open { transform: translateX(0); }
.main { margin-left: 0; padding: 24px 20px; }
.hero { padding: 28px 24px; }
.hero::before { display: none; }
.card-grid { grid-template-columns: 1fr 1fr; }
}
/* ── Scrollbar ──────────────────────────────────────── */
::-webkit-scrollbar { width: 6px; }
::-webkit-scrollbar-track { background: transparent; }
::-webkit-scrollbar-thumb { background: rgba(255,255,255,0.2); border-radius: 3px; }
</style>
</head>
<body>
<!-- ─── SIDEBAR ─────────────────────────────────────────────────────── -->
<nav class="sidebar" id="sidebar">
<div class="sidebar-logo">
<h1>PhdScout 🎓</h1>
<span>Documentation</span>
</div>
<span class="nav-section">Getting Started</span>
<button class="nav-link active" onclick="show('overview', this)">
<span class="icon">🏠</span> Overview
</button>
<button class="nav-link" onclick="show('install', this)">
<span class="icon">⚙️</span> Installation
</button>
<button class="nav-link" onclick="show('quickstart', this)">
<span class="icon">🚀</span> Quickstart
</button>
<span class="nav-section">Usage</span>
<button class="nav-link" onclick="show('web-ui', this)">
<span class="icon">🖥️</span> Web Interface
</button>
<button class="nav-link" onclick="show('cli', this)">
<span class="icon">💻</span> CLI
</button>
<button class="nav-link" onclick="show('sources', this)">
<span class="icon">🌐</span> Job Sources
</button>
<span class="nav-section">Reference</span>
<button class="nav-link" onclick="show('config', this)">
<span class="icon">🔧</span> Configuration
</button>
<button class="nav-link" onclick="show('prompts', this)">
<span class="icon">✏️</span> Prompts
</button>
<button class="nav-link" onclick="show('architecture', this)">
<span class="icon">🏗️</span> Architecture
</button>
<button class="nav-link" onclick="show('deployment', this)">
<span class="icon">☁️</span> Deployment
</button>
</nav>
<!-- ─── MAIN ────────────────────────────────────────────────────────── -->
<main class="main">
<!-- ── Overview ───────────────────────────────────────────────────── -->
<section class="section active" id="overview">
<div class="hero">
<h1>PhdScout</h1>
<p>AI-powered search agent for PhD positions, postdocs, research fellowships, and academic staff roles. Powered by the free Groq API, with no subscription required.</p>
<div class="hero-badges">
<span class="badge">100% Free</span>
<span class="badge">Groq API</span>
<span class="badge">Gradio UI</span>
<span class="badge">Python 3.10+</span>
</div>
</div>
<div class="card-grid">
<div class="card-sm">
<span class="icon-big">🔍</span>
<h4>Multi-source Search</h4>
<p>5 job boards searched simultaneously: Europe, worldwide, and country-specific</p>
</div>
<div class="card-sm">
<span class="icon-big">🤖</span>
<h4>AI Scoring</h4>
<p>Each position scored 0–100 against your CV profile</p>
</div>
<div class="card-sm">
<span class="icon-big">✍️</span>
<h4>Cover Letters</h4>
<p>Personalised draft generated for every position</p>
</div>
<div class="card-sm">
<span class="icon-big">📦</span>
<h4>ZIP Export</h4>
<p>Download all approved applications in one click</p>
</div>
</div>
<h2>How it works</h2>
<div class="card">
<div class="steps">
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Upload your CV</strong>
<p>PDF, DOCX, or TXT. The LLM extracts a structured profile: education, publications, skills, research interests.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Search job boards</strong>
<p>PhdScout queries Euraxess, mlscientist.com, jobs.ac.uk, scholarshipdb.net, and nature.com/careers in parallel, then deduplicates and filters by recency (expired listings discarded).</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Score & rank</strong>
<p>Each position is scored 0–100 for fit. The LLM reasons semantically, so "NLP" and "natural language processing" are treated as equivalent. Scores for postdoc and fellowship positions are automatically capped when the candidate's CV shows no completed PhD, and capped more strictly when no PhD is in progress either.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Review & edit</strong>
<p>Load any position to see CV tailoring hints and a draft cover letter. Edit freely before approving.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Export</strong>
<p>Download all approved applications as a ZIP containing cover letters and position summaries.</p>
</div>
</div>
</div>
</div>
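<p>The deduplication in step 2 can be pictured with a short sketch. This is not PhdScout's actual code: the function name <code>dedupe_jobs</code> and the choice of identity key (the URL, falling back to title plus <code>institution</code>) are assumptions made for illustration.</p>

```python
from typing import Any


def dedupe_jobs(jobs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first occurrence of each posting, preserving input order."""
    seen: set = set()
    unique: list[dict[str, Any]] = []
    for job in jobs:
        # Assumption: the URL identifies a posting; ignore case and a trailing slash.
        url = (job.get("url") or "").strip().lower().rstrip("/")
        # Fall back to (title, institution) when a scraper returned no URL.
        key = url or (
            (job.get("title") or "").strip().lower(),
            (job.get("institution") or "").strip().lower(),
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append(job)
    return unique
```

<p>Keeping the first occurrence means that whichever source is merged first wins when two boards carry the same listing.</p>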
</section>
<!-- ── Installation ───────────────────────────────────────────────── -->
<section class="section" id="install">
<h1>Installation</h1>
<p>PhdScout runs locally with Python 3.10+ or on HuggingFace Spaces.</p>
<h2>Clone & install</h2>
<pre><code>git clone https://github.com/Hipsterfil998/PhDScout.git
cd PhDScout
pip install -r requirements.txt</code></pre>
<h2>Get a Groq API key</h2>
<div class="callout info">
<span class="callout-icon">ℹ️</span>
<p>Groq provides a generous free tier with no credit card required. Register at <a href="https://console.groq.com/keys" target="_blank">console.groq.com/keys</a>.</p>
</div>
<h2>Configure</h2>
<p>Create a <code>.env</code> file in the project root:</p>
<pre><code><span class="cm"># Required</span>
LLM_BACKEND=groq
GROQ_API_KEY=gsk_your_key_here
<span class="cm"># Optional overrides (see Configuration section)</span>
OUTPUT_DIR=./output</code></pre>
<h2>Run</h2>
<pre><code>python app.py</code></pre>
<p>Open <a href="http://localhost:7860">http://localhost:7860</a> in your browser.</p>
<h2>Dependencies</h2>
<table>
<tr><th>Package</th><th>Purpose</th></tr>
<tr><td><code>openai</code></td><td>Groq and Ollama API client (OpenAI-compatible)</td></tr>
<tr><td><code>gradio</code></td><td>Web UI</td></tr>
<tr><td><code>pdfplumber</code></td><td>PDF text extraction</td></tr>
<tr><td><code>python-docx</code></td><td>DOCX text extraction</td></tr>
<tr><td><code>beautifulsoup4 + lxml</code></td><td>HTML scraping</td></tr>
<tr><td><code>requests</code></td><td>HTTP client for scrapers</td></tr>
<tr><td><code>python-dotenv</code></td><td>.env loading</td></tr>
</table>
</section>
<!-- ── Quickstart ─────────────────────────────────────────────────── -->
<section class="section" id="quickstart">
<h1>Quickstart</h1>
<p>From zero to your first scored job list in under 5 minutes.</p>
<div class="card">
<div class="steps">
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Upload your CV</strong>
<p>Click the upload area and select your PDF, DOCX, or TXT file.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Fill in the search fields</strong>
<p>Enter a research field (<em>"machine learning"</em>, <em>"computational neuroscience"</em>…), choose a location, and pick a position type.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Click "Parse CV & Search Positions"</strong>
<p>Wait ~2–3 minutes. The agent scrapes all sources, parses your CV, and scores every match.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Review results</strong>
<p>Switch to the <strong>Results</strong> tab. Positions are sorted by posting date (newest first) and labelled with a freshness indicator.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Generate & approve cover letters</strong>
<p>In <strong>Review & Edit</strong>, select a position, read the CV hints, edit the draft, and click <strong>Approve & Save</strong>.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Export</strong>
<p>Go to the <strong>Export</strong> tab and download the ZIP.</p>
</div>
</div>
</div>
</div>
<div class="callout tip">
<span class="callout-icon">💡</span>
<p><strong>Tip:</strong> Use comma-separated fields for broader searches: <em>"machine learning, NLP, computer vision"</em>.</p>
</div>
</section>
<!-- ── Web UI ─────────────────────────────────────────────────────── -->
<section class="section" id="web-ui">
<h1>Web Interface</h1>
<p>The Gradio UI is organised into three tabs.</p>
<h2>Tab 1: Setup & Search</h2>
<div class="card">
<table>
<tr><th>Field</th><th>Description</th></tr>
<tr><td><strong>CV upload</strong></td><td>PDF, DOCX, or TXT file</td></tr>
<tr><td><strong>Research field</strong></td><td>Free-text or comma-separated list</td></tr>
<tr><td><strong>Location</strong></td><td>40+ countries or custom value</td></tr>
<tr><td><strong>Position type</strong></td><td>PhD, postdoc, predoctoral, fellowship, research staff</td></tr>
<tr><td><strong>Min. match score</strong></td><td>Threshold for the "above score" count (all positions still visible)</td></tr>
</table>
</div>
<h2>Tab 2: Results</h2>
<p>Displays a scored table with columns: <strong>#</strong>, <strong>Score</strong>, <strong>Title</strong>, <strong>Institution</strong>, <strong>Type</strong>, <strong>Freshness</strong>, <strong>Rec.</strong>, <strong>Why good fit</strong>.</p>
<h3>Freshness labels</h3>
<table>
<tr><th>Label</th><th>Meaning</th></tr>
<tr><td><span class="tag">🟢 Recent</span></td><td>Posted within the last 30 days</td></tr>
<tr><td><span class="tag">🟡 Older</span></td><td>Has a date, posted more than 30 days ago</td></tr>
<tr><td><span class="tag">🔴 Closing soon</span></td><td>Deadline within 14 days</td></tr>
<tr><td><em>empty</em></td><td>No date information available</td></tr>
</table>
<div class="callout info">
<span class="callout-icon">ℹ️</span>
<p>Expired listings (deadline already passed, or posted in a previous year) are automatically excluded from results.</p>
</div>
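<p>As a rough illustration of how the labels above could be derived, here is a minimal sketch. The function name, its arguments, and the rule that an imminent deadline outranks the posting-date labels are assumptions; only the 30-day and 14-day thresholds come from the tables in this documentation.</p>

```python
from datetime import date
from typing import Optional

RECENT_DAYS = 30          # mirrors recent_days in config.py
DEADLINE_WARN_DAYS = 14   # mirrors deadline_warn_days in config.py


def freshness_label(
    posted: Optional[date],
    deadline: Optional[date],
    today: Optional[date] = None,
) -> str:
    """Return "Recent", "Older", "Closing soon", or "" when nothing applies."""
    today = today or date.today()
    # Assumption: a deadline inside the warning window takes precedence.
    if deadline is not None and 0 <= (deadline - today).days <= DEADLINE_WARN_DAYS:
        return "Closing soon"
    if posted is not None:
        age_days = (today - posted).days
        return "Recent" if age_days <= RECENT_DAYS else "Older"
    # No posting date and no imminent deadline: leave the cell empty.
    return ""
```

<p>Passing <code>today</code> explicitly keeps the function deterministic, which makes the thresholds easy to check in a unit test.</p>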
<h2>Tab 3: Review & Edit</h2>
<p>Select a position from the dropdown, click <strong>Load Position</strong>, then:</p>
<ul>
<li>Read the <strong>Position Details</strong> and match analysis</li>
<li>Follow the <strong>CV Tailoring Hints</strong> panel</li>
<li>Edit the <strong>Cover Letter</strong> draft freely</li>
<li>Click <strong>Regenerate</strong> for a different version</li>
<li>Download the letter as a <strong>.txt</strong> file</li>
<li>Click <strong>Approve & Save</strong> to add it to the export queue</li>
</ul>
</section>
<!-- ── CLI ────────────────────────────────────────────────────────── -->
<section class="section" id="cli">
<h1>Command-Line Interface</h1>
<p>For batch use or scripting, PhdScout exposes a CLI via <code>main.py</code>.</p>
<h2>Basic usage</h2>
<pre><code>python main.py \
  --cv path/to/cv.pdf \
  --field "machine learning" \
  --location "Germany" \
  --type phd</code></pre>
<h2>Options</h2>
<table>
<tr><th>Flag</th><th>Default</th><th>Description</th></tr>
<tr><td><code>--cv</code></td><td><em>required</em></td><td>Path to CV file (PDF, DOCX, TXT)</td></tr>
<tr><td><code>--field</code></td><td><em>required</em></td><td>Research field(s), comma-separated</td></tr>
<tr><td><code>--location</code></td><td><code>Europe</code></td><td>Location filter</td></tr>
<tr><td><code>--type</code></td><td><code>phd</code></td><td>Position type</td></tr>
<tr><td><code>--min-score</code></td><td><code>60</code></td><td>Minimum match score to show</td></tr>
</table>
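<p>The flags in the table map naturally onto <code>argparse</code>. The sketch below is not the contents of <code>main.py</code>; it only shows one way a parser with these flags and defaults could be declared. The helper <code>split_fields</code> and the <code>position_type</code> attribute name are hypothetical.</p>

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--cv", required=True, help="Path to CV file (PDF, DOCX, TXT)")
    parser.add_argument("--field", required=True, help="Research field(s), comma-separated")
    parser.add_argument("--location", default="Europe", help="Location filter")
    # dest avoids an attribute called "type", which shadows the builtin when unpacked.
    parser.add_argument("--type", dest="position_type", default="phd", help="Position type")
    parser.add_argument("--min-score", type=int, default=60, help="Minimum match score to show")
    return parser


def split_fields(raw: str) -> list[str]:
    """Split a comma-separated --field value into clean, non-empty terms."""
    return [part.strip() for part in raw.split(",") if part.strip()]


args = build_parser().parse_args(["--cv", "cv.pdf", "--field", "machine learning, NLP"])
fields = split_fields(args.field)
```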
<h2>Python API</h2>
<pre><code><span class="kw">from</span> agent <span class="kw">import</span> JobAgent

<span class="cm"># Create the agent (replace the placeholder key with your own)</span>
agent = JobAgent(
    model=<span class="st">"llama-3.1-8b-instant"</span>,
    backend=<span class="st">"groq"</span>,
    api_key=<span class="st">"gsk_..."</span>,
)

<span class="cm"># Parse the CV, search the job boards, then score every result</span>
profile, profile_text = agent.parse_cv(<span class="st">"cv.pdf"</span>)
jobs = agent.search_jobs(field=<span class="st">"NLP"</span>, location=<span class="st">"Europe"</span>, position_type=<span class="st">"phd"</span>)
scored = agent.score_jobs(jobs, profile_text)

<span class="cm"># Print the first five scored positions</span>
<span class="kw">for</span> job <span class="kw">in</span> scored[:5]:
    m = job[<span class="st">"match"</span>]
    <span class="nb">print</span>(m[<span class="st">"match_score"</span>], job[<span class="st">"title"</span>], job.get(<span class="st">"freshness"</span>))</code></pre>
</section>
<!-- ── Sources ────────────────────────────────────────────────────── -->
<section class="section" id="sources">
<h1>Job Sources</h1>
<div class="card-grid">
<div class="card-sm">
<span class="icon-big">🇪🇺</span>
<h4>Euraxess</h4>
<p>EU/worldwide research portal. Country-filtered via API parameters.</p>
</div>
<div class="card-sm">
<span class="icon-big">🤖</span>
<h4>mlscientist.com</h4>
<p>ML & AI academic positions. 14 country categories supported.</p>
</div>
<div class="card-sm">
<span class="icon-big">🇬🇧</span>
<h4>jobs.ac.uk</h4>
<p>UK academic jobs. Queried only when UK or Worldwide is selected.</p>
</div>
<div class="card-sm">
<span class="icon-big">🌍</span>
<h4>scholarshipdb.net</h4>
<p>Worldwide aggregator with 28k+ positions across all disciplines. Country-filtered via URL path.</p>
</div>
<div class="card-sm">
<span class="icon-big">🔬</span>
<h4>nature.com/careers</h4>
<p>Multidisciplinary global board. Keyword search + ISO country code filtering.</p>
</div>
</div>
<h2>Freshness filtering</h2>
<p>After scraping, PhdScout automatically removes:</p>
<ul>
<li>Postings with a <strong>posting date in a previous year</strong></li>
<li>Postings with a <strong>deadline already passed</strong></li>
<li>Jobs with no date info are kept (benefit of the doubt)</li>
</ul>
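<p>The three rules above can be sketched as a small filter. This is an illustrative sketch, not the project's implementation: the dictionary keys <code>posted_date</code> and <code>deadline_date</code> and both function names are assumptions.</p>

```python
from datetime import date
from typing import Any, Optional


def is_expired(posted: Optional[date], deadline: Optional[date], today: date) -> bool:
    """Apply the two removal rules; missing dates never cause removal."""
    if deadline is not None and deadline < today:
        return True  # deadline already passed
    if posted is not None and posted.year < today.year:
        return True  # posted in a previous calendar year
    return False


def drop_expired(jobs: list[dict[str, Any]], today: Optional[date] = None) -> list[dict[str, Any]]:
    """Return only the postings that are still considered live."""
    today = today or date.today()
    return [
        job for job in jobs
        if not is_expired(job.get("posted_date"), job.get("deadline_date"), today)
    ]
```

<p>Note that the previous-year rule is calendar-based, so a listing posted in late December is dropped in early January unless the real code handles that case differently.</p>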
<h2>PhD eligibility gate</h2>
<p>Before scoring, PhdScout checks whether the candidate holds or is pursuing a PhD and enforces two caps on postdoc and fellowship positions:</p>
<table>
<tr><th>Candidate status</th><th>Postdoc / Fellowship score cap</th></tr>
<tr><td>No PhD detected in CV</td><td>≤ 30, set to <em>skip</em></td></tr>
<tr><td>PhD in progress (candidate / student)</td><td>≤ 65</td></tr>
<tr><td>PhD completed</td><td>No cap</td></tr>
</table>
<div class="callout info">
<span class="callout-icon">ℹ️</span>
<p>This gate is enforced at two levels: in the LLM prompt (via <code>JOB_MATCHER_PROMPT</code>) and in code (<code>agent/matching/matcher.py</code>) as a safety net. PhD positions are always open to master's graduates, so no cap applies to them.</p>
</div>
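<p>A code-level safety net of this kind might look like the sketch below. It is not the logic in <code>agent/matching/matcher.py</code>; the function name, the status strings (<code>"none"</code>, <code>"in_progress"</code>, <code>"completed"</code>), and the return shape are assumptions. Only the cap values come from the table above.</p>

```python
GATED_TYPES = {"postdoc", "fellowship"}
# Caps from the table above; a completed PhD has no entry, so no cap applies.
SCORE_CAPS = {"none": 30, "in_progress": 65}


def apply_phd_gate(score: int, position_type: str, phd_status: str) -> tuple[int, bool]:
    """Return (possibly capped score, force_skip) for one scored position."""
    if position_type.lower() not in GATED_TYPES:
        return score, False  # PhD and other position types are never capped
    cap = SCORE_CAPS.get(phd_status)
    if cap is None:
        return score, False  # PhD completed
    # With no PhD at all, the position is also marked as "skip".
    return min(score, cap), phd_status == "none"
```

<p>Running the cap after the LLM returns its score means a model that ignores the prompt instruction still cannot push an ineligible postdoc to the top of the list.</p>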
<h2>Adding a source</h2>
<p>Create a new file in <code>agent/search/scrapers/</code> that subclasses <code>BaseScraper</code>:</p>
<pre><code><span class="kw">from</span> urllib.parse <span class="kw">import</span> quote_plus

<span class="kw">from</span> agent.search.scrapers.base <span class="kw">import</span> BaseScraper


<span class="kw">class</span> MyScraper(BaseScraper):
    name = <span class="st">"mysource"</span>

    <span class="kw">def</span> scrape(self, field, location, position_type):
        <span class="cm"># URL-encode the query so spaces and "&amp;" in the field are safe</span>
        soup = self._fetch(<span class="st">f"https://example.com/jobs?q={quote_plus(field)}"</span>)
        <span class="kw">if</span> soup <span class="kw">is</span> <span class="nb">None</span>:
            <span class="kw">return</span> []

        <span class="cm"># Build one dict per listing; selectors depend on the site's markup</span>
        results = []
        <span class="kw">for</span> card <span class="kw">in</span> soup.select(<span class="st">".job-card"</span>):
            results.append({
                <span class="st">"title"</span>: card.select_one(<span class="st">"h2"</span>).text,
                <span class="st">"url"</span>: card.select_one(<span class="st">"a"</span>)[<span class="st">"href"</span>],
                <span class="st">"posted"</span>: card.select_one(<span class="st">".date"</span>).text,
                <span class="st">"source"</span>: self.name,
                <span class="st">"type"</span>: self._detect_type(card.text, <span class="st">""</span>),
            })
        <span class="kw">return</span> results</code></pre>
<p>Then register it in <code>_build_scrapers()</code> in <code>agent/search/searcher.py</code>.</p>
</section>
<!-- ── Configuration ──────────────────────────────────────────────── -->
<section class="section" id="config">
<h1>Configuration</h1>
<p>All settings live in <code>config.py</code>. Edit the file directly: the CLI picks up changes on its next run, while the Gradio app must be restarted after each change.</p>
<h2>LLM settings</h2>
<table>
<tr><th>Parameter</th><th>Default</th><th>Description</th></tr>
<tr><td><code>default_model</code></td><td><code>llama-3.1-8b-instant</code></td><td>Groq model to use</td></tr>
<tr><td><code>max_tokens</code></td><td><code>4096</code></td><td>Max tokens per LLM response</td></tr>
<tr><td><code>llm_backend</code></td><td><code>ollama</code></td><td>Backend: <code>groq</code> | <code>huggingface</code> | <code>ollama</code></td></tr>
</table>
<h2>Scraper settings</h2>
<table>
<tr><th>Parameter</th><th>Default</th><th>Description</th></tr>
<tr><td><code>scraper_delay</code></td><td><code>1.5</code> s</td><td>Polite delay between HTTP requests</td></tr>
<tr><td><code>max_results_per_source</code></td><td><code>20</code></td><td>Max listings fetched per source</td></tr>
</table>
<h2>Freshness thresholds</h2>
<table>
<tr><th>Parameter</th><th>Default</th><th>Description</th></tr>
<tr><td><code>recent_days</code></td><td><code>30</code></td><td>Maximum days since posting for the 🟢 Recent label</td></tr>
<tr><td><code>deadline_warn_days</code></td><td><code>14</code></td><td>Maximum days until the deadline for the 🔴 Closing soon label</td></tr>
</table>
<h2>UI defaults</h2>
<table>
<tr><th>Parameter</th><th>Default</th><th>Description</th></tr>
<tr><td><code>min_score_default</code></td><td><code>60</code></td><td>Default minimum match score slider value</td></tr>
</table>
<h2>Environment variables</h2>
<table>
<tr><th>Variable</th><th>Description</th></tr>
<tr><td><code>GROQ_API_KEY</code></td><td>Groq API key (takes priority over HF_TOKEN)</td></tr>
<tr><td><code>HF_TOKEN</code></td><td>HuggingFace token (fallback backend)</td></tr>
<tr><td><code>LLM_BACKEND</code></td><td>Override backend: <code>groq</code> | <code>huggingface</code> | <code>ollama</code></td></tr>
<tr><td><code>OUTPUT_DIR</code></td><td>Output directory for ZIP exports (default: <code>./output</code>)</td></tr>
</table>
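<p>Taken together, the tables above suggest a precedence order for choosing the backend. The sketch below spells out one plausible reading: an explicit <code>LLM_BACKEND</code> wins, then <code>GROQ_API_KEY</code>, then <code>HF_TOKEN</code>, then the <code>ollama</code> default. The function <code>resolve_backend</code> is hypothetical, and whether the real client resolves things in exactly this order is an assumption.</p>

```python
import os
from typing import Mapping, Optional

VALID_BACKENDS = {"groq", "huggingface", "ollama"}


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a backend name from environment variables (assumed precedence)."""
    env = os.environ if env is None else env
    # 1. An explicit, valid LLM_BACKEND overrides everything else.
    explicit = env.get("LLM_BACKEND", "").strip().lower()
    if explicit in VALID_BACKENDS:
        return explicit
    # 2. GROQ_API_KEY takes priority over HF_TOKEN.
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("HF_TOKEN"):
        return "huggingface"
    # 3. Fall back to the documented llm_backend default.
    return "ollama"
```

<p>Accepting a mapping instead of reading <code>os.environ</code> directly makes the precedence easy to test without touching the real environment.</p>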
</section>
<!-- ── Prompts ────────────────────────────────────────────────────── -->
<section class="section" id="prompts">
<h1>Prompts</h1>
<p>All LLM prompts live in <code>agent/prompts/</code>. Each service has its own file; edit the relevant file to tune that part of the agent's behaviour.</p>
<div class="callout warn">
<span class="callout-icon">⚠️</span>
<p>Prompts use Python <code>.format()</code> placeholders like <code>{profile}</code>. Keep all placeholders intact when editing.</p>
</div>
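<p>To see why placeholders matter, consider the small sketch below. The template is invented for illustration (only <code>{profile}</code> appears in this documentation; <code>{job}</code> and both helper functions are hypothetical). It also shows that literal braces, such as those in a JSON example, must be doubled so that <code>.format()</code> does not treat them as placeholders.</p>

```python
import string

# Hypothetical template; the real prompts live in agent/prompts/.
EXAMPLE_PROMPT = (
    "Candidate profile:\n{profile}\n\n"
    "Position:\n{job}\n\n"
    'Reply with JSON such as {{"match_score": 0}}.'
)


def placeholders(template: str) -> set[str]:
    """Collect the named placeholders that .format() will expect."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def missing_placeholders(template: str, required: set[str]) -> set[str]:
    """Return required placeholders that an edited template no longer contains."""
    return required - placeholders(template)


# Doubled braces come out as single literal braces after formatting.
rendered = EXAMPLE_PROMPT.format(profile="MSc in NLP", job="PhD in speech processing")
```

<p>A check like <code>missing_placeholders</code> can be run after editing a prompt to catch a deleted placeholder before the agent does.</p>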
<h2>Available prompts</h2>
<table>
<tr><th>Constant</th><th>Used by</th><th>Controls</th></tr>
<tr><th colspan="3" style="background:var(--bg);font-size:12px;color:var(--text-secondary);font-weight:500;">File: <code>agent/prompts/cv_parser.py</code></th></tr>
<tr><td><code>CV_PARSER_SYSTEM</code><br><code>CV_PARSER_PROMPT</code></td><td><code>CVParser</code></td><td>How the CV is structured into JSON. Tweak to extract custom fields.</td></tr>
<tr><th colspan="3" style="background:var(--bg);font-size:12px;color:var(--text-secondary);font-weight:500;">File: <code>agent/prompts/job_matcher.py</code></th></tr>
<tr><td><code>JOB_MATCHER_SYSTEM</code><br><code>JOB_MATCHER_PROMPT</code></td><td><code>JobMatcher</code></td><td>Scoring criteria, eligibility gate, and scoring guide. Edit thresholds here.</td></tr>
<tr><th colspan="3" style="background:var(--bg);font-size:12px;color:var(--text-secondary);font-weight:500;">File: <code>agent/prompts/cv_tailor.py</code></th></tr>
<tr><td><code>CV_TAILOR_SYSTEM</code><br><code>CV_TAILOR_PROMPT</code></td><td><code>CVTailor</code></td><td>What tailoring hints to produce and how specific to be.</td></tr>
<tr><th colspan="3" style="background:var(--bg);font-size:12px;color:var(--text-secondary);font-weight:500;">File: <code>agent/prompts/cover_letter.py</code></th></tr>
<tr><td><code>COVER_LETTER_SYSTEM</code><br><code>COVER_LETTER_PROMPT</code></td><td><code>CoverLetterWriter</code></td><td>Letter style, length, structure, and language detection.</td></tr>
</table>
<h2>Example: changing the letter length</h2>
<p>In <code>agent/prompts/cover_letter.py</code>, find <code>COVER_LETTER_SYSTEM</code> and change:</p>
<pre><code><span class="cm"># Before</span>
The letter should be <span class="st">400-600 words (3-4 paragraphs)</span>.
<span class="cm"># After</span>
The letter should be <span class="st">250-350 words (2-3 paragraphs)</span>.</code></pre>
<h2>Example: stricter scoring</h2>
<p>In <code>JOB_MATCHER_PROMPT</code>, raise the thresholds in the scoring guide:</p>
<pre><code>Scoring guide:
85-100: Excellent - perfect research keyword overlap, recent publications
70-84: Good - strong overlap on primary research area
50-69: Partial - some overlap, transferable skills
0-49: Skip - different area or missing key requirements</code></pre>
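<p>To make the bands concrete, here is a small sketch that maps a numeric score to the labels in the guide above. In PhDScout the bands are described to the LLM in prompt text; the <code>SCORE_BANDS</code> table and <code>band_for</code> function are hypothetical names used only for illustration.</p>

```python
# Bands copied from the scoring guide above: (minimum score, label).
SCORE_BANDS = [(85, "Excellent"), (70, "Good"), (50, "Partial"), (0, "Skip")]


def band_for(score: int) -> str:
    """Return the label of the first band whose minimum the score reaches."""
    if score not in range(0, 101):
        raise ValueError(f"score out of range: {score}")
    for minimum, label in SCORE_BANDS:
        if score >= minimum:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


if __name__ == "__main__":
    for value in (92, 84, 50, 12):
        print(value, band_for(value))
```

<p>Raising a minimum in the table, for example from 70 to 75, is the code equivalent of making the prompt's guide stricter.</p>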
</section>
<!-- ── Architecture ───────────────────────────────────────────────── -->
<section class="section" id="architecture">
<h1>Architecture</h1>
<h2>Project structure</h2>
<div class="tree">
<span class="dir">PhDScout/</span>
├── <span class="file">app.py</span>  <span class="note"># Gradio web interface</span>
├── <span class="file">config.py</span>  <span class="note"># Runtime settings (model, thresholds, delays)</span>
├── <span class="file">main.py</span>  <span class="note"># CLI entry point</span>
├── <span class="file">requirements.txt</span>
├── <span class="dir">agent/</span>
│   ├── <span class="file">__init__.py</span>  <span class="note"># Public API: JobAgent, LLMQuotaError</span>
│   ├── <span class="file">pipeline.py</span>  <span class="note"># JobAgent orchestrator</span>
│   ├── <span class="file">base_service.py</span>  <span class="note"># BaseLLMService base class</span>
│   ├── <span class="file">llm_client.py</span>  <span class="note"># Groq / HuggingFace / Ollama client</span>
│   ├── <span class="file">utils.py</span>  <span class="note"># JSON parsing, shared helpers</span>
│   ├── <span class="dir">prompts/</span>  <span class="note"># LLM prompts, one file per service</span>
│   │   ├── <span class="file">cv_parser.py</span>  <span class="note"># CV extraction prompts</span>
│   │   ├── <span class="file">job_matcher.py</span>  <span class="note"># Scoring + eligibility gate prompts</span>
│   │   ├── <span class="file">cv_tailor.py</span>  <span class="note"># Tailoring hints prompts</span>
│   │   └── <span class="file">cover_letter.py</span>  <span class="note"># Cover letter prompts</span>
│   ├── <span class="dir">cv/</span>  <span class="note"># CV-related services</span>
│   │   ├── <span class="file">parser.py</span>  <span class="note"># CV extraction + LLM parsing</span>
│   │   ├── <span class="file">tailor.py</span>  <span class="note"># Tailoring hints generator</span>
│   │   └── <span class="file">cover_letter.py</span>  <span class="note"># Cover letter writer</span>
│   ├── <span class="dir">matching/</span>  <span class="note"># Scoring engine</span>
│   │   └── <span class="file">matcher.py</span>  <span class="note"># JobMatcher + PhD eligibility cap</span>
│   └── <span class="dir">search/</span>  <span class="note"># Job search infrastructure</span>
│       ├── <span class="file">searcher.py</span>  <span class="note"># JobSearcher (orchestrates scrapers)</span>
│       └── <span class="dir">scrapers/</span>
│           ├── <span class="file">base.py</span>  <span class="note"># BaseScraper ABC + shared helpers</span>
│           ├── <span class="file">euraxess.py</span>  <span class="note"># EU/worldwide research portal</span>
│           ├── <span class="file">mlscientist.py</span>  <span class="note"># ML &amp; AI academic positions</span>
│           ├── <span class="file">jobs_ac_uk.py</span>  <span class="note"># UK academic jobs (UK/worldwide only)</span>
│           ├── <span class="file">scholarshipdb.py</span>  <span class="note"># Worldwide aggregator (28k+ positions)</span>
│           └── <span class="file">nature_careers.py</span>  <span class="note"># nature.com/careers, multidisciplinary</span>
└── <span class="dir">tests/</span>  <span class="note"># 156 unit tests (pytest)</span>
</div>
<h2>Pipeline flow</h2>
<div class="card">
<p style="font-family:var(--font-mono);font-size:13px;line-height:2;color:var(--text);">
CV file<br>
↓ <span style="color:#98989d">CVParser.extract_raw_text()</span><br>
Raw text<br>
↓ <span style="color:#98989d">CVParser.parse() → LLM → CVProfile JSON</span><br>
↓ <span style="color:#98989d">CVParser.summarize() → profile_text</span><br>
profile_text<br>
↓ (in parallel with search)<br>
↓ <span style="color:#98989d">JobSearcher.search() → scrapers → deduplicate → filter stale → label freshness</span><br>
jobs[]<br>
↓ <span style="color:#98989d">JobMatcher.score_all() → LLM × N → sort by score</span><br>
scored_jobs[]<br>
↓ (per selected job)<br>
↓ <span style="color:#98989d">CVTailor.generate() → LLM → TailoringHints</span><br>
↓ <span style="color:#98989d">CoverLetterWriter.generate() → LLM → draft letter</span><br>
approved_jobs[] → ZIP export
</p>
</div>
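<p>The flow above can be sketched in a few lines of Python. Every function here is a hypothetical stand-in, not the real PhDScout class or method: the LLM calls are replaced by a toy keyword overlap, and the scrapers by a fixed list. The sketch only shows the shape of the pipeline: CV parsing and job search run in parallel, results are deduplicated, scored, and sorted.</p>

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in stages: hypothetical stubs, not the real PhDScout services.
def parse_cv(raw_text: str) -> str:
    """Pretend 'profile_text': normalised CV text."""
    return raw_text.strip().lower()


def search_jobs() -> list[dict]:
    """Pretend scraper output, including one duplicate listing."""
    return [
        {"title": "PhD in NLP", "url": "https://example.org/a"},
        {"title": "PhD in Robotics", "url": "https://example.org/b"},
        {"title": "PhD in NLP", "url": "https://example.org/a"},
    ]


def deduplicate(jobs: list[dict]) -> list[dict]:
    """Keep the first job seen for each URL."""
    seen, unique = set(), []
    for job in jobs:
        if job["url"] not in seen:
            seen.add(job["url"])
            unique.append(job)
    return unique


def score(profile_text: str, job: dict) -> int:
    """Toy keyword overlap (0-100) in place of the LLM scoring call."""
    words = set(job["title"].lower().split())
    overlap = words.intersection(profile_text.split())
    return 100 * len(overlap) // len(words)


def run(raw_text: str) -> list[dict]:
    # Parse the CV and search for jobs at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile_future = pool.submit(parse_cv, raw_text)
        jobs_future = pool.submit(search_jobs)
        profile_text, jobs = profile_future.result(), jobs_future.result()
    scored = [{**job, "score": score(profile_text, job)} for job in deduplicate(jobs)]
    return sorted(scored, key=lambda job: job["score"], reverse=True)


if __name__ == "__main__":
    for job in run("  NLP researcher, PhD student "):
        print(job["score"], job["title"])
```

<p>Threads suit this step because both stages mostly wait on I/O (file reading, HTTP requests, LLM responses), so neither blocks the other.</p>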
<h2>LLM backends</h2>
<table>
<tr><th>Backend</th><th>Env var</th><th>Notes</th></tr>
<tr><td><strong>Groq</strong> (recommended)</td><td><code>GROQ_API_KEY</code></td><td>Free tier, fast, OpenAI-compatible</td></tr>
<tr><td><strong>Ollama</strong></td><td>none</td><td>Local inference; set <code>LLM_BACKEND=ollama</code></td></tr>
<tr><td><strong>HuggingFace</strong></td><td><code>HF_TOKEN</code></td><td>Fallback, free tier has rate limits</td></tr>
</table>
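<p>One way to picture how a backend could be chosen from these variables is sketched below. The resolution order (explicit <code>LLM_BACKEND</code> override first, then Groq if a key is present, then HuggingFace, then local Ollama) is an assumption made for illustration, and <code>pick_backend</code> is a hypothetical function; check <code>agent/llm_client.py</code> for the actual behaviour.</p>

```python
import os

VALID_BACKENDS = ("groq", "huggingface", "ollama")


def pick_backend(env=None) -> str:
    """Choose a backend name from environment variables (assumed order)."""
    env = os.environ if env is None else env

    # An explicit override wins, but must name a known backend.
    override = env.get("LLM_BACKEND", "").strip().lower()
    if override:
        if override not in VALID_BACKENDS:
            raise ValueError(f"unknown LLM_BACKEND: {override!r}")
        return override

    # Otherwise fall back on whichever credentials are available.
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("HF_TOKEN"):
        return "huggingface"
    return "ollama"  # needs no key; assumes a local Ollama server


if __name__ == "__main__":
    print(pick_backend())
```

<p>Passing a plain dictionary instead of <code>os.environ</code> makes the selection easy to exercise in a unit test without touching the real environment.</p>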
</section>
<!-- ── Deployment ─────────────────────────────────────────────────── -->
<section class="section" id="deployment">
<h1>Deployment</h1>
<h2>HuggingFace Spaces (recommended)</h2>
<div class="card">
<div class="steps">
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Fork or create a Space</strong>
<p>Go to <a href="https://huggingface.co/spaces" target="_blank">huggingface.co/spaces</a> → New Space → SDK: Gradio.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Push the code</strong>
<p>Add the Space as a remote and push: <code>git push space main</code></p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Set secrets</strong>
<p>In Space Settings → Variables and Secrets, add <code>GROQ_API_KEY</code>.</p>
</div>
</div>
<div class="step">
<div class="step-num"></div>
<div class="step-body">
<strong>Add HF frontmatter to README</strong>
<p>Run <code>./push_to_hf.sh</code>; it injects the required YAML frontmatter automatically.</p>
</div>
</div>
</div>
</div>
<h2>GitHub Pages (this documentation)</h2>
<div class="callout tip">
<span class="callout-icon">💡</span>
<p>This documentation is a single HTML file at <code>docs/index.html</code>, so no build step is required.</p>
</div>
<p>To enable GitHub Pages:</p>
<ol>
<li>Go to your GitHub repo → <strong>Settings → Pages</strong></li>
<li>Source: <strong>Deploy from a branch</strong></li>
<li>Branch: <code>main</code> / folder: <code>/docs</code></li>
<li>Click <strong>Save</strong></li>
</ol>
<p>The docs will be live at <code>https://&lt;username&gt;.github.io/PhDScout</code>.</p>
<h2>Editing the docs</h2>
<p>To modify this documentation directly on GitHub:</p>
<ol>
<li>Go to your repo on GitHub</li>
<li>Navigate to <code>docs/index.html</code></li>
<li>Click the <strong>pencil icon</strong> (Edit this file)</li>
<li>Edit the HTML; each section is a <code>&lt;section class="section" id="..."&gt;</code> block</li>
<li>Commit directly to <code>main</code>; GitHub Pages rebuilds automatically</li>
</ol>
<div class="callout info">
<span class="callout-icon">ℹ️</span>
<p>The navigation links are wired by JavaScript at the bottom of the file. To add a new section, add a <code>&lt;button&gt;</code> in the sidebar and a matching <code>&lt;section&gt;</code> in the main area.</p>
</div>
</section>
</main>
<script>
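// Show the section with the given id and highlight the clicked nav button.
function show(id, btn) {
  // Clear the active state from every section and nav link.
  document.querySelectorAll('.section').forEach(s => s.classList.remove('active'));
  document.querySelectorAll('.nav-link').forEach(b => b.classList.remove('active'));
  // Activate the requested section and its button, then scroll to the top.
  document.getElementById(id).classList.add('active');
  btn.classList.add('active');
  window.scrollTo({ top: 0, behavior: 'smooth' });
}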
</script>
</body>
</html>