HipFil98 Claude Sonnet 4.6 committed on
Commit 29fd3cd · 1 Parent(s): 1579508

docs: update README and docs with new scrapers and location filter fixes

- README: add scholarshipdb.net and nature.com/careers to Credits
- README: add new scraper files to project structure tree
- docs: fix stray 'si' characters at start of index.html
- docs: update multi-source card (3 → 5 sources)
- docs: update "Search job boards" step to list all 5 sources
- docs: add scholarshipdb and nature.com/careers source cards
- docs: update architecture tree with new scraper files
- docs: update test count 125 → 156

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (2)
  1. README.md +7 -3
  2. docs/index.html +22 -10
README.md CHANGED
@@ -94,9 +94,11 @@ PhDScout/
  ├── searcher.py # JobSearcher (orchestrates scrapers)
  └── scrapers/
  ├── base.py # BaseScraper ABC + shared helpers
- ├── euraxess.py
- ├── mlscientist.py
- └── jobs_ac_uk.py
+ ├── euraxess.py # EU/worldwide research portal
+ ├── mlscientist.py # ML & AI academic positions
+ ├── jobs_ac_uk.py # UK academic jobs (UK/worldwide only)
+ ├── scholarshipdb.py # Worldwide aggregator (28k+ positions)
+ └── nature_careers.py # Nature.com/careers — multidisciplinary
  ```
 
  ---
@@ -116,6 +118,8 @@ Job data sourced from:
  - [Euraxess](https://euraxess.ec.europa.eu) — European Commission portal for research careers
  - [mlscientist.com](https://mlscientist.com) — ML & AI academic job board
  - [jobs.ac.uk](https://www.jobs.ac.uk) — UK academic jobs portal
+ - [scholarshipdb.net](https://scholarshipdb.net) — Worldwide academic jobs and scholarships aggregator
+ - [nature.com/careers](https://www.nature.com/naturecareers) — Multidisciplinary global research job board
 
  LLM inference powered by [Groq](https://groq.com) free API.
 
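Note on the `scrapers/` tree above: it describes `base.py` as a `BaseScraper` ABC and marks `jobs_ac_uk.py` as UK/worldwide only, but the diff does not show how that gating is implemented. The following is only a minimal sketch of one way it could work. The `supported_locations` attribute, the `supports` and `search` methods, and the `active_scrapers` helper are hypothetical names, not taken from the repository.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseScraper(ABC):
    """Hypothetical interface; the real BaseScraper in base.py may differ."""

    # None means "no restriction"; otherwise the lowercase locations served.
    supported_locations: Optional[frozenset] = None

    def supports(self, location: str) -> bool:
        """Return True if this source should be queried for `location`."""
        if self.supported_locations is None:
            return True
        return location.strip().lower() in self.supported_locations

    @abstractmethod
    def search(self, query: str, location: str) -> list:
        """Return a list of job dicts matching the query."""


class EuraxessScraper(BaseScraper):
    def search(self, query: str, location: str) -> list:
        return []  # network call omitted in this sketch


class JobsAcUkScraper(BaseScraper):
    # Assumption: mirrors the "UK/worldwide only" note in the tree.
    supported_locations = frozenset({"uk", "united kingdom", "worldwide"})

    def search(self, query: str, location: str) -> list:
        return []  # network call omitted in this sketch


def active_scrapers(scrapers: list, location: str) -> list:
    """Keep only the scrapers that serve the requested location."""
    return [s for s in scrapers if s.supports(location)]
```

Keeping the restriction as data on each scraper class would let the orchestrator skip a source without knowing anything about that source's site.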
docs/index.html CHANGED
@@ -441,7 +441,7 @@
  <div class="card-sm">
  <span class="icon-big">🔍</span>
  <h4>Multi-source Search</h4>
- <p>Euraxess, mlscientist.com, jobs.ac.uk searched simultaneously</p>
+ <p>5 job boards searched simultaneously — Europe, worldwide, and country-specific</p>
  </div>
  <div class="card-sm">
  <span class="icon-big">🤖</span>
@@ -474,7 +474,7 @@
  <div class="step-num"></div>
  <div class="step-body">
  <strong>Search job boards</strong>
- <p>PhdScout queries Euraxess, mlscientist.com, and jobs.ac.uk in parallel, then deduplicates and filters by recency (expired listings discarded).</p>
+ <p>PhdScout queries Euraxess, mlscientist.com, jobs.ac.uk, scholarshipdb.net, and nature.com/careers in parallel, then deduplicates and filters by recency (expired listings discarded).</p>
  </div>
  </div>
  <div class="step">
@@ -696,17 +696,27 @@ scored = agent.score_jobs(jobs, profile_text)
  <div class="card-sm">
  <span class="icon-big">🇪🇺</span>
  <h4>Euraxess</h4>
- <p>EU/worldwide research portal. Country-filtered.</p>
+ <p>EU/worldwide research portal. Country-filtered via API parameters.</p>
  </div>
  <div class="card-sm">
  <span class="icon-big">🤖</span>
  <h4>mlscientist.com</h4>
- <p>ML &amp; AI academic positions worldwide.</p>
+ <p>ML &amp; AI academic positions. 14 country categories supported.</p>
  </div>
  <div class="card-sm">
  <span class="icon-big">🇬🇧</span>
  <h4>jobs.ac.uk</h4>
- <p>UK academic jobs. Queried only when UK location is selected.</p>
+ <p>UK academic jobs. Queried only when UK or Worldwide is selected.</p>
+ </div>
+ <div class="card-sm">
+ <span class="icon-big">🌍</span>
+ <h4>scholarshipdb.net</h4>
+ <p>Worldwide aggregator with 28k+ positions across all disciplines. Country-filtered via URL path.</p>
+ </div>
+ <div class="card-sm">
+ <span class="icon-big">🔬</span>
+ <h4>nature.com/careers</h4>
+ <p>Multidisciplinary global board. Keyword search + ISO country code filtering.</p>
  </div>
  </div>
 
@@ -868,11 +878,13 @@ The letter should be <span class="st">250-350 words (2-3 paragraphs)</span>.</co
  │ └── <span class="dir">search/</span> <span class="note"># Job search infrastructure</span>
  │ ├── <span class="file">searcher.py</span> <span class="note"># JobSearcher (orchestrates scrapers)</span>
  │ └── <span class="dir">scrapers/</span>
- │ ├── <span class="file">base.py</span> <span class="note"># BaseScraper ABC + shared helpers</span>
- │ ├── <span class="file">euraxess.py</span> <span class="note"># EuraxessScraper</span>
- │ ├── <span class="file">mlscientist.py</span> <span class="note"># MLScientistScraper</span>
- │ └── <span class="file">jobs_ac_uk.py</span> <span class="note"># JobsAcUkScraper</span>
- └── <span class="dir">tests/</span> <span class="note"># 125 unit tests (pytest)</span>
+ │ ├── <span class="file">base.py</span> <span class="note"># BaseScraper ABC + shared helpers</span>
+ │ ├── <span class="file">euraxess.py</span> <span class="note"># EU/worldwide research portal</span>
+ │ ├── <span class="file">mlscientist.py</span> <span class="note"># ML &amp; AI academic positions</span>
+ │ ├── <span class="file">jobs_ac_uk.py</span> <span class="note"># UK academic jobs (UK/worldwide only)</span>
+ │ ├── <span class="file">scholarshipdb.py</span> <span class="note"># Worldwide aggregator (28k+ positions)</span>
+ │ └── <span class="file">nature_careers.py</span> <span class="note"># nature.com/careers — multidisciplinary</span>
+ └── <span class="dir">tests/</span> <span class="note"># 156 unit tests (pytest)</span>
  </div>
 
  <h2>Pipeline flow</h2>
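
Note on the "Search job boards" step in the diff above: it says the boards are queried in parallel, then deduplicated and filtered by recency. As a rough illustration only, that flow might look like the sketch below. The `search_all` function, the job dictionary keys (`url`, `deadline`), and the choice of the normalized URL as the deduplication key are all assumptions; the actual `JobSearcher` code is not part of this commit.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def search_all(scrapers, query, today):
    """Query every scraper callable in parallel, then dedupe and drop expired jobs.

    `scrapers` is a list of callables taking a query string and returning a
    list of job dicts (a stand-in for the real scraper classes).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(scrapers))) as pool:
        # pool.map preserves the order of `scrapers` in its results.
        batches = list(pool.map(lambda scrape: scrape(query), scrapers))

    seen, merged = set(), []
    for batch in batches:
        for job in batch:
            # Assumption: two listings with the same URL are the same job.
            key = job["url"].rstrip("/").lower()
            if key in seen:
                continue
            seen.add(key)
            # A missing deadline is treated as "still open".
            deadline = job.get("deadline")
            if deadline is not None and deadline < today:
                continue
            merged.append(job)
    return merged
```

Threads are used here because scraping is I/O-bound; listings that appear on more than one board are kept from whichever scraper comes first in the list.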