Kaito117 committed on
Commit 3f7f9cc · 1 Parent(s): 8cf6b7b

update readme, add diagrams

Files changed (2):
  1. .env.example +1 -1
  2. README.md +177 -33
.env.example CHANGED
@@ -3,4 +3,4 @@ MONGO_DATABASE=
  SERPAPI_KEY=
  GROQ_API_KEY=
  RAPIDAPI_API_KEY=
- DEV=False
+ DEBUG=False
README.md CHANGED
@@ -1,62 +1,206 @@
- # score_profiles
-
- AI-powered LinkedIn candidate sourcing and scoring microservice.
-
- ## Features
-
- - FastAPI HTTP API with a `/jobs` endpoint
- - URL extraction, SerpAPI search, GitHub & LinkedIn profile clients
- - Profile data extraction & ranking via a `CandidateScorer`
- - CORS enabled for all origins
- - Comprehensive unit & integration tests using `pytest`, `respx` & FastAPI `TestClient`
-
- ## Getting Started
-
- ### Prerequisites
-
  - Python 3.12+
- - `uv` or `pip`
- - `.env` file copied from `.env.example`, populated with your API keys
-
  ### Installation

- ```sh
- # Using pip
  pip install -r requirements.txt
- # or using uv
  uv sync
- ```
-
- ### Configuration
-
- Copy `.env.example` to `.env` and populate it with your own values.
-
- Additionally, check if you want to change any settings in `config.py` (though the defaults are sensible).
-
- ### Running the Service
-
- ```sh
  python app/main.py
  ```
-
- A Docker-based setup is not ready yet.
-
- The API docs are available at `http://localhost:8000/docs`.
-
  ## Testing
-
- All HTTP calls are stubbed using [`respx`](https://github.com/lundberg/respx) and fixtures under `test/data/`.
-
- ```sh
  pytest
  ```
-
- ## Workflow
-
- 1. Client POSTs to `/jobs` with `search_query` (job description).
- 2. `LinkedInSourcingAgent` orchestrates:
-    - `SerpAPIClient.search` → get LinkedIn & GitHub URLs
-    - `LinkedInProfileClient.fetch_profile` & `GitHubClient.fetch_github_profile_html`
-    - `LinkedInProfileExtractor` & `GitHubProfileExtractor` → normalized profile dicts
-    - `CandidateScorer.batch_score_candidates` → rank & filter
- 3. Returns top-N scored candidates.
 
+ # LinkedIn Sourcing Agent
+
+ An autonomous AI agent that sources LinkedIn profiles at scale, scores candidates using advanced fit algorithms, and generates personalized outreach messages, built for the Synapse AI Challenge.
+
+ ## Challenge Overview
+
+ This project implements a complete LinkedIn sourcing pipeline that:
+ - **Discovers** relevant LinkedIn profiles from job descriptions
+ - **Scores** candidates using a comprehensive 6-factor rubric
+ - **Generates** personalized AI-powered outreach messages
+ - **Scales** to handle multiple jobs simultaneously
+
+ ## Key Features
+
+ ### Core Functionality
+ - **Smart Profile Discovery**: Multi-source candidate sourcing via SerpAPI + Google Search
+ - **Data Collection and Processing**: Collects LinkedIn data via RapidAPI and GitHub profiles via plain HTTP requests and HTML parsing
+ - **Advanced Scoring Algorithm**: 6-factor rubric (Education, Trajectory, Company, Skills, Location, Tenure)
+ - **AI-Powered Outreach**: Personalized LinkedIn messages using Llama (via Groq)
+ - **FastAPI Backend**: HTTP endpoint for instant results
+ - **Multi-Source Enhancement**: Combines LinkedIn + GitHub data for improved scoring
+ - **Smart Caching**: Intelligent caching to avoid re-fetching profiles
+ - **Batch Processing**: Handles multiple jobs in parallel (asyncio)
+ - **Confidence Scoring**: Shows confidence levels when data is incomplete
+
+ ## Architecture
+
+ ![Architecture](architecture.png)
+
+ ![Data flow](data_flow.png)
+
+ ## Tech Stack
+
+ - **Language**: Python 3.12+
+ - **Framework**: FastAPI
+ - **LLM**: Llama via Groq (planned: OpenAI GPT-4/o4-mini, Claude 4, Gemini 2.5 Pro)
+ - **Search**: SerpAPI (Google Search)
+ - **Storage**: In-memory with JSON persistence
+ - **Testing**: pytest + respx
+ - **Data Parsing**: BeautifulSoup for GitHub
+
+ ## Quick Start
+
+ ### Prerequisites
  - Python 3.12+
+ - API keys (SerpAPI, Groq, RapidAPI)

  ### Installation

+ ```bash
+ # Clone the repository
+ git clone <your-repo-url>
+ cd score_profiles
+
+ # Install dependencies
  pip install -r requirements.txt
+ # or using uv
  uv sync
+
+ # Setup environment
+ cp .env.example .env
+ # Add your API keys to .env
+ ```

+ ### Environment Variables (see `.env.example`)
+
+ ```env
+ MONGODB_URI=
+ MONGO_DATABASE=
+ SERPAPI_KEY=
+ GROQ_API_KEY=
+ RAPIDAPI_API_KEY=
+ DEBUG=False
+ ```
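As a rough illustration of how these variables might be consumed (not the project's actual `config.py`, and `load_settings` is a hypothetical name):

```python
import os

# Illustrative only - the project's real settings loading may differ.
def load_settings(env: dict) -> dict:
    """Pull the keys above out of an environment mapping, with safe defaults."""
    return {
        "mongodb_uri": env.get("MONGODB_URI", ""),
        "serpapi_key": env.get("SERPAPI_KEY", ""),
        "groq_api_key": env.get("GROQ_API_KEY", ""),
        "rapidapi_api_key": env.get("RAPIDAPI_API_KEY", ""),
        # DEBUG arrives as a string from .env; normalize it to a bool
        "debug": env.get("DEBUG", "False").lower() == "true",
    }


settings = load_settings(os.environ)
```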

+ ### Running the Agent
+
+ ```bash
+ # Start the FastAPI server
  python app/main.py
+
+ # API available at: http://localhost:8000
+ # Interactive docs: http://localhost:8000/docs
  ```

+ ## Fit Scoring
+
+ The scoring system evaluates candidates across 6 dimensions:
+
+ | Factor | Weight | Scoring Criteria |
+ |--------|--------|------------------|
+ | **Education** | 20% | Elite schools (9-10), Strong schools (7-8), Standard (5-6) |
+ | **Career Trajectory** | 20% | Clear progression (8-10), Steady growth (6-8), Limited (3-5) |
+ | **Company Relevance** | 15% | Top tech (9-10), Relevant industry (7-8), Any experience (5-6) |
+ | **Experience Match** | 25% | Perfect match (9-10), Strong overlap (7-8), Some relevance (5-6) |
+ | **Location Match** | 10% | Exact city (10), Same metro (8), Remote-friendly (6) |
+ | **Tenure** | 10% | 2-3 years avg (9-10), 1-2 years (6-8), Job hopping (3-5) |
+
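The rubric above implies a simple weighted sum. A sketch, with weights taken from the table; the real `CandidateScorer` may weight or combine factors differently, and `fit_score` is an illustrative name:

```python
# Weights from the rubric table; illustrative, not the actual CandidateScorer API.
WEIGHTS = {
    "education": 0.20,
    "trajectory": 0.20,
    "company": 0.15,
    "skills": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}


def fit_score(breakdown: dict) -> float:
    """Combine per-factor scores (each 0-10) into one weighted 0-10 fit score."""
    return sum(WEIGHTS[factor] * breakdown.get(factor, 0.0) for factor in WEIGHTS)


score = fit_score({
    "education": 9.0, "trajectory": 8.0, "company": 8.5,
    "skills": 9.0, "location": 10.0, "tenure": 7.0,
})
```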
+ ## API
+
+ ### Single Job Processing
+
+ Use FastAPI's built-in docs endpoint for an interactive test.
+
+ Or, if you want to use a script:

+ ```python
+ import requests
+
+ response = requests.post("http://localhost:8000/jobs", json={
+     "search_query": "Software Engineer, ML Research\nWindsurf • Full Time • Mountain View, CA • On-site • $140,000 – $300,000 + Equity\nAbout the Company:\nWindsurf (formerly Codeium) is a Forbes AI 50 company building the future of developer productivity through AI. With over 200 employees and $243M raised across multiple rounds including a Series C, Windsurf provides \ncutting-edge in-editor autocomplete, chat assistants, and full IDEs powered by proprietary LLMs. Their user base spans hundreds of thousands of developers worldwide, reflecting strong\nproduct-market fit and commercial traction.\nRoles and Responsibilities:\nTrain and fine-tune LLMs focused on developer productivity\nDesign and prioritize experiments for product impact\nAnalyze results, conduct ablation studies, and document findings\nConvert ML discoveries into scalable product features\nParticipate in the ML reading group and contribute to knowledge sharing\nJob Requirements:\n2+ years in software engineering with fast promotions\nStrong software engineering and systems thinking skills\nProven experience training and iterating on large production neural networks\nStrong GPA from a top CS undergrad program (MIT, Stanford, CMU, UIUC, etc.)\nFamiliarity with tools like Copilot, ChatGPT, or Windsurf is preferred\nDeep curiosity for the code generation space\nExcellent documentation and experimentation discipline\nPrior experience with applied research (not purely academic publishing)\nMust be able to work in Mountain View, CA full-time onsite\nExcited to build product-facing features from ML research\nInterview Process\nRecruiter Chat (15 min)\nVirtual Algorithm Round (LeetCode-style, 45 min)\nVirtual ML Case Study (1 hour)\nOnsite (3 hours): Additional ML case, implementation project, and culture interview\nOffer Extended",
+     "max_candidates": 50,
+     "include_github": False,
+     "confidence_threshold": 0.3,
+ })
+
+ results = response.json()
+ ```
+
+ ### Sample Response
+
+ ```json
+ {
+     "job_id": "backend-fintech-sf-2024",
+     "candidates_found": 25,
+     "processing_time": "45.2s",
+     "top_candidates": [
+         {
+             "name": "Jane Smith",
+             "linkedin_url": "linkedin.com/in/janesmith",
+             "fit_score": 8.5,
+             "confidence": 0.92,
+             "score_breakdown": {
+                 "education": 9.0,
+                 "trajectory": 8.0,
+                 "company": 8.5,
+                 "skills": 9.0,
+                 "location": 10.0,
+                 "tenure": 7.0
+             },
+             "outreach_message": "Hi Jane, I noticed your impressive 6 years at Stripe building payment infrastructure. Your experience with distributed systems and fintech regulations makes you a perfect fit for our Senior Backend Engineer role...",
+             "key_highlights": [
+                 "6 years at Stripe in payments infrastructure",
+                 "Stanford CS degree",
+                 "Expert in distributed systems & microservices"
+             ]
+         }
+     ]
+ }
+ ```

  ## Testing

+ Comprehensive test suite with mocked HTTP calls:

+ ```bash
+ # Run all tests
  pytest
+
+ # Run with coverage
+ pytest --cov=app
+
+ # Run specific test categories
+ pytest tests/test_scoring.py
+ pytest tests/test_integration.py
  ```

+ ## Tradeoffs
+
+ - Not including Twitter or personal websites: high variance, low signal
+ - Not including GitHub by default: false positives (company profiles get recommended)
+
+ ### Sample Generated Outreach
+
+ *"Hi Alex, I came across your profile and was impressed by your work at OpenAI on transformer architectures. Your research background in neural code generation and experience with large-scale ML training makes you an ideal candidate for Windsurf's ML Research Engineer role. We're building the next generation of AI-powered developer tools - would love to discuss how your expertise could accelerate our LLM training initiatives..."*
176
+
177
+ ## Scaling Strategy
178
+
179
+ For production scale (100s of jobs):
180
+
181
+ 1. **Concurrency**: Asyncio is good, unless you have multiple cpu cores (use multiprocessing + asyncio - multiple docker containers)
182
+ 2. **Queue System**: Redis/Celery as async task queue (partial setup done)
183
+ 3. **Database**: MongoDB for intermediate and final results storage
184
+ 4. **Rate Limiting**: Intelligent backoff with multiple API key rotation
185
+ 5. **Monitoring**: Comprehensive logging and metrics (Prometheus and Grafana, Otel)
186
+
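Item 4 (backoff plus key rotation) can be sketched roughly as follows; this is an illustration under assumed names (`call_with_backoff`), not the project's implementation:

```python
import itertools
import time


def call_with_backoff(call, api_keys, max_retries=5, base_delay=0.5):
    """Try `call(key)` with keys in rotation, doubling the wait after each failure."""
    keys = itertools.cycle(api_keys)  # rotate across multiple API keys
    for attempt in range(max_retries):
        try:
            return call(next(keys))
        except Exception:  # e.g. a rate-limit (HTTP 429) error from the provider
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all retries exhausted")
```

Production code would catch a specific rate-limit exception rather than bare `Exception`, so that genuine bugs still surface.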
+ ## Future Enhancements
+
+ - [ ] **Database Integration**: MongoDB integration with motor (asynchronous)
+ - [ ] **Dockerization**: For ease of deployment
+ - [ ] **Advanced Deduplication**: Bloom filters for large-scale URL dedup
+ - [ ] **ML Enhancement**: Custom embedding models for better skill matching
+ - [ ] **Multi-platform**: Improve GitHub integration, add Twitter integration
+ - [ ] **A/B Testing**: Message and prompt effectiveness tracking
+
+ Built for the Synapse AI Challenge. Code structure designed for easy extension and modification.
+
+ ## License
+
+ MIT License - built for challenge purposes.
+
+ ---
+
+ **Demo Video**: [Link to 3-minute demo]
+ **Live API**: [HuggingFace Space URL]