LinkedIn Sourcing Agent - Development Plan
Project Overview
Build an autonomous AI agent that sources LinkedIn profiles, scores candidates using a fit score algorithm, and generates personalized outreach messages.
Deadline: Monday 7 PM PST
Time Budget: 2-3 hours
Tech Stack: Python + FastAPI + Gemini + SQLite
Core Requirements Analysis
1. LinkedIn Profile Discovery
- Input: Job description
- Output: Array of candidate profiles with basic data
- Methods: Google Search API, RapidAPI, or direct parsing
2. Candidate Scoring System
- Implement 6-category fit score rubric (100% total)
- Education (20%), Career Trajectory (20%), Company Relevance (15%)
- Experience Match (25%), Location Match (10%), Tenure (10%)
3. Personalized Outreach Generation
- AI-generated messages referencing candidate details
- Professional tone, job-specific customization
4. Scalability Features
- Multiple job processing
- Rate limiting management
- Minimal data storage
Architecture Design
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Job Input     │───▶│   LinkedIn      │───▶│   Profile       │
│   (FastAPI)     │    │   Search Engine │    │   Parser        │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Outreach      │◀───│   Fit Score     │◀───│   Candidate     │
│   Generator     │    │   Algorithm     │    │   Data Store    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Development Timeline (2-3 hours)
Phase 1: Foundation (30 minutes)
- Set up project structure
- Install dependencies (FastAPI, uvicorn, google-generativeai, requests, python-dotenv; sqlite3 ships with Python)
- Create basic FastAPI endpoints
- Set up environment variables for API keys
Phase 2: LinkedIn Search (45 minutes)
- Implement Google Search API integration
- Create LinkedIn profile URL extraction
- Build basic profile data parser
- Add rate limiting and error handling
Phase 3: Fit Scoring Algorithm (45 minutes)
- Implement education scoring (20%)
- Implement career trajectory scoring (20%)
- Implement company relevance scoring (15%)
- Implement experience match scoring (25%)
- Implement location match scoring (10%)
- Implement tenure scoring (10%)
- Create weighted scoring function
Phase 4: Outreach Generation (30 minutes)
- Design prompt templates for LLM
- Implement personalized message generation
- Add candidate-specific references
- Ensure professional tone
Phase 5: Integration & Testing (30 minutes)
- Connect all components
- Test end-to-end pipeline
- Optimize performance
- Add error handling
Phase 6: Deployment & Documentation (30 minutes)
- Deploy to Hugging Face Spaces
- Create README with setup instructions
- Record demo video
- Write submission summary
Technical Implementation Details
Project Structure
linkedin-agent/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app
│   ├── models.py            # Pydantic models
│   ├── services/
│   │   ├── linkedin_search.py
│   │   ├── scoring.py
│   │   ├── outreach.py
│   │   └── database.py
│   └── utils/
│       ├── config.py
│       └── helpers.py
├── requirements.txt
├── README.md
└── .env
Key Dependencies
fastapi==0.104.1
uvicorn==0.24.0
google-generativeai==0.3.0
requests==2.31.0
python-dotenv==1.0.0
sqlite3 (Python standard library; not installed via pip, so leave it out of requirements.txt)
API Endpoints
POST /api/source-candidates
{
  "job_description": "string",
  "location": "string (optional)",
  "max_candidates": "integer (default: 10)"
}
Response:
{
  "job_id": "string",
  "candidates_found": "integer",
  "top_candidates": [
    {
      "name": "string",
      "linkedin_url": "string",
      "fit_score": "float",
      "score_breakdown": "object",
      "outreach_message": "string"
    }
  ]
}
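As a rough illustration of the response shape above, the following sketch ranks scored candidates and builds the response body. The helper name `build_response`, the random UUID used for `job_id`, and the descending sort by `fit_score` are assumptions; the plan does not specify any of them.

```python
import uuid
from typing import Any


def build_response(candidates: list[dict[str, Any]], max_candidates: int = 10) -> dict[str, Any]:
    """Rank candidates by fit_score and shape them like the response schema.

    Hypothetical helper: job_id generation and ordering are assumptions.
    """
    ranked = sorted(candidates, key=lambda c: c["fit_score"], reverse=True)
    return {
        "job_id": str(uuid.uuid4()),
        "candidates_found": len(candidates),
        "top_candidates": [
            {
                "name": c["name"],
                "linkedin_url": c["linkedin_url"],
                "fit_score": c["fit_score"],
                # Optional fields default to empty values when not yet computed
                "score_breakdown": c.get("score_breakdown", {}),
                "outreach_message": c.get("outreach_message", ""),
            }
            for c in ranked[:max_candidates]
        ],
    }
```

In a FastAPI handler, a dict like this could be returned directly or validated against a Pydantic model from `models.py`.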
Fit Scoring Implementation
Education Scoring (20%)
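def score_education(education_data):
    """Score a candidate's education; higher is better."""
    elite_schools = ["MIT", "Stanford", "Harvard", "Berkeley", "CMU"]
    strong_schools = ["UCLA", "USC", "Georgia Tech", "UIUC"]
    # `in` is a substring test if education_data is a string, an exact-match test if it is a list
    if any(school in education_data for school in elite_schools):
        return 9.5
    elif any(school in education_data for school in strong_schools):
        return 7.5
    else:
        return 5.5  # default for unlisted or missing schools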
Experience Match Scoring (25%)
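import re
import google.generativeai as genai  # assumes genai.configure(api_key=...) ran at startup

def score_experience(candidate_skills, job_requirements):
    # Use Gemini to compare skills and requirements; "gemini-pro" is an assumed model name
    prompt = f"Rate match between skills: {candidate_skills} and requirements: {job_requirements}. Reply with one number from 1 to 10."
    reply = genai.GenerativeModel("gemini-pro").generate_content(prompt).text
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return min(10.0, max(1.0, float(match.group()))) if match else 5.5  # score 1-10; 5.5 is an assumed fallback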
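The weighted scoring function from Phase 3 could look roughly like the sketch below. The weights come from the rubric in this plan; the dictionary key names and the choice to reject incomplete breakdowns with a `ValueError` are assumptions made for illustration.

```python
# Weights from the six-category rubric; they sum to 1.0.
# Key names are hypothetical, chosen for this sketch.
WEIGHTS = {
    "education": 0.20,
    "trajectory": 0.20,
    "company": 0.15,
    "experience": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}


def weighted_fit_score(breakdown: dict[str, float]) -> float:
    """Combine per-category scores into one fit score on the same scale.

    Assumption: every category must be present; a missing one is an error
    rather than being silently treated as zero.
    """
    missing = WEIGHTS.keys() - breakdown.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * breakdown[name] for name in WEIGHTS)
    return round(total, 2)
```

For example, category scores of 9.5, 8, 7, 9, 10, and 6 (in the order listed in `WEIGHTS`) give 1.9 + 1.6 + 1.05 + 2.25 + 1.0 + 0.6 = 8.4.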
LinkedIn Search Strategy
Primary Method: Google Search API
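import os
import requests

def search_linkedin_profiles(job_description, location):
    # Google Custom Search JSON API; short keywords match better than a full description in quotes
    query = f'site:linkedin.com/in "{job_description}"' + (f' "{location}"' if location else "")
    params = {"key": os.environ["GOOGLE_SEARCH_API_KEY"], "cx": os.environ["GOOGLE_SEARCH_ENGINE_ID"], "q": query}
    response = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=10)
    response.raise_for_status()
    # Extract LinkedIn URLs from results; title and snippet serve as basic profile data
    return [{"linkedin_url": item["link"], "title": item.get("title"), "snippet": item.get("snippet")} for item in response.json().get("items", [])]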
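Search results often contain regional subdomains, tracking parameters, and non-profile pages, so the URL extraction step might be sketched as follows. The function name, the regular expression, and the decision to lowercase slugs and canonicalize to the `www` host are assumptions for illustration.

```python
import re

# Matches profile URLs such as https://www.linkedin.com/in/jane-doe-123,
# with an optional subdomain (www, uk, ...). Pattern is an assumption.
PROFILE_RE = re.compile(
    r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/in/([A-Za-z0-9\-_%]+)",
    re.IGNORECASE,
)


def extract_profile_urls(links: list[str]) -> list[str]:
    """Return canonical, de-duplicated profile URLs in first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for link in links:
        match = PROFILE_RE.match(link)
        if not match:
            continue  # skip company pages, posts, job listings, etc.
        slug = match.group(1).lower()
        if slug not in seen:
            seen.add(slug)
            result.append(f"https://www.linkedin.com/in/{slug}")
    return result
```

De-duplicating on the slug rather than the full URL keeps one entry when the same profile appears under several subdomains or with different query strings.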
Fallback: Direct Parsing
- Use requests + BeautifulSoup for basic profile extraction
- Focus on public information only
- Implement respectful rate limiting
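One simple way to approach the rate limiting mentioned above is to enforce a minimum interval between consecutive requests. The class below is a minimal sketch under that assumption; the name `MinIntervalLimiter` and the injectable `clock` and `sleep` parameters (added so the behavior can be tested without real delays) are not part of the original plan.

```python
import time


class MinIntervalLimiter:
    """Blocks so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call; None before the first

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        # Record when this call is considered to have happened (after any sleep)
        self._last_call = now + delay
        return delay
```

Calling `limiter.wait()` immediately before each outbound request is enough for a single-threaded pipeline; concurrent workers would need a lock around `wait()`.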
LLM Integration
Gemini for Scoring & Outreach
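import google.generativeai as genai  # assumes genai.configure(api_key=...) ran at startup

def generate_outreach_message(candidate, job_description):
    prompt = f"""
    Generate a personalized LinkedIn outreach message for {candidate['name']}
    based on their profile: {candidate['profile_data']}
    For this job: {job_description}
    Requirements:
    - Professional tone
    - Reference specific details from their profile
    - Explain why they're a good fit
    - Keep under 200 words
    """
    # "gemini-pro" is an assumed model name; use whichever Gemini model is available
    model = genai.GenerativeModel("gemini-pro")
    return model.generate_content(prompt).text.strip()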
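A language model will not always honor the prompt's constraints, so a lightweight check on the generated text can catch obvious misses before a message is stored. The sketch below is one possible check; the function name and the specific rules (non-empty, under the word limit, mentions the candidate's first name) are assumptions derived from the prompt requirements above.

```python
def validate_outreach(message: str, candidate_name: str, max_words: int = 200) -> list[str]:
    """Return a list of problems found in a generated message (empty if none)."""
    problems = []
    word_count = len(message.split())
    if word_count == 0:
        problems.append("message is empty")
    if word_count >= max_words:
        # The prompt asks for "under 200 words", so exactly max_words also fails
        problems.append(f"message has {word_count} words, expected under {max_words}")
    name_parts = candidate_name.split()
    first_name = name_parts[0] if name_parts else ""
    if first_name and first_name.lower() not in message.lower():
        problems.append(f"message does not mention {first_name}")
    return problems
```

A caller could regenerate the message when the returned list is non-empty, rather than truncating text mid-sentence.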
Data Storage
SQLite Schema
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP
);
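To show how this schema might be used from `database.py`, here is a minimal sketch with the standard-library `sqlite3` module. The helper names are hypothetical, and two details go beyond the schema as written: `created_at` is given a `DEFAULT CURRENT_TIMESTAMP`, and `profile_data` and `score_breakdown` are assumed to be stored as JSON text.

```python
import json
import sqlite3

# Same columns as the schema above; the DEFAULT on created_at is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def save_candidate(conn: sqlite3.Connection, job_id: str, candidate: dict) -> int:
    """Insert one candidate and return its row id; dict fields become JSON text."""
    cur = conn.execute(
        "INSERT INTO candidates (job_id, name, linkedin_url, profile_data,"
        " fit_score, score_breakdown, outreach_message)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            job_id,
            candidate["name"],
            candidate["linkedin_url"],
            json.dumps(candidate.get("profile_data", {})),
            candidate["fit_score"],
            json.dumps(candidate.get("score_breakdown", {})),
            candidate.get("outreach_message", ""),
        ),
    )
    conn.commit()
    return cur.lastrowid


def top_candidates(conn: sqlite3.Connection, job_id: str, limit: int = 10) -> list[tuple]:
    """Return (name, fit_score) rows for one job, best score first."""
    return conn.execute(
        "SELECT name, fit_score FROM candidates"
        " WHERE job_id = ? ORDER BY fit_score DESC LIMIT ?",
        (job_id, limit),
    ).fetchall()
```

Parameterized queries (the `?` placeholders) keep scraped profile text from being interpreted as SQL.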
Deployment Strategy
Hugging Face Spaces
- Use Gradio for simple UI
- FastAPI backend
- Free tier hosting
- Easy sharing and demo
Environment Variables
GOOGLE_API_KEY=your_key_here
GOOGLE_SEARCH_API_KEY=your_key_here
GOOGLE_SEARCH_ENGINE_ID=your_id_here
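A small loader in `config.py` could read these variables and fail fast when one is missing. This is only a sketch: the function name `load_config` and the optional `env` parameter (which lets a plain dict stand in for `os.environ`) are assumptions.

```python
import os

REQUIRED_KEYS = ("GOOGLE_API_KEY", "GOOGLE_SEARCH_API_KEY", "GOOGLE_SEARCH_ENGINE_ID")


def load_config(env=None) -> dict[str, str]:
    """Read required settings, raising one error that lists every missing key."""
    env = os.environ if env is None else env
    # Treat empty strings as missing, since a blank key is never usable
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

With python-dotenv installed, calling `load_dotenv()` before `load_config()` would populate `os.environ` from the `.env` file.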
Success Metrics
MVP Requirements
- Find 10+ candidates for given job
- Score candidates with breakdown
- Generate personalized outreach
- Handle basic rate limiting
- Deploy working API
Bonus Features (if time permits)
- Multi-source data (GitHub, Twitter)
- Smart caching
- Batch processing
- Confidence scoring
Risk Mitigation
Technical Risks
- LinkedIn rate limiting: Add delays between requests and set User-Agent headers
- API costs: Use free tiers, implement caching
- Data quality: Graceful handling of incomplete profiles
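For the caching mentioned under API costs, an in-memory cache with a time-to-live is probably enough for an MVP. The sketch below assumes search results are cached per (job description, location) pair; the class name `TTLCache`, its methods, and the key normalization are illustrative choices, not part of the plan.

```python
import hashlib
import time


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}  # key -> (expiry time, value)

    @staticmethod
    def make_key(*parts: str) -> str:
        """Hash the normalized parts so long job descriptions make short keys."""
        joined = "\x1f".join(p.strip().lower() for p in parts)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def get(self, key: str):
        """Return the cached value, or None if absent or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired, drop it
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (self._clock() + self.ttl, value)
```

Checking the cache before each search call means a repeated job description within the TTL window costs no additional API quota.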
Time Risks
- Scope creep: Focus on MVP first
- Integration issues: Test components individually
- Deployment problems: Use simple hosting (Hugging Face)
Submission Checklist
- Working GitHub repository
- Clear README with setup instructions
- 3-minute demo video
- 500-word write-up
- Deployed API on Hugging Face
- Submit via Google Form
Optimization Tips
- Start with mock data to test scoring algorithm
- Use Cursor AI for boilerplate code generation
- Focus on pipeline architecture over perfect accuracy
- Comment code thoroughly to show thinking process
- Make it easily runnable for judges
Final Notes
- Priority: Working pipeline > perfect accuracy
- Focus: Architecture and approach over data quality
- Goal: Demonstrate ability to build production-ready systems
- Time: 2-3 hours maximum, keep it simple but functional
This plan provides a clear roadmap for building a functional LinkedIn Sourcing Agent within the time constraints, meeting all core requirements while leaving room for the bonus features.