# LinkedIn Sourcing Agent - Detailed Development Phases

## 🎯 Project Overview

- **Goal**: Build LinkedIn Sourcing Agent in 2-3 hours
- **Deadline**: Monday 7 PM PST
- **Tech Stack**: Python + FastAPI + Gemini + SQLite

---

## 📋 Phase 1: Project Foundation (30 minutes)

### **Objective**: Set up basic project structure and dependencies

### **Tasks** (30 min total)

- [ ] **Project Setup** (10 min)
  - Create project directory structure
  - Initialize git repository
  - Create virtual environment
  - Set up `.env` file for API keys
- [ ] **Dependencies** (10 min)
  - Install FastAPI, uvicorn, google-generativeai, requests, python-dotenv
  - Create `requirements.txt`
  - Test basic imports
- [ ] **Basic FastAPI Setup** (10 min)
  - Create main FastAPI app (`app/main.py`)
  - Set up basic health check endpoint
  - Test server startup

### **Deliverables**

- [ ] Working FastAPI server
- [ ] `requirements.txt` file
- [ ] Basic project structure
- [ ] Environment variables configured

### **Files to Create**

```
linkedin-agent/
├── app/
│   ├── __init__.py
│   ├── main.py
│   └── models.py
├── requirements.txt
├── .env
└── README.md
```

---

## 🔍 Phase 2: LinkedIn Search Engine (45 minutes)

### **Objective**: Implement LinkedIn profile discovery functionality

### **Tasks** (45 min total)

- [ ] **Google Search Integration** (20 min)
  - Set up Google Custom Search API
  - Create search function for LinkedIn profiles
  - Implement query building from job description
  - Add location filtering
- [ ] **Profile URL Extraction** (15 min)
  - Parse search results for LinkedIn URLs
  - Filter valid profile URLs
  - Extract basic profile information from snippets
  - Handle rate limiting (1 request per 2 seconds)
- [ ] **Basic Profile Parser** (10 min)
  - Extract name, headline, location from search results
  - Create candidate data structure
  - Add error handling for malformed data

### **Deliverables**

- [ ] Function to search LinkedIn profiles
- [ ] Basic profile data extraction
- [ ] Rate limiting implementation
- [ ] Error handling for search failures

### **Files to Create**

```
app/
├── services/
│   ├── __init__.py
│   └── linkedin_search.py
└── utils/
    ├── __init__.py
    └── config.py
```

### **Key Functions**

```python
# Signature stubs only; bodies go in app/services/linkedin_search.py
from typing import Dict, List, Optional

def search_linkedin_profiles(job_description: str, location: Optional[str] = None) -> List[Dict]: ...
def extract_profile_data(search_results: List) -> List[Dict]: ...
def build_search_query(job_description: str, location: Optional[str] = None) -> str: ...
```

---

## 📊 Phase 3: Fit Scoring Algorithm (45 minutes)

### **Objective**: Implement comprehensive candidate scoring system

### **Tasks** (45 min total)

- [ ] **Education Scoring** (8 min)
  - Define elite and strong school lists
  - Implement education score calculation (20% weight)
  - Handle missing education data
- [ ] **Career Trajectory Scoring** (8 min)
  - Analyze job progression patterns
  - Score based on title advancement (20% weight)
  - Handle career changes and gaps
- [ ] **Company Relevance Scoring** (6 min)
  - Define top tech companies list
  - Score based on company tier (15% weight)
  - Handle startup vs.
    big tech weighting
- [ ] **Experience Match Scoring** (10 min)
  - Use Gemini to compare skills with job requirements (25% weight)
  - Implement skill matching algorithm
  - Handle keyword extraction and matching
- [ ] **Location & Tenure Scoring** (8 min)
  - Location match scoring (10% weight)
  - Tenure analysis (10% weight)
  - Handle remote work preferences
- [ ] **Weighted Score Calculation** (5 min)
  - Combine all scores with proper weights
  - Generate score breakdown
  - Normalize final scores (1-10 scale)

### **Deliverables**

- [ ] Complete scoring algorithm
- [ ] Score breakdown for each candidate
- [ ] Weighted final scores
- [ ] Handling of missing data

### **Files to Create**

```
app/services/scoring.py
```

### **Key Functions**

```python
# Signature stubs only; bodies go in app/services/scoring.py
from typing import Dict, List

def score_candidates(candidates: List[Dict], job_description: str) -> List[Dict]: ...
def calculate_education_score(education_data: str) -> float: ...
def calculate_experience_match(candidate_skills: str, job_requirements: str) -> float: ...
def calculate_weighted_score(breakdown: Dict) -> float: ...
```

---

## 💬 Phase 4: Outreach Generation (30 minutes)

### **Objective**: Create personalized LinkedIn outreach messages

### **Tasks** (30 min total)

- [ ] **Prompt Engineering** (10 min)
  - Design effective prompt templates
  - Include candidate-specific details
  - Ensure professional tone requirements
  - Set message length constraints
- [ ] **Message Generation** (15 min)
  - Implement Gemini integration for message creation
  - Generate personalized messages for top candidates
  - Include specific profile references
  - Add job-specific customization
- [ ] **Message Quality Control** (5 min)
  - Validate message length and tone
  - Ensure personalization elements
  - Add fallback for generation failures

### **Deliverables**

- [ ] Personalized outreach messages
- [ ] Professional tone validation
- [ ] Candidate-specific references
- [ ] Error handling for message generation

### **Files to Create**

```
app/services/outreach.py
```

### **Key Functions**

```python
# Signature stubs only; bodies go in app/services/outreach.py
from typing import Dict, List

def generate_outreach_messages(candidates: List[Dict], job_description: str) -> List[Dict]: ...
def create_personalized_message(candidate: Dict, job_description: str) -> str: ...
def validate_message_quality(message: str) -> bool: ...
```

---

## 🔗 Phase 5: Integration & Testing (30 minutes)

### **Objective**: Connect all components and test end-to-end functionality

### **Tasks** (30 min total)

- [ ] **API Integration** (15 min)
  - Connect LinkedIn search with scoring
  - Integrate outreach generation
  - Create main API endpoint
  - Add request/response models
- [ ] **Data Flow Testing** (10 min)
  - Test complete pipeline with sample data
  - Verify data transformations
  - Check error handling
  - Validate output format
- [ ] **Performance Optimization** (5 min)
  - Add basic caching
  - Optimize API calls
  - Implement concurrent processing where possible

### **Deliverables**

- [ ] Working end-to-end pipeline
- [ ] Main API endpoint functional
- [ ] Error handling throughout
- [ ] Performance optimizations

### **Files to Update**

```
app/main.py (add main endpoint)
app/models.py (add request/response models)
```

### **Key Endpoint**

```http
POST /api/source-candidates

{
  "job_description": "string",
  "location": "string (optional)",
  "max_candidates": "integer (default: 10)"
}
```

---

## 🚀 Phase 6: Deployment & Documentation (30 minutes)

### **Objective**: Deploy application and create submission materials

### **Tasks** (30 min total)

- [ ] **Hugging Face Deployment** (15 min)
  - Set up Hugging Face Spaces
  - Configure Gradio interface
  - Deploy FastAPI backend
  - Test deployed application
- [ ] **Documentation** (10 min)
  - Create comprehensive README
  - Add setup instructions
  - Document API usage
  - Include example requests
- [ ] **Submission Preparation** (5 min)
  - Record demo video (3 minutes)
  - Write 500-word summary
  - Prepare GitHub repository
  - Run through submission checklist

### **Deliverables**

- [ ] Deployed API on Hugging Face
- [ ] Complete README documentation
- [ ] Demo video recording
- [ ] Submission write-up

### **Files to Create**

```
README.md (comprehensive)
demo_video.mp4
submission_summary.md
```

---

## 🎯 Phase 7: Bonus Features (If Time Permits)

### **Objective**: Implement additional features for extra points

### **Tasks** (Optional - 30 min)

- [ ] **Multi-Source Enhancement** (15 min)
  - Add GitHub profile integration
  - Include Twitter/X profile data
  - Enhance scoring with additional sources
- [ ] **Smart Caching** (10 min)
  - Implement Redis or file-based caching
  - Cache search results and scores
  - Add cache invalidation logic
- [ ] **Batch Processing** (5 min)
  - Handle multiple jobs simultaneously
  - Implement job queue system
  - Add progress tracking

### **Deliverables**

- [ ] Enhanced data sources
- [ ] Caching system
- [ ] Batch processing capability

---

## 📋 Phase Completion Checklist

### **Phase 1 - Foundation** ✅

- [ ] Project structure created
- [ ] Dependencies installed
- [ ] FastAPI server running
- [ ] Environment configured

### **Phase 2 - LinkedIn Search** ✅

- [ ] Google Search API integrated
- [ ] Profile URLs extracted
- [ ] Basic data parsed
- [ ] Rate limiting implemented

### **Phase 3 - Scoring** ✅

- [ ] All 6 scoring categories implemented
- [ ] Weighted scoring working
- [ ] Score breakdown generated
- [ ] Missing data handled

### **Phase 4 - Outreach** ✅

- [ ] Message generation working
- [ ] Personalization implemented
- [ ] Professional tone achieved
- [ ] Error handling added

### **Phase 5 - Integration** ✅

- [ ] End-to-end pipeline working
- [ ] API endpoint functional
- [ ] Error handling complete
- [ ] Performance optimized

### **Phase 6 - Deployment** ✅

- [ ] Hugging Face deployment live
- [ ] Documentation complete
- [ ] Demo video recorded
- [ ] Submission ready

### **Phase 7 - Bonus** (Optional)

- [ ] Multi-source data added
- [ ] Caching implemented
- [ ] Batch processing working

---

## ⚠️ Risk Mitigation by Phase

### **Phase 1 Risks**

- **API key issues**: Have backup API providers ready
- **Environment
  setup**: Use virtual environment best practices

### **Phase 2 Risks**

- **Rate limiting**: Implement delays and user agents
- **Search failures**: Add fallback search methods
- **Data quality**: Handle incomplete profiles gracefully

### **Phase 3 Risks**

- **Scoring accuracy**: Focus on algorithm over perfect data
- **LLM costs**: Use efficient prompts and caching
- **Missing data**: Implement default scores

### **Phase 4 Risks**

- **Message quality**: Add validation and fallbacks
- **LLM failures**: Implement retry logic
- **Personalization**: Use available data effectively

### **Phase 5 Risks**

- **Integration issues**: Test components individually first
- **Performance**: Start simple, optimize later
- **Error handling**: Wrap external calls in `try`/`except` blocks

### **Phase 6 Risks**

- **Deployment issues**: Use simple hosting (Hugging Face)
- **Documentation**: Keep it clear and concise
- **Time pressure**: Prioritize working demo over perfection

---

## 🎯 Success Criteria by Phase

### **Phase 1 Success**

- Server starts without errors
- All dependencies resolve
- Basic endpoint responds

### **Phase 2 Success**

- Can find LinkedIn profiles
- Extracts basic profile data
- Handles rate limiting gracefully

### **Phase 3 Success**

- Generates scores for all candidates
- Provides score breakdown
- Handles edge cases

### **Phase 4 Success**

- Creates personalized messages
- Maintains professional tone
- References candidate details

### **Phase 5 Success**

- Complete pipeline works end-to-end
- API returns expected format
- Error handling works

### **Phase 6 Success**

- Application deployed and accessible
- Documentation clear and complete
- Ready for submission

---

## 💡 Tips for Each Phase

### **Phase 1**: Start simple and get the foundation right

### **Phase 2**: Focus on getting any LinkedIn data, not perfect data

### **Phase 3**: Implement scoring logic first, optimize later

### **Phase 4**: Use templates and prompts effectively

### **Phase 5**: Test each component before integration

### **Phase 6**: Prioritize working demo over perfect code

This phased approach keeps development systematic, holds focus on the MVP requirements, and leaves room for bonus features.
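
The Phase 3 weighted score calculation is simple enough to sketch. The example below is only an illustration, not the final `scoring.py`: the weights come from the Phase 3 task list, but the category key names (`"education"`, `"trajectory"`, and so on) and the neutral default of `5.0` for missing data are assumptions made for this sketch.

```python
from typing import Dict, Optional

# Weights from the Phase 3 task list; together they sum to 1.0.
# The key names are hypothetical and chosen only for this sketch.
WEIGHTS: Dict[str, float] = {
    "education": 0.20,
    "trajectory": 0.20,
    "company": 0.15,
    "experience": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}

# Assumption: a neutral mid-scale score stands in for any missing category.
DEFAULT_SCORE = 5.0


def calculate_weighted_score(breakdown: Dict[str, Optional[float]]) -> float:
    """Combine per-category scores (each on a 1-10 scale) into one 1-10 score."""
    total = 0.0
    for category, weight in WEIGHTS.items():
        raw = breakdown.get(category)
        if raw is None:
            raw = DEFAULT_SCORE
        # Clamp each input so one bad value cannot push the result off the scale.
        total += weight * min(10.0, max(1.0, float(raw)))
    return round(total, 2)


if __name__ == "__main__":
    sample = {"education": 8, "trajectory": 7, "company": 6,
              "experience": 9, "location": 10}  # tenure is missing
    print(calculate_weighted_score(sample))
```

Because the weights sum to 1.0 and every input is clamped to the 1-10 range, the combined score stays on the same 1-10 scale without a separate normalization step.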