# LinkedIn Sourcing Agent - Development Plan

## 🎯 Project Overview

Build an autonomous AI agent that sources LinkedIn profiles, scores candidates using a fit score algorithm, and generates personalized outreach messages.

**Deadline**: Monday 7 PM PST
**Time Budget**: 2-3 hours
**Tech Stack**: Python + FastAPI + Gemini + SQLite

## 📋 Core Requirements Analysis

### 1. **LinkedIn Profile Discovery**
- Input: Job description
- Output: Array of candidate profiles with basic data
- Methods: Google Search API, RapidAPI, or direct parsing

### 2. **Candidate Scoring System**
- Implement 6-category fit score rubric (100% total)
- Education (20%), Career Trajectory (20%), Company Relevance (15%)
- Experience Match (25%), Location Match (10%), Tenure (10%)

### 3. **Personalized Outreach Generation**
- AI-generated messages referencing candidate details
- Professional tone, job-specific customization

### 4. **Scalability Features**
- Multiple job processing
- Rate limiting management
- Minimal data storage

## 🏗️ Architecture Design

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Job Input     │───▶│   LinkedIn      │───▶│   Profile       │
│   (FastAPI)     │    │   Search Engine │    │   Parser        │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Outreach      │◀───│   Fit Score     │◀───│   Candidate     │
│   Generator     │    │   Algorithm     │    │   Data Store    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

## 📅 Development Timeline (2-3 hours)

The phase estimates below add up to 3.5 hours, so trim or overlap phases as needed to stay within the 2-3 hour budget.

### Phase 1: Foundation (30 minutes)
- [ ] Set up project structure
- [ ] Install dependencies (FastAPI, google-generativeai, requests); SQLite ships with Python
- [ ] Create basic FastAPI endpoints
- [ ] Set up environment variables for API keys

### Phase 2: LinkedIn Search (45 minutes)
- [ ] Implement Google Search API integration
- [ ] Create LinkedIn profile URL extraction
- [ ] Build basic profile data parser
- [ ] Add rate limiting and error handling

### Phase 3: Fit Scoring Algorithm (45 minutes)
- [ ] Implement education scoring (20%)
- [ ] Implement career trajectory scoring (20%)
- [ ] Implement company relevance scoring (15%)
- [ ] Implement experience match scoring (25%)
- [ ] Implement location match scoring (10%)
- [ ] Implement tenure scoring (10%)
- [ ] Create weighted scoring function

### Phase 4: Outreach Generation (30 minutes)
- [ ] Design prompt templates for LLM
- [ ] Implement personalized message generation
- [ ] Add candidate-specific references
- [ ] Ensure professional tone

### Phase 5: Integration & Testing (30 minutes)
- [ ] Connect all components
- [ ] Test end-to-end pipeline
- [ ] Optimize performance
- [ ] Add error handling

### Phase 6: Deployment & Documentation (30 minutes)
- [ ] Deploy to Hugging Face Spaces
- [ ] Create README with setup instructions
- [ ] Record demo video
- [ ] Write submission summary

## 🛠️ Technical Implementation Details

### Project Structure
```
linkedin-agent/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app
│   ├── models.py            # Pydantic models
│   ├── services/
│   │   ├── linkedin_search.py
│   │   ├── scoring.py
│   │   ├── outreach.py
│   │   └── database.py
│   └── utils/
│       ├── config.py
│       └── helpers.py
├── requirements.txt
├── README.md
└── .env
```

### Key Dependencies
```text
fastapi==0.104.1
uvicorn==0.24.0
google-generativeai==0.3.0
requests==2.31.0
python-dotenv==1.0.0
# sqlite3 is part of the Python standard library; nothing to install
```

### API Endpoints
```text
POST /api/source-candidates
{
  "job_description": "string",
  "location": "string (optional)",
  "max_candidates": "integer (default: 10)"
}

Response:
{
  "job_id": "string",
  "candidates_found": "integer",
  "top_candidates": [
    {
      "name": "string",
      "linkedin_url": "string",
      "fit_score": "float",
      "score_breakdown": "object",
      "outreach_message": "string"
    }
  ]
}
```

## 🎯 Fit Scoring Implementation

Each category is scored on a 1-10 scale and then combined using the rubric weights.

### Education Scoring (20%)
```python
def score_education(education_data):
    """Return an education score on a 1-10 scale based on school tier."""
    elite_schools = ["MIT", "Stanford", "Harvard", "Berkeley", "CMU"]
    strong_schools = ["UCLA", "USC", "Georgia Tech", "UIUC"]

    # With a string, `in` is a case-sensitive substring test; with a list, an exact match
    if any(school in education_data for school in elite_schools):
        return 9.5
    elif any(school in education_data for school in strong_schools):
        return 7.5
    else:
        return 5.5
```

### Experience Match Scoring (25%)
```python
import google.generativeai as genai  # assumes genai.configure(api_key=...) ran at startup

def score_experience(candidate_skills, job_requirements):
    # Use Gemini to compare skills and requirements
    prompt = (f"Rate the match between skills: {candidate_skills} and requirements: "
              f"{job_requirements}. Reply with a single number from 1 to 10.")
    # "gemini-pro" is an assumed model name; float() raises if the reply is not a bare number
    reply = genai.GenerativeModel("gemini-pro").generate_content(prompt)
    return float(reply.text.strip())  # Return score 1-10
```

## 🔍 LinkedIn Search Strategy

### Primary Method: Google Search API

Pass a short job title or keyword string as `job_description`: the query quotes it as an exact phrase, and a full description will rarely match.

```python
import os
import requests

def search_linkedin_profiles(job_description, location):
    query = f'site:linkedin.com/in "{job_description}" "{location}"'
    # Use Google Custom Search API; the key and engine ID come from the environment
    params = {"key": os.environ["GOOGLE_SEARCH_API_KEY"], "cx": os.environ["GOOGLE_SEARCH_ENGINE_ID"], "q": query}
    response = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=10)
    response.raise_for_status()
    # Extract LinkedIn URLs from results; parsing basic profile data happens downstream
    return [item["link"] for item in response.json().get("items", [])]
```

### Fallback: Direct Parsing
- Use requests + BeautifulSoup for basic profile extraction (add `beautifulsoup4` to `requirements.txt` if this path is used)
- Focus on public information only
- Implement respectful rate limiting

## 🤖 LLM Integration

### Gemini for Scoring & Outreach
```python
import google.generativeai as genai

def generate_outreach_message(candidate, job_description):
    prompt = f"""
    Generate a personalized LinkedIn outreach message for {candidate['name']}
    based on their profile: {candidate['profile_data']}

    For this job: {job_description}

    Requirements:
    - Professional tone
    - Reference specific details from their profile
    - Explain why they're a good fit
    - Keep under 200 words
    """
    # Assumes genai.configure(api_key=...) has run; "gemini-pro" is an assumed model name
    model = genai.GenerativeModel("gemini-pro")
    return model.generate_content(prompt).text.strip()
```

## 📊 Data Storage

### SQLite Schema
```sql
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP
);
```

## 🚀 Deployment Strategy

### Hugging Face Spaces
- Use Gradio for simple UI
- FastAPI backend
- Free tier hosting
- Easy sharing and demo

### Environment Variables
```bash
GOOGLE_API_KEY=your_key_here
GOOGLE_SEARCH_API_KEY=your_key_here
GOOGLE_SEARCH_ENGINE_ID=your_id_here
```

## 🎯 Success Metrics

### MVP Requirements
- [ ] Find 10+ candidates for given job
- [ ] Score candidates with breakdown
- [ ] Generate personalized outreach
- [ ] Handle basic rate limiting
- [ ] Deploy working API

### Bonus Features (if time permits)
- [ ] Multi-source data (GitHub, Twitter)
- [ ] Smart caching
- [ ] Batch processing
- [ ] Confidence scoring

## ⚠️ Risk Mitigation

### Technical Risks
- **LinkedIn rate limiting**: Implement delays and user agents
- **API costs**: Use free tiers, implement caching
- **Data quality**: Graceful handling of incomplete profiles

### Time Risks
- **Scope creep**: Focus on MVP first
- **Integration issues**: Test components individually
- **Deployment problems**: Use simple hosting (Hugging Face)

## 📝 Submission Checklist

- [ ] Working GitHub repository
- [ ] Clear README with setup instructions
- [ ] 3-minute demo video
- [ ] 500-word write-up
- [ ] Deployed API on Hugging Face
- [ ] Submit via Google Form

## 💡 Optimization Tips

1. **Start with mock data** to test scoring algorithm
2. **Use Cursor AI** for boilerplate code generation
3. **Focus on pipeline architecture** over perfect accuracy
4. **Comment code thoroughly** to show thinking process
5. **Make it easily runnable** for judges

## 🎯 Final Notes

- **Priority**: Working pipeline > perfect accuracy
- **Focus**: Architecture and approach over data quality
- **Goal**: Demonstrate ability to build production-ready systems
- **Time**: 2-3 hours maximum, keep it simple but functional

This plan is a roadmap for building a functional LinkedIn Sourcing Agent within the time constraints: it covers all core requirements and leaves room for the bonus features.
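
The weighted scoring function listed in Phase 3 could be sketched as follows. This is a minimal illustration, not a fixed design: it assumes every category scorer returns a value on the same 1-10 scale, and the names `WEIGHTS` and `weighted_fit_score` and the category keys are hypothetical. Only the percentages come from the rubric in this plan.

```python
# Rubric weights from the plan, expressed as fractions that sum to 1.0.
# The dictionary keys are illustrative names, not part of the original plan.
WEIGHTS = {
    "education": 0.20,
    "career_trajectory": 0.20,
    "company_relevance": 0.15,
    "experience_match": 0.25,
    "location_match": 0.10,
    "tenure": 0.10,
}


def weighted_fit_score(category_scores):
    """Combine per-category scores (each assumed to be 1-10) into one fit score."""
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        # Fail loudly instead of silently scoring an incomplete profile.
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return round(total, 2)


if __name__ == "__main__":
    example = {
        "education": 9.5,
        "career_trajectory": 7.0,
        "company_relevance": 8.0,
        "experience_match": 8.0,
        "location_match": 10.0,
        "tenure": 6.0,
    }
    print(weighted_fit_score(example))
```

Raising on a missing category is one possible choice; for incomplete profiles, substituting a neutral default score per category would fit the plan's "graceful handling" goal equally well.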