# LinkedIn Sourcing Agent - Development Plan

## Project Overview

Build an autonomous AI agent that sources LinkedIn profiles, scores candidates using a fit score algorithm, and generates personalized outreach messages.

**Deadline**: Monday 7 PM PST
**Time Budget**: 2-3 hours
**Tech Stack**: Python + FastAPI + Gemini + SQLite

## Core Requirements Analysis

### 1. **LinkedIn Profile Discovery**

- Input: Job description
- Output: Array of candidate profiles with basic data
- Methods: Google Search API, RapidAPI, or direct parsing

### 2. **Candidate Scoring System**

- Implement 6-category fit score rubric (100% total)
- Education (20%), Career Trajectory (20%), Company Relevance (15%)
- Experience Match (25%), Location Match (10%), Tenure (10%)

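
The rubric weights above can be combined into a single fit score as a weighted sum. The following is a minimal sketch, assuming each category is scored on the same 1-10 scale used by the scoring functions later in this plan; the names `WEIGHTS` and `weighted_fit_score` are illustrative and not part of the original plan.

```python
# Rubric weights from the plan; they sum to 1.0 (100%).
WEIGHTS = {
    "education": 0.20,
    "career_trajectory": 0.20,
    "company_relevance": 0.15,
    "experience_match": 0.25,
    "location_match": 0.10,
    "tenure": 0.10,
}


def weighted_fit_score(breakdown):
    """Combine per-category scores (assumed 1-10 each) into one fit score."""
    missing = set(WEIGHTS) - set(breakdown)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * breakdown[name] for name in WEIGHTS), 2)


example = {
    "education": 9.5,
    "career_trajectory": 8.0,
    "company_relevance": 7.0,
    "experience_match": 9.0,
    "location_match": 10.0,
    "tenure": 6.0,
}
fit = weighted_fit_score(example)
```

Raising on a missing category is one possible choice; it keeps an incomplete breakdown from silently producing a lower score.
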
### 3. **Personalized Outreach Generation**

- AI-generated messages referencing candidate details
- Professional tone, job-specific customization

### 4. **Scalability Features**

- Multiple job processing
- Rate limiting management
- Minimal data storage

## Architecture Design

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Job Input     │───▶│  LinkedIn       │───▶│  Profile        │
│   (FastAPI)     │    │  Search Engine  │    │  Parser         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Outreach       │◀───│  Fit Score      │◀───│  Candidate      │
│  Generator      │    │  Algorithm      │    │  Data Store     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

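
The data flow in the diagram can be sketched as one function that calls each stage in order. This is only an illustration: `run_pipeline` and its parameter names are hypothetical, and each stage is passed in as a plain callable so the wiring can be tested with stubs before the real services exist.

```python
def run_pipeline(job_description, search, parse, store, score, write_outreach,
                 max_candidates=10):
    """Run the stages in diagram order and return candidates, best fit first.

    search(job_description)          -> list of profile URLs
    parse(url)                       -> dict with profile data
    store(profile)                   -> persists the parsed profile
    score(profile, job_description)  -> numeric fit score
    write_outreach(profile, job_description) -> message string
    """
    candidates = []
    for url in search(job_description)[:max_candidates]:
        profile = parse(url)
        store(profile)
        profile["fit_score"] = score(profile, job_description)
        profile["outreach_message"] = write_outreach(profile, job_description)
        candidates.append(profile)
    return sorted(candidates, key=lambda c: c["fit_score"], reverse=True)
```

Injecting the stages also makes it easy to start with mock data, as the optimization tips in this plan suggest.
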
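## Development Timeline (2-3 hours)

The phase estimates below add up to 3.5 hours, which exceeds the 2-3 hour budget, so some phases will need to be shortened.
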
### Phase 1: Foundation (30 minutes)

- [ ] Set up project structure
- [ ] Install dependencies (FastAPI, google-generativeai, requests; SQLite ships with Python)
- [ ] Create basic FastAPI endpoints
- [ ] Set up environment variables for API keys

### Phase 2: LinkedIn Search (45 minutes)

- [ ] Implement Google Search API integration
- [ ] Create LinkedIn profile URL extraction
- [ ] Build basic profile data parser
- [ ] Add rate limiting and error handling

### Phase 3: Fit Scoring Algorithm (45 minutes)

- [ ] Implement education scoring (20%)
- [ ] Implement career trajectory scoring (20%)
- [ ] Implement company relevance scoring (15%)
- [ ] Implement experience match scoring (25%)
- [ ] Implement location match scoring (10%)
- [ ] Implement tenure scoring (10%)
- [ ] Create weighted scoring function

### Phase 4: Outreach Generation (30 minutes)

- [ ] Design prompt templates for LLM
- [ ] Implement personalized message generation
- [ ] Add candidate-specific references
- [ ] Ensure professional tone

### Phase 5: Integration & Testing (30 minutes)

- [ ] Connect all components
- [ ] Test end-to-end pipeline
- [ ] Optimize performance
- [ ] Add error handling

### Phase 6: Deployment & Documentation (30 minutes)

- [ ] Deploy to Hugging Face Spaces
- [ ] Create README with setup instructions
- [ ] Record demo video
- [ ] Write submission summary

## Technical Implementation Details

### Project Structure

```
linkedin-agent/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app
│   ├── models.py            # Pydantic models
│   ├── services/
│   │   ├── linkedin_search.py
│   │   ├── scoring.py
│   │   ├── outreach.py
│   │   └── database.py
│   └── utils/
│       ├── config.py
│       └── helpers.py
├── requirements.txt
├── README.md
└── .env
```

### Key Dependencies

```
fastapi==0.104.1
uvicorn==0.24.0
google-generativeai==0.3.0
requests==2.31.0
python-dotenv==1.0.0
# sqlite3 is part of the Python standard library (no install needed)
```

### API Endpoints

```
POST /api/source-candidates
{
  "job_description": "string",
  "location": "string (optional)",
  "max_candidates": "integer (default: 10)"
}

Response:
{
  "job_id": "string",
  "candidates_found": "integer",
  "top_candidates": [
    {
      "name": "string",
      "linkedin_url": "string",
      "fit_score": "float",
      "score_breakdown": "object",
      "outreach_message": "string"
    }
  ]
}
```

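
The request and response shapes above could be mirrored in typed models. The plan places Pydantic models in `app/models.py`; the sketch below uses standard-library dataclasses as a stand-in so it runs without extra packages. The class names and the validation rules are assumptions for illustration only.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SourceCandidatesRequest:
    job_description: str
    location: Optional[str] = None
    max_candidates: int = 10  # default taken from the endpoint description

    def __post_init__(self):
        # Assumed validation rules; the plan does not specify any.
        if not self.job_description.strip():
            raise ValueError("job_description must not be empty")
        if self.max_candidates < 1:
            raise ValueError("max_candidates must be at least 1")


@dataclass
class Candidate:
    name: str
    linkedin_url: str
    fit_score: float
    score_breakdown: dict = field(default_factory=dict)
    outreach_message: str = ""


@dataclass
class SourceCandidatesResponse:
    job_id: str
    candidates_found: int
    top_candidates: list = field(default_factory=list)


response = SourceCandidatesResponse(
    job_id="job-1",
    candidates_found=1,
    top_candidates=[Candidate("Ada", "https://www.linkedin.com/in/ada", 8.4)],
)
body = asdict(response)  # nested dataclasses become plain dicts for JSON output
```
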
## Fit Scoring Implementation

### Education Scoring (20%)

```python
def score_education(education_data):
    """Return an education score on a 1-10 scale based on school tier."""
    elite_schools = ["MIT", "Stanford", "Harvard", "Berkeley", "CMU"]
    strong_schools = ["UCLA", "USC", "Georgia Tech", "UIUC"]
    if any(school in education_data for school in elite_schools):
        return 9.5
    elif any(school in education_data for school in strong_schools):
        return 7.5
    else:
        return 5.5
```

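### Experience Match Scoring (25%)

```python
import google.generativeai as genai

def score_experience(candidate_skills, job_requirements):
    # Use Gemini to compare skills and requirements ("gemini-pro" is an assumed model name)
    prompt = f"Rate match between skills: {candidate_skills} and requirements: {job_requirements}. Reply with only a number from 1 to 10."
    response = genai.GenerativeModel("gemini-pro").generate_content(prompt)
    # Return score 1-10
    return float(response.text.strip())
```
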
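
An LLM reply is not guaranteed to be a bare number, so the "return score 1-10" step benefits from defensive parsing. The sketch below is one possible approach, not part of the original plan: `parse_score` and `score_experience_with` are hypothetical names, and `llm` stands for any callable that takes a prompt string and returns the model's reply as a string.

```python
import re


def parse_score(reply, low=1.0, high=10.0):
    """Pull the first number out of an LLM reply and clamp it to [low, high]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return max(low, min(high, float(match.group())))


def score_experience_with(llm, candidate_skills, job_requirements):
    """Score experience match using an injected `llm(prompt) -> str` callable."""
    prompt = (
        f"Rate match between skills: {candidate_skills} "
        f"and requirements: {job_requirements}. "
        "Answer with a single number from 1 to 10."
    )
    return parse_score(llm(prompt))
```

Passing the model in as a callable lets the scoring logic be tested with a stub such as `lambda prompt: "8"` before any API key is configured.
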
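## LinkedIn Search Strategy

### Primary Method: Google Search API

```python
import os
import requests

def search_linkedin_profiles(job_description, location):
    query = f'site:linkedin.com/in "{job_description}" "{location}"'
    # Use Google Custom Search API (JSON endpoint; key and engine ID come from the environment)
    params = {"key": os.environ["GOOGLE_SEARCH_API_KEY"], "cx": os.environ["GOOGLE_SEARCH_ENGINE_ID"], "q": query}
    items = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=10).json().get("items", [])
    # Extract LinkedIn URLs from results; keep title and snippet as basic profile data
    return [{"linkedin_url": i["link"], "title": i.get("title"), "snippet": i.get("snippet")} for i in items]
```
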
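
Search results can include company pages, duplicates, and country subdomains, so the URL extraction step is worth isolating. The following is a rough sketch under two assumptions: each result item is a dict with a `link` field (as in Custom Search JSON API responses), and all profile URLs can be normalised to the `www.linkedin.com` host. The function name `extract_profile_urls` is hypothetical.

```python
from urllib.parse import urlparse


def extract_profile_urls(items):
    """Return unique linkedin.com/in/<slug> URLs from search result items."""
    seen, urls = set(), []
    for item in items:
        parsed = urlparse(item.get("link", ""))
        host = parsed.netloc.lower()
        if not (host == "linkedin.com" or host.endswith(".linkedin.com")):
            continue  # not a LinkedIn page
        if not parsed.path.startswith("/in/"):
            continue  # company pages, posts, job listings, ...
        # Keep only the profile slug; drop query strings and extra path segments
        slug = parsed.path[len("/in/"):].strip("/").split("/")[0]
        if not slug:
            continue
        clean = f"https://www.linkedin.com/in/{slug}"
        if clean not in seen:
            seen.add(clean)
            urls.append(clean)
    return urls
```
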
### Fallback: Direct Parsing

- Use requests + BeautifulSoup for basic profile extraction
- Focus on public information only
- Implement respectful rate limiting

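
Respectful rate limiting can be as simple as enforcing a minimum gap between consecutive requests. The class below is a minimal sketch of that idea; `MinIntervalLimiter` is a hypothetical name, and the clock and sleep functions are injectable only so the behaviour can be tested without real delays.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

A caller would create one limiter per remote host, for example `limiter = MinIntervalLimiter(2.0)`, and call `limiter.wait()` immediately before each request.
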
## LLM Integration

### Gemini for Scoring & Outreach

```python
import google.generativeai as genai

def generate_outreach_message(candidate, job_description):
    prompt = f"""
    Generate a personalized LinkedIn outreach message for {candidate['name']}
    based on their profile: {candidate['profile_data']}
    For this job: {job_description}
    Requirements:
    - Professional tone
    - Reference specific details from their profile
    - Explain why they're a good fit
    - Keep under 200 words
    """
    # Assumes genai.configure(api_key=...) ran at startup; "gemini-pro" is an assumed model name
    response = genai.GenerativeModel("gemini-pro").generate_content(prompt)
    return response.text.strip()
```

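
Because a model may ignore parts of the prompt, it can help to check a generated message against the stated requirements before returning it. The sketch below covers only two of them (the 200-word limit and a reference to the candidate); the function name `check_outreach` and the first-name heuristic are assumptions, not part of the original plan.

```python
def check_outreach(message, candidate_name, max_words=200):
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    word_count = len(message.split())
    if word_count >= max_words:
        problems.append(f"message has {word_count} words, must be under {max_words}")
    # Heuristic: a personalized message should at least use the first name
    first_name = candidate_name.split()[0] if candidate_name.strip() else ""
    if first_name.lower() not in message.lower():
        problems.append("message does not mention the candidate by name")
    return problems
```

A caller could regenerate the message when the list is non-empty, or surface the problems alongside the candidate in the API response.
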
## Data Storage

### SQLite Schema

```sql
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP
);
```

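
The schema above can be exercised with the standard-library `sqlite3` module. The sketch below is illustrative: the helper names `save_candidate` and `top_candidates` are hypothetical, storing `profile_data` and `score_breakdown` as JSON text is an assumption, and `DEFAULT CURRENT_TIMESTAMP` is added to `created_at` for convenience even though the original schema does not include it.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def save_candidate(conn, job_id, candidate):
    """Insert one candidate dict; nested data is stored as JSON text."""
    conn.execute(
        "INSERT INTO candidates (job_id, name, linkedin_url, profile_data, "
        "fit_score, score_breakdown, outreach_message) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            job_id,
            candidate["name"],
            candidate["linkedin_url"],
            json.dumps(candidate.get("profile_data", {})),
            candidate["fit_score"],
            json.dumps(candidate.get("score_breakdown", {})),
            candidate.get("outreach_message", ""),
        ),
    )
    conn.commit()


def top_candidates(conn, job_id, limit=10):
    """Return the highest-scoring candidates for one job, best first."""
    rows = conn.execute(
        "SELECT name, linkedin_url, fit_score, score_breakdown FROM candidates "
        "WHERE job_id = ? ORDER BY fit_score DESC LIMIT ?",
        (job_id, limit),
    ).fetchall()
    return [
        {"name": n, "linkedin_url": u, "fit_score": s, "score_breakdown": json.loads(b)}
        for n, u, s, b in rows
    ]
```

For quick experiments, `sqlite3.connect(":memory:")` followed by `conn.executescript(SCHEMA)` gives a throwaway database with this table.
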
## Deployment Strategy

### Hugging Face Spaces

- Use Gradio for simple UI
- FastAPI backend
- Free tier hosting
- Easy sharing and demo

### Environment Variables

```bash
# Gemini (google-generativeai)
GOOGLE_API_KEY=your_key_here
# Google Custom Search
GOOGLE_SEARCH_API_KEY=your_key_here
GOOGLE_SEARCH_ENGINE_ID=your_id_here
```

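
Failing fast on a missing key is easier to debug than a failed API call halfway through a run. The sketch below shows one way `app/utils/config.py` might check the three variables above; the names `REQUIRED_KEYS` and `load_config` are hypothetical, and the `env` parameter exists only so the check can be tested with a plain dict.

```python
import os

REQUIRED_KEYS = ("GOOGLE_API_KEY", "GOOGLE_SEARCH_API_KEY", "GOOGLE_SEARCH_ENGINE_ID")


def load_config(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

If the keys live in a `.env` file, calling `load_dotenv()` from `python-dotenv` before `load_config()` would populate `os.environ` first.
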
## Success Metrics

### MVP Requirements

- [ ] Find 10+ candidates for given job
- [ ] Score candidates with breakdown
- [ ] Generate personalized outreach
- [ ] Handle basic rate limiting
- [ ] Deploy working API

### Bonus Features (if time permits)

- [ ] Multi-source data (GitHub, Twitter)
- [ ] Smart caching
- [ ] Batch processing
- [ ] Confidence scoring

## Risk Mitigation

### Technical Risks

- **LinkedIn rate limiting**: Implement delays and user agents
- **API costs**: Use free tiers, implement caching
- **Data quality**: Graceful handling of incomplete profiles

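
The caching mitigation for API costs can start as an in-memory lookup keyed by the request input. The wrapper below is a minimal sketch with a hypothetical name, `cached`; it assumes keys are hashable (for example a query string or profile URL) and does not expire entries.

```python
def cached(fetch, cache=None):
    """Wrap `fetch(key)` so repeated keys are served from a dict, not refetched."""
    cache = {} if cache is None else cache

    def wrapper(key):
        if key not in cache:
            cache[key] = fetch(key)  # only the first request for a key costs an API call
        return cache[key]

    return wrapper
```

Wrapping the search or profile-fetch function this way means that rerunning the same job description during testing does not consume additional free-tier quota.
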
### Time Risks

- **Scope creep**: Focus on MVP first
- **Integration issues**: Test components individually
- **Deployment problems**: Use simple hosting (Hugging Face)

## Submission Checklist

- [ ] Working GitHub repository
- [ ] Clear README with setup instructions
- [ ] 3-minute demo video
- [ ] 500-word write-up
- [ ] Deployed API on Hugging Face
- [ ] Submit via Google Form

## Optimization Tips

1. **Start with mock data** to test scoring algorithm
2. **Use Cursor AI** for boilerplate code generation
3. **Focus on pipeline architecture** over perfect accuracy
4. **Comment code thoroughly** to show thinking process
5. **Make it easily runnable** for judges

## Final Notes

- **Priority**: Working pipeline > perfect accuracy
- **Focus**: Architecture and approach over data quality
- **Goal**: Demonstrate ability to build production-ready systems
- **Time**: 2-3 hours maximum, keep it simple but functional

This plan provides a clear roadmap to build a functional LinkedIn Sourcing Agent within the time constraints while meeting all core requirements and positioning for the bonus features.