# LinkedIn Sourcing Agent - Development Plan
## 🎯 Project Overview
Build an autonomous AI agent that sources LinkedIn profiles, scores candidates using a fit score algorithm, and generates personalized outreach messages.
**Deadline**: Monday 7 PM PST
**Time Budget**: 2-3 hours
**Tech Stack**: Python + FastAPI + Gemini + SQLite
## 📋 Core Requirements Analysis
### 1. **LinkedIn Profile Discovery**
- Input: Job description
- Output: Array of candidate profiles with basic data
- Methods: Google Search API, RapidAPI, or direct parsing
### 2. **Candidate Scoring System**
- Implement 6-category fit score rubric (100% total)
- Education (20%), Career Trajectory (20%), Company Relevance (15%)
- Experience Match (25%), Location Match (10%), Tenure (10%)
### 3. **Personalized Outreach Generation**
- AI-generated messages referencing candidate details
- Professional tone, job-specific customization
### 4. **Scalability Features**
- Multiple job processing
- Rate limiting management
- Minimal data storage
## 🏗️ Architecture Design
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Job Input     │───▶│   LinkedIn      │───▶│   Profile       │
│   (FastAPI)     │    │   Search Engine │    │   Parser        │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Outreach      │◀───│   Fit Score     │◀───│   Candidate     │
│   Generator     │    │   Algorithm     │    │   Data Store    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
## 📅 Development Timeline (2-3 hours)
### Phase 1: Foundation (30 minutes)
- [ ] Set up project structure
- [ ] Install dependencies (FastAPI, uvicorn, google-generativeai, requests, python-dotenv); SQLite needs no install because `sqlite3` ships with Python
- [ ] Create basic FastAPI endpoints
- [ ] Set up environment variables for API keys
### Phase 2: LinkedIn Search (45 minutes)
- [ ] Implement Google Search API integration
- [ ] Create LinkedIn profile URL extraction
- [ ] Build basic profile data parser
- [ ] Add rate limiting and error handling
### Phase 3: Fit Scoring Algorithm (45 minutes)
- [ ] Implement education scoring (20%)
- [ ] Implement career trajectory scoring (20%)
- [ ] Implement company relevance scoring (15%)
- [ ] Implement experience match scoring (25%)
- [ ] Implement location match scoring (10%)
- [ ] Implement tenure scoring (10%)
- [ ] Create weighted scoring function
### Phase 4: Outreach Generation (30 minutes)
- [ ] Design prompt templates for LLM
- [ ] Implement personalized message generation
- [ ] Add candidate-specific references
- [ ] Ensure professional tone
### Phase 5: Integration & Testing (30 minutes)
- [ ] Connect all components
- [ ] Test end-to-end pipeline
- [ ] Optimize performance
- [ ] Add error handling
### Phase 6: Deployment & Documentation (30 minutes)
- [ ] Deploy to Hugging Face Spaces
- [ ] Create README with setup instructions
- [ ] Record demo video
- [ ] Write submission summary
## 🛠️ Technical Implementation Details
### Project Structure
```
linkedin-agent/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app
│   ├── models.py            # Pydantic models
│   ├── services/
│   │   ├── linkedin_search.py
│   │   ├── scoring.py
│   │   ├── outreach.py
│   │   └── database.py
│   └── utils/
│       ├── config.py
│       └── helpers.py
├── requirements.txt
├── README.md
└── .env
```
### Key Dependencies
```python
fastapi==0.104.1
uvicorn==0.24.0
google-generativeai==0.3.0
requests==2.31.0
python-dotenv==1.0.0
# sqlite3 is part of the Python standard library (no install needed)
```
### API Endpoints
```python
POST /api/source-candidates
{
    "job_description": "string",
    "location": "string (optional)",
    "max_candidates": "integer (default: 10)"
}

Response:
{
    "job_id": "string",
    "candidates_found": "integer",
    "top_candidates": [
        {
            "name": "string",
            "linkedin_url": "string",
            "fit_score": "float",
            "score_breakdown": "object",
            "outreach_message": "string"
        }
    ]
}
```
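As a rough illustration of how the response above could be assembled, the sketch below ranks already-scored candidates and trims them to `max_candidates`. The helper name `build_response` and the UUID-based `job_id` are assumptions made for this example, not part of the plan.

```python
import uuid


def build_response(scored_candidates, max_candidates=10):
    """Rank candidates by fit_score and shape them like the response schema."""
    ranked = sorted(scored_candidates, key=lambda c: c["fit_score"], reverse=True)
    return {
        "job_id": uuid.uuid4().hex,  # hypothetical ID scheme
        "candidates_found": len(scored_candidates),
        "top_candidates": ranked[:max_candidates],
    }
```

`candidates_found` counts everything that was scored, while `top_candidates` holds only the highest-ranked slice.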
## 🎯 Fit Scoring Implementation
### Education Scoring (20%)
```python
def score_education(education_data):
    # education_data: free-text education summary (or a list of school names)
    elite_schools = ["MIT", "Stanford", "Harvard", "Berkeley", "CMU"]
    strong_schools = ["UCLA", "USC", "Georgia Tech", "UIUC"]
    if any(school in education_data for school in elite_schools):
        return 9.5
    elif any(school in education_data for school in strong_schools):
        return 7.5
    else:
        return 5.5
```
### Experience Match Scoring (25%)
```python
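import google.generativeai as genai

def score_experience(candidate_skills, job_requirements):
    # Use Gemini to compare skills and requirements.
    # Assumptions: genai.configure(api_key=...) ran at startup; "gemini-pro" is a placeholder model name.
    prompt = f"Rate match between skills: {candidate_skills} and requirements: {job_requirements}. Reply with one number from 1 to 10."
    reply = genai.GenerativeModel("gemini-pro").generate_content(prompt).text
    return float(reply.strip())  # Return score 1-10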
```
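The six category scores still need to be combined with the rubric weights (20/20/15/25/10/10). A minimal sketch of that weighted function follows. The category key names and the neutral default of 5.0 for a missing category are assumptions chosen for illustration.

```python
# Rubric weights from the plan; key names are hypothetical.
WEIGHTS = {
    "education": 0.20,
    "trajectory": 0.20,
    "company": 0.15,
    "experience": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}


def weighted_fit_score(breakdown, default=5.0):
    """Combine per-category scores (1-10) into one fit score on the same scale.

    Categories missing from `breakdown` fall back to `default` (an assumption),
    so an incomplete profile still gets a score instead of raising an error.
    """
    total = sum(weight * breakdown.get(category, default)
                for category, weight in WEIGHTS.items())
    return round(total, 2)
```

Because the weights sum to 1.0, the result stays on the same 1-10 scale as the individual category scores.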
## 🔍 LinkedIn Search Strategy
### Primary Method: Google Search API
```python
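import os
import requests

def search_linkedin_profiles(job_description, location):
    query = f'site:linkedin.com/in "{job_description}" "{location}"'
    # Use Google Custom Search API (keys read from the environment variables listed in this plan)
    params = {"key": os.environ["GOOGLE_SEARCH_API_KEY"], "cx": os.environ["GOOGLE_SEARCH_ENGINE_ID"], "q": query}
    items = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=10).json().get("items", [])
    # Extract LinkedIn URLs from results; title and snippet serve as the basic profile data
    return [{"linkedin_url": i["link"], "title": i.get("title"), "snippet": i.get("snippet")} for i in items]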
```
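Search results can contain company pages, regional subdomains, and duplicate links to the same person. The sketch below shows one way to keep only profile URLs and normalize them. The function name `extract_profile_urls` is hypothetical, and lowercasing the slug assumes LinkedIn treats profile slugs case-insensitively.

```python
from urllib.parse import urlparse


def extract_profile_urls(links):
    """Keep only linkedin.com/in/<slug> links, normalized and de-duplicated."""
    seen, profiles = set(), []
    for link in links:
        parsed = urlparse(link)
        host = parsed.netloc.lower()
        parts = [p for p in parsed.path.split("/") if p]
        is_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
        if not is_linkedin or len(parts) < 2 or parts[0] != "in":
            continue
        # Drop query strings and regional subdomains so duplicates collapse.
        url = f"https://www.linkedin.com/in/{parts[1].lower()}"
        if url not in seen:
            seen.add(url)
            profiles.append(url)
    return profiles
```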
### Fallback: Direct Parsing
- Use requests + BeautifulSoup for basic profile extraction
- Focus on public information only
- Implement respectful rate limiting
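One simple reading of "respectful rate limiting" is a fixed minimum delay between outgoing requests. The sketch below is an assumption about how that could look, not a prescribed design. The `clock` and `sleep` parameters exist only so the class can be tested without real waiting.

```python
import time


class RateLimiter:
    """Enforce a minimum interval (in seconds) between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each HTTP request keeps the request rate at or below one per `min_interval` seconds.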
## 🤖 LLM Integration
### Gemini for Scoring & Outreach
```python
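import google.generativeai as genai

def generate_outreach_message(candidate, job_description):
    prompt = f"""
    Generate a personalized LinkedIn outreach message for {candidate['name']}
    based on their profile: {candidate['profile_data']}
    For this job: {job_description}
    Requirements:
    - Professional tone
    - Reference specific details from their profile
    - Explain why they're a good fit
    - Keep under 200 words
    """
    # Assumptions: genai.configure(api_key=...) ran at startup; "gemini-pro" is a placeholder model name
    return genai.GenerativeModel("gemini-pro").generate_content(prompt).text.strip()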
```
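LLM output does not always respect the prompt's constraints, so a cheap post-check can catch obvious misses before a message is stored. The sketch below checks only the word limit and whether the candidate's first name appears. The function name `check_outreach` and both checks are illustrative assumptions.

```python
def check_outreach(message, candidate_name, max_words=200):
    """Return a list of problems found in a generated message (empty means OK)."""
    problems = []
    word_count = len(message.split())
    if word_count >= max_words:
        problems.append(f"message has {word_count} words, must be under {max_words}")
    name_parts = candidate_name.split()
    first_name = name_parts[0] if name_parts else ""
    if first_name and first_name.lower() not in message.lower():
        problems.append("message never mentions the candidate by name")
    return problems
```

A non-empty result could trigger one regeneration attempt or flag the message for manual review.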
## 📊 Data Storage
### SQLite Schema
```sql
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP
);
```
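A minimal sketch of writing to and reading from this table with the standard-library `sqlite3` module is shown below. It assumes `profile_data` and `score_breakdown` are stored as JSON text and adds `DEFAULT CURRENT_TIMESTAMP` to `created_at`. Both choices, and the helper names, are assumptions for illustration.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def save_candidate(conn, job_id, candidate):
    """Insert one scored candidate; dict fields are serialized to JSON text."""
    conn.execute(
        "INSERT INTO candidates (job_id, name, linkedin_url, profile_data,"
        " fit_score, score_breakdown, outreach_message)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            job_id,
            candidate["name"],
            candidate["linkedin_url"],
            json.dumps(candidate.get("profile_data", {})),
            candidate["fit_score"],
            json.dumps(candidate.get("score_breakdown", {})),
            candidate.get("outreach_message"),
        ),
    )
    conn.commit()


def top_candidates(conn, job_id, limit=10):
    """Return (name, fit_score, score_breakdown) tuples, best score first."""
    rows = conn.execute(
        "SELECT name, fit_score, score_breakdown FROM candidates"
        " WHERE job_id = ? ORDER BY fit_score DESC LIMIT ?",
        (job_id, limit),
    ).fetchall()
    return [(name, score, json.loads(breakdown)) for name, score, breakdown in rows]
```

Parameterized queries (`?` placeholders) keep scraped profile text from being interpreted as SQL.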
## 🚀 Deployment Strategy
### Hugging Face Spaces
- Use Gradio for simple UI
- FastAPI backend
- Free tier hosting
- Easy sharing and demo
### Environment Variables
```bash
# Gemini (google-generativeai) API key
GOOGLE_API_KEY=your_key_here
# Google Custom Search API key and search engine ID
GOOGLE_SEARCH_API_KEY=your_key_here
GOOGLE_SEARCH_ENGINE_ID=your_id_here
```
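Missing keys are easier to debug when they fail at startup instead of mid-pipeline. The sketch below shows one possible shape for `config.py`, using only `os.environ`. The function name `load_config` is hypothetical, and loading the `.env` file itself (for example with python-dotenv) is assumed to happen before it is called.

```python
import os

REQUIRED_KEYS = ("GOOGLE_API_KEY", "GOOGLE_SEARCH_API_KEY", "GOOGLE_SEARCH_ENGINE_ID")


def load_config(env=None):
    """Read the required settings, failing early with one clear error."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```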
## 🎯 Success Metrics
### MVP Requirements
- [ ] Find 10+ candidates for given job
- [ ] Score candidates with breakdown
- [ ] Generate personalized outreach
- [ ] Handle basic rate limiting
- [ ] Deploy working API
### Bonus Features (if time permits)
- [ ] Multi-source data (GitHub, Twitter)
- [ ] Smart caching
- [ ] Batch processing
- [ ] Confidence scoring
## ⚠️ Risk Mitigation
### Technical Risks
- **LinkedIn rate limiting**: Implement delays and user agents
- **API costs**: Use free tiers, implement caching
- **Data quality**: Graceful handling of incomplete profiles
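The caching mentioned under API costs can be as small as an in-memory map with an expiry time, keyed by search query. The sketch below is one such assumption-laden version. `TTLCache` and `get_or_compute` are hypothetical names, and the cache is lost on restart.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() on a miss or expiry."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute()
        self._store[key] = (now, value)
        return value
```

Wrapping the search call as `cache.get_or_compute(query, lambda: search(query))` means repeated queries within the TTL do not consume API quota.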
### Time Risks
- **Scope creep**: Focus on MVP first
- **Integration issues**: Test components individually
- **Deployment problems**: Use simple hosting (Hugging Face)
## 📝 Submission Checklist
- [ ] Working GitHub repository
- [ ] Clear README with setup instructions
- [ ] 3-minute demo video
- [ ] 500-word write-up
- [ ] Deployed API on Hugging Face
- [ ] Submit via Google Form
## 💡 Optimization Tips
1. **Start with mock data** to test scoring algorithm
2. **Use Cursor AI** for boilerplate code generation
3. **Focus on pipeline architecture** over perfect accuracy
4. **Comment code thoroughly** to show thinking process
5. **Make it easily runnable** for judges
## 🎯 Final Notes
- **Priority**: Working pipeline > perfect accuracy
- **Focus**: Architecture and approach over data quality
- **Goal**: Demonstrate ability to build production-ready systems
- **Time**: 2-3 hours maximum, keep it simple but functional
This plan provides a clear roadmap for building a functional LinkedIn Sourcing Agent within the time constraints, meeting all core requirements and leaving room for the bonus features.