# LinkedIn Sourcing Agent - Development Plan

## 🎯 Project Overview
Build an autonomous AI agent that sources LinkedIn profiles, scores each candidate against a weighted fit-score rubric, and generates personalized outreach messages.

- **Deadline**: Monday 7 PM PST
- **Time Budget**: 2-3 hours
- **Tech Stack**: Python + FastAPI + Gemini + SQLite

## 📋 Core Requirements Analysis

### 1. **LinkedIn Profile Discovery**
- Input: Job description
- Output: Array of candidate profiles with basic data
- Methods: Google Search API, RapidAPI, or direct parsing

### 2. **Candidate Scoring System**
- Implement 6-category fit score rubric (100% total)
- Education (20%), Career Trajectory (20%), Company Relevance (15%)
- Experience Match (25%), Location Match (10%), Tenure (10%)

### 3. **Personalized Outreach Generation**
- AI-generated messages referencing candidate details
- Professional tone, job-specific customization

### 4. **Scalability Features**
- Multiple job processing
- Rate limiting management
- Minimal data storage

## 🏗️ Architecture Design

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Job Input     │───▶│  LinkedIn       │───▶│  Profile        │
│   (FastAPI)     │    │  Search Engine  │    │  Parser         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Outreach      │◀───│  Fit Score      │◀───│  Candidate      │
│   Generator     │    │  Algorithm      │    │  Data Store     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

## 📅 Development Timeline (2-3 hour target; phase estimates total 3.5 hours)

### Phase 1: Foundation (30 minutes)
- [ ] Set up project structure
- [ ] Install dependencies (FastAPI, uvicorn, google-generativeai, requests, python-dotenv); SQLite needs no install because `sqlite3` ships with Python
- [ ] Create basic FastAPI endpoints
- [ ] Set up environment variables for API keys

### Phase 2: LinkedIn Search (45 minutes)
- [ ] Implement Google Search API integration
- [ ] Create LinkedIn profile URL extraction
- [ ] Build basic profile data parser
- [ ] Add rate limiting and error handling

### Phase 3: Fit Scoring Algorithm (45 minutes)
- [ ] Implement education scoring (20%)
- [ ] Implement career trajectory scoring (20%)
- [ ] Implement company relevance scoring (15%)
- [ ] Implement experience match scoring (25%)
- [ ] Implement location match scoring (10%)
- [ ] Implement tenure scoring (10%)
- [ ] Create weighted scoring function

### Phase 4: Outreach Generation (30 minutes)
- [ ] Design prompt templates for LLM
- [ ] Implement personalized message generation
- [ ] Add candidate-specific references
- [ ] Ensure professional tone

### Phase 5: Integration & Testing (30 minutes)
- [ ] Connect all components
- [ ] Test end-to-end pipeline
- [ ] Optimize performance
- [ ] Add error handling

### Phase 6: Deployment & Documentation (30 minutes)
- [ ] Deploy to Hugging Face Spaces
- [ ] Create README with setup instructions
- [ ] Record demo video
- [ ] Write submission summary

## 🛠️ Technical Implementation Details

### Project Structure
```
linkedin-agent/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app
│   ├── models.py            # Pydantic models
│   ├── services/
│   │   ├── linkedin_search.py
│   │   ├── scoring.py
│   │   ├── outreach.py
│   │   └── database.py
│   └── utils/
│       ├── config.py
│       └── helpers.py
├── requirements.txt
├── README.md
└── .env
```

### Key Dependencies
```text
# requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
google-generativeai==0.3.0
requests==2.31.0
python-dotenv==1.0.0
# sqlite3 is part of the Python standard library, so it is not listed here
```

### API Endpoints
```text
POST /api/source-candidates
{
  "job_description": "string",
  "location": "string (optional)",
  "max_candidates": "integer (default: 10)"
}

Response:
{
  "job_id": "string",
  "candidates_found": "integer",
  "top_candidates": [
    {
      "name": "string",
      "linkedin_url": "string",
      "fit_score": "float",
      "score_breakdown": "object",
      "outreach_message": "string"
    }
  ]
}
```
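
The response shape above can be assembled from already-scored candidates with a small helper. This is a minimal sketch: `build_response` is a hypothetical name, and it assumes each candidate dict already carries a `fit_score`.

```python
from typing import Any


def build_response(job_id: str, candidates: list[dict[str, Any]],
                   max_candidates: int = 10) -> dict[str, Any]:
    """Sort scored candidates by fit score and keep the top `max_candidates`."""
    ranked = sorted(candidates, key=lambda c: c["fit_score"], reverse=True)
    return {
        "job_id": job_id,
        "candidates_found": len(candidates),  # count before truncation
        "top_candidates": ranked[:max_candidates],
    }
```

A FastAPI handler for `POST /api/source-candidates` could return this dict directly, since FastAPI serializes plain dicts to JSON.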

## 🎯 Fit Scoring Implementation

### Education Scoring (20%)
```python
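def score_education(education_data):
    """Return a 1-10 education score from the candidate's education text."""
    elite_schools = ["MIT", "Stanford", "Harvard", "Berkeley", "CMU"]
    strong_schools = ["UCLA", "USC", "Georgia Tech", "UIUC"]

    # `in` is a substring test when education_data is a string and an exact
    # match when it is a list, so treat the result as a rough heuristic.
    if any(school in education_data for school in elite_schools):
        return 9.5
    elif any(school in education_data for school in strong_schools):
        return 7.5
    else:
        return 5.5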

### Experience Match Scoring (25%)
```python
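import re

def score_experience(candidate_skills, job_requirements, model):
    """Score the skills/requirements match from 1 to 10 using Gemini."""
    # `model` is assumed to be a configured genai.GenerativeModel
    prompt = (f"Rate match between skills: {candidate_skills} and requirements: "
              f"{job_requirements}. Reply with one number from 1 to 10.")
    found = re.search(r"\d+(?:\.\d+)?", model.generate_content(prompt).text)
    # Assumed fallback: neutral 5.0 when the reply has no number; clamp to 1-10
    return min(10.0, max(1.0, float(found.group()))) if found else 5.0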
```
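
The six category scores then need to be combined using the rubric weights (20/20/15/25/10/10). One possible sketch follows; the dictionary keys and the `weighted_fit_score` name are illustrative assumptions, and each category score is assumed to be on the same 1-10 scale used above.

```python
# Weights from the six-category rubric; they must sum to 1.0.
# The key names are hypothetical labels for the rubric categories.
WEIGHTS = {
    "education": 0.20,
    "trajectory": 0.20,
    "company": 0.15,
    "experience": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}


def weighted_fit_score(category_scores: dict) -> float:
    """Combine per-category scores (each 1-10) into one weighted fit score."""
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return round(total, 2)


scores = {"education": 9.5, "trajectory": 8.0, "company": 7.0,
          "experience": 9.0, "location": 10.0, "tenure": 6.0}
print(weighted_fit_score(scores))  # 8.4
```

Returning the per-category dict alongside the total gives the `score_breakdown` field for free.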

## 🔍 LinkedIn Search Strategy

### Primary Method: Google Search API
```python
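import os
import requests

def search_linkedin_profiles(job_title, location):
    # job_title: a short title or keyword phrase; quoting a full job description matches nothing
    query = f'site:linkedin.com/in "{job_title}" "{location}"'
    params = {"key": os.environ["GOOGLE_SEARCH_API_KEY"],
              "cx": os.environ["GOOGLE_SEARCH_ENGINE_ID"], "q": query, "num": 10}
    resp = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=10)
    resp.raise_for_status()
    # Keep profile URLs only; title and snippet hold the basic profile data
    return [{"linkedin_url": i["link"], "title": i.get("title", ""), "snippet": i.get("snippet", "")}
            for i in resp.json().get("items", []) if "linkedin.com/in/" in i["link"]]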
```
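
Search results often contain the same profile under different subdomains or with tracking parameters, so the URL extraction step benefits from normalization. The sketch below is one way to do it with the standard library; `extract_profile_urls` and the path pattern are assumptions, not a statement of every URL form LinkedIn uses.

```python
import re
from urllib.parse import urlsplit

# Matches /in/<slug> with an optional trailing slash (assumed profile path form)
PROFILE_PATH = re.compile(r"^/in/([A-Za-z0-9\-_%]+)/?$")


def extract_profile_urls(links):
    """Return unique, normalized linkedin.com/in/ profile URLs in input order."""
    seen, profiles = set(), []
    for link in links:
        parts = urlsplit(link)
        host = parts.netloc.lower()
        if not (host == "linkedin.com" or host.endswith(".linkedin.com")):
            continue  # not a LinkedIn host
        match = PROFILE_PATH.match(parts.path)
        if not match:
            continue  # company pages, posts, job listings, and so on
        # Drop query strings and regional subdomains so duplicates collapse
        url = f"https://www.linkedin.com/in/{match.group(1)}"
        if url not in seen:
            seen.add(url)
            profiles.append(url)
    return profiles
```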

### Fallback: Direct Parsing
- Use requests + BeautifulSoup for basic profile extraction
- Focus on public information only
- Implement respectful rate limiting
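
A minimal sketch of the rate limiting mentioned above: a throttle that enforces a fixed gap between consecutive requests. The `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behavior can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to `wait()`."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous call was allowed to proceed

    def wait(self) -> float:
        """Sleep if the last call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each outbound request keeps the request rate at or below one per `min_interval` seconds.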

## 🤖 LLM Integration

### Gemini for Scoring & Outreach
```python
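import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# Model name is an assumption; use whichever Gemini model your key can access
model = genai.GenerativeModel("gemini-pro")

def generate_outreach_message(candidate, job_description):
    prompt = f"""
    Generate a personalized LinkedIn outreach message for {candidate['name']} 
    based on their profile: {candidate['profile_data']}
    For this job: {job_description}
    
    Requirements:
    - Professional tone
    - Reference specific details from their profile
    - Explain why they're a good fit
    - Keep under 200 words
    """
    # Send the prompt to Gemini and return the generated text
    return model.generate_content(prompt).text.strip()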
```
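
Because the model may ignore prompt constraints, it can help to check each draft before storing it. The following is a rough sketch: `check_outreach_message` is a hypothetical helper, and it covers only the word limit and a mention of the candidate's first name.

```python
def check_outreach_message(message: str, candidate_name: str,
                           max_words: int = 200) -> list[str]:
    """Return a list of problems found; an empty list means the draft passes."""
    problems = []
    word_count = len(message.split())
    if word_count == 0:
        problems.append("message is empty")
    if word_count >= max_words:
        problems.append(f"message has {word_count} words, must be under {max_words}")
    name_parts = candidate_name.split()
    first_name = name_parts[0] if name_parts else ""
    # Case-insensitive check that the draft addresses the candidate by name
    if first_name and first_name.lower() not in message.lower():
        problems.append("message does not mention the candidate by name")
    return problems
```

A draft that fails the check could be regenerated once or flagged for manual review.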

## 📊 Data Storage

### SQLite Schema
```sql
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP
);
```
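
A possible way to write to and read from this table with the built-in `sqlite3` module is sketched below. The helper names are hypothetical, dict fields are stored as JSON text, and the `DEFAULT CURRENT_TIMESTAMP` on `created_at` is an added assumption rather than part of the schema above.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    name TEXT,
    linkedin_url TEXT,
    profile_data TEXT,
    fit_score REAL,
    score_breakdown TEXT,
    outreach_message TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- assumed default
);
"""


def save_candidate(conn, job_id, candidate):
    """Insert one scored candidate; dict fields are serialized to JSON text."""
    conn.execute(
        "INSERT INTO candidates (job_id, name, linkedin_url, profile_data,"
        " fit_score, score_breakdown, outreach_message)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (job_id, candidate["name"], candidate["linkedin_url"],
         json.dumps(candidate.get("profile_data", {})), candidate["fit_score"],
         json.dumps(candidate.get("score_breakdown", {})),
         candidate.get("outreach_message", "")),
    )
    conn.commit()


def top_candidates(conn, job_id, limit=10):
    """Return (name, fit_score) rows for one job, best score first."""
    cur = conn.execute(
        "SELECT name, fit_score FROM candidates WHERE job_id = ?"
        " ORDER BY fit_score DESC LIMIT ?", (job_id, limit))
    return cur.fetchall()
```

Parameterized queries (`?` placeholders) keep scraped profile text from being interpreted as SQL.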

## 🚀 Deployment Strategy

### Hugging Face Spaces
- Use Gradio for simple UI
- FastAPI backend
- Free tier hosting
- Easy sharing and demo

### Environment Variables
```bash
GOOGLE_API_KEY=your_key_here
GOOGLE_SEARCH_API_KEY=your_key_here
GOOGLE_SEARCH_ENGINE_ID=your_id_here
```
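
Failing fast on a missing key avoids confusing errors deep in the pipeline. Here is a small sketch of what `utils/config.py` might contain; `load_config` is a hypothetical name, and it assumes `GOOGLE_API_KEY` is the Gemini key while the other two belong to Google Custom Search.

```python
import os

REQUIRED_KEYS = ("GOOGLE_API_KEY", "GOOGLE_SEARCH_API_KEY", "GOOGLE_SEARCH_ENGINE_ID")


def load_config(env=None):
    """Return the required settings as a dict, or raise naming what is missing."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling `load_dotenv()` from python-dotenv before `load_config()` would pull values from the local `.env` file into the environment.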

## 🎯 Success Metrics

### MVP Requirements
- [ ] Find 10+ candidates for given job
- [ ] Score candidates with breakdown
- [ ] Generate personalized outreach
- [ ] Handle basic rate limiting
- [ ] Deploy working API

### Bonus Features (if time permits)
- [ ] Multi-source data (GitHub, Twitter)
- [ ] Smart caching
- [ ] Batch processing
- [ ] Confidence scoring

## ⚠️ Risk Mitigation

### Technical Risks
- **LinkedIn rate limiting**: Add delays between requests and set an explicit User-Agent header
- **API costs**: Use free tiers, implement caching
- **Data quality**: Graceful handling of incomplete profiles
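
For the data quality risk, one option is to normalize every parsed profile before scoring so that downstream code never hits a missing key. This is only a sketch: the field names in `DEFAULT_PROFILE` are assumptions about what the parser returns.

```python
# Hypothetical profile fields and the defaults used when they are absent
DEFAULT_PROFILE = {
    "name": "Unknown",
    "headline": "",
    "location": "",
    "education": [],
    "experience": [],
}


def normalize_profile(raw: dict) -> dict:
    """Fill missing or empty fields with defaults and record which were missing."""
    profile = {}
    missing = []
    for field, default in DEFAULT_PROFILE.items():
        value = raw.get(field)
        if value in (None, "", []):
            missing.append(field)
            # Copy list defaults so profiles never share a mutable object
            value = list(default) if isinstance(default, list) else default
        profile[field] = value
    profile["missing_fields"] = missing
    return profile
```

The `missing_fields` list could also feed the confidence scoring listed under bonus features.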

### Time Risks
- **Scope creep**: Focus on MVP first
- **Integration issues**: Test components individually
- **Deployment problems**: Use simple hosting (Hugging Face)

## 📝 Submission Checklist

- [ ] Working GitHub repository
- [ ] Clear README with setup instructions
- [ ] 3-minute demo video
- [ ] 500-word write-up
- [ ] Deployed API on Hugging Face
- [ ] Submit via Google Form

## 💡 Optimization Tips

1. **Start with mock data** to test scoring algorithm
2. **Use Cursor AI** for boilerplate code generation
3. **Focus on pipeline architecture** over perfect accuracy
4. **Comment code thoroughly** to show thinking process
5. **Make it easily runnable** for judges

## 🎯 Final Notes

- **Priority**: Working pipeline > perfect accuracy
- **Focus**: Architecture and approach over data quality
- **Goal**: Demonstrate ability to build production-ready systems
- **Time**: 2-3 hours maximum, keep it simple but functional

This plan is a roadmap for building a functional LinkedIn Sourcing Agent within the time constraints, meeting all core requirements and leaving room for the bonus features if time permits.