Rishabh2095 committed on
Commit 6a10ab5
2 Parent(s): 45de167 44010a8

Merge branch 'main' of https://github.com/rishabh1024/job_writer
.dockerignore ADDED
@@ -0,0 +1,74 @@
+ # Virtual environments
+ app_env/
+ venv/
+ env/
+ .venv/
+ ENV/
+
+ # Python cache
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ *.egg-info/
+ dist/
+ build/
+
+ # IDE
+ .vscode/
+ .cursor/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Logs
+ *.log
+ logs/
+ src/job_writing_agent/logs/*
+
+ # Environment files
+ .env
+ .env.*
+ .docker_env
+ *.env
+
+ # Git
+ .git/
+ .gitignore
+ .gitattributes
+
+ # Documentation
+ *.md
+ !README.md
+ DEPLOYMENT_GUIDE.md
+
+ # Test files
+ test_*.py
+ *_test.py
+ tests/
+
+ # Data files (if large)
+ *.csv
+ *.json
+ !langgraph.json
+ !pyproject.toml
+
+ # Docker
+ docker-compose.yml
+ Dockerfile
+ .dockerignore
+ .docker_env
+
+ # LangGraph artifacts
+ .langgraph_api/
+ *.pkl
+ *.pickle
+
+ # API docs
+ api-1.json
+
+ # OS
+ .DS_Store
+ Thumbs.db
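The added `.dockerignore` keeps virtualenvs, caches, secrets, and test files out of the Docker build context. As a rough illustration of how such glob patterns match filenames, here is a sketch using Python's `fnmatch` (Docker's own matcher follows Go's `filepath.Match` semantics plus `**` handling, so this is an approximation, not Docker's exact behavior):

```python
from fnmatch import fnmatch

# A few of the patterns from the .dockerignore above
patterns = ["*.log", "*.py[cod]", "*.env", "test_*.py"]

def is_ignored(name: str) -> bool:
    """True if the filename matches any ignore pattern."""
    return any(fnmatch(name, p) for p in patterns)

# Bytecode files match *.py[cod]; regular sources match nothing
ignored = is_ignored("module.pyc")
kept = is_ignored("main.py")
```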
.github/agents/PythonMentor.agent.md ADDED
File without changes
.vscode/settings.json CHANGED
@@ -2,9 +2,13 @@
  "python.defaultInterpreterPath": "C:\\Users\\risha\\python-dir\\job_application_agent\\job_writer\\app_env\\Scripts\\python.exe",
  "python.formatting.provider": "black",
  "editor.formatOnSave": true,
- "python.formatting.blackArgs": ["--line-length", "88"],
- "python.linting.enabled": true,
+ "python.formatting.blackArgs": [
+     "--line-length",
+     "88"
+ ],
+ "python.linting.enabled": true,
  "python.linting.pylintEnabled": true,
  "python.linting.lintOnSave": true,
- "python.linting.mypyEnabled": true
+ "python.linting.mypyEnabled": true,
+ "python-envs.pythonProjects": []
  }
DEPLOYMENT_GUIDE.md DELETED
@@ -1,303 +0,0 @@
- # Deployment Guide for Job Application Agent
-
- ## Option 1: LangGraph Cloud (Easiest & Recommended)
-
- ### Prerequisites
- - LangGraph CLI installed (`langgraph-cli` in requirements.txt)
- - `langgraph.json` already configured ✅
-
- ### Steps
-
- 1. **Install LangGraph CLI** (if not already):
- ```powershell
- pip install langgraph-cli
- ```
-
- 2. **Login to LangGraph Cloud**:
- ```powershell
- langgraph login
- ```
-
- 3. **Deploy your agent**:
- ```powershell
- langgraph deploy
- ```
-
- 4. **Get your API endpoint** - LangGraph Cloud provides a REST API automatically
-
- ### Cost
- - **Free tier**: Limited requests/month
- - **Paid**: Pay-per-use pricing
-
- ### Pros
- - ✅ Zero infrastructure management
- - ✅ Built-in state persistence
- - ✅ Automatic API generation
- - ✅ LangSmith integration
- - ✅ Perfect for LangGraph apps
-
- ### Cons
- - ⚠️ Vendor lock-in
- - ⚠️ Limited customization
-
- ---
-
- ## Option 2: Railway.app (Simple & Cheap)
-
- ### Steps
-
- 1. **Create a FastAPI wrapper** (create `api.py`):
- ```python
- from fastapi import FastAPI, File, Form, UploadFile
- from job_writing_agent.workflow import JobWorkflow
- import tempfile
- import os
-
- app = FastAPI()
-
- @app.post("/generate")
- async def generate_application(
-     resume: UploadFile = File(...),
-     job_description: str = Form(...),
-     content_type: str = Form("cover_letter")
- ):
-     # Save resume temporarily
-     with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
-         tmp.write(await resume.read())
-         resume_path = tmp.name
-
-     try:
-         workflow = JobWorkflow(
-             resume=resume_path,
-             job_description_source=job_description,
-             content=content_type
-         )
-         result = await workflow.run()
-         return {"result": result}
-     finally:
-         os.unlink(resume_path)
- ```
-
- 2. **Create `Procfile`**:
- ```
- web: uvicorn api:app --host 0.0.0.0 --port $PORT
- ```
-
- 3. **Deploy to Railway**:
- - Sign up at [railway.app](https://railway.app)
- - Connect GitHub repo
- - Railway auto-detects Python and runs `Procfile`
-
- ### Cost
- - **Free tier**: $5 credit/month
- - **Hobby**: $5/month for 512MB RAM
- - **Pro**: $20/month for 2GB RAM
-
- ### Pros
- - ✅ Very simple deployment
- - ✅ Auto-scaling
- - ✅ Free tier available
- - ✅ Automatic HTTPS
-
- ### Cons
- - ⚠️ Need to add FastAPI wrapper
- - ⚠️ State management needs Redis/Postgres
-
- ---
-
- ## Option 3: Render.com (Similar to Railway)
-
- ### Steps
-
- 1. **Create `render.yaml`**:
- ```yaml
- services:
-   - type: web
-     name: job-writer-api
-     env: python
-     buildCommand: pip install -r requirements.txt
-     startCommand: uvicorn api:app --host 0.0.0.0 --port $PORT
-     envVars:
-       - key: OPENROUTER_API_KEY
-         sync: false
-       - key: TAVILY_API_KEY
-         sync: false
- ```
-
- 2. **Deploy**:
- - Connect GitHub repo to Render
- - Render auto-detects `render.yaml`
-
- ### Cost
- - **Free tier**: 750 hours/month (sleeps after 15min inactivity)
- - **Starter**: $7/month (always on)
-
- ### Pros
- - ✅ Free tier for testing
- - ✅ Simple YAML config
- - ✅ Auto-deploy from Git
-
- ### Cons
- - ⚠️ Free tier sleeps (cold starts)
- - ⚠️ Need FastAPI wrapper
-
- ---
-
- ## Option 4: Fly.io (Good Free Tier)
-
- ### Steps
-
- 1. **Install Fly CLI**:
- ```powershell
- iwr https://fly.io/install.ps1 -useb | iex
- ```
-
- 2. **Create `Dockerfile`**:
- ```dockerfile
- FROM python:3.12-slim
-
- WORKDIR /app
- COPY requirements.txt .
- RUN pip install --no-cache-dir -r requirements.txt
-
- COPY . .
-
- CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8080"]
- ```
-
- 3. **Deploy**:
- ```powershell
- fly launch
- fly deploy
- ```
-
- ### Cost
- - **Free tier**: 3 shared-cpu VMs, 3GB storage
- - **Paid**: $1.94/month per VM
-
- ### Pros
- - ✅ Generous free tier
- - ✅ Global edge deployment
- - ✅ Docker-based (flexible)
-
- ### Cons
- - ⚠️ Need Docker knowledge
- - ⚠️ Need FastAPI wrapper
-
- ---
-
- ## Option 5: AWS Lambda (Serverless - Pay Per Use)
-
- ### Steps
-
- 1. **Create Lambda handler** (`lambda_handler.py`):
- ```python
- import json
- from job_writing_agent.workflow import JobWorkflow
-
- def lambda_handler(event, context):
-     # Parse event
-     body = json.loads(event['body'])
-
-     workflow = JobWorkflow(
-         resume=body['resume_path'],
-         job_description_source=body['job_description'],
-         content=body.get('content_type', 'cover_letter')
-     )
-
-     result = workflow.run()
-
-     return {
-         'statusCode': 200,
-         'body': json.dumps({'result': result})
-     }
- ```
-
- 2. **Package and deploy** using AWS SAM or Serverless Framework
-
- ### Cost
- - **Free tier**: 1M requests/month
- - **Paid**: $0.20 per 1M requests + compute time
-
- ### Pros
- - ✅ Pay only for usage
- - ✅ Auto-scaling
- - ✅ Very cheap for low traffic
-
- ### Cons
- - ⚠️ 15min timeout limit
- - ⚠️ Cold starts
- - ⚠️ Complex setup
- - ⚠️ Need to handle state externally
-
- ---
-
- ## Recommendation
-
- **For your use case, I recommend:**
-
- 1. **Start with LangGraph Cloud** - Easiest, built for your stack
- 2. **If you need more control → Railway** - Simple, good free tier
- 3. **If you need serverless → AWS Lambda** - Cheapest for low traffic
-
- ---
-
- ## Quick Start: FastAPI Wrapper (for Railway/Render/Fly.io)
-
- Create `api.py` in your project root:
-
- ```python
- from fastapi import FastAPI, File, Form, UploadFile, HTTPException
- from fastapi.responses import JSONResponse
- from job_writing_agent.workflow import JobWorkflow
- import tempfile
- import os
- import asyncio
-
- app = FastAPI(title="Job Application Writer API")
-
- @app.get("/")
- def health():
-     return {"status": "ok"}
-
- @app.post("/generate")
- async def generate_application(
-     resume: UploadFile = File(...),
-     job_description: str = Form(...),
-     content_type: str = Form("cover_letter")
- ):
-     """Generate job application material."""
-     # Save resume temporarily
-     with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
-         content = await resume.read()
-         tmp.write(content)
-         resume_path = tmp.name
-
-     try:
-         workflow = JobWorkflow(
-             resume=resume_path,
-             job_description_source=job_description,
-             content=content_type
-         )
-
-         # Run workflow (assuming it's async or can be wrapped)
-         result = await asyncio.to_thread(workflow.run)
-
-         return JSONResponse({
-             "status": "success",
-             "result": result
-         })
-     except Exception as e:
-         raise HTTPException(status_code=500, detail=str(e))
-     finally:
-         # Cleanup
-         if os.path.exists(resume_path):
-             os.unlink(resume_path)
-
- if __name__ == "__main__":
-     import uvicorn
-     uvicorn.run(app, host="0.0.0.0", port=8000)
- ```
-
- Then update `requirements.txt` to ensure FastAPI and uvicorn are included (they already are ✅).
-
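The deleted guide's wrapper endpoints all rely on the same save-to-temp-file-then-cleanup pattern around the workflow call. A minimal stdlib sketch of just that pattern (the helper name here is illustrative, not from the repo):

```python
import os
import tempfile

def save_upload_to_temp(data: bytes, suffix: str = ".pdf") -> str:
    """Persist uploaded bytes to a named temp file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name

path = save_upload_to_temp(b"%PDF-1.4 demo bytes")
exists_before = os.path.exists(path)
os.unlink(path)  # cleanup, as the endpoint's finally-block does
exists_after = os.path.exists(path)
```

Because `delete=False` is passed, the file survives the `with` block, so the caller is responsible for unlinking it — which is exactly why the wrappers wrap the workflow call in `try`/`finally`.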
DOCKERFILE_EXPLANATION.md DELETED
@@ -1,147 +0,0 @@
- # Dockerfile Explanation
-
- This Dockerfile is specifically designed for **LangGraph Cloud/LangServe deployment**. It uses the official LangGraph API base image and configures your agent graphs to be served as REST APIs.
-
- ## Line-by-Line Breakdown
-
- ### 1. Base Image (Line 1)
- ```dockerfile
- FROM langchain/langgraph-api:3.12
- ```
- - **Purpose**: Uses the official LangGraph API base image with Python 3.12
- - **What it includes**: Pre-configured LangGraph runtime, LangServe server, and all LangGraph dependencies
- - **Why**: This image already has everything needed to serve LangGraph workflows as REST APIs
-
- ---
-
- ### 2. Install the `nodes` Package (Line 9)
- ```dockerfile
- RUN PYTHONDONTWRITEBYTECODE=1 uv pip install --system --no-cache-dir -c /api/constraints.txt nodes
- ```
- - **Purpose**: Installs the `nodes` package (likely a dependency from your `langgraph.json`)
- - **`PYTHONDONTWRITEBYTECODE=1`**: Prevents creating `.pyc` files (smaller image)
- - **`uv pip`**: Uses `uv` (fast Python package installer) instead of regular `pip`
- - **`--system`**: Installs to system Python (not virtual env)
- - **`--no-cache-dir`**: Doesn't cache pip downloads (smaller image)
- - **`-c /api/constraints.txt`**: Uses constraint file from base image (ensures compatible versions)
-
- ---
-
- ### 3. Copy Your Code (Line 14)
- ```dockerfile
- ADD . /deps/job_writer
- ```
- - **Purpose**: Copies your entire project into `/deps/job_writer` in the container
- - **Why `/deps/`**: LangGraph API expects dependencies in this directory
- - **What gets copied**: All your source code, `pyproject.toml`, `requirements.txt`, etc.
-
- ---
-
- ### 4. Install Your Package (Lines 19-21)
- ```dockerfile
- RUN for dep in /deps/*; do \
-       if [ -d "$dep" ]; then \
-         echo "Installing $dep"; \
-         (cd "$dep" && PYTHONDONTWRITEBYTECODE=1 uv pip install --system --no-cache-dir -c /api/constraints.txt -e .); \
-       fi; \
-     done
- ```
- - **Purpose**: Installs your `job_writer` package in editable mode (`-e`)
- - **How it works**:
-   - Loops through all directories in `/deps/`
-   - For each directory, changes into it and runs `pip install -e .`
-   - The `-e` flag installs in "editable" mode (changes to code are reflected)
- - **Why**: Makes your package importable as `job_writing_agent` inside the container
-
- ---
-
- ### 5. Register Your Graphs (Line 25)
- ```dockerfile
- ENV LANGSERVE_GRAPHS='{"job_app_graph": "/deps/job_writer/src/job_writing_agent/workflow.py:job_app_graph", ...}'
- ```
- - **Purpose**: Tells LangServe which graphs to expose as REST APIs
- - **Format**: JSON mapping of `graph_name` → `module_path:attribute_name`
- - **What it does**:
-   - `job_app_graph` → Exposes `JobWorkflow.job_app_graph` property as an API endpoint
-   - `research_workflow` → Exposes the research subgraph
-   - `data_loading_workflow` → Exposes the data loading subgraph
- - **Result**: Each graph becomes a REST API endpoint like `/invoke/job_app_graph`
-
- ---
-
- ### 6. Protect LangGraph API (Lines 33-35)
- ```dockerfile
- RUN mkdir -p /api/langgraph_api /api/langgraph_runtime /api/langgraph_license && \
-     touch /api/langgraph_api/__init__.py /api/langgraph_runtime/__init__.py /api/langgraph_license/__init__.py
- RUN PYTHONDONTWRITEBYTECODE=1 uv pip install --system --no-cache-dir --no-deps -e /api
- ```
- - **Purpose**: Prevents your dependencies from accidentally overwriting LangGraph API packages
- - **How**:
-   1. Creates placeholder `__init__.py` files for LangGraph packages
-   2. Reinstalls LangGraph API (without dependencies) to ensure it's not overwritten
- - **Why**: If your `requirements.txt` has conflicting versions, this ensures LangGraph API stays intact
-
- ---
-
- ### 7. Cleanup Build Tools (Lines 37-41)
- ```dockerfile
- RUN pip uninstall -y pip setuptools wheel
- RUN rm -rf /usr/local/lib/python*/site-packages/pip* ...
- RUN uv pip uninstall --system pip setuptools wheel && rm /usr/bin/uv /usr/bin/uvx
- ```
- - **Purpose**: Removes all build tools to make the image smaller and more secure
- - **What gets removed**:
-   - `pip`, `setuptools`, `wheel` (Python build tools)
-   - `uv` and `uvx` (package installers)
- - **Why**: These tools aren't needed at runtime, only during build
- - **Security**: Smaller attack surface (can't install malicious packages at runtime)
-
- ---
-
- ### 8. Set Working Directory (Line 45)
- ```dockerfile
- WORKDIR /deps/job_writer
- ```
- - **Purpose**: Sets the default directory when the container starts
- - **Why**: Makes it easier to reference files relative to your project root
-
- ---
-
- ## How It Works at Runtime
-
- When this container runs:
-
- 1. **LangServe starts automatically** (from base image)
- 2. **Reads `LANGSERVE_GRAPHS`** environment variable
- 3. **Imports your graphs** from the specified paths
- 4. **Exposes REST API endpoints**:
-    - `POST /invoke/job_app_graph` - Main workflow
-    - `POST /invoke/research_workflow` - Research subgraph
-    - `POST /invoke/data_loading_workflow` - Data loading subgraph
- 5. **Handles state management** automatically (checkpointing, persistence)
-
- ## Example API Usage
-
- Once deployed, you can call your agent like this:
-
- ```bash
- curl -X POST http://your-deployment/invoke/job_app_graph \
-   -H "Content-Type: application/json" \
-   -d '{
-     "resume_path": "...",
-     "job_description_source": "...",
-     "content": "cover_letter"
-   }'
- ```
-
- ## Key Points
-
- ✅ **Optimized for LangGraph Cloud** - Uses official base image
- ✅ **Automatic API generation** - No need to write FastAPI code
- ✅ **State management** - Built-in checkpointing and persistence
- ✅ **Security** - Removes build tools from final image
- ✅ **Small image** - No-cache installs, no bytecode files
-
- This is the **easiest deployment option** for LangGraph apps - just build and push this Docker image!
-
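The deleted explanation describes `LANGSERVE_GRAPHS` as a JSON mapping of graph name to `module_path:attribute_name`. A small self-contained sketch of parsing that format (the value is the one quoted in the document; splitting on the last `:` keeps Windows-style paths safe):

```python
import json

# Graph registry in the format the document describes:
# graph name -> "module_path:attribute_name"
langserve_graphs = json.loads(
    '{"job_app_graph": '
    '"/deps/job_writer/src/job_writing_agent/workflow.py:job_app_graph"}'
)

# rsplit on the last ":" separates the file path from the attribute name
module_path, attribute = langserve_graphs["job_app_graph"].rsplit(":", 1)
```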
api-1.json ADDED
The diff for this file is too large to render. See raw diff
 
demo_candidate_store.py ADDED
@@ -0,0 +1,273 @@
+ """
+ Demo script for CandidateProfileStore using ChromaDB.
+
+ Tests basic operations:
+ - Adding resumes with different sections
+ - Querying with natural language
+ - Retrieving sections
+ - Listing candidates
+ - Deleting data
+ """
+
+ import logging
+ from src.job_writing_agent.agent_memory.candidate_profile_store import CandidateProfileStore
+
+ # Setup logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ logger = logging.getLogger(__name__)
+
+
+ def demo_basic_operations():
+     """Demonstrate basic ChromaDB candidate store operations."""
+
+     print("=" * 80)
+     print("CANDIDATE PROFILE STORE DEMO")
+     print("=" * 80)
+
+     # Initialize store
+     print("\n1. Initializing ChromaDB store...")
+     store = CandidateProfileStore(persist_directory="./chroma_db")
+
+     # Sample resume data
+     candidate_1 = {
+         "Experience": """
+         Senior Software Engineer at TechCorp (2020-2024)
+         - Led development of microservices architecture using Python and FastAPI
+         - Implemented machine learning models for recommendation systems
+         - Managed team of 5 engineers, conducted code reviews
+         - Built CI/CD pipelines using Docker and Kubernetes
+         - Reduced API response time by 60% through optimization
+
+         Software Engineer at StartupXYZ (2018-2020)
+         - Developed RESTful APIs using Django and PostgreSQL
+         - Created data processing pipelines with Apache Airflow
+         - Implemented automated testing with pytest and coverage tools
+         """,
+         "Skills": """
+         Programming Languages: Python, JavaScript, SQL, Go
+         Frameworks: FastAPI, Django, React, Flask
+         Databases: PostgreSQL, MongoDB, Redis
+         Tools: Docker, Kubernetes, Git, Jenkins, AWS
+         Machine Learning: scikit-learn, TensorFlow, pandas, numpy
+         Methodologies: Agile, Scrum, Test-Driven Development
+         """,
+         "Education": """
+         Master of Science in Computer Science
+         Stanford University (2016-2018)
+         - Specialization in Machine Learning and AI
+         - GPA: 3.9/4.0
+         - Thesis: "Deep Learning Approaches for Natural Language Processing"
+
+         Bachelor of Science in Computer Engineering
+         MIT (2012-2016)
+         - Minor in Mathematics
+         - Dean's List all semesters
+         """,
+         "Projects": """
+         Open Source Contributions:
+         - Contributor to FastAPI framework (30+ merged PRs)
+         - Created python-ml-toolkit library (500+ GitHub stars)
+
+         Personal Projects:
+         - Built AI-powered job matching platform
+         - Developed automated trading bot using machine learning
+         """
+     }
+
+     candidate_2 = {
+         "Experience": """
+         Data Scientist at AnalyticsPro (2021-2024)
+         - Built predictive models for customer churn analysis
+         - Developed NLP pipelines for sentiment analysis
+         - Created interactive dashboards using Tableau and Plotly
+         - Worked with large datasets (100M+ records) using PySpark
+
+         Junior Data Analyst at DataCo (2019-2021)
+         - Performed statistical analysis on user behavior data
+         - Created SQL queries and data visualizations
+         - Automated reporting using Python scripts
+         """,
+         "Skills": """
+         Programming: Python, R, SQL
+         Data Science: pandas, numpy, scikit-learn, statsmodels
+         Machine Learning: TensorFlow, PyTorch, XGBoost
+         Visualization: Matplotlib, Seaborn, Plotly, Tableau
+         Big Data: PySpark, Hadoop, Hive
+         Statistics: A/B Testing, Hypothesis Testing, Regression Analysis
+         """,
+         "Education": """
+         Master of Science in Data Science
+         UC Berkeley (2017-2019)
+         - Focus on Statistical Learning and Big Data Analytics
+
+         Bachelor of Science in Mathematics
+         UCLA (2013-2017)
+         - Minor in Computer Science
+         """
+     }
+
+     # Add resumes
+     print("\n2. Adding candidate resumes...")
+     result1 = store.add_resume_text(
+         candidate_id="candidate_001",
+         sections=candidate_1,
+         metadata={
+             "name": "John Smith",
+             "email": "john.smith@email.com",
+             "title": "Senior Software Engineer"
+         }
+     )
+     print(f"   ✓ Added candidate_001: {result1['chunks_stored']} chunks, sections: {result1['sections']}")
+
+     result2 = store.add_resume_text(
+         candidate_id="candidate_002",
+         sections=candidate_2,
+         metadata={
+             "name": "Jane Doe",
+             "email": "jane.doe@email.com",
+             "title": "Data Scientist"
+         }
+     )
+     print(f"   ✓ Added candidate_002: {result2['chunks_stored']} chunks, sections: {result2['sections']}")
+
+     # List all candidates
+     print("\n3. Listing all candidates...")
+     candidates = store.list_candidates()
+     print(f"   Found {len(candidates)} candidates: {candidates}")
+
+     # Query tests
+     print("\n4. Testing semantic queries...")
+
+     queries = [
+         ("candidate_001", "programming languages and frameworks", None),
+         ("candidate_001", "machine learning experience", "Experience"),
+         ("candidate_002", "data visualization tools", "Skills"),
+         ("candidate_001", "education background in AI", "Education"),
+         ("candidate_002", "worked with big data", None),
+     ]
+
+     for candidate_id, query, section in queries:
+         section_str = f" (section: {section})" if section else ""
+         print(f"\n   Query: '{query}'{section_str}")
+         print(f"   Candidate: {candidate_id}")
+
+         results = store.query_resume(
+             candidate_id=candidate_id,
+             query=query,
+             section=section,
+             n_results=3
+         )
+
+         for i, result in enumerate(results, 1):
+             relevance = result['relevance_score']
+             doc_preview = result['document'][:150].replace('\n', ' ')
+             print(f"   {i}. [Score: {relevance:.3f}] {result['metadata']['section']}")
+             print(f"      {doc_preview}...")
+
+     # Get all sections for a candidate
+     print("\n5. Retrieving all sections for candidate_001...")
+     sections = store.get_all_sections("candidate_001")
+     for section_name, content in sections.items():
+         preview = content[:100].replace('\n', ' ')
+         print(f"   {section_name}: {preview}...")
+
+     # Get specific section
+     print("\n6. Getting specific section (Skills) for candidate_001...")
+     skills_chunks = store.get_candidate_sections(
+         candidate_id="candidate_001",
+         section="Skills"
+     )
+     print(f"   Found {len(skills_chunks)} chunks:")
+     for chunk in skills_chunks:
+         preview = chunk['document'][:80].replace('\n', ' ')
+         print(f"   - {preview}...")
+
+     # Statistics
+     print("\n7. Database statistics...")
+     print(f"   Total documents in collection: {store.collection.count()}")
+     print(f"   Total candidates: {len(store.list_candidates())}")
+
+     # Cleanup option (commented out - uncomment to test deletion)
+     # print("\n8. Testing deletion...")
+     # delete_result = store.delete_candidate("candidate_002")
+     # print(f"   Deleted {delete_result['chunks_deleted']} chunks for candidate_002")
+     # print(f"   Remaining candidates: {store.list_candidates()}")
+
+     print("\n" + "=" * 80)
+     print("DEMO COMPLETED SUCCESSFULLY")
+     print("=" * 80)
+     print(f"\nData persisted to: {store.persist_directory}")
+     print("To reset database, uncomment the cleanup section in the demo script.")
+
+
+ def demo_job_matching_queries():
+     """Demonstrate job-matching use cases."""
+
+     print("\n" + "=" * 80)
+     print("JOB MATCHING QUERIES DEMO")
+     print("=" * 80)
+
+     store = CandidateProfileStore(persist_directory="./chroma_db")
+
+     # Simulate job requirements
+     job_queries = [
+         {
+             "title": "Senior Backend Engineer",
+             "query": "Python FastAPI microservices Docker Kubernetes experience",
+             "candidate": "candidate_001"
+         },
+         {
+             "title": "Machine Learning Engineer",
+             "query": "machine learning models deep learning TensorFlow production",
+             "candidate": "candidate_001"
+         },
+         {
+             "title": "Data Science Lead",
+             "query": "predictive modeling statistics big data PySpark",
+             "candidate": "candidate_002"
+         },
+         {
+             "title": "NLP Engineer",
+             "query": "natural language processing sentiment analysis text mining",
+             "candidate": "candidate_002"
+         }
+     ]
+
+     print("\nMatching candidates to job requirements:\n")
+
+     for job in job_queries:
+         print(f"Job: {job['title']}")
+         print(f"Requirements: {job['query']}")
+         print(f"Checking: {job['candidate']}\n")
+
+         results = store.query_resume(
+             candidate_id=job['candidate'],
+             query=job['query'],
+             n_results=2
+         )
+
+         if results:
+             best_match = results[0]
+             print(f"✓ Match Score: {best_match['relevance_score']:.3f}")
+             print(f"  Relevant experience from {best_match['metadata']['section']}:")
+             preview = best_match['document'][:200].replace('\n', ' ')
+             print(f"  {preview}...\n")
+
+         print("-" * 80 + "\n")
+
+
+ if __name__ == "__main__":
+     try:
+         # Run basic operations demo
+         demo_basic_operations()
+
+         # Run job matching demo
+         demo_job_matching_queries()
+
+     except Exception as e:
+         logger.error(f"Demo failed: {e}", exc_info=True)
+         raise
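The matching demo treats the first result from `query_resume` as the best match. The same selection can be shown on plain data shaped like the store's result dicts (the shape is taken from the demo above; the scores and text here are made up for illustration):

```python
# Result dicts shaped like query_resume() output in the demo:
# each carries a relevance score, section metadata, and a document chunk
results = [
    {"relevance_score": 0.62, "metadata": {"section": "Skills"},
     "document": "Programming Languages: Python, JavaScript, SQL, Go"},
    {"relevance_score": 0.81, "metadata": {"section": "Experience"},
     "document": "Led development of microservices architecture"},
]

# Highest relevance score wins, regardless of result order
best_match = max(results, key=lambda r: r["relevance_score"])
```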
docker-compose.override.example.yml DELETED
@@ -1,21 +0,0 @@
- # Example override file for local development
- # Copy this to docker-compose.override.yml to customize settings
- # docker-compose automatically loads override files
-
- version: "3.9"
- services:
-   redis:
-     # Override Redis port for local development
-     ports:
-       - "6380:6379"  # Use different port if 6379 is already in use
-
-   postgres:
-     # Override Postgres port for local development
-     ports:
-       - "5433:5432"  # Use different port if 5432 is already in use
-     environment:
-       # Override credentials for local dev
-       - POSTGRES_USER=dev_user
-       - POSTGRES_PASSWORD=dev_password
-       - POSTGRES_DB=job_app_dev
-
src/job_writing_agent/agent_memory/__init__.py ADDED
File without changes
src/job_writing_agent/agent_memory/agent_shopping_example.py ADDED
@@ -0,0 +1,249 @@
1
+ import logging
2
+ from langgraph.store.mongodb.base import MongoDBStore, VectorIndexConfig
3
+ from langgraph.checkpoint.mongodb import MongoDBSaver
4
+ from langchain.agents.middleware import dynamic_prompt, ModelRequest
5
+ from langchain.agents import create_agent
6
+ from langchain.tools import tool
7
+ from langmem import create_manage_memory_tool
8
+ from langchain_voyageai import VoyageAIEmbeddings
9
+ from langchain_mongodb import MongoDBAtlasVectorSearch
10
+ from langchain_openai import OpenAIEmbeddings
11
+ from langchain_core.runnables import RunnableConfig
12
+ from pydantic import SecretStr
13
+ from pymongo import MongoClient
14
+ import os
15
+ from src.job_writing_agent.utils.llm_provider_factory import LLMFactory
16
+
17
+
18
+ factory = LLMFactory(default_provider="openrouter")
19
+
20
+ llm = factory.create_langchain(
21
+ model="openai/gpt-oss-20b:free",
22
+ provider="openrouter",
23
+ temperature=0.7
24
+ )
25
+
26
+
27
+ logger = logging.getLogger(__name__)
28
+ logger.setLevel(logging.ERROR)
29
+
30
+ os.environ["VOYAGE_API_KEY"] = "al-Tb_yMc_j7L50kEyRl_wsHTUARCcs77h0EjiEadFuT7N"
31
+
32
+
33
+
34
+ # Initialize MongoDB connection.
+ # Credentials are read from the environment (requires `import os` at the top
+ # of the file) instead of being hardcoded into source control.
+ MONGODB_URI = os.environ["MONGODB_URI"]
+ client = MongoClient(MONGODB_URI)
+ db = client["memories"]
+ collection = db["memory_store"]
+
+ # Create the store with vector search capabilities
+ store = MongoDBStore(
+     collection=collection,
+     index_config=VectorIndexConfig(
+         fields=None,   # Auto-detect fields for indexing
+         filters=[],    # No additional filters
+         dims=1024,     # voyage-3.5-lite embedding dimensions
+         embed=VoyageAIEmbeddings(
+             model="voyage-3.5-lite",
+             api_key=SecretStr(os.environ["VOYAGE_API_KEY"]),
+         ),  # Embedding model for vector search
+     ),
+     auto_index_timeout=120,
+ )
+
+ checkpointer = MongoDBSaver(
+     client,                                   # MongoDB client
+     db_name="memories",                       # Database name
+     collection_name="thread_checkpoints",     # Collection for conversation state
+ )
+
+
+ # Switch to the product catalog used by the search tool
+ db = client["ai_shop"]
+ collection = db["products"]
+
+ @tool
+ def search_products(query: str) -> str:
+     """Searches for products in the database using vector search."""
+     vectorstore = MongoDBAtlasVectorSearch(
+         collection,
+         OpenAIEmbeddings(),
+         text_key="title",
+         embedding_key="embedding",
+         index_name="vector_index_2",
+     )
+     docs = vectorstore.similarity_search(query, k=5)
+     return "\n".join(str(doc.metadata) for doc in docs)
+
+
+ @dynamic_prompt
+ def dynamic_memories_prompt(request: ModelRequest) -> str:
+     """
+     A middleware that builds a dynamic system prompt using relevant memories.
+     """
+     # 1. Get the current state (which includes messages)
+     state = request.state
+
+     # 2. Extract the last user message content
+     last_message_text = ""
+     if state.get("messages"):
+         last_message_text = state["messages"][-1].content
+
+     # 3. Query the long-term memory store (not state) for relevant memories.
+     #    'store' is the module-level store defined above.
+     memories = store.search(
+         ("memories",),
+         query=last_message_text if isinstance(last_message_text, str) else "",
+     )
+
+     # 4. Build a system message that includes the found memories
+     system_msg = (
+         "You are a shopping assistant with persistent memory.\n"
+         "## Relevant Memories\n"
+         "<memories>\n"
+         f"{memories}\n"
+         "</memories>\n"
+         "Use these memories to provide personalized responses."
+     )
+
+     # Return the system prompt text
+     return system_msg
+
+ agent = create_agent(
+     model=llm,
+     tools=[
+         create_manage_memory_tool(namespace=("memories",)),
+         search_products,
+     ],
+     middleware=[dynamic_memories_prompt],  # dynamic prompt injection
+     store=store,                           # long-term memory
+     checkpointer=checkpointer,             # persistent checkpointing
+ )
+
+
+ def create_shopping_agent():
+     """Complete shopping assistant with memory."""
+     # Memory storage setup
+     store = MongoDBStore(
+         collection=client.memories.user_preferences,
+         index_config=VectorIndexConfig(
+             dims=1024,
+             embed=VoyageAIEmbeddings(
+                 model="voyage-3.5-lite",
+                 api_key=SecretStr(os.environ["VOYAGE_API_KEY"]),
+             ),
+             fields=["content"],
+             filters=["active"],
+         ),
+     )
+
+     # Conversation persistence
+     checkpointer = MongoDBSaver(
+         client,
+         db_name="shopping_assistant",
+         collection_name="conversations",
+     )
+
+     @dynamic_prompt
+     def enhanced_dynamic_prompt(request: ModelRequest) -> str:
+         state = request.state
+
+         # Safely extract the last user message text
+         raw = state["messages"][-1].content if state.get("messages") else ""
+         if isinstance(raw, dict):  # normalize structured content
+             user_query = raw.get("text", "")
+         elif isinstance(raw, str):
+             user_query = raw
+         else:
+             user_query = ""
+
+         user_query = user_query.strip()
+
+         if not user_query:
+             # Fallback prompt when there is no query to search with
+             return "You are a shopping assistant."
+
+         memories = store.search(
+             ("preferences",),
+             query=user_query,
+             limit=3,
+             filter={"active": True},
+         )
+
+         purchase_history = store.search(
+             ("purchases",),
+             query=user_query,
+             limit=2,
+         )
+
+         return f"""You are an expert shopping assistant with access to:
+ - Product search capabilities
+ - User preference memory
+ - Purchase history
+
+ ## User Preferences
+ {memories}
+
+ ## Recent Purchase Context
+ {purchase_history}
+
+ Provide personalized, helpful shopping advice."""
+
+     # Create the agent with all capabilities
+     return create_agent(
+         model=llm,
+         tools=[
+             create_manage_memory_tool(namespace=("preferences",)),
+             create_manage_memory_tool(namespace=("purchases",)),
+             search_products,
+         ],
+         middleware=[enhanced_dynamic_prompt],  # Attach the dynamic prompt generator
+         store=store,
+         checkpointer=checkpointer,
+     )
+
+ # Usage example
+ agent = create_shopping_agent()
+
+
+ def get_user_config(
+     user_id: str,
+     thread_id: str = "default_thread",
+     *,
+     tags: list[str] | None = None,
+     metadata: dict | None = None,
+ ) -> RunnableConfig:
+     """Build a RunnableConfig for agent runs.
+
+     Parameters
+     - user_id: unique identifier for the user
+     - thread_id: logical conversation/thread id to group runs
+     - tags: optional list of tags to attach to the run
+     - metadata: optional metadata dict for observability/auditing
+
+     Returns
+     - RunnableConfig dict accepted by LangChain runnables
+     """
+     config: RunnableConfig = {
+         "configurable": {"user_id": user_id, "thread_id": thread_id}
+     }
+
+     if tags:
+         config["tags"] = tags
+     if metadata:
+         config["metadata"] = metadata
+
+     return config
+
+
+ # Conversation 1: Learning preferences
+ user_config1 = get_user_config("user123")
+ response = agent.invoke({
+     "messages": [{"role": "user", "content": "I'm vegan and prefer organic products"}]
+ }, config=user_config1)
+
+ print(response["messages"][-1].content)
+
+ # Conversation 2: Using learned preferences (different session)
+ user_config2 = get_user_config("user123", "mobile-app")
+ response = agent.invoke({
+     "messages": [{"role": "user", "content": "Find me some pasta options"}]
+ }, config=user_config2)
+ # The agent automatically applies the vegan + organic preferences
+
+ print(response["messages"][-1].content)
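The two conversations above work because the `configurable` dict scopes state: the same `user_id` shares long-term memories across sessions, while different `thread_id` values keep separate checkpointed conversations. A minimal standalone sketch of that scoping logic, using plain dicts and a hypothetical `build_run_config` name (no LangChain imports):

```python
def build_run_config(user_id, thread_id="default_thread", tags=None, metadata=None):
    """Build a per-user, per-thread run config (mirrors get_user_config above)."""
    config = {"configurable": {"user_id": user_id, "thread_id": thread_id}}
    if tags:
        config["tags"] = tags
    if metadata:
        config["metadata"] = metadata
    return config

web = build_run_config("user123")                                # default thread
mobile = build_run_config("user123", "mobile-app", tags=["mobile"])
# Same user_id -> shared long-term memories; different thread_id -> separate checkpoints
```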
src/job_writing_agent/agent_memory/candidate_profile_store.py ADDED
@@ -0,0 +1,345 @@
+ """
+ Candidate Profile Store using ChromaDB for vector-based resume storage and retrieval.
+ """
+
+ import logging
+ from datetime import datetime
+ from pathlib import Path
+ from typing import Optional, List, Dict, Any
+
+ import chromadb
+ from chromadb.config import Settings
+
+ logger = logging.getLogger(__name__)
+
+
+ class CandidateProfileStore:
+     """
+     Manages candidate resumes in ChromaDB with vector embeddings.
+
+     Uses ChromaDB's default embedding function (sentence-transformers/all-MiniLM-L6-v2)
+     for free, local embeddings without API keys.
+     """
+
+     def __init__(self, persist_directory: str = "./chroma_db"):
+         """
+         Initialize the ChromaDB client and collection.
+
+         Args:
+             persist_directory: Directory in which to persist ChromaDB data
+         """
+         self.persist_directory = persist_directory
+
+         # Create the directory if it doesn't exist
+         Path(persist_directory).mkdir(parents=True, exist_ok=True)
+
+         # Initialize the ChromaDB client
+         self.client = chromadb.PersistentClient(
+             path=persist_directory,
+             settings=Settings(
+                 anonymized_telemetry=False,
+                 allow_reset=True
+             )
+         )
+
+         # Get or create the collection with the default embedding function
+         self.collection = self.client.get_or_create_collection(
+             name="candidate_resumes",
+             metadata={"hnsw:space": "cosine"}  # Use cosine similarity
+         )
+
+         logger.info(f"Initialized CandidateProfileStore at {persist_directory}")
+         logger.info(f"Collection contains {self.collection.count()} documents")
+
+     def add_resume_text(
+         self,
+         candidate_id: str,
+         sections: Dict[str, str],
+         metadata: Optional[Dict[str, Any]] = None
+     ) -> Dict[str, Any]:
+         """
+         Add resume sections directly as text (for demo/testing).
+
+         Args:
+             candidate_id: Unique identifier for the candidate
+             sections: Dict mapping section names to content,
+                 e.g. {"Experience": "...", "Skills": "..."}
+             metadata: Additional metadata (name, email, etc.)
+
+         Returns:
+             Dict with an operation summary
+         """
+         ids = []
+         documents = []
+         metadatas = []
+
+         base_metadata = metadata or {}
+         timestamp = datetime.now().isoformat()
+
+         for section_name, content in sections.items():
+             # Split long sections into chunks
+             chunks = self._chunk_text(content, chunk_size=400, overlap=50)
+
+             for i, chunk in enumerate(chunks):
+                 chunk_id = f"{candidate_id}_{section_name}_{i}"
+                 ids.append(chunk_id)
+                 documents.append(chunk)
+                 metadatas.append({
+                     "candidate_id": candidate_id,
+                     "section": section_name,
+                     "chunk_index": i,
+                     "timestamp": timestamp,
+                     **base_metadata
+                 })
+
+         # Add to ChromaDB (auto-embeds with the default embedding function)
+         self.collection.add(
+             ids=ids,
+             documents=documents,
+             metadatas=metadatas
+         )
+
+         result = {
+             "candidate_id": candidate_id,
+             "chunks_stored": len(ids),
+             "sections": list(sections.keys()),
+             "total_documents": self.collection.count()
+         }
+
+         logger.info(f"Added {len(ids)} chunks for candidate {candidate_id}")
+         return result
+
+     def query_resume(
+         self,
+         candidate_id: str,
+         query: str,
+         section: Optional[str] = None,
+         n_results: int = 5
+     ) -> List[Dict[str, Any]]:
+         """
+         Semantically search a candidate's resume with a natural language query.
+
+         Args:
+             candidate_id: Candidate to search
+             query: Natural language query
+             section: Optional section filter (e.g., "Experience", "Skills")
+             n_results: Number of results to return
+
+         Returns:
+             List of matching chunks with metadata and relevance scores
+         """
+         # Build the where filter with proper ChromaDB syntax for multiple conditions
+         if section:
+             where_filter = {
+                 "$and": [
+                     {"candidate_id": candidate_id},
+                     {"section": section}
+                 ]
+             }
+         else:
+             where_filter = {"candidate_id": candidate_id}
+
+         try:
+             results = self.collection.query(
+                 query_texts=[query],
+                 n_results=n_results,
+                 where=where_filter,
+                 include=["documents", "metadatas", "distances"]
+             )
+             return self._format_query_results(results)
+         except Exception as e:
+             logger.error(f"Query failed: {e}")
+             return []
+
+     def get_candidate_sections(
+         self,
+         candidate_id: str,
+         section: Optional[str] = None,
+         limit: Optional[int] = None
+     ) -> List[Dict[str, Any]]:
+         """
+         Get all stored data for a candidate, optionally filtered by section.
+
+         Args:
+             candidate_id: Candidate identifier
+             section: Optional section filter
+             limit: Maximum number of results
+
+         Returns:
+             List of documents with metadata
+         """
+         # Build the where filter with proper ChromaDB syntax for multiple conditions
+         if section:
+             where_filter = {
+                 "$and": [
+                     {"candidate_id": candidate_id},
+                     {"section": section}
+                 ]
+             }
+         else:
+             where_filter = {"candidate_id": candidate_id}
+
+         try:
+             results = self.collection.get(
+                 where=where_filter,
+                 limit=limit,
+                 include=["documents", "metadatas"]
+             )
+             return self._format_get_results(results)
+         except Exception as e:
+             logger.error(f"Get failed: {e}")
+             return []
+
+     def get_all_sections(self, candidate_id: str) -> Dict[str, str]:
+         """
+         Get all sections for a candidate, reconstructed from chunks.
+
+         Args:
+             candidate_id: Candidate identifier
+
+         Returns:
+             Dict mapping section names to reconstructed content
+         """
+         results = self.get_candidate_sections(candidate_id)
+
+         sections = {}
+         for item in results:
+             section_name = item["metadata"]["section"]
+             if section_name not in sections:
+                 sections[section_name] = []
+             sections[section_name].append({
+                 "chunk_index": item["metadata"]["chunk_index"],
+                 "content": item["document"]
+             })
+
+         # Sort each section's chunks by chunk_index and join them
+         reconstructed = {}
+         for section_name, chunks in sections.items():
+             sorted_chunks = sorted(chunks, key=lambda x: x["chunk_index"])
+             reconstructed[section_name] = " ".join(c["content"] for c in sorted_chunks)
+
+         return reconstructed
+
+     def delete_candidate(self, candidate_id: str) -> Dict[str, Any]:
+         """
+         Remove all data for a candidate.
+
+         Args:
+             candidate_id: Candidate to delete
+
+         Returns:
+             Operation summary
+         """
+         # Get the chunk count before deletion
+         before_count = len(self.get_candidate_sections(candidate_id))
+
+         self.collection.delete(
+             where={"candidate_id": candidate_id}
+         )
+
+         logger.info(f"Deleted {before_count} chunks for candidate {candidate_id}")
+
+         return {
+             "candidate_id": candidate_id,
+             "chunks_deleted": before_count,
+             "total_documents": self.collection.count()
+         }
+
+     def list_candidates(self) -> List[str]:
+         """
+         Get a list of all candidate IDs in the database.
+
+         Returns:
+             List of unique candidate IDs
+         """
+         all_data = self.collection.get(include=["metadatas"])
+         candidate_ids = {
+             meta["candidate_id"]
+             for meta in all_data.get("metadatas") or []
+         }
+         return sorted(candidate_ids)
+
+     def reset(self) -> None:
+         """Reset the database (delete all data). Use with caution!"""
+         self.client.delete_collection("candidate_resumes")
+         self.collection = self.client.get_or_create_collection(
+             name="candidate_resumes",
+             metadata={"hnsw:space": "cosine"}
+         )
+         logger.warning("Database reset - all data deleted")
+
+     def _chunk_text(self, text: str, chunk_size: int = 400, overlap: int = 50) -> List[str]:
+         """
+         Split text into overlapping chunks.
+
+         Args:
+             text: Text to chunk
+             chunk_size: Maximum chunk size in characters
+             overlap: Overlap between chunks in characters
+
+         Returns:
+             List of text chunks
+         """
+         if len(text) <= chunk_size:
+             return [text]
+
+         chunks = []
+         start = 0
+
+         while start < len(text):
+             end = start + chunk_size
+             chunk = text[start:end]
+
+             # Try to break at a sentence boundary
+             if end < len(text):
+                 last_period = chunk.rfind(". ")
+                 if last_period > chunk_size // 2:
+                     end = start + last_period + 1
+                     chunk = text[start:end]
+
+             chunks.append(chunk.strip())
+             start = end - overlap
+
+         return chunks
+
+     def _format_query_results(self, results: Dict[str, Any]) -> List[Dict[str, Any]]:
+         """Format ChromaDB query results into a cleaner structure."""
+         formatted = []
+
+         if not results['ids'] or not results['ids'][0]:
+             return formatted
+
+         for i in range(len(results['ids'][0])):
+             formatted.append({
+                 "id": results['ids'][0][i],
+                 "document": results['documents'][0][i],
+                 "metadata": results['metadatas'][0][i],
+                 "distance": results['distances'][0][i],
+                 "relevance_score": 1 - results['distances'][0][i]  # Convert cosine distance to similarity
+             })
+
+         return formatted
+
+     def _format_get_results(self, results: Dict[str, Any]) -> List[Dict[str, Any]]:
+         """Format ChromaDB get results into a cleaner structure."""
+         formatted = []
+
+         if not results['ids']:
+             return formatted
+
+         for i in range(len(results['ids'])):
+             formatted.append({
+                 "id": results['ids'][i],
+                 "document": results['documents'][i],
+                 "metadata": results['metadatas'][i]
+             })
+
+         return formatted
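The sentence-boundary chunking used by `add_resume_text` can be exercised standalone. A minimal sketch restating the same overlap strategy as a free function (the small `chunk_size`/`overlap` values are chosen only for illustration):

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring to end on ". " boundaries."""
    if len(text) <= chunk_size:
        return [text]
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        # Break at the last sentence boundary if it falls in the chunk's second half
        if end < len(text):
            last_period = chunk.rfind(". ")
            if last_period > chunk_size // 2:
                end = start + last_period + 1
                chunk = text[start:end]
        chunks.append(chunk.strip())
        start = end - overlap  # the overlap preserves context across chunk borders
    return chunks

parts = chunk_text("one two three. " * 40, chunk_size=100, overlap=20)
# Every chunk fits the size budget; neighbouring chunks share ~20 characters
```

Because `start` always advances by at least `chunk_size // 2 - overlap` characters, the loop terminates as long as the overlap stays well below half the chunk size.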
src/job_writing_agent/agent_memory/mongodb_logterm_memory.py ADDED
File without changes
uv.lock CHANGED
The diff for this file is too large to render. See raw diff