╔════════════════════════════════════════════════════════════════════════════╗
║ 📚 SKILL-TO-CODE MAPPING: Where Each Skill Applies in RagBot               ║
║    Reference guide showing skill application locations                     ║
╚════════════════════════════════════════════════════════════════════════════╝
This document maps each of the 34 skills to specific code files and critical
issues they resolve. Use this for quick lookup: "Where do I apply Skill #X?"
════════════════════════════════════════════════════════════════════════════════
CRITICAL ISSUES MAPPING TO SKILLS
════════════════════════════════════════════════════════════════════════════════
ISSUE #1: biomarker_flags & safety_alerts not propagating through workflow
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/state.py, src/agents/*.py, src/workflow.py
Affected Code:
├─ GuildState (missing fields)
├─ BiomarkerAnalyzerAgent.invoke() (only returns biomarkers)
├─ ResponseSynthesizerAgent.invoke() (fields missing in input)
└─ Workflow edges (state not fully passed)
Primary Skills:
✓ #2 Workflow Orchestration Patterns → Fix state passing
✓ #3 Multi-Agent Orchestration → Ensure deterministic flow
✓ #16 Structured Output → Enforce complete schema
Secondary Skills:
• #22 Testing Patterns → Write tests for state flow
• #27 Observability → Log state changes
Action: Read src/state.py → identify missing fields → update all agents to
return complete state → test end-to-end
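A minimal sketch of the state-passing fix for Issue #1, assuming GuildState is a dict-like state object. Only biomarker_flags and safety_alerts come from this document; the other field names, the function names, and the threshold are hypothetical placeholders. The point it illustrates is that each agent returns the incoming state plus its own fields rather than a fresh partial dict.

```python
from typing import Any, TypedDict


class GuildState(TypedDict, total=False):
    """Shared workflow state. Fields other than the two flagged ones are assumed."""
    biomarkers: dict[str, float]
    biomarker_flags: list[dict[str, Any]]
    safety_alerts: list[dict[str, Any]]
    final_response: dict[str, Any]


def biomarker_analyzer(state: GuildState) -> GuildState:
    """Add flags and alerts while preserving everything already in the state."""
    flags = [
        {"name": name, "value": value, "status": "HIGH"}
        for name, value in state.get("biomarkers", {}).items()
        if value > 100  # placeholder threshold, not a clinical reference range
    ]
    alerts = [{"biomarker": flag["name"], "severity": "review"} for flag in flags]
    return {**state, "biomarker_flags": flags, "safety_alerts": alerts}


def response_synthesizer(state: GuildState) -> GuildState:
    """Read the propagated fields and attach the final response to the same state."""
    response = {
        "flags": state.get("biomarker_flags", []),
        "alerts": state.get("safety_alerts", []),
    }
    return {**state, "final_response": response}
```

An end-to-end test can then assert that the flags produced by the first agent are still present after the last one.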
ISSUE #2: Schema mismatch between workflow output & API formatter
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/workflow.py, api/app/models/ (missing or inconsistent)
Affected Code:
├─ ResponseSynthesizerAgent output structure (varies)
├─ api/app/services/ragbot.py format_response() (expects different keys)
├─ CLI scripts/chat.py (different field names)
└─ Tests referencing old schema
Primary Skills:
✓ #16 AI Wrapper/Structured Output → Create unified Pydantic model
✓ #22 Testing Patterns → Write schema validation tests
Secondary Skills:
• #27 Observability → Log schema mismatches (debugging)
Action: Create api/app/models/response.py with BaseAnalysisResponse →
update all agents to return it → validate in API
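A sketch of what a single canonical response type could look like. The document calls for a Pydantic model; to stay dependency-free, this example swaps in a standard-library dataclass with manual checks that plays the same role. All field names are assumptions, and from_workflow is a hypothetical helper.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass(frozen=True)
class BaseAnalysisResponse:
    """Stand-in for the unified schema; field names are illustrative only."""
    prediction: str
    confidence: float
    biomarker_flags: list = field(default_factory=list)
    safety_alerts: list = field(default_factory=list)
    sources: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    @classmethod
    def from_workflow(cls, raw: dict[str, Any]) -> "BaseAnalysisResponse":
        """Reject workflow output whose keys do not match the schema exactly."""
        known = {f.name for f in fields(cls)}
        required = {
            f.name
            for f in fields(cls)
            if f.default is MISSING and f.default_factory is MISSING
        }
        provided = set(raw)
        missing, unknown = required - provided, provided - known
        if missing or unknown:
            raise ValueError(
                f"schema mismatch: missing={sorted(missing)} unknown={sorted(unknown)}"
            )
        return cls(**raw)
```

Failing loudly on unknown keys is a design choice: it surfaces drift between the workflow, the API formatter, and the CLI instead of silently dropping fields.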
ISSUE #3: Prediction confidence forced to 0.5 (dangerous for medical)
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/agents/confidence_assessor.py, api/app/routes/analyze.py
Affected Code:
├─ ConfidenceAssessorAgent.invoke() (ignores actual assessment)
├─ Default response in analyze endpoint (hardcoded 0.5)
└─ CLI logic (no failure path for low confidence)
Primary Skills:
✓ #13 Senior Prompt Engineer → Better reasoning in assessor
✓ #14 LLM Evaluation → Benchmark accuracy
Secondary Skills:
• #4 Agentic Development → Decision logic improvements
• #22 Testing Patterns → Test confidence boundaries
• #27 Observability → Track confidence distributions
Action: Update confidence_assessor.py to use actual evidence → test with
multiple biomarker scenarios → Add high/medium/low confidence paths
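One way the high/medium/low paths could be expressed, as a sketch only. The scoring rule (a plain mean of per-evidence scores), the band thresholds, and both function names are assumptions, not the project's actual logic. The key idea is that missing evidence yields an explicit "unknown" result instead of a default of 0.5.

```python
from typing import Optional

# Hypothetical band thresholds; real values should come from evaluation (#14).
HIGH_THRESHOLD = 0.75
MEDIUM_THRESHOLD = 0.5


def assess_confidence(evidence_scores: list[float]) -> Optional[float]:
    """Mean of per-evidence scores in [0, 1], or None when there is no evidence."""
    if not evidence_scores:
        return None  # unknown stays unknown; never fall back to a silent 0.5
    return sum(evidence_scores) / len(evidence_scores)


def confidence_path(confidence: Optional[float]) -> str:
    """Map a confidence value to the branch the workflow should take."""
    if confidence is None:
        return "insufficient_evidence"
    if confidence >= HIGH_THRESHOLD:
        return "high"
    if confidence >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"
```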
ISSUE #4: Biomarker naming inconsistency (API vs CLI)
──────────────────────────────────────────────────────────────────────────────
Problem Location: config/biomarker_references.json, src/agents/*, api/*
Affected Code:
├─ config/biomarker_references.json (canonical list)
├─ BiomarkerAnalyzerAgent (validation against reference)
├─ CLI scripts/chat.py (different naming)
└─ API endpoints (naming transformation)
Primary Skills:
✓ #9 Chunking Strategy → Include standard names in embedding
✓ #16 Structured Output → Enforce standard field names
Secondary Skills:
• #10 Embedding Pipeline → Index with canonical names
• #22 Testing Patterns → Test name transformation
• #27 Observability → Log name mismatches
Action: Create biomarker_normalizer() → apply in all code paths → add
mapping tests
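A sketch of biomarker_normalizer(), assuming an alias table keyed by a lowercased, punctuation-free form of the name. The alias entries below are illustrative; in the project the canonical names would presumably be loaded from config/biomarker_references.json.

```python
import re

# Hypothetical alias table: squashed lowercase key -> canonical name.
_ALIASES = {
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "glucose": "Glucose",
    "fastingglucose": "Glucose",
    "ldl": "LDL",
    "ldlcholesterol": "LDL",
}


def biomarker_normalizer(name: str) -> str:
    """Return the canonical biomarker name, or raise if the name is unknown."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    try:
        return _ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown biomarker name: {name!r}") from None
```

Raising on unknown names, rather than passing them through, makes mismatches between API and CLI inputs visible in logs and tests.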
ISSUE #5: JSON parsing breaks on malformed LLM output
──────────────────────────────────────────────────────────────────────────────
Problem Location: api/app/services/extraction.py, src/agents/extraction code
Affected Code:
├─ LLM.predict() returns text
├─ json.loads() has no error handling
├─ Invalid JSON crashes endpoint
└─ No fallback strategy
Primary Skills:
✓ #5 Tool/Function Calling → Use function calling instead
✓ #21 Python Error Handling → Graceful degradation
Secondary Skills:
• #16 Structured Output → Pydantic validation
• #19 LLM Security → Prevent injection in JSON
• #27 Observability → Log parsing failures
• #14 LLM Evaluation → Track failure rate
Action: Replace json.loads() with Pydantic validator → implement retry logic
→ add function calling as fallback
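A sketch of the parse-validate-retry step, under the assumption that the extraction output is a flat JSON object of biomarker names to numbers. call_llm is a hypothetical callable standing in for the real LLM client, and ExtractionError and both function names are illustrative. Validation here is hand-written rather than Pydantic to keep the example self-contained.

```python
import json
from typing import Callable


class ExtractionError(Exception):
    """Raised when no attempt produced valid, well-typed JSON."""


def parse_biomarkers(text: str) -> dict[str, float]:
    """Parse and validate LLM output; raises ValueError or TypeError if malformed."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {str(name): float(value) for name, value in data.items()}


def extract_with_retry(
    call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict[str, float]:
    """Retry with a stricter prompt instead of letting bad JSON crash the endpoint."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            return parse_biomarkers(raw)
        except (ValueError, TypeError) as exc:
            last_error = exc
            prompt = f"{prompt}\n\nReturn only a valid JSON object."
    raise ExtractionError(f"no valid JSON after {max_attempts} attempts") from last_error
```

The caller can catch ExtractionError and return a structured error response, which is where the function-calling fallback would also hook in.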
ISSUE #6: No citation enforcement in RAG outputs
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/agents/disease_explainer.py, response synthesis
Affected Code:
├─ retriever.retrieve() returns docs but citations dropped
├─ DiseaseExplainerAgent doesn't track sources
├─ ResponseSynthesizerAgent loses citation info
└─ API response has no source attribution
Primary Skills:
✓ #11 RAG Implementation → Enforce citations in loop
✓ #8 Hybrid Search → Better relevance = better cites
✓ #12 Knowledge Graph → Link to authoritative sources
Secondary Skills:
• #1 LangChain Architecture → Tool for citation tracking
• #7 RAG Agent Builder → Full RAG best practices
• #14 LLM Evaluation → Test for hallucinations
• #27 Observability → Track citation frequency
Action: Modify disease_explainer.py to preserve doc metadata → add citation
validation → return sources in API response
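A sketch of citation preservation and validation, assuming the model is prompted with numbered passages and asked to cite them as [n]. RetrievedDoc, build_context, and validate_citations are hypothetical names, and the metadata fields (source, page) are assumptions about what the retriever returns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedDoc:
    text: str
    source: str
    page: int


def build_context(docs: list[RetrievedDoc]) -> str:
    """Number each passage so the answer can reference it as [n]."""
    return "\n".join(f"[{i}] {doc.text}" for i, doc in enumerate(docs, start=1))


def validate_citations(answer: str, docs: list[RetrievedDoc]) -> list[dict]:
    """Return the cited sources, or raise if the answer is uncited or miscited."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    if not cited:
        raise ValueError("answer cites no sources")
    invalid = sorted(n for n in cited if not 1 <= n <= len(docs))
    if invalid:
        raise ValueError(f"answer cites unknown passages: {invalid}")
    return [
        {"id": n, "source": docs[n - 1].source, "page": docs[n - 1].page}
        for n in sorted(cited)
    ]
```

The returned list is what a sources field in the API response could carry.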
════════════════════════════════════════════════════════════════════════════════
SKILL-BY-SKILL APPLICATION GUIDE
════════════════════════════════════════════════════════════════════════════════
#1 LangChain Architecture
Phase: 3, Week 7
Apply To: src/agents/, src/services/
Key Files:
└─ src/agents/base_agent.py (NEW) - Create BaseAgent with LangChain patterns
└─ src/agents/*/invoke() - Add callbacks, chains, tools
└─ src/services/*.py - RunnableWithMessageHistory for conversation
Integration: Advanced chain composition, callbacks for metrics
Outcome: More sophisticated agent orchestration
Effort: 3-4 hours
#2 Workflow Orchestration Patterns
Phase: 1, Week 1 / Phase 4, Week 12 (final review)
Apply To: src/workflow.py, src/state.py
Key Files:
└─ src/state.py - REFACTOR GuildState with all fields
└─ src/workflow.py - REFACTOR state passing between agents
└─ src/agents/biomarker_analyzer.py - Return complete state
└─ src/agents/disease_explainer.py - Preserve incoming state
Integration: Fix Issue #1 (state propagation)
Outcome: biomarker_flags & safety_alerts flow through entire workflow
Effort: 4-6 hours (Week 1) + 2 hours (Week 12 refine)
#3 Multi-Agent Orchestration
Phase: 1, Week 2
Apply To: src/workflow.py
Key Files:
└─ src/workflow.py - Ensure deterministic agent order
└─ Parallel execution order documentation
Integration: Ensure agents execute in correct order with proper state passing
Outcome: Deterministic workflow execution
Effort: 3-4 hours
#4 Agentic Development
Phase: 2, Week 3
Apply To: src/agents/biomarker_analyzer.py, confidence_assessor.py
Key Files:
└─ BiomarkerAnalyzerAgent.invoke() - Add confidence thresholds
└─ ConfidenceAssessorAgent - Better decision logic
└─ Add reasoning trace to responses
Integration: Better medical decisions, alternatives for low confidence
Outcome: More reliable biomarker analysis
Effort: 3-4 hours
#5 Tool/Function Calling Patterns
Phase: 2, Week 4
Apply To: api/app/services/extraction.py, src/agents/extraction.py
Key Files:
└─ api/app/services/extraction.py - Define extraction tools/functions
└─ src/agents/ - Use function returns instead of JSON parsing
Integration: Fix Issue #5 (JSON parsing fragility)
Outcome: Structured LLM outputs guaranteed valid
Effort: 3-4 hours
#6 LLM Application Dev with LangChain
Phase: 4, Week 11
Apply To: src/agents/ (production patterns)
Key Files:
└─ src/agents/base_agent.py - Implement lifecycle (setup, execute, cleanup)
└─ All agents - Add retry logic, graceful degradation
└─ Agent composition patterns - Chain agents
Integration: Production-ready agent code
Outcome: Robust, maintainable agents with error recovery
Effort: 4-5 hours
#7 RAG Agent Builder
Phase: 4, Week 12
Apply To: src/agents/ (full review)
Key Files:
└─ src/agents/disease_explainer.py - RAG pattern review
└─ Ensure all responses cite sources
└─ Verify accuracy benchmarks
Integration: Full RAG agent validation before production
Outcome: Production-ready RAG agents
Effort: 4-5 hours
#8 Hybrid Search Implementation
Phase: 3, Week 6
Apply To: src/retrievers/ (NEW)
Key Files:
└─ src/retrievers/hybrid_retriever.py (NEW) - Combine BM25 + FAISS
└─ src/agents/disease_explainer.py - Use hybrid retriever
Integration: Better document retrieval (semantic + keyword)
Outcome: +15% recall on rare disease queries
Effort: 4-6 hours
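The document does not say how the BM25 and FAISS results should be merged. One common option is reciprocal rank fusion, sketched here over plain ranked lists of document IDs; the function name and the constant k=60 are assumptions, and the two input rankings would come from the actual retrievers.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 5
) -> list[str]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:top_n]
```

Rank-based fusion avoids having to put BM25 scores and vector distances on a common scale.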
#9 Chunking Strategy
Phase: 3, Week 6
Apply To: src/chunking_strategy.py (NEW), src/pdf_processor.py
Key Files:
└─ src/chunking_strategy.py (NEW) - Split by medical sections
└─ scripts/setup_embeddings.py - Use new chunking
└─ Re-chunk and re-embed medical_knowledge.faiss
Integration: Fix Issue #4 (naming), improve context window usage
Outcome: Better semantic chunks, improved retrieval quality
Effort: 4-5 hours
#10 Embedding Pipeline Builder
Phase: 3, Week 6
Apply To: src/llm_config.py, scripts/setup_embeddings.py
Key Files:
└─ src/llm_config.py - Consider medical embedding models
└─ scripts/setup_embeddings.py - Use new embeddings
└─ Benchmark embedding quality
Integration: Better semantic search for medical terminology
Outcome: Improved document relevance ranking
Effort: 3-4 hours
#11 RAG Implementation
Phase: 3, Week 6
Apply To: src/agents/disease_explainer.py
Key Files:
└─ src/agents/disease_explainer.py - Track and enforce citations
└─ src/models/response.py - Add sources field
└─ api/app/routes/analyze.py - Return sources
Integration: Fix Issue #6 (no citations), enforce medical accuracy
Outcome: All claims backed by sources
Effort: 3-4 hours
#12 Knowledge Graph Builder
Phase: 3, Week 7
Apply To: src/knowledge_graph.py (NEW)
Key Files:
└─ src/knowledge_graph.py (NEW) - Disease → Biomarker → Treatment graph
└─ Extract entities from medical PDFs
└─ src/agents/biomarker_analyzer.py - Use knowledge graph
└─ Create graph.html visualization
Integration: Better disease prediction via relationships
Outcome: Knowledge graph with 100+ nodes, 500+ edges
Effort: 6-8 hours
#13 Senior Prompt Engineer
Phase: 2, Week 3
Apply To: src/agents/ (all agent prompts)
Key Files:
└─ src/agents/biomarker_analyzer.py - Prompt: few-shot extraction
└─ src/agents/disease_explainer.py - Prompt: chain-of-thought reasoning
└─ src/agents/confidence_assessor.py - Prompt: decision logic
└─ src/agents/clinical_guidelines.py - Prompt: evidence-based
Integration: Fix Issue #3 (confidence), improve medical reasoning
Outcome: +15% accuracy improvement
Effort: 5-6 hours
#14 LLM Evaluation
Phase: 2, Week 4
Apply To: tests/evaluation_metrics.py (NEW)
Key Files:
└─ tests/evaluation_metrics.py (NEW) - Benchmarking suite
└─ tests/fixtures/evaluation_patients.py - Test scenarios
└─ Benchmark Groq vs Gemini performance
└─ Track before/after improvements
Integration: Measure all improvements quantitatively
Outcome: Clear metrics showing progress
Effort: 4-5 hours
#15 Cost-Aware LLM Pipeline
Phase: 3, Week 8
Apply To: src/llm_config.py
Key Files:
└─ src/llm_config.py - Model routing by complexity
└─ Implement caching (hash → result)
└─ Cost tracking and reporting
└─ Target: -40% cost reduction
Integration: Optimize API costs without sacrificing accuracy
Outcome: Lower operational costs
Effort: 4-5 hours
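A sketch of the "hash → result" cache, assuming an in-memory dict keyed by a SHA-256 digest of the model name and prompt. ResponseCache and get_or_call are hypothetical names; a production version would likely need expiry and a shared store.

```python
import hashlib
import json
from typing import Callable


class ResponseCache:
    """In-memory cache keyed by a hash of (model, prompt)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> str:
        """Return the cached result, calling the LLM only on a miss."""
        cache_key = self.key(model, prompt)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        self.misses += 1
        result = call(prompt)
        self._store[cache_key] = result
        return result
```

The hit and miss counters give a starting point for the cost tracking mentioned above.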
#16 AI Wrapper/Structured Output
Phase: 1, Week 1
Apply To: api/app/models/ (NEW and REFACTORED)
Key Files:
└─ api/app/models/response.py (NEW) - Create unified BaseAnalysisResponse
└─ api/app/services/ragbot.py - Use unified schema
└─ All agents - Match unified output
└─ API responses - Validate with Pydantic
Integration: Fix Issues #1, #2, #4, #5 (schema consistency)
Outcome: Single canonical response format
Effort: 3-5 hours
#17 API Security Hardening
Phase: 1, Week 1
Apply To: api/app/middleware/, api/main.py
Key Files:
└─ api/app/middleware/auth.py (NEW) - JWT auth
└─ api/main.py - Add security middleware chain
└─ CORS, headers, rate limiting
Integration: Secure REST API endpoints
Outcome: API hardened against common attacks
Effort: 4-6 hours
#18 OWASP Security Check
Phase: 1, Week 1
Apply To: docs/ (audit report)
Key Files:
└─ docs/SECURITY_AUDIT.md (NEW) - Security findings
└─ Scan api/ and src/ for vulnerabilities
└─ Create tickets for each issue
Integration: Establish security baseline
Outcome: All vulnerabilities documented and prioritized
Effort: 2-3 hours
#19 LLM Security
Phase: 1, Week 2
Apply To: api/app/middleware/input_validation.py (NEW)
Key Files:
└─ api/app/middleware/input_validation.py (NEW) - Input sanitization
└─ Detect prompt injection attempts
└─ Validate biomarker inputs
└─ Escape special characters
Integration: Fix Issue #5 (JSON safety), prevent prompt injection
Outcome: Inputs validated before LLM processing
Effort: 3-4 hours
#20 API Rate Limiting
Phase: 1, Week 1
Apply To: api/app/middleware/rate_limiter.py (NEW)
Key Files:
└─ api/app/middleware/rate_limiter.py (NEW) - Token bucket limiter
└─ api/main.py - Add to middleware chain
└─ Tiered limits (free/pro based on API key)
Integration: Protect API from abuse
Outcome: Rate limiting in place
Effort: 2-3 hours
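A minimal token bucket, shown as a plain class rather than as middleware. The class name, parameters, and injectable clock are assumptions for illustration; tiered limits could be modeled by creating one bucket per API key with different capacity and refill rate.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False when rate limited."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        self._tokens = min(
            float(self.capacity), self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Passing the clock in makes the limiter testable without sleeping.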
#21 Python Error Handling
Phase: 1, Week 2
Apply To: src/exceptions.py (NEW), src/agents/
Key Files:
└─ src/exceptions.py (NEW) - Custom exception hierarchy
└─ RagBotException, BiomarkerValidationError, LLMTimeoutError, etc.
└─ All agents - Replace generic try-except
└─ API - Proper error responses
Integration: Graceful error handling throughout system
Outcome: No uncaught exceptions, useful error messages
Effort: 3-4 hours
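A sketch of the exception hierarchy using the three class names listed above. The status_code and error_code attributes and the to_error_response helper are assumptions about how the API layer might consume the hierarchy.

```python
class RagBotException(Exception):
    """Base class for all application errors."""
    status_code = 500
    error_code = "ragbot_error"


class BiomarkerValidationError(RagBotException):
    status_code = 422
    error_code = "biomarker_validation_error"


class LLMTimeoutError(RagBotException):
    status_code = 504
    error_code = "llm_timeout"


def to_error_response(exc: Exception) -> tuple[int, dict]:
    """Map an exception to an HTTP status and body without leaking internals."""
    if isinstance(exc, RagBotException):
        return exc.status_code, {"error": exc.error_code, "detail": str(exc)}
    return 500, {"error": "internal_error", "detail": "unexpected error"}
```

Agents can then catch RagBotException specifically instead of using a bare try-except.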
#22 Python Testing Patterns
Phase: 1, Week 1 + Phase 2, Week 3 (primary), Week 4
Apply To: tests/ (throughout project)
Key Files:
└─ tests/conftest.py - Shared fixtures
└─ tests/fixtures/ - auth, biomarkers, patients
└─ tests/test_api_auth.py - Auth tests (Week 1)
└─ tests/test_parametrized_*.py - 50+ parametrized tests (Week 3)
└─ tests/test_response_schema.py - Schema validation (Week 1)
└─ 80-90% code coverage
Integration: Comprehensive test suite ensures reliability
Outcome: 125+ tests, 90%+ coverage
Effort: 10-13 hours total
#23 Code Review Excellence
Phase: 4, Week 10
Apply To: docs/REVIEW_GUIDELINES.md (NEW), all PRs
Key Files:
└─ docs/REVIEW_GUIDELINES.md (NEW) - Medical code review standards
└─ Apply to all Phase 1-3 pull requests
└─ Self-review checklist
Integration: Maintain code quality
Outcome: Clear review guidelines
Effort: 2-3 hours
#24 GitHub Actions Templates
Phase: 1, Week 2
Apply To: .github/workflows/ (NEW)
Key Files:
└─ .github/workflows/test.yml - Run tests on PR
└─ .github/workflows/security.yml - Security checks
└─ .github/workflows/docker.yml - Build Docker images
Integration: Automated CI/CD pipeline
Outcome: Tests run automatically
Effort: 2-3 hours
#25 FastAPI Templates
Phase: 4, Week 9
Apply To: api/app/main.py, api/app/dependencies.py
Key Files:
└─ api/app/main.py - REFACTOR with best practices
└─ Async patterns, dependency injection
└─ Connection pooling, caching headers
└─ Health check endpoints
Integration: Production-grade FastAPI configuration
Outcome: Optimized API performance
Effort: 3-4 hours
#26 Python Design Patterns
Phase: 2, Week 3
Apply To: src/agents/base_agent.py (NEW), src/agents/
Key Files:
└─ src/agents/base_agent.py (NEW) - Extract common pattern
└─ Factory pattern for agent creation
└─ Composition over inheritance
└─ Refactor BiomarkerAnalyzerAgent, etc.
Integration: Cleaner, more maintainable code
Outcome: Reduced coupling, better abstractions
Effort: 4-5 hours
#27 Python Observability
Phase: 1, Week 2 (logging) / Phase 4, Week 10 (metrics) / Phase 2, Week 5
Apply To: src/, api/app/
Key Files:
└─ src/observability.py (NEW) - Logging infrastructure (Week 2)
└─ All agents - Add structured JSON logging
└─ src/monitoring/ (NEW) - Prometheus metrics (Week 10)
└─ Track latency, accuracy, costs
Integration: Visibility into system behavior
Outcome: JSON logs, metrics at /metrics
Effort: 12-15 hours total
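A sketch of structured JSON logging with the standard logging module. JsonFormatter and the "fields" attribute are assumptions; the attribute is populated through the standard extra= argument of the logging calls.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as: logger.info("...", extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that emits JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

An agent could log a state change with get_logger("ragbot.agent").info("state updated", extra={"fields": {"agent": "biomarker_analyzer"}}).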
#28 Memory Management
Phase: 3, Week 7
Apply To: src/memory_manager.py (NEW)
Key Files:
└─ src/memory_manager.py (NEW) - Sliding window memory
└─ Context compression for conversation history
└─ Token usage optimization
Integration: Handle long conversations without exceeding limits
Outcome: 20-30% token savings
Effort: 3-4 hours
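A sketch of sliding window memory, assuming messages are dicts with a "content" key. The default token counter is a whitespace word count used purely as a stand-in; a real implementation would use the model's tokenizer.

```python
from typing import Callable


def sliding_window(
    messages: list[dict],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[dict]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Context compression would replace the dropped prefix with a summary message instead of discarding it.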
#29 API Docs Generator
Phase: 4, Week 9
Apply To: api/app/routes/ (documentation)
Key Files:
└─ api/app/routes/*.py - Enhance docstrings
└─ Add examples to endpoints
└─ Auto-generates /docs (Swagger UI), /redoc
Integration: API discoverable by developers
Outcome: Interactive API documentation
Effort: 2-3 hours
#30 GitHub PR Review Workflow
Phase: 4, Week 9
Apply To: .github/ (NEW)
Key Files:
└─ .github/CODEOWNERS - Code ownership rules
└─ .github/pull_request_template.md - PR checklist
└─ Branch protection rules
Integration: Establish code review standards
Outcome: Consistent PR quality
Effort: 2-3 hours
#31 CI-CD Best Practices
Phase: 4, Week 10
Apply To: .github/workflows/deploy.yml (NEW)
Key Files:
└─ .github/workflows/deploy.yml (NEW) - Deployment pipeline
└─ Build → Test → Staging → Canary → Production
└─ Environment management (.env files)
Integration: Automated, safe deployments
Outcome: Confident production deployments
Effort: 3-4 hours
#32 Frontend Accessibility (OPTIONAL)
Phase: 4, Week 10
Apply To: examples/web_interface/ (if building web UI)
Key Files:
└─ examples/web_interface/ - WCAG 2.1 AA compliance
Integration: Accessible web interface (if needed)
Outcome: Screen-reader friendly, keyboard navigable
Effort: 2-3 hours (skip if CLI only)
#33 Webhook Receiver Hardener (OPTIONAL)
Phase: 4, Week 11
Apply To: api/app/webhooks/ (NEW, if integrations needed)
Key Files:
└─ api/app/webhooks/ (NEW) - Webhook handlers
└─ Signature verification, replay protection
Integration: Secure webhook handling for EHR integrations
Outcome: Protected webhook endpoints
Effort: 2-3 hours (skip if no webhooks)
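A sketch of signature verification with basic replay protection, assuming the sender signs "<timestamp>.<body>" with HMAC-SHA256 and a shared secret. The message layout, function names, and the 300-second tolerance are assumptions; replay protection here is only a timestamp window, not nonce tracking.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature over '<timestamp>.<body>'."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature: str,
    now: Optional[float] = None,
    tolerance_seconds: int = 300,
) -> bool:
    """Reject stale requests, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```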
════════════════════════════════════════════════════════════════════════════════
QUICK LOOKUP: BY FILE
════════════════════════════════════════════════════════════════════════════════
api/app/main.py
├─ #17 API Security Hardening (JWT middleware)
├─ #20 Rate Limiting (rate limiter middleware)
├─ #25 FastAPI Templates (async patterns)
├─ #24 GitHub Actions (workflow) (CI/CD reference)
└─ #29 API Docs Generator (docstrings)
api/app/models/response.py (NEW)
├─ #16 AI Wrapper/Structured Output (unified schema)
└─ #22 Testing Patterns (Pydantic validation)
api/app/middleware/ (NEW)
├─ auth.py #17 API Security Hardening
├─ input_validation.py #19 LLM Security
└─ rate_limiter.py #20 API Rate Limiting
src/state.py
├─ #2 Workflow Orchestration (fix state fields)
├─ #16 Structured Output (enforce schema)
└─ #22 Testing Patterns (state tests)
src/workflow.py
├─ #2 Workflow Orchestration (state passing)
├─ #3 Multi-Agent Orchestration (agent order)
└─ #27 Observability (logging)
src/agents/base_agent.py (NEW)
├─ #26 Python Design Patterns (factory, composition)
├─ #6 LLM App Dev LangChain (lifecycle)
├─ #21 Error Handling (graceful degradation)
└─ #27 Observability (logging)
src/agents/biomarker_analyzer.py
├─ #4 Agentic Development (confidence thresholds)
├─ #13 Senior Prompt Engineer (prompt optimization)
├─ #2 Workflow Orchestration (return complete state)
└─ #12 Knowledge Graph (use relationships)
src/agents/disease_explainer.py
├─ #8 Hybrid Search (retriever)
├─ #11 RAG Implementation (enforcement)
├─ #13 Senior Prompt Engineer (chain-of-thought)
├─ #1 LangChain Architecture (advanced patterns)
└─ #7 RAG Agent Builder (RAG best practices)
src/agents/confidence_assessor.py
├─ #4 Agentic Development (decision logic)
├─ #13 Senior Prompt Engineer (better reasoning)
├─ #14 LLM Evaluation (benchmark)
└─ #22 Testing Patterns (confidence tests)
src/agents/clinical_guidelines.py
├─ #13 Senior Prompt Engineer (evidence-based)
└─ #1 LangChain Architecture (advanced retrieval)
src/exceptions.py (NEW)
├─ #21 Python Error Handling (exception hierarchy)
└─ #27 Observability (error logging)
src/retrievers/hybrid_retriever.py (NEW)
├─ #8 Hybrid Search Implementation (BM25 + FAISS)
├─ #9 Chunking Strategy (better chunks)
├─ #10 Embedding Pipeline (semantic search)
└─ #27 Observability (retrieval metrics)
src/chunking_strategy.py (NEW)
├─ #9 Chunking Strategy (medical section splitting)
├─ #10 Embedding Pipeline (prepare for embedding)
└─ #4 Agentic Development (standardization)
src/knowledge_graph.py (NEW)
├─ #12 Knowledge Graph Builder (extract relationships)
├─ #13 Senior Prompt Engineer (entity extraction prompt)
└─ #1 LangChain Architecture (graph traversal)
src/memory_manager.py (NEW)
├─ #28 Memory Management (sliding window, compression)
└─ #15 Cost-Aware Pipeline (token optimization)
src/llm_config.py
├─ #15 Cost-Aware LLM Pipeline (model routing, caching)
├─ #10 Embedding Pipeline (embedding model config)
└─ #27 Observability (cost tracking)
src/observability.py (NEW)
├─ #27 Python Observability (logging, metrics)
├─ #21 Error Handling (error tracking)
└─ #14 LLM Evaluation (metric collection)
src/monitoring/ (NEW)
└─ #27 Python Observability (metrics, dashboards)
tests/conftest.py
└─ #22 Python Testing Patterns (shared fixtures)
tests/fixtures/
├─ auth.py #22 Testing Patterns
├─ biomarkers.py #22 Testing Patterns
└─ evaluation_patients.py #14 LLM Evaluation
tests/test_api_auth.py (NEW)
├─ #22 Python Testing Patterns
├─ #17 API Security Hardening
└─ #25 FastAPI Templates
tests/test_parametrized_*.py (NEW)
└─ #22 Python Testing Patterns
tests/evaluation_metrics.py (NEW)
└─ #14 LLM Evaluation
.github/workflows/
├─ test.yml #24 GitHub Actions Templates
├─ security.yml #18 OWASP Check + #24 Actions
├─ docker.yml #24 Actions
└─ deploy.yml #31 CI-CD Best Practices
.github/
├─ CODEOWNERS #30 GitHub PR Review Workflow
├─ pull_request_template.md #30 Workflow
└─ branch protection rules
docs/
├─ SECURITY_AUDIT.md #18 OWASP Check
├─ REVIEW_GUIDELINES.md #23 Code Review Excellence
└─ API.md (updated by #29 API Docs Generator)
════════════════════════════════════════════════════════════════════════════════
SKILL DEPENDENCY GRAPH
════════════════════════════════════════════════════════════════════════════════
Phase 1 must finish before Phase 2:
#18, #17, #22, #2, #16, #20, #3, #19, #21, #27, #24
↓
Phase 2 requires Phase 1:
#22, #26, #4, #13, #14, #5
↓
Phase 3 requires Phases 1-2:
#8, #9, #10, #11, #12, #1, #28, #15
↓
Phase 4 requires Phases 1-3:
#25, #29, #30, #27, #23, #31, #32*, #6, #33*, #7
Within phases, some order dependencies:
- #16 should complete before other Phase 1 work finalizes
- #13 should complete before #14 evaluation
- #8, #9, #10 should coordinate (hybrid search → chunking → embeddings)
- #11 depends on #8 (retriever first)
- #12 depends on #13 (prompt engineering for entity extraction)
- #27 used 3 times (Week 2, Week 5, Week 10)
- #22 used 2 times (Week 1, Weeks 3-4)
════════════════════════════════════════════════════════════════════════════════
DAILY WORKFLOW
════════════════════════════════════════════════════════════════════════════════
1. Open the skill SKILL.md documented in ~/.agents/skills/<skill-name>/
2. Read the relevant section for your task
3. Apply to specific code files listed above
4. Write tests immediately (use #22 Testing Patterns)
5. Commit with clear message: "feat: [Skill #X] [Description]"
6. Track in IMPLEMENTATION_STATUS_TRACKER.md
════════════════════════════════════════════════════════════════════════════════