Nikhil Pravin Pise

╔════════════════════════════════════════════════════════════════════════════╗
║  📚 SKILL-TO-CODE MAPPING: Where Each Skill Applies in RagBot              ║
║     Reference guide showing skill application locations                    ║
╚════════════════════════════════════════════════════════════════════════════╝

This document maps each of the 34 skills to specific code files and critical issues they resolve. Use this for quick lookup: "Where do I apply Skill #X?"

════════════════════════════════════════════════════════════════════════════════

CRITICAL ISSUES MAPPING TO SKILLS
════════════════════════════════════════════════════════════════════════════════

ISSUE #1: biomarker_flags & safety_alerts not propagating through workflow
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/state.py, src/agents/*.py, src/workflow.py
Affected Code:
  ├─ GuildState (missing fields)
  ├─ BiomarkerAnalyzerAgent.invoke() (only returns biomarkers)
  ├─ ResponseSynthesizerAgent.invoke() (fields missing in input)
  └─ Workflow edges (state not fully passed)

Primary Skills:
  ✓ #2 Workflow Orchestration Patterns → Fix state passing
  ✓ #3 Multi-Agent Orchestration → Ensure deterministic flow
  ✓ #16 Structured Output → Enforce complete schema

Secondary Skills:
  • #22 Testing Patterns → Write tests for state flow
  • #27 Observability → Log state changes

Action: Read src/state.py → identify missing fields → update all agents to return complete state → test end-to-end
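
The state-propagation fix in Issue #1 can be sketched as follows. This is a minimal illustration, not the project's code: it assumes GuildState is a dict-like state, the `biomarkers` and `summary` keys are hypothetical, and the two functions are stand-ins for the real agents' invoke() methods. The point is that each agent merges its update into the incoming state instead of returning only its own keys.

```python
from typing import Any, Dict, List, TypedDict


class GuildState(TypedDict, total=False):
    # biomarker_flags and safety_alerts are the fields named in Issue #1;
    # the other keys are hypothetical placeholders.
    biomarkers: Dict[str, float]
    biomarker_flags: List[str]
    safety_alerts: List[str]
    summary: str


def merge_state(state: GuildState, update: Dict[str, Any]) -> GuildState:
    """Return a new state with the agent's update applied on top of the old one."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


def biomarker_analyzer(state: GuildState) -> GuildState:
    # Hypothetical stand-in for BiomarkerAnalyzerAgent.invoke():
    # flags any value above an arbitrary threshold of 100.
    flags = [name for name, value in state.get("biomarkers", {}).items() if value > 100]
    alerts = [f"{name} is elevated" for name in flags]
    return merge_state(state, {"biomarker_flags": flags, "safety_alerts": alerts})


def response_synthesizer(state: GuildState) -> GuildState:
    # Hypothetical stand-in for ResponseSynthesizerAgent.invoke():
    # adds a summary and leaves every upstream field intact.
    summary = f"{len(state.get('safety_alerts', []))} safety alert(s)"
    return merge_state(state, {"summary": summary})


final_state = response_synthesizer(
    biomarker_analyzer({"biomarkers": {"glucose": 180.0, "hdl": 55.0}})
)
```

Because every agent goes through merge_state(), fields set early in the workflow are still present in final_state.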

ISSUE #2: Schema mismatch between workflow output & API formatter
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/workflow.py, api/app/models/ (missing or inconsistent)
Affected Code:
  ├─ ResponseSynthesizerAgent output structure (varies)
  ├─ api/app/services/ragbot.py format_response() (expects different keys)
  ├─ CLI scripts/chat.py (different field names)
  └─ Tests referencing old schema

Primary Skills:
  ✓ #16 AI Wrapper/Structured Output → Create unified Pydantic model
  ✓ #22 Testing Patterns → Write schema validation tests

Secondary Skills:
  • #27 Observability → Log schema mismatches (debugging)

Action: Create api/app/models/response.py with BaseAnalysisResponse → update all agents to return it → validate in API
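
A dependency-free sketch of the unified response idea in Issue #2. The guide calls for a Pydantic model; a standard-library dataclass is swapped in here only to keep the example self-contained. Every field name except the class name BaseAnalysisResponse is an assumption, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class BaseAnalysisResponse:
    # Field names are illustrative assumptions, not the project's schema.
    prediction: str
    confidence: float
    biomarker_flags: List[str] = field(default_factory=list)
    safety_alerts: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")

    @classmethod
    def from_workflow(cls, raw: Dict[str, Any]) -> "BaseAnalysisResponse":
        """Build the response from raw workflow output, failing loudly on missing keys."""
        missing = [key for key in ("prediction", "confidence") if key not in raw]
        if missing:
            raise ValueError(f"workflow output is missing keys: {missing}")
        return cls(
            prediction=str(raw["prediction"]),
            confidence=float(raw["confidence"]),
            biomarker_flags=list(raw.get("biomarker_flags", [])),
            safety_alerts=list(raw.get("safety_alerts", [])),
        )


response = BaseAnalysisResponse.from_workflow(
    {"prediction": "example_condition", "confidence": 0.82,
     "safety_alerts": ["glucose is elevated"]}
)
payload = asdict(response)  # one dict shape for both the API formatter and the CLI
```

With a single constructor in front of every consumer, a schema mismatch surfaces as a ValueError at the boundary rather than as a missing key downstream.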

ISSUE #3: Prediction confidence forced to 0.5 (dangerous for medical)
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/agents/confidence_assessor.py, api/app/routes/analyze.py
Affected Code:
  ├─ ConfidenceAssessorAgent.invoke() (ignores actual assessment)
  ├─ Default response in analyze endpoint (hardcoded 0.5)
  └─ CLI logic (no failure path for low confidence)

Primary Skills:
  ✓ #13 Senior Prompt Engineer → Better reasoning in assessor
  ✓ #14 LLM Evaluation → Benchmark accuracy

Secondary Skills:
  • #4 Agentic Development → Decision logic improvements
  • #22 Testing Patterns → Test confidence boundaries
  • #27 Observability → Track confidence distributions

Action: Update confidence_assessor.py to use actual evidence → test with multiple biomarker scenarios → add high/medium/low confidence paths
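
One way to picture the high/medium/low paths from Issue #3, as a sketch under stated assumptions: evidence arrives as a list of scores in [0, 1], the confidence is their plain average, and the thresholds 0.8 and 0.5 are placeholders that would need calibration. Both function names are hypothetical. The key behaviour is that missing evidence yields an explicit "insufficient_evidence" path rather than a silent default of 0.5.

```python
from typing import List, Optional


def assess_confidence(evidence_scores: List[float]) -> Optional[float]:
    """Average the evidence scores; return None (not a default) when there is no evidence."""
    if not evidence_scores:
        return None
    return sum(evidence_scores) / len(evidence_scores)


def confidence_path(confidence: Optional[float], high: float = 0.8, low: float = 0.5) -> str:
    # The thresholds are hypothetical and would need clinical calibration.
    if confidence is None:
        return "insufficient_evidence"
    if confidence >= high:
        return "high"
    if confidence >= low:
        return "medium"
    return "low"
```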

ISSUE #4: Biomarker naming inconsistency (API vs CLI)
──────────────────────────────────────────────────────────────────────────────
Problem Location: config/biomarker_references.json, src/agents/, api/
Affected Code:
  ├─ config/biomarker_references.json (canonical list)
  ├─ BiomarkerAnalyzerAgent (validation against reference)
  ├─ CLI scripts/chat.py (different naming)
  └─ API endpoints (naming transformation)

Primary Skills:
  ✓ #9 Chunking Strategy → Include standard names in embedding
  ✓ #16 Structured Output → Enforce standard field names

Secondary Skills:
  • #10 Embedding Pipeline → Index with canonical names
  • #22 Testing Patterns → Test name transformation
  • #27 Observability → Log name mismatches

Action: Create biomarker_normalizer() → apply in all code paths → add mapping tests
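
A minimal sketch of what biomarker_normalizer() could look like. The alias table is invented for illustration; in the project the canonical names would presumably be loaded from config/biomarker_references.json, whose actual structure is not shown in this guide.

```python
import re

# Hypothetical alias table; real canonical names would come from
# config/biomarker_references.json.
ALIASES = {
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "glucose": "Glucose",
    "fasting glucose": "Glucose",
    "ldl": "LDL Cholesterol",
    "ldl cholesterol": "LDL Cholesterol",
}


def biomarker_normalizer(name: str) -> str:
    """Map a user-supplied biomarker name to its canonical form."""
    # Collapse whitespace, underscores and hyphens, then compare case-insensitively.
    key = re.sub(r"[\s_\-]+", " ", name).strip().lower()
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown biomarker name: {name!r}") from None
```

Raising on unknown names (instead of passing them through) makes naming drift between the API and CLI visible in tests and logs.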

ISSUE #5: JSON parsing breaks on malformed LLM output
──────────────────────────────────────────────────────────────────────────────
Problem Location: api/app/services/extraction.py, src/agents/extraction code
Affected Code:
  ├─ LLM.predict() returns text
  ├─ json.loads() has no error handling
  ├─ Invalid JSON crashes endpoint
  └─ No fallback strategy

Primary Skills:
  ✓ #5 Tool/Function Calling → Use function calling instead
  ✓ #21 Python Error Handling → Graceful degradation

Secondary Skills:
  • #16 Structured Output → Pydantic validation
  • #19 LLM Security → Prevent injection in JSON
  • #27 Observability → Log parsing failures
  • #14 LLM Evaluation → Track failure rate

Action: Replace json.loads() with Pydantic validator → implement retry logic → add function calling as fallback
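
The retry-and-fallback shape for Issue #5 might look like the sketch below. It is deliberately simplified: json.loads() is wrapped in error handling rather than replaced by a Pydantic validator, `call_llm` is a hypothetical stand-in for the real model call, and the fallback is an explicit error payload rather than a function-calling request.

```python
import json
from typing import Callable, Dict, Optional


def parse_llm_json(text: str) -> Optional[Dict[str, object]]:
    """Return the parsed object, or None if the text is not a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def extract_with_retry(
    call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Dict[str, object]:
    # call_llm is a hypothetical stand-in for the real model call.
    for _ in range(max_attempts):
        result = parse_llm_json(call_llm(prompt))
        if result is not None:
            return result
    # Fallback: a well-formed error payload instead of a crashed endpoint.
    return {"error": "unparseable_llm_output", "attempts": max_attempts}
```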

ISSUE #6: No citation enforcement in RAG outputs
──────────────────────────────────────────────────────────────────────────────
Problem Location: src/agents/disease_explainer.py, response synthesis
Affected Code:
  ├─ retriever.retrieve() returns docs but citations dropped
  ├─ DiseaseExplainerAgent doesn't track sources
  ├─ ResponseSynthesizerAgent loses citation info
  └─ API response has no source attribution

Primary Skills:
  ✓ #11 RAG Implementation → Enforce citations in loop
  ✓ #8 Hybrid Search → Better relevance = better cites
  ✓ #12 Knowledge Graph → Link to authoritative sources

Secondary Skills:
  • #1 LangChain Architecture → Tool for citation tracking
  • #7 RAG Agent Builder → Full RAG best practices
  • #14 LLM Evaluation → Test for hallucinations
  • #27 Observability → Track citation frequency

Action: Modify disease_explainer.py to preserve doc metadata → add citation validation → return sources in API response
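
A sketch of citation preservation for Issue #6, assuming retrieved chunks carry a source name and page number. The RetrievedDoc shape and build_answer() are hypothetical; the real retriever's metadata format is not described in this guide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetrievedDoc:
    # Hypothetical shape for a retrieved chunk and its metadata.
    text: str
    source: str
    page: int


def build_answer(claim: str, docs: List[RetrievedDoc]) -> Dict[str, object]:
    """Attach sources to a claim and refuse to return an uncited answer."""
    if not docs:
        raise ValueError("no supporting documents: refusing to return an uncited claim")
    # Deduplicate citations while preserving retrieval order.
    sources = list(dict.fromkeys(f"{doc.source}, p. {doc.page}" for doc in docs))
    return {"answer": claim, "sources": sources}
```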

════════════════════════════════════════════════════════════════════════════════

SKILL-BY-SKILL APPLICATION GUIDE
════════════════════════════════════════════════════════════════════════════════

#1 LangChain Architecture
Phase: 3, Week 7
Apply To: src/agents/, src/services/
Key Files:
  └─ src/agents/base_agent.py (NEW) - Create BaseAgent with LangChain patterns
  └─ src/agents//invoke() - Add callbacks, chains, tools
  └─ src/services/.py - RunnableWithMessageHistory for conversation
Integration: Advanced chain composition, callbacks for metrics
Outcome: More sophisticated agent orchestration
Effort: 3-4 hours

#2 Workflow Orchestration Patterns
Phase: 1, Week 1 / Phase 4, Week 12 (final review)
Apply To: src/workflow.py, src/state.py
Key Files:
  └─ src/state.py - REFACTOR GuildState with all fields
  └─ src/workflow.py - REFACTOR state passing between agents
  └─ src/agents/biomarker_analyzer.py - Return complete state
  └─ src/agents/disease_explainer.py - Preserve incoming state
Integration: Fix Issue #1 (state propagation)
Outcome: biomarker_flags & safety_alerts flow through entire workflow
Effort: 4-6 hours (Week 1) + 2 hours (Week 12 refine)

#3 Multi-Agent Orchestration
Phase: 1, Week 2
Apply To: src/workflow.py
Key Files:
  └─ src/workflow.py - Ensure deterministic agent order
  └─ Parallel execution order documentation
Integration: Ensure agents execute in correct order with proper state passing
Outcome: Deterministic workflow execution
Effort: 3-4 hours

#4 Agentic Development
Phase: 2, Week 3
Apply To: src/agents/biomarker_analyzer.py, confidence_assessor.py
Key Files:
  └─ BiomarkerAnalyzerAgent.invoke() - Add confidence thresholds
  └─ ConfidenceAssessorAgent - Better decision logic
  └─ Add reasoning trace to responses
Integration: Better medical decisions, alternatives for low confidence
Outcome: More reliable biomarker analysis
Effort: 3-4 hours

#5 Tool/Function Calling Patterns
Phase: 2, Week 4
Apply To: api/app/services/extraction.py, src/agents/extraction.py
Key Files:
  └─ api/app/services/extraction.py - Define extraction tools/functions
  └─ src/agents/ - Use function returns instead of JSON parsing
Integration: Fix Issue #5 (JSON parsing fragility)
Outcome: Structured LLM outputs guaranteed valid
Effort: 3-4 hours

#6 LLM Application Dev with LangChain
Phase: 4, Week 11
Apply To: src/agents/ (production patterns)
Key Files:
  └─ src/agents/base_agent.py - Implement lifecycle (setup, execute, cleanup)
  └─ All agents - Add retry logic, graceful degradation
  └─ Agent composition patterns - Chain agents
Integration: Production-ready agent code
Outcome: Robust, maintainable agents with error recovery
Effort: 4-5 hours

#7 RAG Agent Builder
Phase: 4, Week 12
Apply To: src/agents/ (full review)
Key Files:
  └─ src/agents/disease_explainer.py - RAG pattern review
  └─ Ensure all responses cite sources
  └─ Verify accuracy benchmarks
Integration: Full RAG agent validation before production
Outcome: Production-ready RAG agents
Effort: 4-5 hours

#8 Hybrid Search Implementation
Phase: 3, Week 6
Apply To: src/retrievers/ (NEW)
Key Files:
  └─ src/retrievers/hybrid_retriever.py (NEW) - Combine BM25 + FAISS
  └─ src/agents/disease_explainer.py - Use hybrid retriever
Integration: Better document retrieval (semantic + keyword)
Outcome: +15% recall on rare disease queries
Effort: 4-6 hours
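
The guide does not say how the BM25 and FAISS results should be combined. One common option is reciprocal rank fusion, sketched below on plain lists of document ids so it runs without either library. The function name, the constant k=60, and the sample ids are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. ids ranked by a BM25 index
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. ids ranked by a FAISS index
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Rank-based fusion sidesteps the problem that BM25 scores and vector distances are on different scales.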

#9 Chunking Strategy
Phase: 3, Week 6
Apply To: src/chunking_strategy.py (NEW), src/pdf_processor.py
Key Files:
  └─ src/chunking_strategy.py (NEW) - Split by medical sections
  └─ scripts/setup_embeddings.py - Use new chunking
  └─ Re-chunk and re-embed medical_knowledge.faiss
Integration: Fix Issue #4 (naming), improve context window usage
Outcome: Better semantic chunks, improved retrieval quality
Effort: 4-5 hours

#10 Embedding Pipeline Builder
Phase: 3, Week 6
Apply To: src/llm_config.py, scripts/setup_embeddings.py
Key Files:
  └─ src/llm_config.py - Consider medical embedding models
  └─ scripts/setup_embeddings.py - Use new embeddings
  └─ Benchmark embedding quality
Integration: Better semantic search for medical terminology
Outcome: Improved document relevance ranking
Effort: 3-4 hours

#11 RAG Implementation
Phase: 3, Week 6
Apply To: src/agents/disease_explainer.py
Key Files:
  └─ src/agents/disease_explainer.py - Track and enforce citations
  └─ src/models/response.py - Add sources field
  └─ api/app/routes/analyze.py - Return sources
Integration: Fix Issue #6 (no citations), enforce medical accuracy
Outcome: All claims backed by sources
Effort: 3-4 hours

#12 Knowledge Graph Builder
Phase: 3, Week 7
Apply To: src/knowledge_graph.py (NEW)
Key Files:
  └─ src/knowledge_graph.py (NEW) - Disease → Biomarker → Treatment graph
  └─ Extract entities from medical PDFs
  └─ src/agents/biomarker_analyzer.py - Use knowledge graph
  └─ Create graph.html visualization
Integration: Better disease prediction via relationships
Outcome: Knowledge graph with 100+ nodes, 500+ edges
Effort: 6-8 hours

#13 Senior Prompt Engineer
Phase: 2, Week 3
Apply To: src/agents/ (all agent prompts)
Key Files:
  └─ src/agents/biomarker_analyzer.py - Prompt: few-shot extraction
  └─ src/agents/disease_explainer.py - Prompt: chain-of-thought reasoning
  └─ src/agents/confidence_assessor.py - Prompt: decision logic
  └─ src/agents/clinical_guidelines.py - Prompt: evidence-based
Integration: Fix Issue #3 (confidence), improve medical reasoning
Outcome: +15% accuracy improvement
Effort: 5-6 hours

#14 LLM Evaluation
Phase: 2, Week 4
Apply To: tests/evaluation_metrics.py (NEW)
Key Files:
  └─ tests/evaluation_metrics.py (NEW) - Benchmarking suite
  └─ tests/fixtures/evaluation_patients.py - Test scenarios
  └─ Benchmark Groq vs Gemini performance
  └─ Track before/after improvements
Integration: Measure all improvements quantitatively
Outcome: Clear metrics showing progress
Effort: 4-5 hours

#15 Cost-Aware LLM Pipeline
Phase: 3, Week 8
Apply To: src/llm_config.py
Key Files:
  └─ src/llm_config.py - Model routing by complexity
  └─ Implement caching (hash → result)
  └─ Cost tracking and reporting
  └─ Target: -40% cost reduction
Integration: Optimize API costs without sacrificing accuracy
Outcome: Lower operational costs
Effort: 4-5 hours
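
The "hash → result" caching item in #15 could be sketched like this. ResponseCache and its method names are hypothetical, the store is a plain in-memory dict, and `call_llm` stands in for the real model call; a production version would likely add expiry and persistent storage.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """In-memory cache keyed by a hash of the model name and prompt."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Serialize deterministically so equal inputs always hash the same.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call_llm: Callable[[str, str], str]) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_llm(model, prompt)  # only paid for on a cache miss
        self._store[key] = result
        return result
```

The hits and misses counters give a starting point for the cost tracking item in the same skill.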

#16 AI Wrapper/Structured Output
Phase: 1, Week 1
Apply To: api/app/models/ (NEW and REFACTORED)
Key Files:
  └─ api/app/models/response.py (NEW) - Create unified BaseAnalysisResponse
  └─ api/app/services/ragbot.py - Use unified schema
  └─ All agents - Match unified output
  └─ API responses - Validate with Pydantic
Integration: Fix Issues #1, #2, #4, #5 (schema consistency)
Outcome: Single canonical response format
Effort: 3-5 hours

#17 API Security Hardening
Phase: 1, Week 1
Apply To: api/app/middleware/, api/main.py
Key Files:
  └─ api/app/middleware/auth.py (NEW) - JWT auth
  └─ api/main.py - Add security middleware chain
  └─ CORS, headers, rate limiting
Integration: Secure REST API endpoints
Outcome: API hardened against common attacks
Effort: 4-6 hours

#18 OWASP Security Check
Phase: 1, Week 1
Apply To: docs/ (audit report)
Key Files:
  └─ docs/SECURITY_AUDIT.md (NEW) - Security findings
  └─ Scan api/ and src/ for vulnerabilities
  └─ Create tickets for each issue
Integration: Establish security baseline
Outcome: All vulnerabilities documented and prioritized
Effort: 2-3 hours

#19 LLM Security
Phase: 1, Week 2
Apply To: api/app/middleware/input_validation.py (NEW)
Key Files:
  └─ api/app/middleware/input_validation.py (NEW) - Input sanitization
  └─ Detect prompt injection attempts
  └─ Validate biomarker inputs
  └─ Escape special characters
Integration: Fix Issue #5 (JSON safety), prevent prompt injection
Outcome: Inputs validated before LLM processing
Effort: 3-4 hours

#20 API Rate Limiting
Phase: 1, Week 1
Apply To: api/app/middleware/rate_limiter.py (NEW)
Key Files:
  └─ api/app/middleware/rate_limiter.py (NEW) - Token bucket limiter
  └─ api/main.py - Add to middleware chain
  └─ Tiered limits (free/pro based on API key)
Integration: Protect API from abuse
Outcome: Rate limiting in place
Effort: 2-3 hours
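
A minimal token bucket, as named in #20, shown without any web framework. The class name and parameters are assumptions; the injectable clock exists only so the behaviour can be tested deterministically. Tiered limits could be modelled by giving each API key its own bucket with a different capacity.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second` tokens per second."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```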

#21 Python Error Handling
Phase: 1, Week 2
Apply To: src/exceptions.py (NEW), src/agents/
Key Files:
  └─ src/exceptions.py (NEW) - Custom exception hierarchy
  └─ RagBotException, BiomarkerValidationError, LLMTimeoutError, etc.
  └─ All agents - Replace generic try-except
  └─ API - Proper error responses
Integration: Graceful error handling throughout system
Outcome: No uncaught exceptions, useful error messages
Effort: 3-4 hours
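
A sketch of the exception hierarchy listed in #21. The three class names come from this guide; the status codes, the to_response() method, and the handle() helper are assumptions added to show how a hierarchy turns into consistent API error responses.

```python
class RagBotException(Exception):
    """Base class for all application errors."""

    status_code = 500  # hypothetical HTTP mapping

    def to_response(self) -> dict:
        return {"error": type(self).__name__, "detail": str(self), "status": self.status_code}


class BiomarkerValidationError(RagBotException):
    status_code = 422


class LLMTimeoutError(RagBotException):
    status_code = 504


def handle(func):
    """Run func and convert known errors to a response dict; unknown errors propagate."""
    try:
        return {"status": 200, "result": func()}
    except RagBotException as exc:
        return exc.to_response()
```

Catching only RagBotException keeps genuine bugs (for example a TypeError) loud instead of hiding them behind a generic handler.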

#22 Python Testing Patterns
Phase: 1, Week 1 + Phase 2, Week 3 (primary), Week 4
Apply To: tests/ (throughout project)
Key Files:
  └─ tests/conftest.py - Shared fixtures
  └─ tests/fixtures/ - auth, biomarkers, patients
  └─ tests/test_api_auth.py - Auth tests (Week 1)
  └─ tests/test_parametrized_*.py - 50+ parametrized tests (Week 3)
  └─ tests/test_response_schema.py - Schema validation (Week 1)
  └─ 80-90% code coverage
Integration: Comprehensive test suite ensures reliability
Outcome: 125+ tests, 90%+ coverage
Effort: 10-13 hours total

#23 Code Review Excellence
Phase: 4, Week 10
Apply To: docs/REVIEW_GUIDELINES.md (NEW), all PRs
Key Files:
  └─ docs/REVIEW_GUIDELINES.md (NEW) - Medical code review standards
  └─ Apply to all Phase 1-3 pull requests
  └─ Self-review checklist
Integration: Maintain code quality
Outcome: Clear review guidelines
Effort: 2-3 hours

#24 GitHub Actions Templates
Phase: 1, Week 2
Apply To: .github/workflows/ (NEW)
Key Files:
  └─ .github/workflows/test.yml - Run tests on PR
  └─ .github/workflows/security.yml - Security checks
  └─ .github/workflows/docker.yml - Build Docker images
Integration: Automated CI/CD pipeline
Outcome: Tests run automatically
Effort: 2-3 hours

#25 FastAPI Templates
Phase: 4, Week 9
Apply To: api/app/main.py, api/app/dependencies.py
Key Files:
  └─ api/app/main.py - REFACTOR with best practices
  └─ Async patterns, dependency injection
  └─ Connection pooling, caching headers
  └─ Health check endpoints
Integration: Production-grade FastAPI configuration
Outcome: Optimized API performance
Effort: 3-4 hours

#26 Python Design Patterns
Phase: 2, Week 3
Apply To: src/agents/base_agent.py (NEW), src/agents/
Key Files:
  └─ src/agents/base_agent.py (NEW) - Extract common pattern
  └─ Factory pattern for agent creation
  └─ Composition over inheritance
  └─ Refactor BiomarkerAnalyzerAgent, etc.
Integration: Cleaner, more maintainable code
Outcome: Reduced coupling, better abstractions
Effort: 4-5 hours

#27 Python Observability
Phase: 1, Week 2 (logging) / Phase 4, Week 10 (metrics) / Phase 2, Week 5
Apply To: src/, api/app/
Key Files:
  └─ src/observability.py (NEW) - Logging infrastructure (Week 2)
  └─ All agents - Add structured JSON logging
  └─ src/monitoring/ (NEW) - Prometheus metrics (Week 10)
  └─ Track latency, accuracy, costs
Integration: Visibility into system behavior
Outcome: JSON logs, metrics at /metrics
Effort: 12-15 hours total

#28 Memory Management
Phase: 3, Week 7
Apply To: src/memory_manager.py (NEW)
Key Files:
  └─ src/memory_manager.py (NEW) - Sliding window memory
  └─ Context compression for conversation history
  └─ Token usage optimization
Integration: Handle long conversations without exceeding limits
Outcome: 20-30% token savings
Effort: 3-4 hours
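
Sliding window memory, as listed in #28, can be illustrated with a few lines. This sketch assumes messages are dicts with a "content" key and uses a crude whitespace word count in place of a real tokenizer; both function names are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    # Crude whitespace-based estimate; a real tokenizer would replace this.
    return len(text.split())


def sliding_window(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the most recent messages whose combined estimated size fits the budget."""
    kept: List[Message] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Context compression (summarising the dropped messages) would sit on top of this: the messages that fall outside the window are the ones to summarise.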

#29 API Docs Generator
Phase: 4, Week 9
Apply To: api/app/routes/ (documentation)
Key Files:
  └─ api/app/routes/*.py - Enhance docstrings
  └─ Add examples to endpoints
  └─ Auto-generates /docs (Swagger UI), /redoc
Integration: API discoverable by developers
Outcome: Interactive API documentation
Effort: 2-3 hours

#30 GitHub PR Review Workflow
Phase: 4, Week 9
Apply To: .github/ (NEW)
Key Files:
  └─ .github/CODEOWNERS - Code ownership rules
  └─ .github/pull_request_template.md - PR checklist
  └─ Branch protection rules
Integration: Establish code review standards
Outcome: Consistent PR quality
Effort: 2-3 hours

#31 CI-CD Best Practices
Phase: 4, Week 10
Apply To: .github/workflows/deploy.yml (NEW)
Key Files:
  └─ .github/workflows/deploy.yml (NEW) - Deployment pipeline
  └─ Build → Test → Staging → Canary → Production
  └─ Environment management (.env files)
Integration: Automated, safe deployments
Outcome: Confident production deployments
Effort: 3-4 hours

#32 Frontend Accessibility (OPTIONAL)
Phase: 4, Week 10
Apply To: examples/web_interface/ (if building web UI)
Key Files:
  └─ examples/web_interface/ - WCAG 2.1 AA compliance
Integration: Accessible web interface (if needed)
Outcome: Screen-reader friendly, keyboard navigable
Effort: 2-3 hours (skip if CLI only)

#33 Webhook Receiver Hardener (OPTIONAL)
Phase: 4, Week 11
Apply To: api/app/webhooks/ (NEW, if integrations needed)
Key Files:
  └─ api/app/webhooks/ (NEW) - Webhook handlers
  └─ Signature verification, replay protection
Integration: Secure webhook handling for EHR integrations
Outcome: Protected webhook endpoints
Effort: 2-3 hours (skip if no webhooks)
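
Signature verification with replay protection, as listed in #33, might be sketched as below. The signing scheme (HMAC-SHA256 over "timestamp.body"), the 300-second tolerance, and both function names are assumptions; a real integration would follow whatever scheme the sending system documents.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of 'timestamp.body' (an assumed signing scheme)."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature: str,
    tolerance_seconds: int = 300,
    now: Optional[float] = None,
) -> bool:
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False  # outside the accepted window: possible replay
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Binding the timestamp into the signed message is what makes the freshness check meaningful: an attacker cannot replay an old body with a new timestamp.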

════════════════════════════════════════════════════════════════════════════════

QUICK LOOKUP: BY FILE

api/app/main.py
  ├─ #17 API Security Hardening (JWT middleware)
  ├─ #20 Rate Limiting (rate limiter middleware)
  ├─ #25 FastAPI Templates (async patterns)
  ├─ #24 GitHub Actions (workflow) (CI/CD reference)
  └─ #29 API Docs Generator (docstrings)

api/app/models/response.py (NEW)
  ├─ #16 AI Wrapper/Structured Output (unified schema)
  └─ #22 Testing Patterns (Pydantic validation)

api/app/middleware/ (NEW)
  ├─ auth.py  #17 API Security Hardening
  ├─ input_validation.py  #19 LLM Security
  └─ rate_limiter.py  #20 API Rate Limiting

src/state.py
  ├─ #2 Workflow Orchestration (fix state fields)
  ├─ #16 Structured Output (enforce schema)
  └─ #22 Testing Patterns (state tests)

src/workflow.py
  ├─ #2 Workflow Orchestration (state passing)
  ├─ #3 Multi-Agent Orchestration (agent order)
  └─ #27 Observability (logging)

src/agents/base_agent.py (NEW)
  ├─ #26 Python Design Patterns (factory, composition)
  ├─ #6 LLM App Dev LangChain (lifecycle)
  ├─ #21 Error Handling (graceful degradation)
  └─ #27 Observability (logging)

src/agents/biomarker_analyzer.py
  ├─ #4 Agentic Development (confidence thresholds)
  ├─ #13 Senior Prompt Engineer (prompt optimization)
  ├─ #2 Workflow Orchestration (return complete state)
  └─ #12 Knowledge Graph (use relationships)

src/agents/disease_explainer.py
  ├─ #8 Hybrid Search (retriever)
  ├─ #11 RAG Implementation (enforcement)
  ├─ #13 Senior Prompt Engineer (chain-of-thought)
  ├─ #1 LangChain Architecture (advanced patterns)
  └─ #7 RAG Agent Builder (RAG best practices)

src/agents/confidence_assessor.py
  ├─ #4 Agentic Development (decision logic)
  ├─ #13 Senior Prompt Engineer (better reasoning)
  ├─ #14 LLM Evaluation (benchmark)
  └─ #22 Testing Patterns (confidence tests)

src/agents/clinical_guidelines.py
  ├─ #13 Senior Prompt Engineer (evidence-based)
  └─ #1 LangChain Architecture (advanced retrieval)

src/exceptions.py (NEW)
  ├─ #21 Python Error Handling (exception hierarchy)
  └─ #27 Observability (error logging)

src/retrievers/hybrid_retriever.py (NEW)
  ├─ #8 Hybrid Search Implementation (BM25 + FAISS)
  ├─ #9 Chunking Strategy (better chunks)
  ├─ #10 Embedding Pipeline (semantic search)
  └─ #27 Observability (retrieval metrics)

src/chunking_strategy.py (NEW)
  ├─ #9 Chunking Strategy (medical section splitting)
  ├─ #10 Embedding Pipeline (prepare for embedding)
  └─ #4 Agentic Development (standardization)

src/knowledge_graph.py (NEW)
  ├─ #12 Knowledge Graph Builder (extract relationships)
  ├─ #13 Senior Prompt Engineer (entity extraction prompt)
  └─ #1 LangChain Architecture (graph traversal)

src/memory_manager.py (NEW)
  ├─ #28 Memory Management (sliding window, compression)
  └─ #15 Cost-Aware Pipeline (token optimization)

src/llm_config.py
  ├─ #15 Cost-Aware LLM Pipeline (model routing, caching)
  ├─ #10 Embedding Pipeline (embedding model config)
  └─ #27 Observability (cost tracking)

src/observability.py (NEW)
  ├─ #27 Python Observability (logging, metrics)
  ├─ #21 Error Handling (error tracking)
  └─ #14 LLM Evaluation (metric collection)

src/monitoring/ (NEW)
  └─ #27 Python Observability (metrics, dashboards)

tests/conftest.py
  └─ #22 Python Testing Patterns (shared fixtures)

tests/fixtures/
  ├─ auth.py  #22 Testing Patterns
  ├─ biomarkers.py  #22 Testing Patterns
  └─ evaluation_patients.py  #14 LLM Evaluation

tests/test_api_auth.py (NEW)
  ├─ #22 Python Testing Patterns
  ├─ #17 API Security Hardening
  └─ #25 FastAPI Templates

tests/test_parametrized_*.py (NEW)
  └─ #22 Python Testing Patterns

tests/evaluation_metrics.py (NEW)
  └─ #14 LLM Evaluation

.github/workflows/
  ├─ test.yml  #24 GitHub Actions Templates
  ├─ security.yml  #18 OWASP Check + #24 Actions
  ├─ docker.yml  #24 Actions
  └─ deploy.yml  #31 CI-CD Best Practices

.github/
  ├─ CODEOWNERS  #30 GitHub PR Review Workflow
  ├─ pull_request_template.md  #30 Workflow
  └─ branch protection rules

docs/
  ├─ SECURITY_AUDIT.md  #18 OWASP Check
  ├─ REVIEW_GUIDELINES.md  #23 Code Review Excellence
  └─ API.md (updated by #29 API Docs Generator)

════════════════════════════════════════════════════════════════════════════════

SKILL DEPENDENCY GRAPH
════════════════════════════════════════════════════════════════════════════════

Phase 1 must finish before Phase 2:
  #18, #17, #22, #2, #16, #20, #3, #19, #21, #27, #24
      ↓
Phase 2 requires Phase 1:
  #22, #26, #4, #13, #14, #5
      ↓
Phase 3 requires Phases 1-2:
  #8, #9, #10, #11, #12, #1, #28, #15
      ↓
Phase 4 requires Phases 1-3:
  #25, #29, #30, #27, #23, #31, #32*, #6, #33*, #7

Within phases, some order dependencies:

  • #16 should complete before other Phase 1 work finalizes
  • #13 should complete before #14 evaluation
  • #8, #9, #10 should coordinate (hybrid search → chunking → embeddings)
  • #11 depends on #8 (retriever first)
  • #12 depends on #13 (prompt engineering for entity extraction)
  • #27 used 3 times (Week 2, Week 5, Week 10)
  • #22 used 2 times (Week 1, Weeks 3-4)

════════════════════════════════════════════════════════════════════════════════

DAILY WORKFLOW
════════════════════════════════════════════════════════════════════════════════

  1. Open the skill's SKILL.md file under ~/.agents/skills//
  2. Read the relevant section for your task
  3. Apply to specific code files listed above
  4. Write tests immediately (use #22 Testing Patterns)
  5. Commit with clear message: "feat: [Skill #X] [Description]"
  6. Track in IMPLEMENTATION_STATUS_TRACKER.md

════════════════════════════════════════════════════════════════════════════════