Embeddings Brainstorm - Conclusions
Date: November 2025
Status: CLOSED - Conclusions reached, no action needed
The Question
Should DeepBoner implement:
- Internal codebase embeddings/ingestion pipeline?
- mGREP for internal tool selection?
- Self-knowledge components for agents?
The Answer: NO
After research and first-principles analysis, the conclusion is clear:
Why Not Internal Embeddings/Ingestion
DeepBoner's Core Task:
┌────────────────────────────────────────────────────┐
│ User Query: "Evidence for testosterone in HSDD?"   │
│                        ↓                           │
│ 1. Search PubMed, ClinicalTrials, Europe PMC       │
│ 2. Judge: Is evidence sufficient?                  │
│ 3. Synthesize: Generate report                     │
│                        ↓                           │
│ Output: Research report with citations             │
└────────────────────────────────────────────────────┘
Does ANY step require self-knowledge of the codebase? NO.
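The three-step loop above can be sketched in code. This is a toy illustration of the control flow only: the function names and canned records are hypothetical stand-ins, not DeepBoner's actual API, and the real system would call the PubMed/ClinicalTrials/Europe PMC APIs and an LLM judge. Note that nothing in the flow touches the codebase itself.

```python
def search_literature(query: str, sources: list[str]) -> list[dict]:
    # Stub: the real system queries external literature databases.
    # Here we return one canned record per source for illustration.
    return [{"id": f"{s}:1", "query": query, "source": s} for s in sources]

def evidence_sufficient(records: list[dict], minimum: int = 3) -> bool:
    # Stub judgment: the real system would ask an LLM to assess
    # whether the retrieved evidence covers the question.
    return len(records) >= minimum

def synthesize_report(query: str, records: list[dict]) -> str:
    # Stub synthesis: the real system generates a full cited report.
    citations = ", ".join(r["id"] for r in records)
    return f"Report on {query!r} citing: {citations}"

def run_research(query: str, sources: list[str]) -> str:
    records = search_literature(query, sources)      # 1. Search
    if not evidence_sufficient(records):             # 2. Judge
        return "Insufficient evidence"
    return synthesize_report(query, records)         # 3. Synthesize
```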
Why Not mGREP for Tool Selection
| Approach | Complexity | Accuracy |
|---|---|---|
| Embeddings + mGREP for tool selection | High | Medium (semantic similarity ≠ correct tool) |
| Direct prompting with tool descriptions | Low | High (LLM reasons about applicability) |
Major agent frameworks (LangChain, OpenAI, Anthropic, Magentic) all use prompt-based tool selection rather than embeddings, because:
- LLMs are already doing semantic matching internally
- Tool count is small (5-20) - fits easily in context
- Prompts allow reasoning, not just similarity
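To make the comparison concrete, here is a minimal sketch of the prompt-based approach: every tool description goes directly into the prompt and the LLM reasons about applicability. The tool registry below is illustrative, not DeepBoner's actual tool set, and the LLM call itself is omitted.

```python
# Hypothetical tool registry: with 5-20 tools, every description fits
# comfortably in context, so no embedding/retrieval step is needed
# before the model picks a tool.
TOOLS = {
    "search_pubmed": "Search PubMed for biomedical literature.",
    "search_clinicaltrials": "Find registered trials on ClinicalTrials.gov.",
    "search_europe_pmc": "Search Europe PMC full-text articles.",
}

def build_tool_prompt(user_query: str) -> str:
    # All tool descriptions are inlined; the model can reason about
    # applicability instead of relying on vector similarity.
    lines = [f"- {name}: {desc}" for name, desc in TOOLS.items()]
    return (
        "You can call one of these tools:\n"
        + "\n".join(lines)
        + f"\n\nUser query: {user_query}\n"
        "Reason about which tool applies, then name it."
    )
```

The prompt string is then sent to the model; because the model sees full descriptions, it can select a tool for reasons a nearest-neighbor lookup would miss (e.g. "trials" implies ClinicalTrials.gov even with low lexical overlap).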
What We Already Have
DeepBoner already uses embeddings for the right thing: research evidence retrieval.
- `src/services/embeddings.py` - ChromaDB + sentence-transformers
- `src/services/llamaindex_rag.py` - OpenAI embeddings for premium tier
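The retrieval pattern those services implement looks like the following toy sketch. This is purely illustrative: the real code uses ChromaDB with sentence-transformer vectors, while here bag-of-words counts stand in for embeddings so the example is self-contained.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. The real service
    # uses sentence-transformer dense vectors stored in ChromaDB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank candidate evidence passages by similarity to the query.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

This is the right home for embeddings in DeepBoner: ranking external research evidence against a user query, not indexing the project's own source files.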
The Real Priority
Instead of internal embeddings/mGREP, focus on:
- Deduplication across PubMed/Europe PMC/OpenAlex
- Outcome measures from ClinicalTrials.gov
- Citation graph traversal via OpenAlex
See TOOL_ANALYSIS_CRITICAL.md for the detailed improvement roadmap.
Research Sources
- SICA Paper (ICLR 2025) - Self-improving agents
- GΓΆdel Agent (ACL 2025) - Recursive self-modification
- Introspection Paradox (EMNLP 2025) - Self-knowledge can hurt performance
- Anthropic Introspection Research - ~20% accuracy on genuine introspection
This document is closed. The conclusion is: don't implement internal embeddings/mGREP for this use case.