Embeddings Brainstorm - Conclusions
Date: November 2025
Status: CLOSED - Conclusions reached, no action needed
The Question
Should DeepBoner implement:
- Internal codebase embeddings/ingestion pipeline?
- mGREP for internal tool selection?
- Self-knowledge components for agents?
The Answer: NO
After research and first-principles analysis, the conclusion is clear:
Why Not Internal Embeddings/Ingestion
DeepBoner's Core Task:
┌────────────────────────────────────────────────────┐
│ User Query: "Evidence for testosterone in HSDD?"   │
│                        ↓                           │
│ 1. Search PubMed, ClinicalTrials, Europe PMC       │
│ 2. Judge: Is evidence sufficient?                  │
│ 3. Synthesize: Generate report                     │
│                        ↓                           │
│ Output: Research report with citations             │
└────────────────────────────────────────────────────┘
Does ANY step require self-knowledge of the codebase? NO.
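The three-step loop above can be sketched in code. This is a toy illustration of the control flow only: the function names and canned records are hypothetical stand-ins, not DeepBoner's actual API, and the real system would call the PubMed/ClinicalTrials/Europe PMC APIs and an LLM judge. Note that nothing in the flow touches the codebase itself.

```python
def search_literature(query: str, sources: list[str]) -> list[dict]:
    # Stub: the real system queries external literature databases.
    # Here we return one canned record per source for illustration.
    return [{"id": f"{s}:1", "query": query, "source": s} for s in sources]

def evidence_sufficient(records: list[dict], minimum: int = 3) -> bool:
    # Stub judgment: the real system would ask an LLM to assess
    # whether the retrieved evidence covers the question.
    return len(records) >= minimum

def synthesize_report(query: str, records: list[dict]) -> str:
    # Stub synthesis: the real system generates a full cited report.
    citations = ", ".join(r["id"] for r in records)
    return f"Report on {query!r} citing: {citations}"

def run_research(query: str, sources: list[str]) -> str:
    records = search_literature(query, sources)      # 1. Search
    if not evidence_sufficient(records):             # 2. Judge
        return "Insufficient evidence"
    return synthesize_report(query, records)         # 3. Synthesize
```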
Why Not mGREP for Tool Selection
| Approach | Complexity | Accuracy |
|---|---|---|
| Embeddings + mGREP for tool selection | High | Medium (semantic similarity ≠ correct tool) |
| Direct prompting with tool descriptions | Low | High (LLM reasons about applicability) |
Major agent frameworks (LangChain, OpenAI, Anthropic, Magentic) all use prompt-based tool selection rather than embeddings, because:
- LLMs are already doing semantic matching internally
- Tool count is small (5-20) - fits easily in context
- Prompts allow reasoning, not just similarity
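To make the comparison concrete, here is a minimal sketch of the prompt-based approach: every tool description goes directly into the prompt and the LLM reasons about applicability. The tool registry below is illustrative, not DeepBoner's actual tool set, and the LLM call itself is omitted.

```python
# Hypothetical tool registry: with 5-20 tools, every description fits
# comfortably in context, so no embedding/retrieval step is needed
# before the model picks a tool.
TOOLS = {
    "search_pubmed": "Search PubMed for biomedical literature.",
    "search_clinicaltrials": "Find registered trials on ClinicalTrials.gov.",
    "search_europe_pmc": "Search Europe PMC full-text articles.",
}

def build_tool_prompt(user_query: str) -> str:
    # All tool descriptions are inlined; the model can reason about
    # applicability instead of relying on vector similarity.
    lines = [f"- {name}: {desc}" for name, desc in TOOLS.items()]
    return (
        "You can call one of these tools:\n"
        + "\n".join(lines)
        + f"\n\nUser query: {user_query}\n"
        "Reason about which tool applies, then name it."
    )
```

The prompt string is then sent to the model; because the model sees full descriptions, it can select a tool for reasons a nearest-neighbor lookup would miss (e.g. "trials" implies ClinicalTrials.gov even with low lexical overlap).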
What We Already Have
DeepBoner already uses embeddings for the right thing: research evidence retrieval.
- `src/services/embeddings.py` - ChromaDB + sentence-transformers
- `src/services/llamaindex_rag.py` - OpenAI embeddings for premium tier
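The retrieval pattern those services implement looks like the following toy sketch. This is purely illustrative: the real code uses ChromaDB with sentence-transformer vectors, while here bag-of-words counts stand in for embeddings so the example is self-contained.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. The real service
    # uses sentence-transformer dense vectors stored in ChromaDB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank candidate evidence passages by similarity to the query.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

This is the right home for embeddings in DeepBoner: ranking external research evidence against a user query, not indexing the project's own source files.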
The Real Priority
Instead of internal embeddings/mGREP, focus on:
- Deduplication across PubMed/Europe PMC/OpenAlex
- Outcome measures from ClinicalTrials.gov
- Citation graph traversal via OpenAlex
See TOOL_ANALYSIS_CRITICAL.md for the detailed improvement roadmap.
Research Sources
- SICA Paper (ICLR 2025) - Self-improving agents
- GΓΆdel Agent (ACL 2025) - Recursive self-modification
- Introspection Paradox (EMNLP 2025) - Self-knowledge can hurt performance
- Anthropic Introspection Research - ~20% accuracy on genuine introspection
This document is closed. The conclusion is: don't implement internal embeddings/mGREP for this use case.