VibecoderMcSwaggins committed on
Commit
ed76153
·
1 Parent(s): d8b1415

docs: Add known content quality limitations for 7B Free Tier

Documents expected model behavior limitations:
- Hallucinated citations (fake paper titles/authors)
- Anatomical confusion (male/female context errors)
- Nonsensical medical claims
- Duplicate content sections

Clarifies these are model capacity limits, not stack bugs.

docs/architecture/HF_FREE_TIER_ANALYSIS.md CHANGED
@@ -64,5 +64,50 @@ For the Unified Chat Client architecture:
  1. **Tier 0 (Free):** Hardcoded to Native Models (Qwen 7B, Mistral Nemo).
  2. **Tier 1 (BYO Key):** Allow user to select any model (70B+), assuming they provide a key that grants access to premium providers or PRO tier.

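The two-tier rule in the context lines above could be sketched roughly as follows. This is an illustration only, not the project's actual routing code: `select_model`, `ModelChoice`, and the model identifier strings are hypothetical names introduced here, and the sketch assumes that "no key supplied" is what places a user in Tier 0.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed identifiers for the Free Tier models; the document only names
# "Qwen 7B" and "Mistral Nemo", so these exact strings are illustrative.
FREE_TIER_MODELS = ("Qwen/Qwen2.5-7B-Instruct", "mistralai/Mistral-Nemo-Instruct-2407")


@dataclass(frozen=True)
class ModelChoice:
    tier: int
    model_id: str
    api_key: Optional[str]


def select_model(requested_model: Optional[str] = None,
                 api_key: Optional[str] = None) -> ModelChoice:
    """Return the model to use under the two-tier rule (hypothetical helper)."""
    if not api_key:
        # Tier 0: the model is hardcoded, so any requested model is ignored.
        return ModelChoice(tier=0, model_id=FREE_TIER_MODELS[0], api_key=None)
    # Tier 1: the user brought a key, so honor the requested model if given.
    return ModelChoice(tier=1,
                       model_id=requested_model or FREE_TIER_MODELS[0],
                       api_key=api_key)
```

Ignoring the request in Tier 0, rather than raising an error, keeps the free path usable when a client sends a stale model preference. Whether the real client behaves this way is not stated in the source.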
+ ---
+
+ ## 5. Known Content Quality Limitations (7B Models)
+
+ **Status**: As of December 2025, the Free Tier (Qwen 2.5 7B) produces **working multi-agent orchestration** but with notable content quality limitations.
+
+ ### What Works Well
+ - Multi-agent coordination (Manager → Search → Hypothesis → Report)
+ - Clean streaming output (no garbage tokens, no raw JSON)
+ - Proper agent handoffs and progress tracking
+ - Coherent narrative structure
+
+ ### Known Limitations
+
+ | Issue | Description | Severity |
+ |-------|-------------|----------|
+ | **Hallucinated Citations** | Model generates plausible-sounding but fake paper titles/authors instead of using actual search results | Medium |
+ | **Anatomical Confusion** | May apply male anatomy (e.g., "penile rigidity") to female health queries | High |
+ | **Nonsensical Medical Claims** | May generate claims like "prostate cancer risk" in the context of female patients | High |
+ | **Duplicate Content** | Final reports sometimes contain repeated sections | Low |
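The "Duplicate Content" row is the one issue in this table that can be checked mechanically. A minimal sketch, assuming the final report is Markdown with `#`-style headings: the functions `split_sections` and `find_duplicate_sections` are hypothetical and not part of the documented stack, and the check only catches sections whose bodies are identical after case and whitespace normalization.

```python
import re
from typing import Dict, List, Tuple


def split_sections(report: str) -> List[Tuple[str, str]]:
    """Split a Markdown report into (heading, body) pairs."""
    sections: List[Tuple[str, str]] = []
    title, body = "", []
    for line in report.splitlines():
        if re.match(r"#{1,6}\s", line):
            if title or body:
                sections.append((title, "\n".join(body)))
            title, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if title or body:
        sections.append((title, "\n".join(body)))
    return sections


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def find_duplicate_sections(report: str) -> List[Tuple[str, str]]:
    """Return (first_heading, repeated_heading) pairs with identical bodies."""
    seen: Dict[str, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for title, body in split_sections(report):
        key = _normalize(body)
        if not key:
            continue  # skip empty sections
        if key in seen:
            duplicates.append((seen[key], title))
        else:
            seen[key] = title
    return duplicates
```

Near-duplicates (reworded repeats) would pass this check. Catching those would need a similarity measure, not exact matching.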
+
+ ### Why This Happens
+
+ 7B-parameter models have limited:
+ - **World knowledge**: Can't reliably recall specific paper titles/authors
+ - **Context grounding**: May ignore search results and hallucinate instead
+ - **Domain reasoning**: Complex medical topics exceed reasoning capacity
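The context-grounding failure can be illustrated with a simple post-hoc check that flags cited titles with no close match among the titles actually returned by search. This is a sketch under stated assumptions, not the project's method: `ungrounded_citations` is a hypothetical helper, the 0.85 threshold is an arbitrary starting point, and it assumes cited titles have already been extracted from the report.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def _norm(title: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def ungrounded_citations(cited_titles: Iterable[str],
                         retrieved_titles: Iterable[str],
                         threshold: float = 0.85) -> List[str]:
    """Return cited titles with no sufficiently similar retrieved title."""
    retrieved = [_norm(t) for t in retrieved_titles]
    flagged: List[str] = []
    for title in cited_titles:
        n = _norm(title)
        best = max((SequenceMatcher(None, n, r).ratio() for r in retrieved),
                   default=0.0)
        if best < threshold:
            flagged.append(title)
    return flagged
```

A flagged title is only a candidate hallucination, because the model may have cited a real paper from memory. An empty result does not show that the report's claims match what the cited papers say.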
+
+ ### User Guidance
+
+ **Free Tier is best for:**
+ - Understanding the research workflow
+ - Getting general topic overviews
+ - Testing the system before committing to the Paid Tier
+
+ **For accurate medical research:**
+ - Use the Paid Tier (GPT-5) for citation accuracy
+ - Always verify citations against actual databases
+ - Treat Free Tier output as "draft quality"
+
+ ### Not a Stack Bug
+
+ These are **model capability limitations**, not bugs in the DeepBoner architecture. The orchestration, streaming, and agent coordination are working correctly.
+
  ---
  *Analysis performed by Gemini CLI Agent, Dec 2, 2025*
+ *Content quality section added Dec 3, 2025*