Commit · ed76153
1 Parent(s): d8b1415
docs: Add known content quality limitations for 7B Free Tier
Documents expected model behavior limitations:
- Hallucinated citations (fake paper titles/authors)
- Anatomical confusion (male/female context errors)
- Nonsensical medical claims
- Duplicate content sections
Clarifies these are model capacity limits, not stack bugs.
docs/architecture/HF_FREE_TIER_ANALYSIS.md
CHANGED
@@ -64,5 +64,50 @@ For the Unified Chat Client architecture:
 1. **Tier 0 (Free):** Hardcoded to Native Models (Qwen 7B, Mistral Nemo).
 2. **Tier 1 (BYO Key):** Allow user to select any model (70B+), assuming they provide a key that grants access to premium providers or PRO tier.
 
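
As a rough illustration of the tier rule in the two context lines above, the routing could look like the sketch below. `select_model`, `FREE_TIER_MODELS`, and the exact model IDs are assumptions made for illustration; the source only names "Qwen 7B" and "Mistral Nemo" and does not show the actual client code.

```python
from typing import Optional

# Assumed Hugging Face model IDs for the "Native Models" named in the doc.
# The source does not give exact IDs, so treat these as placeholders.
FREE_TIER_MODELS = (
    "Qwen/Qwen2.5-7B-Instruct",
    "mistralai/Mistral-Nemo-Instruct-2407",
)


def select_model(api_key: Optional[str], requested_model: Optional[str] = None) -> str:
    """Pick the model for a request (hypothetical helper, not the project's API)."""
    if not api_key:
        # Tier 0 (Free): any requested model is ignored; the choice is hardcoded.
        return FREE_TIER_MODELS[0]
    # Tier 1 (BYO Key): honor the user's choice, falling back to the free default.
    return requested_model or FREE_TIER_MODELS[0]
```

In this sketch a missing or empty key always resolves to the hardcoded free model, so a Tier 0 user cannot reach a 70B+ model by passing its name.
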
+---
+
+## 5. Known Content Quality Limitations (7B Models)
+
+**Status**: As of December 2025, the Free Tier (Qwen 2.5 7B) produces **working multi-agent orchestration** but with notable content quality limitations.
+
+### What Works Well
+- Multi-agent coordination (Manager → Search → Hypothesis → Report)
+- Clean streaming output (no garbage tokens, no raw JSON)
+- Proper agent handoffs and progress tracking
+- Coherent narrative structure
+
+### Known Limitations
+
+| Issue | Description | Severity |
+|-------|-------------|----------|
+| **Hallucinated Citations** | Model generates plausible-sounding but fake paper titles/authors instead of using actual search results | Medium |
+| **Anatomical Confusion** | May apply male anatomy (e.g., "penile rigidity") to female health queries | High |
+| **Nonsensical Medical Claims** | May generate claims like "prostate cancer risk" in context of female patients | High |
+| **Duplicate Content** | Final reports sometimes contain repeated sections | Low |
+
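
For the "Duplicate Content" row, one possible post-processing step is to drop repeated Markdown sections from the final report. The function below is a minimal sketch under the assumption that reports are Markdown with `#`-style headings; `drop_duplicate_sections` is a hypothetical name and is not part of the documented stack.

```python
import re


def drop_duplicate_sections(report: str) -> str:
    """Keep the first copy of each Markdown section; drop later verbatim repeats."""
    # Split just before every heading line so each heading stays with its body.
    sections = re.split(r"^(?=#{1,6} )", report, flags=re.MULTILINE)
    seen = set()
    kept = []
    for section in sections:
        # Compare sections ignoring case and whitespace differences.
        key = " ".join(section.split()).lower()
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(section)
    return "".join(kept)
```

This only catches sections that repeat verbatim (up to case and whitespace). A section that the model paraphrases the second time would pass through unchanged.
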
+### Why This Happens
+
+7B parameter models have limited:
+- **World knowledge**: Can't reliably recall specific paper titles/authors
+- **Context grounding**: May ignore search results and hallucinate instead
+- **Domain reasoning**: Complex medical topics exceed reasoning capacity
+
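
The context-grounding failure described above can be detected mechanically when the search results are still available: any cited title that matches no retrieved title is suspect. The following is a minimal sketch of that check; `find_ungrounded_citations` and its inputs are hypothetical, and exact-title matching after normalization is an assumption, not the project's method.

```python
import re
from typing import Iterable, List


def _normalize(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so titles compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_ungrounded_citations(
    cited_titles: Iterable[str],
    search_result_titles: Iterable[str],
) -> List[str]:
    """Return cited titles that match no title among the search results."""
    known = {_normalize(t) for t in search_result_titles}
    return [t for t in cited_titles if _normalize(t) not in known]
```

A non-empty return value would indicate citations that did not come from the search step and need manual verification. The check says nothing about whether a grounded citation is used accurately in the report.
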
+### User Guidance
+
+**Free Tier is best for:**
+- Understanding the research workflow
+- Getting general topic overviews
+- Testing the system before committing to paid tier
+
+**For accurate medical research:**
+- Use Paid Tier (GPT-5) for citation accuracy
+- Always verify citations against actual databases
+- Treat Free Tier output as "draft quality"
+
+### Not a Stack Bug
+
+These are **model capability limitations**, not bugs in the DeepBoner architecture. The orchestration, streaming, and agent coordination are working correctly.
+
 ---
 *Analysis performed by Gemini CLI Agent, Dec 2, 2025*
+*Content quality section added Dec 3, 2025*