
P0 CRITICAL BUGS - Why DeepCritical Produces Garbage Results

Date: November 27, 2025
Status: CRITICAL - App is functionally useless
Severity: P0 (Blocker)

TL;DR

The app produces garbage because:

  1. BioRxiv search doesn't work - returns random papers
  2. Free tier LLM is too dumb - can't identify drugs
  3. Query construction is naive - no optimization for PubMed/CT.gov syntax
  4. Loop terminates too early - 5 iterations isn't enough

P0-001: BioRxiv Search is Fundamentally Broken

File: src/tools/biorxiv.py:248-286

The Problem: The bioRxiv API DOES NOT SUPPORT KEYWORD SEARCH.

The code does this:

# Fetch recent papers (last 90 days, first 100 papers)
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
# Then filter client-side for keywords

What Actually Happens:

  1. Fetches the first 100 papers from medRxiv in the last 90 days (chronological order)
  2. Filters those 100 random papers for query keywords
  3. Returns whatever garbage matches

Result: For "Long COVID medications", you get random papers like:

  • "Calf muscle structure-function adaptations"
  • "Work-Life Balance of Ophthalmologists During COVID"

These papers contain "COVID" somewhere but have NOTHING to do with Long COVID treatments.

Root Cause: The /0/json pagination only returns 100 papers. You'd need to paginate through ALL papers (thousands) to do proper keyword filtering.

Fix Options:

  1. Remove BioRxiv entirely - It's unusable without proper search API
  2. Use a different preprint aggregator - Europe PMC has preprints WITH search
  3. Add pagination - Fetch all papers (slow, expensive)
  4. Use Semantic Scholar API - Has preprints and proper search
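Fix Option 2 can be sketched against the public Europe PMC REST API, which does server-side keyword matching and indexes preprints (the `SRC:PPR` filter). The `search_preprints` helper name is illustrative, not part of the codebase:

```python
"""Sketch of Fix Option 2: query Europe PMC instead of bioRxiv."""
import json
import urllib.parse
import urllib.request

EUROPE_PMC_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_preprint_query(keywords: str) -> dict:
    # SRC:PPR restricts results to preprints; keywords are matched
    # server-side, unlike bioRxiv's chronological date-window dump.
    return {
        "query": f"({keywords}) AND SRC:PPR",
        "format": "json",
        "pageSize": "25",
    }


def search_preprints(keywords: str) -> list[dict]:
    params = urllib.parse.urlencode(build_preprint_query(keywords))
    with urllib.request.urlopen(f"{EUROPE_PMC_URL}?{params}") as resp:
        payload = json.load(resp)
    return payload.get("resultList", {}).get("result", [])
```

Because relevance filtering happens on the server, "Long COVID medications" would hit the full preprint corpus rather than the first 100 papers of the last 90 days.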

P0-002: Free Tier LLM Cannot Perform Drug Identification

File: src/agent_factory/judges.py:153-211

The Problem: Without an API key, the app uses HFInferenceJudgeHandler with:

  • Llama 3.1 8B Instruct
  • Mistral 7B Instruct

These are 7-8 billion parameter models. They cannot:

  • Reliably parse complex biomedical abstracts
  • Identify drug candidates from scientific text
  • Generate structured JSON output consistently
  • Reason about mechanism of action

Evidence of Failure:

# From MockJudgeHandler - the honest fallback when LLM fails
drug_candidates=[
    "Drug identification requires AI analysis",
    "Enter API key above for full results",
]

The team KNEW the free tier can't identify drugs and added this message.

Root Cause: Drug repurposing requires understanding:

  • Drug mechanisms
  • Disease pathophysiology
  • Clinical trial phases
  • Statistical significance

This requires GPT-4 / Claude Sonnet class models (100B+ parameters).

Fix Options:

  1. Require API key - No free tier, be honest
  2. Use larger HF models - Llama 70B or Mixtral 8x7B (expensive on free tier)
  3. Hybrid approach - Use free tier for search, require paid for synthesis

P0-003: PubMed Query Not Optimized

File: src/tools/pubmed.py:54-71

The Problem: The query is passed directly to PubMed without optimization:

search_params = self._build_params(
    db="pubmed",
    term=query,  # Raw user query!
    retmax=max_results,
    sort="relevance",
)

What User Enters: "What medications show promise for Long COVID?"

What PubMed Receives: What medications show promise for Long COVID?

What PubMed Should Receive:

("long covid"[Title/Abstract] OR "post-COVID"[Title/Abstract] OR "PASC"[Title/Abstract])
AND (drug[Title/Abstract] OR treatment[Title/Abstract] OR medication[Title/Abstract] OR therapy[Title/Abstract])
AND (clinical trial[Publication Type] OR randomized[Title/Abstract])

Root Cause: No query preprocessing or medical term expansion.

Fix Options:

  1. Add query preprocessor - Extract medical entities, expand synonyms
  2. Use MeSH terms - PubMed's controlled vocabulary for better recall
  3. LLM query generation - Use LLM to generate optimized PubMed query
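A minimal version of Fix Option 1 is a rule-based query builder that produces the fielded syntax shown above. The synonym table and `build_pubmed_query` helper are illustrative; a real version would draw on MeSH/UMLS:

```python
DISEASE_SYNONYMS = {
    "long covid": ['"long covid"', '"post-COVID"', '"PASC"'],
}
INTERVENTION_TERMS = ["drug", "treatment", "medication", "therapy"]


def build_pubmed_query(disease: str) -> str:
    """Turn a disease name into a fielded PubMed boolean query."""
    synonyms = DISEASE_SYNONYMS.get(disease.lower(), [f'"{disease}"'])
    disease_clause = " OR ".join(f"{s}[Title/Abstract]" for s in synonyms)
    intervention_clause = " OR ".join(
        f"{t}[Title/Abstract]" for t in INTERVENTION_TERMS
    )
    return f"({disease_clause}) AND ({intervention_clause})"
```

The result of `build_pubmed_query("Long COVID")` would be passed as `term=` instead of the raw user question.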

P0-004: Loop Terminates Too Early

File: src/app.py:42-45 and src/utils/models.py

The Problem:

config = OrchestratorConfig(
    max_iterations=5,
    max_results_per_tool=10,
)

5 iterations is not enough to:

  1. Search multiple variations of the query
  2. Gather enough evidence for the Judge to synthesize
  3. Refine queries based on initial results

Evidence: The user's output shows "Max Iterations Reached" with only 6 sources.

Root Cause: Conservative defaults chosen to avoid API costs, but they make the app useless.

Fix Options:

  1. Increase default to 10-15 - More iterations = better results
  2. Dynamic termination - Stop when confidence > threshold, not iteration count
  3. Parallel query expansion - Run more queries per iteration
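Fix Option 2 (dynamic termination) can be sketched as a loop that stops on judge confidence rather than a fixed cap. `run_iteration` and the 0.8 threshold are illustrative placeholders:

```python
def research_loop(run_iteration, max_iterations=15, confidence_threshold=0.8):
    """Iterate until the judge is confident, with the cap as a backstop."""
    evidence = []
    for i in range(max_iterations):
        result = run_iteration(evidence)  # one search + judge round
        evidence.extend(result["new_evidence"])
        if result["confidence"] >= confidence_threshold:
            return {"status": "confident", "iterations": i + 1,
                    "evidence": evidence}
    return {"status": "max_iterations", "iterations": max_iterations,
            "evidence": evidence}
```

This makes the iteration cap a safety valve instead of the normal exit path, so "Max Iterations Reached" becomes an error signal rather than the expected outcome.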

P0-005: No Query Understanding Layer

Files: src/orchestrator.py, src/tools/search_handler.py

The Problem: There's no NLU (Natural Language Understanding) layer. The system:

  1. Takes raw user query
  2. Passes directly to search tools
  3. No entity extraction
  4. No intent classification
  5. No query expansion

For drug repurposing, you need to extract:

  • Disease: "Long COVID" → [Long COVID, PASC, Post-COVID syndrome, chronic COVID]
  • Drug intent: "medications" → [drugs, treatments, therapeutics, interventions]
  • Evidence type: "show promise" → [clinical trials, efficacy, RCT]

Root Cause: No preprocessing pipeline between user input and search execution.

Fix Options:

  1. Add entity extraction - Use BioBERT or PubMedBERT for medical NER
  2. Add query expansion - Use medical ontologies (UMLS, MeSH)
  3. LLM preprocessing - Use LLM to generate search strategy before searching
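Even without an LLM or NER model, a crude preprocessing step can spot the phrases above and expand them. The tables are illustrative stand-ins for MeSH/UMLS lookups:

```python
EXPANSIONS = {
    "long covid": ["Long COVID", "PASC", "post-COVID syndrome",
                   "chronic COVID"],
    "medications": ["drugs", "treatments", "therapeutics", "interventions"],
    "show promise": ["clinical trial", "efficacy", "RCT"],
}


def expand_query(user_query: str) -> dict[str, list[str]]:
    """Map each recognized phrase in the query to its expansion list."""
    q = user_query.lower()
    return {phrase: terms for phrase, terms in EXPANSIONS.items()
            if phrase in q}
```

The expanded terms then feed the per-tool query builders instead of the raw question.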

P0-006: ClinicalTrials.gov Results Not Filtered

File: src/tools/clinicaltrials.py

The Problem: ClinicalTrials.gov returns ALL matching trials including:

  • Withdrawn trials
  • Terminated trials
  • Not yet recruiting
  • Observational studies (not interventional)

For drug repurposing, you want:

  • Interventional studies
  • Phase 2+ (has safety/efficacy data)
  • Completed or with results

Root Cause: No filtering of trial metadata.
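The missing filter can be sketched as a predicate over the study records. The field paths follow the ClinicalTrials.gov API v2 JSON shape (`protocolSection.statusModule`, `designModule`), but treat them as assumptions to verify against the live API:

```python
GOOD_STATUSES = {"COMPLETED"}
GOOD_PHASES = {"PHASE2", "PHASE3", "PHASE4"}


def keep_trial(study: dict) -> bool:
    """Keep interventional, phase 2+ trials that finished or have results."""
    protocol = study.get("protocolSection", {})
    design = protocol.get("designModule", {})
    status = protocol.get("statusModule", {}).get("overallStatus", "")
    phases = set(design.get("phases", []))
    has_results = study.get("hasResults", False)
    return (
        design.get("studyType") == "INTERVENTIONAL"
        and (status in GOOD_STATUSES or has_results)
        and bool(phases & GOOD_PHASES)
    )
```

Applied as `[s for s in studies if keep_trial(s)]`, this drops withdrawn, terminated, not-yet-recruiting, and observational entries before the Judge ever sees them.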


Summary: Why This App Produces Garbage

User Query: "What medications show promise for Long COVID?"
    │
    ▼
┌──────────────────────────────────────────────────────────────┐
│ NO QUERY PREPROCESSING                                       │
│ - No entity extraction                                       │
│ - No synonym expansion                                       │
│ - No medical term normalization                              │
└──────────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────────────────────────────────┐
│ BROKEN SEARCH LAYER                                          │
│ - PubMed: Raw query, no MeSH, gets 1 result                  │
│ - BioRxiv: Returns random papers (API doesn't support search)│
│ - ClinicalTrials: Returns all trials, no filtering           │
└──────────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────────────────────────────────┐
│ GARBAGE EVIDENCE                                             │
│ - 6 papers, most irrelevant                                  │
│ - "Calf muscle adaptations" (mentions COVID once)            │
│ - "Ophthalmologist work-life balance"                        │
└──────────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────────────────────────────────┐
│ DUMB JUDGE (Free Tier)                                       │
│ - Llama 8B can't identify drugs from garbage                 │
│ - JSON parsing fails                                         │
│ - Falls back to "Drug identification requires AI analysis"   │
└──────────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────────────────────────────────┐
│ LOOP HITS MAX (5 iterations)                                 │
│ - Never finds enough good evidence                           │
│ - Never synthesizes anything useful                          │
└──────────────────────────────────────────────────────────────┘
    │
    ▼
    GARBAGE OUTPUT

What Would Make This Actually Work

Minimum Viable Fix (1-2 days)

  1. Remove BioRxiv - It doesn't work
  2. Require API key - Be honest that free tier is useless
  3. Add basic query preprocessing - Strip question words, expand COVID synonyms
  4. Increase iterations to 10

Proper Fix (1-2 weeks)

  1. Query Understanding Layer

    • Medical NER (BioBERT/SciBERT)
    • Query expansion with MeSH/UMLS
    • Intent classification (drug discovery vs mechanism vs safety)
  2. Optimized Search

    • PubMed: Proper query syntax with MeSH terms
    • ClinicalTrials: Filter by phase, status, intervention type
    • Replace BioRxiv with Europe PMC (has preprints + search)
  3. Evidence Ranking

    • Score by publication type (RCT > cohort > case report)
    • Score by journal impact factor
    • Score by recency
    • Score by citation count
  4. Proper LLM Pipeline

    • Use GPT-4 / Claude for synthesis
    • Structured extraction of: drug, mechanism, evidence level, effect size
    • Multi-step reasoning: identify → validate → rank → synthesize
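Item 3 above (Evidence Ranking) can be sketched as a simple weighted score; the weights, cutoffs, and field names are illustrative, not tuned values:

```python
TYPE_WEIGHTS = {"rct": 3.0, "cohort": 2.0, "case_report": 1.0}


def score_evidence(paper: dict, current_year: int = 2025) -> float:
    """Combine publication type, recency, and citations into one score."""
    type_score = TYPE_WEIGHTS.get(paper.get("pub_type", ""), 0.5)
    # Linear decay over 10 years, floored at 0.
    recency = max(0.0, 1.0 - (current_year - paper.get("year", current_year)) / 10)
    # Cap citation credit at 100 citations.
    citations = min(paper.get("citations", 0) / 100, 1.0)
    return type_score + recency + citations
```

Sorting the evidence pool by this score before synthesis would push "Calf muscle adaptations"-style noise to the bottom even when search recall is imperfect.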

The Hard Truth

Building a drug repurposing agent that works is HARD. The state of the art is:

  • Drug2Disease (IBM) - Uses knowledge graphs + ML
  • COVID-KG (Stanford) - Dedicated COVID knowledge graph
  • Literature Mining at scale (PubMed) - Millions of papers, not 10

This hackathon project is fundamentally a search wrapper with an LLM prompt. That's not enough.

To make it useful:

  1. Either scope it down (e.g., "find clinical trials for X disease")
  2. Or invest serious engineering in the NLU + search + ranking pipeline