Spaces:
Sleeping
A newer version of the Gradio SDK is available:
6.3.0
Market Research Storytelling Enhancements
Version: 3.0.0-Market-Research Date: 2025-10-20 Focus: Transform academic research summaries into compelling market research client deliverables
Overview
This enhancement package transforms TranscriptorAI from a research tool into a professional market research deliverable system. The focus is on creating reports that tell compelling, data-driven stories for business clients rather than academic research summaries.
Key Philosophy Changes
BEFORE: Academic Research Style
- Research-focused language
- "Findings" and "Results"
- Data presented separately from interpretation
- Minimal human voice
- Generic recommendations
AFTER: Market Research Consulting Style
- Business-focused language with "So What?" orientation
- "Insights" and "Opportunities"
- Data woven into narrative with business implications
- Participant quotes bring findings to life
- Prioritized, actionable recommendations
Phase 1 Enhancements (COMPLETED)
1. Business-Focused Narrative Prompts
File Modified: story_writer.py
Lines: 10-100
What Changed:
- Rewrote LLM prompts to generate consulting-style reports
- Added "THE HEADLINE" format for executive impact
- Structured findings as: Data β Business Implication β Recommended Action
- Audience-specific context (executive, detailed, presentation styles)
- Active voice and present tense requirements
- Market-oriented section headers
Key Features:
STRUCTURE:
1. EXECUTIVE SUMMARY with "THE HEADLINE"
2. KEY TAKEAWAYS (finding β implication β action)
3. RESEARCH CONTEXT (brief methodology)
4. KEY INSIGHTS (3-5 main findings with implications)
5. MARKET OPPORTUNITIES & BARRIERS
6. PARTICIPANT PERSPECTIVES (consensus vs. divergence)
7. STRATEGIC RECOMMENDATIONS (prioritized by timeline)
Writing Style Requirements:
- β Lead with impact, not methodology
- β Active voice: "HCPs prefer..." not "It was found..."
- β Frame findings as opportunities/challenges
- β Connect insights to business decisions
- β Headers promise value: "What's Driving Switching Behavior"
- β Write for skimmers (key points in headers/first sentences)
Example Output:
# Executive Summary
**THE HEADLINE:** Prior authorization delays are creating a 6-month sales cycle gap
and pushing HCPs toward competitor products with faster approvals.
**KEY TAKEAWAYS:**
β’ Reimbursement Barrier: 10 of 12 HCPs (83%) cite prior authorization as their #1
prescribing barrier β Your sales team needs patient assistance resources during
the 4-6 week approval window β Launch patient bridge program (IMMEDIATE)
2. Visual Callout Boxes for PDFs
File Modified: narrative_report_generator.py
Lines: 19-255
What Added: Four new visual element types for professional PDF reports:
A) Key Stat Callouts
create_key_stat_callout(stat, description, context)
- Large, bold statistics (e.g., "12" or "67%")
- Colored borders (#3498db)
- Gray background for emphasis
- Perfect for highlighting participant counts, quality scores
B) Insight Boxes
create_insight_box(title, content, icon="π‘")
- Yellow background (#fff9e6) with orange accent line
- Icon + bold title
- Justified content text
- Great for key findings or "aha moments"
C) Quote Boxes
create_quote_box(quote, attribution="")
- Italicized quote text with smart quotes
- Light gray background (#f8f9fa)
- Blue accent line at top
- Attribution in smaller text, right-aligned
- Brings participant voice into reports
D) Recommendation Boxes
create_recommendation_box(priority, action, details)
- Color-coded priority labels:
- IMMEDIATE: Red (#e74c3c)
- HIGH: Orange (#e67e22)
- MEDIUM: Yellow (#f39c12)
- LOW: Gray (#95a5a6)
- Priority badge on left, action + details on right
- Clear visual hierarchy for prioritization
Enhanced PDF Title Page:
- Centered "Market Research Insights Report" title
- Subtitle with study type
- Key stats displayed prominently at top
- Professional, consulting-firm aesthetic
3. Quote Extraction System
File Created: quote_extractor.py
Lines: 1-373
A sophisticated system for finding and scoring impactful quotes from transcripts.
Core Function:
extract_verbatim_quotes(transcript_text, interviewee_type, min_length=30, max_length=200)
How It Works:
Step 1: Pattern Matching Extracts quotes using three patterns:
- Direct quotes with quotation marks:
"quote text" - Speaker-attributed:
Speaker 1: quote textorHCP: quote text - Narrative references:
As one HCP noted, "quote"
Step 2: Filtering Removes non-meaningful quotes:
- Administrative phrases ("thank you", "one moment")
- Greetings and pleasantries
- Too short (< 20 chars) or too long (> 200 chars)
- Insufficient substantive words
Step 3: Categorization Assigns theme to each quote:
For HCPs:
- prescribing, diagnosis, barriers, efficacy, safety
- patient_management, competitive
For Patients:
- symptoms, treatment, quality_of_life, side_effects
- emotional, healthcare_experience, effectiveness
Step 4: Impact Scoring (0.0 to 1.0)
Factors that increase score:
- β Optimal length (50-150 chars): +0.15
- β Emotional language: +0.1 per word (cap +0.2)
- β Contains numbers: +0.15
- β Concrete examples ("for example"): +0.15
- β Comparative language ("better than"): +0.1
- β Causal language ("because", "leads to"): +0.1
- β First-person perspective ("I", "my"): +0.1
Factors that decrease score:
- β Generic phrases ("it depends", "maybe"): -0.15
Step 5: Deduplication
- Uses first 10 words as "fingerprint"
- Removes near-duplicate quotes
- Keeps highest-impact version
Step 6: Organization
organize_quotes_by_theme(quotes)
Returns quotes organized by theme, sorted by impact score within each theme.
Key Functions:
extract_quotes_from_results()- Batch process all transcriptscategorize_quote()- Assign themescore_quote_impact()- Calculate storytelling valueget_top_quotes_summary()- Debug/review output
Example Quote Score:
Quote: "By the time insurance approves, the patient's cancer has often progressed
to the point where we need to consider more aggressive options."
Score: 0.85 (High Impact)
Factors:
- Length: 140 chars (optimal) β +0.15
- Emotional: "cancer", "aggressive" β +0.2
- Causal: "by the time... has progressed" β +0.1
- First-person: "we need" β +0.1
- Specific: medical terminology β +0.15
4. Quote Integration into Analysis Pipeline
File Modified: app.py
Lines: 12, 242-244, 255-261, 281-285, 308-323
What Changed:
A) Import quote extractor
from quote_extractor import extract_quotes_from_results
B) Extract quotes after transcript processing
# After valid_results are compiled
quotes_data = extract_quotes_from_results(valid_results, interviewee_type)
print(f"[Quotes] Extracted {len(quotes_data['all_quotes'])} quotes")
C) Add quotes to summary prompt
# Top 10 quotes added to LLM prompt
summary_prompt += f"""
TOP PARTICIPANT QUOTES (use these to bring findings to life):
1. [THEME] (from Transcript 1)
"Actual quote text..."
"""
D) Update analysis requirements
2. INTEGRATE PARTICIPANT VOICE:
- Weave in quotes from the "TOP PARTICIPANT QUOTES" section
- Use quotes to bring data to life and prove points
- Format as: "X out of Y mentioned [finding]. As one HCP described, '[quote]'"
- Include 3-5 quotes in your narrative
Result: Cross-transcript summaries now include participant voice, making findings more memorable and credible.
5. Quote Integration into Narrative Reports
File Modified: story_writer.py
Lines: 222-245
What Changed:
Function Signature Updated:
def generate_narrative(parsed_data, tables, style, llm_backend, quotes=None)
Quote Addition to Prompt: When quotes are provided, the function now appends:
TOP PARTICIPANT QUOTES TO INTEGRATE:
(Weave 4-6 of these quotes into your narrative to bring findings to life)
1. [THEME] (Impact: 0.85)
"Quote text..."
IMPORTANT: Integrate quotes naturally using phrases like:
- 'As one participant described...'
- 'One HCP/patient noted...'
- 'In the words of a participant...'
Result: Narrative reports now incorporate authentic participant voice throughout the document, not just in data tables.
Impact Summary
| Aspect | Before | After | Improvement |
|---|---|---|---|
| Report Style | Academic research | Management consulting | Client-ready deliverable |
| Language | "Findings", "Results" | "Insights", "Opportunities" | Business-oriented |
| Participant Voice | None (data only) | 5-8 quotes per report | Human element |
| Visual Appeal | Plain text + tables | Callouts, boxes, highlights | Professional polish |
| Actionability | Generic recommendations | Prioritized (IMMEDIATE/30d/90d) | Clear next steps |
| Skimmability | Linear narrative | Headers + callouts + bullets | Executive-friendly |
| Business Context | Minimal | Every finding β implication | Strategic value |
Usage Examples
Example 1: Running Analysis with Quote Extraction
# In app.py analyze() function
# Quotes are automatically extracted after transcript processing
progress(0.9, desc="Generating summary and reports...")
valid_results = [r for r in all_results if r["quality_score"] > 0]
# Extract quotes for storytelling
quotes_data = extract_quotes_from_results(valid_results, interviewee_type)
# Returns: {'all_quotes': [...], 'by_theme': {...}, 'top_quotes': [...]}
# Quotes are automatically integrated into:
# 1. Cross-transcript summary prompt
# 2. Narrative report generation (if using narrative report tab)
Example 2: Generating Narrative Report with Storytelling
# In narrative_report_generator.py
pdf_path, word_path, html_path = generate_narrative_report(
csv_path="report.csv",
summary_path="summary.txt",
interviewee_type="HCP",
report_style="executive", # or "detailed" or "presentation"
llm_backend="hf_api"
)
# Generates reports with:
# - Market research-focused narrative
# - Integrated participant quotes
# - Visual callout boxes for key stats
# - Prioritized recommendations with color coding
Example 3: Using Visual Elements Programmatically
from narrative_report_generator import (
create_key_stat_callout,
create_insight_box,
create_quote_box,
create_recommendation_box
)
# Add to PDF story list
story.append(create_key_stat_callout(
stat="12",
description="HCPs Interviewed",
context="In-depth qualitative research"
))
story.append(create_quote_box(
quote="By the time insurance approves, the disease has often progressed.",
attribution="Oncologist, Transcript 3"
))
story.append(create_recommendation_box(
priority="IMMEDIATE",
action="Launch patient bridge program",
details="Address the 4-6 week prior authorization gap identified by 83% of HCPs"
))
File Inventory
Modified Files
story_writer.py- Market research prompt engineeringnarrative_report_generator.py- Visual elements for PDFsapp.py- Quote extraction integration
New Files
quote_extractor.py- Quote extraction and scoring systemMARKET_RESEARCH_ENHANCEMENTS.md- This documentation
Unchanged (Still Used)
report_parser.py- CSV parsingtable_builder.py- Data table generationllm.py/llm_robust.py- LLM interfacevalidation.py- Data quality checksextractors.py,tagging.py,chunking.py- Transcript processing- All other supporting files
Report Style Guide
For Market Research Clients
DO: β Lead with "THE HEADLINE" - most important finding β Use active voice ("HCPs prefer" not "It was preferred") β Include percentages AND counts ("8 out of 12, 67%") β Weave in 5-8 impactful quotes β Connect every finding to business implication β Prioritize recommendations (IMMEDIATE vs. 30 days vs. 90 days) β Use section headers that promise value β Format for skimmers (key points visible quickly)
DON'T: β Use vague language ("many", "most", "some") β Present data without interpretation β Write academic-style "findings" sections β Give generic recommendations β Bury the lead in methodology β Use passive voice β Create walls of text without visual breaks
Testing & Validation
Recommended Test Cases
Small Dataset (3-5 transcripts)
- Verify quote extraction works
- Check that percentages are calculated correctly
- Ensure recommendations are prioritized
Medium Dataset (10-15 transcripts)
- Test consensus level categorization (80%, 60%, 40% thresholds)
- Verify quotes are deduplicated
- Check visual elements render correctly in PDF
Large Dataset (20+ transcripts)
- Ensure quote selection prioritizes impact scores
- Verify performance (quote extraction adds ~5-10 seconds)
- Check PDF file size remains reasonable
Different Interviewee Types
- HCP: Medical terminology, prescribing themes
- Patient: Symptoms, quality of life themes
- Other: General themes
Report Styles
- Executive: Concise, ROI-focused
- Detailed: Comprehensive analysis
- Presentation: Slide-ready format
Future Enhancement Opportunities
Phase 2 (Not Yet Implemented)
Visual Storytelling
- Patient/HCP journey maps
- Timeline visualizations
- Competitive positioning diagrams
- Opportunity sizing matrices
Advanced Quote Features
- Extract from original raw transcripts (not just analyzed text)
- Audio timestamp references (if audio available)
- Quote sentiment scoring
- Thematic quote clustering visualization
Interactive HTML Reports
- Expandable quote sections
- Filterable by theme
- Hover-over definitions for medical terms
- Embedded dashboards
Client Customization
- Industry-specific templates (pharma, medical device, payer)
- Competitor set customization
- Brand name replacement
- Custom color schemes
Multi-Language Support
- Quote translation preservation
- Cultural context notes
- Bilingual reports
Performance Considerations
Quote Extraction:
- Adds ~2-5 seconds per transcript
- Total impact: ~10-30 seconds for 10 transcripts
- Minimal memory overhead
PDF Generation:
- Visual elements add ~50-100KB per report
- No performance impact on generation time
- Slightly larger file sizes (10-20% increase)
LLM Token Usage:
- Quote integration adds ~500-1000 tokens to prompt
- Within acceptable limits for most models
- May need larger context window for 20+ transcripts
Troubleshooting
Issue: No quotes extracted
Cause: Transcript format doesn't match expected patterns
Solution: Check if transcripts have speaker labels or quotation marks. Adjust patterns in quote_extractor.py lines 38-61.
Issue: Low-impact quotes selected
Cause: Scoring weights need adjustment for your use case
Solution: Modify score_quote_impact() in quote_extractor.py lines 145-205 to emphasize different factors.
Issue: PDF visual elements not rendering
Cause: ReportLab version or missing imports
Solution: Verify KeepTogether import on line 11 of narrative_report_generator.py. Update ReportLab: pip install --upgrade reportlab
Issue: Narrative doesn't include quotes
Cause: LLM ignoring quote instructions
Solution: Increase temperature slightly (0.7 β 0.8) in story_writer.py line 93, or add more explicit examples in the prompt.
Backward Compatibility
β All changes are backward compatible
- Existing analysis pipeline unchanged
- Quote extraction is optional (graceful degradation if quotes unavailable)
- Visual elements fall back to plain text if rendering fails
- Legacy report formats still supported
Deployment Checklist
- All new files added to repository
- Dependencies documented (no new dependencies required)
- Backward compatibility verified
- Documentation complete
- User testing with sample client reports
- Performance benchmarking with large datasets
- A/B testing: academic style vs. market research style
Client Success Metrics
Track these to measure enhancement impact:
Report Readability
- Time to understand key findings (target: < 5 minutes)
- % of readers who reach recommendations section
Actionability
- Number of recommendations implemented by client
- Speed of decision-making post-report
Memorability
- Client recall of key findings after 1 week
- Quote usage in client's internal presentations
Business Value
- Client satisfaction scores
- Repeat business rate
- Referrals generated
Support & Maintenance
Primary Contact: Development Team Documentation: This file + inline code comments Version Control: See git history for detailed changes Feedback: Submit issues to project repository
END OF DOCUMENTATION
This enhancement package transforms research data into compelling business stories that drive client action.