# 🎯 New Features Added to Active Reading Demo

## **Category Selection Feature**

### What It Does

Users can now select the document category manually or override automatic category detection.

**Available Categories:**

- **Auto-Detect** (default) - AI detects the domain automatically
- **Finance** - Financial reports, earnings, budgets
- **Legal** - Contracts, agreements, policies
- **Technical** - API docs, manuals, specifications
- **Medical** - Clinical trials, research, treatments
- **General** - Any other document type
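
The selection logic can be sketched as "a manual choice wins, otherwise detect". This is a minimal sketch under stated assumptions: `resolve_category()`, `detect_category()` and the `DOMAIN_CUES` word lists are hypothetical, and a simple keyword count stands in for the demo's AI-based detection.

```python
# Hypothetical sketch: category names mirror the list above, but the cue words
# and the keyword-count detector are assumptions, not the demo's AI detector.
CATEGORIES = ["Auto-Detect", "Finance", "Legal", "Technical", "Medical", "General"]

DOMAIN_CUES = {
    "Finance": ["revenue", "profit", "earnings", "fiscal", "budget"],
    "Legal": ["agreement", "party", "liability", "governed by", "termination"],
    "Technical": ["api", "endpoint", "version", "request", "authentication"],
    "Medical": ["patients", "dosage", "clinical", "efficacy", "side effects"],
}


def detect_category(text: str) -> str:
    """Pick the domain whose cue words occur most often; fall back to General."""
    lowered = text.lower()
    counts = {name: sum(lowered.count(cue) for cue in cues)
              for name, cues in DOMAIN_CUES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "General"


def resolve_category(text: str, selected: str = "Auto-Detect") -> str:
    """Honor a manual selection; run detection only when Auto-Detect is chosen."""
    if selected not in CATEGORIES:
        raise ValueError(f"Unknown category: {selected!r}")
    return detect_category(text) if selected == "Auto-Detect" else selected


print(resolve_category("Revenue rose and profit margins improved."))  # Finance
print(resolve_category("Revenue rose sharply.", "Legal"))             # Legal (override)
```

The plain substring counting is deliberately crude. For example, "capital" contains "api", so a real detector would need word boundaries or a classifier.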

### Category-Specific Extraction Patterns

#### Finance Category

- **Revenue**: `$150 million revenue`, `sales of $2.5B`
- **Profit**: `profit margin 25%`, `net profit $50M`
- **Growth**: `15% growth`, `increased by 20%`
- **Dates**: `Q3 2024`, `fiscal year 2023`
- **Employees**: `hire 200 engineers`, `workforce of 5000`
- **Market Cap**: `market cap $10B`
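
A few of the Finance fields can be expressed as regular expressions. This sketch assumes a field-to-pattern dictionary. The field names, the `extract_finance()` helper and the patterns themselves are illustrative and are tuned only to the sample phrases above, not taken from the demo's code.

```python
import re

# A dollar amount such as "$150 million" or "$2.5B" (illustrative pattern).
AMOUNT = r"\$\d+(?:\.\d+)?\s*(?:million|billion|[MB])\b"

# Hypothetical patterns; only non-capturing groups, so findall returns full matches.
FINANCE_PATTERNS = {
    "revenue": re.compile(rf"{AMOUNT}\s+revenue|sales of {AMOUNT}", re.I),
    "growth": re.compile(r"\b\d+(?:\.\d+)?%\s*growth\b|\bincreased by \d+(?:\.\d+)?%", re.I),
    "date": re.compile(r"\bQ[1-4]\s+\d{4}\b|\bfiscal year \d{4}\b", re.I),
}


def extract_finance(text: str) -> dict[str, list[str]]:
    """Return all matches per field, omitting fields with no hits."""
    results = {}
    for field, pattern in FINANCE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            results[field] = matches
    return results


sample = ("Q3 2024 saw $150 million revenue and 15% growth, "
          "while sales of $2.5B rose versus fiscal year 2023.")
print(extract_finance(sample))
```

Requiring "revenue" or "sales of" around the amount keeps figures such as `net profit $50M` out of the revenue field.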

#### ⚖️ Legal Category

- **Parties**: `between Company A and Company B`
- **Terms**: `term of 36 months`, `duration 3 years`
- **Liability**: `liability not to exceed $1M`
- **Termination**: `90 days written notice`
- **Governing Law**: `governed by laws of Delaware`
- **Effective Date**: `effective January 1, 2024`
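
Legal fields are usually single clauses, so named groups that capture only the value (notice days, jurisdiction) are convenient. This is an illustrative sketch. `LEGAL_PATTERNS` and `extract_legal()` are hypothetical, and the patterns cover only the sample wording above (for example, `term of N months` but not `duration 3 years`).

```python
import re

# Hypothetical clause patterns; each exposes the captured value as group "value".
LEGAL_PATTERNS = {
    "term_months": re.compile(r"term of (?P<value>\d+) months", re.I),
    "notice_days": re.compile(r"(?P<value>\d+) days written notice", re.I),
    # Single capitalized word only, e.g. "Delaware"; multi-word states need more work.
    "governing_law": re.compile(r"governed by (?:the )?laws of (?P<value>[A-Z]\w+)"),
}


def extract_legal(text: str) -> dict[str, str]:
    """Return the first captured value for each clause found in the text."""
    results = {}
    for field, pattern in LEGAL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            results[field] = match.group("value")
    return results


contract = ("This Agreement has a term of 36 months, may be ended with "
            "90 days written notice, and is governed by laws of Delaware.")
print(extract_legal(contract))
```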

#### 🔧 Technical Category

- **API Endpoints**: `GET /api/users`, `POST /auth/login`
- **Versions**: `version 2.1.0`, `v3.5`
- **Response Time**: `response time 150ms`
- **Rate Limits**: `1000 requests per minute`
- **Authentication**: `OAuth 2.0`, `JWT tokens`
- **Status Codes**: `HTTP 200`, `status code 404`
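
For Technical documents, capturing groups let endpoints come back as `(method, path)` pairs and status codes as integers rather than raw text. This is a hedged sketch. The `ENDPOINT`, `STATUS` and `VERSION` patterns and `extract_technical()` are assumptions modeled on the examples above.

```python
import re

# Hypothetical patterns; groups let findall return structured pieces.
ENDPOINT = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(/[\w/{}.-]*)")
STATUS = re.compile(r"\b(?:HTTP|status code)\s+(\d{3})\b", re.I)
VERSION = re.compile(r"\b(?:version\s+|v)(\d+(?:\.\d+)*)\b", re.I)


def extract_technical(text: str) -> dict:
    """Collect endpoints, status codes and version strings from API docs."""
    return {
        "endpoints": ENDPOINT.findall(text),  # list of (method, path) tuples
        "status_codes": [int(code) for code in STATUS.findall(text)],
        "versions": VERSION.findall(text),
    }


doc = ("Version 2.1.0 adds GET /api/users and POST /auth/login; errors return "
       "HTTP 404 or status code 500. See v3.5 notes.")
print(extract_technical(doc))
```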

#### 🏥 Medical Category

- **Dosage**: `50mg daily`, `100ml twice daily`
- **Duration**: `treatment for 12 weeks`
- **Efficacy**: `85% efficacy rate`
- **Side Effects**: `side effects in 12% of patients`
- **Patient Count**: `500 patients enrolled`
- **P-Values**: `p<0.001`, `p=0.025`
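
Medical values benefit from normalization: dosages become numbers with units, and p-values can be compared against a significance level. This is a minimal sketch. `extract_dosages()`, `extract_p_values()` and the 0.05 default `alpha` are illustrative assumptions, not settings documented for the demo.

```python
import re

# Hypothetical patterns modeled on "50mg daily", "100ml twice daily", "p<0.001".
DOSAGE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|ml)\b\s*(daily|twice daily)?", re.I)
P_VALUE = re.compile(r"\bp\s*([<=])\s*(0?\.\d+)", re.I)


def extract_dosages(text: str) -> list[tuple]:
    """Return (amount, unit, frequency-or-None) for each dosage mention."""
    return [(float(amount), unit.lower(), freq or None)
            for amount, unit, freq in DOSAGE.findall(text)]


def extract_p_values(text: str, alpha: float = 0.05) -> list[tuple]:
    """Return (operator, value, significant) for each reported p-value."""
    results = []
    for op, raw in P_VALUE.findall(text):
        value = float(raw)
        # "p<x" is significant when x <= alpha; "p=x" when x < alpha.
        significant = value <= alpha if op == "<" else value < alpha
        results.append((op, value, significant))
    return results


trial = ("Patients took 50mg daily or 100ml twice daily; response improved "
         "(p<0.001) and relapse fell (p=0.025).")
print(extract_dosages(trial))
print(extract_p_values(trial))
```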

## **Custom Keys Feature**

### What It Does

Users can specify their own extraction terms as comma-separated values.

**Example Inputs:**

```
CEO, budget, deadline, timeline
risk assessment, compliance, audit
performance, scalability, security
treatment, dosage, clinical trial
```
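
Before extraction, the comma-separated input needs light normalization. This sketch uses a hypothetical `parse_custom_keys()` helper. It trims whitespace, drops empty entries and removes case-insensitive duplicates, and it keeps multi-word keys such as `risk assessment` intact.

```python
def parse_custom_keys(raw: str) -> list[str]:
    """Split comma-separated input, trim it, and drop blanks and repeats."""
    keys, seen = [], set()
    for part in raw.split(","):
        key = part.strip()
        if key and key.lower() not in seen:
            seen.add(key.lower())
            keys.append(key)  # first spelling wins, e.g. "CEO" over "ceo"
    return keys


print(parse_custom_keys("CEO, budget, , ceo, deadline "))  # ['CEO', 'budget', 'deadline']
```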

### How It Works

- **Smart Extraction**: Finds sentences containing the custom terms
- **Context Preservation**: Returns full sentences, not just keywords
- **Confidence Scoring**: Shows extraction confidence levels
- **JSON Output**: Structured data for easy integration
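
The sentence-level behavior described above can be sketched as "split into sentences, keep the sentences that contain each key". This is a sketch under stated assumptions. `extract_custom_keys()` is hypothetical, the regex sentence splitter is simplistic, and the confidence formula (2.5 points per supporting sentence, capped at 10) is an invented placeholder, because the demo's scoring is not documented here.

```python
import re

# Split after ., ! or ? followed by whitespace (a simplistic sentence splitter).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_custom_keys(text: str, keys: list[str]) -> dict:
    """Return the full sentences that mention each key, plus a placeholder score."""
    sentences = [s.strip() for s in SENTENCE_END.split(text) if s.strip()]
    extractions, scores = {}, {}
    for key in keys:
        # Whole-word, case-insensitive match on the escaped key.
        pattern = re.compile(rf"\b{re.escape(key)}\b", re.I)
        hits = [s for s in sentences if pattern.search(s)]
        if hits:
            extractions[key] = hits
            # Hypothetical 0-10 score: more supporting sentences, higher confidence.
            scores[f"custom_{key}"] = min(10.0, 2.5 * len(hits))
    return {"custom_extractions": extractions, "confidence_scores": scores}


text = ("The CEO announced plans to expand. Budget talks continue. "
        "Our CEO John Smith reported gains!")
print(extract_custom_keys(text, ["CEO", "deadline"]))
```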

## 🎯 **New Strategy: Category-Specific Extraction**

### What's New

Added a specialized strategy that combines:

1. **Category-specific patterns** for targeted extraction
2. **Custom key extraction** for user-defined terms
3. **Structured output** with confidence scores
4. **Domain expertise** for each business category

### Example Output

```json
{
  "category": "Finance",
  "extracted_data": {
    "revenue": ["$150 million", "$2.5 billion sales"],
    "growth": ["15% increase", "20% growth rate"],
    "date": ["Q3 2024", "fiscal year 2023"]
  },
  "custom_extractions": {
    "CEO": ["CEO announced plans to expand", "CEO John Smith reported"],
    "investment": ["$50M investment in AI", "investment in new markets"]
  },
  "confidence_scores": {
    "revenue": 8.5,
    "custom_CEO": 6.2
  }
}
```
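
The example output has a fixed four-key shape, and a typed container makes that contract explicit. In this sketch, `CategoryResult` and `build_category_result()` are hypothetical names, and the values are copied from the example above.

```python
import json
from typing import TypedDict


class CategoryResult(TypedDict):
    """Hypothetical type matching the keys of the example output."""
    category: str
    extracted_data: dict[str, list[str]]
    custom_extractions: dict[str, list[str]]
    confidence_scores: dict[str, float]


def build_category_result(category, extracted_data, custom_extractions,
                          confidence_scores) -> CategoryResult:
    """Bundle category fields, custom-key hits and scores into one JSON-ready dict."""
    return CategoryResult(
        category=category,
        extracted_data=extracted_data,
        custom_extractions=custom_extractions,
        confidence_scores=confidence_scores,
    )


result = build_category_result(
    "Finance",
    {"revenue": ["$150 million"]},
    {"CEO": ["CEO John Smith reported"]},
    {"revenue": 8.5, "custom_CEO": 6.2},
)
print(json.dumps(result, indent=2))
```

Because a `TypedDict` is an ordinary `dict` at runtime, it serializes with `json.dumps` and needs no custom encoder.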

## 🎨 **Enhanced UI Elements**

### New Input Controls

- **Category Dropdown**: Manual category selection
- **Custom Keys Input**: Text field for custom extraction terms
- **Enhanced Strategy Selection**: Added "Category-Specific Extraction"

### New Output Tabs

- **🎯 Category Analysis**: Dedicated tab for category-specific results
- **Enhanced JSON**: Structured category extraction data
- **Confidence Scores**: Shows extraction reliability

### Improved User Experience

- **Dynamic Help Text**: Context-aware guidance
- **Example Suggestions**: Sample custom keys for each category
- **Better Visual Organization**: Clearer result presentation
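
Dynamic help text and per-category suggestions can be driven by a plain lookup table. This is a hedged sketch. The `SUGGESTED_KEYS` mapping and `help_text()` are hypothetical, and the suggestion strings are reused from the example inputs and usage examples in this document.

```python
# Hypothetical category -> suggested custom keys (strings taken from this document).
SUGGESTED_KEYS = {
    "Finance": "CEO, quarterly results, investment",
    "Legal": "liability, termination, governing law",
    "Technical": "security, performance, scalability",
    "Medical": "treatment, dosage, clinical trial",
    "General": "CEO, budget, deadline, timeline",
}


def help_text(category: str) -> str:
    """Return context-aware guidance for the currently selected category."""
    if category == "Auto-Detect":
        return ("The category will be detected automatically. "
                "Add custom keys to target specific terms.")
    example = SUGGESTED_KEYS.get(category, SUGGESTED_KEYS["General"])
    return f"{category} mode: try custom keys such as '{example}'."


print(help_text("Legal"))
```

In a UI toolkit with change events (for example, Gradio's `Dropdown.change`), a function like this could refresh the help text whenever the category changes.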

## **Usage Examples**

### Finance Document Analysis

```
Document Category: Finance
Custom Keys: CEO, quarterly results, investment
Strategy: Category-Specific Extraction
```

**Result**: Extracts revenue figures, profit margins, and growth rates PLUS CEO mentions, quarterly data, and investment information.

### Legal Contract Review

```
Document Category: Legal
Custom Keys: liability, termination, governing law
Strategy: Category-Specific Extraction
```

**Result**: Finds contract parties, terms, and dates PLUS specific liability clauses, termination conditions, and jurisdiction details.

### Technical Documentation

```
Document Category: Technical
Custom Keys: security, performance, scalability
Strategy: Category-Specific Extraction
```

**Result**: Extracts API endpoints, versions, and rate limits PLUS security features, performance metrics, and scalability considerations.
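
The three configurations above can also be written as validated request objects, which catches a mistyped category before any processing runs. This is an illustrative sketch. `ExtractionRequest` is a hypothetical dataclass, not part of the demo, and it assumes the category names listed earlier are the only valid values.

```python
from dataclasses import dataclass

CATEGORIES = {"Auto-Detect", "Finance", "Legal", "Technical", "Medical", "General"}
STRATEGY = "Category-Specific Extraction"


@dataclass(frozen=True)
class ExtractionRequest:
    """Hypothetical container for the settings in the usage examples."""
    category: str
    custom_keys: str = ""
    strategy: str = STRATEGY

    def __post_init__(self):
        # Reject unknown categories early instead of failing during extraction.
        if self.category not in CATEGORIES:
            raise ValueError(f"Unknown category: {self.category!r}")

    @property
    def keys(self) -> list[str]:
        """Custom keys as a trimmed list."""
        return [k.strip() for k in self.custom_keys.split(",") if k.strip()]


EXAMPLES = [
    ExtractionRequest("Finance", "CEO, quarterly results, investment"),
    ExtractionRequest("Legal", "liability, termination, governing law"),
    ExtractionRequest("Technical", "security, performance, scalability"),
]
for request in EXAMPLES:
    print(request.category, request.keys)
```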

## 🎯 **Why This Makes Active Reading Better**

### 1. **Adaptive Intelligence**

- The AI now adapts not only to the document type but also to user-specific needs
- Combines automated domain detection with custom requirements

### 2. **Enterprise Flexibility**

- Users can extract exactly what they need for their business case
- Supports diverse enterprise document analysis workflows

### 3. **Structured Output**

- Category-specific patterns make extraction consistent across documents of the same type
- Custom keys add user-defined flexibility
- JSON format enables easy integration

### 4. **Demonstrable Value**

- Shows how Active Reading adapts to different business domains
- Demonstrates that the framework can handle real enterprise requirements
- Highlights the advantage over one-size-fits-all approaches

## **Implementation Impact**

### What Changed in Code

- **Added**: `extract_category_specific_info()` method
- **Enhanced**: `process_document()` function with category and custom key parameters
- **New**: Category-specific regex patterns for each domain
- **Improved**: UI with additional input controls and output tabs

### Backward Compatibility

- ✅ All existing strategies still work
- ✅ Auto-detection remains the default
- ✅ Original demo functionality preserved
- ✅ Enhanced with new capabilities
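
One common way to add parameters without breaking existing callers is to append them as keyword arguments with defaults. This sketch shows the pattern. The real `process_document()` signature and return value are not shown in this document, so the function below is a hypothetical stand-in that only echoes its settings, and the strategy name `"existing-strategy"` is a placeholder.

```python
def process_document(text, strategy, category="Auto-Detect", custom_keys=""):
    """Hypothetical stand-in: new parameters default to the old behavior."""
    keys = [k.strip() for k in custom_keys.split(",") if k.strip()]
    return {"strategy": strategy, "category": category, "custom_keys": keys}


# Old-style call: unchanged, gets auto-detection and no custom keys.
print(process_document("some text", "existing-strategy"))

# New-style call opts in to the category and custom keys explicitly.
print(process_document("some text", "Category-Specific Extraction",
                       category="Finance", custom_keys="CEO, investment"))
```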

This makes your Active Reading demo much more interactive and showcases the adaptive intelligence that makes it superior to traditional document processing approaches!