# RLHF (Reinforcement Learning from Human Feedback) Features

## Overview

FinRyver now includes RLHF capabilities that allow the system to learn from human feedback and improve the quality of generated financial statements over time.

## Key Components

### 1. **Enhanced Workflows**

- RLHF-enhanced versions of all financial statement generation workflows
- Multiple candidate generation and selection using reward models
- Quality prediction and confidence scoring

### 2. **Feedback Collection System**

- Web-based review interface for human feedback
- Structured feedback forms with technical and quality metrics
- Storage and management of feedback data

### 3. **Reward Model**

- Machine learning model that predicts statement quality
- Trained on human feedback data
- Retrained automatically when sufficient new feedback is available
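The reward model's role can be sketched as a function from statement features to a predicted quality score on the 1-5 scale used throughout the feedback forms. The sketch below is stdlib-only and uses hypothetical feature names and hand-picked weights purely for illustration; the actual model is a trained regressor learned from collected feedback.

```python
# Minimal sketch of a reward model: maps statement features to a
# predicted quality score on the 1-5 scale used by the feedback forms.
# Feature names and weights are hypothetical placeholders; the real
# model is trained on human feedback data.

def predict_quality(features: dict) -> float:
    """Return a predicted quality score clamped to the 1-5 range."""
    weights = {  # hypothetical learned weights
        "balance_check_passed": 1.5,
        "note_count": 0.05,
        "classification_confidence": 2.0,
    }
    base = 1.0  # floor of the 1-5 scale
    score = base + sum(weights[k] * features.get(k, 0.0) for k in weights)
    return max(1.0, min(5.0, score))

example = {"balance_check_passed": 1.0, "classification_confidence": 0.9}
print(predict_quality(example))  # a value between 1 and 5
```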
## Usage

### Basic Financial Statement Generation

**Standard workflow (existing functionality):**

```bash
curl -X POST "http://localhost:8000/notes" \
  -F "file=@trial_balance.xlsx"
```

**RLHF-enhanced workflow:**

```bash
curl -X POST "http://localhost:8000/notes?use_rlhf=true" \
  -F "file=@trial_balance.xlsx"
```
The RLHF-enhanced workflow will:

1. Generate multiple candidates (if the reward model is trained)
2. Use the reward model to select the best candidate
3. Provide quality predictions and confidence scores
4. Store the result for potential human feedback
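The candidate-selection step above amounts to best-of-n sampling against the reward model: score every candidate and keep the highest-scoring one. A minimal sketch, where `generate_candidate` and `score_candidate` are hypothetical stand-ins for the real generation workflow and reward model:

```python
# Best-of-n candidate selection: generate several candidate statements,
# score each with the reward model, and keep the highest-scoring one.
# generate_candidate / score_candidate are placeholder stand-ins.

def generate_candidate(seed: int) -> str:
    return f"statement-draft-{seed}"  # placeholder generation

def score_candidate(candidate: str) -> float:
    return len(candidate) % 5 + 1.0   # placeholder reward score

def select_best(n: int = 3) -> tuple[str, float]:
    candidates = [generate_candidate(i) for i in range(n)]
    scored = [(c, score_candidate(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

best, score = select_best(3)
print(best, score)
```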
### Response Headers

When using RLHF workflows, additional metadata is included in the response headers:

- `X-RLHF-Statement-ID`: Unique ID for the generated statement
- `X-RLHF-Quality-Score`: Predicted quality score (1-5)
- `X-RLHF-Confidence`: Model confidence in the prediction
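On the client side, these headers can be pulled out of any HTTP response object's header mapping. The sketch below uses a mocked headers dict rather than a live request; the parsing logic is the only point being illustrated.

```python
# Client-side parsing of the RLHF response headers. `mock` stands in
# for the headers mapping of a real HTTP response.

def parse_rlhf_headers(headers: dict):
    """Extract RLHF metadata, or None if the RLHF workflow was not used."""
    if "X-RLHF-Statement-ID" not in headers:
        return None
    return {
        "statement_id": headers["X-RLHF-Statement-ID"],
        "quality_score": float(headers["X-RLHF-Quality-Score"]),
        "confidence": float(headers["X-RLHF-Confidence"]),
    }

mock = {
    "X-RLHF-Statement-ID": "123e4567-e89b-12d3-a456-426614174000",
    "X-RLHF-Quality-Score": "4.2",
    "X-RLHF-Confidence": "0.87",
}
print(parse_rlhf_headers(mock))
```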
### Feedback Collection

#### 1. Get Statements Needing Review

```bash
curl "http://localhost:8000/rlhf/pending-reviews"
```

#### 2. Review Interface

Visit: `http://localhost:8000/rlhf/review/{statement_id}`

This provides an HTML form for structured feedback collection.

#### 3. Submit Feedback Programmatically

```bash
curl -X POST "http://localhost:8000/rlhf/feedback" \
  -F "statement_id=123e4567-e89b-12d3-a456-426614174000" \
  -F "calculation_accuracy=4" \
  -F "account_classification=5" \
  -F "statement_balance=4" \
  -F "accounting_standards=4" \
  -F "regulatory_compliance=5" \
  -F "completeness=3" \
  -F "professional_presentation=4" \
  -F "would_accept_for_audit=true" \
  -F "specific_errors=Minor formatting issues" \
  -F "improvement_suggestions=Add more detailed notes"
```
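When building this payload in code, it is worth validating the score fields before submitting. The field names below mirror the curl example; the 1-5 range check is a client-side sketch and is an assumption about (not a copy of) the server's own validation.

```python
# Validate a feedback payload before submitting it to /rlhf/feedback.
# Field names mirror the curl example; the range check is a client-side
# sketch, not the server's actual validation logic.

SCORE_FIELDS = [
    "calculation_accuracy", "account_classification", "statement_balance",
    "accounting_standards", "regulatory_compliance", "completeness",
    "professional_presentation",
]

def validate_feedback(payload: dict) -> list:
    """Return a list of validation errors (empty if the payload is valid)."""
    errors = []
    if not payload.get("statement_id"):
        errors.append("statement_id is required")
    for field in SCORE_FIELDS:
        value = payload.get(field)
        if not isinstance(value, int) or not 1 <= value <= 5:
            errors.append(f"{field} must be an integer from 1 to 5")
    return errors

payload = {"statement_id": "123e4567-e89b-12d3-a456-426614174000",
           **{field: 4 for field in SCORE_FIELDS},
           "would_accept_for_audit": True}
print(validate_feedback(payload))  # []
```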
### Monitoring and Statistics

#### Get Feedback Statistics

```bash
curl "http://localhost:8000/rlhf/stats"
```

Returns:

- Total feedback collected
- Average quality scores
- Audit approval rates
- Model training status
- Feature importance
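The averages and approval rate behind these statistics can be derived directly from stored feedback records. A minimal sketch, using an illustrative record shape based on the feedback fields above:

```python
# Sketch of the aggregates behind /rlhf/stats: average quality score and
# audit approval rate computed from stored feedback records. The record
# shape is illustrative, not the service's actual storage schema.

def summarize_feedback(records: list) -> dict:
    if not records:
        return {"total_feedback": 0}
    scores = [r["calculation_accuracy"] for r in records]
    approvals = [r["would_accept_for_audit"] for r in records]
    return {
        "total_feedback": len(records),
        "avg_calculation_accuracy": sum(scores) / len(scores),
        "audit_approval_rate": sum(approvals) / len(approvals),
    }

records = [
    {"calculation_accuracy": 4, "would_accept_for_audit": True},
    {"calculation_accuracy": 5, "would_accept_for_audit": True},
    {"calculation_accuracy": 3, "would_accept_for_audit": False},
]
print(summarize_feedback(records))
```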
#### Get Model Information

```bash
curl "http://localhost:8000/rlhf/model-info"
```

#### Manual Model Retraining

```bash
curl -X POST "http://localhost:8000/rlhf/retrain"
```
## Feedback Metrics

### Technical Accuracy (1-5 scale)

- **Calculation Accuracy**: Mathematical correctness
- **Account Classification**: Proper categorization of accounts
- **Statement Balance**: Internal consistency and reconciliation

### Compliance (1-5 scale)

- **Accounting Standards**: GAAP/IFRS compliance
- **Regulatory Compliance**: Meeting regulatory requirements

### Quality (1-5 scale)

- **Completeness**: All necessary items included
- **Professional Presentation**: Formatting and language quality

### Qualitative Feedback

- **Specific Errors**: Detailed error descriptions
- **Missing Items**: Items that should be included
- **Improvement Suggestions**: Recommendations for enhancement
- **Audit Acceptance**: Binary approval for professional use
## Training Process

1. **Initial Phase**: System operates with default models
2. **Feedback Collection**: Human experts review generated statements
3. **Model Training**: When 20+ feedback samples are available, the reward model is trained
4. **Enhanced Generation**: RLHF workflows use the trained model for better results
5. **Continuous Learning**: The model retrains automatically as new feedback arrives
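The automatic retraining step reduces to a threshold check. The 20-sample figure comes from step 3 above; counting "new samples since the last training run" is an assumption about how the trigger is implemented.

```python
# Sketch of the automatic retraining trigger: retrain the reward model
# once enough new feedback has accumulated since the last training run.
# The 20-sample threshold comes from the training process described
# above; the "new since last run" bookkeeping is an assumption.

MIN_NEW_SAMPLES = 20

def should_retrain(total_feedback: int, trained_on: int) -> bool:
    """Retrain when at least MIN_NEW_SAMPLES arrived since last training."""
    return total_feedback - trained_on >= MIN_NEW_SAMPLES

print(should_retrain(45, 20))  # True: 25 new samples
print(should_retrain(30, 20))  # False: only 10 new samples
```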
## Benefits

- **Quality Improvement**: Statements become more accurate over time
- **Domain Adaptation**: System learns specific requirements and preferences
- **Consistency**: Reduces variability in output quality
- **Professional Standards**: Aligns with human expert expectations

## Implementation Notes

- RLHF features are optional and backward-compatible
- Existing workflows continue to work unchanged
- Feedback data is stored locally and can be exported for analysis
- Models can be backed up and restored
- Multiple reward models can be maintained for different statement types
## File Structure

```
data/
├── feedback/
│   ├── human_feedback.json        # Collected feedback data
│   └── generated_statements.json  # Statement metadata
└── models/
    ├── reward_model.pkl           # Trained reward model
    ├── feature_names.json         # Model feature definitions
    └── model_stats.json           # Training statistics
```
## Security and Privacy

- Feedback data is stored locally
- No external transmission of financial data
- Anonymous feedback collection is supported
- Data can be cleaned/anonymized before training