RLHF (Reinforcement Learning from Human Feedback) Features
Overview
FinRyver now includes RLHF capabilities that allow the system to learn from human feedback and improve the quality of generated financial statements over time.
Key Components
1. Enhanced Workflows
- RLHF-enhanced versions of all financial statement generation workflows
- Multiple candidate generation and selection using reward models (see the sketch after this list)
- Quality prediction and confidence scoring
2. Feedback Collection System
- Web-based review interface for human feedback
- Structured feedback forms with technical and quality metrics
- Storage and management of feedback data
3. Reward Model
- Machine learning model that predicts statement quality
- Trained on human feedback data
- Automatic retraining when sufficient new feedback is available
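The candidate-selection step in component 1 can be pictured with the following Python sketch. It assumes a scikit-learn-style reward model stored at the path shown under File Structure below; generate_statement() and extract_features() are hypothetical helpers standing in for FinRyver's actual generation and feature-extraction code.

import pickle

def select_best_candidate(trial_balance, n_candidates=3):
    # load the trained reward model (path from the File Structure section)
    with open("data/models/reward_model.pkl", "rb") as f:
        reward_model = pickle.load(f)
    # generate_statement() is a hypothetical stand-in for the real workflow
    candidates = [generate_statement(trial_balance) for _ in range(n_candidates)]
    # extract_features() (hypothetical) maps a statement to the numeric
    # vector the reward model was trained on
    scores = reward_model.predict([extract_features(c) for c in candidates])
    best_score, best = max(zip(scores, candidates), key=lambda pair: pair[0])
    return best, float(best_score)  # statement plus its predicted quality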
Usage
Basic Financial Statement Generation
Standard workflow (existing functionality):
curl -X POST "http://localhost:8000/notes" \
-F "file=@trial_balance.xlsx"
RLHF-enhanced workflow:
curl -X POST "http://localhost:8000/notes?use_rlhf=true" \
-F "file=@trial_balance.xlsx"
The RLHF-enhanced workflow will:
- Generate multiple candidates (if the reward model is trained)
- Use the reward model to select the best candidate
- Provide quality predictions and confidence scores
- Store the result for potential human feedback
Response Headers
When using RLHF workflows, additional metadata is included in response headers:
- X-RLHF-Statement-ID: Unique ID for the generated statement
- X-RLHF-Quality-Score: Predicted quality score (1-5)
- X-RLHF-Confidence: Model confidence in the prediction
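As a sketch, the same call can be made from Python with the requests library and the headers read back. The endpoint and header names come from this document; the file name is an example.

import requests

with open("trial_balance.xlsx", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/notes",
        params={"use_rlhf": "true"},
        files={"file": f},
    )
resp.raise_for_status()
print("Statement ID:", resp.headers.get("X-RLHF-Statement-ID"))
print("Quality score:", resp.headers.get("X-RLHF-Quality-Score"))
print("Confidence:", resp.headers.get("X-RLHF-Confidence"))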
Feedback Collection
1. Get Statements Needing Review
curl "http://localhost:8000/rlhf/pending-reviews"
2. Review Interface
Visit: http://localhost:8000/rlhf/review/{statement_id}
This provides an HTML form for structured feedback collection.
3. Submit Feedback Programmatically
curl -X POST "http://localhost:8000/rlhf/feedback" \
-F "statement_id=123e4567-e89b-12d3-a456-426614174000" \
-F "calculation_accuracy=4" \
-F "account_classification=5" \
-F "statement_balance=4" \
-F "accounting_standards=4" \
-F "regulatory_compliance=5" \
-F "completeness=3" \
-F "professional_presentation=4" \
-F "would_accept_for_audit=true" \
-F "specific_errors=Minor formatting issues" \
-F "improvement_suggestions=Add more detailed notes"
Monitoring and Statistics
Get Feedback Statistics
curl "http://localhost:8000/rlhf/stats"
Returns:
- Total feedback collected
- Average quality scores
- Audit approval rates
- Model training status
- Feature importance
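A minimal sketch of reading these statistics from Python; the JSON key names below are hypothetical, inferred from the list above, and the actual response shape may differ.

import requests

stats = requests.get("http://localhost:8000/rlhf/stats").json()
# key names are assumptions, not confirmed by this document
print("Feedback collected:", stats.get("total_feedback"))
print("Average quality:", stats.get("average_quality_scores"))
print("Audit approval rate:", stats.get("audit_approval_rate"))
print("Model trained:", stats.get("model_trained"))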
Get Model Information
curl "http://localhost:8000/rlhf/model-info"
Manual Model Retraining
curl -X POST "http://localhost:8000/rlhf/retrain"
Feedback Metrics
Technical Accuracy (1-5 scale)
- Calculation Accuracy: Mathematical correctness
- Account Classification: Proper categorization of accounts
- Statement Balance: Internal consistency and reconciliation
Compliance (1-5 scale)
- Accounting Standards: GAAP/IFRS compliance
- Regulatory Compliance: Meeting regulatory requirements
Quality (1-5 scale)
- Completeness: All necessary items included
- Professional Presentation: Formatting and language quality
Qualitative Feedback
- Specific Errors: Detailed error descriptions
- Missing Items: Items that should be included
- Improvement Suggestions: Recommendations for enhancement
- Audit Acceptance: Binary approval for professional use
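Taken together, these fields make up one feedback record. A minimal sketch of that schema as a Python dataclass, using the field names from the /rlhf/feedback form; the class itself is illustrative, not FinRyver's actual model.

from dataclasses import dataclass
from typing import Optional

@dataclass
class HumanFeedback:
    statement_id: str
    # technical accuracy (1-5)
    calculation_accuracy: int
    account_classification: int
    statement_balance: int
    # compliance (1-5)
    accounting_standards: int
    regulatory_compliance: int
    # quality (1-5)
    completeness: int
    professional_presentation: int
    # qualitative
    would_accept_for_audit: bool
    specific_errors: Optional[str] = None
    missing_items: Optional[str] = None
    improvement_suggestions: Optional[str] = None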
Training Process
1. Initial Phase: The system operates with default models
2. Feedback Collection: Human experts review generated statements
3. Model Training: Once 20+ feedback samples are available, the reward model is trained
4. Enhanced Generation: RLHF workflows use the trained model for better results
5. Continuous Learning: The model retrains automatically as new feedback arrives
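A minimal sketch of the retraining trigger in steps 3 and 5, assuming feedback is stored as a JSON list at the path shown under File Structure. extract_features() is a hypothetical helper, and both the regressor choice and the mean-of-scores quality label are assumptions for illustration.

import json
import pickle
from sklearn.ensemble import RandomForestRegressor

MIN_SAMPLES = 20  # threshold named in step 3

SCORE_FIELDS = [
    "calculation_accuracy", "account_classification", "statement_balance",
    "accounting_standards", "regulatory_compliance",
    "completeness", "professional_presentation",
]

with open("data/feedback/human_feedback.json") as f:
    feedback = json.load(f)

if len(feedback) >= MIN_SAMPLES:
    # extract_features() (hypothetical) maps a reviewed statement to the
    # numeric features the reward model scores at generation time
    X = [extract_features(record["statement_id"]) for record in feedback]
    # label each statement with the mean of its seven 1-5 scores (assumption)
    y = [sum(record[name] for name in SCORE_FIELDS) / len(SCORE_FIELDS)
         for record in feedback]
    model = RandomForestRegressor(n_estimators=100).fit(X, y)
    with open("data/models/reward_model.pkl", "wb") as f:
        pickle.dump(model, f)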
Benefits
- Quality Improvement: Statements become more accurate over time
- Domain Adaptation: System learns specific requirements and preferences
- Consistency: Reduces variability in output quality
- Professional Standards: Aligns with human expert expectations
Implementation Notes
- RLHF features are optional and backward-compatible
- Existing workflows continue to work unchanged
- Feedback data is stored locally and can be exported for analysis
- Models can be backed up and restored
- Multiple reward models can be maintained for different statement types
File Structure
data/
├── feedback/
│   ├── human_feedback.json        # Collected feedback data
│   └── generated_statements.json  # Statement metadata
└── models/
    ├── reward_model.pkl           # Trained reward model
    ├── feature_names.json         # Model feature definitions
    └── model_stats.json           # Training statistics
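A sketch of loading these artifacts, for example before backing them up. The paths match the tree above; the exact contents of the JSON files are assumed.

import json
import pickle

with open("data/models/reward_model.pkl", "rb") as f:
    reward_model = pickle.load(f)
with open("data/models/feature_names.json") as f:
    feature_names = json.load(f)
with open("data/models/model_stats.json") as f:
    stats = json.load(f)
print(type(reward_model).__name__, len(feature_names), "features")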
Security and Privacy
- Feedback data is stored locally
- No external transmission of financial data
- Anonymous feedback collection supported
- Data can be cleaned/anonymized before training (see the sketch below)
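A sketch of the last point: dropping the free-text fields from stored feedback before training, keeping only the numeric scores. Field names follow the feedback form; the output file name is illustrative.

import json

TEXT_FIELDS = {"specific_errors", "missing_items", "improvement_suggestions"}

with open("data/feedback/human_feedback.json") as f:
    feedback = json.load(f)

cleaned = [
    {key: value for key, value in record.items() if key not in TEXT_FIELDS}
    for record in feedback
]
with open("data/feedback/human_feedback_clean.json", "w") as f:
    json.dump(cleaned, f, indent=2)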