# RLHF (Reinforcement Learning from Human Feedback) Features
## Overview
FinRyver now includes RLHF capabilities that allow the system to learn from human feedback and improve the quality of generated financial statements over time.
## Key Components
### 1. **Enhanced Workflows**
- RLHF-enhanced versions of all financial statement generation workflows
- Multiple candidate generation and selection using reward models
- Quality prediction and confidence scoring
### 2. **Feedback Collection System**
- Web-based review interface for human feedback
- Structured feedback forms with technical and quality metrics
- Storage and management of feedback data
### 3. **Reward Model**
- Machine learning model that predicts statement quality
- Trained on human feedback data
- Automatic retraining when sufficient new feedback is available
## Usage
### Basic Financial Statement Generation
**Standard workflow (existing functionality):**
```bash
curl -X POST "http://localhost:8000/notes" \
-F "file=@trial_balance.xlsx"
```
**RLHF-enhanced workflow:**
```bash
curl -X POST "http://localhost:8000/notes?use_rlhf=true" \
-F "file=@trial_balance.xlsx"
```
The RLHF-enhanced workflow will:
1. Generate multiple candidates (if reward model is trained)
2. Use the reward model to select the best candidate
3. Provide quality predictions and confidence scores
4. Store the result for potential human feedback
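The candidate-selection step (1-2 above) can be sketched as a best-of-n loop over a reward model's scoring function. This is an illustrative sketch, not FinRyver's actual implementation; `select_best_candidate` and the toy `score_fn` are hypothetical names:

```python
from typing import Callable, List, Tuple

def select_best_candidate(
    candidates: List[str],
    score_fn: Callable[[str], float],
) -> Tuple[str, float]:
    """Score every generated candidate and return the highest-scoring one."""
    scored = [(score_fn(c), c) for c in candidates]
    best_score, best_text = max(scored, key=lambda pair: pair[0])
    return best_text, best_score

# Toy reward model for demonstration: longer drafts score higher.
# A real reward model would predict quality from statement features.
best, score = select_best_candidate(
    ["draft A", "longer draft B", "draft C"],
    score_fn=lambda text: float(len(text)),
)
```

In the real workflow, `score_fn` would be the trained reward model's prediction, and the returned score would feed the quality/confidence headers described below.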
### Response Headers
When using RLHF workflows, additional metadata is included in response headers:
- `X-RLHF-Statement-ID`: Unique ID for the generated statement
- `X-RLHF-Quality-Score`: Predicted quality score (1-5)
- `X-RLHF-Confidence`: Model confidence in the prediction
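A client can pull these headers into a typed structure after each request. The header names come from the list above; the parsing helper and the example values are illustrative (the headers mapping stands in for any HTTP client's response headers):

```python
def parse_rlhf_headers(headers: dict) -> dict:
    """Extract the RLHF metadata fields from a response's headers."""
    return {
        "statement_id": headers.get("X-RLHF-Statement-ID"),
        "quality_score": float(headers["X-RLHF-Quality-Score"]),
        "confidence": float(headers["X-RLHF-Confidence"]),
    }

# Example values are made up for demonstration.
meta = parse_rlhf_headers({
    "X-RLHF-Statement-ID": "123e4567-e89b-12d3-a456-426614174000",
    "X-RLHF-Quality-Score": "4.2",
    "X-RLHF-Confidence": "0.87",
})
```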
### Feedback Collection
#### 1. Get Statements Needing Review
```bash
curl "http://localhost:8000/rlhf/pending-reviews"
```
#### 2. Review Interface
Visit: `http://localhost:8000/rlhf/review/{statement_id}`
This provides an HTML form for structured feedback collection.
#### 3. Submit Feedback Programmatically
```bash
curl -X POST "http://localhost:8000/rlhf/feedback" \
-F "statement_id=123e4567-e89b-12d3-a456-426614174000" \
-F "calculation_accuracy=4" \
-F "account_classification=5" \
-F "statement_balance=4" \
-F "accounting_standards=4" \
-F "regulatory_compliance=5" \
-F "completeness=3" \
-F "professional_presentation=4" \
-F "would_accept_for_audit=true" \
-F "specific_errors=Minor formatting issues" \
-F "improvement_suggestions=Add more detailed notes"
```
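Before POSTing, it can help to validate the payload client-side: all rating fields are 1-5 integers and the audit flag is boolean. The field names mirror the curl example above; the validation helper itself is a sketch, not part of the FinRyver API:

```python
RATING_FIELDS = [
    "calculation_accuracy", "account_classification", "statement_balance",
    "accounting_standards", "regulatory_compliance", "completeness",
    "professional_presentation",
]

def validate_feedback(payload: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field in RATING_FIELDS:
        value = payload.get(field)
        if not isinstance(value, int) or not 1 <= value <= 5:
            errors.append(f"{field} must be an integer from 1 to 5")
    if not isinstance(payload.get("would_accept_for_audit"), bool):
        errors.append("would_accept_for_audit must be a boolean")
    return errors
```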
### Monitoring and Statistics
#### Get Feedback Statistics
```bash
curl "http://localhost:8000/rlhf/stats"
```
Returns:
- Total feedback collected
- Average quality scores
- Audit approval rates
- Model training status
- Feature importance
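The headline aggregates could be computed from the stored feedback records along these lines. Field names follow the feedback form above; the aggregation code (and which rating drives the average) is illustrative:

```python
def summarize_feedback(records: list) -> dict:
    """Aggregate stored feedback records into headline statistics."""
    if not records:
        return {"total_feedback": 0}
    accuracy = [r["calculation_accuracy"] for r in records]
    approvals = [r["would_accept_for_audit"] for r in records]
    return {
        "total_feedback": len(records),
        "avg_calculation_accuracy": sum(accuracy) / len(accuracy),
        "audit_approval_rate": sum(approvals) / len(approvals),
    }

stats = summarize_feedback([
    {"calculation_accuracy": 4, "would_accept_for_audit": True},
    {"calculation_accuracy": 2, "would_accept_for_audit": False},
])
```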
#### Get Model Information
```bash
curl "http://localhost:8000/rlhf/model-info"
```
#### Manual Model Retraining
```bash
curl -X POST "http://localhost:8000/rlhf/retrain"
```
## Feedback Metrics
### Technical Accuracy (1-5 scale)
- **Calculation Accuracy**: Mathematical correctness
- **Account Classification**: Proper categorization of accounts
- **Statement Balance**: Internal consistency and reconciliation
### Compliance (1-5 scale)
- **Accounting Standards**: GAAP/IFRS compliance
- **Regulatory Compliance**: Meeting regulatory requirements
### Quality (1-5 scale)
- **Completeness**: All necessary items included
- **Professional Presentation**: Formatting and language quality
### Qualitative Feedback
- **Specific Errors**: Detailed error descriptions
- **Missing Items**: Items that should be included
- **Improvement Suggestions**: Recommendations for enhancement
- **Audit Acceptance**: Binary approval for professional use
## Training Process
1. **Initial Phase**: System operates with default models
2. **Feedback Collection**: Human experts review generated statements
3. **Model Training**: Once 20 or more feedback samples are available, the reward model is trained
4. **Enhanced Generation**: RLHF workflows use trained model for better results
5. **Continuous Learning**: Model retrains automatically with new feedback
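The training trigger in steps 3 and 5 can be sketched as a simple threshold check. The minimum of 20 samples comes from the docs above; `NEW_SAMPLE_TRIGGER` is an assumed batch size for automatic retraining, not a documented value:

```python
MIN_SAMPLES = 20         # training threshold stated in the docs
NEW_SAMPLE_TRIGGER = 10  # assumed: retrain after this many new samples

def should_train(total_samples: int, samples_at_last_train: int) -> bool:
    """Decide whether to train (or retrain) the reward model."""
    if total_samples < MIN_SAMPLES:
        return False
    if samples_at_last_train == 0:
        return True  # first training run
    return total_samples - samples_at_last_train >= NEW_SAMPLE_TRIGGER
```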
## Benefits
- **Quality Improvement**: Statements become more accurate over time
- **Domain Adaptation**: System learns specific requirements and preferences
- **Consistency**: Reduces variability in output quality
- **Professional Standards**: Aligns with human expert expectations
## Implementation Notes
- RLHF features are optional and backward-compatible
- Existing workflows continue to work unchanged
- Feedback data is stored locally and can be exported for analysis
- Models can be backed up and restored
- Multiple reward models can be maintained for different statement types
## File Structure
```
data/
├── feedback/
│   ├── human_feedback.json          # Collected feedback data
│   └── generated_statements.json    # Statement metadata
└── models/
    ├── reward_model.pkl             # Trained reward model
    ├── feature_names.json           # Model feature definitions
    └── model_stats.json             # Training statistics
```
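Loading the model artifacts might look like the following. The file names match the structure above; the loader function, the pickled payload, and the round-trip demo directory are all illustrative:

```python
import json
import pickle
import tempfile
from pathlib import Path

def load_reward_model(models_dir: str):
    """Load the trained reward model and its feature definitions from disk."""
    base = Path(models_dir)
    with open(base / "reward_model.pkl", "rb") as f:
        model = pickle.load(f)
    feature_names = json.loads((base / "feature_names.json").read_text())
    return model, feature_names

# Round-trip demo with stand-in artifacts in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "reward_model.pkl").write_bytes(pickle.dumps({"weights": [0.5]}))
    (Path(d) / "feature_names.json").write_text(json.dumps(["completeness"]))
    model, features = load_reward_model(d)
```

Note that `pickle.load` should only be used on model files from a trusted local source, which fits the local-storage design described below.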
## Security and Privacy
- Feedback data is stored locally
- No external transmission of financial data
- Anonymous feedback collection supported
- Data can be cleaned/anonymized before training