# RLHF (Reinforcement Learning from Human Feedback) Features

## Overview

FinRyver now includes RLHF capabilities that allow the system to learn from human feedback and improve the quality of generated financial statements over time.

## Key Components

### 1. **Enhanced Workflows**

- RLHF-enhanced versions of all financial statement generation workflows
- Multiple candidate generation and selection using reward models
- Quality prediction and confidence scoring

### 2. **Feedback Collection System**

- Web-based review interface for human feedback
- Structured feedback forms with technical and quality metrics
- Storage and management of feedback data

### 3. **Reward Model**

- Machine learning model that predicts statement quality
- Trained on human feedback data
- Automatic retraining when sufficient new feedback is available

## Usage

### Basic Financial Statement Generation

**Standard workflow (existing functionality):**

```bash
curl -X POST "http://localhost:8000/notes" \
  -F "file=@trial_balance.xlsx"
```

**RLHF-enhanced workflow:**

```bash
curl -X POST "http://localhost:8000/notes?use_rlhf=true" \
  -F "file=@trial_balance.xlsx"
```

The RLHF-enhanced workflow will:

1. Generate multiple candidates (if the reward model is trained)
2. Use the reward model to select the best candidate
3. Provide quality predictions and confidence scores
4. Store the result for potential human feedback

### Response Headers

When using RLHF workflows, additional metadata is included in the response headers:

- `X-RLHF-Statement-ID`: Unique ID for the generated statement
- `X-RLHF-Quality-Score`: Predicted quality score (1-5)
- `X-RLHF-Confidence`: Model confidence in the prediction

### Feedback Collection

#### 1. Get Statements Needing Review

```bash
curl "http://localhost:8000/rlhf/pending-reviews"
```

#### 2. Review Interface

Visit: `http://localhost:8000/rlhf/review/{statement_id}`

This provides an HTML form for structured feedback collection.

#### 3. Submit Feedback Programmatically

```bash
curl -X POST "http://localhost:8000/rlhf/feedback" \
  -F "statement_id=123e4567-e89b-12d3-a456-426614174000" \
  -F "calculation_accuracy=4" \
  -F "account_classification=5" \
  -F "statement_balance=4" \
  -F "accounting_standards=4" \
  -F "regulatory_compliance=5" \
  -F "completeness=3" \
  -F "professional_presentation=4" \
  -F "would_accept_for_audit=true" \
  -F "specific_errors=Minor formatting issues" \
  -F "improvement_suggestions=Add more detailed notes"
```

### Monitoring and Statistics

#### Get Feedback Statistics

```bash
curl "http://localhost:8000/rlhf/stats"
```

Returns:

- Total feedback collected
- Average quality scores
- Audit approval rates
- Model training status
- Feature importance

#### Get Model Information

```bash
curl "http://localhost:8000/rlhf/model-info"
```

#### Manual Model Retraining

```bash
curl -X POST "http://localhost:8000/rlhf/retrain"
```

## Feedback Metrics

### Technical Accuracy (1-5 scale)

- **Calculation Accuracy**: Mathematical correctness
- **Account Classification**: Proper categorization of accounts
- **Statement Balance**: Internal consistency and reconciliation

### Compliance (1-5 scale)

- **Accounting Standards**: GAAP/IFRS compliance
- **Regulatory Compliance**: Meeting regulatory requirements

### Quality (1-5 scale)

- **Completeness**: All necessary items included
- **Professional Presentation**: Formatting and language quality

### Qualitative Feedback

- **Specific Errors**: Detailed error descriptions
- **Missing Items**: Items that should be included
- **Improvement Suggestions**: Recommendations for enhancement
- **Audit Acceptance**: Binary approval for professional use

## Training Process

1. **Initial Phase**: The system operates with default models
2. **Feedback Collection**: Human experts review generated statements
3. **Model Training**: Once 20+ feedback samples are available, the reward model is trained
4. **Enhanced Generation**: RLHF workflows use the trained model for better results
5. **Continuous Learning**: The model retrains automatically as new feedback arrives

## Benefits

- **Quality Improvement**: Statements become more accurate over time
- **Domain Adaptation**: The system learns specific requirements and preferences
- **Consistency**: Reduces variability in output quality
- **Professional Standards**: Aligns output with human expert expectations

## Implementation Notes

- RLHF features are optional and backward-compatible
- Existing workflows continue to work unchanged
- Feedback data is stored locally and can be exported for analysis
- Models can be backed up and restored
- Multiple reward models can be maintained for different statement types

## File Structure

```
data/
├── feedback/
│   ├── human_feedback.json        # Collected feedback data
│   └── generated_statements.json  # Statement metadata
└── models/
    ├── reward_model.pkl           # Trained reward model
    ├── feature_names.json         # Model feature definitions
    └── model_stats.json           # Training statistics
```

## Security and Privacy

- Feedback data is stored locally
- No external transmission of financial data
- Anonymous feedback collection is supported
- Data can be cleaned/anonymized before training
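## Appendix: Candidate Selection Sketch

The multiple-candidate generation and selection described under "Key Components" can be sketched roughly as follows. This is a minimal illustration only, not FinRyver's actual implementation: the `Candidate` and `RewardModel` names, the linear scoring, and the feature dictionaries are all hypothetical stand-ins for whatever the trained model in `reward_model.pkl` actually does.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One generated financial statement plus the features extracted from it."""
    statement: str
    features: Dict[str, float]


class RewardModel:
    """Toy stand-in for the trained reward model (hypothetical, not the real API)."""

    def __init__(self, weights: Dict[str, float]):
        # feature name -> learned weight
        self.weights = weights

    def predict(self, features: Dict[str, float]) -> float:
        # Simple linear score, clamped to the 1-5 quality scale
        # used by the human feedback forms.
        raw = sum(self.weights.get(name, 0.0) * value
                  for name, value in features.items())
        return max(1.0, min(5.0, raw))


def select_best_candidate(candidates: List[Candidate],
                          model: RewardModel) -> Candidate:
    """Return the candidate with the highest predicted quality score."""
    return max(candidates, key=lambda c: model.predict(c.features))
```

The predicted score from the winning candidate is what would surface in the `X-RLHF-Quality-Score` response header; the selection itself happens before the response is returned.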
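## Appendix: Retraining Trigger Sketch

The automatic retraining behavior described under "Training Process" (initial training at 20+ samples, then retraining as new feedback accumulates) could be expressed along these lines. Again a hedged sketch: the function name, the "new samples since last training" policy, and the reuse of the same threshold for retraining are assumptions for illustration, not the documented behavior beyond the 20-sample initial threshold.

```python
# Threshold stated in the Training Process section above.
MIN_FEEDBACK_SAMPLES = 20


def should_retrain(total_feedback: int, samples_at_last_training: int) -> bool:
    """Retrain once at least MIN_FEEDBACK_SAMPLES new samples have accumulated.

    With samples_at_last_training == 0 this also covers the initial
    training case (first train at 20+ total samples).
    """
    new_samples = total_feedback - samples_at_last_training
    return new_samples >= MIN_FEEDBACK_SAMPLES
```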