RLHF (Reinforcement Learning from Human Feedback) Features

Overview

FinRyver now includes RLHF capabilities that allow the system to learn from human feedback and improve the quality of generated financial statements over time.

Key Components

1. Enhanced Workflows

  • RLHF-enhanced versions of all financial statement generation workflows
  • Multiple candidate generation and selection using reward models
  • Quality prediction and confidence scoring

2. Feedback Collection System

  • Web-based review interface for human feedback
  • Structured feedback forms with technical and quality metrics
  • Storage and management of feedback data

3. Reward Model

  • Machine learning model that predicts statement quality
  • Trained on human feedback data
  • Automatic retraining when sufficient new feedback is available

Usage

Basic Financial Statement Generation

Standard workflow (existing functionality):

curl -X POST "http://localhost:8000/notes" \
  -F "file=@trial_balance.xlsx"

RLHF-enhanced workflow:

curl -X POST "http://localhost:8000/notes?use_rlhf=true" \
  -F "file=@trial_balance.xlsx"

The RLHF-enhanced workflow will:

  1. Generate multiple candidates (if the reward model is trained)
  2. Use the reward model to select the best candidate
  3. Provide quality predictions and confidence scores
  4. Store the result for potential human feedback

Response Headers

When using RLHF workflows, additional metadata is included in response headers:

  • X-RLHF-Statement-ID: Unique ID for the generated statement
  • X-RLHF-Quality-Score: Predicted quality score (1-5)
  • X-RLHF-Confidence: Model confidence in the prediction
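
To inspect these headers from the command line, curl's -D - flag dumps response headers to stdout while -o writes the statement body to a file (the output filename here is just a placeholder):

curl -s -D - -o statement_output \
  -X POST "http://localhost:8000/notes?use_rlhf=true" \
  -F "file=@trial_balance.xlsx" | grep -i "x-rlhf"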

Feedback Collection

1. Get Statements Needing Review

curl "http://localhost:8000/rlhf/pending-reviews"

2. Review Interface

Visit: http://localhost:8000/rlhf/review/{statement_id}

This provides an HTML form for structured feedback collection.
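
For convenience, the form can also be opened straight from a terminal (xdg-open on Linux; use open on macOS), reusing the example statement ID from the next step:

xdg-open "http://localhost:8000/rlhf/review/123e4567-e89b-12d3-a456-426614174000"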

3. Submit Feedback Programmatically

curl -X POST "http://localhost:8000/rlhf/feedback" \
  -F "statement_id=123e4567-e89b-12d3-a456-426614174000" \
  -F "calculation_accuracy=4" \
  -F "account_classification=5" \
  -F "statement_balance=4" \
  -F "accounting_standards=4" \
  -F "regulatory_compliance=5" \
  -F "completeness=3" \
  -F "professional_presentation=4" \
  -F "would_accept_for_audit=true" \
  -F "specific_errors=Minor formatting issues" \
  -F "improvement_suggestions=Add more detailed notes"

Monitoring and Statistics

Get Feedback Statistics

curl "http://localhost:8000/rlhf/stats"

Returns:

  • Total feedback collected
  • Average quality scores
  • Audit approval rates
  • Model training status
  • Feature importance
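
To pull a single value out of the stats response, jq works well; the total_feedback key below is an assumption, so inspect the raw JSON for the actual keys first:

# "total_feedback" is a hypothetical key; run without the jq filter to see the real ones
curl -s "http://localhost:8000/rlhf/stats" | jq '.total_feedback'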

Get Model Information

curl "http://localhost:8000/rlhf/model-info"

Manual Model Retraining

curl -X POST "http://localhost:8000/rlhf/retrain"

Feedback Metrics

Technical Accuracy (1-5 scale)

  • Calculation Accuracy: Mathematical correctness
  • Account Classification: Proper categorization of accounts
  • Statement Balance: Internal consistency and reconciliation

Compliance (1-5 scale)

  • Accounting Standards: GAAP/IFRS compliance
  • Regulatory Compliance: Meeting regulatory requirements

Quality (1-5 scale)

  • Completeness: All necessary items included
  • Professional Presentation: Formatting and language quality

Qualitative Feedback

  • Specific Errors: Detailed error descriptions
  • Missing Items: Items that should be included
  • Improvement Suggestions: Recommendations for enhancement
  • Audit Acceptance: Binary approval for professional use

Training Process

  1. Initial Phase: System operates with default models
  2. Feedback Collection: Human experts review generated statements
  3. Model Training: When 20+ feedback samples are available, the reward model is trained
  4. Enhanced Generation: RLHF workflows use the trained model for better results
  5. Continuous Learning: The model retrains automatically as new feedback arrives (see the sketch below)
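
A minimal sketch of automating step 5 from a cron job, assuming the /rlhf/stats response exposes a total_feedback count (an assumed field name):

#!/usr/bin/env bash
# Retrain only once the 20-sample threshold is crossed
# ("total_feedback" is an assumed field in the /rlhf/stats response)
COUNT=$(curl -s "http://localhost:8000/rlhf/stats" | jq -r '.total_feedback // 0')
if [ "$COUNT" -ge 20 ]; then
  curl -X POST "http://localhost:8000/rlhf/retrain"
fi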

Benefits

  • Quality Improvement: Statements become more accurate over time
  • Domain Adaptation: System learns specific requirements and preferences
  • Consistency: Reduces variability in output quality
  • Professional Standards: Aligns with human expert expectations

Implementation Notes

  • RLHF features are optional and backward-compatible
  • Existing workflows continue to work unchanged
  • Feedback data is stored locally and can be exported for analysis
  • Models can be backed up and restored
  • Multiple reward models can be maintained for different statement types

File Structure

data/
├── feedback/
│   ├── human_feedback.json         # Collected feedback data
│   └── generated_statements.json   # Statement metadata
└── models/
    ├── reward_model.pkl            # Trained reward model
    ├── feature_names.json          # Model feature definitions
    └── model_stats.json            # Training statistics
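
Because models can be backed up and restored (see Implementation Notes), a plain archive of the models directory is enough; the archive name is arbitrary:

# Back up trained model artifacts
tar -czf rlhf_models_backup.tar.gz data/models/

# Restore them later
tar -xzf rlhf_models_backup.tar.gz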

Security and Privacy

  • Feedback data is stored locally
  • No external transmission of financial data
  • Anonymous feedback collection is supported
  • Data can be cleaned/anonymized before training
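
As an illustration of the last point, identifying fields could be stripped from the stored feedback before training. The reviewer_id field below is hypothetical; adjust it to whatever the feedback records actually contain:

# Assumes human_feedback.json is a JSON array of records;
# "reviewer_id" is a hypothetical identifying field
jq 'map(del(.reviewer_id))' data/feedback/human_feedback.json > anonymized_feedback.json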