Sahil Garg committed on
Commit c172f37 · 1 parent: 41cb3f5

initial RLHF applied
.gitignore CHANGED
@@ -20,4 +20,11 @@ app/__pycache__/
 pnlbs/__pycache__/
 AGENT_GUIDE.md
 docker-compose.dev.yml
-file_cleanup.py
+file_cleanup.py
+agents/langgraph_routes.py
+
+# RLHF related data
+data/feedback/
+data/models/
+*.pkl
+*.joblib
RLHF_GUIDE.md ADDED
@@ -0,0 +1,167 @@
+# RLHF (Reinforcement Learning from Human Feedback) Features
+
+## Overview
+
+FinRyver now includes RLHF capabilities that allow the system to learn from human feedback and improve the quality of generated financial statements over time.
+
+## Key Components
+
+### 1. **Enhanced Workflows**
+- RLHF-enhanced versions of all financial statement generation workflows
+- Multiple candidate generation and selection using reward models
+- Quality prediction and confidence scoring
+
+### 2. **Feedback Collection System**
+- Web-based review interface for human feedback
+- Structured feedback forms with technical and quality metrics
+- Storage and management of feedback data
+
+### 3. **Reward Model**
+- Machine learning model that predicts statement quality
+- Trained on human feedback data
+- Automatic retraining when sufficient new feedback is available
+
+## Usage
+
+### Basic Financial Statement Generation
+
+**Standard workflow (existing functionality):**
+```bash
+curl -X POST "http://localhost:8000/notes" \
+  -F "file=@trial_balance.xlsx"
+```
+
+**RLHF-enhanced workflow:**
+```bash
+curl -X POST "http://localhost:8000/notes?use_rlhf=true" \
+  -F "file=@trial_balance.xlsx"
+```
+
+The RLHF-enhanced workflow will:
+1. Generate multiple candidates (if a reward model is trained)
+2. Use the reward model to select the best candidate
+3. Provide quality predictions and confidence scores
+4. Store the result for potential human feedback
+
+### Response Headers
+
+When using RLHF workflows, additional metadata is included in response headers:
+- `X-RLHF-Statement-ID`: Unique ID for the generated statement
+- `X-RLHF-Quality-Score`: Predicted quality score (1-5)
+- `X-RLHF-Confidence`: Model confidence in the prediction
+
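Clients consuming these headers need to convert the string values back to numbers. A small parsing helper (hypothetical, not part of FinRyver itself) might look like:

```python
def extract_rlhf_metadata(headers: dict) -> dict:
    """Parse RLHF metadata headers from a /notes?use_rlhf=true response."""
    return {
        "statement_id": headers.get("X-RLHF-Statement-ID"),
        "quality_score": float(headers.get("X-RLHF-Quality-Score", 0.0)),
        "confidence": float(headers.get("X-RLHF-Confidence", 0.0)),
    }

# Example with a captured response's headers:
meta = extract_rlhf_metadata({
    "X-RLHF-Statement-ID": "123e4567-e89b-12d3-a456-426614174000",
    "X-RLHF-Quality-Score": "4.2",
    "X-RLHF-Confidence": "0.87",
})
```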
+### Feedback Collection
+
+#### 1. Get Statements Needing Review
+```bash
+curl "http://localhost:8000/rlhf/pending-reviews"
+```
+
+#### 2. Review Interface
+Visit: `http://localhost:8000/rlhf/review/{statement_id}`
+
+This provides an HTML form for structured feedback collection.
+
+#### 3. Submit Feedback Programmatically
+```bash
+curl -X POST "http://localhost:8000/rlhf/feedback" \
+  -F "statement_id=123e4567-e89b-12d3-a456-426614174000" \
+  -F "calculation_accuracy=4" \
+  -F "account_classification=5" \
+  -F "statement_balance=4" \
+  -F "accounting_standards=4" \
+  -F "regulatory_compliance=5" \
+  -F "completeness=3" \
+  -F "professional_presentation=4" \
+  -F "would_accept_for_audit=true" \
+  -F "specific_errors=Minor formatting issues" \
+  -F "improvement_suggestions=Add more detailed notes"
+```
+
+### Monitoring and Statistics
+
+#### Get Feedback Statistics
+```bash
+curl "http://localhost:8000/rlhf/stats"
+```
+
+Returns:
+- Total feedback collected
+- Average quality scores
+- Audit approval rates
+- Model training status
+- Feature importance
+
+#### Get Model Information
+```bash
+curl "http://localhost:8000/rlhf/model-info"
+```
+
+#### Manual Model Retraining
+```bash
+curl -X POST "http://localhost:8000/rlhf/retrain"
+```
+
+## Feedback Metrics
+
+### Technical Accuracy (1-5 scale)
+- **Calculation Accuracy**: Mathematical correctness
+- **Account Classification**: Proper categorization of accounts
+- **Statement Balance**: Internal consistency and reconciliation
+
+### Compliance (1-5 scale)
+- **Accounting Standards**: GAAP/IFRS compliance
+- **Regulatory Compliance**: Meeting regulatory requirements
+
+### Quality (1-5 scale)
+- **Completeness**: All necessary items included
+- **Professional Presentation**: Formatting and language quality
+
+### Qualitative Feedback
+- **Specific Errors**: Detailed error descriptions
+- **Missing Items**: Items that should be included
+- **Improvement Suggestions**: Recommendations for enhancement
+- **Audit Acceptance**: Binary approval for professional use
+
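The overall quality score stored with each review is the mean of whichever of the seven 1-5 metrics were supplied (this mirrors `_compute_overall_score` in `agents/feedback_manager.py`). A standalone sketch:

```python
def compute_overall_score(feedback: dict) -> float:
    """Mean of the seven 1-5 metrics, ignoring any that were not provided."""
    metric_names = [
        "calculation_accuracy", "account_classification", "statement_balance",
        "accounting_standards", "regulatory_compliance",
        "completeness", "professional_presentation",
    ]
    values = [feedback[m] for m in metric_names if feedback.get(m) is not None]
    return sum(values) / len(values) if values else 0.0

# The scores from the curl example above average to (4+5+4+4+5+3+4)/7 ≈ 4.14
score = compute_overall_score({
    "calculation_accuracy": 4, "account_classification": 5,
    "statement_balance": 4, "accounting_standards": 4,
    "regulatory_compliance": 5, "completeness": 3,
    "professional_presentation": 4,
})
```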
+## Training Process
+
+1. **Initial Phase**: System operates with default models
+2. **Feedback Collection**: Human experts review generated statements
+3. **Model Training**: When 20+ feedback samples are available, the reward model is trained
+4. **Enhanced Generation**: RLHF workflows use the trained model for better results
+5. **Continuous Learning**: Model retrains automatically with new feedback
+
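Step 3's trigger can be expressed compactly. This sketch mirrors `RLHFTrainer.should_retrain` with the production thresholds described here (the shipped code temporarily lowers both thresholds to 2 for testing):

```python
def should_retrain(total_feedback: int, last_training_count: int,
                   min_total: int = 20, min_new: int = 10) -> bool:
    """Retrain once enough total feedback exists AND enough new samples
    have arrived since the last training run."""
    new_feedback = total_feedback - last_training_count
    return total_feedback >= min_total and new_feedback >= min_new

# 25 reviews collected, 10 used in the last training run -> 15 new samples
retrain_now = should_retrain(total_feedback=25, last_training_count=10)
```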
+## Benefits
+
+- **Quality Improvement**: Statements become more accurate over time
+- **Domain Adaptation**: System learns specific requirements and preferences
+- **Consistency**: Reduces variability in output quality
+- **Professional Standards**: Aligns with human expert expectations
+
+## Implementation Notes
+
+- RLHF features are optional and backward-compatible
+- Existing workflows continue to work unchanged
+- Feedback data is stored locally and can be exported for analysis
+- Models can be backed up and restored
+- Multiple reward models can be maintained for different statement types
+
+## File Structure
+
+```
+data/
+├── feedback/
+│   ├── human_feedback.json          # Collected feedback data
+│   └── generated_statements.json    # Statement metadata
+└── models/
+    ├── reward_model.pkl             # Trained reward model
+    ├── feature_names.json           # Model feature definitions
+    └── model_stats.json             # Training statistics
+```
+
+## Security and Privacy
+
+- Feedback data is stored locally
+- No external transmission of financial data
+- Anonymous feedback collection supported
+- Data can be cleaned/anonymized before training
agents/feedback_manager.py ADDED
@@ -0,0 +1,248 @@
+"""
+RLHF Feedback Management System for FinRyver
+Handles collection, storage, and management of human feedback on financial statements
+"""
+import json
+import os
+import time
+import uuid
+from typing import Dict, Any, List, Optional
+import logging
+
+logger = logging.getLogger(__name__)
+
+class FeedbackManager:
+    """Manages human feedback collection for RLHF training"""
+
+    def __init__(self, feedback_dir: str = "data/feedback"):
+        self.feedback_dir = feedback_dir
+        self.feedback_db = os.path.join(feedback_dir, "human_feedback.json")
+        self.statements_db = os.path.join(feedback_dir, "generated_statements.json")
+        os.makedirs(feedback_dir, exist_ok=True)
+
+    def store_generated_statement(self, statement_data: Dict[str, Any]) -> str:
+        """Store a generated statement for later feedback collection"""
+        statement_id = str(uuid.uuid4())
+        statement_record = {
+            "statement_id": statement_id,
+            "timestamp": time.time(),
+            "statement_type": statement_data.get("type", "unknown"),
+            "file_path": statement_data.get("file_path"),
+            "output_path": statement_data.get("output_path"),
+            "generation_time": statement_data.get("generation_time", 0),
+            "metadata": statement_data.get("metadata", {})
+        }
+
+        # Load existing statements
+        statements = self._load_statements()
+        statements.append(statement_record)
+
+        # Save updated statements
+        with open(self.statements_db, "w") as f:
+            json.dump(statements, f, indent=2)
+
+        logger.info(f"Stored statement {statement_id} for feedback collection")
+        return statement_id
+
+    def store_feedback(self, feedback: Dict[str, Any]) -> str:
+        """Store human feedback for RLHF training"""
+        feedback_id = str(uuid.uuid4())
+        feedback_record = {
+            "feedback_id": feedback_id,
+            "statement_id": feedback.get("statement_id"),
+            "timestamp": time.time(),
+            "reviewer_id": feedback.get("reviewer_id", "anonymous"),
+
+            # Technical accuracy metrics
+            "calculation_accuracy": feedback.get("calculation_accuracy"),
+            "account_classification": feedback.get("account_classification"),
+            "statement_balance": feedback.get("statement_balance"),
+
+            # Compliance metrics
+            "accounting_standards": feedback.get("accounting_standards"),
+            "regulatory_compliance": feedback.get("regulatory_compliance"),
+
+            # Quality metrics
+            "completeness": feedback.get("completeness"),
+            "professional_presentation": feedback.get("professional_presentation"),
+
+            # Overall quality score (computed)
+            "overall_score": self._compute_overall_score(feedback),
+
+            # Qualitative feedback
+            "specific_errors": feedback.get("specific_errors", ""),
+            "missing_items": feedback.get("missing_items", ""),
+            "improvement_suggestions": feedback.get("improvement_suggestions", ""),
+            "would_accept_for_audit": feedback.get("would_accept_for_audit", False),
+
+            # Additional context
+            "statement_type": feedback.get("statement_type"),
+            "complexity_level": feedback.get("complexity_level", "medium")
+        }
+
+        # Load existing feedback
+        all_feedback = self._load_feedback()
+        all_feedback.append(feedback_record)
+
+        # Save updated feedback
+        with open(self.feedback_db, "w") as f:
+            json.dump(all_feedback, f, indent=2)
+
+        logger.info(f"Stored feedback {feedback_id} for statement {feedback.get('statement_id')}")
+        return feedback_id
+
+    def get_training_data(self, min_feedback_count: int = 2) -> List[Dict[str, Any]]:
+        """Get feedback data suitable for RLHF training"""
+        feedback_data = self._load_feedback()
+
+        if len(feedback_data) < min_feedback_count:
+            logger.warning(f"Only {len(feedback_data)} feedback samples available, need at least {min_feedback_count}")
+            return []
+
+        # Filter and prepare training data
+        training_data = []
+        for feedback in feedback_data:
+            if feedback.get("overall_score") is not None:
+                training_sample = {
+                    "statement_id": feedback["statement_id"],
+                    "statement_type": feedback["statement_type"],
+                    "reward_score": feedback["overall_score"],
+                    "binary_approval": feedback["would_accept_for_audit"],
+                    "technical_metrics": {
+                        "calculation_accuracy": feedback.get("calculation_accuracy"),
+                        "account_classification": feedback.get("account_classification"),
+                        "statement_balance": feedback.get("statement_balance")
+                    },
+                    "quality_metrics": {
+                        "completeness": feedback.get("completeness"),
+                        "professional_presentation": feedback.get("professional_presentation"),
+                        "accounting_standards": feedback.get("accounting_standards")
+                    },
+                    "feedback_text": {
+                        "errors": feedback.get("specific_errors", ""),
+                        "missing": feedback.get("missing_items", ""),
+                        "suggestions": feedback.get("improvement_suggestions", "")
+                    }
+                }
+                training_data.append(training_sample)
+
+        return training_data
+
+    def get_statement_for_review(self, statement_id: str) -> Optional[Dict[str, Any]]:
+        """Get statement data for human review"""
+        statements = self._load_statements()
+        for statement in statements:
+            if statement["statement_id"] == statement_id:
+                return statement
+        return None
+
+    def get_pending_reviews(self, limit: int = 10) -> List[Dict[str, Any]]:
+        """Get statements that need human review"""
+        statements = self._load_statements()
+        feedback_data = self._load_feedback()
+
+        # Get statement IDs that already have feedback
+        reviewed_ids = {fb["statement_id"] for fb in feedback_data}
+
+        # Return statements without feedback
+        pending = [s for s in statements if s["statement_id"] not in reviewed_ids]
+        return pending[-limit:]  # Return most recent
+
+    def get_feedback_stats(self) -> Dict[str, Any]:
+        """Get statistics about collected feedback"""
+        feedback_data = self._load_feedback()
+        statements = self._load_statements()
+
+        if not feedback_data:
+            return {"total_feedback": 0, "total_statements": len(statements)}
+
+        # Calculate statistics
+        scores = [fb["overall_score"] for fb in feedback_data if fb.get("overall_score")]
+        audit_approvals = [fb["would_accept_for_audit"] for fb in feedback_data]
+
+        stats = {
+            "total_feedback": len(feedback_data),
+            "total_statements": len(statements),
+            "avg_overall_score": sum(scores) / len(scores) if scores else 0,
+            "audit_approval_rate": sum(audit_approvals) / len(audit_approvals) if audit_approvals else 0,
+            "feedback_by_type": {},
+            "recent_trend": self._calculate_trend()
+        }
+
+        # Group by statement type, keeping a running mean per type
+        for fb in feedback_data:
+            stmt_type = fb.get("statement_type", "unknown")
+            if stmt_type not in stats["feedback_by_type"]:
+                stats["feedback_by_type"][stmt_type] = {"count": 0, "avg_score": 0}
+            entry = stats["feedback_by_type"][stmt_type]
+            n = entry["count"]
+            entry["count"] = n + 1
+            entry["avg_score"] = (entry["avg_score"] * n + (fb.get("overall_score") or 0.0)) / (n + 1)
+
+        return stats
+
+    def _load_feedback(self) -> List[Dict[str, Any]]:
+        """Load feedback from storage"""
+        if os.path.exists(self.feedback_db):
+            try:
+                with open(self.feedback_db, "r") as f:
+                    return json.load(f)
+            except (json.JSONDecodeError, FileNotFoundError):
+                logger.warning("Could not load feedback database, starting fresh")
+        return []
+
+    def _load_statements(self) -> List[Dict[str, Any]]:
+        """Load statements from storage"""
+        if os.path.exists(self.statements_db):
+            try:
+                with open(self.statements_db, "r") as f:
+                    return json.load(f)
+            except (json.JSONDecodeError, FileNotFoundError):
+                logger.warning("Could not load statements database, starting fresh")
+        return []
+
+    def _compute_overall_score(self, feedback: Dict[str, Any]) -> float:
+        """Compute overall quality score from individual metrics"""
+        metrics = [
+            feedback.get("calculation_accuracy"),
+            feedback.get("account_classification"),
+            feedback.get("statement_balance"),
+            feedback.get("accounting_standards"),
+            feedback.get("regulatory_compliance"),
+            feedback.get("completeness"),
+            feedback.get("professional_presentation")
+        ]
+
+        # Filter out None values
+        valid_metrics = [m for m in metrics if m is not None]
+
+        if not valid_metrics:
+            return 0.0
+
+        return sum(valid_metrics) / len(valid_metrics)
+
+    def _calculate_trend(self) -> Dict[str, float]:
+        """Calculate recent feedback trend"""
+        feedback_data = self._load_feedback()
+
+        if len(feedback_data) < 5:
+            return {"trend": "insufficient_data"}
+
+        # Sort by timestamp
+        sorted_feedback = sorted(feedback_data, key=lambda x: x.get("timestamp", 0))
+
+        # Compare recent vs older feedback
+        mid_point = len(sorted_feedback) // 2
+        older_scores = [fb["overall_score"] for fb in sorted_feedback[:mid_point] if fb.get("overall_score")]
+        recent_scores = [fb["overall_score"] for fb in sorted_feedback[mid_point:] if fb.get("overall_score")]
+
+        if older_scores and recent_scores:
+            older_avg = sum(older_scores) / len(older_scores)
+            recent_avg = sum(recent_scores) / len(recent_scores)
+            improvement = recent_avg - older_avg
+
+            return {
+                "older_average": older_avg,
+                "recent_average": recent_avg,
+                "improvement": improvement,
+                "trend": "improving" if improvement > 0.1 else "stable" if abs(improvement) <= 0.1 else "declining"
+            }
+
+        return {"trend": "insufficient_data"}
agents/reward_model.py ADDED
@@ -0,0 +1,307 @@
+"""
+RLHF Reward Model for FinRyver
+Predicts quality scores for generated financial statements based on human feedback
+"""
+import json
+import os
+import logging
+import time
+from typing import Dict, Any, List, Optional, Tuple
+import numpy as np
+from sklearn.ensemble import RandomForestRegressor
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import mean_squared_error, r2_score
+import joblib
+
+logger = logging.getLogger(__name__)
+
+class FinancialRewardModel:
+    """
+    Reward model that predicts quality scores for financial statements.
+    Uses traditional ML initially; can be upgraded to transformer-based models.
+    """
+
+    def __init__(self, model_dir: str = "data/models"):
+        self.model_dir = model_dir
+        self.model_path = os.path.join(model_dir, "reward_model.pkl")
+        self.feature_names_path = os.path.join(model_dir, "feature_names.json")
+        self.model_stats_path = os.path.join(model_dir, "model_stats.json")
+
+        os.makedirs(model_dir, exist_ok=True)
+
+        # Initialize model
+        self.model = RandomForestRegressor(
+            n_estimators=100,
+            max_depth=10,
+            random_state=42,
+            n_jobs=-1
+        )
+
+        self.feature_names = []
+        self.is_trained = False
+        self.model_version = "1.0"
+
+        # Load existing model if available
+        self._load_model()
+
+    def extract_features(self, statement_data: Dict[str, Any], statement_content: str = "") -> np.ndarray:
+        """Extract features from statement data for reward prediction"""
+        features = []
+
+        # Basic metadata features
+        features.append(len(statement_content))  # Content length
+        features.append(statement_data.get("generation_time", 0))  # Generation time
+        features.append(1 if statement_data.get("statement_type") == "notes" else 0)
+        features.append(1 if statement_data.get("statement_type") == "balance_sheet" else 0)
+        features.append(1 if statement_data.get("statement_type") == "pnl" else 0)
+        features.append(1 if statement_data.get("statement_type") == "cash_flow" else 0)
+
+        # Content-based features (simple heuristics)
+        if statement_content:
+            features.append(statement_content.count("$"))  # Number of monetary values
+            features.append(statement_content.count("\n"))  # Number of lines
+            features.append(len(statement_content.split()))  # Word count
+            features.append(statement_content.count("."))  # Number of sentences
+            features.append(statement_content.count(","))  # Number of commas (complexity indicator)
+
+            # Financial keywords
+            financial_keywords = ["asset", "liability", "equity", "revenue", "expense", "cash", "account"]
+            keyword_count = sum(statement_content.lower().count(keyword) for keyword in financial_keywords)
+            features.append(keyword_count)
+
+            # Professional language indicators
+            professional_words = ["accordance", "pursuant", "whereas", "therefore", "respective"]
+            professional_count = sum(statement_content.lower().count(word) for word in professional_words)
+            features.append(professional_count)
+        else:
+            # Default values if no content available
+            features.extend([0] * 7)
+
+        # File-based features (if available)
+        metadata = statement_data.get("metadata", {})
+        features.append(metadata.get("file_size", 0))
+        features.append(metadata.get("num_accounts", 0))
+        features.append(metadata.get("complexity_score", 0))
+
+        # Ensure we have consistent feature names
+        if not self.feature_names:
+            self.feature_names = [
+                "content_length", "generation_time", "is_notes", "is_balance_sheet",
+                "is_pnl", "is_cash_flow", "monetary_values", "line_count",
+                "word_count", "sentence_count", "comma_count", "financial_keywords",
+                "professional_words", "file_size", "num_accounts", "complexity_score"
+            ]
+
+        return np.array(features).reshape(1, -1)
+
+    def train_reward_model(self, training_data: List[Dict[str, Any]]) -> Dict[str, float]:
+        """Train reward model from human feedback data"""
+        if len(training_data) < 2:  # Lowered from 10 to 2 for testing
+            logger.warning(f"Insufficient training data: {len(training_data)} samples")
+            return {"error": "insufficient_data", "sample_count": len(training_data)}
+
+        # Prepare training data
+        X = []
+        y = []
+
+        for sample in training_data:
+            # Create dummy statement data for feature extraction
+            statement_data = {
+                "statement_type": sample.get("statement_type", "unknown"),
+                "generation_time": sample.get("generation_time", 0),
+                "metadata": sample.get("metadata", {})
+            }
+
+            # Extract features
+            features = self.extract_features(statement_data, "")
+            X.append(features.flatten())
+            y.append(sample["reward_score"])
+
+        X = np.array(X)
+        y = np.array(y)
+
+        # Split data
+        if len(X) > 20:
+            X_train, X_test, y_train, y_test = train_test_split(
+                X, y, test_size=0.2, random_state=42
+            )
+        else:
+            X_train, X_test, y_train, y_test = X, X, y, y
+
+        # Train model
+        logger.info(f"Training reward model with {len(X_train)} samples")
+        self.model.fit(X_train, y_train)
+
+        # Evaluate model
+        train_pred = self.model.predict(X_train)
+        test_pred = self.model.predict(X_test)
+
+        metrics = {
+            "train_mse": mean_squared_error(y_train, train_pred),
+            "test_mse": mean_squared_error(y_test, test_pred),
+            "train_r2": r2_score(y_train, train_pred),
+            "test_r2": r2_score(y_test, test_pred),
+            "sample_count": len(training_data),
+            "feature_importance": dict(zip(self.feature_names, self.model.feature_importances_))
+        }
+
+        self.is_trained = True
+
+        # Save model
+        self._save_model(metrics)
+
+        logger.info(f"Reward model trained. R2 score: {metrics['test_r2']:.3f}")
+        return metrics
+
+    def predict_reward(self, statement_data: Dict[str, Any], statement_content: str = "") -> float:
+        """Predict reward score for a generated financial statement"""
+        if not self.is_trained:
+            logger.warning("Reward model not trained, returning default score")
+            return 3.0  # Default neutral score
+
+        try:
+            features = self.extract_features(statement_data, statement_content)
+            reward = self.model.predict(features)[0]
+
+            # Clamp to valid range [1, 5]
+            reward = max(1.0, min(5.0, reward))
+
+            return float(reward)
+
+        except Exception as e:
+            logger.error(f"Error predicting reward: {e}")
+            return 3.0  # Default score on error
+
+    def predict_with_confidence(self, statement_data: Dict[str, Any], statement_content: str = "") -> Tuple[float, float]:
+        """Predict reward with confidence interval"""
+        if not self.is_trained:
+            return 3.0, 0.0
+
+        try:
+            features = self.extract_features(statement_data, statement_content)
+
+            # For Random Forest, we can get a prediction from every tree
+            tree_predictions = [tree.predict(features)[0] for tree in self.model.estimators_]
+
+            reward = np.mean(tree_predictions)
+            confidence = 1.0 / (1.0 + np.std(tree_predictions))  # Higher std = lower confidence
+
+            reward = max(1.0, min(5.0, reward))
+
+            return float(reward), float(confidence)
+
+        except Exception as e:
+            logger.error(f"Error predicting reward with confidence: {e}")
+            return 3.0, 0.0
+
+    def get_feature_importance(self) -> Dict[str, float]:
+        """Get feature importance from trained model"""
+        if not self.is_trained:
+            return {}
+
+        return dict(zip(self.feature_names, self.model.feature_importances_))
+
+    def get_model_stats(self) -> Dict[str, Any]:
+        """Get model training statistics"""
+        if os.path.exists(self.model_stats_path):
+            try:
+                with open(self.model_stats_path, "r") as f:
+                    return json.load(f)
+            except (json.JSONDecodeError, OSError):
+                pass
+        return {"status": "not_trained"}
+
+    def _save_model(self, training_stats: Dict[str, Any]):
+        """Save trained model and metadata"""
+        try:
+            # Save model
+            joblib.dump(self.model, self.model_path)
+
+            # Save feature names
+            with open(self.feature_names_path, "w") as f:
+                json.dump(self.feature_names, f)
+
+            # Save training stats
+            stats = {
+                "model_version": self.model_version,
+                "training_timestamp": time.time(),
+                "is_trained": True,
+                **training_stats
+            }
+
+            with open(self.model_stats_path, "w") as f:
+                json.dump(stats, f, indent=2)
+
+            logger.info("Reward model saved successfully")
+
+        except Exception as e:
+            logger.error(f"Error saving model: {e}")
+
+    def _load_model(self):
+        """Load existing trained model"""
+        try:
+            if os.path.exists(self.model_path) and os.path.exists(self.feature_names_path):
+                self.model = joblib.load(self.model_path)
+
+                with open(self.feature_names_path, "r") as f:
+                    self.feature_names = json.load(f)
+
+                self.is_trained = True
+                logger.info("Existing reward model loaded successfully")
+
+        except Exception as e:
+            logger.warning(f"Could not load existing model: {e}")
+            self.is_trained = False
+
+
+class RLHFTrainer:
+    """Coordinates the RLHF training pipeline"""
+
+    def __init__(self, feedback_manager, reward_model):
+        self.feedback_manager = feedback_manager
+        self.reward_model = reward_model
+        self.min_feedback_threshold = 2  # Lowered for testing (was 20)
+
+    def should_retrain(self) -> bool:
+        """Determine if the model should be retrained"""
+        stats = self.feedback_manager.get_feedback_stats()
+
+        # Check if we have enough new feedback
+        total_feedback = stats.get("total_feedback", 0)
+
+        # Get last training count
+        model_stats = self.reward_model.get_model_stats()
+        last_training_count = model_stats.get("sample_count", 0)
+
+        new_feedback_count = total_feedback - last_training_count
+
+        return (total_feedback >= self.min_feedback_threshold and
+                new_feedback_count >= 2)  # At least 2 new samples (was 10)
+
+    def retrain_model(self) -> Dict[str, Any]:
+        """Retrain the reward model with the latest feedback"""
+        training_data = self.feedback_manager.get_training_data()
+
+        if len(training_data) < self.min_feedback_threshold:
+            return {
+                "status": "insufficient_data",
+                "current_count": len(training_data),
+                "required_count": self.min_feedback_threshold
+            }
+
+        # Train model
+        metrics = self.reward_model.train_reward_model(training_data)
+
+        return {
+            "status": "success",
+            "training_metrics": metrics,
+            "timestamp": time.time()
+        }
+
+    def periodic_training_check(self) -> Dict[str, Any]:
+        """Check if retraining is needed and perform it if necessary"""
+        if self.should_retrain():
+            logger.info("Initiating automatic model retraining")
+            return self.retrain_model()
+        else:
+            return {"status": "no_retraining_needed"}
agents/rlhf_routes.py ADDED
@@ -0,0 +1,352 @@
1
+ """
2
+ RLHF Feedback Collection Routes for FinRyver
3
+ Handles human feedback collection for financial statement quality
4
+ """
5
+ from fastapi import APIRouter, HTTPException, Form, Query, Request
6
+ from fastapi.responses import JSONResponse, HTMLResponse
7
+ from typing import Optional, Dict, Any
8
+ import logging
9
+ from agents.feedback_manager import FeedbackManager
+ from agents.reward_model import FinancialRewardModel, RLHFTrainer
+ from agents.rlhf_workflows import get_rlhf_manager
+
+ logger = logging.getLogger(__name__)
+
+ # Create RLHF router
+ rlhf_router = APIRouter(prefix="/rlhf", tags=["RLHF Feedback"])
+
+ # Initialize components
+ feedback_manager = FeedbackManager()
+ reward_model = FinancialRewardModel()
+ trainer = RLHFTrainer(feedback_manager, reward_model)
+
+ @rlhf_router.post("/feedback")
+ async def collect_feedback(
+     statement_id: str = Form(...),
+     reviewer_id: str = Form("anonymous"),
+
+     # Technical accuracy metrics (1-5 scale)
+     calculation_accuracy: float = Form(..., ge=1, le=5),
+     account_classification: float = Form(..., ge=1, le=5),
+     statement_balance: float = Form(..., ge=1, le=5),
+
+     # Compliance metrics (1-5 scale)
+     accounting_standards: float = Form(..., ge=1, le=5),
+     regulatory_compliance: float = Form(..., ge=1, le=5),
+
+     # Quality metrics (1-5 scale)
+     completeness: float = Form(..., ge=1, le=5),
+     professional_presentation: float = Form(..., ge=1, le=5),
+
+     # Qualitative feedback
+     specific_errors: str = Form(""),
+     missing_items: str = Form(""),
+     improvement_suggestions: str = Form(""),
+
+     # Binary approval
+     would_accept_for_audit: bool = Form(False),
+
+     # Additional context
+     complexity_level: str = Form("medium")  # low, medium, high
+ ):
+     """
+     Collect detailed human feedback on generated financial statements.
+     This feedback is used to train and improve the AI models.
+     """
+     try:
+         # Get statement info
+         statement_info = feedback_manager.get_statement_for_review(statement_id)
+         if not statement_info:
+             raise HTTPException(status_code=404, detail="Statement not found")
+
+         # Prepare feedback data
+         feedback_data = {
+             "statement_id": statement_id,
+             "reviewer_id": reviewer_id,
+             "calculation_accuracy": calculation_accuracy,
+             "account_classification": account_classification,
+             "statement_balance": statement_balance,
+             "accounting_standards": accounting_standards,
+             "regulatory_compliance": regulatory_compliance,
+             "completeness": completeness,
+             "professional_presentation": professional_presentation,
+             "specific_errors": specific_errors,
+             "missing_items": missing_items,
+             "improvement_suggestions": improvement_suggestions,
+             "would_accept_for_audit": would_accept_for_audit,
+             "statement_type": statement_info.get("statement_type"),
+             "complexity_level": complexity_level
+         }
+
+         # Store feedback
+         feedback_id = feedback_manager.store_feedback(feedback_data)
+
+         # Check if the model should be retrained
+         retrain_result = trainer.periodic_training_check()
+
+         return {
+             "status": "success",
+             "feedback_id": feedback_id,
+             "message": "Feedback collected successfully",
+             "model_retrain_status": retrain_result.get("status"),
+             "overall_score": feedback_manager._compute_overall_score(feedback_data)
+         }
+
+     except HTTPException:
+         # Re-raise HTTP errors (e.g. the 404 above) instead of converting them to 500s
+         raise
+     except Exception as e:
+         logger.error(f"Error collecting feedback: {e}")
+         raise HTTPException(status_code=500, detail=f"Error collecting feedback: {str(e)}")
+
+ @rlhf_router.get("/review/{statement_id}")
+ async def get_review_interface(statement_id: str):
+     """
+     Get a review interface for human feedback collection.
+     Returns an HTML form for statement review.
+     """
+     try:
+         statement_info = feedback_manager.get_statement_for_review(statement_id)
+         if not statement_info:
+             raise HTTPException(status_code=404, detail="Statement not found")
+
+         # Generate HTML review form
+         html_content = generate_review_html(statement_id, statement_info)
+         return HTMLResponse(content=html_content)
+
+     except HTTPException:
+         # Re-raise HTTP errors (e.g. the 404 above) instead of converting them to 500s
+         raise
+     except Exception as e:
+         logger.error(f"Error getting review interface: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+ @rlhf_router.get("/pending-reviews")
+ async def get_pending_reviews(limit: int = Query(10, ge=1, le=50)):
+     """
+     Get statements that need human review.
+     """
+     try:
+         pending_statements = feedback_manager.get_pending_reviews(limit)
+         return {
+             "status": "success",
+             "pending_reviews": pending_statements,
+             "count": len(pending_statements)
+         }
+     except Exception as e:
+         logger.error(f"Error getting pending reviews: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+ @rlhf_router.get("/stats")
+ async def get_feedback_stats():
+     """
+     Get feedback and model training statistics.
+     """
+     try:
+         feedback_stats = feedback_manager.get_feedback_stats()
+         model_stats = reward_model.get_model_stats()
+         feature_importance = reward_model.get_feature_importance()
+
+         return {
+             "status": "success",
+             "feedback_stats": feedback_stats,
+             "model_stats": model_stats,
+             "feature_importance": feature_importance,
+             "model_trained": reward_model.is_trained
+         }
+     except Exception as e:
+         logger.error(f"Error getting stats: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+ @rlhf_router.post("/retrain")
+ async def manual_retrain():
+     """
+     Manually trigger model retraining.
+     """
+     try:
+         result = trainer.retrain_model()
+         return {
+             "status": "success",
+             "retrain_result": result
+         }
+     except Exception as e:
+         logger.error(f"Error during manual retrain: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+ @rlhf_router.get("/model-info")
+ async def get_model_info():
+     """
+     Get information about the current reward model.
+     """
+     try:
+         return {
+             "status": "success",
+             "model_trained": reward_model.is_trained,
+             "model_version": reward_model.model_version,
+             "feature_count": len(reward_model.feature_names),
+             "feature_names": reward_model.feature_names,
+             "model_stats": reward_model.get_model_stats()
+         }
+     except Exception as e:
+         logger.error(f"Error getting model info: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+ def generate_review_html(statement_id: str, statement_info: Dict) -> str:
+     """Generate the HTML form for statement review"""
+     return f"""
+     <!DOCTYPE html>
+     <html>
+     <head>
+         <title>FinRyver - Statement Review</title>
+         <style>
+             body {{ font-family: Arial, sans-serif; margin: 40px; }}
+             .form-group {{ margin: 15px 0; }}
+             label {{ display: block; margin-bottom: 5px; font-weight: bold; }}
+             input, select, textarea {{ width: 100%; padding: 8px; margin-bottom: 10px; }}
+             .rating {{ display: flex; gap: 10px; }}
+             .rating input {{ width: auto; }}
+             button {{ background-color: #007bff; color: white; padding: 10px 20px; border: none; cursor: pointer; }}
+             .statement-info {{ background-color: #f8f9fa; padding: 15px; margin-bottom: 20px; border-radius: 5px; }}
+         </style>
+     </head>
+     <body>
+         <h1>Financial Statement Review</h1>
+
+         <div class="statement-info">
+             <h3>Statement Information</h3>
+             <p><strong>Statement ID:</strong> {statement_id}</p>
+             <p><strong>Type:</strong> {statement_info.get('statement_type', 'Unknown')}</p>
+             <p><strong>Generated:</strong> {statement_info.get('timestamp', 'Unknown')}</p>
+             <p><strong>File:</strong> {statement_info.get('file_path', 'Unknown')}</p>
+         </div>
+
+         <form action="/rlhf/feedback" method="post">
+             <input type="hidden" name="statement_id" value="{statement_id}">
+
+             <div class="form-group">
+                 <label>Reviewer ID (optional):</label>
+                 <input type="text" name="reviewer_id" placeholder="Enter your identifier">
+             </div>
+
+             <h3>Technical Accuracy (1-5 scale)</h3>
+
+             <div class="form-group">
+                 <label>Calculation Accuracy:</label>
+                 <select name="calculation_accuracy" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Major calculation errors</option>
+                     <option value="2">2 - Some calculation errors</option>
+                     <option value="3">3 - Minor calculation issues</option>
+                     <option value="4">4 - Mostly accurate calculations</option>
+                     <option value="5">5 - All calculations correct</option>
+                 </select>
+             </div>
+
+             <div class="form-group">
+                 <label>Account Classification:</label>
+                 <select name="account_classification" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Major classification errors</option>
+                     <option value="2">2 - Some classification errors</option>
+                     <option value="3">3 - Minor classification issues</option>
+                     <option value="4">4 - Mostly correct classification</option>
+                     <option value="5">5 - Perfect classification</option>
+                 </select>
+             </div>
+
+             <div class="form-group">
+                 <label>Statement Balance/Reconciliation:</label>
+                 <select name="statement_balance" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Does not balance</option>
+                     <option value="2">2 - Major balance issues</option>
+                     <option value="3">3 - Minor balance issues</option>
+                     <option value="4">4 - Mostly balanced</option>
+                     <option value="5">5 - Perfect balance</option>
+                 </select>
+             </div>
+
+             <h3>Compliance &amp; Standards (1-5 scale)</h3>
+
+             <div class="form-group">
+                 <label>Accounting Standards Compliance:</label>
+                 <select name="accounting_standards" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Major compliance issues</option>
+                     <option value="2">2 - Some compliance issues</option>
+                     <option value="3">3 - Minor compliance issues</option>
+                     <option value="4">4 - Mostly compliant</option>
+                     <option value="5">5 - Fully compliant</option>
+                 </select>
+             </div>
+
+             <div class="form-group">
+                 <label>Regulatory Compliance:</label>
+                 <select name="regulatory_compliance" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Major regulatory issues</option>
+                     <option value="2">2 - Some regulatory issues</option>
+                     <option value="3">3 - Minor regulatory issues</option>
+                     <option value="4">4 - Mostly compliant</option>
+                     <option value="5">5 - Fully compliant</option>
+                 </select>
+             </div>
+
+             <h3>Quality &amp; Presentation (1-5 scale)</h3>
+
+             <div class="form-group">
+                 <label>Completeness:</label>
+                 <select name="completeness" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Major items missing</option>
+                     <option value="2">2 - Some items missing</option>
+                     <option value="3">3 - Minor items missing</option>
+                     <option value="4">4 - Mostly complete</option>
+                     <option value="5">5 - Complete</option>
+                 </select>
+             </div>
+
+             <div class="form-group">
+                 <label>Professional Presentation:</label>
+                 <select name="professional_presentation" required>
+                     <option value="">Select rating</option>
+                     <option value="1">1 - Unprofessional</option>
+                     <option value="2">2 - Below standard</option>
+                     <option value="3">3 - Adequate</option>
+                     <option value="4">4 - Good presentation</option>
+                     <option value="5">5 - Excellent presentation</option>
+                 </select>
+             </div>
+
+             <h3>Detailed Feedback</h3>
+
+             <div class="form-group">
+                 <label>Specific Errors (if any):</label>
+                 <textarea name="specific_errors" rows="3" placeholder="Describe any specific errors found..."></textarea>
+             </div>
+
+             <div class="form-group">
+                 <label>Missing Items (if any):</label>
+                 <textarea name="missing_items" rows="3" placeholder="List any missing items or information..."></textarea>
+             </div>
+
+             <div class="form-group">
+                 <label>Improvement Suggestions:</label>
+                 <textarea name="improvement_suggestions" rows="3" placeholder="Suggest improvements..."></textarea>
+             </div>
+
+             <div class="form-group">
+                 <label>Complexity Level:</label>
+                 <select name="complexity_level">
+                     <option value="low">Low</option>
+                     <option value="medium" selected>Medium</option>
+                     <option value="high">High</option>
+                 </select>
+             </div>
+
+             <div class="form-group">
+                 <label>
+                     <input type="checkbox" name="would_accept_for_audit" value="true">
+                     Would accept this statement for audit/compliance purposes
+                 </label>
+             </div>
+
+             <button type="submit">Submit Feedback</button>
+         </form>
+     </body>
+     </html>
+     """
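The `/feedback` endpoint above returns an `overall_score` computed by `feedback_manager._compute_overall_score`, whose implementation is not part of this commit. A minimal sketch of what such a helper could look like, assuming the score is simply the mean of the seven 1-5 ratings (the function name, key list, and equal weighting are assumptions, not the project's API):

```python
# Hypothetical sketch: the real FeedbackManager._compute_overall_score
# may weight metrics differently or fold in the audit-approval flag.
RATING_KEYS = [
    "calculation_accuracy",
    "account_classification",
    "statement_balance",
    "accounting_standards",
    "regulatory_compliance",
    "completeness",
    "professional_presentation",
]

def compute_overall_score(feedback_data: dict) -> float:
    """Average the seven 1-5 ratings into a single quality score."""
    ratings = [float(feedback_data[key]) for key in RATING_KEYS]
    return sum(ratings) / len(ratings)

feedback = {key: 4.0 for key in RATING_KEYS}
feedback["calculation_accuracy"] = 5.0
print(compute_overall_score(feedback))  # mean of one 5.0 and six 4.0s
```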
agents/rlhf_workflows.py ADDED
@@ -0,0 +1,267 @@
+ """
+ RLHF-Enhanced LangGraph Workflows for FinRyver
+ Integrates the reward model and feedback collection into the existing workflows.
+ """
+ from typing import TypedDict, Dict, Any, List, Annotated, Optional
+ import time
+ import uuid
+ import os
+ import logging
+ from langgraph.graph import StateGraph, END
+ from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
+
+ # Import existing tools and RLHF components
+ from agents.simple_tools import (
+     generate_notes_full_pipeline_from_path,
+     generate_balance_sheet,
+     generate_pnl_statement,
+     generate_cash_flow_statement,
+ )
+ from agents.feedback_manager import FeedbackManager
+ from agents.reward_model import FinancialRewardModel, RLHFTrainer
+
+ logger = logging.getLogger(__name__)
+
+ class RLHFFinancialAgentState(TypedDict):
+     """Enhanced state with RLHF capabilities"""
+     messages: Annotated[List[BaseMessage], "History"]
+     file_path: str
+     result: Dict[str, Any]
+     status: str
+     start_time: float
+     end_time: float
+     error: str
+
+     # RLHF-specific fields
+     statement_id: Optional[str]
+     predicted_quality: Optional[float]
+     confidence_score: Optional[float]
+     candidates_generated: Optional[List[Dict[str, Any]]]
+     best_candidate_index: Optional[int]
+     feedback_collected: Optional[bool]
+
+ class RLHFWorkflowManager:
+     """Manages RLHF-enhanced workflows"""
+
+     def __init__(self):
+         self.feedback_manager = FeedbackManager()
+         self.reward_model = FinancialRewardModel()
+         self.trainer = RLHFTrainer(self.feedback_manager, self.reward_model)
+
+         # Check for model retraining on initialization
+         self._check_and_retrain()
+
+     def _check_and_retrain(self):
+         """Check whether the model needs retraining"""
+         try:
+             result = self.trainer.periodic_training_check()
+             if result.get("status") == "success":
+                 logger.info("Reward model retrained successfully")
+         except Exception as e:
+             logger.error(f"Error during model retraining check: {e}")
+
+     def make_rlhf_workflow(self, tool_func, statement_type: str):
+         """Create an RLHF-enhanced workflow"""
+
+         def rlhf_node(state: RLHFFinancialAgentState) -> RLHFFinancialAgentState:
+             state["start_time"] = time.time()
+             state["statement_id"] = str(uuid.uuid4())
+
+             try:
+                 # Generate multiple candidates if the reward model is trained
+                 if self.reward_model.is_trained:
+                     candidates = self._generate_candidates(tool_func, state, num_candidates=3)
+                     state["candidates_generated"] = candidates
+
+                     # Select the best candidate using the reward model
+                     best_candidate, best_index = self._select_best_candidate(
+                         candidates, statement_type, state["file_path"]
+                     )
+
+                     state["result"] = best_candidate
+                     state["best_candidate_index"] = best_index
+
+                 else:
+                     # Single generation if no trained model
+                     result = tool_func.invoke({"file_path": state["file_path"]})
+                     state["result"] = result
+                     state["candidates_generated"] = [result]
+                     state["best_candidate_index"] = 0
+
+                 # Predict quality score
+                 if state["result"].get("status") == "success":
+                     predicted_quality, confidence = self._predict_quality(
+                         state["result"], statement_type, state["file_path"]
+                     )
+                     state["predicted_quality"] = predicted_quality
+                     state["confidence_score"] = confidence
+                     state["status"] = "success"
+
+                     # Store statement for potential feedback
+                     self._store_for_feedback(state, statement_type)
+
+                 else:
+                     state["status"] = "error"
+                     state["error"] = state["result"].get("error", "Unknown error")
+
+             except Exception as e:
+                 state["status"] = "error"
+                 state["error"] = str(e)
+                 logger.error(f"Error in RLHF workflow: {e}")
+
+             state["end_time"] = time.time()
+             return state
+
+         # Create workflow graph
+         wf = StateGraph(RLHFFinancialAgentState)
+         wf.add_node("rlhf_run", rlhf_node)
+         wf.set_entry_point("rlhf_run")
+         wf.add_edge("rlhf_run", END)
+         return wf.compile()
+
+     def _generate_candidates(self, tool_func, state: RLHFFinancialAgentState, num_candidates: int = 3) -> List[Dict[str, Any]]:
+         """Generate multiple candidates for comparison"""
+         candidates = []
+
+         for i in range(num_candidates):
+             try:
+                 result = tool_func.invoke({"file_path": state["file_path"]})
+                 candidates.append({
+                     "index": i,
+                     "result": result,
+                     "timestamp": time.time()
+                 })
+             except Exception as e:
+                 logger.warning(f"Failed to generate candidate {i}: {e}")
+                 candidates.append({
+                     "index": i,
+                     "result": {"status": "error", "error": str(e)},
+                     "timestamp": time.time()
+                 })
+
+         return candidates
+
+     def _select_best_candidate(self, candidates: List[Dict[str, Any]], statement_type: str, file_path: str) -> tuple:
+         """Select the best candidate using the reward model"""
+         best_candidate = None
+         best_score = -1
+         best_index = 0
+
+         for candidate in candidates:
+             if candidate["result"].get("status") == "success":
+                 # Create statement data for reward prediction
+                 statement_data = {
+                     "statement_type": statement_type,
+                     "file_path": file_path,
+                     "generation_time": 0,  # Could be calculated from timestamps
+                     "metadata": {}
+                 }
+
+                 # Predict reward
+                 predicted_reward, confidence = self.reward_model.predict_with_confidence(
+                     statement_data, ""
+                 )
+
+                 # Weight by confidence
+                 weighted_score = predicted_reward * confidence
+
+                 if weighted_score > best_score:
+                     best_score = weighted_score
+                     best_candidate = candidate["result"]
+                     best_index = candidate["index"]
+
+         # Fall back to the first successful candidate
+         if best_candidate is None:
+             for candidate in candidates:
+                 if candidate["result"].get("status") == "success":
+                     best_candidate = candidate["result"]
+                     best_index = candidate["index"]
+                     break
+
+         # Final fallback
+         if best_candidate is None and candidates:
+             best_candidate = candidates[0]["result"]
+             best_index = 0
+
+         return best_candidate, best_index
+
+     def _predict_quality(self, result: Dict[str, Any], statement_type: str, file_path: str) -> tuple:
+         """Predict a quality score for the generated statement"""
+         statement_data = {
+             "statement_type": statement_type,
+             "file_path": file_path,
+             "generation_time": 0,
+             "metadata": {}
+         }
+
+         return self.reward_model.predict_with_confidence(statement_data, "")
+
+     def _store_for_feedback(self, state: RLHFFinancialAgentState, statement_type: str):
+         """Store the generated statement for feedback collection"""
+         try:
+             statement_data = {
+                 "type": statement_type,
+                 "file_path": state["file_path"],
+                 "output_path": state["result"].get("output_path"),
+                 # end_time is not set yet at this point, so measure from start_time directly
+                 "generation_time": time.time() - state["start_time"],
+                 "predicted_quality": state.get("predicted_quality"),
+                 "confidence_score": state.get("confidence_score"),
+                 "metadata": {
+                     "candidates_count": len(state.get("candidates_generated", [])),
+                     "best_candidate_index": state.get("best_candidate_index"),
+                     "workflow_version": "rlhf_v1"
+                 }
+             }
+
+             stored_id = self.feedback_manager.store_generated_statement(statement_data)
+             state["statement_id"] = stored_id
+
+         except Exception as e:
+             logger.error(f"Error storing statement for feedback: {e}")
+
+ # Global RLHF manager instance
+ rlhf_manager = RLHFWorkflowManager()
+
+ # RLHF-enhanced workflows
+ rlhf_workflows = {
+     "notes": rlhf_manager.make_rlhf_workflow(generate_notes_full_pipeline_from_path, "notes"),
+     "pnl": rlhf_manager.make_rlhf_workflow(generate_pnl_statement, "pnl"),
+     "bs": rlhf_manager.make_rlhf_workflow(generate_balance_sheet, "balance_sheet"),
+     "cf": rlhf_manager.make_rlhf_workflow(generate_cash_flow_statement, "cash_flow"),
+ }
+
+ def run_rlhf_workflow(file_path: str, kind: str) -> Dict[str, Any]:
+     """Run an RLHF-enhanced workflow"""
+     state = RLHFFinancialAgentState(
+         messages=[HumanMessage(content=f"Run RLHF {kind} for {file_path}")],
+         file_path=file_path,
+         result={},
+         status="",
+         start_time=0,
+         end_time=0,
+         error="",
+         statement_id=None,
+         predicted_quality=None,
+         confidence_score=None,
+         candidates_generated=None,
+         best_candidate_index=None,
+         feedback_collected=False
+     )
+
+     final_state = rlhf_workflows[kind].invoke(state)
+
+     # Add RLHF metadata to the result
+     if final_state["status"] == "success":
+         final_state["result"]["rlhf_metadata"] = {
+             "statement_id": final_state.get("statement_id"),
+             "predicted_quality": final_state.get("predicted_quality"),
+             "confidence_score": final_state.get("confidence_score"),
+             "candidates_generated": len(final_state.get("candidates_generated", [])),
+             "model_used": "rlhf_enhanced"
+         }
+
+     return final_state
+
+ def get_rlhf_manager() -> RLHFWorkflowManager:
+     """Get the global RLHF manager instance"""
+     return rlhf_manager
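The best-of-n selection in `_select_best_candidate` above reduces to ranking candidates by confidence-weighted predicted reward, with a fallback to the first successful candidate. A self-contained sketch of that ranking rule (the `reward` and `confidence` fields here are illustrative stand-ins for calls to `reward_model.predict_with_confidence`):

```python
def select_best(candidates):
    """Pick the candidate with the highest reward * confidence;
    fall back to the first successful one, then to candidates[0]."""
    best, best_score, best_index = None, float("-inf"), 0
    for cand in candidates:
        if cand["result"].get("status") != "success":
            continue  # only successful generations are scored
        weighted = cand["reward"] * cand["confidence"]  # confidence-weighted score
        if weighted > best_score:
            best, best_score, best_index = cand["result"], weighted, cand["index"]
    if best is None:  # no scored winner: first successful candidate
        for cand in candidates:
            if cand["result"].get("status") == "success":
                return cand["result"], cand["index"]
    if best is None and candidates:  # final fallback
        return candidates[0]["result"], 0
    return best, best_index

candidates = [
    {"index": 0, "result": {"status": "success", "id": "a"}, "reward": 4.0, "confidence": 0.5},
    {"index": 1, "result": {"status": "success", "id": "b"}, "reward": 3.5, "confidence": 0.9},
    {"index": 2, "result": {"status": "error"}},
]
best, idx = select_best(candidates)
print(idx)  # candidate 1 wins: 3.5 * 0.9 = 3.15 beats 4.0 * 0.5 = 2.0
```

Weighting by confidence means a mediocre prediction the model is sure about can outrank a high prediction it is unsure about, which is the trade-off the workflow code above makes as well.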
app.py CHANGED
@@ -1,9 +1,11 @@
- from fastapi import FastAPI, APIRouter, UploadFile, File, HTTPException
+ from fastapi import FastAPI, APIRouter, UploadFile, File, HTTPException, Query
  from fastapi.responses import FileResponse
  import os
  import shutil
  import logging
  from agents.langgraph import run_workflow
+ from agents.rlhf_workflows import run_rlhf_workflow
+ from agents.rlhf_routes import rlhf_router

  # Configure logging for the application
  logging.basicConfig(level=logging.INFO)
@@ -13,9 +15,12 @@ logger = logging.getLogger("financial_notes_api")

  app = FastAPI(
      title="Financial Notes Generator API",
-     description="API for generating financial notes, balance sheets, cash flow statements, and P&L reports.",
+     description="API for generating financial notes, balance sheets, cash flow statements, and P&L reports with RLHF capabilities.",
      version="1.0.0"
  )
+
+ # Include RLHF routes
+ app.include_router(rlhf_router)
  @app.on_event("startup")
  async def startup_event():
      logger.info("Financial Notes Generator API has started.")
@@ -81,34 +86,70 @@ async def llm_generate_and_excel(


  @router.post("/notes")
- async def notes_route(file: UploadFile = File(...)):
+ async def notes_route(file: UploadFile = File(...), use_rlhf: bool = Query(False)):
      file_path = f"data/input/{file.filename}"
      os.makedirs("data/input", exist_ok=True)
      with open(file_path, "wb") as buffer:
          shutil.copyfileobj(file.file, buffer)
-     result = run_workflow(file_path, "notes")
+
+     # Choose workflow based on RLHF preference
+     if use_rlhf:
+         result = run_rlhf_workflow(file_path, "notes")
+     else:
+         result = run_workflow(file_path, "notes")
+
      if result["status"] == "success":
-         return FileResponse(result["result"]["output_xlsx_path"], filename=os.path.basename(result["result"]["output_xlsx_path"]))
+         response = FileResponse(result["result"]["output_xlsx_path"], filename=os.path.basename(result["result"]["output_xlsx_path"]))
+
+         # Add RLHF metadata to headers if available
+         if "rlhf_metadata" in result.get("result", {}):
+             rlhf_data = result["result"]["rlhf_metadata"]
+             response.headers["X-RLHF-Statement-ID"] = str(rlhf_data.get("statement_id", ""))
+             response.headers["X-RLHF-Quality-Score"] = str(rlhf_data.get("predicted_quality", ""))
+             response.headers["X-RLHF-Confidence"] = str(rlhf_data.get("confidence_score", ""))
+
+         return response
      raise HTTPException(status_code=500, detail=result["error"])

  @router.post("/pnl")
- async def pnl_route(file: UploadFile = File(...)):
+ async def pnl_route(file: UploadFile = File(...), use_rlhf: bool = Query(False)):
      file_path = f"data/input/{file.filename}"
      os.makedirs("data/input", exist_ok=True)
      with open(file_path, "wb") as buffer:
          shutil.copyfileobj(file.file, buffer)
-     result = run_workflow(file_path, "pnl")
+
+     # Choose workflow based on RLHF preference
+     if use_rlhf:
+         result = run_rlhf_workflow(file_path, "pnl")
+     else:
+         result = run_workflow(file_path, "pnl")
+
      if result["status"] == "success":
-         return FileResponse(result["result"].get("output_path", "data/pnl_statement.xlsx"), filename=os.path.basename(result["result"].get("output_path", "data/pnl_statement.xlsx")))
+         response = FileResponse(result["result"].get("output_path", "data/pnl_statement.xlsx"), filename=os.path.basename(result["result"].get("output_path", "data/pnl_statement.xlsx")))
+
+         # Add RLHF metadata to headers if available
+         if "rlhf_metadata" in result.get("result", {}):
+             rlhf_data = result["result"]["rlhf_metadata"]
+             response.headers["X-RLHF-Statement-ID"] = str(rlhf_data.get("statement_id", ""))
+             response.headers["X-RLHF-Quality-Score"] = str(rlhf_data.get("predicted_quality", ""))
+             response.headers["X-RLHF-Confidence"] = str(rlhf_data.get("confidence_score", ""))
+
+         return response
      raise HTTPException(status_code=500, detail=result["error"])

  @router.post("/bs")
- async def bs_route(file: UploadFile = File(...)):
+ async def bs_route(file: UploadFile = File(...), use_rlhf: bool = Query(False)):
      file_path = f"data/input/{file.filename}"
      os.makedirs("data/input", exist_ok=True)
      with open(file_path, "wb") as buffer:
          shutil.copyfileobj(file.file, buffer)
-     result = run_workflow(file_path, "bs")
+
+     # Choose workflow based on RLHF preference
+     if use_rlhf:
+         result = run_rlhf_workflow(file_path, "bs")
+     else:
+         result = run_workflow(file_path, "bs")
+
      if result["status"] == "success":
          # Use first xlsx file in output dir if present
          output_file = result["result"].get("output_path")
@@ -120,19 +161,44 @@ async def bs_route(file: UploadFile = File(...)):
              output_file = os.path.join(output_dir, xlsx_files[0])
          else:
              raise HTTPException(status_code=500, detail="No balance sheet Excel file produced")
-         return FileResponse(output_file, filename=os.path.basename(output_file))
+
+         response = FileResponse(output_file, filename=os.path.basename(output_file))
+
+         # Add RLHF metadata to headers if available
+         if "rlhf_metadata" in result.get("result", {}):
+             rlhf_data = result["result"]["rlhf_metadata"]
+             response.headers["X-RLHF-Statement-ID"] = str(rlhf_data.get("statement_id", ""))
+             response.headers["X-RLHF-Quality-Score"] = str(rlhf_data.get("predicted_quality", ""))
+             response.headers["X-RLHF-Confidence"] = str(rlhf_data.get("confidence_score", ""))
+
+         return response
      else:
          raise HTTPException(status_code=500, detail=result["error"])

  @router.post("/cf")
- async def cf_route(file: UploadFile = File(...)):
+ async def cf_route(file: UploadFile = File(...), use_rlhf: bool = Query(False)):
      file_path = f"data/input/{file.filename}"
      os.makedirs("data/input", exist_ok=True)
      with open(file_path, "wb") as buffer:
          shutil.copyfileobj(file.file, buffer)
-     result = run_workflow(file_path, "cf")
+
+     # Choose workflow based on RLHF preference
+     if use_rlhf:
+         result = run_rlhf_workflow(file_path, "cf")
+     else:
+         result = run_workflow(file_path, "cf")
+
      if result["status"] == "success":
-         return FileResponse(result["result"].get("output_path", "data/cash_flow_statements.xlsx"), filename=os.path.basename(result["result"].get("output_path", "data/cash_flow_statements.xlsx")))
+         response = FileResponse(result["result"].get("output_path", "data/cash_flow_statements.xlsx"), filename=os.path.basename(result["result"].get("output_path", "data/cash_flow_statements.xlsx")))
+
+         # Add RLHF metadata to headers if available
+         if "rlhf_metadata" in result.get("result", {}):
+             rlhf_data = result["result"]["rlhf_metadata"]
+             response.headers["X-RLHF-Statement-ID"] = str(rlhf_data.get("statement_id", ""))
+             response.headers["X-RLHF-Quality-Score"] = str(rlhf_data.get("predicted_quality", ""))
+             response.headers["X-RLHF-Confidence"] = str(rlhf_data.get("confidence_score", ""))
+
+         return response
      raise HTTPException(status_code=500, detail=result["error"])
  app.include_router(router)
requirements.txt CHANGED
@@ -13,4 +13,11 @@ langchain
  langchain-openai
  langchain-community
  langchain-core
+
+ #langgraph
  langgraph
+
+ # RLHF dependencies
+ scikit-learn
+ numpy
+ joblib
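The new scikit-learn/joblib dependencies support the automatic retraining path (`trainer.periodic_training_check`, invoked after every feedback submission and at workflow-manager startup). That check amounts to retraining once enough new feedback has accumulated; a minimal sketch of the trigger logic, with an assumed threshold of 10 new samples (the threshold, class name, and return shape are assumptions, not the project's API):

```python
class PeriodicTrainingCheck:
    """Retrain only when at least `threshold` new feedback items
    have arrived since the last training run."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold
        self.trained_on = 0  # feedback count at the last retrain

    def check(self, total_feedback: int) -> dict:
        new_samples = total_feedback - self.trained_on
        if new_samples < self.threshold:
            return {"status": "skipped", "new_samples": new_samples}
        # A real implementation would refit the reward model here
        # and persist it (e.g. via joblib) before updating the counter.
        self.trained_on = total_feedback
        return {"status": "success", "trained_on": total_feedback}

checker = PeriodicTrainingCheck(threshold=10)
print(checker.check(4)["status"])   # not enough feedback yet
print(checker.check(12)["status"])  # 12 new samples: retrain
print(checker.check(15)["status"])  # only 3 new since the last retrain
```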