AJAY KASU committed on
Commit cafdd88 · 0 Parent(s)

Initial Release: QuantScale AI Institutional Engine
.env.example ADDED
@@ -0,0 +1 @@
+ HF_TOKEN=hf_your_hugging_face_token_here
.gitignore ADDED
@@ -0,0 +1 @@
+ .env
Dockerfile ADDED
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ # Expose API port
+ EXPOSE 7860
+
+ # Run FastAPI
+ CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,122 @@
+ ---
+ title: QuantScaleAI
+ emoji: 📈
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ pinned: false
+ app_port: 7860
+ ---
+
+ # QuantScale AI: Automated Direct Indexing & Attribution Engine
+
+ **QuantScale AI** is an institutional-grade portfolio optimization engine designed to replicate the "Direct Indexing" capabilities of top asset managers (e.g., Goldman Sachs, BlackRock).
+
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live%20Demo-blue)](https://huggingface.co/spaces/AJAYKASU/QuantScaleAI)
+ [![API Docs](https://img.shields.io/badge/Swagger-API%20Docs-green)](https://ajaykasu-quantscaleai.hf.space/docs)
+
+ It specifically addresses the challenge of **Personalized Indexing at Scale**: allowing 60,000+ client portfolios to track a benchmark (the S&P 500) while accommodating client-specific constraints (values-based exclusions such as "No Energy") and providing automated, high-precision performance attribution.
+
+ ---
+
+ ## Key Features
+
+ ### 1. Quantitative Engine (The Math)
+ - **Tracking Error Minimization**: Uses `cvxpy` to solve the quadratic programming problem of minimizing active risk.
+ - **Robust Risk Modeling**: Implements **Ledoit-Wolf covariance shrinkage** to handle the "high dimensionality, low sample size" problem inherent in 500-stock correlation matrices.
+ - **Direct Indexing**: Optimizes individual stock weights rather than ETFs, enabling granular customization.
+
+ ### 2. Wealth Management Features
+ - **Tax-Loss Harvesting**: Automated identification of loss lots with **wash-sale proxy logic**.
+   - *Example*: Detects a loss in Chevron (CVX) -> suggests a swap into ExxonMobil (XOM) to maintain Energy exposure without triggering wash-sale rules.
+ - **Sector Caching**: Local caching layer to handle API rate limits and ensure low-latency performance for demos.
+
+ ### 3. AI Integration (Alpha Generation)
+ - **Attribution Precision**: Uses the **Brinson-Fachler attribution model** to decompose excess return into the **Allocation Effect** (sector weighting) and the **Selection Effect** (stock picking).
+ - **Hugging Face Integration**: Feeds high-signal attribution data (top 5 contributors/detractors) into `Meta-Llama-3-8B-Instruct` to generate polished, natural-language client commentary.
+
+ ---
+
+ ## Mathematical Formulation
+
+ The core optimizer solves the following quadratic program:
+
+ $$
+ \min_{w} \quad (w - w_b)^T \Sigma (w - w_b)
+ $$
+
+ **Subject to:**
+
+ $$
+ \sum_{i=1}^{N} w_i = 1 \quad (\text{Fully Invested})
+ $$
+
+ $$
+ w_i \ge 0 \quad (\text{Long Only})
+ $$
+
+ $$
+ w_{excluded} = 0 \quad (\text{Sector Constraints})
+ $$
+
+ Where:
+ - $w$ is the vector of portfolio weights.
+ - $w_b$ is the vector of benchmark weights.
+ - $\Sigma$ is the Ledoit-Wolf shrunk covariance matrix.
+
+ ---
+
+ ## Tech Stack
+ - **Languages**: Python 3.10+
+ - **Optimization**: `cvxpy`, `scikit-learn` (Ledoit-Wolf)
+ - **Data**: `yfinance` (market data), `pandas`, `numpy`
+ - **AI/LLM**: `huggingface_hub` (Inference API)
+ - **API**: `FastAPI` (async REST endpoints)
+ - **Architecture**: Object-oriented (abstract managers, Pydantic schemas)
+
+ ---
+
+ ## Installation & Usage
+
+ 1. **Clone & Install**
+    ```bash
+    git clone https://github.com/AjayKasu1/QuantScaleAI.git
+    cd QuantScaleAI
+    pip install -r requirements.txt
+    ```
+
+ 2. **Configure Credentials**
+    Rename `.env.example` to `.env` and add your Hugging Face token:
+    ```env
+    HF_TOKEN=hf_...
+    ```
+
+ 3. **Run the API**
+    ```bash
+    uvicorn api.app:app --reload
+    ```
+    POST to `http://127.0.0.1:8000/optimize` with:
+    ```json
+    {
+      "client_id": "CLIENT_01",
+      "excluded_sectors": ["Energy"]
+    }
+    ```
+
+ ---
+
+ ## Architecture
+
+ ```mermaid
+ graph TD
+     A[Client Request] --> B[FastAPI Layer]
+     B --> C[QuantScaleSystem]
+     C --> D[MarketDataEngine]
+     D --> E[(Sector Cache)]
+     C --> F[RiskModel]
+     F --> G[PortfolioOptimizer]
+     G --> H[AttributionEngine]
+     H --> I[AIReporter]
+     I --> J((Hugging Face API))
+     J --> I
+     I --> B
+ ```
__pycache__/config.cpython-311.pyc ADDED
Binary file (2.27 kB)
 
__pycache__/config.cpython-39.pyc ADDED
Binary file (1.39 kB)
 
__pycache__/main.cpython-311.pyc ADDED
Binary file (4.95 kB)
 
ai/__pycache__/ai_reporter.cpython-39.pyc ADDED
Binary file (2.11 kB)
 
ai/__pycache__/prompts.cpython-39.pyc ADDED
Binary file (1.6 kB)
 
ai/ai_reporter.py ADDED
@@ -0,0 +1,76 @@
+ import logging
+ from datetime import datetime
+
+ from huggingface_hub import InferenceClient
+ from core.schema import AttributionReport
+ from ai.prompts import SYSTEM_PROMPT, ATTRIBUTION_PROMPT_TEMPLATE
+ from config import settings
+
+ logger = logging.getLogger(__name__)
+
+ class AIReporter:
+     """
+     Generates natural language commentary using the Hugging Face Inference API.
+     Model used: meta-llama/Meta-Llama-3-8B-Instruct (or a similar instruct
+     model available via the API).
+     """
+
+     def __init__(self):
+         token = settings.HF_TOKEN.get_secret_value() if settings.HF_TOKEN else None
+
+         if token:
+             self.client = InferenceClient(token=token)
+         else:
+             self.client = None
+             logger.warning("HF_TOKEN not found. AI features will be disabled.")
+
+         # Default to a robust instruction-tuned model
+         self.model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
+
+     def generate_report(self,
+                         attribution_report: AttributionReport,
+                         excluded_sector: str) -> str:
+         """
+         Constructs the prompt and calls the HF API to generate the commentary.
+         """
+         # Current date in a fixed format, e.g. "February 03, 2026"
+         current_date = datetime.now().strftime("%B %d, %Y")
+
+         if not self.client:
+             return f"AI Commentary Unavailable (missing HF_TOKEN). Current Date: {current_date}"
+
+         logger.info("Generating AI Commentary...")
+
+         # Format the user prompt; the template carries the attribution data,
+         # and the date is pinned explicitly so the model cannot invent one.
+         user_prompt = f"""
+ Current Date: {current_date}
+ INSTRUCTION: Start your commentary exactly with the header: "Market Commentary - {current_date}"
+ """ + ATTRIBUTION_PROMPT_TEMPLATE.format(
+             excluded_sector=excluded_sector,
+             total_active_return=attribution_report.total_active_return * 100,  # convert to %
+             allocation_effect=attribution_report.allocation_effect * 100,
+             selection_effect=attribution_report.selection_effect * 100,
+             top_contributors=", ".join(attribution_report.top_contributors),
+             top_detractors=", ".join(attribution_report.top_detractors),
+             current_date=current_date,
+         )
+
+         messages = [
+             {"role": "system", "content": SYSTEM_PROMPT},
+             {"role": "user", "content": user_prompt},
+         ]
+
+         try:
+             response = self.client.chat_completion(
+                 model=self.model_id,
+                 messages=messages,
+                 max_tokens=500,
+                 temperature=0.7,
+             )
+             commentary = response.choices[0].message.content
+             logger.info("AI Commentary generated successfully.")
+             return commentary
+         except Exception as e:
+             logger.error(f"Failed to generate AI report: {e}")
+             return "Error generating commentary. Please check the API connection."
ai/prompts.py ADDED
@@ -0,0 +1,38 @@
+ # System prompt for the Portfolio Manager persona
+ SYSTEM_PROMPT = """You are a Senior Portfolio Manager at a top-tier Asset Management firm (e.g., Goldman Sachs, BlackRock).
+ Your goal is to write a concise, professional, and insightful performance commentary for a High Net Worth client.
+ Your tone should be:
+ 1. Professional and reassuring.
+ 2. Mathematically precise (cite the numbers).
+ 3. Explanatory (explain 'why' something happened).
+
+ Avoid generic financial advice. Focus strictly on the attribution data provided.
+ """
+
+ # User prompt template
+ ATTRIBUTION_PROMPT_TEMPLATE = """
+ Write a "Trailing 30-Day Risk & Performance Attribution" report relative to the S&P 500 benchmark.
+
+ ## Constraints Applied
+ - Exclusions: {excluded_sector}
+
+ ## Brinson-Fachler Attribution Data (Trailing 30 Days)
+ - Total Active Return (Alpha): {total_active_return:.2f}%
+ - Allocation Effect (Impact of Exclusions): {allocation_effect:.2f}%
+ - Selection Effect (Impact of Stock Picking): {selection_effect:.2f}%
+
+ ## Attribution Detail
+ - Top Active Contributors: {top_contributors}
+ - Top Active Detractors: {top_detractors}
+
+ ## Guidelines for the Narrative:
+ 1. **Timeframe**: Use the EXACT date provided. Write "For the trailing 30-day period ending {current_date}..." Do NOT generalize to "the month of...".
+ 2. **Ticker Validation (CRITICAL)**: Always verify tickers. ExxonMobil is XOM, Chevron is CVX. Do NOT swap them.
+ 3. **Attribution Logic**:
+    - If a sector is excluded (0% weight), attribute ALL gains/losses to the **Allocation Effect**.
+    - Do NOT mention the 'Selection Effect' for sectors where we hold 0% (e.g., if Energy is excluded, we didn't "select" bad Energy stocks; we simply didn't own the sector).
+ 4. **Detractor Clarity**:
+    - If an EXCLUDED stock (like AMZN, XOM, CVX) is listed as a "Top Detractor", explicitly state: "We suffered a drag because the portfolio missed out on the rally in [Stock] due to exclusion constraints."
+
+ Write a professional, concise 3-paragraph commentary.
+ """
analytics/__pycache__/attribution.cpython-39.pyc ADDED
Binary file (2.93 kB)
 
analytics/__pycache__/risk_model.cpython-39.pyc ADDED
Binary file (1.89 kB)
 
analytics/__pycache__/tax_module.cpython-39.pyc ADDED
Binary file (3.55 kB)
 
analytics/attribution.py ADDED
@@ -0,0 +1,113 @@
+ import logging
+ from typing import Dict
+
+ import pandas as pd
+
+ from core.schema import AttributionReport
+
+ logger = logging.getLogger(__name__)
+
+ class AttributionEngine:
+     """
+     Implements the Brinson-Fachler attribution model.
+     Decomposes portfolio excess return into:
+     1. Allocation Effect: value added by sector weighting decisions.
+     2. Selection Effect: value added by stock picking within sectors.
+     """
+
+     def generate_attribution_report(self,
+                                     portfolio_weights: Dict[str, float],
+                                     benchmark_weights: Dict[str, float],
+                                     asset_returns: pd.Series,
+                                     sector_map: Dict[str, str]) -> AttributionReport:
+         """
+         Calculates attribution effects.
+
+         Args:
+             portfolio_weights: Ticker -> weight
+             benchmark_weights: Ticker -> weight
+             asset_returns: Ticker -> return over the period
+             sector_map: Ticker -> sector
+
+         Returns:
+             AttributionReport object
+         """
+         # Build a single DataFrame over the union of tickers
+         all_tickers = set(portfolio_weights.keys()) | set(benchmark_weights.keys())
+         df = pd.DataFrame(index=list(all_tickers))
+
+         df['wp'] = df.index.map(portfolio_weights).fillna(0.0)
+         df['wb'] = df.index.map(benchmark_weights).fillna(0.0)
+         df['ret'] = df.index.map(asset_returns).fillna(0.0)
+         df['sector'] = df.index.map(sector_map).fillna("Unknown")
+
+         # Aggregate to sector level: portfolio/benchmark sector weights
+         # (w_pi, w_bi) and sector returns (R_pi, R_bi)
+         sector_groups = df.groupby('sector')
+         attribution_rows = []
+         total_benchmark_return = (df['wb'] * df['ret']).sum()
+
+         for sector, data in sector_groups:
+             w_p = data['wp'].sum()
+             w_b = data['wb'].sum()
+
+             # Guard against division by zero when a sector weight is 0
+             R_p = (data['wp'] * data['ret']).sum() / w_p if w_p > 0 else 0
+             R_b = (data['wb'] * data['ret']).sum() / w_b if w_b > 0 else 0
+
+             # Brinson-Fachler allocation: (w_p - w_b) * (R_b - R_total_benchmark)
+             allocation_effect = (w_p - w_b) * (R_b - total_benchmark_return)
+
+             # Selection: w_b * (R_p - R_b). Benchmark-weighted selection, with
+             # the interaction term kept separate below.
+             selection_effect = w_b * (R_p - R_b)
+
+             # Interaction: (w_p - w_b) * (R_p - R_b)
+             interaction_effect = (w_p - w_b) * (R_p - R_b)
+
+             attribution_rows.append({
+                 'sector': sector,
+                 'allocation': allocation_effect,
+                 'selection': selection_effect,
+                 'interaction': interaction_effect,
+                 'total_effect': allocation_effect + selection_effect + interaction_effect
+             })
+
+         attr_df = pd.DataFrame(attribution_rows)
+
+         total_allocation = attr_df['allocation'].sum()
+         total_selection = attr_df['selection'].sum()
+         total_interaction = attr_df['interaction'].sum()  # bundled into selection in the report
+
+         # Top contributors/detractors: approximate each asset's contribution
+         # to active return as active weight * asset return
+         df['active_weight'] = df['wp'] - df['wb']
+         df['contribution'] = df['active_weight'] * df['ret']
+
+         sorted_contrib = df.sort_values(by='contribution', ascending=False)
+         top_contributors = sorted_contrib.head(5).index.tolist()
+         top_detractors = sorted_contrib.tail(5).index.tolist()
+
+         # Narrative skeleton (to be expanded by the AI reporter)
+         narrative_raw = (
+             f"Total Active Return: {(total_allocation + total_selection + total_interaction):.4f}. "
+             f"Allocation Effect: {total_allocation:.4f}. "
+             f"Selection Effect: {total_selection + total_interaction:.4f}."
+         )
+
+         return AttributionReport(
+             allocation_effect=total_allocation,
+             selection_effect=total_selection + total_interaction,
+             total_active_return=(total_allocation + total_selection + total_interaction),
+             top_contributors=top_contributors,
+             top_detractors=top_detractors,
+             narrative=narrative_raw
+         )
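As a quick sanity check on these formulas, the three effects (allocation, selection, interaction) must sum exactly to the total active return. A tiny two-sector example in plain `numpy`, with made-up weights and returns:

```python
import numpy as np

# Hypothetical two-sector example: Energy (excluded) and "Everything else".
w_p = np.array([0.0, 1.0])    # portfolio sector weights
w_b = np.array([0.1, 0.9])    # benchmark sector weights
R_p = np.array([0.0, 0.04])   # portfolio sector returns
R_b = np.array([0.05, 0.03])  # benchmark sector returns

R_bench = float(w_b @ R_b)                  # total benchmark return
allocation = (w_p - w_b) * (R_b - R_bench)  # Brinson-Fachler allocation
selection = w_b * (R_p - R_b)               # benchmark-weighted selection
interaction = (w_p - w_b) * (R_p - R_b)     # interaction

active = float(w_p @ R_p) - R_bench
decomposed = float(allocation.sum() + selection.sum() + interaction.sum())
assert abs(active - decomposed) < 1e-12     # the three effects sum to alpha
print(round(active, 4))  # 0.008
```

Note how the excluded Energy sector contributes only through the allocation and interaction terms, which is exactly the behavior the prompt guidelines in `ai/prompts.py` ask the model to respect.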
analytics/risk_model.py ADDED
@@ -0,0 +1,58 @@
+ import logging
+
+ import pandas as pd
+ from sklearn.covariance import LedoitWolf
+
+ logger = logging.getLogger(__name__)
+
+ class RiskModel:
+     """
+     Computes the covariance matrix of asset returns using Ledoit-Wolf shrinkage.
+     This is essential for high-dimensional portfolios (N > 500) where the
+     sample covariance matrix is often ill-conditioned or noisy.
+     """
+
+     def compute_covariance_matrix(self, returns: pd.DataFrame) -> pd.DataFrame:
+         """
+         Calculates the shrunk covariance matrix.
+
+         Args:
+             returns (pd.DataFrame): Historical daily returns (Date index, Ticker columns).
+
+         Returns:
+             pd.DataFrame: Covariance matrix (Ticker index, Ticker columns).
+         """
+         if returns.empty:
+             logger.error("Returns dataframe is empty. Cannot compute covariance.")
+             raise ValueError("Empty returns dataframe.")
+
+         logger.info(f"Computing Ledoit-Wolf shrinkage covariance for {returns.shape[1]} assets...")
+
+         # scikit-learn expects (n_samples, n_features); the returns frame is
+         # already (n_days, n_tickers), which matches.
+         lw = LedoitWolf()
+         try:
+             lw.fit(returns.values)
+
+             # Reconstruct a labelled DataFrame from the estimated matrix
+             cov_df = pd.DataFrame(
+                 lw.covariance_,
+                 index=returns.columns,
+                 columns=returns.columns
+             )
+             logger.info("Covariance matrix computation successful.")
+             return cov_df
+         except Exception as e:
+             logger.error(f"Failed to compute covariance matrix: {e}")
+             raise
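The motivation for shrinkage can be shown in a few lines: with more assets than observations, the sample covariance matrix is singular, while the Ledoit-Wolf estimate stays positive definite and therefore usable in the optimizer's quadratic form. A minimal sketch with synthetic data (the 60×100 dimensions are chosen for illustration):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
# 60 days of returns for 100 assets: fewer samples than assets,
# so the sample covariance is rank-deficient.
X = rng.normal(scale=0.01, size=(60, 100))

sample_cov = np.cov(X, rowvar=False)
lw = LedoitWolf().fit(X)

print(np.linalg.matrix_rank(sample_cov))                # well below 100
print(np.all(np.linalg.eigvalsh(lw.covariance_) > 0))   # shrunk matrix is positive definite
print(0.0 < lw.shrinkage_ <= 1.0)                       # shrinkage intensity estimated from data
```

The `shrinkage_` attribute is the data-driven blend between the sample covariance and a scaled identity target; it grows as the sample becomes less informative relative to the dimensionality.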
analytics/tax_module.py ADDED
@@ -0,0 +1,99 @@
+ import logging
+ from datetime import date, timedelta
+ from typing import Dict, List, Optional
+
+ import pandas as pd
+
+ from core.schema import TaxLot, HarvestOpportunity, TickerData
+
+ logger = logging.getLogger(__name__)
+
+ class TaxEngine:
+     """
+     Identifies tax-loss harvesting opportunities and suggests proxies
+     to avoid wash-sale violations.
+     """
+
+     def __init__(self, risk_model=None):
+         self.risk_model = risk_model
+
+     def check_wash_sale_rule(self, symbol: str, transaction_date: date,
+                              recent_transactions: List[Dict]) -> bool:
+         """
+         Checks if a sale would trigger a wash sale. The IRS window is
+         +/- 30 days around the sale; this simplified simulation only
+         looks for purchases in the 30 days *before* the sale.
+         """
+         limit_date = transaction_date - timedelta(days=30)
+
+         for txn in recent_transactions:
+             if txn['symbol'] == symbol and txn['type'] == 'buy':
+                 if limit_date <= txn['date'] <= transaction_date:
+                     return True
+         return False
+
+     def find_proxy(self, loser_ticker: str, sector: str,
+                    candidate_tickers: List[TickerData],
+                    correlation_matrix: Optional[pd.DataFrame] = None) -> str:
+         """
+         Finds a suitable proxy stock in the same sector: ideally highly
+         correlated (to maintain tracking) but not "substantially identical".
+         """
+         # Filter for peers in the same sector
+         sector_peers = [t.symbol for t in candidate_tickers if t.sector == sector and t.symbol != loser_ticker]
+
+         if not sector_peers:
+             return "SPY"  # fallback: broad-market ETF
+
+         if correlation_matrix is not None and not correlation_matrix.empty:
+             try:
+                 # Pick the sector peer most correlated with the loser
+                 if loser_ticker in correlation_matrix.index:
+                     corrs = correlation_matrix[loser_ticker]
+                     peer_corrs = corrs[corrs.index.isin(sector_peers)]
+                     if not peer_corrs.empty:
+                         best_proxy = peer_corrs.idxmax()
+                         logger.info(f"Found proxy for {loser_ticker} using correlation: {best_proxy} (corr: {peer_corrs.max():.2f})")
+                         return best_proxy
+             except Exception as e:
+                 logger.warning(f"Correlation lookup failed: {e}. Falling back to first sector peer.")
+
+         # Fallback: pick the first peer in the sector
+         return sector_peers[0]
+
+     def harvest_losses(self, portfolio_lots: List[TaxLot],
+                        market_prices: Dict[str, float],
+                        candidate_tickers: List[TickerData],
+                        correlation_matrix: Optional[pd.DataFrame] = None) -> List[HarvestOpportunity]:
+         """
+         Scans the portfolio for lots with more than a 10% unrealized loss.
+         """
+         opportunities = []
+
+         for lot in portfolio_lots:
+             # Refresh the current price if available
+             if lot.symbol in market_prices:
+                 lot.current_price = market_prices[lot.symbol]
+
+             # Harvest threshold: loss of 10% or worse
+             if lot.loss_percentage <= -0.10:
+                 # Look up the sector for this ticker from the candidate universe
+                 ticker_obj = next((t for t in candidate_tickers if t.symbol == lot.symbol), None)
+                 sector = ticker_obj.sector if ticker_obj else "Unknown"
+
+                 proxy = self.find_proxy(lot.symbol, sector, candidate_tickers, correlation_matrix)
+
+                 opp = HarvestOpportunity(
+                     sell_ticker=lot.symbol,
+                     buy_proxy_ticker=proxy,
+                     quantity=lot.quantity,
+                     estimated_loss_harvested=abs(lot.unrealized_pl),
+                     reason=f"Loss of {lot.loss_percentage*100:.1f}% exceeds the 10% threshold."
+                 )
+                 opportunities.append(opp)
+
+         logger.info(f"Identified {len(opportunities)} tax-loss harvesting opportunities.")
+         return opportunities
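The windowing logic in `check_wash_sale_rule` can be illustrated standalone. The helper below is hypothetical; it mirrors the engine's simplified one-sided 30-day lookback, not the full IRS +/- 30-day rule:

```python
from datetime import date, timedelta

def is_wash_sale(symbol, sale_date, transactions, window_days=30):
    # Hypothetical helper mirroring TaxEngine.check_wash_sale_rule:
    # flag a sale if the same symbol was bought within `window_days`
    # before the sale date (inclusive).
    start = sale_date - timedelta(days=window_days)
    return any(
        t["symbol"] == symbol and t["type"] == "buy"
        and start <= t["date"] <= sale_date
        for t in transactions
    )

txns = [{"symbol": "CVX", "type": "buy", "date": date(2024, 3, 10)}]
print(is_wash_sale("CVX", date(2024, 3, 25), txns))  # True: bought 15 days before the sale
print(is_wash_sale("XOM", date(2024, 3, 25), txns))  # False: different ticker
```

This is why the engine swaps into a correlated same-sector proxy (e.g. CVX -> XOM) instead of repurchasing the harvested name inside the window.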
api/__pycache__/app.cpython-311.pyc ADDED
Binary file (2.76 kB)
 
api/app.py ADDED
@@ -0,0 +1,47 @@
+ import logging
+
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import FileResponse
+ from fastapi.staticfiles import StaticFiles
+
+ from core.schema import OptimizationRequest
+ from main import QuantScaleSystem
+
+ app = FastAPI(title="QuantScale AI API", version="1.0.0")
+ logger = logging.getLogger("API")
+
+ # Singleton system instance shared across requests
+ system = QuantScaleSystem()
+
+ # Mount static files for the web UI
+ app.mount("/static", StaticFiles(directory="api/static"), name="static")
+
+ @app.get("/")
+ def root():
+     """Serves the AI interface."""
+     return FileResponse('api/static/index.html')
+
+ @app.get("/health")
+ def health_check():
+     return {"status": "healthy", "service": "QuantScale AI Direct Indexing"}
+
+ @app.post("/optimize", response_model=dict)
+ def optimize_portfolio(request: OptimizationRequest):
+     """
+     Optimizes a portfolio based on exclusions and generates an AI attribution report.
+     """
+     try:
+         result = system.run_pipeline(request)
+         if not result:
+             raise HTTPException(status_code=500, detail="Pipeline failed to execute.")
+
+         return {
+             "client_id": request.client_id,
+             "allocations": result['optimization'].weights,
+             "tracking_error": result['optimization'].tracking_error,
+             "attribution_narrative": result['commentary']
+         }
+     except HTTPException:
+         # Re-raise HTTP errors as-is instead of re-wrapping them below
+         raise
+     except Exception as e:
+         logger.error(f"API Error: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
api/static/index.html ADDED
@@ -0,0 +1,537 @@
+ <!DOCTYPE html>
+ <html lang="en">
+
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>QuantScale AI</title>
+   <link
+     href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&family=JetBrains+Mono:wght@400;700&display=swap"
+     rel="stylesheet">
+   <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+   <script src="https://cdnjs.cloudflare.com/ajax/libs/html2pdf.js/0.10.1/html2pdf.bundle.min.js"></script>
+   <style>
+     :root {
+       --bg-color: #0f1117;
+       --card-bg: #1e212b;
+       --accent: #3b82f6;
+       --text-primary: #e2e8f0;
+       --text-secondary: #94a3b8;
+       --success: #10b981;
+     }
+
+     body {
+       font-family: 'Inter', sans-serif;
+       background-color: var(--bg-color);
+       color: var(--text-primary);
+       margin: 0;
+       display: flex;
+       flex-direction: column;
+       align-items: center;
+       min-height: 100vh;
+     }
+
+     .container {
+       width: 100%;
+       max-width: 900px;
+       padding: 2rem;
+       box-sizing: border-box;
+     }
+
+     header {
+       text-align: center;
+       margin-bottom: 3rem;
+     }
+
+     h1 {
+       font-size: 2.5rem;
+       margin-bottom: 0.5rem;
+       background: linear-gradient(90deg, #60a5fa, #34d399);
+       -webkit-background-clip: text;
+       -webkit-text-fill-color: transparent;
+     }
+
+     .subtitle {
+       color: var(--text-secondary);
+       font-size: 1.1rem;
+     }
+
+     .input-area {
+       background-color: var(--card-bg);
+       padding: 1.5rem;
+       border-radius: 12px;
+       box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
+       margin-bottom: 2rem;
+     }
+
+     textarea {
+       width: 100%;
+       background-color: #0f1117;
+       border: 1px solid #2d3748;
+       color: var(--text-primary);
+       border-radius: 8px;
+       padding: 1rem;
+       font-family: 'Inter', sans-serif;
+       font-size: 1rem;
+       resize: none;
+       height: 80px;
+       box-sizing: border-box;
+       outline: none;
+       transition: border-color 0.2s;
+     }
+
+     textarea:focus {
+       border-color: var(--accent);
+     }
+
+     .btn-primary {
+       background-color: var(--accent);
+       color: white;
+       border: none;
+       padding: 0.75rem 1.5rem;
+       border-radius: 8px;
+       font-weight: 600;
+       cursor: pointer;
+       margin-top: 1rem;
+       width: 100%;
+       transition: opacity 0.2s;
+     }
+
+     .btn-primary:hover {
+       opacity: 0.9;
+     }
+
+     .loader {
+       display: none;
+       text-align: center;
+       margin: 2rem 0;
+       color: var(--accent);
+     }
+
+     #results {
+       display: none;
+       animation: fadeIn 0.5s ease;
+     }
+
+     .report-grid {
+       display: grid;
+       grid-template-columns: 1fr 1fr;
+       gap: 1.5rem;
+       margin-bottom: 2rem;
+     }
+
+     .card {
+       background-color: var(--card-bg);
+       padding: 1.5rem;
+       border-radius: 12px;
+       border: 1px solid #2d3748;
+     }
+
+     h3 {
+       margin-top: 0;
+       font-size: 0.9rem;
+       text-transform: uppercase;
+       letter-spacing: 0.05em;
+       color: var(--text-secondary);
+     }
+
+     .metric {
+       font-size: 2rem;
+       font-weight: 700;
+       color: var(--text-primary);
+     }
+
+     .metric-label {
+       font-size: 0.875rem;
+       color: var(--text-secondary);
+     }
+
+     .narrative-box {
+       background-color: #1e212b;
+       border-left: 4px solid var(--success);
+       padding: 1.5rem;
+       border-radius: 0 12px 12px 0;
+       line-height: 1.6;
+     }
+
+     .holding-list {
+       max-height: 300px;
+       overflow-y: auto;
+       font-family: 'JetBrains Mono', monospace;
+       font-size: 0.9rem;
+     }
+
+     .holding-item {
+       display: flex;
+       justify-content: space-between;
+       padding: 0.5rem 0;
+       border-bottom: 1px solid #2d3748;
+     }
+
+     @keyframes fadeIn {
+       from {
+         opacity: 0;
+         transform: translateY(10px);
+       }
+       to {
+         opacity: 1;
+         transform: translateY(0);
+       }
+     }
+
+     /* PDF export styles (professional document mode) */
+     .pdf-mode {
+       /* Override theme variables for print: white background, black text */
+       --bg-color: #ffffff !important;
+       --card-bg: transparent !important;
+       --text-primary: #000000 !important;
+       --text-secondary: #000000 !important; /* force subtitles to black */
+       --accent: #000000 !important;
+
+       background-color: #ffffff !important;
+       color: #000000 !important;
+       padding: 40px;
+     }
+
+     .pdf-mode .report-grid {
+       gap: 2rem;
+     }
+
+     .pdf-mode .card {
+       background-color: transparent !important;
+       border: 1px solid #000000 !important; /* sharp black border */
+       box-shadow: none !important;
+       border-radius: 4px !important; /* sharper corners */
+       padding: 1.5rem !important;
+       color: #000000 !important;
+     }
+
+     .pdf-mode h1 {
+       background: none !important;
+       -webkit-text-fill-color: #000000 !important;
+       color: #000000 !important;
+       font-size: 24pt !important;
+       margin-bottom: 5px !important;
+     }
+
+     .pdf-mode .subtitle {
+       color: #333333 !important;
+       font-size: 14pt !important;
+       margin-bottom: 20px !important;
+     }
+
+     .pdf-mode h1,
+     .pdf-mode h2,
+     .pdf-mode h3,
+     .pdf-mode p {
+       color: #000000 !important;
+     }
+
+     .pdf-mode h3 {
+       color: #000000 !important;
+       font-weight: 800 !important;
+       border-bottom: 1px solid #000000;
+       padding-bottom: 5px;
+       margin-bottom: 15px;
+       font-size: 12pt !important;
+     }
+
+     .pdf-mode .metric {
+       color: #000000 !important;
+       font-size: 28pt !important;
+     }
+
+     .pdf-mode .metric-label {
+       color: #333333 !important;
+       font-size: 10pt !important;
+       font-weight: 500 !important;
+     }
+
+     .pdf-mode .holding-item {
+       border-bottom: 1px solid #dddddd !important;
+       color: #000000 !important;
+       font-size: 10pt !important;
+     }
+
+     .pdf-mode .narrative-box {
+       background-color: transparent !important; /* no grey box */
+       color: #000000 !important;
+       border-left: 4px solid #000000 !important; /* black accent */
+       padding-left: 15px !important;
+       font-size: 11pt !important;
+       line-height: 1.5 !important;
+       text-align: justify;
+     }
+
+     /* Chart legends render on canvas, so CSS can only nudge contrast here;
+        the JS below re-colors the legend before export. */
+     .pdf-mode canvas {
+       filter: contrast(1.2); /* slight boost */
+     }
+   </style>
+ </head>
+
+ <body>
+
+   <div class="container">
+     <header>
+       <h1>QuantScale AI</h1>
+       <div class="subtitle">Direct Indexing & Attribution Engine</div>
+     </header>
+
+     <div class="input-area">
+       <textarea id="userInput"
+         placeholder="Describe your goal, e.g., 'Optimize my $100k portfolio but exclude the Energy and Utilities sectors.'"></textarea>
+       <button class="btn-primary" onclick="runOptimization()">Generate Portfolio Strategy</button>
+     </div>
+
+     <div class="loader" id="loader">
+       Running Convex Optimization & AI Model...
+     </div>
+
+     <div id="results">
+       <!-- Download button -->
+       <div style="text-align: right; margin-bottom: 1rem;">
+         <button onclick="downloadPDF()"
+           style="background: transparent; border: 1px solid #3b82f6; color: #3b82f6; padding: 0.5rem 1rem; border-radius: 6px; cursor: pointer; font-family: 'Inter', sans-serif;">
+           📄 Generate Institutional Report
+         </button>
+       </div>
+
+       <!-- Top metrics -->
+       <div class="report-grid">
+         <div class="card">
+           <h3>Projected Tracking Error</h3>
+           <div class="metric" id="teMetric">0.00%</div>
+           <div class="metric-label">vs S&P 500 Benchmark</div>
+         </div>
+         <div class="card">
+           <h3>Excluded Sectors</h3>
+           <div class="metric" id="excludedMetric" style="color: #ef4444;">None</div>
+           <div class="metric-label">Constraints applied</div>
+         </div>
+       </div>
+
+       <!-- AI commentary -->
+       <div class="card" style="margin-bottom: 2rem;">
+         <h3>AI Performance Attribution</h3>
+         <div id="aiNarrative" class="narrative-box"></div>
+       </div>
+
+       <!-- Holdings & chart -->
+       <div class="report-grid">
+         <div class="card">
+           <h3>Top Holdings</h3>
+           <div class="holding-list" id="holdingsList"></div>
+         </div>
+         <div class="card">
+           <h3>Sector Allocation</h3>
+           <canvas id="allocationChart"></canvas>
+         </div>
+       </div>
+     </div>
+   </div>
+
+   <script>
+     async function downloadPDF() {
+       const element = document.getElementById('results');
+       const btn = element.querySelector('button');
+
+       // 1. Switch to PDF mode
+       element.classList.add('pdf-mode');
+       if (btn) btn.style.display = 'none';
+
+       // 2. Force the chart legend to black text (no animation)
+       if (myChart) {
+         myChart.options.plugins.legend.labels.color = '#000000';
+         myChart.options.scales = myChart.options.scales || {};
+         myChart.update('none');
+       }
+
+       // 3. Wait for the canvas to repaint (the "freeze" strategy)
+       await new Promise(resolve => setTimeout(resolve, 500));
+
+       const opt = {
+         margin: 1,
+         filename: 'QuantScale_Institutional_Report.pdf',
+         image: { type: 'jpeg', quality: 0.98 },
+         html2canvas: { scale: 3, backgroundColor: '#ffffff', useCORS: true, letterRendering: true },
+         jsPDF: { unit: 'in', format: 'letter', orientation: 'portrait' }
366
+ };
367
+
368
+ // 4. Generate & Save
369
+ await html2pdf().set(opt).from(element).save();
370
+
371
+ // 5. Cleanup / Restore
372
+ element.classList.remove('pdf-mode');
373
+ if (btn) btn.style.display = 'inline-block';
374
+
375
+ // Restore Chart Colors
376
+ if (myChart) {
377
+ myChart.options.plugins.legend.labels.color = '#94a3b8';
378
+
379
+ // Revert to animation default or none?
380
+ // Using 'none' to snap back instantly.
381
+ myChart.update('none');
382
+ }
383
+ }
384
+
385
+ async function runOptimization() {
386
+ const input = document.getElementById('userInput').value;
387
+ const loader = document.getElementById('loader');
388
+ const results = document.getElementById('results');
389
+
390
+ // UI Reset
391
+ results.style.display = 'none';
392
+ loader.style.display = 'block';
393
+
394
+     // 1. Simple Intent Parsing (Client-Side for Demo Speed)
+     const sectorKeywords = {
+         "Energy": ["energy", "oil", "gas"],
+         "Technology": ["technology", "tech", "software", "it"],
+         "Financials": ["financials", "finance", "banks"],
+         "Healthcare": ["healthcare", "health", "pharma"],
+         "Utilities": ["utilities", "utility"],
+         "Materials": ["materials", "mining"],
+         "Consumer Discretionary": ["consumer", "retail", "discretionary"], // Note: Amazon is here
+         "Real Estate": ["real estate", "reit"],
+         "Communication Services": ["communication", "media", "telecom"] // Google/Meta/Netflix
+     };
+
+     // Single-stock mapping (common FAANG+ names)
+     const stockKeywords = {
+         "AMZN": ["amazon"],
+         "AAPL": ["apple", "iphone"],
+         "MSFT": ["microsoft", "windows"],
+         "GOOGL": ["google", "alphabet"],
+         "META": ["meta", "facebook"],
+         "TSLA": ["tesla"],
+         "NVDA": ["nvidia", "chips"],
+         "NFLX": ["netflix"]
+     };
+
+     let excluded = [];
+     let excludedTickers = [];
+     const lowerInput = input.toLowerCase();
+
+     // Check sectors (any matched keyword excludes the whole sector)
+     for (const [sector, keywords] of Object.entries(sectorKeywords)) {
+         if (keywords.some(k => lowerInput.includes(k))) {
+             excluded.push(sector);
+         }
+     }
+
+     // Check tickers
+     for (const [ticker, keywords] of Object.entries(stockKeywords)) {
+         if (keywords.some(k => lowerInput.includes(k))) {
+             excludedTickers.push(ticker);
+         }
+     }
+
+     // If nothing matched, the request is simply sent with no exclusions.
+
+     const payload = {
+         "client_id": "Web_User",
+         "excluded_sectors": excluded,
+         "excluded_tickers": excludedTickers,
+         "initial_investment": 100000
+     };
+
+     try {
+         const response = await fetch('/optimize', {
+             method: 'POST',
+             headers: { 'Content-Type': 'application/json' },
+             body: JSON.stringify(payload)
+         });
+
+         const data = await response.json();
+
+         // Display results
+         const allExclusions = [...excluded, ...excludedTickers];
+         displayData(data, allExclusions);
+         loader.style.display = 'none';
+         results.style.display = 'block';
+
+     } catch (error) {
+         alert("Optimization Failed: " + error);
+         loader.style.display = 'none';
+     }
+ }
+
+ function displayData(data, excluded) {
+     // Metrics
+     document.getElementById('teMetric').innerText = (data.tracking_error * 100).toFixed(4) + "%";
+     document.getElementById('excludedMetric').innerText = excluded.length > 0 ? excluded.join(", ") : "None";
+
+     // AI text: minimal Markdown cleanup (**bold** -> <b>, newlines -> <br>)
+     let narrative = data.attribution_narrative || "No commentary generated.";
+     narrative = narrative.replace(/\*\*(.*?)\*\*/g, '<b>$1</b>').replace(/\n/g, '<br>');
+     document.getElementById('aiNarrative').innerHTML = narrative;
+
+     // Holdings list (top 15, sorted by weight)
+     const listObj = document.getElementById('holdingsList');
+     listObj.innerHTML = '';
+     const sorted = Object.entries(data.allocations).sort((a, b) => b[1] - a[1]).slice(0, 15);
+
+     sorted.forEach(([ticker, weight]) => {
+         const div = document.createElement('div');
+         div.className = 'holding-item';
+         div.innerHTML = `<span>${ticker}</span><span>${(weight * 100).toFixed(2)}%</span>`;
+         listObj.appendChild(div);
+     });
+
+     // Chart
+     renderChart(data.allocations);
+ }
+
+ let myChart = null;
+ function renderChart(allocations) {
+     const ctx = document.getElementById('allocationChart').getContext('2d');
+     if (myChart) myChart.destroy();
+
+     // Simplification: a production build would map Ticker -> Sector here;
+     // for the demo, show the top 5 tickers vs "Others".
+     const sorted = Object.entries(allocations).sort((a, b) => b[1] - a[1]);
+     const top5 = sorted.slice(0, 5);
+     const others = sorted.slice(5).reduce((acc, curr) => acc + curr[1], 0);
+
+     const labels = top5.map(x => x[0]).concat(["Others"]);
+     const data = top5.map(x => x[1]).concat([others]);
+
+     myChart = new Chart(ctx, {
+         type: 'doughnut',
+         data: {
+             labels: labels,
+             datasets: [{
+                 data: data,
+                 backgroundColor: ['#3b82f6', '#10b981', '#f59e0b', '#ef4444', '#8b5cf6', '#475569'],
+                 borderWidth: 0
+             }]
+         },
+         options: {
+             responsive: true,
+             plugins: {
+                 legend: { position: 'right', labels: { color: '#94a3b8' } }
+             }
+         }
+     });
+ }
+ </script>
+ </body>
+
+ </html>
config.py ADDED
@@ -0,0 +1,37 @@
+ import os
+ from typing import Optional
+ from pydantic import BaseModel, Field, SecretStr
+
+ class Settings(BaseModel):
+     """
+     Application configuration, loaded from environment variables via os.getenv.
+     """
+
+     # API Keys
+     HF_TOKEN: Optional[SecretStr] = Field(default_factory=lambda: SecretStr(os.getenv("HF_TOKEN", "")) if os.getenv("HF_TOKEN") else None, description="Hugging Face API Token")
+
+     # Data Configuration
+     DATA_CACHE_DIR: str = Field(default="./data_cache", description="Directory to store cached market data")
+     SECTOR_MAP_FILE: str = Field(default="./data/sector_map.json", description="Path to sector mapping cache")
+
+     # Optimization Defaults
+     MAX_WEIGHT: float = Field(default=0.05, description="Maximum weight for a single asset")
+     MIN_WEIGHT: float = Field(default=0.00, description="Minimum weight for a single asset")
+
+     # Universe
+     BENCHMARK_TICKER: str = Field(default="^GSPC", description="Benchmark Ticker (S&P 500)")
+
+     # System
+     LOG_LEVEL: str = Field(default="INFO", description="Logging level")
+
+
+ # Global settings instance
+ try:
+     settings = Settings()
+ except Exception as e:
+     print(f"WARNING: Settings failed to load ({e}); falling back to defaults without an HF token.")
+     # Without HF_TOKEN the AI narrative feature simply degrades gracefully later.
+     settings = Settings(HF_TOKEN=None)
core/__pycache__/schema.cpython-311.pyc ADDED
Binary file (6.08 kB). View file
 
core/__pycache__/schema.cpython-39.pyc ADDED
Binary file (4.01 kB). View file
 
core/schema.py ADDED
@@ -0,0 +1,97 @@
+ from typing import List, Dict, Optional
+ from pydantic import BaseModel, Field, validator
+ import pandas as pd
+ from datetime import date
+
+ class TickerData(BaseModel):
+     """
+     Represents a single stock's metadata and price history.
+     """
+     symbol: str
+     sector: str
+     price_history: Dict[str, float] = Field(default_factory=dict, description="Date (ISO) -> Adj Close Price")
+
+     @property
+     def latest_price(self) -> float:
+         if not self.price_history:
+             return 0.0
+         # Sort by date key and get last value
+         return self.price_history[sorted(self.price_history.keys())[-1]]
+
+ class OptimizationRequest(BaseModel):
+     """
+     User request for portfolio optimization.
+     """
+     client_id: str
+     initial_investment: float = 100000.0
+     excluded_sectors: List[str] = Field(default_factory=list, description="List of sectors to exclude (e.g., ['Energy'])")
+     excluded_tickers: List[str] = Field(default_factory=list, description="List of specific tickers to exclude (e.g., ['AMZN'])")
+     benchmark: str = "^GSPC"
+
+     class Config:
+         json_schema_extra = {
+             "example": {
+                 "client_id": "Demo_User_1",
+                 "initial_investment": 100000.0,
+                 "excluded_sectors": ["Energy"],
+                 "excluded_tickers": ["AMZN"],
+                 "benchmark": "^GSPC"
+             }
+         }
+
+ class OptimizationResult(BaseModel):
+     """
+     Output of the optimization engine.
+     """
+     weights: Dict[str, float] = Field(..., description="Ticker -> Optimal Weight")
+     tracking_error: float
+     status: str
+
+     @validator('weights')
+     def validate_weights(cls, v):
+         # Filter out near-zero weights for cleanliness
+         return {k: val for k, val in v.items() if val > 0.0001}
+
+ class TaxLot(BaseModel):
+     """
+     A specific purchase lot of a stock.
+     """
+     symbol: str
+     purchase_date: date
+     quantity: int
+     cost_basis_per_share: float
+     current_price: float
+
+     @property
+     def unrealized_pl(self) -> float:
+         return (self.current_price - self.cost_basis_per_share) * self.quantity
+
+     @property
+     def is_loss(self) -> bool:
+         return self.unrealized_pl < 0
+
+     @property
+     def loss_percentage(self) -> float:
+         if self.cost_basis_per_share == 0:
+             return 0.0
+         return (self.current_price - self.cost_basis_per_share) / self.cost_basis_per_share
+
+ class HarvestOpportunity(BaseModel):
+     """
+     A suggestion to harvest a tax loss.
+     """
+     sell_ticker: str
+     buy_proxy_ticker: str
+     quantity: int
+     estimated_loss_harvested: float
+     reason: str
+
+ class AttributionReport(BaseModel):
+     """
+     Brinson Attribution Data.
+     """
+     allocation_effect: float
+     selection_effect: float
+     total_active_return: float
+     top_contributors: List[str]
+     top_detractors: List[str]
+     narrative: str
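For intuition, the tax-lot arithmetic encoded by `TaxLot` reduces to two formulas: unrealized P/L is `(current_price - cost_basis_per_share) * quantity`, and the loss percentage is the signed return on cost basis. A dependency-free sketch of that arithmetic (plain functions for illustration, not the pydantic model itself):

```python
# Illustrative re-statement of the TaxLot P/L arithmetic from core/schema.py.
def unrealized_pl(cost_basis_per_share: float, current_price: float, quantity: int) -> float:
    """Unrealized profit/loss for a single purchase lot."""
    return (current_price - cost_basis_per_share) * quantity

def loss_percentage(cost_basis_per_share: float, current_price: float) -> float:
    """Signed return relative to cost basis; guards the zero-cost edge case."""
    if cost_basis_per_share == 0:
        return 0.0
    return (current_price - cost_basis_per_share) / cost_basis_per_share

# A lot of 10 shares bought at $150, now trading at $120: a $300 loss (-20%),
# which makes it a candidate HarvestOpportunity.
pl = unrealized_pl(150.0, 120.0, 10)   # -300.0
pct = loss_percentage(150.0, 120.0)    # -0.2
```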
data/__pycache__/data_manager.cpython-311.pyc ADDED
Binary file (10.6 kB). View file
 
data/__pycache__/data_manager.cpython-39.pyc ADDED
Binary file (5.49 kB). View file
 
data/__pycache__/optimizer.cpython-39.pyc ADDED
Binary file (4.1 kB). View file
 
data/data_manager.py ADDED
@@ -0,0 +1,152 @@
+ import yfinance as yf
+ import pandas as pd
+ import numpy as np
+ import json
+ import os
+ import logging
+ from typing import List, Dict, Optional
+ from core.schema import TickerData
+ from config import settings
+
+ logging.basicConfig(level=settings.LOG_LEVEL)
+ logger = logging.getLogger(__name__)
+
+ class SectorCache:
+     """
+     Manages a local cache of Ticker -> Sector mappings to avoid
+     yfinance API throttling and improve speed.
+     """
+     def __init__(self, cache_file: str = settings.SECTOR_MAP_FILE):
+         self.cache_file = cache_file
+         self.sector_map = self._load_cache()
+
+     def _load_cache(self) -> Dict[str, str]:
+         if os.path.exists(self.cache_file):
+             try:
+                 with open(self.cache_file, 'r') as f:
+                     return json.load(f)
+             except Exception as e:
+                 logger.error(f"Failed to load sector cache: {e}")
+                 return {}
+         return {}
+
+     def save_cache(self):
+         os.makedirs(os.path.dirname(self.cache_file), exist_ok=True)
+         with open(self.cache_file, 'w') as f:
+             json.dump(self.sector_map, f, indent=2)
+
+     def get_sector(self, ticker: str) -> Optional[str]:
+         return self.sector_map.get(ticker)
+
+     def update_sector(self, ticker: str, sector: str):
+         self.sector_map[ticker] = sector
+
+ class MarketDataEngine:
+     """
+     Handles robust data ingestion (static universe file, yfinance).
+     Implements data cleaning and validation policies.
+     """
+     def __init__(self):
+         self.sector_cache = SectorCache()
+
+     def fetch_sp500_tickers(self) -> List[str]:
+         """
+         Loads S&P 500 components from a static JSON file (Production Mode).
+         Eliminates the dependency on Wikipedia scraping.
+         """
+         try:
+             universe_file = os.path.join(os.path.dirname(__file__), 'sp500_universe.json')
+
+             # If the universe file is missing, fall back to the built-in list
+             if not os.path.exists(universe_file):
+                 logger.warning("Universe file not found. Using fallback.")
+                 return self._get_fallback_tickers()
+
+             with open(universe_file, 'r') as f:
+                 universe_data = json.load(f)
+
+             tickers = []
+             for item in universe_data:
+                 ticker = item['ticker']
+                 sector = item['sector']
+                 tickers.append(ticker)
+                 self.sector_cache.update_sector(ticker, sector)
+
+             self.sector_cache.save_cache()
+             logger.info(f"Successfully loaded {len(tickers)} tickers from static universe.")
+             return tickers
+
+         except Exception as e:
+             logger.error(f"Error loading universe: {e}")
+             return self._get_fallback_tickers()
+
+     def _get_fallback_tickers(self) -> List[str]:
+         # Fallback for demo reliability
+         fallback_map = {
+             "AAPL": "Information Technology", "MSFT": "Information Technology", "GOOGL": "Communication Services",
+             "AMZN": "Consumer Discretionary", "NVDA": "Information Technology", "META": "Communication Services",
+             "TSLA": "Consumer Discretionary", "BRK-B": "Financials", "V": "Financials", "UNH": "Health Care",
+             "XOM": "Energy", "JNJ": "Health Care", "JPM": "Financials", "PG": "Consumer Staples",
+             "LLY": "Health Care", "MA": "Financials", "CVX": "Energy", "MRK": "Health Care",
+             "HD": "Consumer Discretionary", "PEP": "Consumer Staples", "COST": "Consumer Staples"
+         }
+         for t, s in fallback_map.items():
+             self.sector_cache.update_sector(t, s)
+         return list(fallback_map.keys())
+
+     def fetch_market_data(self, tickers: List[str], start_date: str = "2023-01-01") -> pd.DataFrame:
+         """
+         Fetches adjusted close prices for a list of tickers.
+         """
+         if not tickers:
+             logger.warning("No tickers provided to fetch.")
+             return pd.DataFrame()
+
+         logger.info(f"Downloading data for {len(tickers)} tickers from {start_date}...")
+         # yf.download returns auto-adjusted 'Close' prices by default in recent versions
+         data = yf.download(tickers, start=start_date, progress=False)
+
+         if data.empty:
+             logger.error("No data fetched from yfinance.")
+             return pd.DataFrame()
+
+         # Handle MultiIndex columns (Price, Ticker)
+         if hasattr(data.columns, 'levels') and 'Close' in data.columns.levels[0]:
+             data = data['Close']
+         elif 'Close' in data.columns:
+             data = data['Close']
+         elif 'Adj Close' in data.columns:
+             data = data['Adj Close']
+         else:
+             # Last-resort fallback: assume the first len(tickers) columns are prices
+             logger.warning("Could not find Close/Adj Close. Using first level.")
+             data = data.iloc[:, :len(tickers)]
+
+         return self._clean_data(data)
+
+     def _clean_data(self, df: pd.DataFrame) -> pd.DataFrame:
+         """
+         Applies data quality rules:
+         1. Drop columns with > 10% missing data.
+         2. Forward fill, then backward fill, remaining NaNs.
+         """
+         initial_count = len(df.columns)
+
+         # Rule 1: Drop > 10% missing
+         missing_frac = df.isnull().mean()
+         drop_cols = missing_frac[missing_frac > 0.10].index.tolist()
+         df_clean = df.drop(columns=drop_cols)
+
+         dropped_count = len(drop_cols)
+         if dropped_count > 0:
+             logger.warning(f"Dropped {dropped_count} tickers due to >10% missing data: {drop_cols[:5]}...")
+
+         # Rule 2: Fill NaNs
+         df_clean = df_clean.ffill().bfill()
+
+         logger.info(f"Data cleaning complete. Retained {len(df_clean.columns)}/{initial_count} tickers.")
+         return df_clean
+
+     def get_sector_map(self) -> Dict[str, str]:
+         return self.sector_cache.sector_map
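The `_clean_data` policy (drop any column with more than 10% missing observations, then forward/backward fill the rest) can be exercised in isolation. A small self-contained sketch assuming only pandas, with hypothetical ticker columns:

```python
import numpy as np
import pandas as pd

def clean_prices(df: pd.DataFrame, max_missing: float = 0.10) -> pd.DataFrame:
    """Drop tickers with too many gaps, then ffill/bfill what remains."""
    keep = df.columns[df.isnull().mean() <= max_missing]
    return df[keep].ffill().bfill()

prices = pd.DataFrame({
    # 1 gap out of 10 observations (10% missing): kept, gap forward-filled
    "AAPL": [100.0, np.nan, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0],
    # 2 gaps out of 10 (20% missing): exceeds the threshold, dropped
    "GHOST": [50.0, np.nan, np.nan, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0],
})
cleaned = clean_prices(prices)
# cleaned has a single column "AAPL" with no NaNs; the gap carries 100.0 forward
```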
data/optimizer.py ADDED
@@ -0,0 +1,160 @@
+ import cvxpy as cp
+ import pandas as pd
+ import numpy as np
+ import logging
+ from typing import List, Dict, Optional
+ from core.schema import OptimizationResult
+ from config import settings
+
+ logger = logging.getLogger(__name__)
+
+ class PortfolioOptimizer:
+     """
+     Quantitative optimization engine using CVXPY.
+     Objective: minimize tracking error against a benchmark.
+     Constraints:
+         1. Full investment (sum w = 1)
+         2. Long only (w >= 0)
+         3. Sector/ticker exclusions (w[excluded] = 0)
+     """
+
+     def __init__(self):
+         pass
+
+     def optimize_portfolio(self,
+                            covariance_matrix: pd.DataFrame,
+                            tickers: List[str],
+                            benchmark_weights: pd.DataFrame,
+                            sector_map: Dict[str, str],
+                            excluded_sectors: List[str],
+                            excluded_tickers: Optional[List[str]] = None) -> OptimizationResult:
+         """
+         Solves the tracking error minimization problem.
+
+         Args:
+             covariance_matrix: (N x N) Ledoit-Wolf shrunk covariance matrix.
+             tickers: List of N tickers.
+             benchmark_weights: (N x 1) weights of the benchmark (e.g. S&P 500).
+                 Un-held assets should have 0 weight.
+             sector_map: Dictionary mapping ticker -> sector.
+             excluded_sectors: List of sectors to exclude.
+             excluded_tickers: List of specific tickers to exclude.
+
+         Returns:
+             OptimizationResult containing weights and status.
+         """
+         excluded_tickers = excluded_tickers or []
+         n_assets = len(tickers)
+         if covariance_matrix.shape != (n_assets, n_assets):
+             raise ValueError(f"Covariance matrix shape {covariance_matrix.shape} does not match tickers count {n_assets}")
+
+         logger.info(f"Setting up CVXPY optimization for {n_assets} assets...")
+
+         # Variables
+         w = cp.Variable(n_assets)
+
+         # Benchmark weights vector (aligned to tickers)
+         if isinstance(benchmark_weights, (pd.Series, pd.DataFrame)):
+             w_b = benchmark_weights.reindex(tickers).fillna(0).values.flatten()
+         else:
+             w_b = np.array(benchmark_weights)
+
+         # Objective: minimize active-weight variance
+         active_weights = w - w_b
+         tracking_error_variance = cp.quad_form(active_weights, covariance_matrix.values)
+         objective = cp.Minimize(tracking_error_variance)
+
+         # 1. Identify exclusions first so the weight constraints can adapt
+         excluded_indices = []
+         mask_vector = np.zeros(n_assets)
+
+         # Sector exclusions
+         if excluded_sectors:
+             logger.info(f"Applying sector exclusions: {excluded_sectors}")
+             for i, ticker in enumerate(tickers):
+                 sector = sector_map.get(ticker, "Unknown")
+                 for excl in excluded_sectors:
+                     if excl.lower() == sector.lower() or (excl == "Technology" and sector == "Information Technology"):
+                         excluded_indices.append(i)
+                         mask_vector[i] = 1
+
+         # Ticker exclusions
+         if excluded_tickers:
+             logger.info(f"Applying ticker exclusions: {excluded_tickers}")
+             for i, ticker in enumerate(tickers):
+                 if ticker in excluded_tickers:
+                     excluded_indices.append(i)
+                     mask_vector[i] = 1
+
+         excluded_indices = list(set(excluded_indices))  # Dedupe
+
+         logger.debug(f"Excluded mask covers {int(mask_vector.sum())} assets out of {n_assets}")
+
+         if len(excluded_indices) == n_assets:
+             raise ValueError("All assets excluded! Cannot optimize.")
+
+         # 2. Dynamic constraints
+         n_active = n_assets - len(excluded_indices)
+         if n_active == 0:
+             n_active = 1
+
+         min_avg_weight = 1.0 / n_active
+         dynamic_max = max(0.20, min_avg_weight * 1.5)
+
+         MAX_WEIGHT_LIMIT = dynamic_max
+         logger.debug(f"Active assets={n_active}, min avg={min_avg_weight:.4f}, dynamic max limit={MAX_WEIGHT_LIMIT:.4f}")
+
+         constraints = [
+             cp.sum(w) == 1,
+             w >= 0,
+             w <= MAX_WEIGHT_LIMIT
+         ]
+
+         # Apply exclusions
+         if excluded_indices:
+             constraints.append(w[excluded_indices] == 0)
+
+         # Problem
+         prob = cp.Problem(objective, constraints)
+
+         try:
+             logger.info("Solving quadratic programming problem...")
+             # verbose=True to surface solver output in the logs
+             prob.solve(verbose=True)
+         except Exception as e:
+             logger.error(f"Optimization crashed: {e}")
+             raise
+
+         # Check solver status
+         if prob.status not in [cp.OPTIMAL, cp.OPTIMAL_INACCURATE]:
+             logger.error(f"Optimization failed with status: {prob.status}")
+             raise ValueError(f"Solver failed: {prob.status}")
+
+         # Extract weights
+         optimal_weights = w.value
+         if optimal_weights is None:
+             raise ValueError("Solver returned None for weights.")
+
+         # Zero out solver noise below 1 bp
+         optimal_weights[optimal_weights < 1e-4] = 0
+
+         # Format result
+         weight_dict = {
+             tickers[i]: float(optimal_weights[i])
+             for i in range(n_assets)
+             if optimal_weights[i] > 0
+         }
+
+         # Tracking error = volatility of active returns = sqrt(active variance)
+         te = np.sqrt(prob.value) if prob.value > 0 else 0.0
+
+         logger.info(f"Optimization solved. Tracking Error: {te:.4f}")
+
+         return OptimizationResult(
+             weights=weight_dict,
+             tracking_error=te,
+             status=prob.status
+         )
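The quantity the solver minimizes is active-weight variance; the reported tracking error is its square root, TE = sqrt((w - w_b)' Σ (w - w_b)). A hand-rolled NumPy check on a toy two-asset covariance (illustrative numbers, not the engine's data):

```python
import numpy as np

def tracking_error(w: np.ndarray, w_b: np.ndarray, cov: np.ndarray) -> float:
    """Volatility of active returns for portfolio w vs benchmark w_b."""
    active = w - w_b
    return float(np.sqrt(active @ cov @ active))

cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])   # uncorrelated assets with 20% / 10% vol
w_b = np.array([0.5, 0.5])       # benchmark
w   = np.array([0.6, 0.4])       # portfolio tilted 10% toward asset 1

te = tracking_error(w, w_b, cov)
# active = [0.1, -0.1]; variance = 0.1**2 * 0.04 + 0.1**2 * 0.01 = 0.0005
```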
data/sector_map.json ADDED
@@ -0,0 +1,505 @@
+ {
+   "MMM": "Industrials",
+   "AOS": "Industrials",
+   "ABT": "Health Care",
+   "ABBV": "Health Care",
+   "ACN": "Information Technology",
+   "ADBE": "Information Technology",
+   "AMD": "Information Technology",
+   "AES": "Utilities",
+   "AFL": "Financials",
+   "A": "Health Care",
+   "APD": "Materials",
+   "ABNB": "Consumer Discretionary",
+   "AKAM": "Information Technology",
+   "ALB": "Materials",
+   "ARE": "Real Estate",
+   "ALGN": "Health Care",
+   "ALLE": "Industrials",
+   "LNT": "Utilities",
+   "ALL": "Financials",
+   "GOOGL": "Communication Services",
+   "GOOG": "Communication Services",
+   "MO": "Consumer Staples",
+   "AMZN": "Consumer Discretionary",
+   "AMCR": "Materials",
+   "AEE": "Utilities",
+   "AEP": "Utilities",
+   "AXP": "Financials",
+   "AIG": "Financials",
+   "AMT": "Real Estate",
+   "AWK": "Utilities",
+   "AMP": "Financials",
+   "AME": "Industrials",
+   "AMGN": "Health Care",
+   "APH": "Information Technology",
+   "ADI": "Information Technology",
+   "AON": "Financials",
+   "APA": "Energy",
+   "APO": "Financials",
+   "AAPL": "Information Technology",
+   "AMAT": "Information Technology",
+   "APP": "Information Technology",
+   "APTV": "Consumer Discretionary",
+   "ACGL": "Financials",
+   "ADM": "Consumer Staples",
+   "ARES": "Financials",
+   "ANET": "Information Technology",
+   "AJG": "Financials",
+   "AIZ": "Financials",
+   "T": "Communication Services",
+   "ATO": "Utilities",
+   "ADSK": "Information Technology",
+   "ADP": "Industrials",
+   "AZO": "Consumer Discretionary",
+   "AVB": "Real Estate",
+   "AVY": "Materials",
+   "AXON": "Industrials",
+   "BKR": "Energy",
+   "BALL": "Materials",
+   "BAC": "Financials",
+   "BAX": "Health Care",
+   "BDX": "Health Care",
+   "BRK-B": "Financials",
+   "BBY": "Consumer Discretionary",
+   "TECH": "Health Care",
+   "BIIB": "Health Care",
+   "BLK": "Financials",
+   "BX": "Financials",
+   "XYZ": "Financials",
+   "BK": "Financials",
+   "BA": "Industrials",
+   "BKNG": "Consumer Discretionary",
+   "BSX": "Health Care",
+   "BMY": "Health Care",
+   "AVGO": "Information Technology",
+   "BR": "Industrials",
+   "BRO": "Financials",
+   "BF-B": "Consumer Staples",
+   "BLDR": "Industrials",
+   "BG": "Consumer Staples",
+   "BXP": "Real Estate",
+   "CHRW": "Industrials",
+   "CDNS": "Information Technology",
+   "CPT": "Real Estate",
+   "CPB": "Consumer Staples",
+   "COF": "Financials",
+   "CAH": "Health Care",
+   "CCL": "Consumer Discretionary",
+   "CARR": "Industrials",
+   "CVNA": "Consumer Discretionary",
+   "CAT": "Industrials",
+   "CBOE": "Financials",
+   "CBRE": "Real Estate",
+   "CDW": "Information Technology",
+   "COR": "Health Care",
+   "CNC": "Health Care",
+   "CNP": "Utilities",
+   "CF": "Materials",
+   "CRL": "Health Care",
+   "SCHW": "Financials",
+   "CHTR": "Communication Services",
+   "CVX": "Energy",
+   "CMG": "Consumer Discretionary",
+   "CB": "Financials",
+   "CHD": "Consumer Staples",
+   "CI": "Health Care",
+   "CINF": "Financials",
+   "CTAS": "Industrials",
+   "CSCO": "Information Technology",
+   "C": "Financials",
+   "CFG": "Financials",
+   "CLX": "Consumer Staples",
+   "CME": "Financials",
+   "CMS": "Utilities",
+   "KO": "Consumer Staples",
+   "CTSH": "Information Technology",
+   "COIN": "Financials",
+   "CL": "Consumer Staples",
+   "CMCSA": "Communication Services",
+   "FIX": "Industrials",
+   "CAG": "Consumer Staples",
+   "COP": "Energy",
+   "ED": "Utilities",
+   "STZ": "Consumer Staples",
+   "CEG": "Utilities",
+   "COO": "Health Care",
+   "CPRT": "Industrials",
+   "GLW": "Information Technology",
+   "CPAY": "Financials",
+   "CTVA": "Materials",
+   "CSGP": "Real Estate",
+   "COST": "Consumer Staples",
+   "CTRA": "Energy",
+   "CRH": "Materials",
+   "CRWD": "Information Technology",
+   "CCI": "Real Estate",
+   "CSX": "Industrials",
+   "CMI": "Industrials",
+   "CVS": "Health Care",
+   "DHR": "Health Care",
+   "DRI": "Consumer Discretionary",
+   "DDOG": "Information Technology",
+   "DVA": "Health Care",
+   "DAY": "Industrials",
+   "DECK": "Consumer Discretionary",
+   "DE": "Industrials",
+   "DELL": "Information Technology",
+   "DAL": "Industrials",
+   "DVN": "Energy",
+   "DXCM": "Health Care",
+   "FANG": "Energy",
+   "DLR": "Real Estate",
+   "DG": "Consumer Staples",
+   "DLTR": "Consumer Staples",
+   "D": "Utilities",
+   "DPZ": "Consumer Discretionary",
+   "DASH": "Consumer Discretionary",
+   "DOV": "Industrials",
+   "DOW": "Materials",
+   "DHI": "Consumer Discretionary",
+   "DTE": "Utilities",
+   "DUK": "Utilities",
+   "DD": "Materials",
+   "ETN": "Industrials",
+   "EBAY": "Consumer Discretionary",
+   "ECL": "Materials",
+   "EIX": "Utilities",
+   "EW": "Health Care",
+   "EA": "Communication Services",
+   "ELV": "Health Care",
+   "EME": "Industrials",
+   "EMR": "Industrials",
+   "ETR": "Utilities",
+   "EOG": "Energy",
+   "EPAM": "Information Technology",
+   "EQT": "Energy",
+   "EFX": "Industrials",
+   "EQIX": "Real Estate",
+   "EQR": "Real Estate",
+   "ERIE": "Financials",
+   "ESS": "Real Estate",
+   "EL": "Consumer Staples",
+   "EG": "Financials",
+   "EVRG": "Utilities",
+   "ES": "Utilities",
+   "EXC": "Utilities",
+   "EXE": "Energy",
+   "EXPE": "Consumer Discretionary",
+   "EXPD": "Industrials",
+   "EXR": "Real Estate",
+   "XOM": "Energy",
+   "FFIV": "Information Technology",
+   "FDS": "Financials",
+   "FICO": "Information Technology",
+   "FAST": "Industrials",
+   "FRT": "Real Estate",
+   "FDX": "Industrials",
+   "FIS": "Financials",
+   "FITB": "Financials",
+   "FSLR": "Information Technology",
+   "FE": "Utilities",
+   "FISV": "Financials",
+   "F": "Consumer Discretionary",
+   "FTNT": "Information Technology",
+   "FTV": "Industrials",
+   "FOXA": "Communication Services",
+   "FOX": "Communication Services",
+   "BEN": "Financials",
+   "FCX": "Materials",
+   "GRMN": "Consumer Discretionary",
+   "IT": "Information Technology",
+   "GE": "Industrials",
+   "GEHC": "Health Care",
+   "GEV": "Industrials",
+   "GEN": "Information Technology",
+   "GNRC": "Industrials",
+   "GD": "Industrials",
+   "GIS": "Consumer Staples",
+   "GM": "Consumer Discretionary",
+   "GPC": "Consumer Discretionary",
+   "GILD": "Health Care",
+   "GPN": "Financials",
+   "GL": "Financials",
+   "GDDY": "Information Technology",
+   "GS": "Financials",
+   "HAL": "Energy",
+   "HIG": "Financials",
+   "HAS": "Consumer Discretionary",
+   "HCA": "Health Care",
+   "DOC": "Real Estate",
+   "HSIC": "Health Care",
+   "HSY": "Consumer Staples",
+   "HPE": "Information Technology",
+   "HLT": "Consumer Discretionary",
+   "HOLX": "Health Care",
+   "HD": "Consumer Discretionary",
+   "HON": "Industrials",
+   "HRL": "Consumer Staples",
+   "HST": "Real Estate",
+   "HWM": "Industrials",
+   "HPQ": "Information Technology",
+   "HUBB": "Industrials",
+   "HUM": "Health Care",
+   "HBAN": "Financials",
+   "HII": "Industrials",
+   "IBM": "Information Technology",
+   "IEX": "Industrials",
+   "IDXX": "Health Care",
+   "ITW": "Industrials",
+   "INCY": "Health Care",
+   "IR": "Industrials",
+   "PODD": "Health Care",
+   "INTC": "Information Technology",
+   "IBKR": "Financials",
+   "ICE": "Financials",
+   "IFF": "Materials",
+   "IP": "Materials",
+   "INTU": "Information Technology",
+   "ISRG": "Health Care",
+   "IVZ": "Financials",
+   "INVH": "Real Estate",
+   "IQV": "Health Care",
+   "IRM": "Real Estate",
+   "JBHT": "Industrials",
+   "JBL": "Information Technology",
+   "JKHY": "Financials",
+   "J": "Industrials",
+   "JNJ": "Health Care",
+   "JCI": "Industrials",
+   "JPM": "Financials",
+   "KVUE": "Consumer Staples",
+   "KDP": "Consumer Staples",
+   "KEY": "Financials",
+ "KEYS": "Information Technology",
275
+ "KMB": "Consumer Staples",
276
+ "KIM": "Real Estate",
277
+ "KMI": "Energy",
278
+ "KKR": "Financials",
279
+ "KLAC": "Information Technology",
280
+ "KHC": "Consumer Staples",
281
+ "KR": "Consumer Staples",
282
+ "LHX": "Industrials",
283
+ "LH": "Health Care",
284
+ "LRCX": "Information Technology",
285
+ "LW": "Consumer Staples",
286
+ "LVS": "Consumer Discretionary",
287
+ "LDOS": "Industrials",
288
+ "LEN": "Consumer Discretionary",
289
+ "LII": "Industrials",
290
+ "LLY": "Health Care",
291
+ "LIN": "Materials",
292
+ "LYV": "Communication Services",
293
+ "LMT": "Industrials",
294
+ "L": "Financials",
295
+ "LOW": "Consumer Discretionary",
296
+ "LULU": "Consumer Discretionary",
297
+ "LYB": "Materials",
298
+ "MTB": "Financials",
299
+ "MPC": "Energy",
300
+ "MAR": "Consumer Discretionary",
301
+ "MRSH": "Financials",
302
+ "MLM": "Materials",
303
+ "MAS": "Industrials",
304
+ "MA": "Financials",
305
+ "MTCH": "Communication Services",
306
+ "MKC": "Consumer Staples",
307
+ "MCD": "Consumer Discretionary",
308
+ "MCK": "Health Care",
309
+ "MDT": "Health Care",
310
+ "MRK": "Health Care",
311
+ "META": "Communication Services",
312
+ "MET": "Financials",
313
+ "MTD": "Health Care",
314
+ "MGM": "Consumer Discretionary",
315
+ "MCHP": "Information Technology",
316
+ "MU": "Information Technology",
317
+ "MSFT": "Information Technology",
318
+ "MAA": "Real Estate",
319
+ "MRNA": "Health Care",
320
+ "MOH": "Health Care",
321
+ "TAP": "Consumer Staples",
322
+ "MDLZ": "Consumer Staples",
323
+ "MPWR": "Information Technology",
324
+ "MNST": "Consumer Staples",
325
+ "MCO": "Financials",
326
+ "MS": "Financials",
327
+ "MOS": "Materials",
328
+ "MSI": "Information Technology",
329
+ "MSCI": "Financials",
330
+ "NDAQ": "Financials",
331
+ "NTAP": "Information Technology",
332
+ "NFLX": "Communication Services",
333
+ "NEM": "Materials",
334
+ "NWSA": "Communication Services",
335
+ "NWS": "Communication Services",
336
+ "NEE": "Utilities",
337
+ "NKE": "Consumer Discretionary",
338
+ "NI": "Utilities",
339
+ "NDSN": "Industrials",
340
+ "NSC": "Industrials",
341
+ "NTRS": "Financials",
342
+ "NOC": "Industrials",
343
+ "NCLH": "Consumer Discretionary",
344
+ "NRG": "Utilities",
345
+ "NUE": "Materials",
346
+ "NVDA": "Information Technology",
347
+ "NVR": "Consumer Discretionary",
348
+ "NXPI": "Information Technology",
349
+ "ORLY": "Consumer Discretionary",
350
+ "OXY": "Energy",
351
+ "ODFL": "Industrials",
352
+ "OMC": "Communication Services",
353
+ "ON": "Information Technology",
354
+ "OKE": "Energy",
355
+ "ORCL": "Information Technology",
356
+ "OTIS": "Industrials",
357
+ "PCAR": "Industrials",
358
+ "PKG": "Materials",
359
+ "PLTR": "Information Technology",
360
+ "PANW": "Information Technology",
361
+ "PSKY": "Communication Services",
362
+ "PH": "Industrials",
363
+ "PAYX": "Industrials",
364
+ "PAYC": "Industrials",
365
+ "PYPL": "Financials",
366
+ "PNR": "Industrials",
367
+ "PEP": "Consumer Staples",
368
+ "PFE": "Health Care",
369
+ "PCG": "Utilities",
370
+ "PM": "Consumer Staples",
371
+ "PSX": "Energy",
372
+ "PNW": "Utilities",
373
+ "PNC": "Financials",
374
+ "POOL": "Consumer Discretionary",
375
+ "PPG": "Materials",
376
+ "PPL": "Utilities",
377
+ "PFG": "Financials",
378
+ "PG": "Consumer Staples",
379
+ "PGR": "Financials",
380
+ "PLD": "Real Estate",
381
+ "PRU": "Financials",
382
+ "PEG": "Utilities",
383
+ "PTC": "Information Technology",
384
+ "PSA": "Real Estate",
385
+ "PHM": "Consumer Discretionary",
386
+ "PWR": "Industrials",
387
+ "QCOM": "Information Technology",
388
+ "DGX": "Health Care",
389
+ "Q": "Information Technology",
390
+ "RL": "Consumer Discretionary",
391
+ "RJF": "Financials",
392
+ "RTX": "Industrials",
393
+ "O": "Real Estate",
394
+ "REG": "Real Estate",
395
+ "REGN": "Health Care",
396
+ "RF": "Financials",
397
+ "RSG": "Industrials",
398
+ "RMD": "Health Care",
399
+ "RVTY": "Health Care",
400
+ "HOOD": "Financials",
401
+ "ROK": "Industrials",
402
+ "ROL": "Industrials",
403
+ "ROP": "Information Technology",
404
+ "ROST": "Consumer Discretionary",
405
+ "RCL": "Consumer Discretionary",
406
+ "SPGI": "Financials",
407
+ "CRM": "Information Technology",
408
+ "SNDK": "Information Technology",
409
+ "SBAC": "Real Estate",
410
+ "SLB": "Energy",
411
+ "STX": "Information Technology",
412
+ "SRE": "Utilities",
413
+ "NOW": "Information Technology",
414
+ "SHW": "Materials",
415
+ "SPG": "Real Estate",
416
+ "SWKS": "Information Technology",
417
+ "SJM": "Consumer Staples",
418
+ "SW": "Materials",
419
+ "SNA": "Industrials",
420
+ "SOLV": "Health Care",
421
+ "SO": "Utilities",
422
+ "LUV": "Industrials",
423
+ "SWK": "Industrials",
424
+ "SBUX": "Consumer Discretionary",
425
+ "STT": "Financials",
426
+ "STLD": "Materials",
427
+ "STE": "Health Care",
428
+ "SYK": "Health Care",
429
+ "SMCI": "Information Technology",
430
+ "SYF": "Financials",
431
+ "SNPS": "Information Technology",
432
+ "SYY": "Consumer Staples",
433
+ "TMUS": "Communication Services",
434
+ "TROW": "Financials",
435
+ "TTWO": "Communication Services",
436
+ "TPR": "Consumer Discretionary",
437
+ "TRGP": "Energy",
438
+ "TGT": "Consumer Staples",
439
+ "TEL": "Information Technology",
440
+ "TDY": "Information Technology",
441
+ "TER": "Information Technology",
442
+ "TSLA": "Consumer Discretionary",
443
+ "TXN": "Information Technology",
444
+ "TPL": "Energy",
445
+ "TXT": "Industrials",
446
+ "TMO": "Health Care",
447
+ "TJX": "Consumer Discretionary",
448
+ "TKO": "Communication Services",
449
+ "TTD": "Communication Services",
450
+ "TSCO": "Consumer Discretionary",
451
+ "TT": "Industrials",
452
+ "TDG": "Industrials",
453
+ "TRV": "Financials",
454
+ "TRMB": "Information Technology",
455
+ "TFC": "Financials",
456
+ "TYL": "Information Technology",
457
+ "TSN": "Consumer Staples",
458
+ "USB": "Financials",
459
+ "UBER": "Industrials",
460
+ "UDR": "Real Estate",
461
+ "ULTA": "Consumer Discretionary",
462
+ "UNP": "Industrials",
463
+ "UAL": "Industrials",
464
+ "UPS": "Industrials",
465
+ "URI": "Industrials",
466
+ "UNH": "Health Care",
467
+ "UHS": "Health Care",
468
+ "VLO": "Energy",
469
+ "VTR": "Real Estate",
470
+ "VLTO": "Industrials",
471
+ "VRSN": "Information Technology",
472
+ "VRSK": "Industrials",
473
+ "VZ": "Communication Services",
474
+ "VRTX": "Health Care",
475
+ "VTRS": "Health Care",
476
+ "VICI": "Real Estate",
477
+ "V": "Financials",
478
+ "VST": "Utilities",
479
+ "VMC": "Materials",
480
+ "WRB": "Financials",
481
+ "GWW": "Industrials",
482
+ "WAB": "Industrials",
483
+ "WMT": "Consumer Staples",
484
+ "DIS": "Communication Services",
485
+ "WBD": "Communication Services",
486
+ "WM": "Industrials",
487
+ "WAT": "Health Care",
488
+ "WEC": "Utilities",
489
+ "WFC": "Financials",
490
+ "WELL": "Real Estate",
491
+ "WST": "Health Care",
492
+ "WDC": "Information Technology",
493
+ "WY": "Real Estate",
494
+ "WSM": "Consumer Discretionary",
495
+ "WMB": "Energy",
496
+ "WTW": "Financials",
497
+ "WDAY": "Information Technology",
498
+ "WYNN": "Consumer Discretionary",
499
+ "XEL": "Utilities",
500
+ "XYL": "Industrials",
501
+ "YUM": "Consumer Discretionary",
502
+ "ZBRA": "Information Technology",
503
+ "ZBH": "Health Care",
504
+ "ZTS": "Health Care"
505
+ }
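The ticker-to-GICS-sector map above is what the optimizer consumes to turn a sector exclusion into a set of tickers to zero out. A minimal sketch of that lookup, on a small inline slice of the map (the helper name `excluded_tickers` is illustrative, not the repo's actual API):

```python
# A small slice of the ticker -> GICS sector map shown above
sector_map = {
    "DVN": "Energy",
    "FANG": "Energy",
    "DLR": "Real Estate",
    "DDOG": "Information Technology",
}

def excluded_tickers(sector_map, excluded_sectors):
    """Map a list of excluded sectors to the tickers that must be zero-weighted."""
    return {t for t, s in sector_map.items() if s in excluded_sectors}

print(sorted(excluded_tickers(sector_map, ["Energy"])))  # ['DVN', 'FANG']
```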
data/sp500_universe.json ADDED
@@ -0,0 +1,266 @@
+ [
+ {
+ "ticker": "AAPL",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "MSFT",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "NVDA",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "AVGO",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "ADBE",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "CRM",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "CSCO",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "AMD",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "INTC",
+ "sector": "Information Technology"
+ },
+ {
+ "ticker": "AMZN",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "TSLA",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "HD",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "MCD",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "NKE",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "LOW",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "SBUX",
+ "sector": "Consumer Discretionary"
+ },
+ {
+ "ticker": "GOOGL",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "GOOG",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "META",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "NFLX",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "DIS",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "CMCSA",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "VZ",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "T",
+ "sector": "Communication Services"
+ },
+ {
+ "ticker": "BRK-B",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "JPM",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "V",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "MA",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "BAC",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "WFC",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "MS",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "GS",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "BLK",
+ "sector": "Financials"
+ },
+ {
+ "ticker": "UNH",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "LLY",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "JNJ",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "MRK",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "ABBV",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "PFE",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "AMGN",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "TMO",
+ "sector": "Health Care"
+ },
+ {
+ "ticker": "PG",
+ "sector": "Consumer Staples"
+ },
+ {
+ "ticker": "COST",
+ "sector": "Consumer Staples"
+ },
+ {
+ "ticker": "PEP",
+ "sector": "Consumer Staples"
+ },
+ {
+ "ticker": "KO",
+ "sector": "Consumer Staples"
+ },
+ {
+ "ticker": "WMT",
+ "sector": "Consumer Staples"
+ },
+ {
+ "ticker": "PM",
+ "sector": "Consumer Staples"
+ },
+ {
+ "ticker": "XOM",
+ "sector": "Energy"
+ },
+ {
+ "ticker": "CVX",
+ "sector": "Energy"
+ },
+ {
+ "ticker": "COP",
+ "sector": "Energy"
+ },
+ {
+ "ticker": "SLB",
+ "sector": "Energy"
+ },
+ {
+ "ticker": "EOG",
+ "sector": "Energy"
+ },
+ {
+ "ticker": "MPC",
+ "sector": "Energy"
+ },
+ {
+ "ticker": "LIN",
+ "sector": "Materials"
+ },
+ {
+ "ticker": "SHW",
+ "sector": "Materials"
+ },
+ {
+ "ticker": "FCX",
+ "sector": "Materials"
+ },
+ {
+ "ticker": "CAT",
+ "sector": "Industrials"
+ },
+ {
+ "ticker": "UNP",
+ "sector": "Industrials"
+ },
+ {
+ "ticker": "GE",
+ "sector": "Industrials"
+ },
+ {
+ "ticker": "HON",
+ "sector": "Industrials"
+ },
+ {
+ "ticker": "NEE",
+ "sector": "Utilities"
+ },
+ {
+ "ticker": "DUK",
+ "sector": "Utilities"
+ },
+ {
+ "ticker": "SO",
+ "sector": "Utilities"
+ },
+ {
+ "ticker": "PLD",
+ "sector": "Real Estate"
+ },
+ {
+ "ticker": "AMT",
+ "sector": "Real Estate"
+ },
+ {
+ "ticker": "EQIX",
+ "sector": "Real Estate"
+ }
+ ]
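`data/sp500_universe.json` stores the universe as a flat list of `{"ticker", "sector"}` records; folding it into a dict yields the same ticker-to-sector shape the rest of the pipeline uses. A small sketch (the inline JSON string stands in for reading the actual file):

```python
import json

# Stand-in for json.load(open("data/sp500_universe.json"))
raw = '[{"ticker": "AAPL", "sector": "Information Technology"}, {"ticker": "XOM", "sector": "Energy"}]'
universe = json.loads(raw)

# Collapse the record list into a ticker -> sector lookup
sector_map = {row["ticker"]: row["sector"] for row in universe}
print(sector_map["XOM"])  # Energy
```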
debug_optimizer_tech.py ADDED
@@ -0,0 +1,82 @@
+ import pandas as pd
+ import numpy as np
+ import logging
+ from data.optimizer import PortfolioOptimizer
+ from core.schema import OptimizationResult
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ def test_optimizer_exclusion():
+     print("\n--- STARTING OPTIMIZER DEBUG TEST ---\n")
+
+     # 1. Mock data setup (mini S&P 500)
+     tickers = ["AAPL", "MSFT", "GOOGL", "XOM", "CVX", "JPM", "BAC", "JNJ", "PFE", "NEE"]
+     n = len(tickers)
+
+     # Sector map (Tech, Energy, Financials, Health Care, Utilities)
+     sector_map = {
+         "AAPL": "Information Technology",
+         "MSFT": "Information Technology",
+         "GOOGL": "Communication Services",  # Often grouped with Tech
+         "XOM": "Energy",
+         "CVX": "Energy",
+         "JPM": "Financials",
+         "BAC": "Financials",
+         "JNJ": "Health Care",
+         "PFE": "Health Care",
+         "NEE": "Utilities"
+     }
+
+     # Mock covariance: diagonal (uncorrelated assets) with low variance, for simplicity
+     np.random.seed(42)
+     cov_data = np.eye(n) * 0.0004
+     cov_df = pd.DataFrame(cov_data, index=tickers, columns=tickers)
+
+     # Benchmark weights (equal-weight benchmark for this test)
+     bench_weights = pd.Series(np.ones(n) / n, index=tickers)
+
+     # 2. Instantiate optimizer
+     opt = PortfolioOptimizer()
+
+     # 3. Test cases
+
+     # Case A: no exclusions
+     print("\n[Case A] No Exclusions")
+     res_a = opt.optimize_portfolio(cov_df, tickers, bench_weights, sector_map, [])
+     print(f"Status: {res_a.status}, TE: {res_a.tracking_error:.4f}")
+
+     # Case B: exclude Energy (2 stocks)
+     print("\n[Case B] Exclude Energy")
+     res_b = opt.optimize_portfolio(cov_df, tickers, bench_weights, sector_map, ["Energy"])
+     print(f"Status: {res_b.status}, TE: {res_b.tracking_error:.4f}")
+     print(f"Weights: {res_b.weights}")
+     assert "XOM" not in res_b.weights
+     assert "CVX" not in res_b.weights
+
+     # Case C: exclude the Tech heavyweights (AAPL, MSFT) -- this is the case
+     # that tends to break tight constraints.
+     print("\n[Case C] Exclude Technology (The Failure Case)")
+     try:
+         # sector_map uses the GICS name "Information Technology", but the
+         # frontend sends the short alias "Technology". The optimizer's
+         # exclusion check is expected to match both forms, so pass the alias
+         # here to exercise that path.
+         res_c = opt.optimize_portfolio(cov_df, tickers, bench_weights, sector_map, ["Technology"])
+         print(f"Status: {res_c.status}, TE: {res_c.tracking_error:.4f}")
+         print(f"Weights: {res_c.weights}")
+
+         # Verification
+         if "AAPL" in res_c.weights or "MSFT" in res_c.weights:
+             print("❌ FAILURE: Tech stocks still in portfolio!")
+         else:
+             print("✅ SUCCESS: Tech stocks removed!")
+
+     except Exception as e:
+         print(f"❌ CRASHED: {e}")
+
+ if __name__ == "__main__":
+     test_optimizer_exclusion()
debug_yf.py ADDED
@@ -0,0 +1,9 @@
+ import yfinance as yf
+ tickers = ["AAPL", "MSFT"]
+ data = yf.download(tickers, start="2024-01-01", progress=False)
+ print("Columns:", data.columns)
+ try:
+     print(data['Adj Close'].head())
+ except Exception as e:
+     print("Error accessing Adj Close:", e)
+ print(data.head())
main.py ADDED
@@ -0,0 +1,147 @@
+ import logging
+ import pandas as pd
+ from typing import Dict, Any
+
+ from config import settings
+ from data.data_manager import MarketDataEngine
+ from analytics.risk_model import RiskModel
+ from data.optimizer import PortfolioOptimizer
+ from analytics.tax_module import TaxEngine
+ from analytics.attribution import AttributionEngine
+ from ai.ai_reporter import AIReporter
+ from core.schema import OptimizationRequest, TickerData
+
+ # Set up logging
+ logging.basicConfig(level=settings.LOG_LEVEL, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+ logger = logging.getLogger("QuantScaleAI")
+
+ class QuantScaleSystem:
+     def __init__(self):
+         self.data_engine = MarketDataEngine()
+         self.risk_model = RiskModel()
+         self.optimizer = PortfolioOptimizer()
+         self.tax_engine = TaxEngine()
+         self.attribution_engine = AttributionEngine()
+         self.ai_reporter = AIReporter()
+
+     def run_pipeline(self, request: OptimizationRequest):
+         logger.info(f"Starting pipeline for Client {request.client_id}...")
+
+         # 1. Fetch universe (S&P 500)
+         tickers = self.data_engine.fetch_sp500_tickers()
+         # For faster demo runs, uncomment to limit the universe:
+         # tickers = tickers[:50]
+
+         # 2. Get market data (last ~2 years, for the covariance estimate)
+         data = self.data_engine.fetch_market_data(tickers, start_date="2023-01-01")
+         if data.empty:
+             logger.error("No market data available. Aborting.")
+             return None
+
+         returns = data.pct_change().dropna()
+
+         # 3. Compute risk model, aligned to the tickers that survived fetching
+         valid_tickers = returns.columns.tolist()
+         cov_matrix = self.risk_model.compute_covariance_matrix(returns)
+
+         # 4. Build benchmark weights (S&P 500 proxy).
+         # Live index weights require expensive data, so approximate the
+         # cap-weighted S&P 500: hard-code the mega-cap weights and spread the
+         # remainder equally. This keeps tracking error realistic when testing
+         # sector exclusions.
+         n_assets = len(valid_tickers)
+         benchmark_weights = pd.Series(0.0, index=valid_tickers)
+
+         # Approximate mega-cap weights (rough early-2026 snapshot; total
+         # market cap is heavily skewed toward the Magnificent 7)
+         top_weights = {
+             "MSFT": 0.070, "AAPL": 0.065, "NVDA": 0.060,
+             "AMZN": 0.035, "GOOGL": 0.020, "GOOG": 0.020,
+             "META": 0.020, "TSLA": 0.015, "BRK-B": 0.015,
+             "LLY": 0.012, "AVGO": 0.012, "JPM": 0.010
+         }
+
+         current_total = 0.0
+         for t, w in top_weights.items():
+             if t in valid_tickers:
+                 benchmark_weights[t] = w
+                 current_total += w
+
+         # Distribute the remaining weight equally among the rest
+         remaining_weight = 1.0 - current_total
+         remaining_count = n_assets - len([t for t in top_weights if t in valid_tickers])
+
+         if remaining_count > 0:
+             avg_rest = remaining_weight / remaining_count
+             for t in valid_tickers:
+                 if benchmark_weights[t] == 0.0:
+                     benchmark_weights[t] = avg_rest
+
+         # Normalize, just in case
+         benchmark_weights = benchmark_weights / benchmark_weights.sum()
+
+         # 5. Optimize portfolio
+         sector_map = self.data_engine.get_sector_map()
+
+         opt_result = self.optimizer.optimize_portfolio(
+             covariance_matrix=cov_matrix,
+             tickers=valid_tickers,
+             benchmark_weights=benchmark_weights,
+             sector_map=sector_map,
+             excluded_sectors=request.excluded_sectors,
+             excluded_tickers=request.excluded_tickers
+         )
+
+         if opt_result.status != "optimal":
+             logger.warning("Optimization might be suboptimal.")
+
+         # 6. Attribution analysis on simulated performance:
+         # compound the trailing month (~21 trading days) of returns
+         last_month = returns.iloc[-21:]
+         asset_period_return = (1 + last_month).prod() - 1
+
+         attribution = self.attribution_engine.generate_attribution_report(
+             portfolio_weights=opt_result.weights,
+             benchmark_weights=benchmark_weights.to_dict(),
+             asset_returns=asset_period_return,
+             sector_map=sector_map
+         )
+
+         # 7. AI reporting: combine exclusions for the narrative
+         exclusions_list = request.excluded_sectors + request.excluded_tickers
+         excluded = ", ".join(exclusions_list) if exclusions_list else "None"
+
+         commentary = self.ai_reporter.generate_report(attribution, excluded)
+
+         return {
+             "optimization": opt_result,
+             "attribution": attribution,
+             "commentary": commentary
+         }
+
+ if __name__ == "__main__":
+     # Test run with a typical values-based (ESG) constraint
+     req = OptimizationRequest(
+         client_id="TEST_001",
+         excluded_sectors=["Energy"]
+     )
+     system = QuantScaleSystem()
+     result = system.run_pipeline(req)
+
+     if result:
+         print("\n--- AI COMMENTARY ---\n")
+         print(result['commentary'])
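The attribution step in `run_pipeline` compounds the trailing window of daily returns into one period return per asset via `(1 + last_month).prod() - 1`. On toy data the arithmetic looks like:

```python
import pandas as pd

# Daily returns for two assets over 5 days (toy data)
returns = pd.DataFrame({
    "AAPL": [0.01, -0.005, 0.002, 0.0, 0.003],
    "XOM":  [0.0, 0.01, -0.002, 0.001, 0.0],
})

# Compound the trailing window (3 days here; main.py uses ~21) into a
# single period return per asset
window = returns.iloc[-3:]
period_return = (1 + window).prod() - 1
print(round(period_return["AAPL"], 6))  # 0.005006
```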
qes_scale_optimizer.ipynb ADDED
@@ -0,0 +1,93 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# QuantScale AI: Automated Direct Indexing & Attribution\n",
+ "## Goldman Sachs Quant Prep Project\n",
+ "\n",
+ "This notebook demonstrates the end-to-end workflow:\n",
+ "1. **Data Ingestion**: Scraping S&P 500 & fetching market data.\n",
+ "2. **Risk Modeling**: Computing Ledoit-Wolf Shrinkage Covariance.\n",
+ "3. **Optimization**: Minimizing Tracking Error with Sector Exclusion Constraints.\n",
+ "4. **AI Reporting**: Using Hugging Face to generate professional commentary."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install -r requirements.txt"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from main import QuantScaleSystem\n",
+ "from core.schema import OptimizationRequest\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Initialize System\n",
+ "system = QuantScaleSystem()\n",
+ "\n",
+ "# Test Case: Optimization with Energy Exclusion\n",
+ "req = OptimizationRequest(client_id=\"COLAB_USER\", excluded_sectors=[\"Energy\"])\n",
+ "result = system.run_pipeline(req)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Visualization of Weights\n",
+ "if result:\n",
+ "    weights = result['optimization'].weights\n",
+ "    plt.figure(figsize=(12, 6))\n",
+ "    plt.bar(range(len(weights)), list(weights.values()), align='center')\n",
+ "    plt.title('Optimized Portfolio Weights (Energy Excluded)')\n",
+ "    plt.xlabel('Assets')\n",
+ "    plt.ylabel('Weight')\n",
+ "    plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# AI Commentary\n",
+ "print(result['commentary'])"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ cvxpy>=1.4.1
+ yfinance>=0.2.33
+ google-generativeai>=0.3.2
+ pandas>=2.1.4
+ numpy>=1.26.3
+ scikit-learn>=1.3.2
+ fastapi>=0.109.0
+ uvicorn>=0.27.0
+ pydantic>=2.5.3
+ python-dotenv>=1.0.0
+ matplotlib>=3.8.2
+ scipy>=1.11.4
+ huggingface_hub>=0.20.0