Space: redradios/aurora-brain


redradios committed on Apr 9

Commit

ff8c8cb

1 Parent(s): 110e08f

Aurora Brain v1.0 — Regime Detector + Feature Engine + API

Files changed (8)

Dockerfile +23 -0
README.md +75 -11
app.py +214 -0
download_data.py +436 -0
feature_engine.py +511 -0
regime_detector.py +286 -0
regime_labeler.py +217 -0
requirements.txt +33 -0

Dockerfile ADDED

	@@ -0,0 +1,23 @@

+FROM python:3.11-slim
+WORKDIR /app
+# System dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    gcc g++ && rm -rf /var/lib/apt/lists/*
+# Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Application code
+COPY . .
+# Create data and model directories
+RUN mkdir -p data models
+# HuggingFace Spaces port
+EXPOSE 7860
+# Start the API
+CMD ["python", "app.py"]

README.md CHANGED

@@ -1,11 +1,75 @@
----
-title: Aurora Brain
-emoji: 👀
-colorFrom: blue
-colorTo: yellow
-sdk: docker
-pinned: false
-license: other
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🧠 Aurora Brain — ML Trading Intelligence
+Machine learning system for market regime detection and trading signal prediction.
+Part of the **Aurora Trader Bot** project (Cerebro Guardián).
+## Architecture
+```
+LAYER 1: Regime Detector (XGBoost)
+  → Classifies: TRENDING / RANGING / VOLATILE / BREAKOUT
+LAYER 2: Specialist Models (TFT + XGBoost)
+  → One model per regime, trained only on data from that regime
+LAYER 3: Feature Engineering (200+ features)
+  → Microstructure + Momentum + Volume + Cross-asset + On-chain + Sentiment
+```
+## Quick start
+```bash
+# 1. Download historical data (BTC/ETH/SOL, 5 years, 4H)
+python download_data.py --symbols BTCUSDT ETHUSDT SOLUSDT --days 1825
+# 2. Generate features (200+ per candle)
+python feature_engine.py --all
+# 3. Label regimes
+python regime_labeler.py --all
+# 4. Train the regime detector
+python regime_detector.py --symbol BTCUSDT
+# 5. Start the API
+python app.py
+```
+## API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/health` | GET | Service status |
+| `/regime` | POST | Current market regime |
+| `/predict` | POST | Full prediction (regime + signal; signal is `null` until Phase 3+) |
+| `/models` | GET | Info on loaded models |
+## Full pipeline
+```
+Binance API → download_data.py → data/*.parquet
+                                      ↓
+                              feature_engine.py → data/features_*.parquet
+                                      ↓
+                              regime_labeler.py → data/labeled_*.parquet
+                                      ↓
+                              regime_detector.py → models/regime_model.pkl
+                                      ↓
+                                    app.py → API REST (/regime, /predict)
+                                      ↓
+                              Raspberry Pi 5 (aurora_brain.py) → Strategy Runner
+```
+## Stack
+- **Feature Engine:** pandas + pandas_ta + numpy
+- **Regime Detector:** XGBoost (scikit-learn)
+- **Specialist Models:** TFT (Darts/PyTorch), Phase 3+
+- **API:** FastAPI + uvicorn
+- **Data:** Binance API + yfinance + CoinGecko
+- **Infrastructure:** HuggingFace Space (GPU for training, CPU for inference)
+## Author
+Eduardo (Mendoza, Argentina) + Claude Opus (Anthropic)
+Aurora Trader Bot project, v5.3
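
The endpoint table in the README can be exercised with a small client. What follows is a minimal sketch, not part of the commit: `BASE_URL`, `build_predict_request`, `summarize`, and `fetch_prediction` are hypothetical names, and the base URL assumes a local run of `app.py` on port 7860. The response fields it reads (`symbol`, `timeframe`, `regime`, `recommendation`) are the ones `/predict` builds in app.py.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: local run; replace with your Space URL


def build_predict_request(symbol="BTCUSDT", timeframe="4h", base_url=BASE_URL):
    """Build (but do not send) a POST request for /predict."""
    body = json.dumps({"symbol": symbol, "timeframe": timeframe}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(payload):
    """Reduce a /predict response dict to a one-line summary."""
    regime = payload.get("regime") or {}  # /predict returns null when the regime is unavailable
    name = regime.get("regime", "UNKNOWN")
    confidence = regime.get("confidence", 0.0)
    return (f"{payload['symbol']} {payload['timeframe']}: "
            f"{name} ({confidence:.0%}) -> {payload['recommendation']}")


def fetch_prediction(symbol="BTCUSDT", timeframe="4h", timeout=30):
    """Send the request and return the decoded JSON body."""
    req = build_predict_request(symbol, timeframe)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `summarize(fetch_prediction())` on a schedule would mirror the 15-minute polling that the app.py header attributes to the Pi.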

app.py ADDED

	@@ -0,0 +1,214 @@

+"""
+╔══════════════════════════════════════════════════════════════╗
+║  AURORA BRAIN — API Server (HuggingFace Space)               ║
+║                                                              ║
+║  FastAPI server that exposes the Brain's predictions.        ║
+║  Endpoints:                                                  ║
+║    GET  /health          — Service status                    ║
+║    POST /regime          — Current market regime             ║
+║    POST /predict         — Full prediction                   ║
+║    GET  /models          — Info on loaded models             ║
+║                                                              ║
+║  The Pi calls these endpoints every 15 minutes.              ║
+╚══════════════════════════════════════════════════════════════╝
+"""
+import os
+import json
+import logging
+import pickle
+import time
+from datetime import datetime, timezone
+from typing import Optional
+import numpy as np
+import pandas as pd
+from fastapi import FastAPI, HTTPException
+from pydantic import BaseModel
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger("AuroraBrain.API")
+app = FastAPI(
+    title="Aurora Brain API",
+    description="ML Trading Intelligence — Régimen + Predicciones",
+    version="1.0.0",
+)
+MODELS_DIR = os.path.join(os.path.dirname(__file__), "models")
+DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
+# Models loaded in memory
+_regime_model = None
+_regime_metadata = None
+_startup_time = None
+# ─────────────────────────────────────────────
+#  STARTUP
+# ─────────────────────────────────────────────
+@app.on_event("startup")
+async def load_models():
+    global _regime_model, _regime_metadata, _startup_time
+    _startup_time = datetime.now(timezone.utc)
+    # Load the regime model
+    model_path = os.path.join(MODELS_DIR, "regime_model.pkl")
+    meta_path = os.path.join(MODELS_DIR, "regime_metadata.json")
+    if os.path.exists(model_path):
+        with open(model_path, "rb") as f:
+            _regime_model = pickle.load(f)
+        logger.info("✅ Regime model loaded")
+    if os.path.exists(meta_path):
+        with open(meta_path, "r") as f:
+            _regime_metadata = json.load(f)
+        logger.info("✅ Regime metadata loaded (accuracy: %.1f%%)",
+                     _regime_metadata.get("accuracy", 0))
+    logger.info("🧠 Aurora Brain API ready")
+# ─────────────────────────────────────────────
+#  SCHEMAS
+# ─────────────────────────────────────────────
+class RegimeRequest(BaseModel):
+    symbol: str = "BTCUSDT"
+    timeframe: str = "4h"
+class PredictRequest(BaseModel):
+    symbol: str = "BTCUSDT"
+    timeframe: str = "4h"
+class RegimeResponse(BaseModel):
+    regime: str
+    regime_id: int
+    confidence: float
+    probabilities: dict
+    model_accuracy: float
+    timestamp: str
+class HealthResponse(BaseModel):
+    status: str
+    uptime_seconds: float
+    models_loaded: dict
+    version: str
+# ─────────────────────────────────────────────
+#  ENDPOINTS
+# ─────────────────────────────────────────────
+@app.get("/health", response_model=HealthResponse)
+async def health():
+    uptime = (datetime.now(timezone.utc) - _startup_time).total_seconds() if _startup_time else 0
+    return HealthResponse(
+        status="ok",
+        uptime_seconds=round(uptime, 0),
+        models_loaded={
+            "regime_detector": _regime_model is not None,
+            "regime_accuracy": _regime_metadata.get("accuracy", 0) if _regime_metadata else 0,
+        },
+        version="1.0.0",
+    )
+@app.post("/regime", response_model=RegimeResponse)
+async def get_regime(req: RegimeRequest):
+    """Return the current market regime."""
+    if _regime_model is None or _regime_metadata is None:
+        raise HTTPException(status_code=503, detail="Regime model not loaded")
+    # Load the most recent features
+    features_path = os.path.join(DATA_DIR, f"features_{req.symbol}_{req.timeframe}.parquet")
+    if not os.path.exists(features_path):
+        raise HTTPException(status_code=404,
+                            detail=f"Features not found for {req.symbol} {req.timeframe}")
+    df = pd.read_parquet(features_path)
+    feature_cols = _regime_metadata["feature_cols"]
+    # Last row; missing values are filled with each column's median
+    last_row = df[feature_cols].iloc[-1:].fillna(df[feature_cols].median())
+    # Predict class probabilities
+    proba = _regime_model.predict_proba(last_row)[0]
+    regime_id = int(np.argmax(proba))
+    REGIME_NAMES = {0: "TRENDING", 1: "RANGING", 2: "VOLATILE", 3: "BREAKOUT"}
+    return RegimeResponse(
+        regime=REGIME_NAMES[regime_id],
+        regime_id=regime_id,
+        confidence=round(float(proba[regime_id]), 4),
+        probabilities={REGIME_NAMES[i]: round(float(p), 4) for i, p in enumerate(proba)},
+        model_accuracy=_regime_metadata.get("accuracy", 0),
+        timestamp=datetime.now(timezone.utc).isoformat(),
+    )
+@app.post("/predict")
+async def predict(req: PredictRequest):
+    """
+    Full prediction: regime + specialist-model signal.
+    For now it returns only the regime. Specialist models are added in Phases 3-6.
+    """
+    # Get the regime
+    regime_req = RegimeRequest(symbol=req.symbol, timeframe=req.timeframe)
+    try:
+        regime = await get_regime(regime_req)
+    except HTTPException:
+        regime = None
+    result = {
+        "symbol": req.symbol,
+        "timeframe": req.timeframe,
+        "timestamp": datetime.now(timezone.utc).isoformat(),
+        "regime": regime.dict() if regime else None,
+        "signal": None,  # Phase 3+: one specialist model per regime
+        "recommendation": "HOLD",  # Default until the models are ready
+    }
+    # Regime-based recommendation logic
+    if regime:
+        if regime.regime == "VOLATILE" and regime.confidence > 0.7:
+            result["recommendation"] = "SHIELD_MAX"
+            result["action"] = "Activate Guardian Shield maximum mode, do NOT buy"
+        elif regime.regime == "TRENDING" and regime.confidence > 0.7:
+            result["recommendation"] = "SMC_ACTIVE"
+            result["action"] = "Enable SMC strategies, market is trending"
+        elif regime.regime == "RANGING" and regime.confidence > 0.7:
+            result["recommendation"] = "GRID_SUGGEST"
+            result["action"] = "Consider grid trading, market is ranging"
+        elif regime.regime == "BREAKOUT" and regime.confidence > 0.7:
+            result["recommendation"] = "ALERT"
+            result["action"] = "Breakout detected, evaluate a quick entry"
+    return result
+@app.get("/models")
+async def models_info():
+    """Info on the loaded models."""
+    info = {
+        "regime_detector": None,
+    }
+    if _regime_metadata:
+        info["regime_detector"] = {
+            "trained_at": _regime_metadata.get("trained_at"),
+            "accuracy": _regime_metadata.get("accuracy"),
+            "cv_accuracy": _regime_metadata.get("cv_accuracy"),
+            "n_features": _regime_metadata.get("n_features"),
+            "top_features": _regime_metadata.get("top_features", [])[:10],
+        }
+    return info
+# ─────────────────────────────────────────────
+#  MAIN (local development; also run by the Dockerfile CMD)
+# ─────────────────────────────────────────────
+if __name__ == "__main__":
+    import uvicorn
+    port = int(os.environ.get("PORT", 7860))
+    uvicorn.run(app, host="0.0.0.0", port=port)
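
The `if/elif` chain in `/predict` maps each regime to one recommendation when confidence is strictly above 0.7, and `/regime` picks the class with the highest probability. A table-driven sketch of the same two rules is shown here for reference; `pick_regime`, `recommend`, and `RECOMMENDATIONS` are hypothetical names, not part of app.py, and the regime IDs are the ones hard-coded in `REGIME_NAMES` above.

```python
REGIME_NAMES = {0: "TRENDING", 1: "RANGING", 2: "VOLATILE", 3: "BREAKOUT"}

# Regime -> recommendation, the same pairs as the if/elif chain in /predict
RECOMMENDATIONS = {
    "VOLATILE": "SHIELD_MAX",
    "TRENDING": "SMC_ACTIVE",
    "RANGING": "GRID_SUGGEST",
    "BREAKOUT": "ALERT",
}


def pick_regime(proba):
    """Return (regime_name, confidence) for the most probable class."""
    regime_id = max(range(len(proba)), key=lambda i: proba[i])
    return REGIME_NAMES[regime_id], float(proba[regime_id])


def recommend(regime, confidence, threshold=0.7):
    """Return HOLD unless the regime is known and confidence exceeds the threshold."""
    if confidence > threshold:
        return RECOMMENDATIONS.get(regime, "HOLD")
    return "HOLD"
```

Keeping the mapping in one dictionary makes it easier to add the Phase 3+ specialist signals later without growing the conditional.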

download_data.py ADDED

	@@ -0,0 +1,436 @@

+"""
+╔══════════════════════════════════════════════════════════════╗
+║  AURORA BRAIN — Download Data                                ║
+║                                                              ║
+║  Downloads historical Binance klines for ML.                 ║
+║  Supports multiple pairs and timeframes.                     ║
+║  Saves to Parquet for efficiency.                            ║
+║                                                              ║
+║  Usage:                                                      ║
+║    python download_data.py                                   ║
+║    python download_data.py --symbols BTCUSDT ETHUSDT         ║
+║    python download_data.py --days 1800 --timeframe 1h        ║
+╚══════════════════════════════════════════════════════════════╝
+"""
+import os
+import time
+import argparse
+import logging
+from datetime import datetime, timedelta, timezone
+import pandas as pd
+import requests
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger("AuroraBrain.Data")
+# ─────────────────────────────────────────────
+#  CONFIGURATION
+# ─────────────────────────────────────────────
+BINANCE_BASE = "https://api.binance.com"
+KLINES_ENDPOINT = "/api/v3/klines"
+DEFAULT_SYMBOLS = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
+DEFAULT_TIMEFRAME = "4h"
+DEFAULT_DAYS = 1825  # ~5 years
+MAX_CANDLES_PER_REQUEST = 1000
+DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
+# Timeframe to milliseconds per candle
+TF_MS = {
+    "1m": 60_000, "3m": 180_000, "5m": 300_000, "15m": 900_000,
+    "30m": 1_800_000, "1h": 3_600_000, "2h": 7_200_000,
+    "4h": 14_400_000, "6h": 21_600_000, "8h": 28_800_000,
+    "12h": 43_200_000, "1d": 86_400_000, "1w": 604_800_000,
+}
+# ─────────────────────────────────────────────
+#  KLINES DOWNLOAD
+# ─────────────────────────────────────────────
+def download_klines(symbol: str, timeframe: str, days: int) -> pd.DataFrame:
+    """
+    Download historical klines from Binance with automatic pagination.
+    Returns a DataFrame with OHLCV columns plus extras.
+    """
+    tf_ms = TF_MS.get(timeframe, 14_400_000)
+    end_ts = int(datetime.now(timezone.utc).timestamp() * 1000)
+    start_ts = end_ts - (days * 86_400_000)
+    all_candles = []
+    current_start = start_ts
+    page = 0
+    failures = 0  # consecutive failed requests
+    logger.info("📥 Downloading %s %s: %d days (%s → %s)",
+                symbol, timeframe, days,
+                datetime.fromtimestamp(start_ts / 1000, tz=timezone.utc).strftime("%Y-%m-%d"),
+                datetime.fromtimestamp(end_ts / 1000, tz=timezone.utc).strftime("%Y-%m-%d"))
+    while current_start < end_ts:
+        params = {
+            "symbol": symbol,
+            "interval": timeframe,
+            "startTime": current_start,
+            "endTime": end_ts,
+            "limit": MAX_CANDLES_PER_REQUEST,
+        }
+        try:
+            resp = requests.get(f"{BINANCE_BASE}{KLINES_ENDPOINT}", params=params, timeout=30)
+            resp.raise_for_status()
+            data = resp.json()
+            failures = 0
+        except Exception as e:
+            failures += 1
+            logger.error("❌ Error downloading %s page %d: %s", symbol, page, e)
+            if failures >= 5:
+                # Give up instead of retrying a permanent error (bad symbol, blocked IP) forever
+                break
+            time.sleep(5)
+            continue
+        if not data:
+            break
+        all_candles.extend(data)
+        page += 1
+        # Advance to the next batch
+        last_ts = data[-1][0]
+        current_start = last_ts + tf_ms
+        if page % 10 == 0:
+            logger.info("  📊 %s: %d candles downloaded...", symbol, len(all_candles))
+        # Short pause to stay well under Binance's rate limits
+        time.sleep(0.1)
+    if not all_candles:
+        logger.warning("⚠️ No data received for %s", symbol)
+        return pd.DataFrame()
+    # Parse into a DataFrame
+    df = pd.DataFrame(all_candles, columns=[
+        "open_time", "open", "high", "low", "close", "volume",
+        "close_time", "quote_volume", "trades", "taker_buy_base",
+        "taker_buy_quote", "ignore",
+    ])
+    # Types
+    for col in ["open", "high", "low", "close", "volume", "quote_volume",
+                "taker_buy_base", "taker_buy_quote"]:
+        df[col] = pd.to_numeric(df[col], errors="coerce")
+    df["trades"] = df["trades"].astype(int)
+    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms", utc=True)
+    df["close_time"] = pd.to_datetime(df["close_time"], unit="ms", utc=True)
+    # Time index
+    df.set_index("open_time", inplace=True)
+    df.drop(columns=["ignore"], inplace=True)
+    # Remove duplicates
+    df = df[~df.index.duplicated(keep="first")]
+    df.sort_index(inplace=True)
+    # Add useful derived columns
+    df["taker_buy_ratio"] = df["taker_buy_quote"] / df["quote_volume"].replace(0, 1)
+    df["symbol"] = symbol
+    logger.info("✅ %s: %d candles downloaded (%s → %s)",
+                symbol, len(df),
+                df.index[0].strftime("%Y-%m-%d"),
+                df.index[-1].strftime("%Y-%m-%d"))
+    return df
+def download_funding_rate(symbol: str, days: int) -> pd.DataFrame:
+    """Download historical funding rates from Binance Futures."""
+    end_ts = int(datetime.now(timezone.utc).timestamp() * 1000)
+    start_ts = end_ts - (days * 86_400_000)
+    all_data = []
+    current_start = start_ts
+    failures = 0  # consecutive failed requests
+    logger.info("📥 Downloading funding rate %s...", symbol)
+    while current_start < end_ts:
+        params = {
+            "symbol": symbol,
+            "startTime": current_start,
+            "endTime": end_ts,
+            "limit": 1000,
+        }
+        try:
+            resp = requests.get(
+                "https://fapi.binance.com/fapi/v1/fundingRate",
+                params=params, timeout=30,
+            )
+            resp.raise_for_status()
+            data = resp.json()
+            failures = 0
+        except Exception as e:
+            failures += 1
+            logger.error("❌ Funding rate error: %s", e)
+            if failures >= 5:
+                break  # stop retrying a persistent error
+            time.sleep(2)
+            continue
+        if not data:
+            break
+        all_data.extend(data)
+        current_start = data[-1]["fundingTime"] + 1
+        time.sleep(0.2)
+    if not all_data:
+        return pd.DataFrame()
+    df = pd.DataFrame(all_data)
+    df["fundingRate"] = pd.to_numeric(df["fundingRate"], errors="coerce")
+    df["fundingTime"] = pd.to_datetime(df["fundingTime"], unit="ms", utc=True)
+    df.set_index("fundingTime", inplace=True)
+    df = df[~df.index.duplicated(keep="first")]
+    df.sort_index(inplace=True)
+    logger.info("✅ Funding rate %s: %d records", symbol, len(df))
+    return df[["fundingRate"]]
+def download_long_short_ratio(symbol: str, days: int) -> pd.DataFrame:
+    """Download the long/short ratio from Binance Futures."""
+    end_ts = int(datetime.now(timezone.utc).timestamp() * 1000)
+    start_ts = end_ts - (days * 86_400_000)
+    all_data = []
+    current_start = start_ts
+    failures = 0  # consecutive failed requests
+    logger.info("📥 Downloading L/S ratio %s...", symbol)
+    while current_start < end_ts:
+        params = {
+            "symbol": symbol,
+            "period": "4h",
+            "startTime": current_start,
+            "endTime": end_ts,
+            "limit": 500,
+        }
+        try:
+            resp = requests.get(
+                "https://fapi.binance.com/futures/data/globalLongShortAccountRatio",
+                params=params, timeout=30,
+            )
+            resp.raise_for_status()
+            data = resp.json()
+            failures = 0
+        except Exception as e:
+            failures += 1
+            logger.error("❌ L/S ratio error: %s", e)
+            if failures >= 5:
+                # Stop retrying a persistent error. Binance documents only about the
+                # latest 30 days for this endpoint, so older start times may be rejected.
+                break
+            time.sleep(2)
+            continue
+        if not data:
+            break
+        all_data.extend(data)
+        current_start = data[-1]["timestamp"] + 1
+        time.sleep(0.3)
+    if not all_data:
+        return pd.DataFrame()
+    df = pd.DataFrame(all_data)
+    df["longShortRatio"] = pd.to_numeric(df["longShortRatio"], errors="coerce")
+    df["longAccount"] = pd.to_numeric(df["longAccount"], errors="coerce")
+    df["shortAccount"] = pd.to_numeric(df["shortAccount"], errors="coerce")
+    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
+    df.set_index("timestamp", inplace=True)
+    df.sort_index(inplace=True)
+    logger.info("✅ L/S ratio %s: %d records", symbol, len(df))
+    return df
+def download_open_interest(symbol: str, days: int) -> pd.DataFrame:
+    """Download historical open interest from Binance Futures."""
+    end_ts = int(datetime.now(timezone.utc).timestamp() * 1000)
+    start_ts = end_ts - (days * 86_400_000)
+    all_data = []
+    current_start = start_ts
+    failures = 0  # consecutive failed requests
+    logger.info("📥 Downloading OI %s...", symbol)
+    while current_start < end_ts:
+        params = {
+            "symbol": symbol,
+            "period": "4h",
+            "startTime": current_start,
+            "endTime": end_ts,
+            "limit": 500,
+        }
+        try:
+            resp = requests.get(
+                "https://fapi.binance.com/futures/data/openInterestHist",
+                params=params, timeout=30,
+            )
+            resp.raise_for_status()
+            data = resp.json()
+            failures = 0
+        except Exception as e:
+            failures += 1
+            logger.error("❌ OI error: %s", e)
+            if failures >= 5:
+                # Stop retrying a persistent error. Binance documents only about the
+                # latest 30 days for this endpoint, so older start times may be rejected.
+                break
+            time.sleep(2)
+            continue
+        if not data:
+            break
+        all_data.extend(data)
+        current_start = data[-1]["timestamp"] + 1
+        time.sleep(0.3)
+    if not all_data:
+        return pd.DataFrame()
+    df = pd.DataFrame(all_data)
+    df["sumOpenInterest"] = pd.to_numeric(df["sumOpenInterest"], errors="coerce")
+    df["sumOpenInterestValue"] = pd.to_numeric(df["sumOpenInterestValue"], errors="coerce")
+    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
+    df.set_index("timestamp", inplace=True)
+    df.sort_index(inplace=True)
+    logger.info("✅ OI %s: %d records", symbol, len(df))
+    return df
+def download_macro_data(days: int) -> pd.DataFrame:
+    """Download macro data via yfinance: DXY, S&P 500, Gold, VIX, US 10Y yield."""
+    try:
+        import yfinance as yf
+    except ImportError:
+        logger.warning("⚠️ yfinance not available; skipping macro data")
+        return pd.DataFrame()
+    tickers = {
+        "DXY": "DX-Y.NYB",
+        "SPX": "^GSPC",
+        "GOLD": "GC=F",
+        "VIX": "^VIX",
+        "US10Y": "^TNX",
+    }
+    start_date = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
+    frames = {}
+    for name, ticker in tickers.items():
+        try:
+            logger.info("📥 Downloading %s (%s)...", name, ticker)
+            data = yf.download(ticker, start=start_date, interval="1d",
+                               progress=False, auto_adjust=True)
+            if not data.empty:
+                close = data["Close"]
+                # Some yfinance versions return MultiIndex columns, which makes
+                # data["Close"] a one-column DataFrame instead of a Series
+                if isinstance(close, pd.DataFrame):
+                    close = close.iloc[:, 0]
+                frames[name] = close.rename(f"macro_{name.lower()}")
+        except Exception as e:
+            logger.warning("⚠️ Error downloading %s: %s", name, e)
+    if not frames:
+        return pd.DataFrame()
+    df = pd.concat(frames.values(), axis=1)
+    df.index = pd.to_datetime(df.index, utc=True)
+    df.sort_index(inplace=True)
+    df = df.ffill()
+    logger.info("✅ Macro data: %d days, %d indicators", len(df), len(df.columns))
+    return df
+def download_fear_greed(days: int) -> pd.DataFrame:
+    """Download the historical Fear & Greed Index."""
+    try:
+        resp = requests.get(
+            f"https://api.alternative.me/fng/?limit={days}&format=json",
+            timeout=15,
+        )
+        resp.raise_for_status()
+        data = resp.json().get("data", [])
+    except Exception as e:
+        logger.warning("⚠️ F&G error: %s", e)
+        return pd.DataFrame()
+    if not data:
+        return pd.DataFrame()
+    df = pd.DataFrame(data)
+    df["value"] = pd.to_numeric(df["value"], errors="coerce")
+    df["timestamp"] = pd.to_datetime(df["timestamp"].astype(int), unit="s", utc=True)
+    df.set_index("timestamp", inplace=True)
+    df.sort_index(inplace=True)
+    df = df.rename(columns={"value": "fear_greed"})
+    logger.info("✅ Fear & Greed: %d days", len(df))
+    return df[["fear_greed"]]
+# ─────────────────────────────────────────────
+#  MAIN
+# ─────────────────────────────────────────────
+def main():
+    parser = argparse.ArgumentParser(description="Aurora Brain — Download Data")
+    parser.add_argument("--symbols", nargs="+", default=DEFAULT_SYMBOLS,
+                        help="Pairs to download (default: BTCUSDT ETHUSDT SOLUSDT)")
+    parser.add_argument("--timeframe", default=DEFAULT_TIMEFRAME,
+                        help="Timeframe (default: 4h)")
+    parser.add_argument("--days", type=int, default=DEFAULT_DAYS,
+                        help="Days of history (default: 1825 = ~5 years)")
+    parser.add_argument("--no-derivatives", action="store_true",
+                        help="Skip futures data (funding, OI, L/S)")
+    parser.add_argument("--no-macro", action="store_true",
+                        help="Skip macro data (DXY, S&P, Gold, VIX)")
+    args = parser.parse_args()
+    os.makedirs(DATA_DIR, exist_ok=True)
+    # ── 1. Klines spot ──
+    for symbol in args.symbols:
+        df = download_klines(symbol, args.timeframe, args.days)
+        if not df.empty:
+            path = os.path.join(DATA_DIR, f"klines_{symbol}_{args.timeframe}.parquet")
+            df.to_parquet(path)
+            logger.info("💾 Saved: %s (%d rows)", path, len(df))
+    # ── 2. Derivatives data (futures) ──
+    if not args.no_derivatives:
+        for symbol in args.symbols:
+            # Funding rate
+            df_fr = download_funding_rate(symbol, args.days)
+            if not df_fr.empty:
+                path = os.path.join(DATA_DIR, f"funding_{symbol}.parquet")
+                df_fr.to_parquet(path)
+            # Long/Short ratio
+            df_ls = download_long_short_ratio(symbol, args.days)
+            if not df_ls.empty:
+                path = os.path.join(DATA_DIR, f"longshort_{symbol}.parquet")
+                df_ls.to_parquet(path)
+            # Open Interest
+            df_oi = download_open_interest(symbol, args.days)
+            if not df_oi.empty:
+                path = os.path.join(DATA_DIR, f"oi_{symbol}.parquet")
+                df_oi.to_parquet(path)
+    # ── 3. Macro data ──
+    if not args.no_macro:
+        df_macro = download_macro_data(args.days)
+        if not df_macro.empty:
+            path = os.path.join(DATA_DIR, "macro.parquet")
+            df_macro.to_parquet(path)
+        df_fg = download_fear_greed(min(args.days, 3650))
+        if not df_fg.empty:
+            path = os.path.join(DATA_DIR, "fear_greed.parquet")
+            df_fg.to_parquet(path)
+    # ── Summary ──
+    logger.info("\n" + "=" * 60)
+    logger.info("📦 DOWNLOAD COMPLETE")
+    logger.info("=" * 60)
+    for f in sorted(os.listdir(DATA_DIR)):
+        if f.endswith(".parquet"):
+            size_mb = os.path.getsize(os.path.join(DATA_DIR, f)) / 1_048_576
+            logger.info("  📄 %s (%.1f MB)", f, size_mb)
+if __name__ == "__main__":
+    main()
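
For the default settings, the klines loop needs few requests: 1825 days of 4h candles is 1825 × 6 = 10,950 candles, which at up to 1,000 candles per request comes to 11 pages. The pagination pattern itself can be sketched independently of Binance. In the sketch below, `paginate`, `expected_pages`, and the `fetch_page` callable are hypothetical names rather than part of download_data.py, and the pause between retries is omitted for brevity.

```python
def paginate(fetch_page, start_ts, end_ts, step_ms, limit=1000, max_failures=5):
    """Collect rows from start_ts up to end_ts, one page at a time.

    fetch_page(start, end, limit) is a hypothetical callable returning a list
    of rows whose first element is the open time in milliseconds.
    """
    rows, failures, current = [], 0, start_ts
    while current < end_ts:
        try:
            page = fetch_page(current, end_ts, limit)
        except Exception:
            failures += 1
            if failures >= max_failures:
                break  # give up instead of retrying forever
            continue
        failures = 0
        if not page:
            break
        rows.extend(page)
        current = page[-1][0] + step_ms  # next page starts one candle later
    return rows


def expected_pages(days, tf_ms, limit=1000):
    """Ceiling of (candles in the window) / (candles per request)."""
    candles = days * 86_400_000 // tf_ms
    return -(-candles // limit)
```

Passing the fetcher in as an argument keeps the loop testable with a fake data source, so the stop conditions can be checked without network access.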

feature_engine.py ADDED

	@@ -0,0 +1,511 @@

+"""
+╔══════════════════════════════════════════════════════════════╗
+║  AURORA BRAIN — Feature Engine                               ║
+║                                                              ║
+║  Generates 200+ features per candle from OHLCV data plus     ║
+║  derivatives, macro, and sentiment data.                     ║
+║                                                              ║
+║  Categories:                                                 ║
+║    A. Price microstructure          (~50 features)           ║
+║    B. Momentum and trend            (~40 features)           ║
+║    C. Volume and flow               (~30 features)           ║
+║    D. Cross-asset intelligence      (~30 features)           ║
+║    E. On-chain and derivatives      (~30 features)           ║
+║    F. Sentiment and macro           (~20 features)           ║
+║                                                              ║
+║  Usage:                                                      ║
+║    python feature_engine.py                                  ║
+║    python feature_engine.py --symbol BTCUSDT --timeframe 4h  ║
+╚══════════════════════════════════════════════════════════════╝
+"""
+import os
+import argparse
+import logging
+import warnings
+import numpy as np
+import pandas as pd
+import pandas_ta as ta
+warnings.filterwarnings("ignore", category=FutureWarning)
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger("AuroraBrain.Features")
+DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
+# ═══════════════════════════════════════════════════════════
+#  A. PRICE MICROSTRUCTURE (~50 features)
+# ═══════════════════════════════════════════════════════════
+def features_microestructura(df: pd.DataFrame) -> pd.DataFrame:
+    """Features derived from the internal structure of the candles."""
+    o, h, l, c = df["open"], df["high"], df["low"], df["close"]
+    rng = (h - l).replace(0, np.nan)
+    # Candle ratios
+    df["f_body_ratio"] = (c - o).abs() / rng                   # Strength: 1=all body, 0=all wick
+    df["f_upper_wick_ratio"] = (h - pd.concat([c, o], axis=1).max(axis=1)) / rng  # Rejection from above
+    df["f_lower_wick_ratio"] = (pd.concat([c, o], axis=1).min(axis=1) - l) / rng  # Rejection from below
+    df["f_price_position"] = (c - l) / rng                      # Where it closed: 1=high, 0=low
+    df["f_is_bull"] = (c > o).astype(int)
+    df["f_is_doji"] = (df["f_body_ratio"] < 0.1).astype(int)
+    # Body size normalized by ATR
+    atr14 = ta.atr(h, l, c, length=14)
+    df["f_body_atr_ratio"] = (c - o).abs() / atr14.replace(0, np.nan)
+    # Gap between candles
+    df["f_gap_pct"] = (o - c.shift(1)) / c.shift(1) * 100
+    # Consecutive runs
+    bull = (c > o).astype(int)
+    df["f_consec_bull"] = bull.groupby((bull != bull.shift()).cumsum()).cumcount() + 1
+    df["f_consec_bull"] = df["f_consec_bull"] * bull
+    bear = (c < o).astype(int)
+    df["f_consec_bear"] = bear.groupby((bear != bear.shift()).cumsum()).cumcount() + 1
+    df["f_consec_bear"] = df["f_consec_bear"] * bear
+    # Candle patterns (booleans)
+    df["f_engulfing_bull"] = ((c > o) & (c.shift(1) < o.shift(1)) &
+                               (c > o.shift(1)) & (o < c.shift(1))).astype(int)
+    df["f_engulfing_bear"] = ((c < o) & (c.shift(1) > o.shift(1)) &
+                               (c < o.shift(1)) & (o > c.shift(1))).astype(int)
+    df["f_hammer"] = ((df["f_lower_wick_ratio"] > 0.6) & (df["f_body_ratio"] < 0.3) &
+                       (df["f_upper_wick_ratio"] < 0.1)).astype(int)
+    df["f_shooting_star"] = ((df["f_upper_wick_ratio"] > 0.6) & (df["f_body_ratio"] < 0.3) &
+                              (df["f_lower_wick_ratio"] < 0.1)).astype(int)
+    # ATR and volatility
+    df["f_atr_14"] = atr14
+    df["f_atr_pct"] = atr14 / c * 100
+    df["f_atr_roc_5"] = atr14.pct_change(5) * 100      # Volatility acceleration
+    df["f_atr_roc_14"] = atr14.pct_change(14) * 100
+    # Range vs. its rolling average
+    df["f_range_ratio_20"] = rng / rng.rolling(20).mean()
+    # Percentage change
+    for period in [1, 3, 5, 10, 20]:
+        df[f"f_return_{period}"] = c.pct_change(period) * 100
+    # High-low range as a percentage of the close
+    df["f_hl_pct"] = rng / c * 100
+    # Distance to recent highs/lows
+    df["f_dist_high_20"] = (c - h.rolling(20).max()) / c * 100
+    df["f_dist_low_20"] = (c - l.rolling(20).min()) / c * 100
+    # Candle pattern encoding: the last 5 candles as a pattern
+    # Each candle is encoded as 0 (bear) or 1 (bull), giving a number from 0 to 31
+    pattern = pd.Series(0, index=df.index, dtype=int)
+    for i in range(5):
+        pattern += bull.shift(i).fillna(0).astype(int) * (2 ** i)
+    df["f_candle_pattern_5"] = pattern
+    logger.info("  ✅ A. Microstructure: %d features", sum(1 for col in df.columns if col.startswith("f_")))
+    return df
+# ═══════════════════════════════════════════════════════════
+#  B. MOMENTUM AND TREND (~40 features)
+# ═══════════════════════════════════════════════════════════
+def features_momentum(df: pd.DataFrame) -> pd.DataFrame:
+    """Momentum, trend, and oscillator features."""
+    c, h, l = df["close"], df["high"], df["low"]
+    # RSI
+    df["f_rsi_14"] = ta.rsi(c, length=14)
+    df["f_rsi_7"] = ta.rsi(c, length=7)
+    df["f_rsi_roc_5"] = df["f_rsi_14"].diff(5)             # RSI rate-of-change
+    df["f_rsi_roc_14"] = df["f_rsi_14"].diff(14)
+    # RSI divergence (price rising while RSI falls = bearish divergence)
+    # The 14-period return is computed here because f_return_14 is not created in section A
+    df["f_rsi_price_div_14"] = c.pct_change(14) * 100 - df["f_rsi_roc_14"]
+    # MACD
+    macd = ta.macd(c, fast=12, slow=26, signal=9)
+    if macd is not None and len(macd.columns) >= 3:
+        df["f_macd_hist"] = macd.iloc[:, 1]                 # Histogram (pandas_ta column order: MACD, histogram, signal)
+        df["f_macd_hist_roc"] = df["f_macd_hist"].diff(3)   # MACD acceleration
+        df["f_macd_cross_bull"] = ((df["f_macd_hist"] > 0) &
+                                    (df["f_macd_hist"].shift(1) <= 0)).astype(int)
+        df["f_macd_cross_bear"] = ((df["f_macd_hist"] < 0) &
+                                    (df["f_macd_hist"].shift(1) >= 0)).astype(int)
+    # ADX
+    adx_data = ta.adx(h, l, c, length=14)
+    if adx_data is not None:
+        df["f_adx_14"] = adx_data.iloc[:, 0]
+        df["f_plus_di"] = adx_data.iloc[:, 1]
+        df["f_minus_di"] = adx_data.iloc[:, 2]
+        df["f_di_ratio"] = df["f_plus_di"] / df["f_minus_di"].replace(0, np.nan)
+    adx_28 = ta.adx(h, l, c, length=28)
+    if adx_28 is not None:
+        df["f_adx_28"] = adx_28.iloc[:, 0]
+    # EMAs
+    for length in [9, 21, 55, 200]:
+        ema = ta.ema(c, length=length)
+        df[f"f_ema_{length}"] = ema
+        df[f"f_dist_ema_{length}"] = (c - ema) / ema * 100    # % distance to the EMA
+        df[f"f_ema_{length}_slope"] = ema.diff(5) / ema * 100  # 5-candle change as % of the EMA
+    # EMA ordering (1 = perfect bullish stack: 9 > 21 > 55 > 200)
+    df["f_ema_order_bull"] = (
+        (df["f_ema_9"] > df["f_ema_21"]) &
+        (df["f_ema_21"] > df["f_ema_55"]) &
+        (df["f_ema_55"] > df["f_ema_200"])
+    ).astype(int)
+    df["f_ema_order_bear"] = (
+        (df["f_ema_9"] < df["f_ema_21"]) &
+        (df["f_ema_21"] < df["f_ema_55"]) &
+        (df["f_ema_55"] < df["f_ema_200"])
+    ).astype(int)
+    # Linear regression slope.
+    # pandas_ta's linreg returns the fitted value by default; slope=True requests the slope.
+    for period in [10, 20, 50]:
+        slope = ta.linreg(c, length=period, slope=True)
+        if slope is not None:
+            df[f"f_linreg_slope_{period}"] = slope
+    # Bollinger Bands
+    # pandas_ta's bbands returns its columns in the order lower, mid, upper (BBL, BBM, BBU, ...).
+    bb = ta.bbands(c, length=20, std=2)
+    if bb is not None and len(bb.columns) >= 3:
+        df["f_bb_lower"] = bb.iloc[:, 0]
+        df["f_bb_mid"] = bb.iloc[:, 1]
+        df["f_bb_upper"] = bb.iloc[:, 2]
+        df["f_bb_width"] = (df["f_bb_upper"] - df["f_bb_lower"]) / df["f_bb_mid"] * 100
+        df["f_bb_position"] = (c - df["f_bb_lower"]) / (df["f_bb_upper"] - df["f_bb_lower"]).replace(0, np.nan)
+    # Stochastic
+    stoch = ta.stoch(h, l, c, k=14, d=3)
+    if stoch is not None and len(stoch.columns) >= 2:
+        df["f_stoch_k"] = stoch.iloc[:, 0]
+        df["f_stoch_d"] = stoch.iloc[:, 1]
+    # Higher-high / higher-low counts (proxy for HH+HL structure, used for bias)
+    swing_h = h.rolling(10).max()
+    swing_l = l.rolling(10).min()
+    prev_sh = swing_h.shift(10)
+    prev_sl = swing_l.shift(10)
+    df["f_hh_count_20"] = ((swing_h > prev_sh).rolling(20).sum()).fillna(0)
+    df["f_hl_count_20"] = ((swing_l > prev_sl).rolling(20).sum()).fillna(0)
+    df["f_ll_count_20"] = ((swing_l < prev_sl).rolling(20).sum()).fillna(0)
+    df["f_lh_count_20"] = ((swing_h < prev_sh).rolling(20).sum()).fillna(0)
+    n_momentum = sum(1 for col in df.columns if col.startswith("f_") and
+                     any(x in col for x in ["rsi", "macd", "adx", "ema", "linreg", "bb", "stoch", "hh_", "hl_", "ll_", "lh_", "di_"]))
+    logger.info("  ✅ B. Momentum: ~%d features", n_momentum)
+    return df
+# ═══════════════════════════════════════════════════════════
+#  C. VOLUME AND FLOW (~30 features)
+# ═══════════════════════════════════════════════════════════
+def features_volumen(df: pd.DataFrame) -> pd.DataFrame:
+    """Volume, order-flow and liquidity features."""
+    v = df["volume"]
+    c = df["close"]
+    qv = df.get("quote_volume", v * c)
+    tbr = df.get("taker_buy_ratio", pd.Series(0.5, index=df.index))
+    # Relative volume
+    df["f_vol_sma_20"] = v.rolling(20).mean()
+    df["f_vol_ratio_20"] = v / df["f_vol_sma_20"].replace(0, np.nan)
+    df["f_vol_ratio_5"] = v / v.rolling(5).mean().replace(0, np.nan)
+    # Volume acceleration
+    df["f_vol_accel"] = df["f_vol_ratio_20"].diff(3)
+    # Volume spike (>2x the average)
+    df["f_vol_spike"] = (df["f_vol_ratio_20"] > 2.0).astype(int)
+    # OBV (On Balance Volume)
+    obv = ta.obv(c, v)
+    if obv is not None:
+        df["f_obv"] = obv
+        df["f_obv_slope_10"] = obv.diff(10) / obv.abs().replace(0, np.nan) * 100
+    # Volume-weighted price deviation
+    vwap_approx = (qv.rolling(20).sum()) / (v.rolling(20).sum().replace(0, np.nan))
+    df["f_vwap_dev"] = (c - vwap_approx) / vwap_approx * 100
+    # Taker buy ratio
+    df["f_tbr"] = tbr
+    df["f_tbr_sma_10"] = tbr.rolling(10).mean()
+    df["f_tbr_roc_5"] = tbr.diff(5)
+    # Quote volume rate-of-change
+    df["f_qvol_roc_5"] = qv.pct_change(5) * 100
+    # Trade count, if available
+    if "trades" in df.columns:
+        trades = df["trades"]
+        df["f_trades_ratio_20"] = trades / trades.rolling(20).mean().replace(0, np.nan)
+        df["f_avg_trade_size"] = qv / trades.replace(0, np.nan)
+    # Volume profile proxy: share of volume from candles that closed above vs. below their midpoint
+    mid_price = (df["high"] + df["low"]) / 2
+    df["f_vol_above_mid"] = ((c > mid_price) * v).rolling(20).sum()
+    df["f_vol_below_mid"] = ((c <= mid_price) * v).rolling(20).sum()
+    df["f_vol_balance"] = df["f_vol_above_mid"] / (df["f_vol_above_mid"] + df["f_vol_below_mid"]).replace(0, np.nan)
+    n_vol = sum(1 for col in df.columns if col.startswith("f_vol") or col.startswith("f_obv") or
+                col.startswith("f_vwap") or col.startswith("f_tbr") or col.startswith("f_qvol") or
+                col.startswith("f_trades") or col.startswith("f_avg_trade"))
+    logger.info("  ✅ C. Volume: ~%d features", n_vol)
+    return df
+# ═══════════════════════════════════════════════════════════
+#  D. CROSS-ASSET INTELLIGENCE (~30 features)
+# ═══════════════════════════════════════════════════════════
+def features_cross_asset(df: pd.DataFrame, df_btc: pd.DataFrame = None,
+                          df_macro: pd.DataFrame = None) -> pd.DataFrame:
+    """Cross-asset correlation and macro data features."""
+    c = df["close"]
+    symbol = df["symbol"].iloc[0] if "symbol" in df.columns else "UNKNOWN"
+    # Cross-asset vs. BTC (skipped when the asset is BTC itself)
+    if df_btc is not None and symbol != "BTCUSDT":
+        btc_c = df_btc["close"].reindex(df.index, method="ffill")
+        # Rolling correlation
+        df["f_corr_btc_30"] = c.rolling(30).corr(btc_c)
+        df["f_corr_btc_90"] = c.rolling(90).corr(btc_c)
+        # Beta relative to BTC
+        btc_ret = btc_c.pct_change()
+        sym_ret = c.pct_change()
+        cov = sym_ret.rolling(30).cov(btc_ret)
+        var = btc_ret.rolling(30).var()
+        df["f_beta_btc_30"] = cov / var.replace(0, np.nan)
+        # Lead-lag: did this asset move before BTC?
+        for lag in [1, 3, 6]:
+            df[f"f_lead_btc_{lag}"] = sym_ret.shift(lag).rolling(10).corr(btc_ret)
+        # Normalized spread
+        df["f_spread_btc"] = (c / btc_c).pct_change(5) * 100
+    else:
+        # If this IS BTC (or no BTC data was supplied), set the correlation to 1.0
+        df["f_corr_btc_30"] = 1.0
+        df["f_corr_btc_90"] = 1.0
+    # Macro data (if available)
+    if df_macro is not None and not df_macro.empty:
+        # Reindex macro data to the main dataframe's frequency
+        for col in df_macro.columns:
+            macro_series = df_macro[col].reindex(df.index, method="ffill")
+            df[f"f_{col}"] = macro_series
+            # Macro rate-of-change (over 5 candles, despite the _5d suffix)
+            df[f"f_{col}_roc_5d"] = macro_series.pct_change(5) * 100
+    n_cross = sum(1 for col in df.columns if "corr_" in col or "beta_" in col or
+                  "lead_" in col or "spread_" in col or "macro_" in col)
+    logger.info("  ✅ D. Cross-asset: ~%d features", n_cross)
+    return df
+# ═══════════════════════════════════════════════════════════
+#  E. ON-CHAIN AND DERIVATIVES (~30 features)
+# ═══════════════════════════════════════════════════════════
+def features_onchain(df: pd.DataFrame, df_funding: pd.DataFrame = None,
+                      df_ls: pd.DataFrame = None, df_oi: pd.DataFrame = None) -> pd.DataFrame:
+    """On-chain and derivatives data features."""
+    # Funding rate
+    if df_funding is not None and not df_funding.empty:
+        fr = df_funding["fundingRate"].reindex(df.index, method="ffill")
+        df["f_funding_rate"] = fr
+        df["f_funding_rate_sma_10"] = fr.rolling(10).mean()
+        df["f_funding_rate_extreme_pos"] = (fr > 0.001).astype(int)  # Overleveraged longs
+        df["f_funding_rate_extreme_neg"] = (fr < -0.001).astype(int)  # Overleveraged shorts
+        df["f_funding_rate_roc"] = fr.diff(3)
+    # Long/Short ratio
+    if df_ls is not None and not df_ls.empty:
+        ls = df_ls["longShortRatio"].reindex(df.index, method="ffill")
+        df["f_ls_ratio"] = ls
+        df["f_ls_ratio_sma_10"] = ls.rolling(10).mean()
+        df["f_ls_ratio_roc"] = ls.pct_change(5) * 100
+        if "longAccount" in df_ls.columns:
+            long_pct = df_ls["longAccount"].reindex(df.index, method="ffill")
+            df["f_long_pct"] = long_pct
+    # Open Interest
+    if df_oi is not None and not df_oi.empty:
+        oi = df_oi["sumOpenInterestValue"].reindex(df.index, method="ffill")
+        df["f_oi_value"] = oi
+        df["f_oi_roc_5"] = oi.pct_change(5) * 100
+        df["f_oi_roc_24"] = oi.pct_change(24) * 100  # 24 candles of 4h = 4 days
+        # OI divergence: price rises while OI falls = a move without backing
+        if "f_return_5" in df.columns:
+            df["f_oi_price_div"] = df["f_return_5"] - df["f_oi_roc_5"]
+    n_onchain = sum(1 for col in df.columns if "funding" in col or "ls_" in col or
+                    "oi_" in col or "long_pct" in col)
+    logger.info("  ✅ E. On-chain: ~%d features", n_onchain)
+    return df
+# ═══════════════════════════════════════════════════════════
+#  F. SENTIMENT AND MACRO (~20 features)
+# ═══════════════════════════════════════════════════════════
+def features_sentimiento(df: pd.DataFrame, df_fg: pd.DataFrame = None) -> pd.DataFrame:
+    """Sentiment and calendar-pattern features."""
+    # Fear & Greed
+    if df_fg is not None and not df_fg.empty:
+        fg = df_fg["fear_greed"].reindex(df.index, method="ffill")
+        df["f_fear_greed"] = fg
+        df["f_fear_greed_roc_5"] = fg.diff(5)
+        df["f_fear_greed_extreme_fear"] = (fg < 25).astype(int)
+        df["f_fear_greed_extreme_greed"] = (fg > 75).astype(int)
+    # Calendar patterns
+    if isinstance(df.index, pd.DatetimeIndex):
+        df["f_hour_of_day"] = df.index.hour
+        df["f_day_of_week"] = df.index.dayofweek
+        df["f_is_weekend"] = (df.index.dayofweek >= 5).astype(int)
+        # Seasonally strong months for BTC (historically: Oct, Nov, Apr)
+        df["f_month"] = df.index.month
+        df["f_is_q4"] = (df.index.month >= 10).astype(int)
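+        # Days since the most recent BTC halving at each timestamp (approx.).
+        # Measuring only from the latest halving would give negative values for earlier candles.
+        # Halvings: 2012-11-28, 2016-07-09, 2020-05-11, 2024-04-20
+        halvings = pd.to_datetime(["2012-11-28", "2016-07-09", "2020-05-11", "2024-04-20"])
+        idx = df.index.tz_convert(None) if df.index.tz is not None else df.index
+        pos = np.clip(halvings.searchsorted(idx, side="right") - 1, 0, None)
+        df["f_days_since_halving"] = (idx - halvings[pos]).days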
+    n_sent = sum(1 for col in df.columns if "fear_greed" in col or "hour_" in col or
+                 "day_" in col or "weekend" in col or "month" in col or "halving" in col or "q4" in col)
+    logger.info("  ✅ F. Sentiment: ~%d features", n_sent)
+    return df
+# ═══════════════════════════════════════════════════════════
+#  MAIN ORCHESTRATOR
+# ═══════════════════════════════════════════════════════════
+def generate_features(symbol: str, timeframe: str = "4h",
+                       btc_symbol: str = "BTCUSDT") -> pd.DataFrame:
+    """
+    Generate all features for a given pair.
+    Reads the parquet files from the data/ directory.
+    """
+    logger.info("🔧 Generating features for %s %s...", symbol, timeframe)
+    # ── Load data ──
+    klines_path = os.path.join(DATA_DIR, f"klines_{symbol}_{timeframe}.parquet")
+    if not os.path.exists(klines_path):
+        logger.error("❌ Not found: %s", klines_path)
+        return pd.DataFrame()
+    df = pd.read_parquet(klines_path)
+    logger.info("  📊 %d candles loaded", len(df))
+    # BTC for cross-asset features (when the pair is not BTC)
+    df_btc = None
+    if symbol != btc_symbol:
+        btc_path = os.path.join(DATA_DIR, f"klines_{btc_symbol}_{timeframe}.parquet")
+        if os.path.exists(btc_path):
+            df_btc = pd.read_parquet(btc_path)
+    # Derivatives
+    df_funding = _load_parquet(f"funding_{symbol}.parquet")
+    df_ls = _load_parquet(f"longshort_{symbol}.parquet")
+    df_oi = _load_parquet(f"oi_{symbol}.parquet")
+    # Macro and sentiment
+    df_macro = _load_parquet("macro.parquet")
+    df_fg = _load_parquet("fear_greed.parquet")
+    # ── Generate features by category ──
+    df = features_microestructura(df)
+    df = features_momentum(df)
+    df = features_volumen(df)
+    df = features_cross_asset(df, df_btc=df_btc, df_macro=df_macro)
+    df = features_onchain(df, df_funding=df_funding, df_ls=df_ls, df_oi=df_oi)
+    df = features_sentimiento(df, df_fg=df_fg)
+    # ── Cleanup ──
+    feature_cols = [c for c in df.columns if c.startswith("f_")]
+    df[feature_cols] = df[feature_cols].replace([np.inf, -np.inf], np.nan)
+    # Warmup: drop the first 210 rows, where most features are still NaN
+    warmup = 210
+    df = df.iloc[warmup:]
+    n_features = len(feature_cols)
+    n_rows = len(df)
+    nan_pct = df[feature_cols].isna().mean().mean() * 100
+    logger.info("\n" + "=" * 60)
+    logger.info("🔧 FEATURES GENERATED: %s %s", symbol, timeframe)
+    logger.info("  📊 %d candles × %d features", n_rows, n_features)
+    logger.info("  📉 Mean NaN: %.1f%%", nan_pct)
+    logger.info("=" * 60)
+    return df
+def _load_parquet(filename: str) -> pd.DataFrame:
+    """Load a parquet file if it exists; return an empty DataFrame otherwise."""
+    path = os.path.join(DATA_DIR, filename)
+    if os.path.exists(path):
+        return pd.read_parquet(path)
+    return pd.DataFrame()
+def get_feature_columns(df: pd.DataFrame) -> list[str]:
+    """Return the sorted list of feature columns (f_ prefix)."""
+    return sorted([c for c in df.columns if c.startswith("f_")])
+# ─────────────────────────────────────────────
+#  MAIN
+# ─────────────────────────────────────────────
+def main():
+    parser = argparse.ArgumentParser(description="Aurora Brain — Feature Engine")
+    parser.add_argument("--symbol", default="BTCUSDT", help="Pair (default: BTCUSDT)")
+    parser.add_argument("--timeframe", default="4h", help="Timeframe (default: 4h)")
+    parser.add_argument("--all", action="store_true", help="Generate for BTC + ETH + SOL")
+    args = parser.parse_args()
+    symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"] if args.all else [args.symbol]
+    for symbol in symbols:
+        df = generate_features(symbol, args.timeframe)
+        if not df.empty:
+            path = os.path.join(DATA_DIR, f"features_{symbol}_{args.timeframe}.parquet")
+            df.to_parquet(path)
+            logger.info("💾 Saved: %s", path)
+            # Show the top features by variance (scale-dependent, so only a rough guide)
+            feature_cols = get_feature_columns(df)
+            variance = df[feature_cols].var().sort_values(ascending=False)
+            logger.info("\n📊 Top 20 features by variance:")
+            for i, (col, var) in enumerate(variance.head(20).items()):
+                logger.info("  %2d. %-30s var=%.4f", i + 1, col, var)
+if __name__ == "__main__":
+    main()
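
The `f_candle_pattern_5` feature in feature_engine.py packs the direction of the last five candles into one integer. As a minimal sketch of that encoding, the plain-Python version below reproduces the same bit layout without pandas. The function name `encode_candle_pattern` and the sample flags are hypothetical and not part of the commit.

```python
def encode_candle_pattern(bull_flags, window=5):
    """Encode the last `window` candles at each position as an integer.

    bull_flags[t] is 1 for a bullish candle and 0 for a bearish one.
    The current candle contributes bit 0, the previous one bit 1, and so on.
    Candles before the start of the series count as 0, mirroring the
    shift(i).fillna(0) step in the pandas version.
    """
    patterns = []
    for t in range(len(bull_flags)):
        value = 0
        for i in range(window):
            if t - i >= 0:
                value += int(bull_flags[t - i]) * (2 ** i)
        patterns.append(value)
    return patterns


if __name__ == "__main__":
    # Hypothetical sequence: bull, bear, bull, bull, bear, bull
    flags = [1, 0, 1, 1, 0, 1]
    print(encode_candle_pattern(flags))
```

With a window of 5 the result always lies in 0-31, so a tree model can treat it as a compact categorical code. Because the most recent candle is the lowest bit, two patterns that differ only in the latest candle are adjacent integers.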

regime_detector.py ADDED Viewed

	@@ -0,0 +1,286 @@

+"""
+╔══════════════════════════════════════════════════════════════╗
+║  AURORA BRAIN — Regime Detector (Capa 1)                     ║
+║                                                              ║
+║  Trains an XGBoost model to classify the current market      ║
+║  regime: TRENDING / RANGING / VOLATILE / BREAKOUT            ║
+║                                                              ║
+║  Walk-forward validation to limit overfitting.               ║
+║  Feature importance for interpretability.                    ║
+║                                                              ║
+║  Usage:                                                      ║
+║    python regime_detector.py                                 ║
+║    python regime_detector.py --symbol BTCUSDT --test-pct 20  ║
+╚══════════════════════════════════════════════════════════════╝
+"""
+import os
+import json
+import argparse
+import logging
+import pickle
+from datetime import datetime, timezone
+import numpy as np
+import pandas as pd
+from sklearn.model_selection import TimeSeriesSplit
+from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
+from sklearn.preprocessing import LabelEncoder
+try:
+    import xgboost as xgb
+except ImportError:
+    xgb = None
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger("AuroraBrain.Regime")
+DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
+MODELS_DIR = os.path.join(os.path.dirname(__file__), "models")
+REGIME_NAMES = {0: "TRENDING", 1: "RANGING", 2: "VOLATILE", 3: "BREAKOUT"}
+def select_features(df: pd.DataFrame, target_col: str = "regime",
+                     max_features: int = 80) -> list[str]:
+    """
+    Select the most relevant features for the model.
+    Combines a minimum-variance filter, a NaN filter and correlation with the target.
+    """
+    feature_cols = sorted([c for c in df.columns if c.startswith("f_")])
+    # Drop near-constant features
+    variances = df[feature_cols].var()
+    low_var = variances[variances < 1e-8].index.tolist()
+    feature_cols = [c for c in feature_cols if c not in low_var]
+    # Drop features with >30% NaN
+    nan_pct = df[feature_cols].isna().mean()
+    high_nan = nan_pct[nan_pct > 0.3].index.tolist()
+    feature_cols = [c for c in feature_cols if c not in high_nan]
+    # Correlation with the target (labelled rows only; regime == -1 means unknown)
+    valid = df[df[target_col] >= 0]
+    if len(valid) > 100:
+        corr = valid[feature_cols].corrwith(valid[target_col]).abs()
+        corr = corr.sort_values(ascending=False)
+        feature_cols = corr.head(max_features).index.tolist()
+    logger.info("  📊 Selected features: %d (of %d original)",
+                len(feature_cols), sum(1 for c in df.columns if c.startswith("f_")))
+    return feature_cols
+def train_regime_model(df: pd.DataFrame, feature_cols: list[str],
+                        test_pct: float = 20.0, n_splits: int = 5) -> dict:
+    """
+    Train the regime detector and evaluate it with walk-forward validation.
+    Returns:
+        dict with the model, metrics and feature importance
+    """
+    if xgb is None:
+        logger.error("❌ xgboost is not installed")
+        return {}
+    # Drop candles without a regime label
+    valid = df[df["regime"] >= 0].copy()
+    valid = valid.dropna(subset=feature_cols, how="all")
+    # Fill NaN with the column median (for partially available features)
+    X = valid[feature_cols].fillna(valid[feature_cols].median())
+    y = valid["regime"].astype(int)
+    logger.info("  📊 Dataset: %d samples, %d features, %d classes",
+                len(X), len(feature_cols), y.nunique())
+    # ── Chronological hold-out split (no shuffling) ──
+    split_idx = int(len(X) * (1 - test_pct / 100))
+    X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
+    y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]
+    logger.info("  📊 Train: %d | Test: %d (%.0f%%)", len(X_train), len(X_test), test_pct)
+    # ── XGBoost training ──
+    # Balanced class weights: total / (n_classes * class_count)
+    class_counts = y_train.value_counts()
+    total = len(y_train)
+    sample_weights = y_train.map(lambda x: total / (len(class_counts) * class_counts[x]))
+    model = xgb.XGBClassifier(
+        n_estimators=300,
+        max_depth=6,
+        learning_rate=0.05,
+        subsample=0.8,
+        colsample_bytree=0.8,
+        min_child_weight=5,
+        gamma=0.1,
+        reg_alpha=0.1,
+        reg_lambda=1.0,
+        objective="multi:softprob",
+        num_class=4,
+        eval_metric="mlogloss",
+        random_state=42,
+        n_jobs=-1,
+    )
+    model.fit(
+        X_train, y_train,
+        sample_weight=sample_weights,
+        eval_set=[(X_test, y_test)],
+        verbose=False,
+    )
+    # ── Evaluation ──
+    y_pred = model.predict(X_test)
+    accuracy = accuracy_score(y_test, y_pred)
+    # Pass labels explicitly so the report does not fail when a regime
+    # is absent from the test window
+    regime_ids = list(REGIME_NAMES.keys())
+    regime_labels = list(REGIME_NAMES.values())
+    report = classification_report(y_test, y_pred, labels=regime_ids,
+                                    target_names=regime_labels,
+                                    output_dict=True, zero_division=0)
+    cm = confusion_matrix(y_test, y_pred, labels=regime_ids)
+    logger.info("\n" + "=" * 60)
+    logger.info("🎯 REGIME DETECTOR RESULTS")
+    logger.info("=" * 60)
+    logger.info("  Accuracy: %.2f%%", accuracy * 100)
+    logger.info("\n%s", classification_report(y_test, y_pred, labels=regime_ids,
+                target_names=regime_labels, zero_division=0))
+    # ── Feature importance ──
+    importances = model.feature_importances_
+    feat_imp = sorted(zip(feature_cols, importances), key=lambda x: -x[1])
+    logger.info("\n📊 Top 20 most important features:")
+    for i, (feat, imp) in enumerate(feat_imp[:20]):
+        logger.info("  %2d. %-35s %.4f", i + 1, feat, imp)
+    # ── Walk-forward cross-validation ──
+    tscv = TimeSeriesSplit(n_splits=n_splits)
+    cv_scores = []
+    for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
+        cv_model = xgb.XGBClassifier(
+            n_estimators=200, max_depth=6, learning_rate=0.05,
+            subsample=0.8, colsample_bytree=0.8,
+            objective="multi:softprob", num_class=4,
+            random_state=42, n_jobs=-1,
+        )
+        cv_model.fit(X.iloc[train_idx], y.iloc[train_idx], verbose=False)
+        cv_pred = cv_model.predict(X.iloc[val_idx])
+        cv_acc = accuracy_score(y.iloc[val_idx], cv_pred)
+        cv_scores.append(cv_acc)
+        logger.info("  Fold %d: %.2f%%", fold + 1, cv_acc * 100)
+    avg_cv = np.mean(cv_scores)
+    std_cv = np.std(cv_scores)
+    logger.info("  Walk-Forward CV: %.2f%% ± %.2f%%", avg_cv * 100, std_cv * 100)
+    # ── Save model ──
+    os.makedirs(MODELS_DIR, exist_ok=True)
+    model_path = os.path.join(MODELS_DIR, "regime_model.pkl")
+    with open(model_path, "wb") as f:
+        pickle.dump(model, f)
+    # Save metadata
+    metadata = {
+        "trained_at": datetime.now(timezone.utc).isoformat(),
+        "accuracy": round(accuracy * 100, 2),
+        "cv_accuracy": round(avg_cv * 100, 2),
+        "cv_std": round(std_cv * 100, 2),
+        "n_train": len(X_train),
+        "n_test": len(X_test),
+        "n_features": len(feature_cols),
+        "feature_cols": feature_cols,
+        "top_features": [{"name": f, "importance": round(float(i), 4)}
+                         for f, i in feat_imp[:30]],
+        "class_report": report,
+    }
+    meta_path = os.path.join(MODELS_DIR, "regime_metadata.json")
+    with open(meta_path, "w") as f:
+        json.dump(metadata, f, indent=2, ensure_ascii=False)
+    logger.info("\n💾 Model saved: %s", model_path)
+    logger.info("💾 Metadata: %s", meta_path)
+    return {
+        "model": model,
+        "accuracy": accuracy,
+        "cv_accuracy": avg_cv,
+        "feature_cols": feature_cols,
+        "metadata": metadata,
+    }
+def predict_regime(df: pd.DataFrame) -> dict:
+    """
+    Predict the current regime using the saved model.
+    Returns a dict with the regime, class probabilities and confidence.
+    """
+    model_path = os.path.join(MODELS_DIR, "regime_model.pkl")
+    meta_path = os.path.join(MODELS_DIR, "regime_metadata.json")
+    if not os.path.exists(model_path) or not os.path.exists(meta_path):
+        return {"error": "Model not found: train it first"}
+    with open(model_path, "rb") as f:
+        model = pickle.load(f)
+    with open(meta_path, "r") as f:
+        metadata = json.load(f)
+    feature_cols = metadata["feature_cols"]
+    # Prepare the features of the last candle
+    last_row = df[feature_cols].iloc[-1:].fillna(df[feature_cols].median())
+    # Predict
+    proba = model.predict_proba(last_row)[0]
+    regime_id = int(np.argmax(proba))
+    confidence = float(proba[regime_id])
+    return {
+        "regime": REGIME_NAMES[regime_id],
+        "regime_id": regime_id,
+        "confidence": round(confidence, 4),
+        "probabilities": {
+            REGIME_NAMES[i]: round(float(p), 4) for i, p in enumerate(proba)
+        },
+        "model_accuracy": metadata.get("accuracy", 0),
+        "model_cv_accuracy": metadata.get("cv_accuracy", 0),
+    }
+# ─────────────────────────────────────────────
+#  MAIN
+# ─────────────────────────────────────────────
+def main():
+    parser = argparse.ArgumentParser(description="Aurora Brain — Regime Detector")
+    parser.add_argument("--symbol", default="BTCUSDT")
+    parser.add_argument("--timeframe", default="4h")
+    parser.add_argument("--test-pct", type=float, default=20.0)
+    parser.add_argument("--predict-only", action="store_true",
+                        help="Only predict the current regime (no training)")
+    args = parser.parse_args()
+    labeled_path = os.path.join(DATA_DIR, f"labeled_{args.symbol}_{args.timeframe}.parquet")
+    if not os.path.exists(labeled_path):
+        logger.error("❌ Not found: %s. Run regime_labeler.py first", labeled_path)
+        return
+    df = pd.read_parquet(labeled_path)
+    if args.predict_only:
+        result = predict_regime(df)
+        # predict_regime returns an "error" key when no trained model is available
+        if "error" in result:
+            logger.error("❌ %s", result["error"])
+            return
+        logger.info("\n🔮 CURRENT REGIME: %s (confidence: %.1f%%)",
+                    result["regime"], result["confidence"] * 100)
+        logger.info("  Probabilities: %s", result["probabilities"])
+    else:
+        feature_cols = select_features(df)
+        result = train_regime_model(df, feature_cols, test_pct=args.test_pct)
+        if result:
+            # Predict the current regime
+            pred = predict_regime(df)
+            logger.info("\n🔮 CURRENT REGIME: %s (confidence: %.1f%%)",
+                        pred["regime"], pred["confidence"] * 100)
+if __name__ == "__main__":
+    main()
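
Two ideas carry most of the weight in `train_regime_model`: balanced sample weights, computed as total / (n_classes * class_count), and walk-forward validation, where each fold trains only on data that precedes its validation block. The sketch below illustrates both in plain Python. The helper names `balanced_sample_weights` and `expanding_window_splits` are hypothetical, and the splitter is a simplified stand-in written in the spirit of scikit-learn's `TimeSeriesSplit`, not a copy of its implementation.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by total / (n_classes * class_count).

    Rare regimes get weights above 1 and common regimes below 1,
    while the weights still sum to the number of samples.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return [total / (n_classes * counts[label]) for label in labels]


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, val_indices) pairs in chronological order.

    Each fold trains on everything before its validation block, so the
    model is never scored on data older than what it was trained on.
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_val_start = n_samples - n_splits * fold_size
    for k in range(n_splits):
        val_start = first_val_start + k * fold_size
        val_end = val_start + fold_size
        yield list(range(val_start)), list(range(val_start, val_end))


if __name__ == "__main__":
    # Hypothetical regime labels: three TRENDING, two RANGING, one VOLATILE
    print(balanced_sample_weights([0, 0, 0, 1, 1, 2]))
    for train_idx, val_idx in expanding_window_splits(12, 3):
        print(len(train_idx), val_idx)
```

One caveat worth keeping in mind when reading the fold scores: with few samples, an early fold may not contain every regime, which is why a per-fold check of the classes present in the training slice can be useful before fitting.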

regime_labeler.py ADDED Viewed

	@@ -0,0 +1,217 @@

+"""
+╔══════════════════════════════════════════════════════════════╗
+║  AURORA BRAIN — Regime Labeler                               ║
+║                                                              ║
+║  Labels each candle with the market regime that occurred     ║
+║  AFTERWARDS (post hoc). This is the training target for      ║
+║  the regime detector.                                        ║
+║                                                              ║
+║  Regimes:                                                    ║
+║    0 = TRENDING  — net move >5% over the horizon             ║
+║    1 = RANGING   — price stayed within a 3% range            ║
+║    2 = VOLATILE  — drawdown >5% within the horizon           ║
+║    3 = BREAKOUT  — move >8% within the horizon + volume >3x  ║
+║                                                              ║
+║  Usage:                                                      ║
+║    python regime_labeler.py                                  ║
+║    python regime_labeler.py --symbol BTCUSDT --horizon 12    ║
+╚══════════════════════════════════════════════════════════════╝
+"""
+import os
+import argparse
+import logging
+import numpy as np
+import pandas as pd
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger("AuroraBrain.Labeler")
+DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
+# Regime IDs
+REGIME_TRENDING = 0
+REGIME_RANGING = 1
+REGIME_VOLATILE = 2
+REGIME_BREAKOUT = 3
+REGIME_NAMES = {0: "TRENDING", 1: "RANGING", 2: "VOLATILE", 3: "BREAKOUT"}
+def label_regimes(df: pd.DataFrame, horizon: int = 12,
+                   trend_threshold: float = 5.0,
+                   range_threshold: float = 3.0,
+                   volatile_dd_threshold: float = 5.0,
+                   breakout_threshold: float = 8.0,
+                   breakout_vol_mult: float = 3.0) -> pd.DataFrame:
+    """
+    Label each candle with the regime observed over the next `horizon` candles.
+    Args:
+        df: DataFrame with precomputed features (needs close and volume; f_vol_sma_20 is used when present)
+        horizon: Number of candles to look ahead (default 12 = 48h on 4H)
+        trend_threshold: Minimum % net move to count as TRENDING
+        range_threshold: Maximum % range to count as RANGING
+        volatile_dd_threshold: % drawdown to count as VOLATILE
+        breakout_threshold: % move to count as BREAKOUT
+        breakout_vol_mult: Volume multiplier required for BREAKOUT
+    Returns:
+        DataFrame with 'regime' and 'regime_name' columns added
+    """
+    c = df["close"]
+    v = df["volume"]
+    n = len(df)
+    regimes = np.full(n, REGIME_RANGING)  # Default: RANGING
+    logger.info("🏷️ Labelling regimes (horizon=%d candles)...", horizon)
+    for i in range(n - horizon):
+        # Look-ahead window
+        future_prices = c.iloc[i + 1: i + 1 + horizon]
+        future_volumes = v.iloc[i + 1: i + 1 + horizon]
+        if len(future_prices) < horizon:
+            continue
+        current_price = c.iloc[i]
+        if current_price <= 0:
+            continue
+        # Metrics over the look-ahead window
+        max_price = future_prices.max()
+        min_price = future_prices.min()
+        end_price = future_prices.iloc[-1]
+        # Net return
+        return_pct = (end_price - current_price) / current_price * 100
+        # Maximum drawdown within the window
+        running_max = future_prices.cummax()
+        drawdowns = (running_max - future_prices) / running_max * 100
+        max_drawdown = drawdowns.max()
+        # Maximum rally within the window
+        running_min = future_prices.cummin()
+        rallies = (future_prices - running_min) / running_min * 100
+        max_rally = rallies.max()
+        # Total range
+        price_range = (max_price - min_price) / current_price * 100
+        # Average future volume vs. historical volume
+        vol_sma = df["f_vol_sma_20"].iloc[i] if "f_vol_sma_20" in df.columns else v.iloc[max(0, i-20):i].mean()
+        avg_future_vol = future_volumes.mean()
+        vol_ratio = avg_future_vol / vol_sma if vol_sma > 0 else 1.0
+        # ── Classification (priority: VOLATILE > BREAKOUT > TRENDING > RANGING) ──
+        # VOLATILE: sharp drawdown
+        if max_drawdown >= volatile_dd_threshold:
+            regimes[i] = REGIME_VOLATILE
+            continue
+        # BREAKOUT: large move + high volume
+        if price_range >= breakout_threshold and vol_ratio >= breakout_vol_mult:
+            regimes[i] = REGIME_BREAKOUT
+            continue
+        # TRENDING: sustained move in one direction
+        if abs(return_pct) >= trend_threshold:
+            regimes[i] = REGIME_TRENDING
+            continue
+        # RANGING: price stayed in a narrow range
+        if price_range <= range_threshold:
+            regimes[i] = REGIME_RANGING
+            continue
+        # Fallback for moderate moves
+        if abs(return_pct) >= trend_threshold * 0.6:
+            regimes[i] = REGIME_TRENDING
+        else:
+            regimes[i] = REGIME_RANGING
+    # The last `horizon` candles have no reliable label
+    regimes[-horizon:] = -1  # Mark as unknown
+    df["regime"] = regimes
+    df["regime_name"] = df["regime"].map(REGIME_NAMES).fillna("UNKNOWN")
+    # Statistics
+    valid = df[df["regime"] >= 0]
+    counts = valid["regime_name"].value_counts()
+    total = len(valid)
+    logger.info("\n" + "=" * 50)
+    logger.info("🏷️ LABELLED REGIMES")
+    logger.info("  Total labelled candles: %d", total)
+    for name, count in counts.items():
+        pct = count / total * 100
+        logger.info("  %-12s: %5d (%5.1f%%)", name, count, pct)
+    logger.info("=" * 50)
+    return df
+def label_targets(df: pd.DataFrame, horizons: list[int] | None = None) -> pd.DataFrame:
+    """
+    Add future-return targets for the prediction models.
+    These are NOT the regimes; they are the numeric targets.
+    """
+    if horizons is None:
+        horizons = [6, 12, 24]  # 24h, 48h, 96h on the 4H timeframe
+    c = df["close"]
+    for h in horizons:
+        # Future return in percent over the next h candles
+        df[f"target_return_{h}"] = (c.shift(-h) - c) / c * 100
+        # Direction (1 = up, 0 = down or flat); the last h rows, whose return is NaN, also get 0
+        df[f"target_dir_{h}"] = (df[f"target_return_{h}"] > 0).astype(int)
+        # Future max drawdown and max rally over the next h candles (current candle excluded)
+        future_max = c.iloc[::-1].rolling(h, min_periods=1).max().iloc[::-1].shift(-1)
+        future_min = c.iloc[::-1].rolling(h, min_periods=1).min().iloc[::-1].shift(-1)
+        df[f"target_max_dd_{h}"] = (c - future_min) / c * 100
+        df[f"target_max_rally_{h}"] = (future_max - c) / c * 100
+    logger.info("  ✅ Targets generated for horizons: %s", horizons)
+    return df
+# ─────────────────────────────────────────────
+#  MAIN
+# ─────────────────────────────────────────────
+def main():
+    parser = argparse.ArgumentParser(description="Aurora Brain — Regime Labeler")
+    parser.add_argument("--symbol", default="BTCUSDT")
+    parser.add_argument("--timeframe", default="4h")
+    parser.add_argument("--horizon", type=int, default=12, help="Candles to look ahead (default 12 = 48h on 4H)")
+    parser.add_argument("--all", action="store_true")
+    args = parser.parse_args()
+    symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"] if args.all else [args.symbol]
+    for symbol in symbols:
+        features_path = os.path.join(DATA_DIR, f"features_{symbol}_{args.timeframe}.parquet")
+        if not os.path.exists(features_path):
+            logger.error("❌ Not found: %s. Run feature_engine.py first", features_path)
+            continue
+        df = pd.read_parquet(features_path)
+        logger.info("📊 Loaded %s: %d candles × %d columns", symbol, len(df), len(df.columns))
+        df = label_regimes(df, horizon=args.horizon)
+        df = label_targets(df, horizons=[6, 12, 24])
+        out_path = os.path.join(DATA_DIR, f"labeled_{symbol}_{args.timeframe}.parquet")
+        df.to_parquet(out_path)
+        logger.info("💾 Saved: %s", out_path)
+if __name__ == "__main__":
+    main()
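
Note on `label_targets`: the forward-looking min/max is computed by reversing the series, applying a trailing rolling window, and flipping back. The sketch below isolates that trick on a toy series so it can be checked by hand. The helper name `future_extremes` and the sample prices are illustrative assumptions, not part of the commit.

```python
import numpy as np
import pandas as pd


def future_extremes(close: pd.Series, h: int) -> tuple[pd.Series, pd.Series]:
    """Return (future_min, future_max) over the next h bars, current bar excluded.

    Hypothetical helper that mirrors the reversed-rolling pattern in label_targets.
    """
    reversed_close = close.iloc[::-1]
    # A trailing window on the reversed series looks forward in original order:
    # after flipping back, position i covers close[i .. i+h-1].
    fwd_min = reversed_close.rolling(h, min_periods=1).min().iloc[::-1]
    fwd_max = reversed_close.rolling(h, min_periods=1).max().iloc[::-1]
    # shift(-1) drops the current bar, so position i covers close[i+1 .. i+h].
    return fwd_min.shift(-1), fwd_max.shift(-1)


# Toy prices (assumed values, chosen only to make the windows easy to verify)
close = pd.Series([100.0, 90.0, 110.0, 105.0, 95.0])
future_min, future_max = future_extremes(close, h=2)
print(pd.DataFrame({"close": close, "future_min": future_min, "future_max": future_max}))
```

Because `min_periods=1` is used, rows near the end of the series are computed from fewer than `h` future bars, and the final row is NaN. Downstream code may want to mask those tail rows the same way `label_regimes` marks its last `horizon` candles as unknown.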

requirements.txt ADDED

	@@ -0,0 +1,33 @@

+# Aurora Brain — Dependencies
+# For the HuggingFace Space (Python 3.10+)
+# Core data
+pandas>=2.0
+numpy>=1.24
+pyarrow>=12.0
+# Technical analysis
+pandas_ta>=0.3.14
+# ML
+scikit-learn>=1.3
+xgboost>=2.0
+# API
+fastapi>=0.100
+uvicorn>=0.22
+pydantic>=2.0
+# Data download
+requests>=2.31
+# Macro data
+yfinance>=0.2.28
+# Visualization (training)
+matplotlib>=3.7
+# Future: TFT (Fase 3)
+# darts>=0.27
+# pytorch-lightning>=2.0
+# torch>=2.0