Space: xianqiu/qlang

xianqiu committed on Jan 29

Commit 5145926 (0 parents)

Initial deployment: Kronos BTC Forecast API (xianqiu/qlang)


Files changed (11)

.gitattributes +35 -0
DEPLOYMENT.md +217 -0
Dockerfile +40 -0
README.md +33 -0
app.py +776 -0
client.py +410 -0
model/__init__.py +17 -0
model/kronos.py +589 -0
model/module.py +580 -0
models/predictor/README.md +10 -0
models/predictor/config.json +13 -0

.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
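These patterns send large binary files (for example the `model.safetensors` weights that DEPLOYMENT.md lists) through Git LFS. As a rough pre-push check, the sketch below approximates which paths would match. It assumes that slash-free patterns match the file's basename, as gitattributes does, and it uses only a hypothetical subset of the patterns above. `git check-attr filter -- <path>` gives the authoritative answer.

```python
import fnmatch
import posixpath

# Hypothetical subset of the .gitattributes patterns above (slash-free patterns only)
LFS_PATTERNS = (
    "*.bin", "*.ckpt", "*.h5", "*.npy", "*.npz", "*.onnx", "*.parquet",
    "*.pkl", "*.pt", "*.pth", "*.safetensors", "*.tar", "*.tar.*",
    "*.zip", "*tfevents*",
)


def tracked_by_lfs(path, patterns=LFS_PATTERNS):
    """Approximate gitattributes matching: slash-free patterns match the basename."""
    name = posixpath.basename(path)
    # fnmatchcase keeps the comparison case-sensitive on every platform
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


# Usage:
# for p in ["app.py", "models/predictor/model.safetensors"]:
#     print(p, "->", "LFS" if tracked_by_lfs(p) else "regular git")
```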

DEPLOYMENT.md ADDED

	@@ -0,0 +1,217 @@

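+# HuggingFace Space Deployment Guide
+This guide explains how to deploy the Kronos BTC forecast API to HuggingFace Spaces.
+## Prerequisites
+### 1. Create a HuggingFace account
+If you don't have an account yet, sign up at https://huggingface.co/join.
+### 2. Install the HuggingFace CLI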
+```bash
+pip install huggingface_hub
+huggingface-cli login
+```
+## Method 1: Deploy via Git (recommended)
+### 1. Create a new Space
+Go to https://huggingface.co/new-space and create a new Space:
+- **Space name**: `kronos-btc-predictor` (or any name you like)
+- **License**: MIT
+- **SDK**: Docker
+- **Hardware**: CPU basic (free)
+### 2. Clone the Space repository
+```bash
+git clone https://huggingface.co/spaces/YOUR_USERNAME/kronos-btc-predictor
+cd kronos-btc-predictor
+```
+### 3. Copy the files
+```bash
+# Copy all files into the Space repository
+cp -r /path/to/hf_space/* .
+# The file structure should be:
+# ├── app.py
+# ├── requirements.txt
+# ├── README.md
+# ├── client.py
+# ├── Dockerfile         # needs to be created
+# ├── model/
+# │   ├── __init__.py
+# │   ├── kronos.py
+# │   └── module.py
+# └── models/
+#     ├── tokenizer/
+#     │   ├── config.json
+#     │   └── model.safetensors
+#     └── predictor/
+#         ├── config.json
+#         └── model.safetensors
+```
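Before pushing, a small script can confirm that the tree above is complete. The path list is copied from that tree and should be treated as an assumption to adjust: the app.py in this commit calls `from_pretrained` with Hub repo IDs (`NeoQuasar/Kronos-Tokenizer-2k`, `NeoQuasar/Kronos-mini`), so not every local `models/` entry may be needed in your setup.

```python
from pathlib import Path

# Relative paths copied from the tree above; edit to match your actual layout
EXPECTED_FILES = [
    "app.py",
    "requirements.txt",
    "README.md",
    "client.py",
    "Dockerfile",
    "model/__init__.py",
    "model/kronos.py",
    "model/module.py",
    "models/tokenizer/config.json",
    "models/tokenizer/model.safetensors",
    "models/predictor/config.json",
    "models/predictor/model.safetensors",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist as regular files under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]


# Usage, from the root of the cloned Space repository:
# print(missing_files(".") or "All files present")
```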
+### 4. Create the Dockerfile
+```dockerfile
+FROM python:3.10-slim
+WORKDIR /app
+# Install dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY . .
+# Expose port
+EXPOSE 7860
+# Start the service
+CMD ["python", "app.py"]
+```
+### 5. Push to HuggingFace
+```bash
+git add .
+git commit -m "Initial deployment"
+git push
+```
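If you prefer to stay in Python, the same push can be sketched with the `huggingface_hub` library installed earlier. This is an alternative to the git commands above, not part of the original guide. The folder path and repo ID are placeholders, and the `api` parameter exists only so the function can be exercised without network access.

```python
def deploy_space(folder, repo_id, api=None, commit_message="Initial deployment"):
    """Create a Docker Space if it does not exist yet, then upload `folder` to it.

    repo_id looks like "YOUR_USERNAME/kronos-btc-predictor" (placeholder).
    Credentials come from `huggingface-cli login` or the HF_TOKEN environment variable.
    """
    if api is None:
        from huggingface_hub import HfApi  # pip install huggingface_hub
        api = HfApi()
    # exist_ok=True makes re-running safe when the Space already exists
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="docker", exist_ok=True)
    # Upload the folder contents to the Space repository as a single commit
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        commit_message=commit_message,
    )


# Usage (placeholder path and repo ID):
# deploy_space("/path/to/hf_space", "YOUR_USERNAME/kronos-btc-predictor")
```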
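+### 6. Wait for the build
+The Space builds and deploys automatically; you can follow the build logs on the Space page.
+Once the build completes, the API is available at: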
+```
+https://YOUR_USERNAME-kronos-btc-predictor.hf.space
+```
+## Method 2: Upload via the web interface
+### 1. Create a Space
+Go to https://huggingface.co/new-space:
+- SDK: Docker
+- Hardware: CPU basic
+### 2. Upload the files
+On the Space page, open the "Files" tab, then choose "Add file" -> "Upload files".
+Upload each of the following files:
+- `app.py`
+- `requirements.txt`
+- `Dockerfile`
+- `model/__init__.py`
+- `model/kronos.py`
+- `model/module.py`
+- `models/tokenizer/config.json`
+- `models/tokenizer/model.safetensors`
+- `models/predictor/config.json`
+- `models/predictor/model.safetensors`
+## Verify the deployment
+### 1. Health check
+```bash
+curl https://YOUR_USERNAME-kronos-btc-predictor.hf.space/health
+```
+Expected response:
+```json
+{
+  "status": "healthy",
+  "model_loaded": true,
+  "model_version": "iter5 (converged)",
+  "device": "cpu"
+}
+```
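To script this check, the sketch below fetches `/health` with the standard library and compares the payload with the expected response above. It assumes the deployed app really serves a JSON `/health` route with the `status` and `model_loaded` fields shown; the 90-second timeout is an arbitrary allowance for a cold start.

```python
import json
import urllib.request


def is_healthy(payload):
    """True if a /health payload reports a healthy service with the model loaded."""
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True


def fetch_health(base_url, timeout=90.0):
    """GET {base_url}/health and decode the JSON body; non-2xx statuses raise HTTPError."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (placeholder URL):
# info = fetch_health("https://YOUR_USERNAME-kronos-btc-predictor.hf.space")
# print("healthy" if is_healthy(info) else info)
```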
+### 2. API documentation
+Open the Swagger UI:
+```
+https://YOUR_USERNAME-kronos-btc-predictor.hf.space/docs
+```
+### 3. Test a prediction
+```python
+from client import KronosClient
+client = KronosClient("https://YOUR_USERNAME-kronos-btc-predictor.hf.space")
+health = client.health()
+print(f"Status: {health.status}")
+```
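+## Configure a custom domain
+1. Find "Custom domain" in the Space settings
+2. Enter your domain (e.g. `api.yourdomain.com`)
+3. Add a DNS CNAME record pointing to HuggingFace
+## Notes
+### Free tier limits
+- **CPU**: 2 vCPU
+- **Memory**: 16GB RAM
+- **Storage**: 50GB
+- **Requests**: no hard limit, but rate-limited
+- **Cold start**: the Space sleeps when inactive, and the first request afterwards takes about 30-60 seconds
+### Performance tuning
+1. **Reduce n_paths**: use 10-20 paths instead of 30-100
+2. **Reduce pred_len**: use 12-24 instead of 72
+3. **Warm-up**: send periodic health-check requests to keep the Space from sleeping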
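The warm-up tip can be sketched as a small pinger run from a machine outside the Space. The 10-minute interval is an assumption, not a documented HuggingFace threshold, and `fetch` and `sleep` are injectable only so the loop can be tested without a network.

```python
import time
import urllib.request


def _default_fetch(url):
    # Generous timeout: the first request after a sleep can take 30-60 s
    with urllib.request.urlopen(url, timeout=90) as resp:
        return resp.status


def keep_warm(url, interval_s=600.0, rounds=None, fetch=_default_fetch, sleep=time.sleep):
    """Ping url every interval_s seconds (forever if rounds is None).

    Returns the number of pings that answered with a 2xx status.
    """
    ok = 0
    done = 0
    while rounds is None or done < rounds:
        try:
            if 200 <= fetch(url) < 300:
                ok += 1
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses
            print(f"ping failed: {exc}")
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)
    return ok


# Usage (placeholder URL):
# keep_warm("https://YOUR_USERNAME-kronos-btc-predictor.hf.space/health")
```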
+### Security recommendations
+1. Don't hard-code API keys in the code
+2. Store sensitive values in HuggingFace Secrets
+3. Consider adding request rate limiting
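For request rate limiting, a per-client token bucket is one common approach. The sketch below is a minimal in-memory version (single process, state lost on restart, bucket map never pruned). The rate and capacity values are placeholders, and deriving a per-client key, for example from the `gr.Request` object Gradio can pass to event handlers, is left to the app.

```python
import threading
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()  # handlers may run concurrently in worker threads

    def allow(self):
        """Consume one token and return True, or return False if the bucket is empty."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False


# One bucket per client key; placeholder policy: 1 request per 10 s, bursts of 3
buckets = {}


def is_allowed(client_key):
    bucket = buckets.get(client_key)
    if bucket is None:
        bucket = buckets[client_key] = TokenBucket(rate=0.1, capacity=3)
    return bucket.allow()
```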
+## Upgrading to Pro
+If you need better performance, you can upgrade to HuggingFace Pro:
+- **CPU upgrade**: faster CPU
+- **GPU**: T4 GPU (paid)
+- **No sleeping**: the Space stays running at all times
+See https://huggingface.co/pricing for details.
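+## Troubleshooting
+### Build failures
+1. Check version compatibility in `requirements.txt`
+2. Make sure all files have been uploaded
+3. Look for error messages in the build log
+### Model loading failures
+1. Confirm that the `models/` directory structure is correct
+2. Check the `config.json` and `model.safetensors` files
+### Request timeouts
+1. Reduce the `n_paths` and `pred_len` parameters
+2. Check the size of the input data
+3. Consider upgrading to better hardware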
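For custom data, a client-side check can keep malformed or oversized inputs from reaching the server. The sketch validates the JSON format documented in `predict_custom` in app.py and keeps only the newest rows. The 512-row cap mirrors `max_context=512` in `load_model()`, but whether the predictor would use longer history is not shown here, so the cap is an assumption.

```python
import json

REQUIRED = ("timestamps", "open", "high", "low", "close")
OPTIONAL = ("volume", "amount")


def build_custom_payload(data, max_rows=512):
    """Validate OHLCV columns, keep the newest max_rows rows, and return a JSON string."""
    missing = [k for k in REQUIRED if k not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    keys = [k for k in REQUIRED + OPTIONAL if k in data]
    lengths = {k: len(data[k]) for k in keys}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    # Assumes every list is ordered oldest-first, so the tail holds the newest rows
    trimmed = {k: list(data[k])[-max_rows:] for k in keys}
    return json.dumps(trimmed)


# Usage: pass the returned string as hist_data_json to the /predict_custom endpoint
```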
+## Support
+If you run into problems, please open an Issue in the project repository.

Dockerfile ADDED

	@@ -0,0 +1,40 @@

+# Kronos BTC Prediction API - Docker Image
+# Optimized for HuggingFace Spaces
+FROM python:3.10-slim
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first for better caching
+COPY requirements.txt .
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY . .
+# Create non-root user for security
+RUN useradd -m -u 1000 user
+USER user
+# Set environment variables
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH \
+    PYTHONUNBUFFERED=1
+# Expose port (HuggingFace Spaces uses 7860)
+EXPOSE 7860
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+    CMD python -c "import httpx; httpx.get('http://localhost:7860/health', timeout=5)" || exit 1
+# Start the application
+CMD ["python", "app.py"]
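One caveat about the HEALTHCHECK above: `httpx.get` does not raise on 4xx/5xx responses, so the command only fails when the connection itself fails. A stricter standard-library probe is sketched below. It assumes the app answers on `http://localhost:7860/health`, and the filename `healthcheck.py` is hypothetical.

```python
import urllib.request


def probe(url, timeout=5.0):
    """Return 0 if url answers with a 2xx status, else 1 (Docker HEALTHCHECK exit codes)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except OSError:
        # Connection errors, timeouts and non-2xx HTTPError responses all land here
        return 1
```

Saved as `healthcheck.py` with a final line that calls `sys.exit(probe("http://localhost:7860/health"))` under an `if __name__ == "__main__":` guard, it could stand in for the inline command as `CMD python healthcheck.py`.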

README.md ADDED

	@@ -0,0 +1,33 @@

+---
+title: Kronos BTC Forecast
+emoji: 📈
+colorFrom: blue
+colorTo: yellow
+sdk: gradio
+sdk_version: 5.9.1
+python_version: "3.10"
+app_file: app.py
+pinned: false
+license: mit
+---
+# Kronos BTC/USDT Forecast API
+Probabilistic BTC/USDT price forecasting using [Kronos](https://github.com/shiyu-coder/Kronos) foundation model.
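The app condenses the Monte Carlo sample paths into two headline numbers (see `calculate_metrics` in app.py): the share of paths that end above the last observed close, and the share of paths whose log-return volatility exceeds recent historical volatility. A NumPy sketch of the same idea follows; the function name and array layout are assumptions. One small difference: app.py passes only the last `VOL_WINDOW` rows, so it effectively uses `VOL_WINDOW - 1` historical returns, while this sketch takes the last `vol_window` returns.

```python
import numpy as np


def forecast_metrics(hist_close, paths, vol_window=24):
    """Summarize Monte Carlo close-price paths.

    hist_close: 1-D sequence of historical closes, oldest first.
    paths: 2-D array of shape (horizon, n_samples); column j is one sampled path.
    Returns (upside_probability, volatility_amplification_probability) as fractions.
    """
    hist_close = np.asarray(hist_close, dtype=float)
    paths = np.asarray(paths, dtype=float)
    last_close = hist_close[-1]

    # Upside probability: share of paths whose final close ends above the last observed close
    upside = float(np.mean(paths[-1] > last_close))

    # Historical volatility: sample std (ddof=1, like pandas .std()) of recent log returns
    hist_vol = np.diff(np.log(hist_close))[-vol_window:].std(ddof=1)

    # Per-path volatility: prepend the last close so the first forecast step is a return too
    start = np.full((1, paths.shape[1]), last_close)
    path_vol = np.diff(np.log(np.vstack([start, paths])), axis=0).std(axis=0, ddof=1)
    vol_amp = float(np.mean(path_vol > hist_vol))
    return upside, vol_amp
```

app.py reports both values as percentages rounded to one decimal place.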
+## API Usage
+```python
+from gradio_client import Client
+client = Client("xianqiu/qlang")
+# Get BTC/USDT 24-hour forecast
+plot, result = client.predict(api_name="/predict")
+print(result)
+```
+## Model
+- **Model:** Kronos-mini (4.1M params)
+- **Paper:** [arXiv:2508.02739](https://arxiv.org/abs/2508.02739)
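The `/predict_api` endpoint returns the result dictionary built in app.py, so a client can reduce it to a one-line summary. The field names come from that dictionary. The JSON-string fallback is a precaution in case the client hands back the JSON output as text, which is an assumption about `gradio_client` behavior.

```python
import json


def summarize(result):
    """One-line summary of a /predict_api result (field names as built in app.py)."""
    if isinstance(result, str):
        result = json.loads(result)
    if "error" in result:
        return f"error: {result['error']}"
    final_mean = result["predictions"]["mean"][-1]
    return (
        f"{result['symbol']} last close {result['last_close']:.2f}, "
        f"mean forecast at +{result['prediction_horizon_hours']}h {final_mean:.2f}, "
        f"upside {result['upside_probability']}%, "
        f"vol. amplification {result['volatility_amplification']}%"
    )


# Usage (network call, requires `pip install gradio_client`):
# from gradio_client import Client
# client = Client("xianqiu/qlang")
# print(summarize(client.predict(align_to_hour=True, api_name="/predict_api")))
```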

app.py ADDED

	@@ -0,0 +1,776 @@

+"""
+Kronos API Server - Hugging Face Space
+Provides API endpoints for BTC/USDT price forecasting using Kronos model.
+API Usage:
+    from gradio_client import Client
+    client = Client("xianqiu/qlang")
+    # Fast API (no plot)
+    result = client.predict(align_to_hour=True, api_name="/predict_api")
+    # With plot
+    plot, result = client.predict(align_to_hour=True, api_name="/predict")
+"""
+import os
+import json
+import time
+from datetime import datetime, timezone, timedelta
+import gradio as gr
+import numpy as np
+import pandas as pd
+import torch
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+from model import Kronos, KronosTokenizer, KronosPredictor
+# === Configuration ===
+CONFIG = {
+    "SYMBOL": "BTCUSDT",
+    "INTERVAL": "1h",
+    "HIST_POINTS": 360,
+    "PRED_HORIZON": 24,
+    "N_PREDICTIONS": 30,
+    "VOL_WINDOW": 24,
+    "TEMPERATURE": 1.0,
+    "TOP_P": 0.95,
+}
+# Global model instance
+predictor = None
+def load_model():
+    """Load Kronos model and tokenizer."""
+    global predictor
+    if predictor is not None:
+        return predictor
+    print("Loading Kronos model...")
+    device = "cuda:0" if torch.cuda.is_available() else "cpu"
+    tokenizer = KronosTokenizer.from_pretrained("NeoQuasar/Kronos-Tokenizer-2k")
+    model = Kronos.from_pretrained("NeoQuasar/Kronos-mini")
+    tokenizer.eval()
+    model.eval()
+    predictor = KronosPredictor(model, tokenizer, device=device, max_context=512)
+    print(f"Model loaded on {device}")
+    return predictor
+def fetch_binance_data():
+    """Fetch K-line data using Binance public REST API."""
+    import requests
+    symbol = "BTCUSDT"
+    interval = "1h"
+    limit = CONFIG["HIST_POINTS"] + CONFIG["VOL_WINDOW"]
+    # Try multiple Binance API endpoints
+    endpoints = [
+        "https://api.binance.com/api/v3/klines",
+        "https://api1.binance.com/api/v3/klines",
+        "https://api2.binance.com/api/v3/klines",
+        "https://api3.binance.com/api/v3/klines",
+        "https://data-api.binance.vision/api/v3/klines",  # Data API endpoint
+    ]
+    ohlcv = None
+    last_error = None
+    for endpoint in endpoints:
+        try:
+            url = f"{endpoint}?symbol={symbol}&interval={interval}&limit={limit}"
+            response = requests.get(url, timeout=30)
+            response.raise_for_status()
+            ohlcv = response.json()
+            break
+        except Exception as e:
+            last_error = e
+            continue
+    if ohlcv is None:
+        # Fallback to ccxt with OKX
+        try:
+            import ccxt
+            exchange = ccxt.okx({'enableRateLimit': True})
+            raw_ohlcv = exchange.fetch_ohlcv("BTC/USDT", "1h", limit=limit)
+            # Convert ccxt format to binance format
+            ohlcv = [[d[0], d[1], d[2], d[3], d[4], d[5], d[0], 0, 0, 0, 0, 0] for d in raw_ohlcv]
+        except Exception as e:
+            raise RuntimeError(f"Failed to fetch data from all sources. Last error: {last_error}, ccxt error: {e}") from e
+    # Parse Binance format: [open_time, open, high, low, close, volume, close_time, quote_volume, ...]
+    df = pd.DataFrame(ohlcv, columns=[
+        'open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time',
+        'quote_asset_volume', 'number_of_trades', 'taker_buy_base_asset_volume',
+        'taker_buy_quote_asset_volume', 'ignore'
+    ])
+    df['timestamps'] = pd.to_datetime(df['open_time'], unit='ms')
+    df['amount'] = pd.to_numeric(df['quote_asset_volume'])
+    for col in ['open', 'high', 'low', 'close', 'volume']:
+        df[col] = pd.to_numeric(df[col])
+    df = df[['timestamps', 'open', 'high', 'low', 'close', 'volume', 'amount']]
+    return df
+def make_prediction(df, pred_model):
+    """Generate probabilistic forecasts."""
+    last_timestamp = df['timestamps'].max()
+    start_new_range = last_timestamp + pd.Timedelta(hours=1)
+    new_timestamps_index = pd.date_range(
+        start=start_new_range,
+        periods=CONFIG["PRED_HORIZON"],
+        freq='h'
+    )
+    y_timestamp = pd.Series(new_timestamps_index, name='y_timestamp')
+    x_timestamp = df['timestamps']
+    x_df = df[['open', 'high', 'low', 'close', 'volume', 'amount']]
+    with torch.no_grad():
+        close_preds, volume_preds = pred_model.predict(
+            df=x_df,
+            x_timestamp=x_timestamp,
+            y_timestamp=y_timestamp,
+            pred_len=CONFIG["PRED_HORIZON"],
+            T=CONFIG["TEMPERATURE"],
+            top_p=CONFIG["TOP_P"],
+            sample_count=CONFIG["N_PREDICTIONS"],
+            verbose=False
+        )
+    return close_preds, volume_preds, y_timestamp
+def make_prediction_detail(df, pred_model):
+    """Generate probabilistic forecasts with full OHLCV output."""
+    last_timestamp = df['timestamps'].max()
+    start_new_range = last_timestamp + pd.Timedelta(hours=1)
+    new_timestamps_index = pd.date_range(
+        start=start_new_range,
+        periods=CONFIG["PRED_HORIZON"],
+        freq='h'
+    )
+    y_timestamp = pd.Series(new_timestamps_index, name='y_timestamp')
+    x_timestamp = df['timestamps']
+    x_df = df[['open', 'high', 'low', 'close', 'volume', 'amount']]
+    with torch.no_grad():
+        preds_dict = pred_model.predict_detail(
+            df=x_df,
+            x_timestamp=x_timestamp,
+            y_timestamp=y_timestamp,
+            pred_len=CONFIG["PRED_HORIZON"],
+            T=CONFIG["TEMPERATURE"],
+            top_p=CONFIG["TOP_P"],
+            sample_count=CONFIG["N_PREDICTIONS"],
+            verbose=False
+        )
+    return preds_dict, y_timestamp
+def calculate_metrics(hist_df, close_preds_df):
+    """Calculate upside and volatility metrics."""
+    last_close = hist_df['close'].iloc[-1]
+    # Upside Probability
+    final_hour_preds = close_preds_df.iloc[-1]
+    upside_prob = float((final_hour_preds > last_close).mean())
+    # Volatility Amplification
+    hist_log_returns = np.log(hist_df['close'] / hist_df['close'].shift(1))
+    historical_vol = hist_log_returns.iloc[-CONFIG["VOL_WINDOW"]:].std()
+    amplification_count = 0
+    for col in close_preds_df.columns:
+        full_sequence = pd.concat([pd.Series([last_close]), close_preds_df[col]]).reset_index(drop=True)
+        pred_log_returns = np.log(full_sequence / full_sequence.shift(1))
+        predicted_vol = pred_log_returns.std()
+        if predicted_vol > historical_vol:
+            amplification_count += 1
+    vol_amp_prob = amplification_count / len(close_preds_df.columns)
+    return upside_prob, vol_amp_prob
+def create_plot(hist_df, close_preds_df, volume_preds_df):
+    """Create forecast visualization."""
+    fig, (ax1, ax2) = plt.subplots(
+        2, 1, figsize=(15, 10), sharex=True,
+        gridspec_kw={'height_ratios': [3, 1]}
+    )
+    hist_time = hist_df['timestamps']
+    last_hist_time = hist_time.iloc[-1]
+    pred_time = pd.to_datetime([last_hist_time + timedelta(hours=i + 1) for i in range(len(close_preds_df))])
+    ax1.plot(hist_time, hist_df['close'], color='royalblue', label='Historical Price', linewidth=1.5)
+    mean_preds = close_preds_df.mean(axis=1)
+    ax1.plot(pred_time, mean_preds, color='darkorange', linestyle='-', label='Mean Forecast', linewidth=2)
+    ax1.fill_between(pred_time, close_preds_df.min(axis=1), close_preds_df.max(axis=1),
+                     color='darkorange', alpha=0.2, label='Forecast Range')
+    ax1.set_title(f'{CONFIG["SYMBOL"]} 24-Hour Price Forecast (Kronos)', fontsize=16, weight='bold')
+    ax1.set_ylabel('Price (USDT)')
+    ax1.legend()
+    ax1.grid(True, linestyle='--', alpha=0.7)
+    ax2.bar(hist_time, hist_df['volume'], color='skyblue', label='Historical Volume', width=0.03)
+    ax2.bar(pred_time, volume_preds_df.mean(axis=1), color='sandybrown', label='Forecast Volume', width=0.03)
+    ax2.set_ylabel('Volume')
+    ax2.set_xlabel('Time (UTC)')
+    ax2.legend()
+    ax2.grid(True, linestyle='--', alpha=0.7)
+    separator_time = hist_time.iloc[-1] + timedelta(minutes=30)
+    for ax in [ax1, ax2]:
+        ax.axvline(x=separator_time, color='red', linestyle='--', linewidth=1.5)
+        ax.tick_params(axis='x', rotation=30)
+    fig.tight_layout()
+    return fig
+def predict_btc(align_to_hour: bool = True):
+    """
+    Main prediction function with plot (for UI).
+    Args:
+        align_to_hour: If True, use data up to the last completed hour (aligned with official demo).
+                       If False, use all available data including the current incomplete hour.
+    Returns:
+        tuple: (plot_figure, result_dict)
+    """
+    fig, result = _do_prediction(align_to_hour=align_to_hour, include_plot=True)
+    return fig, result
+def predict_btc_api(align_to_hour: bool = True):
+    """
+    API-only prediction (no plot, faster response).
+    Args:
+        align_to_hour: If True, use data up to the last completed hour (aligned with official demo).
+                       If False, use all available data including the current incomplete hour.
+    Returns:
+        dict: Prediction result without plot
+    """
+    _, result = _do_prediction(align_to_hour=align_to_hour, include_plot=False)
+    return result
+def predict_btc_detail(align_to_hour: bool = True):
+    """
+    Detailed prediction API returning all Monte Carlo sample paths.
+    Args:
+        align_to_hour: If True, use data up to the last completed hour (aligned with official demo).
+                       If False, use all available data including the current incomplete hour.
+    Returns:
+        dict: Prediction result with all Monte Carlo sample paths
+    """
+    _, result = _do_prediction_detail(align_to_hour=align_to_hour)
+    return result
+def _do_prediction(align_to_hour: bool = True, include_plot: bool = True):
+    """
+    Internal prediction function.
+    Args:
+        align_to_hour: If True, use data up to the last completed hour.
+        include_plot: If True, generate plot (slower). If False, skip plot (faster).
+    Returns:
+        tuple: (plot_figure or None, result_dict)
+    """
+    try:
+        sample_count = CONFIG["N_PREDICTIONS"]
+        print(f"[Predict] align_to_hour={align_to_hour}, include_plot={include_plot}")
+        start_time = time.time()
+        # Load model
+        pred_model = load_model()
+        # Fetch data
+        df_full = fetch_binance_data()
+        # Choose data based on alignment mode
+        if align_to_hour:
+            # Exclude the last (incomplete) bar - aligned with official demo
+            df_for_model = df_full.iloc[:-1]
+            data_mode = "hourly_aligned"
+        else:
+            # Use all data including current incomplete bar
+            df_for_model = df_full
+            data_mode = "realtime"
+        # Make predictions
+        close_preds, volume_preds, pred_timestamps = make_prediction(
+            df_for_model, pred_model
+        )
+        # Calculate metrics
+        hist_df_for_metrics = df_for_model.tail(CONFIG["VOL_WINDOW"])
+        upside_prob, vol_amp_prob = calculate_metrics(hist_df_for_metrics, close_preds)
+        # Create plot only if requested
+        fig = None
+        if include_plot:
+            hist_df_for_plot = df_for_model.tail(CONFIG["HIST_POINTS"])
+            fig = create_plot(hist_df_for_plot, close_preds, volume_preds)
+        # Prepare result
+        last_close = float(df_for_model['close'].iloc[-1])
+        last_timestamp = df_for_model['timestamps'].iloc[-1]
+        mean_preds = close_preds.mean(axis=1).tolist()
+        min_preds = close_preds.min(axis=1).tolist()
+        max_preds = close_preds.max(axis=1).tolist()
+        elapsed = time.time() - start_time
+        result = {
+            "timestamp": datetime.now(timezone.utc).isoformat(),
+            "symbol": CONFIG["SYMBOL"],
+            "last_close": last_close,
+            "last_data_timestamp": last_timestamp.isoformat(),
+            "data_mode": data_mode,
+            "upside_probability": round(upside_prob * 100, 1),
+            "volatility_amplification": round(vol_amp_prob * 100, 1),
+            "prediction_horizon_hours": CONFIG["PRED_HORIZON"],
+            "sample_count": sample_count,
+            "inference_time_seconds": round(elapsed, 1),
+            "predictions": {
+                "timestamps": [t.isoformat() for t in pred_timestamps],
+                "mean": mean_preds,
+                "min": min_preds,
+                "max": max_preds,
+            },
+            "model": {
+                "name": "Kronos-mini",
+                "tokenizer": "Kronos-Tokenizer-2k",
+                "temperature": CONFIG["TEMPERATURE"],
+                "top_p": CONFIG["TOP_P"],
+            }
+        }
+        print(f"[Done] Prediction completed in {elapsed:.1f}s (plot={include_plot})")
+        return fig, result
+    except Exception as e:
+        error_result = {
+            "error": str(e),
+            "timestamp": datetime.now(timezone.utc).isoformat()
+        }
+        return None, error_result
+def _do_prediction_detail(align_to_hour: bool = True):
+    """
+    Internal prediction function that returns all Monte Carlo sample paths.
+    Args:
+        align_to_hour: If True, use data up to the last completed hour.
+    Returns:
+        tuple: (None, result_dict with all sample paths)
+    """
+    try:
+        sample_count = CONFIG["N_PREDICTIONS"]
+        print(f"[Predict Detail] align_to_hour={align_to_hour}")
+        start_time = time.time()
+        # Load model
+        pred_model = load_model()
+        # Fetch data
+        df_full = fetch_binance_data()
+        # Choose data based on alignment mode
+        if align_to_hour:
+            # Exclude the last (incomplete) bar - aligned with official demo
+            df_for_model = df_full.iloc[:-1]
+            data_mode = "hourly_aligned"
+        else:
+            # Use all data including current incomplete bar
+            df_for_model = df_full
+            data_mode = "realtime"
+        # Make predictions with full OHLCV output
+        preds_dict, pred_timestamps = make_prediction_detail(
+            df_for_model, pred_model
+        )
+        # Extract close predictions for metrics calculation
+        close_preds = preds_dict['close']
+        # Calculate metrics
+        hist_df_for_metrics = df_for_model.tail(CONFIG["VOL_WINDOW"])
+        upside_prob, vol_amp_prob = calculate_metrics(hist_df_for_metrics, close_preds)
+        # Prepare result
+        last_close = float(df_for_model['close'].iloc[-1])
+        last_timestamp = df_for_model['timestamps'].iloc[-1]
+        # Summary statistics for close price
+        mean_preds = close_preds.mean(axis=1).tolist()
+        min_preds = close_preds.min(axis=1).tolist()
+        max_preds = close_preds.max(axis=1).tolist()
+        # Prepare all sample paths for OHLCV (each column is a sample path)
+        all_samples = {}
+        for price_type in ['open', 'high', 'low', 'close', 'volume']:
+            price_df = preds_dict[price_type]
+            samples = {}
+            for col in price_df.columns:
+                samples[col] = price_df[col].tolist()
+            all_samples[price_type] = samples
+        elapsed = time.time() - start_time
+        result = {
+            "timestamp": datetime.now(timezone.utc).isoformat(),
+            "symbol": CONFIG["SYMBOL"],
+            "last_close": last_close,
+            "last_data_timestamp": last_timestamp.isoformat(),
+            "data_mode": data_mode,
+            "upside_probability": round(upside_prob * 100, 1),
+            "volatility_amplification": round(vol_amp_prob * 100, 1),
+            "prediction_horizon_hours": CONFIG["PRED_HORIZON"],
+            "sample_count": sample_count,
+            "inference_time_seconds": round(elapsed, 1),
+            "predictions": {
+                "timestamps": [t.isoformat() for t in pred_timestamps],
+                "mean": mean_preds,
+                "min": min_preds,
+                "max": max_preds,
+            },
+            "all_samples": all_samples,
+            "model": {
+                "name": "Kronos-mini",
+                "tokenizer": "Kronos-Tokenizer-2k",
+                "temperature": CONFIG["TEMPERATURE"],
+                "top_p": CONFIG["TOP_P"],
+            }
+        }
+        print(f"[Done] Detail prediction completed in {elapsed:.1f}s")
+        return None, result
+    except Exception as e:
+        error_result = {
+            "error": str(e),
+            "timestamp": datetime.now(timezone.utc).isoformat()
+        }
+        return None, error_result
+def predict_custom(
+    hist_data_json: str,
+    pred_horizon: int = 24,
+    sample_count: int = 30,
+    temperature: float = 1.0,
+    top_p: float = 0.95
+):
+    """
+    Custom prediction with user-provided data.
+    Args:
+        hist_data_json: JSON string with format:
+            {
+                "timestamps": ["2024-01-01T00:00:00", ...],
+                "open": [100.0, ...],
+                "high": [101.0, ...],
+                "low": [99.0, ...],
+                "close": [100.5, ...],
+                "volume": [1000.0, ...],  # optional
+                "amount": [100000.0, ...]  # optional
+            }
+        pred_horizon: Number of hours to predict (1-48)
+        sample_count: Number of Monte Carlo samples (1-100)
+        temperature: Sampling temperature (0.1-2.0)
+        top_p: Nucleus sampling probability (0.1-1.0)
+    Returns:
+        JSON string with predictions
+    """
+    try:
+        pred_model = load_model()
+        # Parse input
+        data = json.loads(hist_data_json)
+        df = pd.DataFrame(data)
+        df['timestamps'] = pd.to_datetime(df['timestamps'])
+        # Ensure required columns
+        for col in ['open', 'high', 'low', 'close']:
+            if col not in df.columns:
+                raise ValueError(f"Missing required column: {col}")
+            df[col] = pd.to_numeric(df[col])
+        if 'volume' not in df.columns:
+            df['volume'] = 0.0
+        if 'amount' not in df.columns:
+            df['amount'] = 0.0
+        # Validate parameters (Gradio sliders may deliver floats, so cast explicitly)
+        pred_horizon = max(1, min(48, int(pred_horizon)))
+        sample_count = max(1, min(100, int(sample_count)))
+        temperature = max(0.1, min(2.0, float(temperature)))
+        top_p = max(0.1, min(1.0, float(top_p)))
+        # Prepare timestamps: continue at the input's own frequency (infer_freq needs >= 3 points)
+        last_timestamp = df['timestamps'].max()
+        freq = pd.infer_freq(df['timestamps']) if len(df) >= 3 else None
+        if freq is None:
+            freq = 'h'
+        step = pd.tseries.frequencies.to_offset(freq)
+        y_timestamp = pd.Series(
+            pd.date_range(start=last_timestamp + step, periods=pred_horizon, freq=freq)
+        )
+        x_timestamp = df['timestamps']
+        x_df = df[['open', 'high', 'low', 'close', 'volume', 'amount']]
+        # Predict
+        with torch.no_grad():
+            close_preds, volume_preds = pred_model.predict(
+                df=x_df,
+                x_timestamp=x_timestamp,
+                y_timestamp=y_timestamp,
+                pred_len=pred_horizon,
+                T=temperature,
+                top_p=top_p,
+                sample_count=sample_count,
+                verbose=False
+            )
+        # Calculate metrics
+        last_close = float(df['close'].iloc[-1])
+        final_hour_preds = close_preds.iloc[-1]
+        upside_prob = float((final_hour_preds > last_close).mean())
+        result = {
+            "timestamp": datetime.now(timezone.utc).isoformat(),
+            "last_close": last_close,
+            "upside_probability": round(upside_prob * 100, 1),
+            "prediction_horizon": pred_horizon,
+            "sample_count": sample_count,
+            "predictions": {
+                "timestamps": [t.isoformat() for t in y_timestamp],
+                "mean": close_preds.mean(axis=1).tolist(),
+                "min": close_preds.min(axis=1).tolist(),
+                "max": close_preds.max(axis=1).tolist(),
+                "volume_mean": volume_preds.mean(axis=1).tolist(),
+            },
+            "parameters": {
+                "temperature": temperature,
+                "top_p": top_p,
+            }
+        }
+        return json.dumps(result, indent=2)
+    except Exception as e:
+        return json.dumps({"error": str(e)}, indent=2)
+# === Gradio Interface ===
+with gr.Blocks(title="Kronos BTC Forecast API") as demo:
+    gr.Markdown("""
+    # Kronos: BTC/USDT Price Forecast API
+    This Space provides an API for probabilistic BTC/USDT price forecasting using the
+    [Kronos](https://github.com/shiyu-coder/Kronos) foundation model.
+    ## Quick Start (Python)
+    ```python
+    from gradio_client import Client
+    client = Client("xianqiu/qlang")
+    # Fast API call (no plot, recommended)
+    result = client.predict(align_to_hour=True, api_name="/predict_api")
+    print(result)
+    # With plot (slower)
+    plot, result = client.predict(align_to_hour=True, api_name="/predict")
+    # Detail API - returns all Monte Carlo sample paths with full OHLCV
+    result = client.predict(align_to_hour=True, api_name="/predict_all")
+    print(result["all_samples"]["open"])   # All 30 open price prediction paths
+    print(result["all_samples"]["high"])   # All 30 high price prediction paths
+    print(result["all_samples"]["low"])    # All 30 low price prediction paths
+    print(result["all_samples"]["close"])  # All 30 close price prediction paths
+    print(result["all_samples"]["volume"]) # All 30 volume prediction paths
+    ```
+    ## API Endpoints
+    - `/predict_api` - **Recommended**: JSON-only response (faster, no plot)
+    - `/predict` - With plot (for visualization)
+    - `/predict_all` - Returns all Monte Carlo sample paths with full OHLCV (for detailed analysis)
+    - `/predict_custom` - Custom OHLCV data prediction
+    ## Data Mode
+    - **Hourly Aligned (default)**: Uses data up to the last completed hour, matching the official Kronos demo
+    - **Realtime**: Uses all available data including the current incomplete hour
+    """)
+    with gr.Tab("BTC/USDT Forecast"):
+        gr.Markdown("""
+        Generate 24-hour BTC/USDT price forecast.
+        **Data Mode:**
+        - **Hourly Aligned**: Use data up to last completed hour (matches official demo for comparison)
+        - **Realtime**: Use all available data including current incomplete hour
+        """)
+        align_checkbox = gr.Checkbox(
+            label="Align to Hour (match official demo)",
+            value=True,
+            info="If checked, excludes current incomplete hour for consistency with official demo"
+        )
+        predict_btn = gr.Button("Generate Forecast", variant="primary")
+        with gr.Row():
+            plot_output = gr.Plot(label="Forecast Chart")
+        json_output = gr.JSON(label="Prediction Result")
+        # UI button - with plot
+        predict_btn.click(
+            fn=predict_btc,
+            inputs=[align_checkbox],
+            outputs=[plot_output, json_output],
+            api_name="predict"
+        )
+    with gr.Tab("API Only (Fast)"):
+        gr.Markdown("""
+        **Fast API endpoint** - Returns JSON only, no plot generation.
+        Use this for programmatic access when you don't need the chart.
+        """)
+        api_align_checkbox = gr.Checkbox(
+            label="Align to Hour (match official demo)",
+            value=True
+        )
+        api_btn = gr.Button("Get Prediction (API)", variant="primary")
+        api_json_output = gr.JSON(label="Prediction Result")
+        api_btn.click(
+            fn=predict_btc_api,
+            inputs=[api_align_checkbox],
+            outputs=[api_json_output],
+            api_name="predict_api"
+        )
+    with gr.Tab("Detail API (All Samples)"):
+        gr.Markdown("""
+        **Detail API endpoint** - Returns all Monte Carlo sample paths with full OHLCV data.
+        Use this for detailed analysis when you need all individual prediction paths, not just summary statistics (mean/min/max).
+        **Response includes:**
+        - `predictions`: Summary statistics for close price (mean, min, max)
+        - `all_samples.open`: All open price prediction paths (pred-1, pred-2, ..., pred-N)
+        - `all_samples.high`: All high price prediction paths
+        - `all_samples.low`: All low price prediction paths
+        - `all_samples.close`: All close price prediction paths
+        - `all_samples.volume`: All volume prediction paths
+        """)
+        detail_align_checkbox = gr.Checkbox(
+            label="Align to Hour (match official demo)",
+            value=True
+        )
+        detail_btn = gr.Button("Get Detail Prediction", variant="primary")
+        detail_json_output = gr.JSON(label="Detail Prediction Result")
+        detail_btn.click(
+            fn=predict_btc_detail,
+            inputs=[detail_align_checkbox],
+            outputs=[detail_json_output],
+            api_name="predict_all"
+        )
+    with gr.Tab("Custom Prediction"):
+        gr.Markdown("""
+        Provide your own OHLCV data for prediction.
+        **Input Format:**
+        ```json
+        {
+            "timestamps": ["2024-01-01T00:00:00", "2024-01-01T01:00:00", ...],
+            "open": [100.0, 101.0, ...],
+            "high": [101.0, 102.0, ...],
+            "low": [99.0, 100.0, ...],
+            "close": [100.5, 101.5, ...],
+            "volume": [1000.0, 1100.0, ...],
+            "amount": [100000.0, 110000.0, ...]
+        }
+        ```
+        """)
+        with gr.Row():
+            with gr.Column():
+                data_input = gr.Textbox(
+                    label="Historical Data (JSON)",
+                    placeholder='{"timestamps": [...], "open": [...], "high": [...], "low": [...], "close": [...]}',
+                    lines=10
+                )
+                with gr.Row():
+                    horizon_input = gr.Slider(1, 48, value=24, step=1, label="Prediction Horizon (hours)")
+                    samples_input = gr.Slider(1, 100, value=30, step=1, label="Sample Count")
+                with gr.Row():
+                    temp_input = gr.Slider(0.1, 2.0, value=1.0, step=0.1, label="Temperature")
+                    topp_input = gr.Slider(0.1, 1.0, value=0.95, step=0.05, label="Top-p")
+                custom_btn = gr.Button("Predict", variant="primary")
+            with gr.Column():
+                custom_output = gr.JSON(label="Prediction Result")
+        custom_btn.click(
+            fn=predict_custom,
+            inputs=[data_input, horizon_input, samples_input, temp_input, topp_input],
+            outputs=custom_output,
+            api_name="predict_custom"
+        )
+    gr.Markdown("""
+    ---
+    **Model:** Kronos-mini (4.1M params) | **Paper:** [arXiv:2508.02739](https://arxiv.org/abs/2508.02739)
+    """)
+# Pre-load model on startup
+print("Pre-loading model...")
+load_model()
+print("Model ready!")
+if __name__ == "__main__":
+    demo.launch()
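The Detail API tab above returns every Monte Carlo path (`pred-1`, `pred-2`, ..., `pred-N`) alongside close-price summary statistics (mean, min, max). Below is a minimal sketch of how such a summary could be computed from the paths, assuming each path is an equal-length list of close prices. The helper name `summarize_paths` and the `upside_probability` definition (share of paths whose final close exceeds the current price) are assumptions for illustration. app.py's actual aggregation is not shown in this section.

```python
from statistics import mean
from typing import Dict, List


def summarize_paths(paths: Dict[str, List[float]], current_price: float) -> Dict[str, object]:
    """Hypothetical helper: per-step mean/min/max over Monte Carlo close-price paths."""
    if not paths:
        raise ValueError("need at least one sample path")
    series = list(paths.values())
    horizon = len(series[0])
    if horizon == 0 or any(len(p) != horizon for p in series):
        raise ValueError("all paths must be non-empty and the same length")
    # Transpose: one tuple of sampled closes per forecast step.
    steps = list(zip(*series))
    finals = [p[-1] for p in series]
    return {
        "mean": [mean(s) for s in steps],
        "min": [min(s) for s in steps],
        "max": [max(s) for s in steps],
        # Assumed definition: share of paths whose last close exceeds current_price.
        "upside_probability": sum(f > current_price for f in finals) / len(finals),
    }


# Three toy paths over a 2-hour horizon (made-up numbers, for illustration only).
paths = {
    "pred-1": [100.0, 102.0],
    "pred-2": [100.0, 98.0],
    "pred-3": [101.0, 104.0],
}
summary = summarize_paths(paths, current_price=100.0)
print(summary)
```

Aggregating per step keeps the min/max band aligned with each forecast hour. A percentile band (for example the 10th and 90th) is a common, less outlier-sensitive alternative when N is large.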

client.py ADDED

	@@ -0,0 +1,410 @@

+#!/usr/bin/env python3
+"""
+Test client for the Kronos BTC forecast API.
+Run it directly to verify that the HuggingFace Space API is working.
+Usage:
+    # Test the health check
+    python client.py health
+    # Test the prediction API
+    python client.py predict
+    # Test the trading-signal API
+    python client.py signal
+    # Run all tests
+    python client.py all
+    # Use a custom URL
+    python client.py all --url https://your-space.hf.space
+"""
+import argparse
+import json
+import sys
+import time
+from datetime import datetime, timedelta
+from typing import List, Dict, Any, Optional
+import requests
+# ==================== Configuration ====================
+DEFAULT_API_URL = "https://xianqiu-qlang.hf.space"
+# Binance API
+BINANCE_API = "https://api.binance.com/api/v3/klines"
+# ==================== Helper functions ====================
+def fetch_btc_data(symbol: str = "BTCUSDT", interval: str = "1h", limit: int = 200) -> List[Dict]:
+    """
+    Fetch BTC candlestick (kline) data from Binance.
+    Args:
+        symbol: Trading pair
+        interval: Candle interval (1h, 4h, 1d, etc.)
+        limit: Number of candles to fetch (max 1000)
+    Returns:
+        List of OHLCV records
+    """
+    print(f"[Binance] Fetching {symbol} {interval} candles (latest {limit})...")
+    params = {
+        "symbol": symbol,
+        "interval": interval,
+        "limit": limit
+    }
+    try:
+        response = requests.get(BINANCE_API, params=params, timeout=10)
+        response.raise_for_status()
+        data = response.json()
+    except requests.exceptions.RequestException as e:
+        print(f"[Error] Cannot reach the Binance API: {e}")
+        print("[Info] Falling back to mock data...")
+        return generate_mock_data(limit)
+    ohlcv_list = []
+    for item in data:
+        ohlcv_list.append({
+            "timestamp": datetime.utcfromtimestamp(item[0] / 1000).isoformat(),  # Binance open time (ms, UTC)
+            "open": float(item[1]),
+            "high": float(item[2]),
+            "low": float(item[3]),
+            "close": float(item[4]),
+            "volume": float(item[5]),
+            "amount": float(item[7])  # Quote asset volume
+        })
+    print(f"[OK] Fetched {len(ohlcv_list)} records")
+    print(f"     Time range: {ohlcv_list[0]['timestamp']} ~ {ohlcv_list[-1]['timestamp']}")
+    print(f"     Current price: ${ohlcv_list[-1]['close']:,.2f}")
+    return ohlcv_list
+def generate_mock_data(n: int = 200) -> List[Dict]:
+    """Generate mock candlestick data (used when the Binance API is unavailable)."""
+    import random
+    base_price = 100000.0
+    data = []
+    current_time = datetime.utcnow() - timedelta(hours=n)
+    for i in range(n):
+        change = random.gauss(0, 0.01)  # 1% standard deviation
+        base_price *= (1 + change)
+        high = base_price * (1 + random.random() * 0.005)
+        low = base_price * (1 - random.random() * 0.005)
+        close = random.uniform(low, high)
+        data.append({
+            "timestamp": current_time.isoformat(),
+            "open": round(base_price, 2),
+            "high": round(high, 2),
+            "low": round(low, 2),
+            "close": round(close, 2),
+            "volume": round(random.uniform(100, 1000), 2),
+            "amount": round(random.uniform(1000000, 10000000), 2)
+        })
+        current_time += timedelta(hours=1)
+    return data
+def print_json(data: Any, title: Optional[str] = None):
+    """Pretty-print data as JSON."""
+    if title:
+        print(f"\n{'='*60}")
+        print(f"  {title}")
+        print(f"{'='*60}")
+    print(json.dumps(data, indent=2, ensure_ascii=False))
+# ==================== API test functions ====================
+def test_health(base_url: str) -> bool:
+    """Test the health-check API."""
+    print("\n" + "="*60)
+    print("  TEST: /health")
+    print("="*60)
+    url = f"{base_url}/health"
+    print(f"[Request] GET {url}")
+    try:
+        start = time.time()
+        response = requests.get(url, timeout=30)
+        elapsed = time.time() - start
+        print(f"[Response] Status: {response.status_code} ({elapsed:.2f}s)")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"\n[Result]")
+            print(f"  Status:        {data.get('status', 'N/A')}")
+            print(f"  Model Loaded:  {data.get('model_loaded', 'N/A')}")
+            print(f"  Model Version: {data.get('model_version', 'N/A')}")
+            print(f"  Device:        {data.get('device', 'N/A')}")
+            print(f"  Timestamp:     {data.get('timestamp', 'N/A')}")
+            return True
+        else:
+            print(f"[Error] {response.text}")
+            return False
+    except requests.exceptions.RequestException as e:
+        print(f"[Error] Request failed: {e}")
+        return False
+def test_predict(base_url: str, data: Optional[List[Dict]] = None) -> bool:
+    """Test the prediction API."""
+    print("\n" + "="*60)
+    print("  TEST: /predict")
+    print("="*60)
+    # Fetch data
+    if data is None:
+        data = fetch_btc_data(limit=200)
+    url = f"{base_url}/predict"
+    payload = {
+        "data": data,
+        "pred_len": 24,
+        "n_paths": 30,
+        "temperature": 1.0,
+        "top_p": 0.9
+    }
+    print(f"\n[Request] POST {url}")
+    print(f"  Data points:  {len(data)}")
+    print(f"  Horizon:      {payload['pred_len']} hours")
+    print(f"  Monte Carlo:  {payload['n_paths']} paths")
+    try:
+        start = time.time()
+        response = requests.post(url, json=payload, timeout=120)
+        elapsed = time.time() - start
+        print(f"\n[Response] Status: {response.status_code} ({elapsed:.2f}s)")
+        if response.status_code == 200:
+            result = response.json()
+            print(f"\n[Result]")
+            print(f"  Current price:       ${result.get('current_price', 0):,.2f}")
+            print(f"  Mean forecast:       ${result.get('mean_forecast', 0):,.2f}")
+            print(f"  Forecast range:      ${result.get('min_forecast', 0):,.2f} ~ ${result.get('max_forecast', 0):,.2f}")
+            print(f"  Upside probability:  {result.get('upside_probability', 0)*100:.1f}%")
+            print(f"  Expected return:     {result.get('expected_return', 0)*100:.2f}%")
+            print(f"  Vol. amplification:  {result.get('volatility_amplification', 0):.2f}x")
+            print(f"  Confidence:          {result.get('confidence', 0)*100:.1f}%")
+            print(f"  Forecast points:     {len(result.get('forecast_prices', []))}")
+            # Show a subset of the forecast prices
+            prices = result.get('forecast_prices', [])
+            if prices:
+                print(f"\n  Forecast price trend (every 6 hours):")
+                for i in range(0, len(prices), 6):
+                    print(f"    +{i}h: ${prices[i]:,.2f}")
+            return True
+        elif response.status_code == 503:
+            print(f"[Warning] Model not loaded yet; please retry later")
+            print(f"  Response: {response.text}")
+            return False
+        else:
+            print(f"[Error] {response.text}")
+            return False
+    except requests.exceptions.Timeout:
+        print(f"[Error] Request timed out (>120s)")
+        return False
+    except requests.exceptions.RequestException as e:
+        print(f"[Error] Request failed: {e}")
+        return False
+def test_signal(base_url: str, data: Optional[List[Dict]] = None) -> bool:
+    """Test the trading-signal API."""
+    print("\n" + "="*60)
+    print("  TEST: /signal")
+    print("="*60)
+    # Fetch data
+    if data is None:
+        data = fetch_btc_data(limit=200)
+    url = f"{base_url}/signal"
+    payload = {
+        "data": data,
+        "buy_threshold": 0.58,
+        "sell_threshold": 0.42,
+        "stop_loss": 0.03,
+        "take_profit": 0.08,
+        "n_paths": 30
+    }
+    print(f"\n[Request] POST {url}")
+    print(f"  Data points:     {len(data)}")
+    print(f"  Buy threshold:   {payload['buy_threshold']}")
+    print(f"  Sell threshold:  {payload['sell_threshold']}")
+    print(f"  Stop loss:       {payload['stop_loss']*100:.1f}%")
+    print(f"  Take profit:     {payload['take_profit']*100:.1f}%")
+    try:
+        start = time.time()
+        response = requests.post(url, json=payload, timeout=120)
+        elapsed = time.time() - start
+        print(f"\n[Response] Status: {response.status_code} ({elapsed:.2f}s)")
+        if response.status_code == 200:
+            result = response.json()
+            signal = result.get('signal', 'N/A')
+            # Signal icons
+            signal_icons = {
+                'STRONG_BUY': '[++]',
+                'BUY': '[+]',
+                'HOLD': '[=]',
+                'SELL': '[-]',
+                'STRONG_SELL': '[--]'
+            }
+            print(f"\n[Result]")
+            print(f"  Signal:              {signal_icons.get(signal, '')} {signal}")
+            print(f"  Confidence:          {result.get('confidence', 0)*100:.1f}%")
+            print(f"  Current price:       ${result.get('current_price', 0):,.2f}")
+            print(f"  Target price:        ${result.get('target_price', 0):,.2f}")
+            print(f"  Stop-loss price:     ${result.get('stop_loss_price', 0):,.2f}")
+            print(f"  Take-profit price:   ${result.get('take_profit_price', 0):,.2f}")
+            print(f"  Upside probability:  {result.get('upside_probability', 0)*100:.1f}%")
+            print(f"  Expected return:     {result.get('expected_return', 0)*100:.2f}%")
+            print(f"  Suggested position:  {result.get('suggested_position_size', 0)*100:.1f}%")
+            print(f"  Reason:              {result.get('reason', 'N/A')}")
+            return True
+        elif response.status_code == 503:
+            print(f"[Warning] Model not loaded yet; please retry later")
+            print(f"  Response: {response.text}")
+            return False
+        else:
+            print(f"[Error] {response.text}")
+            return False
+    except requests.exceptions.Timeout:
+        print(f"[Error] Request timed out (>120s)")
+        return False
+    except requests.exceptions.RequestException as e:
+        print(f"[Error] Request failed: {e}")
+        return False
+def run_all_tests(base_url: str) -> bool:
+    """Run all tests."""
+    print("\n" + "#"*60)
+    print("#")
+    print("#  Kronos BTC Forecast API Tests")
+    print(f"#  URL: {base_url}")
+    print(f"#  Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    print("#")
+    print("#"*60)
+    results = {}
+    # 1. Health check
+    results['health'] = test_health(base_url)
+    if not results['health']:
+        print("\n[Warning] Health check failed; the API may not be running")
+        print("  Check whether the HuggingFace Space is running")
+        return False
+    # 2. Fetch data (only once, shared by both tests)
+    print("\n" + "-"*60)
+    data = fetch_btc_data(limit=200)
+    # 3. Prediction test
+    results['predict'] = test_predict(base_url, data)
+    # 4. Signal test
+    results['signal'] = test_signal(base_url, data)
+    # Summary
+    print("\n" + "="*60)
+    print("  Test summary")
+    print("="*60)
+    for test_name, passed in results.items():
+        status = "PASS" if passed else "FAIL"
+        icon = "[OK]" if passed else "[X]"
+        print(f"  {icon} {test_name}: {status}")
+    all_passed = all(results.values())
+    print("\n" + "-"*60)
+    if all_passed:
+        print("  All tests passed!")
+    else:
+        print("  Some tests failed; check the API status")
+    print("-"*60)
+    return all_passed
+# ==================== Main ====================
+def main():
+    parser = argparse.ArgumentParser(
+        description="Kronos BTC forecast API test client",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  python client.py health              # Test the health check
+  python client.py predict             # Test the prediction API
+  python client.py signal              # Test the trading-signal API
+  python client.py all                 # Run all tests
+  python client.py all --url http://localhost:7860  # Test a local server
+        """
+    )
+    parser.add_argument(
+        "command",
+        choices=["health", "predict", "signal", "all"],
+        help="Test command to run"
+    )
+    parser.add_argument(
+        "--url",
+        default=DEFAULT_API_URL,
+        help=f"API URL (default: {DEFAULT_API_URL})"
+    )
+    args = parser.parse_args()
+    # Run the selected test
+    if args.command == "health":
+        success = test_health(args.url)
+    elif args.command == "predict":
+        success = test_predict(args.url)
+    elif args.command == "signal":
+        success = test_signal(args.url)
+    elif args.command == "all":
+        success = run_all_tests(args.url)
+    else:
+        parser.print_help()
+        sys.exit(1)
+    sys.exit(0 if success else 1)
+if __name__ == "__main__":
+    main()
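`fetch_btc_data` returns one dict per candle (keys `timestamp`, `open`, `high`, `low`, `close`, `volume`, `amount`), whereas the Custom Prediction tab in app.py documents a column-oriented object keyed by `timestamps` plus one list per field. A minimal sketch of the conversion follows. The helper `rows_to_custom_json` is hypothetical, and treating `volume` and `amount` as optional is an assumption based on the tab's placeholder, which lists only the timestamp and OHLC keys.

```python
import json
from typing import Dict, List

REQUIRED = ("open", "high", "low", "close")
OPTIONAL = ("volume", "amount")  # assumption: the server tolerates these being absent


def rows_to_custom_json(rows: List[Dict]) -> str:
    """Hypothetical helper: turn row-oriented OHLCV dicts into the column-oriented JSON."""
    if not rows:
        raise ValueError("rows must not be empty")
    payload = {"timestamps": [r["timestamp"] for r in rows]}
    for key in REQUIRED:
        payload[key] = [float(r[key]) for r in rows]
    # Include an optional column only when every row provides it.
    for key in OPTIONAL:
        if all(key in r for r in rows):
            payload[key] = [float(r[key]) for r in rows]
    return json.dumps(payload)


rows = [
    {"timestamp": "2024-01-01T00:00:00", "open": 100.0, "high": 101.0, "low": 99.0,
     "close": 100.5, "volume": 1000.0, "amount": 100000.0},
    {"timestamp": "2024-01-01T01:00:00", "open": 101.0, "high": 102.0, "low": 100.0,
     "close": 101.5, "volume": 1100.0, "amount": 110000.0},
]
custom_json = rows_to_custom_json(rows)
print(custom_json)
```

Under those assumptions, `rows_to_custom_json(fetch_btc_data(limit=200))` would produce a string that can be pasted into the Historical Data (JSON) textbox.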

model/__init__.py ADDED

	@@ -0,0 +1,17 @@

+from .kronos import KronosTokenizer, Kronos, KronosPredictor
+model_dict = {
+    'kronos_tokenizer': KronosTokenizer,
+    'kronos': Kronos,
+    'kronos_predictor': KronosPredictor
+}
+def get_model_class(model_name):
+    if model_name in model_dict:
+        return model_dict[model_name]
+    else:
+        raise NotImplementedError(f"Model {model_name} not found in model_dict")
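`model_dict` is a name-to-class registry: `get_model_class` resolves a string key (for example one read from a config) to a class. A minimal, self-contained sketch of the same lookup pattern, using hypothetical stand-in classes so it runs without torch, that reports the valid keys when a lookup fails:

```python
from typing import Dict, Type


class TokenizerStub:  # hypothetical stand-in for KronosTokenizer
    pass


class ModelStub:  # hypothetical stand-in for Kronos
    pass


REGISTRY: Dict[str, Type] = {
    "kronos_tokenizer": TokenizerStub,
    "kronos": ModelStub,
}


def get_class(name: str) -> Type:
    """Resolve a registry key to a class, listing the valid keys on failure."""
    try:
        return REGISTRY[name]
    except KeyError:
        available = ", ".join(sorted(REGISTRY))
        raise NotImplementedError(f"Model {name!r} not found; available: {available}") from None


model_cls = get_class("kronos")
print(model_cls.__name__)
```

Raising `NotImplementedError` keeps the exception type the original function uses, so callers that already catch it are unaffected; `from None` hides the internal `KeyError` from the traceback.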

model/kronos.py ADDED

	@@ -0,0 +1,589 @@

+import numpy as np
+import pandas as pd
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from huggingface_hub import PyTorchModelHubMixin
+import sys
+from tqdm import trange
+sys.path.append("../")
+from model.module import *
+class KronosTokenizer(nn.Module, PyTorchModelHubMixin):
+    """
+    KronosTokenizer module for tokenizing input data using a hybrid quantization approach.
+    This tokenizer utilizes a combination of encoder and decoder Transformer blocks
+    along with the Binary Spherical Quantization (BSQuantizer) to compress and decompress input data.
+    Args:
+           d_in (int): Input dimension.
+           d_model (int): Model dimension.
+           n_heads (int): Number of attention heads.
+           ff_dim (int): Feed-forward dimension.
+           n_enc_layers (int): Number of encoder layers.
+           n_dec_layers (int): Number of decoder layers.
+           ffn_dropout_p (float): Dropout probability for feed-forward networks.
+           attn_dropout_p (float): Dropout probability for attention mechanisms.
+           resid_dropout_p (float): Dropout probability for residual connections.
+           s1_bits (int): Number of bits for the pre token in BSQuantizer.
+           s2_bits (int): Number of bits for the post token in BSQuantizer.
+           beta (float): Beta parameter for BSQuantizer.
+           gamma0 (float): Gamma0 parameter for BSQuantizer.
+           gamma (float): Gamma parameter for BSQuantizer.
+           zeta (float): Zeta parameter for BSQuantizer.
+           group_size (int): Group size parameter for BSQuantizer.
+    """
+    def __init__(self, d_in, d_model, n_heads, ff_dim, n_enc_layers, n_dec_layers, ffn_dropout_p, attn_dropout_p, resid_dropout_p, s1_bits, s2_bits, beta, gamma0, gamma, zeta, group_size):
+        super().__init__()
+        self.d_in = d_in
+        self.d_model = d_model
+        self.n_heads = n_heads
+        self.ff_dim = ff_dim
+        self.enc_layers = n_enc_layers
+        self.dec_layers = n_dec_layers
+        self.ffn_dropout_p = ffn_dropout_p
+        self.attn_dropout_p = attn_dropout_p
+        self.resid_dropout_p = resid_dropout_p
+        self.s1_bits = s1_bits
+        self.s2_bits = s2_bits
+        self.codebook_dim = s1_bits + s2_bits # Total dimension of the codebook after quantization
+        self.embed = nn.Linear(self.d_in, self.d_model)
+        self.head = nn.Linear(self.d_model, self.d_in)
+        # Encoder Transformer Blocks
+        self.encoder = nn.ModuleList([
+            TransformerBlock(self.d_model, self.n_heads, self.ff_dim, self.ffn_dropout_p, self.attn_dropout_p, self.resid_dropout_p)
+            for _ in range(self.enc_layers - 1)
+        ])
+        # Decoder Transformer Blocks
+        self.decoder = nn.ModuleList([
+            TransformerBlock(self.d_model, self.n_heads, self.ff_dim, self.ffn_dropout_p, self.attn_dropout_p, self.resid_dropout_p)
+            for _ in range(self.dec_layers - 1)
+        ])
+        self.quant_embed = nn.Linear(in_features=self.d_model, out_features=self.codebook_dim) # Linear layer before quantization
+        self.post_quant_embed_pre = nn.Linear(in_features=self.s1_bits, out_features=self.d_model) # Linear layer after quantization (pre part - s1 bits)
+        self.post_quant_embed = nn.Linear(in_features=self.codebook_dim, out_features=self.d_model) # Linear layer after quantization (full codebook)
+        self.tokenizer = BSQuantizer(self.s1_bits, self.s2_bits, beta, gamma0, gamma, zeta, group_size) # BSQuantizer module
+    def forward(self, x):
+        """
+        Forward pass of the KronosTokenizer.
+        Args:
+            x (torch.Tensor): Input tensor of shape (batch_size, seq_len, d_in).
+        Returns:
+            tuple: A tuple containing:
+                - tuple: (z_pre, z) - Reconstructed outputs from decoder with s1_bits and full codebook respectively,
+                         both of shape (batch_size, seq_len, d_in).
+                - torch.Tensor: bsq_loss - Loss from the BSQuantizer.
+                - torch.Tensor: quantized - Quantized representation from BSQuantizer.
+                - torch.Tensor: z_indices - Indices from the BSQuantizer.
+        """
+        z = self.embed(x)
+        for layer in self.encoder:
+            z = layer(z)
+        z = self.quant_embed(z) # (B, T, codebook)
+        bsq_loss, quantized, z_indices = self.tokenizer(z)
+        quantized_pre = quantized[:, :, :self.s1_bits] # Extract the first part of quantized representation (s1_bits)
+        z_pre = self.post_quant_embed_pre(quantized_pre)
+        z = self.post_quant_embed(quantized)
+        # Decoder layers (for pre part - s1 bits)
+        for layer in self.decoder:
+            z_pre = layer(z_pre)
+        z_pre = self.head(z_pre)
+        # Decoder layers (for full codebook)
+        for layer in self.decoder:
+            z = layer(z)
+        z = self.head(z)
+        return (z_pre, z), bsq_loss, quantized, z_indices
+    def indices_to_bits(self, x, half=False):
+        """
+        Converts indices to bit representations and scales them.
+        Args:
+            x (torch.Tensor): Indices tensor.
+            half (bool, optional): Whether to process only half of the codebook dimension. Defaults to False.
+        Returns:
+            torch.Tensor: Bit representation tensor.
+        """
+        if half:
+            x1 = x[0] # Assuming x is a tuple of indices if half is True
+            x2 = x[1]
+            mask = 2 ** torch.arange(self.codebook_dim//2, device=x1.device, dtype=torch.long) # Create a mask for bit extraction
+            x1 = (x1.unsqueeze(-1) & mask) != 0 # Extract bits for the first half
+            x2 = (x2.unsqueeze(-1) & mask) != 0 # Extract bits for the second half
+            x = torch.cat([x1, x2], dim=-1) # Concatenate the bit representations
+        else:
+            mask = 2 ** torch.arange(self.codebook_dim, device=x.device, dtype=torch.long) # Create a mask for bit extraction
+            x = (x.unsqueeze(-1) & mask) != 0 # Extract bits
+        x = x.float() * 2 - 1 # Convert boolean to bipolar (-1, 1)
+        q_scale = 1. / (self.codebook_dim ** 0.5) # Scaling factor
+        x = x * q_scale
+        return x
+    def encode(self, x, half=False):
+        """
+        Encodes the input data into quantized indices.
+        Args:
+            x (torch.Tensor): Input tensor of shape (batch_size, seq_len, d_in).
+            half (bool, optional): Whether to use half quantization in BSQuantizer. Defaults to False.
+        Returns:
+            torch.Tensor: Quantized indices from BSQuantizer.
+        """
+        z = self.embed(x)
+        for layer in self.encoder:
+            z = layer(z)
+        z = self.quant_embed(z)
+        bsq_loss, quantized, z_indices = self.tokenizer(z, half)
+        return z_indices
+    def decode(self, x, half=False):
+        """
+        Decodes quantized indices back to the input data space.
+        Args:
+            x (torch.Tensor): Quantized indices tensor.
+            half (bool, optional): Whether the indices were generated with half quantization. Defaults to False.
+        Returns:
+            torch.Tensor: Reconstructed output tensor of shape (batch_size, seq_len, d_in).
+        """
+        quantized = self.indices_to_bits(x, half)
+        z = self.post_quant_embed(quantized)
+        for layer in self.decoder:
+            z = layer(z)
+        z = self.head(z)
+        return z
+class Kronos(nn.Module, PyTorchModelHubMixin):
+    """
+    Kronos Model.
+    Args:
+        s1_bits (int): Number of bits for pre tokens.
+        s2_bits (int): Number of bits for post tokens.
+        n_layers (int): Number of Transformer blocks.
+        d_model (int): Dimension of the model's embeddings and hidden states.
+        n_heads (int): Number of attention heads in the MultiheadAttention layers.
+        ff_dim (int): Dimension of the feedforward network in the Transformer blocks.
+        ffn_dropout_p (float): Dropout probability for the feedforward network.
+        attn_dropout_p (float): Dropout probability for the attention layers.
+        resid_dropout_p (float): Dropout probability for residual connections.
+        token_dropout_p (float): Dropout probability for token embeddings.
+        learn_te (bool): Whether to use learnable temporal embeddings.
+    """
+    def __init__(self, s1_bits, s2_bits, n_layers, d_model, n_heads, ff_dim, ffn_dropout_p, attn_dropout_p, resid_dropout_p, token_dropout_p, learn_te):
+        super().__init__()
+        self.s1_bits = s1_bits
+        self.s2_bits = s2_bits
+        self.n_layers = n_layers
+        self.d_model = d_model
+        self.n_heads = n_heads
+        self.learn_te = learn_te
+        self.ff_dim = ff_dim
+        self.ffn_dropout_p = ffn_dropout_p
+        self.attn_dropout_p = attn_dropout_p
+        self.resid_dropout_p = resid_dropout_p
+        self.token_dropout_p = token_dropout_p
+        self.s1_vocab_size = 2 ** self.s1_bits
+        self.token_drop = nn.Dropout(self.token_dropout_p)
+        self.embedding = HierarchicalEmbedding(self.s1_bits, self.s2_bits, self.d_model)
+        self.time_emb = TemporalEmbedding(self.d_model, self.learn_te)
+        self.transformer = nn.ModuleList([
+            TransformerBlock(self.d_model, self.n_heads, self.ff_dim, self.ffn_dropout_p, self.attn_dropout_p, self.resid_dropout_p)
+            for _ in range(self.n_layers)
+        ])
+        self.norm = RMSNorm(self.d_model)
+        self.dep_layer = DependencyAwareLayer(self.d_model)
+        self.head = DualHead(self.s1_bits, self.s2_bits, self.d_model)
+        self.apply(self._init_weights)
+    def _init_weights(self, module):
+        if isinstance(module, nn.Linear):
+            nn.init.xavier_normal_(module.weight)
+            if module.bias is not None:
+                nn.init.zeros_(module.bias)
+        elif isinstance(module, nn.Embedding):
+            nn.init.normal_(module.weight, mean=0, std=self.embedding.d_model ** -0.5)
+        elif isinstance(module, nn.LayerNorm):
+            nn.init.ones_(module.weight)
+            nn.init.zeros_(module.bias)
+        elif isinstance(module, RMSNorm):
+            nn.init.ones_(module.weight)
+    def forward(self, s1_ids, s2_ids, stamp=None, padding_mask=None, use_teacher_forcing=False, s1_targets=None):
+        """
+        Args:
+            s1_ids (torch.Tensor): Input tensor of s1 token IDs. Shape: [batch_size, seq_len]
+            s2_ids (torch.Tensor): Input tensor of s2 token IDs. Shape: [batch_size, seq_len]
+            stamp (torch.Tensor, optional): Temporal stamp tensor. Shape: [batch_size, seq_len]. Defaults to None.
+            padding_mask (torch.Tensor, optional): Mask for padding tokens. Shape: [batch_size, seq_len]. Defaults to None.
+            use_teacher_forcing (bool, optional): Whether to use teacher forcing for s1 decoding. Defaults to False.
+            s1_targets (torch.Tensor, optional): Target s1 token IDs for teacher forcing. Shape: [batch_size, seq_len]. Defaults to None.
+        Returns:
+            Tuple[torch.Tensor, torch.Tensor]:
+                - s1 logits: Logits for s1 token predictions. Shape: [batch_size, seq_len, s1_vocab_size]
+                - s2_logits: Logits for s2 token predictions, conditioned on s1. Shape: [batch_size, seq_len, s2_vocab_size]
+        """
+        x = self.embedding([s1_ids, s2_ids])
+        if stamp is not None:
+            time_embedding = self.time_emb(stamp)
+            x = x + time_embedding
+        x = self.token_drop(x)
+        for layer in self.transformer:
+            x = layer(x, key_padding_mask=padding_mask)
+        x = self.norm(x)
+        s1_logits = self.head(x)
+        if use_teacher_forcing:
+            sibling_embed = self.embedding.emb_s1(s1_targets)
+        else:
+            s1_probs = F.softmax(s1_logits.detach(), dim=-1)
+            sample_s1_ids = torch.multinomial(s1_probs.view(-1, self.s1_vocab_size), 1).view(s1_ids.shape)
+            sibling_embed = self.embedding.emb_s1(sample_s1_ids)
+        x2 = self.dep_layer(x, sibling_embed, key_padding_mask=padding_mask) # Dependency Aware Layer: Condition on s1 embeddings
+        s2_logits = self.head.cond_forward(x2)
+        return s1_logits, s2_logits
+    def decode_s1(self, s1_ids, s2_ids, stamp=None, padding_mask=None):
+        """
+        Decodes only the s1 tokens.
+        This method performs a forward pass to predict only s1 tokens. It returns the s1 logits
+        and the context representation from the Transformer, which can be used for subsequent s2 decoding.
+        Args:
+            s1_ids (torch.Tensor): Input tensor of s1 token IDs. Shape: [batch_size, seq_len]
+            s2_ids (torch.Tensor): Input tensor of s2 token IDs. Shape: [batch_size, seq_len]
+            stamp (torch.Tensor, optional): Temporal stamp tensor. Shape: [batch_size, seq_len]. Defaults to None.
+            padding_mask (torch.Tensor, optional): Mask for padding tokens. Shape: [batch_size, seq_len]. Defaults to None.
+        Returns:
+            Tuple[torch.Tensor, torch.Tensor]:
+                - s1 logits: Logits for s1 token predictions. Shape: [batch_size, seq_len, s1_vocab_size]
+                - context: Context representation from the Transformer. Shape: [batch_size, seq_len, d_model]
+        """
+        x = self.embedding([s1_ids, s2_ids])
+        if stamp is not None:
+            time_embedding = self.time_emb(stamp)
+            x = x + time_embedding
+        x = self.token_drop(x)
+        for layer in self.transformer:
+            x = layer(x, key_padding_mask=padding_mask)
+        x = self.norm(x)
+        s1_logits = self.head(x)
+        return s1_logits, x
+    def decode_s2(self, context, s1_ids, padding_mask=None):
+        """
+        Decodes the s2 tokens, conditioned on the context and s1 tokens.
+        This method decodes s2 tokens based on a pre-computed context representation (typically from `decode_s1`)
+        and the s1 token IDs. It uses the dependency-aware layer and the conditional s2 head to predict s2 tokens.
+        Args:
+            context (torch.Tensor): Context representation from the transformer (output of decode_s1).
+                                     Shape: [batch_size, seq_len, d_model]
+            s1_ids (torch.Tensor): Input tensor of s1 token IDs. Shape: [batch_size, seq_len]
+            padding_mask (torch.Tensor, optional): Mask for padding tokens. Shape: [batch_size, seq_len]. Defaults to None.
+        Returns:
+            torch.Tensor: s2 logits. Shape: [batch_size, seq_len, s2_vocab_size]
+        """
+        sibling_embed = self.embedding.emb_s1(s1_ids)
+        x2 = self.dep_layer(context, sibling_embed, key_padding_mask=padding_mask)
+        return self.head.cond_forward(x2)
+def top_k_top_p_filtering(
+        logits,
+        top_k: int = 0,
+        top_p: float = 1.0,
+        filter_value: float = -float("Inf"),
+        min_tokens_to_keep: int = 1,
+):
+    """Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
+    Args:
+        logits: logits distribution shape (batch size, vocabulary size)
+        if top_k > 0: keep only top k tokens with highest probability (top-k filtering).
+        if top_p < 1.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
+            Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
+        Make sure we keep at least min_tokens_to_keep per batch example in the output
+    From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317
+    """
+    if top_k > 0:
+        top_k = min(max(top_k, min_tokens_to_keep), logits.size(-1))  # Safety check
+        # Remove all tokens with a probability less than the last token of the top-k
+        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
+        logits[indices_to_remove] = filter_value
+    if top_p < 1.0:
+        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
+        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
+        # Remove tokens with cumulative probability above the threshold (tokens with 0 are kept)
+        sorted_indices_to_remove = cumulative_probs > top_p
+        if min_tokens_to_keep > 1:
+            # Always keep at least the top min_tokens_to_keep tokens
+            sorted_indices_to_remove[..., :min_tokens_to_keep] = 0
+        # Shift the indices to the right to keep also the first token above the threshold
+        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
+        sorted_indices_to_remove[..., 0] = 0
+        # Scatter sorted tensors back to the original indexing
+        indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
+        logits[indices_to_remove] = filter_value
+    # Top-k and top-p can both apply in sequence; with neither active the logits pass through unchanged
+    return logits
+def sample_from_logits(logits, temperature=1.0, top_k=None, top_p=None, sample_logits=True):
+    logits = logits / temperature
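+    # Treat None as "disabled" so that None is never compared with a number
+    top_k = 0 if top_k is None else top_k
+    top_p = 1.0 if top_p is None else top_p
+    if top_k > 0 or top_p < 1.0:
+        logits = top_k_top_p_filtering(logits, top_k=top_k, top_p=top_p)
+    probs = F.softmax(logits, dim=-1)
+    if not sample_logits:
+        # Greedy decoding: pick the most probable token
+        _, x = torch.topk(probs, k=1, dim=-1)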
+    else:
+        x = torch.multinomial(probs, num_samples=1)
+    return x
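Temperature scaling and the greedy/sampling switch in `sample_from_logits` can be sketched with NumPy as follows. `softmax` and `sample_token` are hypothetical helpers. `rng.choice` stands in for `torch.multinomial`, and the greedy branch takes the most probable token via `argmax`.

```python
import numpy as np


def softmax(x):
    z = np.asarray(x, dtype=np.float64)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sample_token(logits, temperature=1.0, greedy=False, rng=None):
    """Scale logits by 1/temperature, then either take the argmax or draw one sample."""
    probs = softmax(np.asarray(logits, dtype=np.float64) / temperature)
    if greedy:
        return int(np.argmax(probs))
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(len(probs), p=probs))
```

A temperature below 1 sharpens the distribution and a temperature above 1 flattens it. Tokens whose logits were set to `-inf` by the filtering step get probability zero, so they are never drawn.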
+def auto_regressive_inference(tokenizer, model, x, x_stamp, y_stamp, max_context, pred_len, clip=5, T=1.0, top_k=0, top_p=0.99, sample_count=5, verbose=False):
+    with torch.no_grad():
+        batch_size = x.size(0)
+        initial_seq_len = x.size(1)
+        x = torch.clip(x, -clip, clip)
+        device = x.device
+        x = x.unsqueeze(1).repeat(1, sample_count, 1, 1).reshape(-1, x.size(1), x.size(2)).to(device)
+        x_stamp = x_stamp.unsqueeze(1).repeat(1, sample_count, 1, 1).reshape(-1, x_stamp.size(1), x_stamp.size(2)).to(device)
+        y_stamp = y_stamp.unsqueeze(1).repeat(1, sample_count, 1, 1).reshape(-1, y_stamp.size(1), y_stamp.size(2)).to(device)
+        x_token = tokenizer.encode(x, half=True)
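+        def get_dynamic_stamp(x_stamp, y_stamp, current_seq_len, pred_step):
+            # Timestamps for the current token window: history stamps followed by the
+            # first `pred_step` future stamps, trimmed so the total never exceeds max_context.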
+            if current_seq_len <= max_context - pred_step:
+                return torch.cat([x_stamp, y_stamp[:, :pred_step, :]], dim=1)
+            else:
+                start_idx = max_context - pred_step
+                return torch.cat([x_stamp[:, -start_idx:, :], y_stamp[:, :pred_step, :]], dim=1)
+        if verbose:
+            ran = trange
+        else:
+            ran = range
+        for i in ran(pred_len):
+            current_seq_len = initial_seq_len + i
+            if current_seq_len <= max_context:
+                input_tokens = x_token
+            else:
+                input_tokens = [t[:, -max_context:].contiguous() for t in x_token]
+            current_stamp = get_dynamic_stamp(x_stamp, y_stamp, current_seq_len, i)
+            s1_logits, context = model.decode_s1(input_tokens[0], input_tokens[1], current_stamp)
+            s1_logits = s1_logits[:, -1, :]
+            sample_pre = sample_from_logits(s1_logits, temperature=T, top_k=top_k, top_p=top_p, sample_logits=True)
+            s2_logits = model.decode_s2(context, sample_pre)
+            s2_logits = s2_logits[:, -1, :]
+            sample_post = sample_from_logits(s2_logits, temperature=T, top_k=top_k, top_p=top_p, sample_logits=True)
+            x_token[0] = torch.cat([x_token[0], sample_pre], dim=1)
+            x_token[1] = torch.cat([x_token[1], sample_post], dim=1)
+        input_tokens = [t[:, -max_context:].contiguous() for t in x_token]
+        z = tokenizer.decode(input_tokens, half=True)
+        z = z.reshape(batch_size, sample_count, z.size(1), z.size(2))
+        preds = z.cpu().numpy()
+        # preds = np.mean(preds, axis=1)
+        return preds
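The loop above keeps at most `max_context` tokens and builds a matching timestamp window with `get_dynamic_stamp`. The pure-Python sketch below checks that the two windows line up at every step, using integer positions in place of real tokens and timestamps. `dynamic_stamp` and `check_alignment` are hypothetical helpers. The sketch assumes `pred_len < max_context`; otherwise `start_idx` stops being positive and the slice no longer means "the last `start_idx` rows".

```python
def dynamic_stamp(x_stamp, y_stamp, current_seq_len, pred_step, max_context):
    """Same slicing rule as get_dynamic_stamp, on plain Python lists."""
    if current_seq_len <= max_context - pred_step:
        return x_stamp + y_stamp[:pred_step]
    start_idx = max_context - pred_step
    return x_stamp[-start_idx:] + y_stamp[:pred_step]


def check_alignment(n_hist, pred_len, max_context):
    """Return True if every step's stamp window matches its token window."""
    x_stamp = list(range(n_hist))                     # timestamp t for history step t
    y_stamp = list(range(n_hist, n_hist + pred_len))  # timestamps of future steps
    for i in range(pred_len):
        tokens = list(range(n_hist + i))              # token t carries timestamp t
        window = tokens[-max_context:]                # what the model actually sees
        stamp = dynamic_stamp(x_stamp, y_stamp, n_hist + i, i, max_context)
        if stamp != window:
            return False
    return True
```

`x_stamp[-start_idx:]` returns the whole history whenever `start_idx` exceeds its length; Python lists and PyTorch tensors slice the same way here. That is why the `else` branch is also correct before the token window starts sliding.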
+def calc_time_stamps(x_timestamp):
+    time_df = pd.DataFrame()
+    time_df['minute'] = x_timestamp.dt.minute
+    time_df['hour'] = x_timestamp.dt.hour
+    time_df['weekday'] = x_timestamp.dt.weekday
+    time_df['day'] = x_timestamp.dt.day
+    time_df['month'] = x_timestamp.dt.month
+    return time_df
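`calc_time_stamps` turns each timestamp into five integer calendar features. `TemporalEmbedding` in `model/module.py` later looks these up in tables of size 60, 24, 7, 32 and 13. The stdlib sketch below reproduces the features for a single `datetime` and checks that they fit those tables. The helpers are hypothetical, not part of the repo. Both `datetime.weekday()` and pandas' `.dt.weekday` count Monday as 0.

```python
from datetime import datetime

# Table sizes used by TemporalEmbedding: minute, hour, weekday, day, month
EMBED_SIZES = (60, 24, 7, 32, 13)


def time_features(ts):
    """Return [minute, hour, weekday, day, month], the column order of calc_time_stamps."""
    return [ts.minute, ts.hour, ts.weekday(), ts.day, ts.month]


def fits_embedding_tables(features):
    """True if every feature is a valid row index of its embedding table."""
    return all(0 <= f < size for f, size in zip(features, EMBED_SIZES))


print(time_features(datetime(2024, 1, 29, 13, 45)))
```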
+class KronosPredictor:
+    def __init__(self, model, tokenizer, device="cuda:0", max_context=512, clip=5):
+        self.tokenizer = tokenizer
+        self.model = model
+        self.max_context = max_context
+        self.clip = clip
+        self.price_cols = ['open', 'high', 'low', 'close']
+        self.vol_col = 'volume'
+        self.amt_vol = 'amount'
+        self.time_cols = ['minute', 'hour', 'weekday', 'day', 'month']
+        self.device = device
+        self.tokenizer = self.tokenizer.to(self.device)
+        self.model = self.model.to(self.device)
+    def generate(self, x, x_stamp, y_stamp, pred_len, T, top_k, top_p, sample_count, verbose):
+        x_tensor = torch.from_numpy(np.array(x).astype(np.float32)).to(self.device)
+        x_stamp_tensor = torch.from_numpy(np.array(x_stamp).astype(np.float32)).to(self.device)
+        y_stamp_tensor = torch.from_numpy(np.array(y_stamp).astype(np.float32)).to(self.device)
+        preds = auto_regressive_inference(self.tokenizer, self.model, x_tensor, x_stamp_tensor, y_stamp_tensor, self.max_context, pred_len,
+                                          self.clip, T, top_k, top_p, sample_count, verbose)
+        preds = preds[:, :, -pred_len:, :]
+        return preds
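`auto_regressive_inference` repeats each input series `sample_count` times along the batch axis. It then reshapes the decoded output back to `[batch_size, sample_count, seq_len, n_features]`, and `generate` keeps the last `pred_len` steps. The NumPy sketch below mirrors that bookkeeping, with `np.tile` standing in for `Tensor.repeat`. The model call is replaced by an identity, so the sketch only demonstrates shapes and ordering.

```python
import numpy as np

batch_size, sample_count, seq_len, n_feat = 2, 3, 4, 6
x = np.arange(batch_size * seq_len * n_feat, dtype=np.float32).reshape(batch_size, seq_len, n_feat)

# Mirrors x.unsqueeze(1).repeat(1, sample_count, 1, 1).reshape(-1, seq_len, n_feat):
# rows b*sample_count ... b*sample_count + sample_count - 1 are copies of series b
x_rep = np.tile(x[:, None], (1, sample_count, 1, 1)).reshape(-1, seq_len, n_feat)

# Identity stand-in for "tokenize, generate, decode"
z = x_rep.reshape(batch_size, sample_count, seq_len, n_feat)

pred_len = 2
preds = z[:, :, -pred_len:, :]  # same slice as KronosPredictor.generate
```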
+    def predict(self, df, x_timestamp, y_timestamp, pred_len, T=1.0, top_k=0, top_p=0.9, sample_count=1, verbose=True):
+        if not isinstance(df, pd.DataFrame):
+            raise ValueError("Input must be a pandas DataFrame.")
+        if not all(col in df.columns for col in self.price_cols):
+            raise ValueError(f"Price columns {self.price_cols} not found in DataFrame.")
+        df = df.copy()
+        if self.vol_col not in df.columns:
+            df[self.vol_col] = 0.0  # Fill missing volume with zeros
+            df[self.amt_vol] = 0.0  # Fill missing amount with zeros
+        if self.amt_vol not in df.columns and self.vol_col in df.columns:
+            df[self.amt_vol] = df[self.vol_col] * df[self.price_cols].mean(axis=1)
+        if df[self.price_cols + [self.vol_col, self.amt_vol]].isnull().values.any():
+            raise ValueError("Input DataFrame contains NaN values in price or volume columns.")
+        x_time_df = calc_time_stamps(x_timestamp)
+        y_time_df = calc_time_stamps(y_timestamp)
+        x = df[self.price_cols + [self.vol_col, self.amt_vol]].values.astype(np.float32)
+        x_stamp = x_time_df.values.astype(np.float32)
+        y_stamp = y_time_df.values.astype(np.float32)
+        x_mean, x_std = np.mean(x, axis=0), np.std(x, axis=0)
+        x = (x - x_mean) / (x_std + 1e-5)
+        x = np.clip(x, -self.clip, self.clip)
+        x = x[np.newaxis, :]
+        x_stamp = x_stamp[np.newaxis, :]
+        y_stamp = y_stamp[np.newaxis, :]
+        preds = self.generate(x, x_stamp, y_stamp, pred_len, T, top_k, top_p, sample_count, verbose)
+        preds = preds.squeeze(0)
+        preds = preds * (x_std[np.newaxis, :] + 1e-5) + x_mean[np.newaxis, :]
+        close_preds = preds[:, :, 3].swapaxes(0, 1)
+        volume_preds = preds[:, :, 4].swapaxes(0, 1)
+        close_df = pd.DataFrame(close_preds, columns=[f"pred-{i+1}" for i in range(sample_count)], index=y_timestamp)
+        volume_df = pd.DataFrame(volume_preds, columns=[f"pred-{i + 1}" for i in range(sample_count)], index=y_timestamp)
+        return close_df, volume_df
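`predict` z-scores each of the six input columns with the history's own mean and standard deviation, clips to `±clip`, and undoes the scaling on the model output. The NumPy sketch below shows that round trip with hypothetical helper names. It illustrates two consequences. A constant column, such as zero-filled volume, survives because of the `1e-5` term. A value clipped at `±clip` cannot be recovered exactly.

```python
import numpy as np


def normalize(x, clip=5.0, eps=1e-5):
    """Per-column z-score followed by clipping, as in predict()."""
    mean, std = x.mean(axis=0), x.std(axis=0)
    z = np.clip((x - mean) / (std + eps), -clip, clip)
    return z, mean, std


def denormalize(z, mean, std, eps=1e-5):
    """Inverse scaling applied to the model output."""
    return z * (std + eps) + mean
```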
+    def predict_detail(self, df, x_timestamp, y_timestamp, pred_len, T=1.0, top_k=0, top_p=0.9, sample_count=1, verbose=True):
+        """
+        Predict full OHLCV output for all Monte Carlo samples.
+        Returns:
+            dict: One DataFrame per output field, each indexed by y_timestamp and holding
+            one column per sample ('pred-1' ... 'pred-<sample_count>'):
+                - 'open': open price predictions
+                - 'high': high price predictions
+                - 'low': low price predictions
+                - 'close': close price predictions
+                - 'volume': volume predictions
+            The model also predicts the 'amount' channel, but it is not returned.
+        """
+        if not isinstance(df, pd.DataFrame):
+            raise ValueError("Input must be a pandas DataFrame.")
+        if not all(col in df.columns for col in self.price_cols):
+            raise ValueError(f"Price columns {self.price_cols} not found in DataFrame.")
+        df = df.copy()
+        if self.vol_col not in df.columns:
+            df[self.vol_col] = 0.0
+            df[self.amt_vol] = 0.0
+        if self.amt_vol not in df.columns and self.vol_col in df.columns:
+            df[self.amt_vol] = df[self.vol_col] * df[self.price_cols].mean(axis=1)
+        if df[self.price_cols + [self.vol_col, self.amt_vol]].isnull().values.any():
+            raise ValueError("Input DataFrame contains NaN values in price or volume columns.")
+        x_time_df = calc_time_stamps(x_timestamp)
+        y_time_df = calc_time_stamps(y_timestamp)
+        x = df[self.price_cols + [self.vol_col, self.amt_vol]].values.astype(np.float32)
+        x_stamp = x_time_df.values.astype(np.float32)
+        y_stamp = y_time_df.values.astype(np.float32)
+        x_mean, x_std = np.mean(x, axis=0), np.std(x, axis=0)
+        x = (x - x_mean) / (x_std + 1e-5)
+        x = np.clip(x, -self.clip, self.clip)
+        x = x[np.newaxis, :]
+        x_stamp = x_stamp[np.newaxis, :]
+        y_stamp = y_stamp[np.newaxis, :]
+        preds = self.generate(x, x_stamp, y_stamp, pred_len, T, top_k, top_p, sample_count, verbose)
+        preds = preds.squeeze(0)
+        preds = preds * (x_std[np.newaxis, :] + 1e-5) + x_mean[np.newaxis, :]
+        # Extract all OHLCV components: [sample_count, pred_len, 6]
+        # Columns: open(0), high(1), low(2), close(3), volume(4), amount(5)
+        col_names = [f"pred-{i+1}" for i in range(sample_count)]
+        result = {
+            'open': pd.DataFrame(preds[:, :, 0].swapaxes(0, 1), columns=col_names, index=y_timestamp),
+            'high': pd.DataFrame(preds[:, :, 1].swapaxes(0, 1), columns=col_names, index=y_timestamp),
+            'low': pd.DataFrame(preds[:, :, 2].swapaxes(0, 1), columns=col_names, index=y_timestamp),
+            'close': pd.DataFrame(preds[:, :, 3].swapaxes(0, 1), columns=col_names, index=y_timestamp),
+            'volume': pd.DataFrame(preds[:, :, 4].swapaxes(0, 1), columns=col_names, index=y_timestamp),
+        }
+        return result

model/module.py ADDED

	@@ -0,0 +1,580 @@

+import math
+from einops import rearrange, reduce
+import torch
+import torch.nn as nn
+from torch.autograd import Function
+import torch.nn.functional as F
+class DifferentiableEntropyFunction(Function):
+    @staticmethod
+    def forward(ctx, zq, basis, K, eps):
+        zb = (zq + 1) / 2
+        zi = ((zb * basis).sum(-1)).to(torch.int64)
+        cnt = torch.scatter_reduce(torch.zeros(2 ** K, device=zq.device, dtype=zq.dtype),
+                                   0,
+                                   zi.flatten(),
+                                   torch.ones_like(zi.flatten()).to(zq.dtype),
+                                   'sum')
+        prob = (cnt + eps) / (cnt + eps).sum()
+        H = -(prob * torch.log(prob)).sum()
+        ctx.save_for_backward(zq, zi, prob)
+        ctx.K = K
+        return H
+    @staticmethod
+    def backward(ctx, grad_output):
+        zq, zi, prob = ctx.saved_tensors
+        grad_array = -grad_output * (torch.log(prob) + 1) / zi.numel() / ctx.K
+        reord_grad = grad_array[zi.flatten()].reshape(zi.shape)
+        grad_input = reord_grad.unsqueeze(-1) * zq
+        return grad_input, None, None, None, None
+def codebook_entropy(zq, basis, K, eps=1e-4):
+    return DifferentiableEntropyFunction.apply(zq, basis, K, eps)
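In its forward pass, `DifferentiableEntropyFunction` does three things. It maps each ±1 code to an integer index with an MSB-first basis. It counts how often each of the `2 ** K` entries occurs. It returns the entropy of the smoothed histogram. The NumPy sketch below reproduces only that forward computation; the custom gradient is not modelled. The helper name is hypothetical.

```python
import numpy as np


def hard_codebook_entropy(codes, eps=1e-4):
    """Entropy of code usage for codes in {-1, +1}^K (forward pass only)."""
    K = codes.shape[-1]
    basis = 2 ** np.arange(K - 1, -1, -1)  # MSB-first, like BinarySphericalQuantizer.basis
    indices = (((codes + 1) // 2) * basis).sum(axis=-1)
    counts = np.bincount(indices.ravel(), minlength=2 ** K).astype(np.float64)
    prob = (counts + eps) / (counts + eps).sum()  # eps smoothing, as in forward()
    return float(-(prob * np.log(prob)).sum())
```

A batch that uses every code equally gives the maximum entropy `log(2 ** K)`, while a collapsed batch gives an entropy near zero. That is why the codebook entropy enters `entropy_penalty` with a negative sign: minimising the loss pushes towards even codebook usage.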
+class BinarySphericalQuantizer(nn.Module):
+    def __init__(self, embed_dim, beta, gamma0, gamma, zeta,
+                 input_format='bchw',
+                 soft_entropy=True, group_size=9,
+                 persample_entropy_compute='analytical',
+                 cb_entropy_compute='group',
+                 l2_norm=True,
+                 inv_temperature=1):
+        """
+        Paper link: https://arxiv.org/pdf/2406.07548.pdf
+        Here we use the official implementation of the BinarySphericalQuantizer.
+        """
+        super().__init__()
+        self.embed_dim = embed_dim
+        self.beta = beta  # loss weight for commit loss
+        self.gamma0 = gamma0  # loss weight for the per-sample entropy term
+        self.gamma = gamma  # loss weight for the codebook entropy term
+        self.zeta = zeta  # loss weight for the entire entropy penalty
+        self.input_format = input_format
+        assert self.embed_dim % group_size == 0, "embed_dim must be divisible by group_size"
+        self.num_groups = self.embed_dim // group_size
+        self.group_size = group_size
+        assert persample_entropy_compute in ['group', 'analytical'], "persample_entropy_compute must be either 'group' or 'analytical'"
+        assert cb_entropy_compute in ['group', 'nce'], "cb_entropy_compute must be either 'group' or 'nce'"
+        self.persample_entropy_compute = persample_entropy_compute
+        self.cb_entropy_compute = cb_entropy_compute
+        self.l2_norm = l2_norm
+        self.inv_temperature = inv_temperature
+        self.register_buffer('basis', 2 ** torch.arange(embed_dim - 1, -1, -1))
+        self.register_buffer('group_basis', 2 ** torch.arange(group_size - 1, -1, -1))
+        self.num_dimensions = 2 ** embed_dim
+        self.bits_per_index = embed_dim
+        # we only need to keep the codebook portion up to the group size
+        # because we approximate the H loss with this subcode
+        group_codes = torch.arange(2 ** self.group_size)
+        group_codebook = self.indexes_to_codes(group_codes).float()[:, -group_size:]
+        self.register_buffer('group_codebook', group_codebook, persistent=False)
+        self.soft_entropy = soft_entropy  # soft_entropy: Sec 3.2 of https://arxiv.org/pdf/1911.05894.pdf
+    def quantize(self, z):
+        assert z.shape[-1] == self.embed_dim, f"Expected {self.embed_dim} dimensions, got {z.shape[-1]}"
+        zhat = torch.where(z > 0,
+                           torch.tensor(1, dtype=z.dtype, device=z.device),
+                           torch.tensor(-1, dtype=z.dtype, device=z.device))
+        return z + (zhat - z).detach()
+    def forward(self, z):
+        # if self.input_format == 'bchw':
+        #     z = rearrange(z, 'b c h w -> b h w c')
+        zq = self.quantize(z)
+        indices = self.codes_to_indexes(zq.detach())
+        group_indices = self.codes_to_group_indexes(zq.detach())
+        if not self.training:
+            used_codes = torch.unique(indices, return_counts=False)
+        else:
+            used_codes = None
+        q_scale = 1. / (self.embed_dim ** 0.5) if self.l2_norm else 1.
+        if self.soft_entropy:
+            persample_entropy, cb_entropy, avg_prob = self.soft_entropy_loss(z)
+            entropy_penalty = self.gamma0 * persample_entropy - self.gamma * cb_entropy
+        else:
+            zb_by_sample = ((zq + 1) / 2).reshape(z.shape[0], -1, z.shape[-1]).to(torch.float32)
+            persample_entropy = self.get_hard_per_sample_entropy(zb_by_sample)
+            cb_entropy = codebook_entropy(zq, self.basis, self.embed_dim)
+            entropy_penalty = self.gamma0 * persample_entropy - self.gamma * cb_entropy
+            avg_prob = None  # group probabilities are only computed by soft_entropy_loss
+        zq = zq * q_scale
+        # commit loss
+        commit_loss = self.beta * torch.mean(((zq.detach() - z) ** 2).sum(dim=-1))
+        # if self.input_format == 'bchw':
+        #     zq = rearrange(zq, 'b h w c -> b c h w')
+        return (
+            zq,
+            commit_loss + self.zeta * entropy_penalty / self.inv_temperature,
+            {"H": cb_entropy, "used_codes": used_codes, "indices": indices, "group_indices": group_indices,
+             "avg_prob": avg_prob}
+        )
+    def soft_entropy_loss(self, z):
+        # if we divide the code into subgroups of size group_size, each sub-codebook has 2 ** group_size entries
+        # the sub-code is the last group_size bits of the full code
+        group_code_book = self.group_codebook / (self.embed_dim ** 0.5 if self.l2_norm else 1)
+        divided_z = rearrange(z, '... (g c) -> ... g c', c=self.group_size)
+        # we calculate the distance between the divided_z and the codebook for each subgroup
+        distance = - 2 * torch.einsum('... g c, d c ->... g d', divided_z, group_code_book)
+        prob = (-distance * self.inv_temperature).softmax(dim=-1)
+        if self.persample_entropy_compute == 'analytical':
+            if self.l2_norm:
+                p = torch.sigmoid(-4 * z / (self.embed_dim ** 0.5) * self.inv_temperature)
+            else:
+                p = torch.sigmoid(-4 * z * self.inv_temperature)
+            prob = torch.stack([p, 1 - p], dim=-1)
+            per_sample_entropy = self.get_entropy(prob, dim=-1, normalize=False).sum(dim=-1).mean()
+        else:
+            per_sample_entropy = self.get_entropy(prob, dim=-1, normalize=False).sum(dim=-1).mean()
+        # macro average of the probability of each subgroup
+        avg_prob = reduce(prob, '... g d ->g d', 'mean')
+        codebook_entropy = self.get_entropy(avg_prob, dim=-1, normalize=False)
+        # the approximation of the entropy is the sum of the entropy of each subgroup
+        return per_sample_entropy, codebook_entropy.sum(), avg_prob
+    def get_hard_per_sample_entropy(self, zb_by_sample):
+        probs_per_dim = zb_by_sample.sum(1) / zb_by_sample.shape[1]
+        persample_entropy = - probs_per_dim * torch.log(probs_per_dim + 1e-8) - (1 - probs_per_dim) * torch.log(1 - probs_per_dim + 1e-8)
+        persample_entropy = persample_entropy.sum(-1)
+        return persample_entropy.mean()
+    def codes_to_indexes(self, zhat):
+        """Converts a `code` to an index in the codebook.
+        Args:
+            zhat: A tensor of shape (B, ..., C) containing the codes. must be in {-1, 1}
+        """
+        assert zhat.shape[-1] == self.embed_dim, f"Expected {self.embed_dim} dimensions, got {zhat.shape[-1]}"
+        return ((zhat + 1) / 2 * self.basis).sum(axis=-1).to(torch.int64)
+    def codes_to_group_indexes(self, zhat):
+        """Converts a `code` to a list of indexes (in groups) in the codebook.
+        Args:
+            zhat: A tensor of shape (B, ..., C) containing the codes. must be in {-1, 1}
+        """
+        zhat_in_group = rearrange(zhat, 'b ... (g c) -> b ... g c', c=self.group_size)
+        return ((zhat_in_group + 1) / 2 * self.group_basis).sum(axis=-1).to(torch.int64)
+    def indexes_to_codes(self, indices):
+        """Inverse of `codes_to_indexes`."""
+        indices = indices.unsqueeze(-1)
+        codes_non_centered = torch.remainder(
+            torch.floor_divide(indices, self.basis), 2
+        )
+        return codes_non_centered * 2 - 1
+    def group_indexes_to_codes(self, group_indices):
+        """Inverse of `codes_to_group_indexes`."""
+        group_indices = group_indices.unsqueeze(-1)
+        codes_non_centered = torch.remainder(
+            torch.floor_divide(group_indices, self.group_basis), 2
+        )
+        codes_non_centered = rearrange(codes_non_centered, 'b ... g c -> b ... (g c)')
+        return codes_non_centered * 2 - 1
+    def get_entropy(self, count, dim=-1, eps=1e-4, normalize=True):
+        if normalize:
+            probs = (count + eps) / (count + eps).sum(dim=dim, keepdim=True)
+        else:
+            probs = count
+        H = -(probs * torch.log(probs + 1e-8)).sum(dim=dim)
+        return H
+    def get_group_codebook_entry(self, group_indices):
+        z_q = self.group_indexes_to_codes(group_indices)
+        q_scale = 1. / (self.embed_dim ** 0.5) if self.l2_norm else 1.
+        z_q = z_q * q_scale
+        if self.input_format == 'bchw':
+            h = w = int(z_q.shape[1] ** 0.5)  # square spatial grid
+            assert h * w == z_q.shape[1], 'Invalid sequence length'
+            z_q = rearrange(z_q, 'b (h w) c -> b c h w', h=h)
+        return z_q
+    def get_codebook_entry(self, indices):
+        z_q = self.indexes_to_codes(indices)
+        q_scale = 1. / (self.embed_dim ** 0.5) if self.l2_norm else 1.
+        z_q = z_q * q_scale
+        if self.input_format == 'bchw':
+            h = w = int(z_q.shape[1] ** 0.5)  # square spatial grid
+            assert h * w == z_q.shape[1], 'Invalid sequence length'
+            z_q = rearrange(z_q, 'b (h w) c -> b c h w', h=h)
+        return z_q
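`quantize` maps each dimension to ±1 by its sign, with zero going to -1. With `l2_norm=True` the result is later scaled by `1/sqrt(embed_dim)`, so every code lies on the unit sphere. In PyTorch, `z + (zhat - z).detach()` forwards `zhat` while passing gradients straight through to `z`. NumPy has no autograd, so the sketch below shows only the forward values and the index round trip between `codes_to_indexes` and `indexes_to_codes`. The names mirror the class methods, but these are standalone re-implementations.

```python
import numpy as np


def quantize(z, l2_norm=True):
    """Forward value of BinarySphericalQuantizer.quantize, optionally scaled onto the unit sphere."""
    K = z.shape[-1]
    zhat = np.where(z > 0, 1.0, -1.0)  # zero maps to -1, as in quantize
    return zhat / np.sqrt(K) if l2_norm else zhat


def codes_to_indexes(codes):
    """Map codes in {-1, +1}^K to integers with an MSB-first basis."""
    K = codes.shape[-1]
    basis = 2 ** np.arange(K - 1, -1, -1)
    return (((codes + 1) // 2) * basis).sum(axis=-1).astype(np.int64)


def indexes_to_codes(indices, K):
    """Inverse of codes_to_indexes: integers back to {-1, +1}^K codes."""
    basis = 2 ** np.arange(K - 1, -1, -1)
    bits = (np.asarray(indices)[..., None] // basis) % 2
    return bits * 2 - 1
```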
+class BSQuantizer(nn.Module):
+    def __init__(self, s1_bits, s2_bits, beta, gamma0, gamma, zeta, group_size):
+        super().__init__()
+        self.codebook_dim = s1_bits + s2_bits
+        self.s1_bits = s1_bits
+        self.s2_bits = s2_bits
+        self.bsq = BinarySphericalQuantizer(self.codebook_dim, beta, gamma0, gamma, zeta, group_size=group_size)
+    def bits_to_indices(self, bits):
+        bits = (bits >= 0).to(torch.long)
+        indices = 2 ** torch.arange(
+            0,
+            bits.shape[-1],
+            1,
+            dtype=torch.long,
+            device=bits.device,
+        )
+        return (bits * indices).sum(-1)
+    def forward(self, z, half=False):
+        z = F.normalize(z, dim=-1)
+        quantized, bsq_loss, metrics = self.bsq(z)
+        if half:
+            q_pre = quantized[:, :, :self.s1_bits]
+            q_post = quantized[:, :, self.s1_bits:]
+            z_indices = [self.bits_to_indices(q_pre), self.bits_to_indices(q_post)]
+        else:
+            z_indices = self.bits_to_indices(quantized)
+        return bsq_loss, quantized, z_indices
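`BSQuantizer.bits_to_indices` weights dimension `i` by `2 ** i` (least significant bit first). The `basis` buffer in `BinarySphericalQuantizer` is MSB-first, so the two produce different integers for the same code. With `half=True`, the first `s1_bits` dimensions become the coarse index and the rest become the fine index. The NumPy sketch below uses small hypothetical sizes and shows that the full-code index then equals `s1 + (s2 << s1_bits)`.

```python
import numpy as np


def bits_to_indices(bits):
    """LSB-first: dimension i contributes 2**i when its value is >= 0."""
    b = (np.asarray(bits) >= 0).astype(np.int64)
    return int((b * 2 ** np.arange(b.shape[-1])).sum(-1))


s1_bits, s2_bits = 3, 2                                        # hypothetical small sizes
q = np.array([1, -1, 1, 1, -1]) / np.sqrt(s1_bits + s2_bits)  # a quantized code
s1 = bits_to_indices(q[:s1_bits])  # coarse index, from the first s1_bits dimensions
s2 = bits_to_indices(q[s1_bits:])  # fine index, from the remaining dimensions
full = bits_to_indices(q)          # index of the whole code (half=False)
```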
+class RMSNorm(torch.nn.Module):
+    def __init__(self, dim: int, eps: float = 1e-5):
+        super().__init__()
+        self.eps = eps
+        self.weight = nn.Parameter(torch.ones(dim))
+    def _norm(self, x):
+        return x * torch.rsqrt(torch.mean(x * x, dim=-1, keepdim=True) + self.eps)
+    def forward(self, x):
+        output = self._norm(x.float()).type_as(x)
+        return output * self.weight
+class FeedForward(nn.Module):
+    def __init__(self, d_model, ff_dim, ffn_dropout_p=0.0):
+        super().__init__()
+        self.w1 = nn.Linear(d_model, ff_dim, bias=False)
+        self.w3 = nn.Linear(d_model, ff_dim, bias=False)
+        self.w2 = nn.Linear(ff_dim, d_model, bias=False)
+        self.ffn_dropout = nn.Dropout(ffn_dropout_p)
+    def forward(self, x):
+        return self.ffn_dropout(self.w2(F.silu(self.w1(x)) * self.w3(x)))
+class RotaryPositionalEmbedding(nn.Module):
+    def __init__(self, dim):
+        super().__init__()
+        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
+        self.register_buffer("inv_freq", inv_freq)
+        self.seq_len_cached = None
+        self.cos_cached = None
+        self.sin_cached = None
+    def _update_cos_sin_cache(self, x, seq_len):
+        if seq_len != self.seq_len_cached:
+            self.seq_len_cached = seq_len
+            t = torch.arange(seq_len, device=x.device).type_as(self.inv_freq)
+            freqs = torch.einsum('i,j->ij', t, self.inv_freq)
+            emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
+            self.cos_cached = emb.cos()[None, None, :, :]
+            self.sin_cached = emb.sin()[None, None, :, :]
+        return self.cos_cached, self.sin_cached
+    def forward(self, q, k):
+        cos, sin = self._update_cos_sin_cache(q, q.shape[-2])
+        return (
+            (q * cos) + (self._rotate_half(q) * sin),
+            (k * cos) + (self._rotate_half(k) * sin),
+        )
+    def _rotate_half(self, x):
+        x1, x2 = x.chunk(2, dim=-1)
+        return torch.cat((-x2, x1), dim=-1)
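`RotaryPositionalEmbedding` pairs dimension `j` with dimension `j + dim/2`, because `emb` concatenates `freqs` with itself and `_rotate_half` swaps the two halves. It rotates each pair by `position * inv_freq[j]`. The NumPy sketch below re-implements this for a single vector and checks the two properties RoPE relies on. First, rotation preserves the norm. Second, the dot product of a rotated query and key depends only on their relative offset. Helper names are hypothetical.

```python
import numpy as np


def rope_tables(seq_len, dim, base=10000.0):
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    freqs = np.outer(np.arange(seq_len), inv_freq)  # [seq_len, dim/2]
    emb = np.concatenate([freqs, freqs], axis=-1)   # [seq_len, dim]
    return np.cos(emb), np.sin(emb)


def rotate_half(x):
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)


def apply_rope(x, cos, sin):
    # Each pair (x[j], x[j + dim/2]) is rotated by the angle in cos/sin
    return x * cos + rotate_half(x) * sin
```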
+def scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False, scale=None, training=True) -> torch.Tensor:
+    L, S = query.size(-2), key.size(-2)
+    scale_factor = 1 / math.sqrt(query.size(-1)) if scale is None else scale
+    attn_bias = torch.zeros(L, S, dtype=query.dtype).to(query.device)
+    if is_causal:
+        assert attn_mask is None
+        temp_mask = torch.ones(L, S, dtype=torch.bool).tril(diagonal=0).to(query.device)
+        attn_bias.masked_fill_(temp_mask.logical_not(), float("-inf"))
+    attn_weight = query @ key.transpose(-2, -1) * scale_factor
+    attn_weight += attn_bias
+    if attn_mask is not None:
+        attn_mask_bias = torch.zeros_like(attn_weight)
+        if attn_mask.dtype == torch.bool:
+            attn_mask_bias.masked_fill_(attn_mask, float("-inf"))
+        else:
+            attn_mask_bias += attn_mask
+        attn_weight += attn_mask_bias
+    attn_weight = torch.softmax(attn_weight, dim=-1)
+    attn_weight = torch.dropout(attn_weight, dropout_p, train=training)
+    return attn_weight @ value
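The hand-written `scaled_dot_product_attention` adds `-inf` for future positions when `is_causal=True`. For boolean masks, it also adds `-inf` wherever the mask is `True`. This is the opposite convention from `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that may be attended, so the two are not drop-in replacements. The NumPy sketch below reproduces the file's convention, with dropout omitted. The name `attention` is hypothetical.

```python
import numpy as np


def attention(q, k, v, attn_mask=None, is_causal=False):
    """Softmax attention; boolean attn_mask entries that are True are excluded."""
    L, S = q.shape[-2], k.shape[-2]
    scores = q @ np.swapaxes(k, -2, -1) / np.sqrt(q.shape[-1])
    if is_causal:
        # Lower-triangular mask: query i may attend to keys 0..i
        scores = np.where(np.tril(np.ones((L, S), dtype=bool)), scores, -np.inf)
    if attn_mask is not None:
        scores = np.where(attn_mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v
```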
+class MultiHeadAttentionWithRoPE(nn.Module):
+    def __init__(self, d_model, n_heads, attn_dropout_p=0.0, resid_dropout_p=0.0):
+        super().__init__()
+        self.d_model = d_model
+        self.n_heads = n_heads
+        self.head_dim = d_model // n_heads
+        self.q_proj = nn.Linear(d_model, d_model)
+        self.k_proj = nn.Linear(d_model, d_model)
+        self.v_proj = nn.Linear(d_model, d_model)
+        self.out_proj = nn.Linear(d_model, d_model)
+        self.rotary = RotaryPositionalEmbedding(self.head_dim)
+        self.attn_dropout_p = attn_dropout_p
+        self.resid_dropout = nn.Dropout(resid_dropout_p)
+    def forward(self, x, key_padding_mask=None):
+        batch_size, seq_len, _ = x.shape
+        q = self.q_proj(x).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
+        k = self.k_proj(x).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
+        v = self.v_proj(x).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
+        q, k = self.rotary(q, k)
+        if key_padding_mask is not None:
+            attn_mask = key_padding_mask.unsqueeze(1).unsqueeze(2)  # [batch, 1, 1, seq_len]
+            attn_mask = attn_mask.expand(-1, self.n_heads, seq_len, -1)  # [batch, n_heads, q_len, k_len]
+        else:
+            attn_mask = None
+        attn_output = scaled_dot_product_attention(
+            q, k, v,
+            attn_mask=attn_mask,
+            dropout_p=self.attn_dropout_p,
+            is_causal=True,
+            training=self.training
+        )
+        attn_output = attn_output.transpose(1, 2).contiguous().view(batch_size, seq_len, self.d_model)
+        return self.resid_dropout(self.out_proj(attn_output))
+class MultiHeadCrossAttentionWithRoPE(nn.Module):
+    def __init__(self, d_model, n_heads, attn_dropout_p=0.0, resid_dropout=0.0):
+        super().__init__()
+        self.d_model = d_model
+        self.n_heads = n_heads
+        self.head_dim = d_model // n_heads
+        self.q_proj = nn.Linear(d_model, d_model)
+        self.k_proj = nn.Linear(d_model, d_model)
+        self.v_proj = nn.Linear(d_model, d_model)
+        self.out_proj = nn.Linear(d_model, d_model)
+        self.rotary = RotaryPositionalEmbedding(self.head_dim)
+        self.attn_dropout_p = attn_dropout_p
+        self.resid_dropout = nn.Dropout(resid_dropout)
+    def forward(self, query, key, value, key_padding_mask=None):
+        batch_size, q_len, _ = query.shape
+        _, seq_len, _ = key.shape
+        q = self.q_proj(query).view(batch_size, q_len, self.n_heads, self.head_dim).transpose(1, 2)
+        k = self.k_proj(key).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
+        v = self.v_proj(value).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
+        q, k = self.rotary(q, k)
+        if key_padding_mask is not None:
+            attn_mask = key_padding_mask.unsqueeze(1).unsqueeze(2)
+            attn_mask = attn_mask.expand(-1, self.n_heads, q_len, -1)
+        else:
+            attn_mask = None
+        is_causal_flag = self.training
+        attn_output = scaled_dot_product_attention(
+            q, k, v,
+            attn_mask=attn_mask,
+            dropout_p=self.attn_dropout_p,
+            is_causal=is_causal_flag,
+            training=self.training
+        )
+        attn_output = attn_output.transpose(1, 2).contiguous().view(batch_size, q_len, self.d_model)
+        return self.resid_dropout(self.out_proj(attn_output))
+class HierarchicalEmbedding(nn.Module):
+    def __init__(self, s1_bits, s2_bits, d_model=256):
+        super().__init__()
+        self.s1_bits = s1_bits
+        self.s2_bits = s2_bits
+        vocab_s1 = 2 ** s1_bits
+        vocab_s2 = 2 ** s2_bits
+        self.emb_s1 = nn.Embedding(vocab_s1, d_model)
+        self.emb_s2 = nn.Embedding(vocab_s2, d_model)
+        self.d_model = d_model
+        self.fusion_proj = nn.Linear(d_model * 2, d_model)
+        nn.init.normal_(self.emb_s1.weight, mean=0, std=d_model ** -0.5)
+        nn.init.normal_(self.emb_s2.weight, mean=0, std=d_model ** -0.5)
+    def forward(self, token_ids):
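+        """Inputs:
+        token_ids: either an (s1_ids, s2_ids) tuple/list of [batch_size, seq_len] tensors,
+            or a single [batch_size, seq_len] tensor of combined IDs that is split with self.split_token
+        Output: [batch_size, seq_len, d_model]
+        """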
+        if isinstance(token_ids, tuple) or isinstance(token_ids, list):
+            s1_ids, s2_ids = token_ids
+        else:
+            s1_ids, s2_ids = self.split_token(token_ids, self.s2_bits)
+        s1_emb = self.emb_s1(s1_ids) * math.sqrt(self.d_model)
+        s2_emb = self.emb_s2(s2_ids) * math.sqrt(self.d_model)
+        return self.fusion_proj(torch.cat([s1_emb, s2_emb], dim=-1))
+class DependencyAwareLayer(nn.Module):
+    def __init__(self, d_model, n_heads=4, attn_dropout_p=0.0, resid_dropout=0.0):
+        super().__init__()
+        self.cross_attn = MultiHeadCrossAttentionWithRoPE(d_model, n_heads, attn_dropout_p, resid_dropout)
+        self.norm = RMSNorm(d_model)
+    def forward(self, hidden_states, sibling_embed, key_padding_mask=None):
+        """hidden_states: [batch, seq_len, d_model]
+        sibling_embed: Embedding from another subtoken
+        """
+        attn_out = self.cross_attn(
+            query=sibling_embed,
+            key=hidden_states,
+            value=hidden_states,
+            key_padding_mask=key_padding_mask
+        )
+        return self.norm(hidden_states + attn_out)
+class TransformerBlock(nn.Module):
+    def __init__(self, d_model, n_heads, ff_dim=1024, ffn_dropout_p=0.0, attn_dropout_p=0.0, resid_dropout_p=0.0):
+        super().__init__()
+        self.norm1 = RMSNorm(d_model)
+        self.self_attn = MultiHeadAttentionWithRoPE(d_model, n_heads, attn_dropout_p, resid_dropout_p)
+        self.norm2 = RMSNorm(d_model)
+        self.ffn = FeedForward(d_model, ff_dim, ffn_dropout_p)
+    def forward(self, x, key_padding_mask=None):
+        residual = x
+        x = self.norm1(x)
+        attn_out = self.self_attn(x, key_padding_mask=key_padding_mask)
+        x = residual + attn_out
+        residual = x
+        x = self.norm2(x)
+        ffn_out = self.ffn(x)
+        x = residual + ffn_out
+        return x
+class DualHead(nn.Module):
+    def __init__(self, s1_bits, s2_bits, d_model):
+        super().__init__()
+        self.vocab_s1 = 2 ** s1_bits
+        self.vocab_s2 = 2 ** s2_bits
+        self.proj_s1 = nn.Linear(d_model, self.vocab_s1)
+        self.proj_s2 = nn.Linear(d_model, self.vocab_s2)
+    def compute_loss(self, s1_logits, s2_logits, s1_targets, s2_targets, padding_mask=None):
+        if padding_mask is not None:
+            valid_mask = (padding_mask == 0)
+            s1_logits = s1_logits[valid_mask]
+            s2_logits = s2_logits[valid_mask]
+            s1_targets = s1_targets[valid_mask]
+            s2_targets = s2_targets[valid_mask]
+            ce_s1 = F.cross_entropy(s1_logits, s1_targets)
+            ce_s2 = F.cross_entropy(s2_logits, s2_targets)
+        else:
+            ce_s1 = F.cross_entropy(s1_logits.reshape(-1, self.vocab_s1), s1_targets.reshape(-1))
+            ce_s2 = F.cross_entropy(s2_logits.reshape(-1, self.vocab_s2), s2_targets.reshape(-1))
+        ce_loss = (ce_s1 + ce_s2) / 2
+        return ce_loss, ce_s1, ce_s2
+    def forward(self, x):
+        return self.proj_s1(x)
+    def cond_forward(self, x2):
+        return self.proj_s2(x2)
+class FixedEmbedding(nn.Module):
+    def __init__(self, c_in, d_model):
+        super(FixedEmbedding, self).__init__()
+        w = torch.zeros(c_in, d_model).float()
+        w.requires_grad = False
+        position = torch.arange(0, c_in).float().unsqueeze(1)
+        div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
+        w[:, 0::2] = torch.sin(position * div_term)
+        w[:, 1::2] = torch.cos(position * div_term)
+        self.emb = nn.Embedding(c_in, d_model)
+        self.emb.weight = nn.Parameter(w, requires_grad=False)
+    def forward(self, x):
+        return self.emb(x).detach()
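`FixedEmbedding` fills a frozen lookup table with the standard sine/cosine encoding: even columns hold `sin(pos * div_term)` and odd columns hold `cos(pos * div_term)`. The NumPy sketch below builds the same table; the function name is hypothetical. It assumes an even `d_model`. `TemporalEmbedding` uses this table when `learn_pe` is false, with `c_in` set to the calendar-field sizes.

```python
import numpy as np


def sinusoid_table(c_in, d_model):
    """Rows are positions 0..c_in-1; even columns sin, odd columns cos."""
    position = np.arange(c_in, dtype=np.float64)[:, None]
    div_term = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))
    w = np.zeros((c_in, d_model))
    w[:, 0::2] = np.sin(position * div_term)
    w[:, 1::2] = np.cos(position * div_term)
    return w


hour_table = sinusoid_table(24, 8)  # e.g. the hour table with a small d_model
```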
+class TemporalEmbedding(nn.Module):
+    def __init__(self, d_model, learn_pe):
+        super(TemporalEmbedding, self).__init__()
+        minute_size = 60
+        hour_size = 24
+        weekday_size = 7
+        day_size = 32
+        month_size = 13
+        Embed = FixedEmbedding if not learn_pe else nn.Embedding
+        self.minute_embed = Embed(minute_size, d_model)
+        self.hour_embed = Embed(hour_size, d_model)
+        self.weekday_embed = Embed(weekday_size, d_model)
+        self.day_embed = Embed(day_size, d_model)
+        self.month_embed = Embed(month_size, d_model)
+    def forward(self, x):
+        x = x.long()
+        minute_x = self.minute_embed(x[:, :, 0])
+        hour_x = self.hour_embed(x[:, :, 1])
+        weekday_x = self.weekday_embed(x[:, :, 2])
+        day_x = self.day_embed(x[:, :, 3])
+        month_x = self.month_embed(x[:, :, 4])
+        return hour_x + weekday_x + day_x + month_x + minute_x

models/predictor/README.md ADDED

	@@ -0,0 +1,10 @@

+---
+tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
+---
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Code: [More Information Needed]
+- Paper: [More Information Needed]
+- Docs: [More Information Needed]

models/predictor/config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "attn_dropout_p": 0.0,
+  "d_model": 256,
+  "ff_dim": 512,
+  "ffn_dropout_p": 0.2,
+  "learn_te": true,
+  "n_heads": 4,
+  "n_layers": 4,
+  "resid_dropout_p": 0.2,
+  "s1_bits": 10,
+  "s2_bits": 10,
+  "token_dropout_p": 0.0
+}
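Several sizes used in `model/module.py` follow from these config values. With 10 bits per sub-token, `HierarchicalEmbedding` and `DualHead` each use `2 ** 10 = 1024` entries. `d_model / n_heads` gives the per-head dimension that `RotaryPositionalEmbedding` receives, and that dimension must be even. The sketch below derives those numbers from the JSON. It assumes the keys are passed to the module constructors under the same names, which this section does not show.

```python
import json

config = json.loads("""
{
  "attn_dropout_p": 0.0,
  "d_model": 256,
  "ff_dim": 512,
  "ffn_dropout_p": 0.2,
  "learn_te": true,
  "n_heads": 4,
  "n_layers": 4,
  "resid_dropout_p": 0.2,
  "s1_bits": 10,
  "s2_bits": 10,
  "token_dropout_p": 0.0
}
""")

vocab_s1 = 2 ** config["s1_bits"]  # rows of emb_s1 / outputs of proj_s1
vocab_s2 = 2 ** config["s2_bits"]  # rows of emb_s2 / outputs of proj_s2
head_dim = config["d_model"] // config["n_heads"]
assert config["d_model"] % config["n_heads"] == 0, "d_model must split evenly across heads"
assert head_dim % 2 == 0, "rotary embedding rotates pairs of dimensions"
```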