fengwm committed
Commit 2a8a0f5 · 1 Parent(s): 292fb60

Update README: switch data source to AkShare, unify the request field to `symbol`, improve the cache mechanism, add performance timing logs

Files changed (5)
  1. README.md +42 -15
  2. app.py +50 -15
  3. data_fetcher.py +103 -38
  4. predictor.py +63 -14
  5. requirements.txt +1 -1
README.md CHANGED

@@ -11,12 +11,22 @@ pinned: false
 
 基于[清华大学 Kronos 金融 K 线基础大模型](https://arxiv.org/abs/2508.02739)的 A 股概率预测 REST API。
 
-- **数据源**:Tushare Pro,前复权(qfq)
-- **推理方式**:蒙特卡洛多次采样,输出预测方向、置信度及 95% 交易区间
+- **数据源**:AkShare 东方财富 A 股日线,前复权(qfq)
+- **推理方式**:蒙特卡洛分批采样,输出预测方向、置信度及 95% 交易区间
 - **异步任务**:POST 提交 → 返回 `task_id` → GET 轮询结果
 
 ---
 
+## 更新说明(2026-03-16)
+
+- 请求/响应主字段由 `ts_code` 统一为 `symbol`
+- 数据源切换为 AkShare 东财日线接口(`stock_zh_a_hist`,前复权 `qfq`)
+- 方向概率 `direction.probability` 定义为“预测区间内看涨概率(0–1)”
+- 推理路径改为分批采样(可通过 `MC_BATCH_SIZE` 调整批大小)
+- 新增阶段耗时日志(`fetch/calendar/infer/build/cache/total`)
+
+---
+
 ## 模型信息
 
 | 项目 | 值 |
@@ -38,7 +48,7 @@ pinned: false
 curl -X POST "https://yingfeng64-kronos-api.hf.space/api/v1/predict" \
   -H "Content-Type: application/json" \
   -d '{
-    "ts_code": "000063.SZ",
+    "symbol": "000063.SZ",
     "lookback": 512,
     "pred_len": 5,
     "sample_count": 30,
@@ -66,7 +76,7 @@ import time, requests
 BASE = "https://yingfeng64-kronos-api.hf.space"
 
 resp = requests.post(f"{BASE}/api/v1/predict", json={
-    "ts_code": "000063.SZ",
+    "symbol": "000063.SZ",
     "lookback": 512,
     "pred_len": 5,
     "sample_count": 30,
@@ -92,7 +102,7 @@ print(r["result"])
 
 | 字段 | 类型 | 默认值 | 范围 | 说明 |
 |---|---|---|---|---|
-| `ts_code` | string | — | — | Tushare代码, `"000063.SZ"` |
+| `symbol` | string | — | — | A 股代码,支持 `"603777"` 或 `"000063.SZ"` |
 | `lookback` | int | 512 | 20–512 | 回看历史 K 线根数 |
 | `pred_len` | int | 5 | 1–60 | 预测未来交易日数(建议 ≤ 30) |
 | `sample_count` | int | 30 | 1–100 | MC 蒙特卡洛采样次数 |
@@ -103,7 +113,7 @@ print(r["result"])
 
 ```json
 {
-  "ts_code": "000063.SZ",
+  "symbol": "000063.SZ",
   "base_date": "2026-03-13",
   "pred_len": 5,
   "confidence": 95,
@@ -176,9 +186,9 @@ curl -X POST "https://yingfeng64-kronos-api.hf.space/api/v1/predict/batch" \
   -H "Content-Type: application/json" \
   -d '{
     "requests": [
-      {"ts_code": "000063.SZ", "pred_len": 5, "sample_count": 30},
-      {"ts_code": "600900.SH", "pred_len": 5, "sample_count": 30},
-      {"ts_code": "000001.SZ", "pred_len": 5, "sample_count": 30}
+      {"symbol": "000063", "pred_len": 5, "sample_count": 30},
+      {"symbol": "600900", "pred_len": 5, "sample_count": 30},
+      {"symbol": "000001", "pred_len": 5, "sample_count": 30}
     ]
   }'
 ```
@@ -227,11 +237,11 @@ curl -X POST "https://yingfeng64-kronos-api.hf.space/api/v1/predict/batch" \
 
 | 参数 | 说明 |
 |---|---|
-| `ts_code`(可选) | 只返回该股票的缓存,不传则返回全部 |
+| `symbol`(可选) | 只返回该股票的缓存,不传则返回全部 |
 
 ```bash
 # 查某只股票
-curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache?ts_code=000063.SZ"
+curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache?symbol=000063"
 
 # 查全部
 curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache"
@@ -242,7 +252,7 @@ curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache"
   "count": 1,
   "entries": [
     {
-      "ts_code": "000063.SZ",
+      "symbol": "000063",
       "lookback": 512,
       "pred_len": 5,
       "sample_count": 30,
@@ -271,8 +281,8 @@ curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache"
 | 字段 | 含义 |
 |---|---|
 | `base_date` | 预测所基于的最后一个历史 K 线日期 |
-| `direction.signal` | `"bullish"` / `"bearish"`,MC 样本中末日收盘价 > 当前收盘的比例决定 |
-| `direction.probability` | 看涨概率(0–1) |
+| `direction.signal` | `"bullish"` / `"bearish"`,由 `direction.probability >= 0.5` 决定 |
+| `direction.probability` | 预测区间内看涨概率(0–1) |
 | `trading_low` | 该日预测最低价的 q2.5 分位数(95% 交易区间下沿) |
 | `trading_high` | 该日预测最高价的 q97.5 分位数(95% 交易区间上沿) |
 | `uncertainty` | `(trading_high − trading_low) / last_close`,无量纲不确定性 |
@@ -284,7 +294,7 @@ curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache"
 
 ## 缓存机制
 
-缓存 key 由 `(ts_code, lookback, pred_len, sample_count, mode, include_volume)` 六元组构成,失效时机为下一个 A 股交易日收盘(15:00 CST)。
+缓存 key 由 `(symbol, lookback, pred_len, sample_count, mode, include_volume)` 六元组构成,失效时机为下一个 A 股交易日收盘(15:00 CST)。
 
 | 请求时间(CST) | 缓存过期时间 |
 |---|---|
@@ -297,6 +307,23 @@ curl "https://yingfeng64-kronos-api.hf.space/api/v1/cache"
 
 ---
 
+## 运行配置
+
+| 环境变量 | 默认值 | 说明 |
+|---|---|---|
+| `KRONOS_DIR` | `/app/Kronos` | Kronos 源码目录 |
+| `MC_BATCH_SIZE` | `8` | 蒙特卡洛分批采样大小(越大通常越快,但占用显存/内存更高) |
+
+---
+
+## 可观测性
+
+服务会在 `INFO` 日志输出预测阶段耗时,示例:
+
+```text
+Task <task_id> timing symbol=300065.SZ fetch=...ms calendar=...ms infer=...ms build=...ms cache=...ms total=...ms
+```
+
 ## 性能参考
 
 | 环境 | 单次采样 | 30 次 MC 总耗时 |
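The README hunks above redefine `direction.probability` as the horizon-level bullish probability and `uncertainty` as the 95% band width scaled by the last close. A minimal NumPy sketch of those two formulas, using synthetic sample paths (the seed, array shapes, and price level here are illustrative assumptions, not values from the service):

```python
import numpy as np

# Simulated Monte-Carlo close paths: (sample_count=30, pred_len=5).
rng = np.random.default_rng(0)
last_close = 10.0
close_paths = last_close * (1 + rng.normal(0.001, 0.02, size=(30, 5)).cumsum(axis=1))

# Horizon-level bullish probability: fraction of ALL sampled future points
# above the last historical close (the definition stated in the README).
direction_prob = float((close_paths > last_close).mean())
signal = "bullish" if direction_prob >= 0.5 else "bearish"

# 95% trading band per day and the dimensionless uncertainty measure.
trading_low = np.percentile(close_paths, 2.5, axis=0)
trading_high = np.percentile(close_paths, 97.5, axis=0)
uncertainty = (trading_high - trading_low) / last_close
```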
app.py CHANGED

@@ -15,6 +15,7 @@ import uuid
 from concurrent.futures import ThreadPoolExecutor
 from contextlib import asynccontextmanager
 from datetime import datetime, time, timedelta, timezone
+from time import perf_counter
 from typing import Literal, List
 
 import pandas as pd
@@ -38,8 +39,8 @@ def _next_cache_expiry() -> datetime:
     Return the UTC datetime of the NEXT A-share market close (15:00 CST on a
     weekday), which is when new candle data becomes available and the cache
     should be invalidated.
-    Chinese public holidays are intentionally ignored: on those days Tushare
-    returns the same last bar, so a cache hit is harmless.
+    Chinese public holidays are intentionally ignored: on those days market
+    data does not advance, so a cache hit is harmless.
     """
     now_cst = datetime.now(_CST)
     today_close = now_cst.replace(hour=15, minute=0, second=0, microsecond=0)
@@ -58,13 +59,13 @@ def _next_cache_expiry() -> datetime:
 
 
 # ── Result cache ──────────────────────────────────────────────────────────────
-# key   : (ts_code, lookback, pred_len, sample_count, mode, include_volume)
+# key   : (symbol, lookback, pred_len, sample_count, mode, include_volume)
 # value : {"result": dict, "expires_at": datetime(UTC), "cached_at": datetime(UTC)}
 _cache: dict[tuple, dict] = {}
 
 
 def _cache_key(req: "PredictRequest") -> tuple:
-    return (req.ts_code, req.lookback, req.pred_len,
+    return (req.symbol, req.lookback, req.pred_len,
             req.sample_count, req.mode, req.include_volume)
@@ -84,7 +85,7 @@ def _set_cache(req: "PredictRequest", result: dict) -> None:
     }
     logger.info(
         "Cached %s, expires at %s CST",
-        req.ts_code,
+        req.symbol,
         _cache[_cache_key(req)]["expires_at"].astimezone(_CST).strftime("%Y-%m-%d %H:%M"),
     )
@@ -124,7 +125,11 @@ app.add_middleware(
 
 # ── Request / Response schemas ────────────────────────────────────────────────
 class PredictRequest(BaseModel):
-    ts_code: str = Field(..., examples=["600900.SH"], description="Tushare 股票代码")
+    symbol: str = Field(
+        ...,
+        examples=["603777", "600900.SH"],
+        description="A 股代码;支持 6 位代码或带市场后缀(如 600900.SH)",
+    )
     lookback: int = Field(
         default=512,
         ge=20,
@@ -180,7 +185,7 @@ def _build_response(req: PredictRequest, base_date: str, pred_mean, ci,
         bands.append(band)
 
     result: dict = {
-        "ts_code": req.ts_code,
+        "symbol": req.symbol,
         "base_date": base_date,
         "pred_len": req.pred_len,
         "confidence": 95,
@@ -214,12 +219,18 @@ def _build_response(req: PredictRequest, base_date: str, pred_mean, ci,
 
 # ── Background task ───────────────────────────────────────────────────────────
 def _run_prediction(task_id: str, req: PredictRequest) -> None:
+    t_total_start = perf_counter()
     try:
         # ── Cache check ───────────────────────────────────────────────────────
         cache_entry = _get_cached(req)
         if cache_entry is not None:
-            logger.info("Cache hit for %s (expires %s CST)", req.ts_code,
-                        cache_entry["expires_at"].astimezone(_CST).strftime("%Y-%m-%d %H:%M"))
+            total_ms = (perf_counter() - t_total_start) * 1000
+            logger.info(
+                "Cache hit for %s (expires %s CST, total=%.1fms)",
+                req.symbol,
+                cache_entry["expires_at"].astimezone(_CST).strftime("%Y-%m-%d %H:%M"),
+                total_ms,
+            )
             _tasks[task_id] = {
                 "status": "done",
                 "result": {**cache_entry["result"], "cached": True,
@@ -229,26 +240,37 @@ def _run_prediction(task_id: str, req: PredictRequest) -> None:
             return
 
         # ── Full inference ────────────────────────────────────────────────────
+        t_fetch_start = perf_counter()
         x_df, x_timestamp, last_trade_date = data_fetcher.fetch_stock_data(
-            req.ts_code, req.lookback
+            req.symbol, req.lookback
         )
+        fetch_ms = (perf_counter() - t_fetch_start) * 1000
+
+        t_calendar_start = perf_counter()
         y_timestamp = data_fetcher.get_future_trading_dates(last_trade_date, req.pred_len)
+        calendar_ms = (perf_counter() - t_calendar_start) * 1000
 
+        t_infer_start = perf_counter()
         pred_mean, ci, trading_low, trading_high, direction_prob, last_close = (
             pred_module.run_mc_prediction(
                 x_df, x_timestamp, y_timestamp, req.pred_len, req.sample_count
            )
        )
+        infer_ms = (perf_counter() - t_infer_start) * 1000
 
+        t_build_start = perf_counter()
         base_date = str(pd.to_datetime(last_trade_date, format="%Y%m%d").date())
         result = _build_response(
             req, base_date, pred_mean, ci,
             trading_low, trading_high, direction_prob, last_close, y_timestamp,
         )
+        build_ms = (perf_counter() - t_build_start) * 1000
 
         # ── Store in cache ────────────────────────────────────────────────────
+        t_cache_start = perf_counter()
         _set_cache(req, result)
         cache_entry = _cache[_cache_key(req)]
+        cache_ms = (perf_counter() - t_cache_start) * 1000
 
         _tasks[task_id] = {
             "status": "done",
@@ -256,8 +278,21 @@ def _run_prediction(task_id: str, req: PredictRequest) -> None:
                        "cache_expires_at": cache_entry["expires_at"].astimezone(_CST).strftime("%Y-%m-%d %H:%M:%S %Z")},
             "error": None,
         }
+        total_ms = (perf_counter() - t_total_start) * 1000
+        logger.info(
+            "Task %s timing symbol=%s fetch=%.1fms calendar=%.1fms infer=%.1fms build=%.1fms cache=%.1fms total=%.1fms",
+            task_id,
+            req.symbol,
+            fetch_ms,
+            calendar_ms,
+            infer_ms,
+            build_ms,
+            cache_ms,
+            total_ms,
+        )
     except Exception as exc:
-        logger.exception("Task %s failed", task_id)
+        total_ms = (perf_counter() - t_total_start) * 1000
+        logger.exception("Task %s failed after %.1fms", task_id, total_ms)
         _tasks[task_id] = {"status": "failed", "result": None, "error": str(exc)}
@@ -297,22 +332,22 @@ async def get_predict_result(task_id: str):
 
 
 @app.get("/api/v1/cache", summary="查看缓存状态")
-async def get_cache(ts_code: str | None = None):
+async def get_cache(symbol: str | None = None):
     """
     列出有效的缓存条目及其过期时间。
 
     - 不传参数:返回全部
-    - `?ts_code=000063.SZ`:只返回该股票的所有参数组合
+    - `?symbol=000063.SZ`:只返回该股票的所有参数组合
     """
     now_utc = datetime.now(timezone.utc)
     entries = []
     for key, entry in _cache.items():
-        if ts_code and key[0] != ts_code:
+        if symbol and key[0] != symbol:
             continue
         remaining = (entry["expires_at"] - now_utc).total_seconds()
         if remaining > 0:
             entries.append({
-                "ts_code": key[0],
+                "symbol": key[0],
                 "lookback": key[1],
                 "pred_len": key[2],
                 "sample_count": key[3],
data_fetcher.py CHANGED

@@ -1,19 +1,66 @@
-import os
 from datetime import datetime, timedelta
+import threading
 
+import akshare as ak
 import pandas as pd
-import tushare as ts
 
-TUSHARE_TOKEN = os.environ.get(
-    "TUSHARE_TOKEN",
-)
+_TRADE_CALENDAR_CACHE: pd.DatetimeIndex | None = None
+_TRADE_CALENDAR_CACHED_AT: datetime | None = None
+_TRADE_CALENDAR_CACHE_TTL = timedelta(hours=12)
+_TRADE_CALENDAR_LOCK = threading.Lock()
 
-ts.set_token(TUSHARE_TOKEN)
-_pro = ts.pro_api()
+
+def _normalize_symbol(raw_symbol: str) -> str:
+    """
+    Convert user input into the 6-digit stock code expected by
+    `ak.stock_zh_a_hist`.
+
+    Accepted examples:
+    - "603777"
+    - "600900.SH"
+    - "000063.SZ"
+    """
+    symbol = raw_symbol.strip().upper()
+    if "." in symbol:
+        symbol = symbol.split(".", 1)[0]
+    if len(symbol) != 6 or not symbol.isdigit():
+        raise ValueError(
+            f"Invalid stock code {raw_symbol!r}; expected 6 digits like '603777' "
+            "or Tushare-style code like '600900.SH'."
+        )
+    return symbol
+
+
+def _get_trade_calendar_cached() -> pd.DatetimeIndex:
+    """
+    Fetch and cache exchange trading dates in-process to avoid repeated
+    network calls on each request.
+    """
+    global _TRADE_CALENDAR_CACHE, _TRADE_CALENDAR_CACHED_AT
+
+    now = datetime.now()
+    with _TRADE_CALENDAR_LOCK:
+        if (
+            _TRADE_CALENDAR_CACHE is not None
+            and _TRADE_CALENDAR_CACHED_AT is not None
+            and (now - _TRADE_CALENDAR_CACHED_AT) < _TRADE_CALENDAR_CACHE_TTL
+        ):
+            return _TRADE_CALENDAR_CACHE
+
+    cal = ak.tool_trade_date_hist_sina()
+    cal_col = "trade_date" if "trade_date" in cal.columns else "日期"
+    all_dates = pd.to_datetime(cal[cal_col]).sort_values().drop_duplicates()
+    cached = pd.DatetimeIndex(all_dates)
+
+    with _TRADE_CALENDAR_LOCK:
+        _TRADE_CALENDAR_CACHE = cached
+        _TRADE_CALENDAR_CACHED_AT = now
+
+    return cached
 
 
 def fetch_stock_data(
-    ts_code: str, lookback: int
+    symbol: str, lookback: int
 ) -> tuple[pd.DataFrame, pd.Series, str]:
     """
     Returns:
@@ -21,31 +68,51 @@ def fetch_stock_data(
       x_timestamp : pd.Series[datetime], aligned to x_df
       last_trade_date: str "YYYYMMDD", the most recent bar date
     """
+    normalized_symbol = _normalize_symbol(symbol)
     end_date = datetime.today().strftime("%Y%m%d")
-    # buffer to account for weekends/holidays
-    start_date = (datetime.today() - timedelta(days=lookback * 2)).strftime("%Y%m%d")
+    # 4x buffer to account for weekends/long holidays.
+    start_date = (datetime.today() - timedelta(days=lookback * 4)).strftime("%Y%m%d")
 
-    df = ts.pro_bar(
-        ts_code=ts_code,
-        adj="qfq",
+    df = ak.stock_zh_a_hist(
+        symbol=normalized_symbol,
+        period="daily",
         start_date=start_date,
         end_date=end_date,
-        asset="E",
+        adjust="qfq",
     )
 
     if df is None or df.empty:
-        raise ValueError(f"No data returned for ts_code={ts_code!r}")
-
+        raise ValueError(f"No data returned for symbol={symbol!r}")
+
+    df = df.rename(
+        columns={
+            "日期": "trade_date",
+            "开盘": "open",
+            "最高": "high",
+            "最低": "low",
+            "收盘": "close",
+            "成交量": "volume",
+            "成交额": "amount",
+        }
+    )
+    required_cols = ["trade_date", "open", "high", "low", "close", "volume", "amount"]
+    missing = [c for c in required_cols if c not in df.columns]
+    if missing:
+        raise ValueError(f"AkShare response missing columns: {missing}")
+
+    df["trade_date"] = pd.to_datetime(df["trade_date"])
+    for col in ["open", "high", "low", "close", "volume", "amount"]:
+        df[col] = pd.to_numeric(df[col], errors="coerce")
+    df = df.dropna(subset=["trade_date", "open", "high", "low", "close", "volume", "amount"])
     df = df.sort_values("trade_date").reset_index(drop=True)
-    df = df.rename(columns={"vol": "volume"})
-    df["timestamps"] = pd.to_datetime(df["trade_date"], format="%Y%m%d")
+    df["timestamps"] = df["trade_date"]
 
     # Keep the most recent `lookback` bars
    df = df.tail(lookback).reset_index(drop=True)
 
    x_df = df[["open", "high", "low", "close", "volume", "amount"]].copy()
    x_timestamp = df["timestamps"].copy()
-    last_trade_date = df["trade_date"].iloc[-1]
+    last_trade_date = df["trade_date"].iloc[-1].strftime("%Y%m%d")
 
    return x_df, x_timestamp, last_trade_date
@@ -56,22 +123,20 @@ def get_future_trading_dates(last_trade_date: str, pred_len: int) -> pd.Series:
     follow `last_trade_date` (format: YYYYMMDD).
     """
     last_dt = datetime.strptime(last_trade_date, "%Y%m%d")
-    # 3× buffer so we always have enough dates even over a long holiday
-    end_dt = last_dt + timedelta(days=pred_len * 3)
-
-    cal = _pro.trade_cal(
-        exchange="SSE",
-        start_date=(last_dt + timedelta(days=1)).strftime("%Y%m%d"),
-        end_date=end_dt.strftime("%Y%m%d"),
-        is_open="1",
-    )
-    cal = cal.sort_values("cal_date")
-    dates = pd.to_datetime(cal["cal_date"].values[:pred_len], format="%Y%m%d")
-
-    if len(dates) < pred_len:
-        raise ValueError(
-            f"Could only obtain {len(dates)} future trading dates; "
-            f"increase buffer or check Tushare calendar coverage."
-        )
-
-    return pd.Series(dates)
+    dates: list[pd.Timestamp] = []
+
+    # Prefer real exchange trade dates from AkShare.
+    try:
+        all_dates = _get_trade_calendar_cached()
+        dates.extend([d for d in all_dates if d > pd.Timestamp(last_dt)][:pred_len])
+    except Exception:
+        # If calendar fetch fails, fall back to weekday-based dates.
+        pass
+
+    candidate = last_dt + timedelta(days=1)
+    while len(dates) < pred_len:
+        if candidate.weekday() < 5:
+            dates.append(pd.Timestamp(candidate))
+        candidate += timedelta(days=1)
+
+    return pd.Series(pd.DatetimeIndex(dates[:pred_len]))
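The symbol handling introduced above accepts either a bare 6-digit code or a Tushare-style suffixed code. A self-contained sketch of that normalization rule (mirroring, not importing, the diff's `_normalize_symbol`):

```python
def normalize_symbol(raw: str) -> str:
    """Strip a market suffix like '.SZ'/'.SH' and validate that a 6-digit
    code remains, following the rule shown in the data_fetcher.py diff."""
    symbol = raw.strip().upper()
    if "." in symbol:
        symbol = symbol.split(".", 1)[0]   # drop the exchange suffix
    if len(symbol) != 6 or not symbol.isdigit():
        raise ValueError(f"Invalid stock code {raw!r}")
    return symbol
```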
 
 
predictor.py CHANGED

@@ -24,6 +24,7 @@ logger = logging.getLogger(__name__)
 KRONOS_DIR = os.environ.get("KRONOS_DIR", "/app/Kronos")
 MODEL_ID = "NeoQuasar/Kronos-base"
 TOKENIZER_ID = "NeoQuasar/Kronos-Tokenizer-base"
+MC_BATCH_SIZE = max(1, int(os.environ.get("MC_BATCH_SIZE", "8")))
 
 
 # ── Bootstrap Kronos source ──────────────────────────────────────────────────
@@ -66,6 +67,37 @@ def get_predictor() -> KronosPredictor:
     return _predictor
 
 
+def _split_batched_output(
+    pred_output,
+    expected_count: int,
+    pred_len: int,
+) -> list[pd.DataFrame]:
+    """
+    Normalize predictor output into `expected_count` DataFrame samples.
+    Supports single-sample DataFrame and common batched return shapes.
+    """
+    if isinstance(pred_output, pd.DataFrame):
+        if expected_count == 1:
+            return [pred_output]
+        if isinstance(pred_output.index, pd.MultiIndex):
+            grouped = [g.droplevel(0) for _, g in pred_output.groupby(level=0, sort=False)]
+            if len(grouped) == expected_count:
+                return grouped
+        if len(pred_output) == expected_count * pred_len:
+            return [
+                pred_output.iloc[i * pred_len:(i + 1) * pred_len].copy()
+                for i in range(expected_count)
+            ]
+    if isinstance(pred_output, (list, tuple)):
+        if len(pred_output) == expected_count and all(
+            isinstance(item, pd.DataFrame) for item in pred_output
+        ):
+            return list(pred_output)
+        if expected_count == 1 and len(pred_output) == 1 and isinstance(pred_output[0], pd.DataFrame):
+            return [pred_output[0]]
+    raise ValueError("Unsupported predict() output format for batched sampling")
+
+
 # ── Monte-Carlo prediction ────────────────────────────────────────────────────
 def run_mc_prediction(
     x_df: pd.DataFrame,
@@ -83,45 +115,62 @@ def run_mc_prediction(
       ci : dict[field]["low"/"high"] → ndarray(pred_len,), 95% CI
       trading_low : ndarray(pred_len,), q2.5 of predicted_low
       trading_high : ndarray(pred_len,), q97.5 of predicted_high
-      direction_prob : float ∈ [0,1], fraction of samples where final close > last close
+      direction_prob : float ∈ [0,1], horizon-level bullish probability
       last_close : float, closing price of the last historical bar
     """
     predictor = get_predictor()
-    samples = []
+    samples: list[pd.DataFrame] = []
+    supports_batched_sampling = True
+    remaining = sample_count
 
-    for _ in range(sample_count):
+    while remaining > 0:
+        batch_n = min(remaining, MC_BATCH_SIZE if supports_batched_sampling else 1)
         with _infer_lock:
-            s = predictor.predict(
+            pred_output = predictor.predict(
                 df=x_df,
                 x_timestamp=x_timestamp,
                 y_timestamp=y_timestamp,
                 pred_len=pred_len,
                 T=0.8,
                 top_p=0.9,
-                sample_count=1,
+                sample_count=batch_n,
                 verbose=False,
             )
-        samples.append(s)
+        try:
+            batch_samples = _split_batched_output(pred_output, batch_n, pred_len)
+        except ValueError:
+            if batch_n > 1:
+                # Fallback for predictor implementations that do not support
+                # returning per-sample outputs for sample_count>1.
+                supports_batched_sampling = False
+                continue
+            raise
+        samples.extend(batch_samples)
+        remaining -= batch_n
 
     pred_mean = pd.concat(samples).groupby(level=0).mean()
-
-    def stack(field: str) -> np.ndarray:
-        return np.stack([s[field].values for s in samples])  # (sample_count, pred_len)
+    stacked = {
+        field: np.stack([s[field].values for s in samples])  # (sample_count, pred_len)
+        for field in ["open", "high", "low", "close", "volume"]
+    }
 
     alpha = 2.5  # → 95 % CI
     ci = {
         field: {
-            "low": np.percentile(stack(field), alpha, axis=0),
-            "high": np.percentile(stack(field), 100 - alpha, axis=0),
+            "low": np.percentile(stacked[field], alpha, axis=0),
+            "high": np.percentile(stacked[field], 100 - alpha, axis=0),
         }
-        for field in ["open", "high", "low", "close", "volume"]
+        for field in stacked
     }
 
     trading_low = ci["low"]["low"]     # q2.5 of the predicted daily low
     trading_high = ci["high"]["high"]  # q97.5 of the predicted daily high
 
     last_close = float(x_df["close"].iloc[-1])
-    bull_count = sum(float(s["close"].iloc[-1]) > last_close for s in samples)
-    direction_prob = bull_count / sample_count
+    close_paths = stacked["close"]  # (sample_count, pred_len)
+    # Use all future points to estimate horizon bullish probability.
+    bull_count = int((close_paths > last_close).sum())
+    total_points = int(close_paths.size)
+    direction_prob = bull_count / total_points
 
     return pred_mean, ci, trading_low, trading_high, direction_prob, last_close
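The batched sampling loop above splits `sample_count` Monte-Carlo samples across `predict()` calls of at most `MC_BATCH_SIZE` each. The batch-size arithmetic in isolation (a hypothetical helper for illustration; the real loop also handles the single-sample fallback when batching is unsupported):

```python
def batch_sizes(sample_count: int, batch_size: int) -> list[int]:
    """Sizes of the successive predict() calls used to collect
    `sample_count` samples with batches of at most `batch_size`."""
    sizes: list[int] = []
    remaining = sample_count
    while remaining > 0:
        n = min(remaining, batch_size)  # last batch may be smaller
        sizes.append(n)
        remaining -= n
    return sizes
```

With the defaults from the diff (30 samples, `MC_BATCH_SIZE=8`) this yields three full batches and one remainder batch of 6.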
requirements.txt CHANGED

@@ -9,4 +9,4 @@ huggingface_hub==0.33.1
 matplotlib==3.9.3
 tqdm==4.67.1
 safetensors==0.6.2
-tushare
+akshare