---
title: Trading Forecasting Backend
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# Trading Forecasting Backend

This folder is now a standalone Hugging Face Docker Space backend.

Upload the contents of this `backend` folder to a Hugging Face Space repository, upload the separate `dataset` folder to a Hugging Face Dataset repository, and deploy the separate `frontend` folder to Netlify.

The backend contains the quantitative model code, training scripts, and model outputs from the forecasting research workspace. The primary market data and alternative data are no longer bundled here; the backend downloads them from the Dataset repo at runtime.

## Hugging Face Space Setup

Create a new Hugging Face Space with Docker SDK, then upload this backend folder as the Space root.

Required Space variables/secrets:

- `FRONTEND_ORIGINS`: your Netlify URL, for example `https://your-site.netlify.app`.
- `CRON_SECRET`: a long shared secret. Use the same value in Netlify.
- `HF_DATASET_REPO_ID`: your Hugging Face Dataset repo id, for example `your-username/your-forecasting-dataset`.

Useful optional settings:

- `AUTO_UPDATE_ENABLED=true`
- `AUTO_RETRAIN_ENABLED=true`
- `AUTO_UPDATE_ON_START=false`
- `DATASET_SYNC_ON_START=true`
- `HF_DATASET_REVISION=main`
- `DAILY_UPDATE_TIME=17:30`
- `UPDATE_TIMEZONE=Asia/Kolkata`
- `MARKET_BUILD_WORKERS=2`

The app listens on port `7860` and exposes Swagger docs at `/docs`.

## API Routes

- `GET /health` - Space health, file checks, latest data date, and update status.
- `GET /api/status` - same as `/health`, for frontend polling.
- `GET /api/forecast/latest` - latest stock high/low, first-extrema, and Nifty forecasts.
- `GET /api/models/summaries` - model summary JSONs.
- `GET /api/data/catalog` - searchable data manifest.
- `GET /api/data/sample?category=bars&asset=nifty50&timeframe=1d` - small sample from a manifest dataset.
- `POST /api/cron/tick` - Netlify scheduled ping endpoint; starts an update only when due.
- `POST /api/update/start` - manual update trigger. Send `x-admin-secret` if `CRON_SECRET` or `ADMIN_SECRET` is set.
- `POST /api/dataset/sync` - manually sync the Hugging Face Dataset repo into the Space runtime.

## Netlify Keep-Awake Cron

The `frontend` folder now includes:

- `frontend/netlify.toml`
- `frontend/netlify/functions/keep-space-awake.mjs`

On Netlify, set these environment variables:

- `HUGGING_FACE_SPACE_URL=https://YOUR-HF-USERNAME-YOUR-SPACE.hf.space`
- `CRON_SECRET=` (the same value you set on the Space)

The scheduled function runs every 10 minutes and calls `/api/cron/tick`. This keeps the Space warm and lets the backend start its daily update/retrain job after the configured market-close time.

## Layout

- `app.py` - FastAPI backend app for Hugging Face Spaces.
- `Dockerfile` - Docker Space runtime setup.
- `requirements.txt` - Python dependencies.
- `research_runtime/Code/models/` - trainable model packages and the small latest forecast/summary outputs needed by the API.
- `research_runtime/Code/scripts/data_ingestion/` - data refresh scripts used by update jobs.
- `research_runtime/Code/scripts/data_preparation/` - research data rebuild scripts used by update jobs.

`research_runtime/Data/` and `research_runtime/Alt Data/` are intentionally not bundled in the Space repo anymore. They now live in the separate Hugging Face Dataset repo and are downloaded into `research_runtime/` by the backend when `HF_DATASET_REPO_ID` is set.
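
The dataset download described above could look roughly like the following. This is a minimal sketch, not the code in `app.py`: it assumes the `huggingface_hub` package is installed, and the helper names `dataset_sync_config` and `sync_dataset` are hypothetical. Only `HF_DATASET_REPO_ID`, `HF_DATASET_REVISION`, and the `research_runtime/` target come from this README.

```python
import os
from pathlib import Path


def dataset_sync_config(env=None, runtime_dir="research_runtime"):
    """Build keyword arguments for a dataset download, or None when sync is off.

    Hypothetical helper: the real backend may read its settings differently.
    """
    env = os.environ if env is None else env
    repo_id = env.get("HF_DATASET_REPO_ID", "").strip()
    if not repo_id:
        # Without a repo id there is nothing to download.
        return None
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",
        # Falls back to the documented default revision.
        "revision": env.get("HF_DATASET_REVISION", "main"),
        "local_dir": str(Path(runtime_dir)),
    }


def sync_dataset(env=None):
    """Download the dataset snapshot into the runtime folder if configured."""
    config = dataset_sync_config(env)
    if config is None:
        return None
    # Imported lazily so the config helper works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(**config)
```

Keeping the configuration step separate from the download makes it possible to check what would be fetched without touching the network. A private Dataset repo would also need an access token, which this sketch does not handle.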
## Main Model Outputs To Wire First

- Stock high/low forecasts: `research_runtime/Code/models/stock_high_low_forecaster/outputs/latest_forecasts.csv`
- Stock high/low metrics: `research_runtime/Code/models/stock_high_low_forecaster/outputs/metrics_by_symbol.csv`
- First-extrema forecasts: `research_runtime/Code/models/first_extrema_forecaster/outputs/latest_forecasts.csv`
- Nifty forecasts: `research_runtime/Code/models/nifty_forecaster/outputs/forecaster_latest_forecasts.csv`
- Nifty summary: `research_runtime/Code/models/nifty_forecaster/outputs/forecaster_summary.json`

## Training Entrypoints

Run these from `backend/research_runtime` so project-relative paths resolve correctly:

```powershell
python Code\models\stock_high_low_forecaster\train.py
python Code\models\first_extrema_forecaster\train.py
python Code\models\nifty_forecaster\train.py
```

## Data Labels

These live in the separate Dataset repo:

- Raw minute OHLCV: `Data/raw/minute/*_minute.csv`
- Processed bars: `Data/processed/bars/{1m,5m,1h,4h,1d}/*.csv`
- Processed features: `Data/processed/features/{1m,5m,1h,4h,1d}/*.csv`
- Market panels: `Data/processed/panels/*_market_panel.csv`
- Master daily panel: `Data/processed/panels/daily_master_panel.csv`
- Data manifest: `Data/metadata/manifest.csv`
- Feature dictionary: `Data/metadata/feature_dictionary.csv`
- Options features: `Alt Data/options/processed/*_options_daily_features.csv`
- Institutional panel: `Alt Data/institutional/processed/institutional_daily_panel.csv`
- External daily panel: `Alt Data/external/processed/external_daily_panel.csv`
- Corporate events: `Alt Data/corporate/processed/corporate_announcements.csv`

## Frontend Wiring Notes

The current frontend is static mock data in `frontend/index.html` and `frontend/script.js`.

- Forecast cards can call `/api/forecast/latest`.
- Model accuracy and version/date stats can call `/api/models/summaries`.
- Market Data can call `/api/data/catalog` and `/api/data/sample`.
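
Before replacing the mock data, the endpoints in the wiring notes above can be smoke-tested from a script. The following is a minimal sketch using only the Python standard library. The helper names `endpoint_url`, `fetch_json`, and `load_dashboard_data` are hypothetical, `BASE_URL` is the placeholder from the Netlify section, and the response shapes are not documented here, so the code assumes only that each route returns JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder Space URL; replace with your own before running any request.
BASE_URL = "https://YOUR-HF-USERNAME-YOUR-SPACE.hf.space"


def endpoint_url(path, base_url=BASE_URL, **params):
    """Join the base URL, an API path, and optional query parameters."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(path, base_url=BASE_URL, timeout=30, **params):
    """GET an endpoint and decode the body as JSON (assumes a JSON response)."""
    with urlopen(endpoint_url(path, base_url, **params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def load_dashboard_data(base_url=BASE_URL):
    """Fetch one payload per frontend area named in the wiring notes."""
    return {
        "forecasts": fetch_json("/api/forecast/latest", base_url),
        "summaries": fetch_json("/api/models/summaries", base_url),
        "catalog": fetch_json("/api/data/catalog", base_url),
        # Query values copied from the documented sample route.
        "sample": fetch_json(
            "/api/data/sample", base_url,
            category="bars", asset="nifty50", timeframe="1d",
        ),
    }
```

Calling `load_dashboard_data("https://<your-space>.hf.space")` and inspecting the keys of each payload is a quick way to decide which fields the forecast cards and stats panels should bind to.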
## Pruned From Backend

- Kotak credential/runtime files.
- Live-trading scripts and live broker artifacts.
- Kotak monitor artifacts and cached NSE temp folders.
- Python `__pycache__` folders.
- CatBoost generated training-log folder.
- One-off maintenance/backfill scripts.
- Backtest artifacts, chart images, old trade reports, test prediction dumps, generated training datasets, and saved model binaries.

`KOTAKBANK` CSV files remain because those are normal market datasets for Kotak Mahindra Bank, not broker-runtime files.