metadata
title: Trading Forecasting Backend
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false

Trading Forecasting Backend

This folder is a standalone Hugging Face Docker Space backend. The project is deployed in three parts:

  • Upload the contents of this backend folder to a Hugging Face Space repository.
  • Upload the separate dataset folder to a Hugging Face Dataset repository.
  • Deploy the separate frontend folder to Netlify.

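The backend contains the quantitative model code, training scripts, data-refresh scripts, and the latest model outputs from the forecasting research workspace. The primary market data and alternative data live in the separate Dataset repo and are downloaded into the Space at runtime.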

Hugging Face Space Setup

Create a new Hugging Face Space with the Docker SDK, then upload the contents of this backend folder as the Space root.

Required Space variables/secrets:

  • FRONTEND_ORIGINS: your Netlify URL, for example https://your-site.netlify.app.
  • CRON_SECRET: a long shared secret. Use the same value in Netlify.
  • HF_DATASET_REPO_ID: your Hugging Face Dataset repo id, for example your-username/your-forecasting-dataset.

Useful optional settings:

  • AUTO_UPDATE_ENABLED=true
  • AUTO_RETRAIN_ENABLED=true
  • AUTO_UPDATE_ON_START=false
  • DATASET_SYNC_ON_START=true
  • HF_DATASET_REVISION=main
  • DAILY_UPDATE_TIME=17:30
  • UPDATE_TIMEZONE=Asia/Kolkata
  • MARKET_BUILD_WORKERS=2
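The variables above could be read once at startup and validated before the app serves traffic. This is a minimal sketch, not the code in app.py: the Settings dataclass and load_settings helper are hypothetical, the example values in the optional list are assumed to be the defaults, and FRONTEND_ORIGINS is assumed to accept a comma-separated list.

```python
import os
from dataclasses import dataclass
from datetime import time
from typing import List, Mapping

REQUIRED = ("FRONTEND_ORIGINS", "CRON_SECRET", "HF_DATASET_REPO_ID")


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a true/false style variable such as AUTO_UPDATE_ENABLED=true."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    frontend_origins: List[str]
    cron_secret: str
    dataset_repo_id: str
    dataset_revision: str
    auto_update_enabled: bool
    auto_retrain_enabled: bool
    auto_update_on_start: bool
    dataset_sync_on_start: bool
    daily_update_time: time
    update_timezone: str
    market_build_workers: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader; defaults are ASSUMED to match the example values listed above."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required Space variables/secrets: " + ", ".join(missing))
    hour, minute = (int(part) for part in env.get("DAILY_UPDATE_TIME", "17:30").split(":"))
    return Settings(
        # Assumption: FRONTEND_ORIGINS may hold several comma-separated origins.
        frontend_origins=[o.strip() for o in env["FRONTEND_ORIGINS"].split(",") if o.strip()],
        cron_secret=env["CRON_SECRET"],
        dataset_repo_id=env["HF_DATASET_REPO_ID"],
        dataset_revision=env.get("HF_DATASET_REVISION", "main"),
        auto_update_enabled=_flag(env, "AUTO_UPDATE_ENABLED", True),
        auto_retrain_enabled=_flag(env, "AUTO_RETRAIN_ENABLED", True),
        auto_update_on_start=_flag(env, "AUTO_UPDATE_ON_START", False),
        dataset_sync_on_start=_flag(env, "DATASET_SYNC_ON_START", True),
        daily_update_time=time(hour, minute),
        update_timezone=env.get("UPDATE_TIMEZONE", "Asia/Kolkata"),
        market_build_workers=int(env.get("MARKET_BUILD_WORKERS", "2")),
    )
```

Failing fast on a missing required variable surfaces a misconfigured Space in its startup logs instead of at the first cron tick.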

The app listens on port 7860 and exposes Swagger docs at /docs.

API Routes

  • GET /health - Space health, file checks, latest data date, and update status.
  • GET /api/status - same as /health, for frontend polling.
  • GET /api/forecast/latest - latest stock high/low, first-extrema, and Nifty forecasts.
  • GET /api/models/summaries - model summary JSONs.
  • GET /api/data/catalog - searchable data manifest.
  • GET /api/data/sample?category=bars&asset=nifty50&timeframe=1d - small sample from a manifest dataset.
  • POST /api/cron/tick - Netlify scheduled ping endpoint; starts an update only when due.
  • POST /api/update/start - manual update trigger. If CRON_SECRET or ADMIN_SECRET is set, send the secret in the x-admin-secret header.
  • POST /api/dataset/sync - manually sync the Hugging Face Dataset repo into the Space runtime.
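The x-admin-secret rule for POST /api/update/start can be expressed as a small check. A minimal sketch under two assumptions: the route is open when neither ADMIN_SECRET nor CRON_SECRET is set, and a header matching either configured value is accepted. admin_secret_ok is a hypothetical name, not the function in app.py.

```python
import hmac
from typing import Mapping, Optional


def admin_secret_ok(header_value: Optional[str], env: Mapping[str, str]) -> bool:
    """Return True if a request may trigger a manual update.

    Assumption: no configured secret means no check; otherwise the
    x-admin-secret header must equal ADMIN_SECRET or CRON_SECRET.
    """
    expected = [env[name] for name in ("ADMIN_SECRET", "CRON_SECRET") if env.get(name)]
    if not expected:
        return True
    if header_value is None:
        return False
    supplied = header_value.encode()
    # compare_digest takes time independent of where the first mismatch is,
    # so response timing does not leak how much of the secret was guessed.
    return any(hmac.compare_digest(supplied, secret.encode()) for secret in expected)
```

In a FastAPI route, the header value could be received through a Header parameter named x_admin_secret, since FastAPI converts underscores in header parameter names to hyphens by default.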

Netlify Keep-Awake Cron

The frontend folder includes:

  • frontend/netlify.toml
  • frontend/netlify/functions/keep-space-awake.mjs

On Netlify, set these environment variables:

  • HUGGING_FACE_SPACE_URL=https://YOUR-HF-USERNAME-YOUR-SPACE.hf.space
  • CRON_SECRET=<same value as the Space CRON_SECRET>

The scheduled function runs every 10 minutes and calls POST /api/cron/tick. This keeps the Space from going to sleep and lets the backend start its daily update/retrain job once the configured market-close time (DAILY_UPDATE_TIME in UPDATE_TIMEZONE) has passed.
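Each tick has to decide whether the daily job is due. A minimal sketch, assuming the rule is "at most once per local calendar day, after DAILY_UPDATE_TIME in UPDATE_TIMEZONE"; update_is_due and last_update_day are hypothetical names, and the real backend may also skip weekends or exchange holidays, which this sketch ignores.

```python
from datetime import date, datetime, time
from typing import Optional
from zoneinfo import ZoneInfo


def update_is_due(
    now_utc: datetime,
    last_update_day: Optional[date],
    daily_update_time: time = time(17, 30),
    timezone: str = "Asia/Kolkata",
) -> bool:
    """Decide whether a cron tick should start the daily update/retrain job.

    now_utc must be timezone-aware; last_update_day is the local date of the
    last completed run (None if it has never run).
    """
    local_now = now_utc.astimezone(ZoneInfo(timezone))
    if local_now.time() < daily_update_time:
        return False  # market-close time not reached yet today
    return last_update_day != local_now.date()  # run once per local day
```

With a 10-minute tick and an awake Space, the job would start within about 10 minutes of 17:30 IST. zoneinfo needs the system time-zone database or the tzdata package.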

Layout

  • app.py - FastAPI backend app for Hugging Face Spaces.
  • Dockerfile - Docker Space runtime setup.
  • requirements.txt - Python dependencies.
  • research_runtime/Code/models/ - trainable model packages plus the small latest-forecast and summary outputs that the API serves.
  • research_runtime/Code/scripts/data_ingestion/ - data refresh scripts used by update jobs.
  • research_runtime/Code/scripts/data_preparation/ - research data rebuild scripts used by update jobs.

research_runtime/Data/ and research_runtime/Alt Data/ are intentionally not bundled in the Space repo. They live in the separate Hugging Face Dataset repo, and the backend downloads them into research_runtime/ when HF_DATASET_REPO_ID is set (at startup when DATASET_SYNC_ON_START=true, or on demand through POST /api/dataset/sync).
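The download step maps naturally onto huggingface_hub's snapshot_download, which accepts repo_type="dataset", revision, local_dir, and allow_patterns. This is a sketch, assuming the backend mirrors only Data/ and Alt Data/ into research_runtime/; sync_dataset and its downloader parameter are hypothetical, and the backend's actual sync code may differ.

```python
from pathlib import Path
from typing import Callable, Optional


def sync_dataset(
    repo_id: str,
    runtime_dir: Path,
    revision: str = "main",
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Mirror Data/ and Alt Data/ from the Dataset repo into runtime_dir.

    downloader defaults to huggingface_hub.snapshot_download; it is injectable
    so the function can be exercised without network access.
    """
    if downloader is None:
        # Imported lazily so the package is only needed for real downloads.
        from huggingface_hub import snapshot_download as downloader
    runtime_dir.mkdir(parents=True, exist_ok=True)
    return downloader(
        repo_id=repo_id,
        repo_type="dataset",
        revision=revision,  # e.g. the HF_DATASET_REVISION value
        local_dir=str(runtime_dir),
        # Patterns are fnmatch-style, where "*" also matches "/" (nested files).
        allow_patterns=["Data/*", "Alt Data/*"],
    )
```

For example, sync_dataset("your-username/your-forecasting-dataset", Path("research_runtime")) would populate research_runtime/Data and research_runtime/Alt Data.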

Main Model Outputs To Wire First

  • Stock high/low forecasts: research_runtime/Code/models/stock_high_low_forecaster/outputs/latest_forecasts.csv
  • Stock high/low metrics: research_runtime/Code/models/stock_high_low_forecaster/outputs/metrics_by_symbol.csv
  • First-extrema forecasts: research_runtime/Code/models/first_extrema_forecaster/outputs/latest_forecasts.csv
  • Nifty forecasts: research_runtime/Code/models/nifty_forecaster/outputs/forecaster_latest_forecasts.csv
  • Nifty summary: research_runtime/Code/models/nifty_forecaster/outputs/forecaster_summary.json
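A route such as GET /api/forecast/latest could expose these files roughly as follows. This is a sketch: the dictionary keys and helper names are hypothetical, the paths are assumed to be relative to the backend root, and columns are passed through unchanged because the CSV schemas are not documented here.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

MODELS = Path("research_runtime/Code/models")

# Hypothetical keys; the paths are the ones listed above.
FORECAST_FILES = {
    "stock_high_low": MODELS / "stock_high_low_forecaster/outputs/latest_forecasts.csv",
    "first_extrema": MODELS / "first_extrema_forecaster/outputs/latest_forecasts.csv",
    "nifty": MODELS / "nifty_forecaster/outputs/forecaster_latest_forecasts.csv",
}
NIFTY_SUMMARY = MODELS / "nifty_forecaster/outputs/forecaster_summary.json"

Rows = List[Dict[str, str]]


def read_csv_rows(path: Path) -> Rows:
    """Return CSV rows as dicts; values stay strings since no schema is assumed."""
    # utf-8-sig also drops a leading byte-order mark if the file has one.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def read_json(path: Path) -> Any:
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def latest_forecasts_payload(root: Path = Path(".")) -> Dict[str, Optional[Rows]]:
    """Collect every forecast file under root; a missing file maps to None."""
    payload: Dict[str, Optional[Rows]] = {}
    for key, relative in FORECAST_FILES.items():
        path = root / relative
        payload[key] = read_csv_rows(path) if path.exists() else None
    return payload
```

Returning None for a missing file keeps one absent output from breaking the whole response.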

Training Entrypoints

Run these from backend/research_runtime so project-relative paths resolve correctly. Forward slashes work on Windows as well as Linux and macOS:

python Code/models/stock_high_low_forecaster/train.py
python Code/models/first_extrema_forecaster/train.py
python Code/models/nifty_forecaster/train.py
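An update job could run the three trainers in sequence from the right working directory. A minimal sketch, assuming each train.py needs no arguments; TRAIN_SCRIPTS, training_commands, and train_all are hypothetical names, not part of the repo.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# The three training entrypoints, in the order listed above.
TRAIN_SCRIPTS = [
    Path("Code/models/stock_high_low_forecaster/train.py"),
    Path("Code/models/first_extrema_forecaster/train.py"),
    Path("Code/models/nifty_forecaster/train.py"),
]


def training_commands(python: str = sys.executable) -> List[List[str]]:
    """One command per script; str(Path) uses the native separator for the OS."""
    return [[python, str(script)] for script in TRAIN_SCRIPTS]


def train_all(research_runtime: Path) -> None:
    """Run each trainer with research_runtime as the working directory.

    check=True raises CalledProcessError on the first failing script, so a
    failure stops the sequence instead of being silently skipped.
    """
    for command in training_commands():
        subprocess.run(command, cwd=research_runtime, check=True)
```

For example, train_all(Path("research_runtime")) run from the backend folder. Using sys.executable keeps the trainers on the same interpreter and virtual environment as the caller.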

Data Labels

These live in the separate Dataset repo:

  • Raw minute OHLCV: Data/raw/minute/*_minute.csv
  • Processed bars: Data/processed/bars/{1m,5m,1h,4h,1d}/*.csv
  • Processed features: Data/processed/features/{1m,5m,1h,4h,1d}/*.csv
  • Market panels: Data/processed/panels/*_market_panel.csv
  • Master daily panel: Data/processed/panels/daily_master_panel.csv
  • Data manifest: Data/metadata/manifest.csv
  • Feature dictionary: Data/metadata/feature_dictionary.csv
  • Options features: Alt Data/options/processed/*_options_daily_features.csv
  • Institutional panel: Alt Data/institutional/processed/institutional_daily_panel.csv
  • External daily panel: Alt Data/external/processed/external_daily_panel.csv
  • Corporate events: Alt Data/corporate/processed/corporate_announcements.csv
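Python's glob does not expand brace patterns such as {1m,5m,1h,4h,1d}, so code that walks the processed folders has to enumerate the timeframes explicitly. A sketch, assuming the Dataset repo has been synced into research_runtime/ and that folder names match the list above; list_timeframe_files and options_feature_files are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List

TIMEFRAMES = ("1m", "5m", "1h", "4h", "1d")


def list_timeframe_files(runtime_dir: Path, kind: str = "bars") -> Dict[str, List[Path]]:
    """Map each timeframe to the CSVs in Data/processed/<kind>/<timeframe>/."""
    if kind not in ("bars", "features"):
        raise ValueError(f"kind must be 'bars' or 'features', got {kind!r}")
    base = runtime_dir / "Data" / "processed" / kind
    # A timeframe folder that has not been synced yet yields an empty list.
    return {tf: sorted((base / tf).glob("*.csv")) for tf in TIMEFRAMES}


def options_feature_files(runtime_dir: Path) -> List[Path]:
    """Options features share a fixed filename suffix, so one glob is enough."""
    folder = runtime_dir / "Alt Data" / "options" / "processed"
    return sorted(folder.glob("*_options_daily_features.csv"))
```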

Frontend Wiring Notes

The current frontend renders static mock data from frontend/index.html and frontend/script.js. To wire it to the backend:

  • Forecast cards can call /api/forecast/latest.
  • Model accuracy and version/date stats can call /api/models/summaries.
  • Market Data can call /api/data/catalog and /api/data/sample.
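The frontend calls are plain JSON GETs. To keep one language across these examples, they are sketched in Python with the standard library; in frontend/script.js the equivalent would be a fetch call. build_get and fetch_json are hypothetical helpers, and the Space URL is a placeholder.

```python
import json
from typing import Any, Mapping, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_get(base_url: str, path: str, params: Optional[Mapping[str, str]] = None) -> Request:
    """Build a GET request against the Space, e.g. path="/api/forecast/latest"."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)  # encodes and joins query parameters
    return Request(url, headers={"Accept": "application/json"})


def fetch_json(request: Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON body (HTTPError on 4xx/5xx)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_json(build_get(space_url, "/api/data/sample", {"category": "bars", "asset": "nifty50", "timeframe": "1d"})). Browser requests from Netlify presumably also need the site's origin listed in FRONTEND_ORIGINS.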

Pruned From Backend

  • Kotak credential/runtime files.
  • Live-trading scripts and live broker artifacts.
  • Kotak monitor artifacts and cached NSE temp folders.
  • Python __pycache__ folders.
  • CatBoost-generated training-log folder.
  • One-off maintenance/backfill scripts.
  • Backtest artifacts, chart images, old trade reports, test prediction dumps, generated training datasets, and saved model binaries.

KOTAKBANK CSV files are kept because they are ordinary market datasets for Kotak Mahindra Bank, not broker-runtime files.