---
title: Trading Forecasting Backend
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
# Trading Forecasting Backend
This folder is a standalone backend for a Hugging Face Docker Space. Upload the contents of this `backend` folder to a Hugging Face Space repository, upload the separate `dataset` folder to a Hugging Face Dataset repository, and deploy the separate `frontend` folder to Netlify.
The backend contains the quantitative model code, training scripts, and model outputs from the forecasting research workspace. Primary market data and alternative data are kept in the separate Dataset repo and are downloaded at runtime.
## Hugging Face Space Setup
Create a new Hugging Face Space with Docker SDK, then upload this backend folder as the Space root.
Required Space variables/secrets:
- `FRONTEND_ORIGINS`: your Netlify URL, for example `https://your-site.netlify.app`.
- `CRON_SECRET`: a long shared secret. Use the same value in Netlify.
- `HF_DATASET_REPO_ID`: your Hugging Face Dataset repo id, for example `your-username/your-forecasting-dataset`.
Useful optional settings:
- `AUTO_UPDATE_ENABLED=true`
- `AUTO_RETRAIN_ENABLED=true`
- `AUTO_UPDATE_ON_START=false`
- `DATASET_SYNC_ON_START=true`
- `HF_DATASET_REVISION=main`
- `DAILY_UPDATE_TIME=17:30`
- `UPDATE_TIMEZONE=Asia/Kolkata`
- `MARKET_BUILD_WORKERS=2`
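A minimal sketch of how these variables could be read in Python. It is not taken from `app.py`: the `load_settings` helper, the use of the listed values as fallback defaults, and the comma-separated parsing of `FRONTEND_ORIGINS` are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: collect the Space settings into one dict."""
    env = os.environ if env is None else env
    # Assumption: several origins may be supplied as a comma-separated list.
    origins = [o.strip() for o in env.get("FRONTEND_ORIGINS", "").split(",") if o.strip()]
    return {
        "frontend_origins": origins,
        "cron_secret": env.get("CRON_SECRET", ""),
        "hf_dataset_repo_id": env.get("HF_DATASET_REPO_ID", ""),
        # Assumption: the values listed above double as defaults.
        "auto_update_enabled": _as_bool(env.get("AUTO_UPDATE_ENABLED", "true")),
        "auto_retrain_enabled": _as_bool(env.get("AUTO_RETRAIN_ENABLED", "true")),
        "auto_update_on_start": _as_bool(env.get("AUTO_UPDATE_ON_START", "false")),
        "dataset_sync_on_start": _as_bool(env.get("DATASET_SYNC_ON_START", "true")),
        "hf_dataset_revision": env.get("HF_DATASET_REVISION", "main"),
        "daily_update_time": env.get("DAILY_UPDATE_TIME", "17:30"),
        "update_timezone": env.get("UPDATE_TIMEZONE", "Asia/Kolkata"),
        "market_build_workers": int(env.get("MARKET_BUILD_WORKERS", "2")),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check locally before setting the values in the Space.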
The app listens on port `7860` and exposes Swagger docs at `/docs`.
## API Routes
- `GET /health` - Space health, file checks, latest data date, and update status.
- `GET /api/status` - same payload as `/health`, intended for frontend polling.
- `GET /api/forecast/latest` - latest stock high/low, first-extrema, and Nifty forecasts.
- `GET /api/models/summaries` - model summary JSONs.
- `GET /api/data/catalog` - searchable data manifest.
- `GET /api/data/sample?category=bars&asset=nifty50&timeframe=1d` - small sample from a manifest dataset.
- `POST /api/cron/tick` - Netlify scheduled ping endpoint; starts an update only when due.
- `POST /api/update/start` - manual update trigger. Send `x-admin-secret` if `CRON_SECRET` or `ADMIN_SECRET` is set.
- `POST /api/dataset/sync` - manually sync the Hugging Face Dataset repo into the Space runtime.
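The routes above can be exercised from a short Python client. The sketch below uses only the standard library. The helper names are hypothetical, and sending `x-admin-secret` as an HTTP request header is an assumption based on its name.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def sample_url(base_url: str, category: str, asset: str, timeframe: str) -> str:
    """Build the GET /api/data/sample URL with its query parameters."""
    query = urllib.parse.urlencode(
        {"category": category, "asset": asset, "timeframe": timeframe}
    )
    return f"{base_url.rstrip('/')}/api/data/sample?{query}"


def update_request(base_url: str, secret: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /api/update/start request.

    Assumption: `x-admin-secret` is passed as a request header.
    """
    headers = {"x-admin-secret": secret} if secret else {}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/update/start",
        data=b"",
        headers=headers,
        method="POST",
    )


def fetch_json(request_or_url, timeout: float = 30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(sample_url("https://YOUR-HF-USERNAME-YOUR-SPACE.hf.space", "bars", "nifty50", "1d"))` requests the same sample shown in the route list.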
## Netlify Keep-Awake Cron
The `frontend` folder now includes:
- `frontend/netlify.toml`
- `frontend/netlify/functions/keep-space-awake.mjs`
On Netlify, set these environment variables:
- `HUGGING_FACE_SPACE_URL=https://YOUR-HF-USERNAME-YOUR-SPACE.hf.space`
- `CRON_SECRET=<same value as the Space CRON_SECRET>`
The scheduled function runs every 10 minutes and calls `/api/cron/tick`. This keeps the Space warm and lets the backend start its daily update/retrain job after the configured market-close time.
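The "starts an update only when due" behaviour can be pictured as a small check that runs on every tick. This is a sketch under assumptions, not the logic in `app.py`: the function names and the rule "at most one run per local calendar day, once the configured time has passed" are illustrative, and weekends and market holidays are ignored.

```python
from datetime import date, datetime, time
from typing import Optional
from zoneinfo import ZoneInfo


def is_update_due(now_local: datetime, update_time: str, last_run: Optional[date]) -> bool:
    """Return True once per local day, after the HH:MM update time has passed."""
    hour, minute = (int(part) for part in update_time.split(":"))
    if now_local.time() < time(hour, minute):
        return False  # still before the configured market-close time
    # Due if no update has run yet, or the last one ran on an earlier day.
    return last_run is None or last_run < now_local.date()


def is_update_due_now(
    update_time: str = "17:30",
    tz_name: str = "Asia/Kolkata",
    last_run: Optional[date] = None,
) -> bool:
    """Convenience wrapper that evaluates the check against the current time."""
    return is_update_due(datetime.now(ZoneInfo(tz_name)), update_time, last_run)
```

With a 10-minute schedule, the first tick after `DAILY_UPDATE_TIME` returns `True` and later ticks on the same day return `False` once the run date has been recorded.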
## Layout
- `app.py` - FastAPI backend app for Hugging Face Spaces.
- `Dockerfile` - Docker Space runtime setup.
- `requirements.txt` - Python dependencies.
- `research_runtime/Code/models/` - trainable model packages and the small latest forecast/summary outputs needed by the API.
- `research_runtime/Code/scripts/data_ingestion/` - data refresh scripts used by update jobs.
- `research_runtime/Code/scripts/data_preparation/` - research data rebuild scripts used by update jobs.
`research_runtime/Data/` and `research_runtime/Alt Data/` are intentionally no longer bundled in the Space repo. They live in the separate Hugging Face Dataset repo, and the backend downloads them into `research_runtime/` when `HF_DATASET_REPO_ID` is set.
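One way to perform such a download is `snapshot_download` from the `huggingface_hub` package. The sketch below is an assumption about how the sync could look, not the code in `app.py`. The `sync_dataset` helper is hypothetical, and it presumes `huggingface_hub` is installed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def sync_dataset(
    target_dir: str = "research_runtime",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Download the Dataset repo into target_dir; skip when no repo id is set."""
    env = os.environ if env is None else env
    repo_id = env.get("HF_DATASET_REPO_ID", "").strip()
    if not repo_id:
        return None  # nothing to sync without HF_DATASET_REPO_ID

    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        revision=env.get("HF_DATASET_REVISION", "main"),
        local_dir=target_dir,
    )
    return Path(local_path)
```

Returning `None` when the variable is unset mirrors the behaviour described above, where the download only happens if `HF_DATASET_REPO_ID` is configured.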
## Main Model Outputs To Wire First
- Stock high/low forecasts: `research_runtime/Code/models/stock_high_low_forecaster/outputs/latest_forecasts.csv`
- Stock high/low metrics: `research_runtime/Code/models/stock_high_low_forecaster/outputs/metrics_by_symbol.csv`
- First-extrema forecasts: `research_runtime/Code/models/first_extrema_forecaster/outputs/latest_forecasts.csv`
- Nifty forecasts: `research_runtime/Code/models/nifty_forecaster/outputs/forecaster_latest_forecasts.csv`
- Nifty summary: `research_runtime/Code/models/nifty_forecaster/outputs/forecaster_summary.json`
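For a quick look at these files outside the API, the standard `csv` module is enough. The sketch below keeps the paths listed above in a dict and reads rows generically, since the column names are not documented here. `MODEL_OUTPUTS` and `read_csv_rows` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Optional

MODELS_DIR = Path("research_runtime") / "Code" / "models"

# CSV outputs listed above, keyed by a short label chosen for this example.
MODEL_OUTPUTS = {
    "stock_high_low": MODELS_DIR / "stock_high_low_forecaster" / "outputs" / "latest_forecasts.csv",
    "stock_metrics": MODELS_DIR / "stock_high_low_forecaster" / "outputs" / "metrics_by_symbol.csv",
    "first_extrema": MODELS_DIR / "first_extrema_forecaster" / "outputs" / "latest_forecasts.csv",
    "nifty": MODELS_DIR / "nifty_forecaster" / "outputs" / "forecaster_latest_forecasts.csv",
}


def read_csv_rows(path, limit: Optional[int] = None) -> list:
    """Read a CSV into a list of dicts, optionally stopping after `limit` rows."""
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if limit is not None and index >= limit:
                break
            rows.append(dict(row))
    return rows
```

Calling `read_csv_rows(MODEL_OUTPUTS["nifty"], limit=5)` from the backend root returns the first five forecast rows with whatever columns the file defines.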
## Training Entrypoints
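Run these from `backend/research_runtime` so project-relative paths resolve correctly. The commands use Windows path separators; on Linux or macOS, replace the backslashes with forward slashes: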
```powershell
python Code\models\stock_high_low_forecaster\train.py
python Code\models\first_extrema_forecaster\train.py
python Code\models\nifty_forecaster\train.py
```
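To run the same three scripts on any platform, a small Python launcher can build the paths with `pathlib` and set the working directory explicitly. This is a sketch with hypothetical helper names. It assumes each `train.py` takes no required arguments, as in the commands above.

```python
import subprocess
import sys
from pathlib import Path

MODELS = (
    "stock_high_low_forecaster",
    "first_extrema_forecaster",
    "nifty_forecaster",
)


def training_commands(python_exe: str = sys.executable) -> list:
    """Return one command per model, with paths relative to research_runtime."""
    return [
        [python_exe, str(Path("Code") / "models" / name / "train.py")]
        for name in MODELS
    ]


def run_all(runtime_dir: str = "backend/research_runtime") -> None:
    """Run each training script in order, stopping at the first failure."""
    for command in training_commands():
        # cwd keeps the project-relative paths inside the scripts resolvable.
        subprocess.run(command, cwd=runtime_dir, check=True)
```

`check=True` raises `subprocess.CalledProcessError` if a script exits with a non-zero status, so a failed model does not silently precede the next one.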
## Data Labels
These live in the separate Dataset repo:
- Raw minute OHLCV: `Data/raw/minute/*_minute.csv`
- Processed bars: `Data/processed/bars/{1m,5m,1h,4h,1d}/*.csv`
- Processed features: `Data/processed/features/{1m,5m,1h,4h,1d}/*.csv`
- Market panels: `Data/processed/panels/*_market_panel.csv`
- Master daily panel: `Data/processed/panels/daily_master_panel.csv`
- Data manifest: `Data/metadata/manifest.csv`
- Feature dictionary: `Data/metadata/feature_dictionary.csv`
- Options features: `Alt Data/options/processed/*_options_daily_features.csv`
- Institutional panel: `Alt Data/institutional/processed/institutional_daily_panel.csv`
- External daily panel: `Alt Data/external/processed/external_daily_panel.csv`
- Corporate events: `Alt Data/corporate/processed/corporate_announcements.csv`
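The `{1m,5m,1h,4h,1d}` notation above is shell-style brace shorthand, which Python's `glob` does not expand. A minimal sketch for listing the processed files of one timeframe follows. `processed_files` is a hypothetical helper, and `dataset_root` is wherever the Dataset repo was downloaded.

```python
from pathlib import Path

TIMEFRAMES = ("1m", "5m", "1h", "4h", "1d")
KINDS = ("bars", "features")


def processed_files(dataset_root, kind: str, timeframe: str) -> list:
    """List Data/processed/<kind>/<timeframe>/*.csv under dataset_root, sorted."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    if timeframe not in TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {TIMEFRAMES}, got {timeframe!r}")
    folder = Path(dataset_root) / "Data" / "processed" / kind / timeframe
    return sorted(folder.glob("*.csv"))
```

For example, `processed_files("research_runtime", "bars", "1d")` returns the daily bar files once the dataset has been synced.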
## Frontend Wiring Notes
The current frontend renders static mock data defined in `frontend/index.html` and `frontend/script.js`. To replace the mock data with live values:
- Forecast cards can call `/api/forecast/latest`.
- Model accuracy and version/date stats can call `/api/models/summaries`.
- Market Data can call `/api/data/catalog` and `/api/data/sample`.
## Pruned From Backend
- Kotak credential/runtime files.
- Live-trading scripts and live broker artifacts.
- Kotak monitor artifacts and cached NSE temp folders.
- Python `__pycache__` folders.
- CatBoost generated training-log folder.
- One-off maintenance/backfill scripts.
- Backtest artifacts, chart images, old trade reports, test prediction dumps, generated training datasets, and saved model binaries.
`KOTAKBANK` CSV files remain because they are ordinary market datasets for Kotak Mahindra Bank, not broker-runtime files.