---
title: Trading Forecasting Backend
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# Trading Forecasting Backend

This folder is now a standalone Hugging Face Docker Space backend. Upload the contents of this `backend` folder to a Hugging Face Space repository, upload the separate `dataset` folder to a Hugging Face Dataset repository, and deploy the separate `frontend` folder to Netlify.

The backend contains the quantitative model code, training scripts, and the small model outputs the API needs. Primary market data and alternative data from the forecasting research workspace live in the separate Dataset repo and are downloaded at runtime.

## Hugging Face Space Setup

Create a new Hugging Face Space with the Docker SDK, then upload this backend folder as the Space root.

Required Space variables/secrets:

- `FRONTEND_ORIGINS`: your Netlify URL, for example `https://your-site.netlify.app`.
- `CRON_SECRET`: a long shared secret. Use the same value in Netlify.
- `HF_DATASET_REPO_ID`: your Hugging Face Dataset repo id, for example `your-username/your-forecasting-dataset`.

Useful optional settings:

- `AUTO_UPDATE_ENABLED=true`
- `AUTO_RETRAIN_ENABLED=true`
- `AUTO_UPDATE_ON_START=false`
- `DATASET_SYNC_ON_START=true`
- `HF_DATASET_REVISION=main`
- `DAILY_UPDATE_TIME=17:30`
- `UPDATE_TIMEZONE=Asia/Kolkata`
- `MARKET_BUILD_WORKERS=2`
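
As a rough illustration of how these variables might be read, the sketch below parses them into a single settings object. The `Settings` class, the `load_settings` helper, the comma-separated handling of `FRONTEND_ORIGINS`, and the use of the values listed above as defaults are all assumptions for illustration; `app.py` may parse them differently.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:  # hypothetical container, not taken from app.py
    frontend_origins: List[str]
    cron_secret: str
    hf_dataset_repo_id: str
    auto_update_enabled: bool
    auto_retrain_enabled: bool
    auto_update_on_start: bool
    dataset_sync_on_start: bool
    hf_dataset_revision: str
    daily_update_time: str
    update_timezone: str
    market_build_workers: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Assumption: several origins may be supplied as a comma-separated list.
    origins = [o.strip() for o in env.get("FRONTEND_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        frontend_origins=origins,
        cron_secret=env.get("CRON_SECRET", ""),
        hf_dataset_repo_id=env.get("HF_DATASET_REPO_ID", ""),
        auto_update_enabled=_env_bool(env, "AUTO_UPDATE_ENABLED", True),
        auto_retrain_enabled=_env_bool(env, "AUTO_RETRAIN_ENABLED", True),
        auto_update_on_start=_env_bool(env, "AUTO_UPDATE_ON_START", False),
        dataset_sync_on_start=_env_bool(env, "DATASET_SYNC_ON_START", True),
        hf_dataset_revision=env.get("HF_DATASET_REVISION", "main"),
        daily_update_time=env.get("DAILY_UPDATE_TIME", "17:30"),
        update_timezone=env.get("UPDATE_TIMEZONE", "Asia/Kolkata"),
        market_build_workers=int(env.get("MARKET_BUILD_WORKERS", "2")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check locally before setting the values on the Space.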

The app listens on port `7860` and exposes Swagger docs at `/docs`.

## API Routes

- `GET /health` - Space health, file checks, latest data date, and update status.
- `GET /api/status` - same payload as `/health`, intended for frontend polling.
- `GET /api/forecast/latest` - latest stock high/low, first-extrema, and Nifty forecasts.
- `GET /api/models/summaries` - model summary JSONs.
- `GET /api/data/catalog` - searchable data manifest.
- `GET /api/data/sample?category=bars&asset=nifty50&timeframe=1d` - small sample from a manifest dataset.
- `POST /api/cron/tick` - Netlify scheduled ping endpoint; starts an update only when due.
- `POST /api/update/start` - manual update trigger. Send the `x-admin-secret` header if `CRON_SECRET` or `ADMIN_SECRET` is set.
- `POST /api/dataset/sync` - manually sync the Hugging Face Dataset repo into the Space runtime.
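
A minimal client sketch for two of these routes, using only the Python standard library. The helper names `build_sample_url` and `build_update_request` are hypothetical, and the base URL and secret are placeholders; the routes, query parameters, and the `x-admin-secret` header name come from the list above.

```python
import urllib.parse
import urllib.request


def build_sample_url(base_url: str, category: str, asset: str, timeframe: str) -> str:
    """Compose the /api/data/sample URL with properly encoded query parameters."""
    query = urllib.parse.urlencode(
        {"category": category, "asset": asset, "timeframe": timeframe}
    )
    return f"{base_url.rstrip('/')}/api/data/sample?{query}"


def build_update_request(base_url: str, secret: str) -> urllib.request.Request:
    """Prepare (but do not send) the manual update trigger request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/update/start",
        method="POST",
        # HTTP header names are case-insensitive; urllib normalizes the casing.
        headers={"x-admin-secret": secret},
    )
```

To actually send the request, pass it to `urllib.request.urlopen(request, timeout=30)` with your Space URL and secret. The response shape is not documented here, so inspect it at `/docs` before parsing it.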

## Netlify Keep-Awake Cron

The `frontend` folder now includes:

- `frontend/netlify.toml`
- `frontend/netlify/functions/keep-space-awake.mjs`

On Netlify, set these environment variables:

- `HUGGING_FACE_SPACE_URL=https://YOUR-HF-USERNAME-YOUR-SPACE.hf.space`
- `CRON_SECRET=<same value as the Space CRON_SECRET>`

The scheduled function runs every 10 minutes and calls `/api/cron/tick`. This keeps the Space warm and lets the backend start its daily update/retrain job after the configured market-close time.
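
The "only when due" behaviour can be pictured with the sketch below. It assumes the backend remembers the local date of its last run and starts at most one update per day once `DAILY_UPDATE_TIME` has passed. The function names and this exact rule are illustrative assumptions; the real check in `app.py` may also consider weekends, holidays, or a job already in progress.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Optional

# India does not observe daylight saving time, so a fixed +05:30 offset
# matches Asia/Kolkata and avoids depending on a timezone database.
IST_FIXED = timezone(timedelta(hours=5, minutes=30))


def parse_hhmm(value: str) -> time:
    """Parse a DAILY_UPDATE_TIME style string such as '17:30'."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def is_update_due(
    now_utc: datetime,
    tz: tzinfo,
    daily_update_time: str,
    last_run_date: Optional[date],
) -> bool:
    """Return True when the daily job should start on this tick.

    now_utc must be timezone-aware. last_run_date is the local calendar date
    of the most recent run, or None if no run has happened yet.
    """
    local_now = now_utc.astimezone(tz)
    if local_now.time() < parse_hhmm(daily_update_time):
        return False  # market-close time has not been reached yet today
    return last_run_date != local_now.date()  # run at most once per local day
```

With a 10-minute tick, the first call after 17:30 local time returns `True` and later ticks on the same day return `False` once the run date is recorded. For zones that do observe daylight saving time, `zoneinfo.ZoneInfo(name)` can be passed as `tz` instead of a fixed offset.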

## Layout

- `app.py` - FastAPI backend app for Hugging Face Spaces.
- `Dockerfile` - Docker Space runtime setup.
- `requirements.txt` - Python dependencies.
- `research_runtime/Code/models/` - trainable model packages and the small latest forecast/summary outputs needed by the API.
- `research_runtime/Code/scripts/data_ingestion/` - data refresh scripts used by update jobs.
- `research_runtime/Code/scripts/data_preparation/` - research data rebuild scripts used by update jobs.

`research_runtime/Data/` and `research_runtime/Alt Data/` are intentionally not bundled in the Space repo anymore. They now live in the separate Hugging Face Dataset repo and are downloaded into `research_runtime/` by the backend when `HF_DATASET_REPO_ID` is set.
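
One way such a download could be done is with `snapshot_download` from the `huggingface_hub` package, sketched below. The helper names are hypothetical, the package must be installed separately, and whether the backend uses this exact call is an assumption. A private Dataset repo would also need a token, which this sketch does not handle.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def dataset_sync_kwargs(
    env: Mapping[str, str], runtime_dir: str = "research_runtime"
) -> Optional[dict]:
    """Build snapshot_download arguments, or None when no repo is configured."""
    repo_id = env.get("HF_DATASET_REPO_ID", "").strip()
    if not repo_id:
        return None
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",
        "revision": env.get("HF_DATASET_REVISION", "main"),
        "local_dir": str(Path(runtime_dir)),
    }


def sync_dataset(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Download the Dataset repo into the runtime folder and return its path."""
    kwargs = dataset_sync_kwargs(os.environ if env is None else env)
    if kwargs is None:
        return None  # HF_DATASET_REPO_ID is unset, so there is nothing to sync
    # Imported lazily so the module still loads without the third-party package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**kwargs)
```

Keeping the argument construction separate from the network call lets the configuration be inspected without downloading anything.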

## Main Model Outputs To Wire First

- Stock high/low forecasts: `research_runtime/Code/models/stock_high_low_forecaster/outputs/latest_forecasts.csv`
- Stock high/low metrics: `research_runtime/Code/models/stock_high_low_forecaster/outputs/metrics_by_symbol.csv`
- First-extrema forecasts: `research_runtime/Code/models/first_extrema_forecaster/outputs/latest_forecasts.csv`
- Nifty forecasts: `research_runtime/Code/models/nifty_forecaster/outputs/forecaster_latest_forecasts.csv`
- Nifty summary: `research_runtime/Code/models/nifty_forecaster/outputs/forecaster_summary.json`
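
A small sketch of reading the three forecast CSVs into plain dictionaries, which is roughly the shape a JSON endpoint would need. The paths are the ones listed above, relative to `research_runtime`. The function names and the choice to return an empty list for a missing file are assumptions, and no column names are assumed because the CSV schemas are not described here.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Paths taken from the list above, relative to research_runtime.
FORECAST_OUTPUTS = {
    "stock_high_low": "Code/models/stock_high_low_forecaster/outputs/latest_forecasts.csv",
    "first_extrema": "Code/models/first_extrema_forecaster/outputs/latest_forecasts.csv",
    "nifty": "Code/models/nifty_forecaster/outputs/forecaster_latest_forecasts.csv",
}


def load_forecast_rows(runtime_dir: str, relative_path: str) -> List[Dict[str, str]]:
    """Read one CSV into a list of row dicts; missing files yield an empty list."""
    path = Path(runtime_dir) / relative_path
    if not path.is_file():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_latest_forecasts(runtime_dir: str = "research_runtime") -> Dict[str, List[Dict[str, str]]]:
    """Collect rows from every forecast output, keyed by a short model label."""
    return {
        name: load_forecast_rows(runtime_dir, relative_path)
        for name, relative_path in FORECAST_OUTPUTS.items()
    }
```

All values come back as strings, so numeric fields would still need converting before they are charted or compared.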

## Training Entrypoints

Run these from `backend/research_runtime` so project-relative paths resolve correctly:

```powershell
python Code\models\stock_high_low_forecaster\train.py
python Code\models\first_extrema_forecaster\train.py
python Code\models\nifty_forecaster\train.py
```
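
The commands above use Windows path separators. On Linux, including inside the Docker Space, a small Python wrapper such as the sketch below could run the same three scripts. The `training_command` and `run_all` helpers are hypothetical and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Model package names taken from the training commands above.
TRAINABLE_MODELS = [
    "stock_high_low_forecaster",
    "first_extrema_forecaster",
    "nifty_forecaster",
]


def training_command(model_name: str) -> List[str]:
    """Build the command for one model using OS-appropriate path separators."""
    script = Path("Code") / "models" / model_name / "train.py"
    return [sys.executable, str(script)]


def run_all(runtime_dir: str = "backend/research_runtime") -> None:
    """Run each training script in sequence, stopping at the first failure."""
    for model_name in TRAINABLE_MODELS:
        # cwd is set so project-relative paths inside the scripts resolve.
        subprocess.run(training_command(model_name), cwd=runtime_dir, check=True)
```

Using `sys.executable` keeps the child processes on the same interpreter and virtual environment as the caller.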

## Data Labels

These live in the separate Dataset repo:

- Raw minute OHLCV: `Data/raw/minute/*_minute.csv`
- Processed bars: `Data/processed/bars/{1m,5m,1h,4h,1d}/*.csv`
- Processed features: `Data/processed/features/{1m,5m,1h,4h,1d}/*.csv`
- Market panels: `Data/processed/panels/*_market_panel.csv`
- Master daily panel: `Data/processed/panels/daily_master_panel.csv`
- Data manifest: `Data/metadata/manifest.csv`
- Feature dictionary: `Data/metadata/feature_dictionary.csv`
- Options features: `Alt Data/options/processed/*_options_daily_features.csv`
- Institutional panel: `Alt Data/institutional/processed/institutional_daily_panel.csv`
- External daily panel: `Alt Data/external/processed/external_daily_panel.csv`
- Corporate events: `Alt Data/corporate/processed/corporate_announcements.csv`
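
The `{1m,5m,1h,4h,1d}` notation above is shell-style brace shorthand, which Python's `glob` and `pathlib` do not expand. A sketch of listing the per-timeframe files explicitly follows. The function name is hypothetical, and `dataset_root` is assumed to be wherever the Dataset repo was downloaded.

```python
from pathlib import Path
from typing import Dict, List

# Timeframe folder names taken from the path patterns above.
TIMEFRAMES = ("1m", "5m", "1h", "4h", "1d")


def list_processed_files(dataset_root: str, kind: str = "bars") -> Dict[str, List[Path]]:
    """Return the CSV files for each timeframe under Data/processed/<kind>."""
    if kind not in {"bars", "features"}:
        raise ValueError("kind must be 'bars' or 'features'")
    base = Path(dataset_root) / "Data" / "processed" / kind
    files: Dict[str, List[Path]] = {}
    for timeframe in TIMEFRAMES:
        folder = base / timeframe
        # A timeframe folder that has not been synced yet yields an empty list.
        files[timeframe] = sorted(folder.glob("*.csv")) if folder.is_dir() else []
    return files
```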

## Frontend Wiring Notes

The current frontend uses static mock data in `frontend/index.html` and `frontend/script.js`.

- Forecast cards can call `/api/forecast/latest`.
- Model accuracy and version/date stats can call `/api/models/summaries`.
- Market Data can call `/api/data/catalog` and `/api/data/sample`.

## Pruned From Backend

- Kotak credential/runtime files.
- Live-trading scripts and live broker artifacts.
- Kotak monitor artifacts and cached NSE temp folders.
- Python `__pycache__` folders.
- CatBoost generated training-log folder.
- One-off maintenance/backfill scripts.
- Backtest artifacts, chart images, old trade reports, test prediction dumps, generated training datasets, and saved model binaries.

`KOTAKBANK` CSV files remain because those are normal market datasets for Kotak Mahindra Bank, not broker-runtime files.