NSE Nifty 50 Swing Trading Predictor v9 — PRODUCTION UNIFIED MODEL

Author: mohan170802
Models: mohan170802/nse-nifty50-swing-predictor

Single Unified Model: 60.3% Test Accuracy on Unseen Data

After extensive data science and ML engineering research, we reached 60.3% accuracy on the held-out test period with a single unified model that predicts BUY vs NOT_BUY for all 50 Nifty 50 stocks at once.

Architecture: Why This Works

Previous Approach (v7): 7 Sector Models

One model per sector: BANKING, IT, AUTO, ENERGY, FMCG, PHARMA, DIVERSIFIED
Mean accuracy: 57.7% | Best: IT 62.5%
Problem: Requires maintaining 7 models, no cross-sector learning

v9: Single Unified Cross-Sectional Model ✅

One CatBoost model trained on ALL 50 stocks together
Cross-sectional features: z-scores and percentile ranks computed within each trading date across all stocks
Temporal target encoding: Expanding mean of BUY rate per ticker/sector (no leakage)
Time-decay weighting: Recent samples weighted more heavily
Native categorical handling: ticker (50 levels) and sector (7 levels) as CatBoost categorical features

Results

Model	Val Accuracy	Test Accuracy	Test F1	Test AUC
CatBoost	62.5%	60.3%	0.337	0.559
XGBoost	57.7%	58.6%	0.357	0.551

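Winner: CatBoost, with higher validation accuracy (+4.8 pp), test accuracy (+1.7 pp) and test AUC, although XGBoost has the higher test F1. Native categorical handling and ordered target statistics, which limit target leakage, likely account for the gap.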

Top 15 Features (by importance)

Rank	Feature	Importance	Description
1	`vix_level`	23.0	India VIX (market fear gauge)
2	`month`	21.2	Calendar month (seasonality)
3	`ticker`	4.4	Stock ticker (CatBoost categorical)
4	`macd_signal_rel_z`	3.5	MACD signal cross-sectional z-score
5	`sector`	3.4	Sector (CatBoost categorical)
6	`vix_chg_5d`	3.2	VIX 5-day change
7	`ticker_target_enc`	2.9	Historical BUY rate per ticker
8	`obv`	2.8	On-balance volume
9	`macd_rel_z`	2.4	MACD cross-sectional z-score
10	`sector_target_enc`	2.1	Historical BUY rate per sector
11	`nifty_ret`	2.0	Nifty50 daily return
12	`atr_norm`	1.8	Normalized ATR (volatility)
13	`adx_14`	1.8	ADX (trend strength)
14	`rsi_28`	1.7	RSI 28-day
15	`dist_sma_50`	1.7	Distance from 50-day SMA

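Key insight: vix_level and month dominate. Their combined importance (44.2) exceeds that of ranks 3 to 15 together (33.7), so market regime and seasonality matter more than any single stock indicator. Among stock-level technical features, the cross-sectional macd_signal_rel_z ranks highest (#4), and macd_rel_z also makes the top 10.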

Feature Engineering (105 features)

Technical Indicators

Momentum: RSI(14), RSI(28), MACD, Stochastic, ADX
Volatility: Bollinger Bands %B & width, ATR normalized
Volume: OBV, VWAP deviation, volume ratio
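As an illustration of two of the indicators listed above, here is a minimal pandas sketch of RSI and OBV for a single ticker's date-sorted history. It is not the author's code: the function names are hypothetical, and the RSI uses an exponential approximation of Wilder's smoothing (alpha = 1/period), which is one common variant among several.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using an EMA approximation of Wilder's smoothing (alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = (-delta).clip(lower=0.0)
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss  # inf when there are no down moves, giving RSI = 100
    return 100.0 - 100.0 / (1.0 + rs)


def obv(close: pd.Series, volume: pd.Series) -> pd.Series:
    """On-balance volume: add volume on up closes, subtract it on down closes."""
    direction = np.sign(close.diff()).fillna(0.0)
    return (direction * volume).cumsum()


# Usage on one ticker's history (hypothetical prices)
close = pd.Series(np.linspace(100.0, 130.0, 40))
rsi_14, rsi_28 = rsi(close, 14), rsi(close, 28)
```

RSI(14) and RSI(28) from the feature list are the same function with different periods.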

Returns & Trend

Lagged log returns: 1, 2, 3, 5, 10 days
Rolling: mean/volatility 5, 10, 20 days
Price action: HL range, distance from SMA(5/10/20/50), trend 5/20, momentum 10/20
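A minimal sketch of a subset of these return and trend features for one ticker follows. Column names are illustrative, and reading "lagged log returns" as k-day log returns is an assumption (lagged 1-day returns are another plausible reading). The HL range and trend features are omitted.

```python
import numpy as np
import pandas as pd


def return_trend_features(close: pd.Series) -> pd.DataFrame:
    """Return/trend features for ONE ticker's date-sorted close series."""
    out = pd.DataFrame(index=close.index)
    log_ret = np.log(close / close.shift(1))  # daily log return
    for k in (1, 2, 3, 5, 10):
        # Assumption: k-day log return ending on each date
        out[f"log_ret_{k}d"] = np.log(close / close.shift(k))
    for w in (5, 10, 20):
        out[f"ret_mean_{w}"] = log_ret.rolling(w).mean()
        out[f"ret_vol_{w}"] = log_ret.rolling(w).std()
    for w in (5, 10, 20, 50):
        # Fractional distance of the close from its w-day simple moving average
        out[f"dist_sma_{w}"] = close / close.rolling(w).mean() - 1.0
    for w in (10, 20):
        out[f"momentum_{w}"] = close / close.shift(w) - 1.0
    return out
```

On the 50-stock panel, run it separately per ticker so no rolling window spans two stocks.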

Cross-Sectional (Relative) — THE KEY INNOVATION

For every raw feature: feature_rel_z = z-score across all 50 stocks on the same date
For every raw feature: feature_rel_pct = percentile rank across all 50 stocks on the same date
Sector-relative: z-score within each sector on the same date
Market percentile: percentile rank of the stock's return among all 50 stocks on that date
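The per-date transforms above map naturally onto pandas group-by operations on a long-format table (one row per date and ticker). The following is a minimal sketch under that layout assumption; the `_sector_z` suffix and the function name are hypothetical, and it uses the sample standard deviation, so a sector with a single stock on a date gets NaN.

```python
import pandas as pd


def add_cross_sectional(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Add per-date cross-sectional features; df needs 'date', 'sector' and cols."""
    out = df.copy()
    by_date = df.groupby("date")
    by_date_sector = df.groupby(["date", "sector"])
    for col in cols:
        # z-score across all stocks trading on the same date
        mu, sd = by_date[col].transform("mean"), by_date[col].transform("std")
        out[f"{col}_rel_z"] = (df[col] - mu) / sd
        # percentile rank in (0, 1] across all stocks on the same date
        out[f"{col}_rel_pct"] = by_date[col].rank(pct=True)
        # z-score within the stock's own sector on the same date
        s_mu = by_date_sector[col].transform("mean")
        s_sd = by_date_sector[col].transform("std")
        out[f"{col}_sector_z"] = (df[col] - s_mu) / s_sd
    return out
```

Because each statistic uses only rows from the same date, these features add no look-ahead beyond what the raw features already contain.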

Macro & Calendar

Nifty 50 daily return, India VIX level and 5-day change, stock return relative to the Nifty 50
Day-of-week, month, ticker, sector
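A minimal sketch of joining these market-wide and calendar features onto the stock rows by date follows. `nifty_ret`, `vix_level`, `vix_chg_5d` and `month` are names from the feature table; `ret_vs_nifty` and `day_of_week` are hypothetical, and treating the VIX 5-day change as a percentage change (rather than a point difference) is an assumption.

```python
import pandas as pd


def add_macro_calendar(stocks: pd.DataFrame, nifty_close: pd.Series,
                       vix_close: pd.Series) -> pd.DataFrame:
    """stocks has 'date', 'ticker', 'close'; index series are keyed by date."""
    out = stocks.sort_values(["ticker", "date"]).copy()
    out["ret"] = out["close"] / out.groupby("ticker")["close"].shift(1) - 1.0
    macro = pd.DataFrame({
        "nifty_ret": nifty_close / nifty_close.shift(1) - 1.0,
        "vix_level": vix_close,
        "vix_chg_5d": vix_close / vix_close.shift(5) - 1.0,  # assumed % change
    })
    out = out.merge(macro, left_on="date", right_index=True, how="left")
    out = out.reset_index(drop=True)
    out["ret_vs_nifty"] = out["ret"] - out["nifty_ret"]  # excess over the index
    dates = pd.to_datetime(out["date"])
    out["day_of_week"] = dates.dt.dayofweek  # Monday = 0
    out["month"] = dates.dt.month
    return out
```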

Temporal Target Encoding (No Leakage)

ticker_target_enc: Expanding mean of BUY rate per ticker, shifted by 1
sector_target_enc: Expanding mean of BUY rate per sector, shifted by 1
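A minimal sketch of a date-lagged expanding target encoding follows; the function name and the `buy` column are hypothetical. It exposes the lag as a parameter because of the 10-day horizon: the label on date t-1 depends on prices up to t+9, so a shift of 1 lets the encoding on date t include outcomes not yet known at t. `lag_days=1` reproduces the shift described above, while the default of 10 used here (an assumption, not the author's setting) keeps only labels whose outcome window has closed.

```python
import pandas as pd


def expanding_target_encoding(df: pd.DataFrame, key: str, target: str = "buy",
                              lag_days: int = 10) -> pd.DataFrame:
    """Add '<key>_target_enc': past BUY rate per key, lagged by lag_days dates."""
    # One row per (key, date): number of BUY labels and number of rows
    daily = df.groupby([key, "date"])[target].agg(["sum", "count"]).sort_index()
    cum = daily.groupby(level=0).cumsum()          # running totals per key
    lagged = cum.groupby(level=0).shift(lag_days)  # drop the most recent dates
    enc = (lagged["sum"] / lagged["count"]).rename(f"{key}_target_enc")
    return df.join(enc, on=[key, "date"])
```

Aggregating to one row per date before shifting makes the lag count trading dates for sectors too, where several tickers share a date.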

Training Details

Parameter	Value
Data	About 5 years of daily OHLCV (Jul 2021 – Apr 2026)
Stocks	50 NSE Nifty constituents
Samples	58,790 rows
Split	70/15/15 temporal (train/val/test)
Horizon	10 trading days
Target	+3% upside within 10 days = BUY
Algorithm	CatBoostClassifier
Loss	Logloss (binary)
Metric	AUC (early stopping)
Depth	6
Learning rate	0.03
Iterations	2,000 (early stop at ~500)
Categoricals	ticker (50), sector (7)
Weighting	Time-decay: exp(-days_ago / 365 * 0.5)
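The target, weighting and split rows above can be sketched as follows. This is an illustration, not the training script: labelling on the daily high (rather than the close) within the next 10 days, counting `days_ago` in calendar days from the newest training date, and splitting on unique dates are all assumptions, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd

HORIZON = 10   # trading days
UPSIDE = 0.03  # +3% target


def buy_label(close: pd.Series, high: pd.Series, horizon: int = HORIZON,
              upside: float = UPSIDE) -> pd.Series:
    """1.0 if any high in the next `horizon` days reaches close * (1 + upside).

    Works on ONE ticker's date-sorted series; rows without a full future
    window are NaN and should be dropped before training.
    """
    future = pd.concat([high.shift(-k) for k in range(1, horizon + 1)], axis=1)
    future_max = future.max(axis=1, skipna=False)
    label = (future_max >= close * (1.0 + upside)).astype(float)
    return label.where(future_max.notna())


def time_decay_weights(dates: pd.Series) -> np.ndarray:
    """w = exp(-days_ago / 365 * 0.5), days counted back from the newest date."""
    days_ago = (dates.max() - dates).dt.days.to_numpy()
    return np.exp(-days_ago / 365 * 0.5)


def temporal_split(df: pd.DataFrame, train: float = 0.70, val: float = 0.15):
    """70/15/15 split on unique dates so all stocks share the same boundaries."""
    dates = np.sort(df["date"].unique())
    train_end = round(len(dates) * train)
    val_end = round(len(dates) * (train + val))
    in_train = df["date"].isin(dates[:train_end])
    in_val = df["date"].isin(dates[train_end:val_end])
    return df[in_train], df[in_val], df[~in_train & ~in_val]
```

Under this weighting, a sample one year older than the newest date gets weight exp(-0.5) ≈ 0.61.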

Production Usage

Quick Predict

python nse_v9_predict.py --ticker RELIANCE.NS --threshold 0.55

Python API

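import catboost as cb
from huggingface_hub import hf_hub_download

REPO_ID = "mohan170802/nse-nifty50-swing-predictor"

# Download the CatBoost model and its ordered feature list
model_path = hf_hub_download(REPO_ID, "unified_model_v9.cbm")
features_path = hf_hub_download(REPO_ID, "unified_features_v9.txt")

model = cb.CatBoostClassifier()
model.load_model(model_path)

# Column order must match unified_features_v9.txt (assumed: one name per line)
with open(features_path) as f:
    feature_names = [line.strip() for line in f if line.strip()]

# After engineering the same 105 features from the latest data into a
# pandas DataFrame `features` (one row per stock):
# proba = model.predict_proba(features[feature_names])[:, 1]
# signals = ["BUY" if p > 0.55 else "NOT_BUY" for p in proba]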

Thresholds for Swing Trading

Strategy	Threshold	Precision	Trades	Use Case
Conservative	0.65	High	Few	Quality over quantity
Moderate	0.55	Balanced	Medium	Recommended start
Aggressive	0.50	Lower	Many	Catch more opportunities
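The table's trade-off can be checked on a labelled evaluation set with a small helper like the one below. It is a hypothetical sketch, not part of `nse_v9_predict.py`. It applies the strict `>` comparison used in the Python API snippet and reports, per strategy, how many BUY signals fire and what fraction of them were true BUYs.

```python
THRESHOLDS = {"conservative": 0.65, "moderate": 0.55, "aggressive": 0.50}


def to_signals(probas, strategy="moderate"):
    """Map BUY probabilities to signals using the strategy's threshold."""
    t = THRESHOLDS[strategy]
    return ["BUY" if p > t else "NOT_BUY" for p in probas]


def threshold_report(probas, labels):
    """Number of BUY signals and their precision for each strategy."""
    report = {}
    for name, t in THRESHOLDS.items():
        hits = [y for p, y in zip(probas, labels) if p > t]
        precision = sum(hits) / len(hits) if hits else float("nan")
        report[name] = {"threshold": t, "trades": len(hits), "precision": precision}
    return report
```

Running `threshold_report` on the validation period, not the test period, keeps the threshold choice from leaking into the reported test metrics.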

Files

unified_model_v9.cbm          # CatBoost binary model
unified_model_v9.json         # XGBoost backup model
unified_features_v9.txt       # 105 feature names (load order critical)
label_encoders_v9.json        # Ticker/sector label mappings
summary_unified_v9.json       # Full metrics, top features, model comparison
nse_v9_predict.py             # Production inference script
nse_v9_final.py               # Training script (reproducible)

Performance Reality Check

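Random baseline: 50% for a coin flip; always predicting NOT_BUY scores about 57% if the test set has the 43% BUY rate
Our model: 60.3% on completely unseen test data
Edge: +10.3 percentage points over a coin flip, but only about +3.3 points over the always-NOT_BUY baseline
AUC: 0.559 (weak but consistent signal)
F1: 0.337 on the BUY class (43% buy rate). The imbalance is mild, so the low F1 mainly reflects low BUY recall: combined with 60.3% accuracy and a 43% BUY rate, it implies roughly 23% recall at about 60% precision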
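The baseline comparison reduces to a few lines of arithmetic. The sketch below assumes the 43% BUY rate also holds on the test split, and the confusion counts in any call are whatever the evaluation produces; the helper names are hypothetical.

```python
def majority_baseline(buy_rate: float) -> float:
    """Accuracy of always predicting the more frequent class."""
    return max(buy_rate, 1.0 - buy_rate)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy plus precision, recall and F1 for the BUY (positive) class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Edge over always-NOT_BUY, assuming the test BUY rate is 43%
edge_pp = (0.603 - majority_baseline(0.43)) * 100
```

Reporting accuracy next to `majority_baseline` makes it clear how much of the 60.3% comes from the class balance alone.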

Interpretation: This is a weak but real signal. In trading:

60% classification accuracy can be profitable when combined with proper position sizing and stop-losses
Use as ONE signal among many (not standalone)
The model identifies when market regime + stock setup align for above-average probability of +3% move

Improvements for v10+

Fundamental features: P/E, ROE, EPS growth, debt/equity (from NSE/Indian exchanges)
News/sentiment: Earnings announcements, macro events
Options data: Implied volatility, put/call ratio
More history: 10+ years to capture more market cycles
Quarterly retraining: as recommended by Yang et al.
Ensemble with deep learning: Transformer models (MASTER paper) for cross-sectional attention

⚠️ Disclaimer

Not financial advice. Past performance does not guarantee future results. Use with:

Stop-losses (suggest -2%)
Position sizing (risk ≤2% per trade)
Portfolio-level risk management
Paper trading before real capital
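The stop-loss and risk limits above combine into a fixed-fractional position size. The sketch below uses a hypothetical account size and entry price. It shows that a -2% stop with a 2% risk budget sizes the position at the full account value; a smaller risk budget scales it down proportionally.

```python
import math


def position_size(equity: float, entry: float, stop_pct: float = 0.02,
                  risk_pct: float = 0.02) -> int:
    """Shares such that hitting the stop loses at most risk_pct of equity."""
    risk_budget = equity * risk_pct    # maximum loss allowed on this trade
    loss_per_share = entry * stop_pct  # loss per share if the stop is hit
    return math.floor(risk_budget / loss_per_share)


shares = position_size(1_000_000, 2_500)  # hypothetical account and entry price
notional = shares * 2_500
```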

Citation

Built using insights from:

CatBoost: Unbiased Boosting with Categorical Features (Prokhorenkova et al., NeurIPS 2018)
MASTER: Market-Guided Stock Transformer for Stock Price Forecasting (Li et al., AAAI 2024)
Advances in Financial Machine Learning (López de Prado, Wiley 2018): Triple-Barrier Method

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

This repository contains CatBoost (unified_model_v9.cbm) and XGBoost (unified_model_v9.json) model files, not a transformers checkpoint, so AutoModelForCausalLM and AutoTokenizer cannot load it. Load unified_model_v9.cbm with catboost.CatBoostClassifier and load_model, as in the Python API example under Production Usage.

