NSE Nifty 50 Swing Trading Predictor v9: PRODUCTION UNIFIED MODEL

Author: mohan170802
Models: mohan170802/nse-nifty50-swing-predictor

Single Unified Model: 60.3% Test Accuracy on Unseen Data

After extensive feature and model experiments, a single unified model reached 60.3% test accuracy predicting BUY vs NOT_BUY for all 50 NSE Nifty stocks at once.


Architecture: Why This Works

Previous Approach (v7): 7 Sector Models

  • One model per sector: BANKING, IT, AUTO, ENERGY, FMCG, PHARMA, DIVERSIFIED
  • Mean accuracy: 57.7% | Best: IT 62.5%
  • Problem: Requires maintaining 7 models, no cross-sector learning

v9: Single Unified Cross-Sectional Model ✅

  • One CatBoost model trained on ALL 50 stocks together
  • Cross-sectional features: z-scores and percentile ranks computed within each trading date across all stocks
  • Temporal target encoding: Expanding mean of BUY rate per ticker/sector (no leakage)
  • Time-decay weighting: Recent samples weighted more heavily
  • Native categorical handling: ticker (50 levels) and sector (7 levels) as CatBoost categorical features

Results

| Model | Val Accuracy | Test Accuracy | Test F1 | Test AUC |
| --- | --- | --- | --- | --- |
| CatBoost | 62.5% | 60.3% | 0.337 | 0.559 |
| XGBoost | 57.7% | 58.6% | 0.357 | 0.551 |

Winner: CatBoost on accuracy and AUC (XGBoost has the slightly higher test F1). CatBoost handles categorical features better, and its ordered target statistics help prevent leakage.


Top 15 Features (by importance)

| Rank | Feature | Importance | Description |
| --- | --- | --- | --- |
| 1 | vix_level | 23.0 | India VIX (market fear gauge) |
| 2 | month | 21.2 | Calendar month (seasonality) |
| 3 | ticker | 4.4 | Stock ticker (CatBoost categorical) |
| 4 | macd_signal_rel_z | 3.5 | MACD signal cross-sectional z-score |
| 5 | sector | 3.4 | Sector (CatBoost categorical) |
| 6 | vix_chg_5d | 3.2 | VIX 5-day change |
| 7 | ticker_target_enc | 2.9 | Historical BUY rate per ticker |
| 8 | obv | 2.8 | On-balance volume |
| 9 | macd_rel_z | 2.4 | MACD cross-sectional z-score |
| 10 | sector_target_enc | 2.1 | Historical BUY rate per sector |
| 11 | nifty_ret | 2.0 | Nifty50 daily return |
| 12 | atr_norm | 1.8 | Normalized ATR (volatility) |
| 13 | adx_14 | 1.8 | ADX (trend strength) |
| 14 | rsi_28 | 1.7 | RSI 28-day |
| 15 | dist_sma_50 | 1.7 | Distance from 50-day SMA |

Key insight: vix_level (23.0) and month (21.2) dominate, so market regime and seasonality matter more than any single stock indicator. Two cross-sectional relative features (macd_signal_rel_z and macd_rel_z) also rank in the top 10.


Feature Engineering (105 features)

Technical Indicators

  • Momentum: RSI(14), RSI(28), MACD, Stochastic, ADX
  • Volatility: Bollinger Bands %B & width, ATR normalized
  • Volume: OBV, VWAP deviation, volume ratio

Returns & Trend

  • Lagged log returns: 1, 2, 3, 5, 10 days
  • Rolling: mean/volatility 5, 10, 20 days
  • Price action: HL range, distance from SMA(5/10/20/50), trend 5/20, momentum 10/20

Cross-Sectional (Relative): THE KEY INNOVATION

  • For every raw feature: feature_rel_z = z-score across all 50 stocks on same date
  • For every raw feature: feature_rel_pct = percentile rank across all 50 stocks
  • Sector-relative: z-score within each sector on same date
  • Market percentile: stock's return rank vs all stocks

Macro & Calendar

  • Nifty50 return, India VIX level & 5d change, relative to Nifty
  • Day-of-week, month, ticker, sector

Temporal Target Encoding (No Leakage)

  • ticker_target_enc: Expanding mean of BUY rate per ticker, shifted by 1
  • sector_target_enc: Expanding mean of BUY rate per sector, shifted by 1
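An expanding mean with a shift can be sketched in plain Python as follows. The function name, the `prior` used before any history exists, and the `lag` parameter are illustrative assumptions; the source only states "shifted by 1", which corresponds to `lag=1`. One caveat worth checking in practice: with a 10-day label horizon, yesterday's label is not fully known today, so a lag at least as long as the horizon may be the safer choice.

```python
from collections import defaultdict

def expanding_target_encoding(keys, labels, lag=1, prior=0.5):
    """Encode each row with the mean label of earlier rows for the same key.

    Rows must be in chronological order. With lag=1 every earlier row of
    the key is used; a larger lag also drops the most recent lag-1 rows.
    """
    history = defaultdict(list)  # key -> labels seen so far, oldest first
    encoded = []
    for key, label in zip(keys, labels):
        past = history[key]
        cutoff = max(0, len(past) - (lag - 1))
        usable = past[:cutoff]
        # Fall back to the prior until enough history has accumulated.
        encoded.append(sum(usable) / len(usable) if usable else prior)
        past.append(label)  # the current label becomes visible only afterwards
    return encoded

# One hypothetical ticker with four chronological BUY (1) / NOT_BUY (0) labels.
tickers = ["A", "A", "A", "A"]
buy_labels = [1, 0, 1, 1]
enc_shift_1 = expanding_target_encoding(tickers, buy_labels, lag=1)
enc_shift_2 = expanding_target_encoding(tickers, buy_labels, lag=2)
```

The same function works for `sector_target_enc` by passing sector names as keys. Re-summing the history on each row keeps the sketch short; running sums would be faster on the full dataset.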

Training Details

| Parameter | Value |
| --- | --- |
| Data | 5 years daily OHLCV (Jul 2021 – Apr 2026) |
| Stocks | 50 NSE Nifty constituents |
| Samples | 58,790 rows |
| Split | 70/15/15 temporal (train/val/test) |
| Horizon | 10 trading days |
| Target | +3% upside within 10 days = BUY |
| Algorithm | CatBoostClassifier |
| Loss | Logloss (binary) |
| Metric | AUC (early stopping) |
| Depth | 6 |
| Learning rate | 0.03 |
| Iterations | 2,000 (early stop at ~500) |
| Categoricals | ticker (50), sector (7) |
| Weighting | Time-decay: exp(-days_ago / 365 * 0.5) |
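The target, split, and weighting rows can be illustrated with a short, dependency-free sketch. Several details are assumptions rather than facts from the training script: the label here compares future closing prices (the source does not say whether closes or intraday highs are used), the split is applied to sorted unique dates, and all function names are hypothetical.

```python
import math

def label_buy(closes, horizon=10, target=0.03):
    """1 if any of the next `horizon` closes is at least +target above today's close.

    Returns None where the future window is incomplete (end of the series).
    """
    labels = []
    for i, price in enumerate(closes):
        window = closes[i + 1 : i + 1 + horizon]
        if len(window) < horizon:
            labels.append(None)
        else:
            labels.append(int(max(window) >= price * (1 + target)))
    return labels

def temporal_split(dates, train_pct=70, val_pct=15):
    """Split sorted unique dates into train/val/test without shuffling."""
    ordered = sorted(set(dates))
    train_end = len(ordered) * train_pct // 100
    val_end = len(ordered) * (train_pct + val_pct) // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

def time_decay_weight(days_ago, rate=0.5):
    """Sample weight exp(-days_ago / 365 * rate), as given in the table."""
    return math.exp(-days_ago / 365 * rate)

# Toy inputs: a 2-day horizon keeps the example readable.
labels = label_buy([100, 101, 104, 100, 99], horizon=2)
train_days, val_days, test_days = temporal_split(range(20))
weight_one_year = time_decay_weight(365)
```

With the stated rate of 0.5, a sample from one year ago gets a weight of exp(-0.5), about 0.61, relative to a weight of 1.0 for the most recent sample.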

Production Usage

Quick Predict

python nse_v9_predict.py --ticker RELIANCE.NS --threshold 0.55

Python API

import catboost as cb
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download("mohan170802/nse-nifty50-swing-predictor", "unified_model_v9.cbm")
model = cb.CatBoostClassifier()
model.load_model(model_path)

# After engineering same 105 features from latest data:
# proba = model.predict_proba(features)[0, 1]
# signal = "BUY" if proba > 0.55 else "NOT_BUY"
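Because the model expects features in a fixed order, it may help to build each input row from the saved feature list rather than by hand. The sketch below assumes `unified_features_v9.txt` holds one feature name per line; that layout and both helper names are assumptions, not documented behavior of this repository.

```python
from pathlib import Path

def load_feature_names(path):
    """Read feature names, one per line, skipping blank lines (assumed layout)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

def to_model_row(feature_values, feature_names):
    """Arrange a {name: value} mapping into the order the model was trained on."""
    missing = [name for name in feature_names if name not in feature_values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [feature_values[name] for name in feature_names]

# Toy example with three of the 105 names; the values are made up.
names = ["vix_level", "month", "ticker"]
row = to_model_row({"ticker": "RELIANCE.NS", "vix_level": 14.2, "month": 4}, names)
```

Failing loudly on a missing feature is a deliberate choice here: a silently misaligned column would still produce a probability, just a meaningless one.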

Thresholds for Swing Trading

| Strategy | Threshold | Precision | Trades | Use Case |
| --- | --- | --- | --- | --- |
| Conservative | 0.65 | High | Few | Quality over quantity |
| Moderate | 0.55 | Balanced | Medium | Recommended start |
| Aggressive | 0.50 | Lower | Many | Catch more opportunities |
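The qualitative precision and trade-count columns can be replaced with measured values on your own validation data. A minimal sketch follows; the helper names and the toy probabilities are illustrative, and only the three threshold values come from the table.

```python
THRESHOLDS = {"conservative": 0.65, "moderate": 0.55, "aggressive": 0.50}

def to_signal(proba, strategy="moderate"):
    """Map a predicted BUY probability to a signal for the chosen strategy."""
    return "BUY" if proba > THRESHOLDS[strategy] else "NOT_BUY"

def precision_and_trades(probas, labels, threshold):
    """Precision among rows flagged BUY, and how many rows were flagged."""
    picks = [label for proba, label in zip(probas, labels) if proba > threshold]
    precision = sum(picks) / len(picks) if picks else None
    return precision, len(picks)

# Made-up validation probabilities and true labels (1 = BUY).
val_probas = [0.70, 0.60, 0.52, 0.40]
val_labels = [1, 0, 1, 0]
report = {
    name: precision_and_trades(val_probas, val_labels, threshold)
    for name, threshold in THRESHOLDS.items()
}
```

Raising the threshold always reduces the number of trades, but precision is not guaranteed to rise with it, so it is worth checking the measured values before settling on a strategy.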

Files

unified_model_v9.cbm          # CatBoost binary model
unified_model_v9.json         # XGBoost backup model
unified_features_v9.txt       # 105 feature names (load order critical)
label_encoders_v9.json        # Ticker/sector label mappings
summary_unified_v9.json       # Full metrics, top features, model comparison
nse_v9_predict.py             # Production inference script
nse_v9_final.py               # Training script (reproducible)

Performance Reality Check

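  • Random baseline: 50% (binary)
  • Majority-class baseline: about 57% (always predicting NOT_BUY, if the 43% BUY rate also holds on the test set)
  • Our model: 60.3% on completely unseen test data
  • Edge: +10.3 percentage points above random, roughly +3 points above the majority-class baseline
  • AUC: 0.559 (weak but consistent signal)
  • F1: 0.337 (low; BUY is the minority class at a 43% rate)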

Interpretation: This is a weak but real signal. In trading:

  • 60% classification accuracy with proper position sizing and stop-losses can be profitable
  • Use as ONE signal among many (not standalone)
  • The model identifies when market regime + stock setup align for above-average probability of +3% move
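Whether a given hit rate is profitable depends on the payoff ratio, which can be checked with simple arithmetic. The sketch below assumes every trade ends at exactly the +3% target or the -2% stop suggested elsewhere in this document, and ignores time-outs, gaps, and slippage. Note that the relevant hit rate is the precision of BUY signals, which this card does not report; the 0.60 used below is only a placeholder, not a measured value.

```python
def breakeven_hit_rate(target=0.03, stop=0.02):
    """Hit rate at which the expected return per trade is zero (no costs)."""
    return stop / (target + stop)

def expected_return_per_trade(hit_rate, target=0.03, stop=0.02, cost=0.0):
    """Expected fractional return when each trade ends at the target or the stop."""
    return hit_rate * target - (1 - hit_rate) * stop - cost

breakeven = breakeven_hit_rate()
expectancy = expected_return_per_trade(0.60)  # placeholder hit rate
```

Under these simplifying assumptions the break-even hit rate is 40%, and a 60% hit rate would give an expected return of about 1% per trade before costs.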

Improvements for v10+

  1. Fundamental features: P/E, ROE, EPS growth, debt/equity (from NSE/Indian exchanges)
  2. News/sentiment: Earnings announcements, macro events
  3. Options data: Implied volatility, put/call ratio
  4. More history: 10+ years to capture more market cycles
  5. Quarterly retraining: As recommended by Yang et al. research
  6. Ensemble with deep learning: Transformer models (MASTER paper) for cross-sectional attention

⚠️ Disclaimer

Not financial advice. Past performance does not guarantee future results. Use with:

  • Stop-losses (suggest -2%)
  • Position sizing (risk ≤2% per trade)
  • Portfolio-level risk management
  • Paper trading before real capital
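The stop-loss and per-trade risk suggestions combine into a position size as sketched below. This is a generic fixed-fractional sizing calculation, not something taken from the repository; the function name and the example capital and price are hypothetical.

```python
def position_size(capital, entry_price, stop_pct=0.02, risk_pct=0.02):
    """Whole shares to buy so that hitting the stop loses at most risk_pct of capital."""
    risk_amount = capital * risk_pct          # maximum loss accepted on this trade
    loss_per_share = entry_price * stop_pct   # loss per share if the stop is hit
    shares_by_risk = int(risk_amount // loss_per_share)
    shares_affordable = int(capital // entry_price)
    return min(shares_by_risk, shares_affordable)

# Hypothetical account of 1,000,000 and an entry price of 2,500.
shares_at_2pct_risk = position_size(1_000_000, 2_500)
shares_at_1pct_risk = position_size(1_000_000, 2_500, risk_pct=0.01)
```

With a 2% stop and a 2% risk budget, the formula permits putting the entire account into one position, so in practice the portfolio-level limits listed above would be the binding constraint.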

Citation

Built using insights from:

  • CatBoost: unbiased boosting with categorical features (Prokhorenkova et al., NeurIPS 2018)
  • MASTER: Market-Guided Stock Transformer (Li et al., AAAI 2024)
  • Marcos López de Prado: Advances in Financial Machine Learning (Triple Barrier Method)

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
