NSE Nifty 50 Swing Trading Predictor v9: PRODUCTION UNIFIED MODEL
Author: mohan170802
Models: mohan170802/nse-nifty50-swing-predictor
Single Unified Model: 60.3% Test Accuracy on Unseen Data
A single unified model that predicts BUY vs NOT_BUY for all 50 Nifty 50 stocks at once reaches 60.3% accuracy on a held-out test period that was not used for training or early stopping.
Architecture: Why This Works
Previous Approach (v7): 7 Sector Models
- One model per sector: BANKING, IT, AUTO, ENERGY, FMCG, PHARMA, DIVERSIFIED
- Mean accuracy: 57.7% | Best: IT 62.5%
- Problem: Requires maintaining 7 models, no cross-sector learning
v9: Single Unified Cross-Sectional Model
- One CatBoost model trained on ALL 50 stocks together
- Cross-sectional features: z-scores and percentile ranks computed within each trading date across all stocks
- Temporal target encoding: Expanding mean of BUY rate per ticker/sector (no leakage)
- Time-decay weighting: Recent samples weighted more heavily
- Native categorical handling: ticker (50 levels) and sector (7 levels) as CatBoost categorical features
Results
| Model | Val Accuracy | Test Accuracy | Test F1 | Test AUC |
|---|---|---|---|---|
| CatBoost | 62.5% | 60.3% | 0.337 | 0.559 |
| XGBoost | 57.7% | 58.6% | 0.357 | 0.551 |
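Winner: CatBoost, on accuracy and AUC (XGBoost has a slightly higher F1). Likely reasons are its native categorical feature handling and its ordered target statistics, which are designed to prevent target leakage.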
Top 15 Features (by importance)
| Rank | Feature | Importance | Description |
|---|---|---|---|
| 1 | vix_level | 23.0 | India VIX (market fear gauge) |
| 2 | month | 21.2 | Calendar month (seasonality) |
| 3 | ticker | 4.4 | Stock ticker (CatBoost categorical) |
| 4 | macd_signal_rel_z | 3.5 | MACD signal cross-sectional z-score |
| 5 | sector | 3.4 | Sector (CatBoost categorical) |
| 6 | vix_chg_5d | 3.2 | VIX 5-day change |
| 7 | ticker_target_enc | 2.9 | Historical BUY rate per ticker |
| 8 | obv | 2.8 | On-balance volume |
| 9 | macd_rel_z | 2.4 | MACD cross-sectional z-score |
| 10 | sector_target_enc | 2.1 | Historical BUY rate per sector |
| 11 | nifty_ret | 2.0 | Nifty50 daily return |
| 12 | atr_norm | 1.8 | Normalized ATR (volatility) |
| 13 | adx_14 | 1.8 | ADX (trend strength) |
| 14 | rsi_28 | 1.7 | RSI 28-day |
| 15 | dist_sma_50 | 1.7 | Distance from 50-day SMA |
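Key insight: vix_level (23.0) and month (21.2) dominate, each roughly five times the next feature (ticker, 4.4). Market regime and seasonality carry more weight in the model than any single stock indicator. Cross-sectional relative features also rank highly (macd_signal_rel_z at #4, macd_rel_z at #9).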
Feature Engineering (105 features)
Technical Indicators
- Momentum: RSI(14), RSI(28), MACD, Stochastic, ADX
- Volatility: Bollinger Bands %B & width, ATR normalized
- Volume: OBV, VWAP deviation, volume ratio
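The card lists the indicators but not their exact formulas. The sketch below is one common way to compute two of them for a single ticker. It assumes Wilder-style smoothing (alpha = 1/period) for both RSI and ATR, and it assumes atr_norm means ATR divided by the close. The function names are illustrative, not taken from nse_v9_final.py.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder-style RSI: exponentially smoothed gains/losses with alpha = 1/period."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = (-delta).clip(lower=0.0)
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss  # inf when there were no losses, which maps to RSI = 100
    return 100.0 - 100.0 / (1.0 + rs)


def atr_norm(high: pd.Series, low: pd.Series, close: pd.Series, period: int = 14) -> pd.Series:
    """Average True Range divided by close, so volatility is comparable across price levels."""
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)  # NaN from the first prev_close is skipped
    atr = true_range.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    return atr / close
```

Calling rsi(close, 28) gives the rsi_28 variant in the importance table.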
Returns & Trend
- Lagged log returns: 1, 2, 3, 5, 10 days
- Rolling: mean/volatility 5, 10, 20 days
- Price action: HL range, distance from SMA(5/10/20/50), trend 5/20, momentum 10/20
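Here is a minimal per-ticker sketch of the return and trend features. It assumes that "lagged log returns: k days" means the k-day log return log(close_t / close_{t-k}), that rolling mean and volatility are taken over daily log returns, and that distance from an SMA is close / SMA - 1. Apart from dist_sma_50, the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def returns_and_trend(df: pd.DataFrame) -> pd.DataFrame:
    """Return/trend features for ONE ticker; df has high, low, close columns sorted by date."""
    close = df["close"]
    out = pd.DataFrame(index=df.index)
    daily_log_ret = np.log(close / close.shift(1))
    for k in (1, 2, 3, 5, 10):
        out[f"ret_{k}d"] = np.log(close / close.shift(k))  # k-day log return
    for w in (5, 10, 20):
        out[f"ret_mean_{w}"] = daily_log_ret.rolling(w).mean()
        out[f"ret_vol_{w}"] = daily_log_ret.rolling(w).std()
    out["hl_range"] = (df["high"] - df["low"]) / close  # intraday range as a fraction of close
    for w in (5, 10, 20, 50):
        out[f"dist_sma_{w}"] = close / close.rolling(w).mean() - 1.0
    return out
```

Run this per ticker (for example, through groupby("ticker")) so that rolling windows never cross from one stock into another.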
Cross-Sectional (Relative): THE KEY INNOVATION
- For every raw feature: feature_rel_z = z-score across all 50 stocks on the same date
- For every raw feature: feature_rel_pct = percentile rank across all 50 stocks on the same date
- Sector-relative: z-score within each sector on the same date
- Market percentile: stock's return rank vs all stocks
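The cross-sectional features compare every stock only with other stocks on the same date, which maps naturally onto a pandas groupby on the date column. The sketch assumes a long panel with one row per (date, ticker) and date, ticker and sector columns. The _rel_z and _rel_pct suffixes follow the card, while _sec_z is a hypothetical name for the sector-relative z-score.

```python
import pandas as pd


def add_cross_sectional(panel: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Add per-date z-scores, percentile ranks and sector-relative z-scores for `cols`."""
    out = panel.copy()
    by_date = panel.groupby("date")
    by_date_sector = panel.groupby(["date", "sector"])
    for c in cols:
        # z-score against all stocks trading on the same date (sample std, ddof=1)
        out[f"{c}_rel_z"] = (panel[c] - by_date[c].transform("mean")) / by_date[c].transform("std")
        # percentile rank in (0, 1] on the same date; ties get the average rank
        out[f"{c}_rel_pct"] = by_date[c].rank(pct=True)
        # z-score within the stock's own sector on the same date (NaN if the sector has no spread)
        out[f"{c}_sec_z"] = (
            panel[c] - by_date_sector[c].transform("mean")
        ) / by_date_sector[c].transform("std")
    return out
```

Each date is normalized with only that date's data, so none of these features look ahead in time.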
Macro & Calendar
- Nifty50 return, India VIX level & 5d change, relative to Nifty
- Day-of-week, month, ticker, sector
Temporal Target Encoding (No Leakage)
- ticker_target_enc: expanding mean of BUY rate per ticker, shifted by 1
- sector_target_enc: expanding mean of BUY rate per sector, shifted by 1
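Below is a sketch of the two expanding-mean encodings. The shift is exposed as a lag parameter: lag=1 reproduces the "shifted by 1" description. With a 10-day horizon, however, the label for date t-1 is only known after prices up to t+9 are observed, so this sketch defaults to lag=10, which keeps only labels that are fully resolved on the prediction date. For the sector encoding, the sketch first aggregates to one row per (sector, date). That way, same-date labels of other stocks in the sector never enter the mean. Both choices belong to this sketch and are not confirmed details of nse_v9_final.py.

```python
import pandas as pd


def add_target_encodings(panel: pd.DataFrame, lag: int = 10) -> pd.DataFrame:
    """Expanding BUY-rate encodings that only use labels from `lag` or more dates back.

    Expects columns: date, ticker, sector, target (1 = BUY, 0 = NOT_BUY).
    """
    df = panel.sort_values(["ticker", "date"]).reset_index(drop=True)

    # Per ticker: shift first, then take the expanding mean of the remaining past labels.
    df["ticker_target_enc"] = df.groupby("ticker")["target"].transform(
        lambda s: s.shift(lag).expanding().mean()
    )

    # Per sector: one row per (sector, date) with BUY count and row count, shifted by whole dates.
    daily = df.groupby(["sector", "date"])["target"].agg(["sum", "count"]).reset_index()
    by_sector = daily.groupby("sector")
    past_hits = by_sector["sum"].transform(lambda s: s.shift(lag).cumsum())
    past_rows = by_sector["count"].transform(lambda s: s.shift(lag).cumsum())
    daily["sector_target_enc"] = past_hits / past_rows
    return df.merge(
        daily[["sector", "date", "sector_target_enc"]], on=["sector", "date"], how="left"
    )
```

The first lag dates of each ticker and sector receive NaN, and CatBoost can handle these as missing values.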
Training Details
| Parameter | Value |
|---|---|
| Data | 5 years daily OHLCV (Jul 2021 to Apr 2026) |
| Stocks | 50 NSE Nifty constituents |
| Samples | 58,790 rows |
| Split | 70/15/15 temporal (train/val/test) |
| Horizon | 10 trading days |
| Target | +3% upside within 10 days = BUY |
| Algorithm | CatBoostClassifier |
| Loss | Logloss (binary) |
| Metric | AUC (early stopping) |
| Depth | 6 |
| Learning rate | 0.03 |
| Iterations | 2,000 (early stop at ~500) |
| Categoricals | ticker (50), sector (7) |
| Weighting | Time-decay: exp(-days_ago / 365 * 0.5) |
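The target, weighting and split rows of the table can be sketched as three small helpers. The sketch makes three assumptions: a BUY means the highest high over the next 10 trading days reaches 1.03 × today's close; days_ago is counted in calendar days from the latest training date; and the 70/15/15 split is cut on unique dates, so no date is shared between sets. The card does not spell out any of these details.

```python
import numpy as np
import pandas as pd


def buy_label(close: pd.Series, high: pd.Series, horizon: int = 10, threshold: float = 0.03) -> pd.Series:
    """1.0 if max(high[t+1 .. t+horizon]) >= (1 + threshold) * close[t], else 0.0.

    Rows whose forward window runs past the end of the data get NaN (label unknown).
    """
    fwd_max = high.shift(-1).rolling(horizon).max().shift(-(horizon - 1))
    label = (fwd_max >= (1.0 + threshold) * close).astype(float)
    return label.where(fwd_max.notna())


def time_decay_weights(dates: pd.Series, rate: float = 0.5) -> pd.Series:
    """exp(-days_ago / 365 * rate), with days_ago measured from the most recent date."""
    days_ago = (dates.max() - dates).dt.days
    return np.exp(-days_ago / 365.0 * rate)


def temporal_split(dates: pd.Series, train: float = 0.70, val: float = 0.15) -> pd.Series:
    """Label rows train/val/test by date order without splitting any single date."""
    unique = np.sort(dates.unique())
    n = len(unique)
    train_end = unique[int(round(n * train)) - 1]
    val_end = unique[int(round(n * (train + val))) - 1]
    return pd.Series(
        np.where(dates <= train_end, "train", np.where(dates <= val_end, "val", "test")),
        index=dates.index,
    )
```

With rate = 0.5, a sample's weight halves about every 365 × ln 2 / 0.5 ≈ 506 days, so a sample from one year before the last date still gets about 0.61 of the newest sample's weight.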
Production Usage
Quick Predict
```bash
python nse_v9_predict.py --ticker RELIANCE.NS --threshold 0.55
```
Python API
```python
import catboost as cb
from huggingface_hub import hf_hub_download

# Download the CatBoost model file from the Hub and load it
model_path = hf_hub_download(
    repo_id="mohan170802/nse-nifty50-swing-predictor",
    filename="unified_model_v9.cbm",
)
model = cb.CatBoostClassifier()
model.load_model(model_path)

# After engineering the same 105 features from the latest data, with columns
# in the order listed in unified_features_v9.txt:
# proba = model.predict_proba(features)[0, 1]   # P(BUY) for the first row
# signal = "BUY" if proba > 0.55 else "NOT_BUY"
```
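The Files list flags the feature order as critical, so it can help to reorder columns explicitly before calling predict_proba. This sketch assumes that unified_features_v9.txt stores one feature name per line and that the engineered features sit in a pandas DataFrame with named columns. Both helper names are hypothetical.

```python
import pandas as pd


def read_feature_names(path: str) -> list:
    """Assumes one feature name per line; blank lines are ignored."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def align_features(features: pd.DataFrame, feature_names: list) -> pd.DataFrame:
    """Return `features` with columns in the exact training order; fail loudly on gaps."""
    missing = [name for name in feature_names if name not in features.columns]
    if missing:
        raise ValueError(f"missing {len(missing)} features, e.g. {missing[:5]}")
    return features[feature_names]  # extra columns are dropped, order is enforced
```

For example, download unified_features_v9.txt with hf_hub_download as for the model file, then pass align_features(latest_rows, read_feature_names(path)) to model.predict_proba.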
Thresholds for Swing Trading
| Strategy | Threshold | Precision | Trades | Use Case |
|---|---|---|---|---|
| Conservative | 0.65 | High | Few | Quality over quantity |
| Moderate | 0.55 | Balanced | Medium | Recommended start |
| Aggressive | 0.50 | Lower | Many | Catch more opportunities |
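The Precision and Trades columns above are qualitative. One way to put numbers on them is to sweep the thresholds over validation-set probabilities and known labels, as in the sketch below. It uses the same strict "proba > threshold" rule as the Python API example, and threshold_report is a hypothetical helper.

```python
import numpy as np


def threshold_report(proba, y_true, thresholds=(0.50, 0.55, 0.65)):
    """For each threshold: number of BUY signals and the share of them that were real BUYs."""
    proba = np.asarray(proba, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    report = {}
    for t in thresholds:
        fired = proba > t
        n = int(fired.sum())
        precision = float(y_true[fired].mean()) if n else float("nan")
        report[t] = {"trades": n, "precision": precision}
    return report
```

Raising the threshold trades signal count for precision. Measure this on the validation split and keep the test split untouched.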
Files
```
unified_model_v9.cbm # CatBoost binary model
unified_model_v9.json # XGBoost backup model
unified_features_v9.txt # 105 feature names (load order critical)
label_encoders_v9.json # Ticker/sector label mappings
summary_unified_v9.json # Full metrics, top features, model comparison
nse_v9_predict.py # Production inference script
nse_v9_final.py # Training script (reproducible)
```
Performance Reality Check
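- Random baseline: 50% for a coin flip; with about 43% BUY labels, always predicting NOT_BUY would score about 57%
- Our model: 60.3% on completely unseen test data
- Edge: +10.3 percentage points over a coin flip, but only about +3 points over the majority-class baseline (assuming the 43% BUY rate also holds in the test period)
- AUC: 0.559 (weak ranking signal; 0.5 is random)
- F1: 0.337 on the BUY class; with about 43% BUY labels, an F1 this low suggests low recall (the model flags relatively few of the actual BUYs)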
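A stricter reference point than the 50% coin flip is the majority-class baseline, the accuracy of always predicting the more common label. The sketch computes that baseline and the F1 of a trivial "always BUY" model from the 43% BUY rate. It assumes that rate also applies to the test split, which the card does not state.

```python
def majority_baseline(buy_rate: float) -> float:
    """Accuracy of always predicting the more common class."""
    return max(buy_rate, 1.0 - buy_rate)


def all_buy_f1(buy_rate: float) -> float:
    """F1 of always predicting BUY: precision = buy_rate, recall = 1, so F1 = 2p / (1 + p)."""
    return 2.0 * buy_rate / (1.0 + buy_rate)


buy_rate = 0.43        # from the card; assumed to hold on the test split
test_accuracy = 0.603  # CatBoost test accuracy from the Results table

baseline = majority_baseline(buy_rate)
print(f"majority baseline: {baseline:.1%}")                          # 57.0%
print(f"edge over baseline: {(test_accuracy - baseline) * 100:.1f} pp")  # 3.3 pp
print(f"F1 of always predicting BUY: {all_buy_f1(buy_rate):.3f}")    # 0.601
```

A trivial always-BUY model would score an F1 near 0.60, above the model's 0.337. The low F1 therefore comes from the model calling BUY sparingly, not from the class balance alone.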
Interpretation: this is a weak but real signal. In trading:
- 60% accuracy on BUY/NOT_BUY calls, combined with proper position sizing and stop-losses, can be profitable
- Use as ONE signal among many (not standalone)
- The model flags cases where the market regime and the stock's technical setup align for an above-average probability of a +3% move
Improvements for v10+
- Fundamental features: P/E, ROE, EPS growth, debt/equity (from NSE/Indian exchanges)
- News/sentiment: Earnings announcements, macro events
- Options data: Implied volatility, put/call ratio
- More history: 10+ years to capture more market cycles
- Quarterly retraining: as recommended by Yang et al.
- Ensemble with deep learning: Transformer models (MASTER paper) for cross-sectional attention
⚠️ Disclaimer
Not financial advice. Past performance does not guarantee future results. Use with:
- Stop-losses (suggest -2%)
- Position sizing (risk ≤2% per trade)
- Portfolio-level risk management
- Paper trading before real capital
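The stop-loss and risk-per-trade rules work together to set the position size. Under fixed-fractional sizing, position value = capital × risk fraction ÷ stop distance. The sketch below is a minimal illustration. It ignores gaps through the stop, slippage and brokerage costs, all of which make real losses larger than planned.

```python
def position_size(capital: float, risk_pct: float, stop_pct: float) -> float:
    """Value to buy so that a move to the stop-loss loses risk_pct of capital."""
    if stop_pct <= 0:
        raise ValueError("stop_pct must be positive")
    return capital * risk_pct / stop_pct


# With a 2% stop, risking 1% of 10 lakh INR allows a 5 lakh position;
# risking the full 2% would put the entire capital into one trade.
print(position_size(1_000_000, 0.01, 0.02))
print(position_size(1_000_000, 0.02, 0.02))
```

The second case shows why a 2% stop paired with 2% risk per trade leaves no room for more than one open position. A lower risk fraction or a wider stop reduces the position size.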
Citation
Built using insights from:
- CatBoost: unbiased boosting with categorical features (Prokhorenkova et al., NeurIPS 2018)
- MASTER: Market-Guided Stock Transformer for Stock Price Forecasting (Li et al., AAAI 2024)
- Advances in Financial Machine Learning, triple-barrier method (López de Prado, Wiley 2018)
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
This repository contains a CatBoost model file (unified_model_v9.cbm), not a transformers checkpoint, so AutoModelForCausalLM and AutoTokenizer cannot load it. Load it with CatBoost instead:
```python
import catboost as cb
from huggingface_hub import hf_hub_download

model_id = "mohan170802/nse-nifty50-swing-predictor"
model_path = hf_hub_download(repo_id=model_id, filename="unified_model_v9.cbm")
model = cb.CatBoostClassifier()
model.load_model(model_path)
```