language:
- en
- tr
tags:
- finance
- stocks
- news-analysis
- stock-prediction
- information-retrieval
license: apache-2.0
datasets:
- financial-news
metrics:
- recall
pipeline_tag: zero-shot-classification
Stocky Stock Predictor
Predict stock symbols from news headlines using a two-stage deep learning pipeline.
Model Overview
This model predicts which stock symbols are mentioned in, or relevant to, a given news headline. It uses a two-stage retrieval pipeline:
- Stage 1 - Contrastive Retrieval: Fast dual-encoder retrieves top-50 candidates
- Stage 2 - Cross-Encoder Reranking: Precise scoring to get final top-10 predictions
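The two stages above can be sketched in plain Python. This is a minimal illustration of the retrieve-then-rerank pattern, not the code shipped in `predict`: the helper names (`retrieve`, `rerank`, `toy_score`), the two-dimensional vectors, and the scoring function are all hypothetical stand-ins for the real encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(title_vec, stock_vecs, candidates_k=50):
    """Stage 1: rank every symbol by embedding similarity, keep the best candidates_k."""
    ranked = sorted(
        stock_vecs,
        key=lambda sym: cosine(title_vec, stock_vecs[sym]),
        reverse=True,
    )
    return ranked[:candidates_k]


def rerank(title, candidates, score_pair, top_k=10):
    """Stage 2: score each (title, symbol) pair jointly, keep the best top_k."""
    scored = [(sym, score_pair(title, sym)) for sym in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for the dual-encoder outputs (assumed values).
stock_vecs = {"NVDA": [0.9, 0.1], "AMD": [0.8, 0.3], "XOM": [0.0, 1.0]}
title_vec = [1.0, 0.0]


def toy_score(title, symbol):
    """Placeholder for the cross-encoder relevance score."""
    return 1.0 if symbol == "NVDA" else 0.5


candidates = retrieve(title_vec, stock_vecs, candidates_k=2)
results = rerank("Nvidia announces new AI chip", candidates, toy_score, top_k=2)
```

The design point is that the cheap similarity search runs over the full symbol list, while the more expensive pairwise scorer only sees the short candidate list.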
Performance
Test Set Metrics (10,620 samples):
- Recall@5: 51.17%
- Recall@10: 54.40%
- Recall@20: 58.76%
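The card does not say exactly how Recall@k is computed. One common definition, assumed here, is the fraction of a headline's true symbols that appear in the top k predictions, averaged over all headlines. The function names and toy samples below are illustrative only.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant symbols found in the first k ranked predictions."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_at_k(samples, k):
    """Average recall@k over (ranked_predictions, relevant_symbols) pairs."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in samples) / len(samples)


# Two toy headlines: one with a single true symbol, one with two.
samples = [
    (["NVDA", "AMD", "INTC"], {"NVDA"}),
    (["MSFT", "AAPL", "GOOGL"], {"GOOGL", "META"}),
]

r1 = mean_recall_at_k(samples, 1)  # (1.0 + 0.0) / 2
r3 = mean_recall_at_k(samples, 3)  # (1.0 + 0.5) / 2
```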
Real-World Performance:
- Direct company mentions (e.g., "Nvidia announces..."): ~80% accuracy
- Generic sector news (e.g., "Tech stocks rally"): ~50% accuracy
Quick Start
from predict import StockyPredictor
# Initialize predictor
predictor = StockyPredictor()
# Your list of stock symbols (load all 6,973 stocks in production)
stocks = ["NVDA", "AAPL", "MSFT", "GOOGL", "AMZN", "TSLA", "META", ...]
# Predict
title = "Nvidia announces new AI chip breakthrough"
predictions = predictor.predict(title, stocks, top_k=10)
# Results: [('NVDA', 0.95), ('AMD', 0.78), ...]
for stock, score in predictions:
    print(f"{stock}: {score:.4f}")
Use Cases
- News Analysis: Automatically tag news articles with relevant stocks
- Trading Signals: Identify stocks mentioned in breaking news
- Portfolio Monitoring: Track news about your holdings
- Market Research: Analyze media coverage of stocks
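As a rough sketch of the portfolio-monitoring use case, the `(symbol, score)` pairs returned by `predict` could be filtered against a list of holdings. The helper name `holdings_alerts` and the `min_score` threshold are assumptions for illustration, and the scores are toy values.

```python
def holdings_alerts(predictions, holdings, min_score=0.5):
    """Keep predictions for held symbols whose score clears the threshold."""
    held = set(holdings)
    return [
        (symbol, score)
        for symbol, score in predictions
        if symbol in held and score >= min_score
    ]


# Toy output in the same (symbol, score) shape that predict() returns.
predictions = [("NVDA", 0.95), ("AMD", 0.78), ("INTC", 0.31)]
alerts = holdings_alerts(predictions, ["AMD", "INTC", "TSLA"], min_score=0.5)
```

A suitable threshold depends on how the reranker scores are calibrated, which the card does not document, so it would need tuning on real data.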
Model Architecture
Components:
Tokenizer (35K vocab)
- Custom WordPiece trained on financial text
- Top 500 stocks as special tokens
- 98.4% coverage of frequent stocks
MLM Pretrained Encoder (113M params)
- BERT-base architecture
- Pretrained from scratch on FinCorpus + news
- Perplexity: 5.08
Contrastive Model (226M params)
- Dual-encoder (title encoder + stock encoder)
- Trained with InfoNCE loss
- Retrieves top-50 candidates in milliseconds
Cross-Encoder Reranker (113M params)
- BERT + classification head
- Binary relevance scoring
- Validation loss: 0.1619
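The InfoNCE loss used for the contrastive model can be illustrated with in-batch negatives: each headline's paired stock is the positive, and the other stocks in the batch act as negatives. The sketch below is a plain-Python illustration, not the training code; the temperature of 0.07 and the unit-length toy embeddings are assumptions.

```python
import math


def info_nce(title_vecs, stock_vecs, temperature=0.07):
    """Mean InfoNCE loss over a batch where title i is paired with stock i."""
    losses = []
    for i, title in enumerate(title_vecs):
        # Similarity of this title to every stock in the batch, scaled by temperature.
        logits = [
            sum(a * b for a, b in zip(title, stock)) / temperature
            for stock in stock_vecs
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the correct (diagonal) pair.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


titles = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(titles, [[1.0, 0.0], [0.0, 1.0]])     # positives match
mismatched = info_nce(titles, [[0.0, 1.0], [1.0, 0.0]])  # positives swapped
```

The loss is near zero when each headline is closest to its own stock and grows when it is closer to another stock in the batch.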
Training Details:
- Dataset: 106,207 news articles, 6,973 unique stocks
- Training time: ~4-5 days on V100 GPU
- Stages: MLM (3 epochs) → Contrastive (5 epochs) → Reranking (3 epochs)
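For readers relating the MLM perplexity above to a loss value: perplexity is conventionally the exponential of the mean per-token negative log-likelihood. Assuming the reported 5.08 follows that convention (the card does not say), the implied mean masked-token loss is about 1.63 nats. The function name below is illustrative.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Mean loss implied by the reported perplexity, under the assumption above.
implied_loss = math.log(5.08)

# Round trip: a batch of tokens at that loss gives back the same perplexity.
round_trip = perplexity([implied_loss] * 4)
```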
Examples
Example 1: Tech Company
title = "Apple unveils new iPhone with advanced camera"
predictions = predictor.predict(title, all_stocks)
# Top result: AAPL (0.94)
Example 2: Cryptocurrency
title = "Bitcoin price surges past $50,000"
predictions = predictor.predict(title, all_stocks)
# Top result: BTC-USD (0.92)
Example 3: Multiple Companies
title = "Tech giants report strong earnings amid AI boom"
predictions = predictor.predict(title, all_stocks)
# Top results: NVDA (0.89), MSFT (0.85), GOOGL (0.78)
Limitations
- Works best when company names are mentioned directly
- Generic market news (e.g., "tech sector rallies") may have lower accuracy
- ETF symbols (SPY, QQQ) have limited training data
- Primarily trained on English news (some Turkish support)
Advanced Usage
Custom Stock List
# Load your own stock universe
import pandas as pd
stocks_df = pd.read_csv("my_stocks.csv")
stocks = stocks_df["symbol"].tolist()
predictions = predictor.predict(title, stocks, top_k=20)
Batch Prediction
news_headlines = [
    "Nvidia announces new GPU",
    "Tesla deliveries beat estimates",
    "Amazon expands AWS services",
]
# Collect one result record per headline
all_predictions = []
for headline in news_headlines:
    preds = predictor.predict(headline, all_stocks)
    all_predictions.append({
        "headline": headline,
        "predictions": preds,
    })
Custom Candidate Size
# Retrieve more candidates for better recall
predictions = predictor.predict(
    title,
    all_stocks,
    top_k=10,
    candidates_k=100,  # Default: 50
)
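Raising `candidates_k` helps only when the right symbol sits just past the current cutoff in the Stage 1 ranking, because the reranker never sees symbols that were not retrieved. A rough way to check this on labelled data is to measure how often the gold symbol survives the cutoff. The helper name and toy rankings below are hypothetical.

```python
def candidate_coverage(stage1_rankings, gold_symbols, candidates_k):
    """Fraction of headlines whose gold symbol survives the Stage 1 cutoff."""
    kept = sum(
        1
        for ranked, gold in zip(stage1_rankings, gold_symbols)
        if gold in ranked[:candidates_k]
    )
    return kept / len(gold_symbols)


# Toy Stage 1 rankings for two headlines, with one gold symbol each.
stage1_rankings = [
    ["NVDA", "AMD", "INTC", "TSM"],
    ["F", "GM", "RIVN", "TSLA"],
]
gold_symbols = ["NVDA", "TSLA"]

coverage_small = candidate_coverage(stage1_rankings, gold_symbols, candidates_k=2)
coverage_large = candidate_coverage(stage1_rankings, gold_symbols, candidates_k=4)
```

This coverage is an upper bound on the recall the full pipeline can reach. A larger candidate set also means more pairs for the reranker to score per headline.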
Model Card
| Component | Size | Description |
|---|---|---|
| Tokenizer | 35K vocab | Financial domain tokenizer |
| Title Encoder | 113M | Encodes news headlines |
| Stock Encoder | 113M | Encodes stock symbols |
| Reranker | 113M | Scores title-stock pairs |
| Total | ~340M params | Complete pipeline |
Citation
@misc{stocky2025,
  title={Stocky: Stock Prediction from News Headlines},
  author={Stocky AI Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/stocky-ai/stocky-stock-predictor}}
}
License
Apache 2.0
Contact
- Organization: stocky-ai
- Issues: Report bugs or request features on GitHub
Built with ❤️ using PyTorch and Transformers