+---
+datasets:
+- ExponentialScience/DLT-Sentiment-News
+language:
+- en
+base_model:
+- ExponentialScience/LedgerBERT
+---
+# LedgerBERT-Market-Sentiment
+## Model Description
+### Model Summary
+LedgerBERT-Market-Sentiment is a fine-tuned version of LedgerBERT (https://huggingface.co/ExponentialScience/LedgerBERT) specialized for sentiment analysis of cryptocurrency and DLT-related content. The model classifies text into three market direction sentiment categories: **bullish** (positive market outlook), **bearish** (negative market outlook), and **neutral** (balanced or unclear market direction).
+The model is aimed at cryptocurrency news headlines, social media posts, and other DLT-related content where the market outlook expressed in the text matters.
+- **Model type:** BERT-base encoder for sequence classification
+- **Language:** English
+- **License:** Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC 4.0)
+- **Base model:** LedgerBERT (ExponentialScience/LedgerBERT)
+- **Fine-tuning dataset:** DLT-Sentiment-News (23,301 examples)
+- **Task:** Multi-class sentiment classification (3 classes)
+### Model Architecture
+- **Architecture:** BERT-base for sequence classification
+- **Parameters:** 110 million
+- **Hidden size:** 768
+- **Number of layers:** 12
+- **Attention heads:** 12
+- **Vocabulary size:** 30,522 (SciBERT vocabulary)
+- **Max sequence length:** 512 tokens
+- **Output:** 3-class logits (bullish, bearish, neutral)
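The figures above can be cross-checked with a quick parameter count. This is a rough sketch that assumes the standard BERT-base layout: a feed-forward size of 3,072, two token-type embeddings, and a pooler layer. Those three values are not stated in this card; they are assumptions taken from the usual BERT-base configuration.

```python
# Back-of-the-envelope parameter count for a BERT-base classifier.
# Values marked ASSUMED are standard BERT-base defaults, not taken from this card.
VOCAB_SIZE = 30_522
MAX_POSITIONS = 512
HIDDEN = 768
LAYERS = 12
NUM_LABELS = 3
TOKEN_TYPES = 2        # ASSUMED
INTERMEDIATE = 3_072   # ASSUMED feed-forward size (4 x hidden)


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(size: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


embeddings = (VOCAB_SIZE + MAX_POSITIONS + TOKEN_TYPES) * HIDDEN + layer_norm(HIDDEN)
attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)  # Q, K, V, output projection
feed_forward = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm(HIDDEN)
encoder = LAYERS * (attention + feed_forward)
pooler = linear(HIDDEN, HIDDEN)          # ASSUMED to be present
classifier = linear(HIDDEN, NUM_LABELS)  # 3-class head

total = embeddings + encoder + pooler + classifier
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count lands just under the 110 million quoted above, and the 3-class head contributes only a few thousand of those parameters.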
+## Intended Uses
+### Primary Use Cases
+This model is designed for sentiment analysis tasks in the cryptocurrency and DLT domain:
+- **Market sentiment analysis**: Analyzing sentiment in cryptocurrency news articles, headlines, and market commentary
+- **Social media monitoring**: Understanding market direction sentiment in tweets, Reddit posts, and forum discussions
+- **News aggregation**: Automatically categorizing cryptocurrency news by market sentiment
+- **Research applications**: Studying sentiment trends and their relationship to market dynamics
+- **Content filtering**: Organizing DLT content based on market outlook
+### Example Applications
+```python
+# Illustrative inputs with the labels one would expect (not verified model outputs)
+expected_labels = {
+    # News headlines
+    "Bitcoin surges to new all-time high": "bullish",
+    "Ethereum faces regulatory scrutiny": "bearish",
+    "Stablecoin market remains stable": "neutral",
+    # Social media posts
+    "To the moon! 🚀": "bullish",
+    "Another crypto winter incoming": "bearish",
+    "Waiting for clear market direction": "neutral",
+}
+```
+### Out-of-Scope Uses
+- **Investment decisions**: This model should NOT be used as the sole basis for making investment or trading decisions
+- **Financial advice**: Not suitable for providing personalized financial or investment recommendations
+- **Real-time trading**: Should not be used for automated high-frequency trading systems
+- **Market manipulation**: Must not be used to coordinate or facilitate market manipulation
+- **General sentiment analysis**: Optimized for market direction sentiment; may not perform well on general emotional sentiment
+## Training Details
+### Training Data
+The model was fine-tuned on the **DLT-Sentiment-News dataset**, which contains:
+- **Size:** 23,301 examples
+- **Tokens:** 1.85 million tokens (average 79.51 tokens per example)
+- **Temporal coverage:** January 2021 to May 2025
+- **Source:** CryptoPanic platform cryptocurrency news headlines and descriptions
+- **Labels:** Crowdsourced votes from active cryptocurrency community members
+- **Classification method:** Percentile-based labeling (25th and 75th percentiles as boundaries)
+**Labels for the sentiment dimension used by this model:**
+- **Market Direction:** bullish, bearish, neutral
+Because the labels come from votes by active cryptocurrency users rather than general-purpose crowdworkers, they carry domain knowledge that generic crowdworker annotations may lack.
+**Note:** News articles are absent from the DLT-Corpus used to pre-train LedgerBERT, so fine-tuning on this dataset also tests how well the base model generalizes to a text type outside its pre-training data.
+For more details on the dataset used for fine-tuning, see: https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News
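As a rough illustration of the percentile-based labeling described above, the sketch below assigns labels from a per-item score using the 25th and 75th percentiles as cut-offs. The score definition (a hypothetical net count of bullish minus bearish votes), the function name `label_by_percentile`, the interpolation method, and the handling of values exactly on a boundary are all assumptions made for illustration; the dataset card documents the actual procedure.

```python
from statistics import quantiles


def label_by_percentile(scores: list[float]) -> list[str]:
    """Label each score relative to the 25th/75th percentiles of all scores."""
    # quantiles(..., n=4) returns the three quartile cut points: Q1, median, Q3.
    q25, _, q75 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for score in scores:
        if score > q75:
            labels.append("bullish")
        elif score < q25:
            labels.append("bearish")
        else:
            labels.append("neutral")
    return labels


# Hypothetical net vote scores (bullish votes minus bearish votes) for eight headlines.
net_votes = [-12, -5, -1, 0, 2, 3, 9, 20]
for score, label in zip(net_votes, label_by_percentile(net_votes)):
    print(f"{score:>4} -> {label}")
```

With cut-offs defined this way, roughly a quarter of the items fall into each of the bullish and bearish classes and about half end up neutral, whatever the absolute vote counts are.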
+### Training Procedure
+**Fine-tuning hyperparameters:**
+- **Epochs:** 3
+- **Learning rate:** 2×10⁻⁵
+- **Warmup steps:** 500
+- **Batch size:** 8 per device (training and evaluation)
+- **Train/test split:** 90% training, 10% testing
+- **Optimizer:** AdamW with fused operations
+- **Precision:** bfloat16
+- **Max sequence length:** 512 tokens (tokenizer default)
+- **Truncation:** Enabled
+- **Padding:** Enabled
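To put the warmup setting in context, the sketch below estimates the number of optimizer steps implied by these hyperparameters. It assumes a single device, no gradient accumulation, and a 90% split rounded down to whole examples. None of these is stated in the card, so the result is only a rough estimate.

```python
import math

DATASET_SIZE = 23_301
TRAIN_FRACTION = 0.9
BATCH_SIZE = 8        # per device
EPOCHS = 3
WARMUP_STEPS = 500
NUM_DEVICES = 1       # ASSUMED: single device, no gradient accumulation

train_examples = math.floor(DATASET_SIZE * TRAIN_FRACTION)
test_examples = DATASET_SIZE - train_examples

# One optimizer step per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(train_examples / (BATCH_SIZE * NUM_DEVICES))
total_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / total_steps

print(f"train/test examples: {train_examples} / {test_examples}")
print(f"steps per epoch:     {steps_per_epoch}")
print(f"total steps:         {total_steps}")
print(f"warmup share:        {warmup_fraction:.1%}")
```

Under these assumptions the 500 warmup steps cover a small share of the run, somewhere in the single-digit percent range. Training on several devices would shorten the run and raise that share.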
+## Limitations and Biases
+### Known Limitations
+- **Temporal lag**: Not suitable for real-time sentiment analysis; trained on historical data (2021-2025)
+- **Context dependency**: Headlines and descriptions lack full article context, which may affect sentiment interpretation
+- **Language coverage**: English only; does not support other languages
+- **Sarcasm and irony**: May struggle with nuanced language common in cryptocurrency discourse (e.g., "HFSP" - Have Fun Staying Poor)
+- **Evolving terminology**: Cryptocurrency memes and terminology evolve rapidly; may not capture newest slang
+- **Market volatility**: Sentiment can change rapidly after news publication; static predictions may become outdated quickly
+### Potential Biases
+The model may reflect biases present in the training data:
+- **Platform bias**: Data from CryptoPanic users only; may not represent broader market sentiment
+- **User bias**: Active crypto community members may have different perspectives than general investors
+- **Temporal bias**: Training data spans 2021-2025, reflecting specific market conditions (bull markets, bear markets, crypto winters)
+- **Source bias**: Certain news sources or cryptocurrencies may be over-represented in the training data
+- **Geographic bias**: English-language news sources are over-represented
+- **Market condition bias**: Dataset reflects specific market cycles that may not generalize to all conditions
+### Data Collection Biases
+- **Vote manipulation**: Despite quality filters, coordinated voting on the source platform cannot be completely ruled out
+- **Minimum vote threshold**: Filtering by median votes may exclude less popular but valid sentiment signals
+- **Percentile-based labeling**: Classification boundaries (25th/75th percentiles) are somewhat arbitrary
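The effect of a median-based vote threshold can be seen on a toy example. The sketch below is hypothetical: the items, the vote counts, and the rule "keep items with at least the median number of votes" are assumptions for illustration, not the dataset's documented filter.

```python
from statistics import median

# Hypothetical items: (headline, total number of community votes).
items = [
    ("Headline A", 3),
    ("Headline B", 18),
    ("Headline C", 7),
    ("Headline D", 42),
    ("Headline E", 5),
]

# ASSUMED rule: keep an item only if it has at least the median vote count.
threshold = median(votes for _, votes in items)
kept = [title for title, votes in items if votes >= threshold]
dropped = [title for title, votes in items if votes < threshold]

print(f"median votes: {threshold}")
print(f"kept:    {kept}")
print(f"dropped: {dropped}")
```

The dropped headlines may still carry a clear sentiment signal, which is the exclusion risk noted in the list above.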
+## How to Use
+### Basic Usage
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+# Load model and tokenizer
+model_name = "ExponentialScience/LedgerBERT-Market-Sentiment"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# Read the label order from the model config instead of hard-coding it.
+# If it only holds generic names such as LABEL_0, check the training setup for the mapping.
+id2label = model.config.id2label
+# Example texts
+texts = [
+    "Bitcoin reaches new all-time high amid institutional adoption",
+    "SEC announces crackdown on cryptocurrency exchanges",
+    "Ethereum network upgrade proceeding as planned"
+]
+# Tokenize all texts as one padded batch
+inputs = tokenizer(texts, return_tensors="pt", max_length=512, truncation=True, padding=True)
+# Run inference without tracking gradients
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Convert logits to probabilities and pick the most likely class per text
+probabilities = torch.softmax(logits, dim=-1)
+confidences, predicted_ids = probabilities.max(dim=-1)
+for text, class_id, confidence in zip(texts, predicted_ids.tolist(), confidences.tolist()):
+    print(f"Text: {text}")
+    print(f"Sentiment: {id2label[class_id]} (confidence: {confidence:.3f})\n")
+```
+### Batch Processing
+```python
+from transformers import pipeline
+# Create sentiment analysis pipeline
+classifier = pipeline(
+    "text-classification",
+    model="ExponentialScience/LedgerBERT-Market-Sentiment",
+    tokenizer="ExponentialScience/LedgerBERT-Market-Sentiment"
+)
+# Process multiple texts
+texts = [
+    "DeFi protocol launches new staking mechanism",
+    "Major cryptocurrency exchange faces liquidity crisis",
+    "Blockchain adoption continues in enterprise sector"
+]
+# batch_size sets how many texts go through the model per forward pass
+results = classifier(texts, batch_size=8, truncation=True, max_length=512)
+for text, result in zip(texts, results):
+    print(f"Text: {text}")
+    print(f"Sentiment: {result['label']} (score: {result['score']:.3f})\n")
+```
+### Integration with News Feeds
+```python
+import feedparser
+from transformers import pipeline
+# Initialize classifier
+classifier = pipeline(
+    "text-classification",
+    model="ExponentialScience/LedgerBERT-Market-Sentiment"
+)
+# Example: analyze a cryptocurrency news feed (placeholder URL; replace with a real RSS feed)
+feed_url = "https://example-crypto-news.com/rss"
+feed = feedparser.parse(feed_url)
+for entry in feed.entries[:5]:  # Process first 5 entries
+    title = entry.title
+    result = classifier(title, truncation=True, max_length=512)[0]
+    print(f"Headline: {title}")
+    print(f"Market Sentiment: {result['label']} ({result['score']:.2%})")
+    print(f"Link: {entry.link}\n")
+```
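Once headlines are classified, the per-item results can be rolled up into a simple summary. The sketch below assumes results shaped like the `text-classification` pipeline output (dicts with `label` and `score`). The sample values and the `summarize` helper are hypothetical, and the lowercase label names are an assumption about the model's label mapping.

```python
from collections import defaultdict


def summarize(results: list[dict]) -> dict[str, dict]:
    """Count predictions per label and average their scores."""
    scores_by_label = defaultdict(list)
    for result in results:
        scores_by_label[result["label"]].append(result["score"])
    return {
        label: {"count": len(scores), "mean_score": sum(scores) / len(scores)}
        for label, scores in scores_by_label.items()
    }


# Hypothetical pipeline output for four headlines (not real model predictions).
sample_results = [
    {"label": "bullish", "score": 0.91},
    {"label": "bearish", "score": 0.78},
    {"label": "bullish", "score": 0.65},
    {"label": "neutral", "score": 0.55},
]

for label, stats in summarize(sample_results).items():
    print(f"{label}: {stats['count']} headlines, mean score {stats['mean_score']:.2f}")
```

A summary like this describes the tone of a feed at one point in time; as noted under the limitations, sentiment can shift quickly after publication.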
+## Citation
+If you use LedgerBERT-Market-Sentiment in your research, please cite:
+```bibtex
+@article{hernandez2025dlt-corpus,
+  title={DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain},
+  author={Hernandez Cruz, Walter and Devine, Peter and Vadgama, Nikhil and Tasca, Paolo and Xu, Jiahua},
+  year={2025}
+}
+```
+## Related Resources
+- **Base Model (LedgerBERT)**: https://huggingface.co/ExponentialScience/LedgerBERT
+- **Training Dataset**: https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News
+- **DLT-Corpus Collection**: https://huggingface.co/collections/ExponentialScience/dlt-corpus-68e44e40d4e7a3bd7a224402
+### Additional Fine-tuned Models
+LedgerBERT can also be fine-tuned for other sentiment dimensions available in the DLT-Sentiment-News dataset (https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News):
+- **Content Characteristics** (liked, disliked, neutral)
+- **Engagement Quality** (important, lol, neutral)
+## Model Card Contact
+For questions or feedback about LedgerBERT-Market-Sentiment, please open an issue on the GitHub repository: https://github.com/dlt-science/DLT-Corpus
+---
+**⚠️ Important Disclaimer:** This model is provided for research and educational purposes only. It should not be used as financial advice or as the sole basis for investment decisions. Cryptocurrency markets are highly volatile and unpredictable. Always conduct your own research and consult with qualified financial advisors before making investment decisions.