FuseChain: Ethereum Fraud Detection via Multimodal Signal Fusion

Model Summary

FuseChain is a multimodal supervised classification model for detecting fraudulent Ethereum Externally Owned Accounts (EOAs). It integrates on-chain transaction features with off-chain contextual signals from market data, Reddit, and Twitter to classify Ethereum addresses as scam or normal.

The model is an XGBoost classifier trained on a novel address-level dataset of 35,272 Ethereum EOAs. On a stratified held-out test set it achieves an F1-score of 82.5% on the scam class and an AUC of 96.1%, a 14.7-point F1 improvement over the on-chain-only baseline (F1 67.8%).


Model Details

| Property | Details |
|---|---|
| Model Type | XGBoost Classifier |
| Task | Binary Classification (Scam / Normal) |
| Input | 31 address-level multimodal features |
| Output | Fraud probability score (0 to 1) |
| Classification Threshold | 0.5 |
| Explainability | TreeSHAP (per-prediction feature attribution) |
| Training Framework | XGBoost 2.x, scikit-learn |
| Language | Python 3.10+ |
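The model outputs a probability that is compared against the 0.5 threshold, and TreeSHAP explains each individual prediction. The sketch below shows how those two pieces connect, assuming the default binary:logistic objective of XGBoost's classifier and SHAP values computed in raw log-odds space: the base value plus the per-feature attributions adds up to the margin, the logistic function turns that margin into a probability, and the threshold turns the probability into a label. The helper names, the attribution numbers, and the rule that a probability of exactly 0.5 counts as scam are illustrative assumptions, not details taken from this card.

```python
import math

THRESHOLD = 0.5  # classification threshold stated in the model card


def margin_from_shap(base_value: float, shap_values: dict[str, float]) -> float:
    """TreeSHAP is additive: base value + sum of attributions = raw log-odds margin."""
    return base_value + sum(shap_values.values())


def sigmoid(x: float) -> float:
    """Logistic function mapping a log-odds margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(probability: float, threshold: float = THRESHOLD) -> str:
    """Label an address; a probability exactly at the threshold counts as 'scam' (assumption)."""
    return "scam" if probability >= threshold else "normal"


# Hypothetical attributions for a single address, in log-odds units (not real model output).
example_shap = {
    "eth_net_flow_max": 1.2,
    "twitter_avg_retweets_mean": 0.4,
    "market_intraday_volatility_mean": -0.1,
}
margin = margin_from_shap(base_value=-1.0, shap_values=example_shap)
prob = sigmoid(margin)
print(f"margin={margin:.2f} probability={prob:.3f} label={classify(prob)}")
```

Because the logistic function is monotonic, a 0.5 probability threshold is equivalent to flagging an address whenever its log-odds margin is non-negative.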

Performance

Test Set Results (Stratified 80/20 Split)

| Metric | Normal | Scam | Overall |
|---|---|---|---|
| Precision | 0.96 | 0.89 | 0.95 |
| Recall | 0.98 | 0.77 | 0.95 |
| F1-Score | 0.97 | 0.83 | 0.95 |
| AUC-ROC | - | - | 0.961 |
| Accuracy | - | - | 95% |
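Each per-class F1-score is the harmonic mean of that class's precision and recall, so the F1 row can be checked against the two rows above it. A minimal sketch, using only the values in the test-set table:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class precision, recall and F1 copied from the test-set table.
reported = {
    "Normal": {"precision": 0.96, "recall": 0.98, "f1": 0.97},
    "Scam": {"precision": 0.89, "recall": 0.77, "f1": 0.83},
}

for cls, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    print(f"{cls}: recomputed F1 = {recomputed:.4f}, reported = {m['f1']:.2f}")
```

The recomputed scam-class value (about 0.826) agrees with the 0.83 in the table and, given that its inputs are rounded to two decimals, with the 82.5% headline F1.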

Ablation Study Results

| Configuration | Features | F1 | AUC |
|---|---|---|---|
| On-Chain Only | 14 | 0.678 | 0.919 |
| On-Chain + Market | 19 | 0.721 | 0.936 |
| On-Chain + Market + Reddit | 22 | 0.802 | 0.955 |
| On-Chain + Market + Twitter | 28 | 0.825 | 0.962 |
| On-Chain + Market + Reddit + Twitter | 31 | 0.825 | 0.961 |
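The ablation rows can be cross-checked arithmetically: each configuration's feature count should equal the sum of its modality sizes from the feature-set table, and the F1 gain over the on-chain-only row gives the 14.7-point figure quoted in the summary. A small sketch; the helper names are illustrative, and the data is copied from the two tables.

```python
# (feature count, F1, AUC) per configuration, copied from the ablation table.
ABLATION = {
    "On-Chain Only": (14, 0.678, 0.919),
    "On-Chain + Market": (19, 0.721, 0.936),
    "On-Chain + Market + Reddit": (22, 0.802, 0.955),
    "On-Chain + Market + Twitter": (28, 0.825, 0.962),
    "On-Chain + Market + Reddit + Twitter": (31, 0.825, 0.961),
}

# Number of features per modality, from the feature-set table.
MODALITY_SIZES = {"On-Chain": 14, "Market": 5, "Reddit": 3, "Twitter": 9}


def expected_feature_count(config: str) -> int:
    """Sum the sizes of the modalities named in a configuration label."""
    modalities = [part.replace(" Only", "").strip() for part in config.split("+")]
    return sum(MODALITY_SIZES[m] for m in modalities)


def f1_gain_points(config: str, baseline: str = "On-Chain Only") -> float:
    """F1 improvement over the baseline configuration, in percentage points."""
    return round((ABLATION[config][1] - ABLATION[baseline][1]) * 100, 1)


for name, (n_features, f1, auc) in ABLATION.items():
    assert n_features == expected_feature_count(name)  # table is internally consistent
    print(f"{name:40s} F1 +{f1_gain_points(name):4.1f} pts  AUC {auc:.3f}")
```

Starting from On-Chain + Market, adding Twitter raises F1 by 10.4 points versus 8.1 for Reddit, and adding Reddit on top of Twitter leaves F1 unchanged at 0.825 with AUC 0.001 lower.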

Feature Set

The model was trained on 31 features across four modalities:

| Modality | Features | Examples |
|---|---|---|
| On-Chain | 14 | eth_net_flow_max, eth_recv_mean, burst_max_tx_5m_mean, active_days |
| Twitter | 9 | twitter_avg_retweets_mean, twitter_avg_positive_mean, twitter_fraud_mention_ratio_mean |
| Market | 5 | market_intraday_volatility_mean, market_daily_return_mean |
| Reddit | 3 | reddit_total_fraud_mentions_mean, reddit_avg_sentiment_mean |
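The authoritative feature-to-modality mapping is in the repository's metadata file, but the example names suggest a prefix convention: off-chain features start with twitter_, market_ or reddit_, while on-chain features (eth_, burst_, active_days) share no single prefix. The sketch below groups feature names under that assumed convention, treating anything without an off-chain prefix as on-chain; modality_of and group_by_modality are hypothetical helpers, not part of the released code.

```python
# Hypothetical prefix convention inferred from the example feature names;
# address_features_metadata.json remains the authoritative source.
OFF_CHAIN_PREFIXES = {
    "twitter_": "Twitter",
    "market_": "Market",
    "reddit_": "Reddit",
}


def modality_of(feature: str) -> str:
    """Return the modality for a feature name; unprefixed names default to On-Chain."""
    for prefix, modality in OFF_CHAIN_PREFIXES.items():
        if feature.startswith(prefix):
            return modality
    return "On-Chain"


def group_by_modality(features: list[str]) -> dict[str, list[str]]:
    """Bucket feature names by their (assumed) modality, preserving input order."""
    groups: dict[str, list[str]] = {}
    for name in features:
        groups.setdefault(modality_of(name), []).append(name)
    return groups


examples = [
    "eth_net_flow_max", "active_days", "burst_max_tx_5m_mean",
    "twitter_fraud_mention_ratio_mean", "market_daily_return_mean",
    "reddit_avg_sentiment_mean",
]
for modality, names in group_by_modality(examples).items():
    print(modality, names)
```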

For the full feature schema, refer to address_features_metadata.json in this repository.


Global Modality Contribution (SHAP)

| Modality | Contribution |
|---|---|
| On-Chain | 58.6% |
| Twitter | 25.8% |
| Reddit | 10.8% |
| Market | 4.8% |
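The card does not say exactly how per-feature SHAP values were rolled up into modality shares. One common approach, assumed here, is to average the absolute SHAP value of each feature over the explained addresses, sum those averages within each modality, and normalise to 100%. Taking absolute values first stops positive and negative attributions from cancelling. In practice the per-address values would come from a TreeSHAP explainer (for example `shap.TreeExplainer(model).shap_values(X)` in the shap package); the toy rows and the prefix-based modality_of rule below are illustrative assumptions.

```python
from collections import defaultdict


def modality_of(feature: str) -> str:
    """Hypothetical prefix rule: off-chain features carry a source prefix."""
    for prefix, modality in (
        ("twitter_", "Twitter"),
        ("market_", "Market"),
        ("reddit_", "Reddit"),
    ):
        if feature.startswith(prefix):
            return modality
    return "On-Chain"


def mean_abs_shap(rows: list[dict[str, float]]) -> dict[str, float]:
    """Mean absolute SHAP value per feature across the explained addresses."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        for feature, value in row.items():
            totals[feature] += abs(value)
    return {feature: total / len(rows) for feature, total in totals.items()}


def modality_shares(rows: list[dict[str, float]]) -> dict[str, float]:
    """Percentage of total mean |SHAP| attributable to each modality."""
    by_modality: dict[str, float] = defaultdict(float)
    for feature, importance in mean_abs_shap(rows).items():
        by_modality[modality_of(feature)] += importance
    total = sum(by_modality.values())
    return {modality: 100 * value / total for modality, value in by_modality.items()}


# Toy per-address SHAP values (log-odds units); not real FuseChain output.
example_rows = [
    {"eth_net_flow_max": 1.0, "active_days": -0.5, "twitter_avg_retweets_mean": 0.25,
     "reddit_total_fraud_mentions_mean": -0.125, "market_intraday_volatility_mean": 0.125},
    {"eth_net_flow_max": -1.0, "active_days": 0.5, "twitter_avg_retweets_mean": -0.25,
     "reddit_total_fraud_mentions_mean": 0.125, "market_intraday_volatility_mean": -0.125},
]
for modality, share in sorted(modality_shares(example_rows).items(), key=lambda kv: -kv[1]):
    print(f"{modality:8s} {share:5.1f}%")
```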

Most Discriminative Features per Modality

| Modality | Top Feature |
|---|---|
| On-Chain | eth_net_flow_max |
| Twitter | twitter_avg_retweets_mean |
| Reddit | reddit_total_fraud_mentions_mean |
| Market | market_intraday_volatility_mean |
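Assuming "most discriminative" means the feature with the highest global importance (mean absolute SHAP value) within its modality, picking the top feature per modality is a grouped argmax. The sketch below takes the modality explicitly on each record rather than inferring it. The FeatureImportance record and all the scores are hypothetical, chosen only to exercise the function; they are not the model's actual importances.

```python
from typing import NamedTuple


class FeatureImportance(NamedTuple):
    feature: str
    modality: str
    mean_abs_shap: float  # global importance: mean |SHAP| over explained addresses


def top_feature_per_modality(items: list[FeatureImportance]) -> dict[str, str]:
    """Return the feature with the largest mean |SHAP| within each modality."""
    best: dict[str, FeatureImportance] = {}
    for item in items:
        current = best.get(item.modality)
        if current is None or item.mean_abs_shap > current.mean_abs_shap:
            best[item.modality] = item
    return {modality: item.feature for modality, item in best.items()}


# Hypothetical importance scores, for illustration only.
scores = [
    FeatureImportance("eth_net_flow_max", "On-Chain", 0.9),
    FeatureImportance("active_days", "On-Chain", 0.4),
    FeatureImportance("twitter_avg_retweets_mean", "Twitter", 0.5),
    FeatureImportance("twitter_avg_positive_mean", "Twitter", 0.2),
    FeatureImportance("reddit_total_fraud_mentions_mean", "Reddit", 0.3),
    FeatureImportance("market_intraday_volatility_mean", "Market", 0.1),
    FeatureImportance("market_daily_return_mean", "Market", 0.05),
]
print(top_feature_per_modality(scores))
```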

Model Hyperparameters

| Parameter | Value |
|---|---|
| n_estimators | 200 |
| max_depth | 5 |
| learning_rate | 0.05 |
| min_child_weight | 3 |
| subsample | 0.8 |
| colsample_bytree | 0.8 |
| Classification Threshold | 0.5 |
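The hyperparameters map directly onto keyword arguments of XGBoost's scikit-learn wrapper, `xgboost.XGBClassifier`, while the classification threshold is not an XGBoost parameter and is applied to the predicted probabilities afterwards. A minimal configuration sketch: the `random_state` argument and its default of 0 are assumptions (the card gives no seed), as is the encoding of scam as class index 1. xgboost is imported only when `build_classifier` is called, so the module itself loads without it.

```python
# Hyperparameters from the table, as keyword arguments for xgboost.XGBClassifier.
XGB_PARAMS = {
    "n_estimators": 200,
    "max_depth": 5,
    "learning_rate": 0.05,
    "min_child_weight": 3,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}
# Applied to predict_proba output after training, not passed to XGBoost.
DECISION_THRESHOLD = 0.5


def build_classifier(random_state: int = 0):
    """Construct the classifier; needs the xgboost package when called.

    random_state is an assumed seed: subsample and colsample_bytree make
    training stochastic, and the card does not report the seed used.
    """
    from xgboost import XGBClassifier

    return XGBClassifier(**XGB_PARAMS, random_state=random_state)


def predict_labels(model, X, threshold: float = DECISION_THRESHOLD):
    """Return 1 (scam, assumed class index 1) where P(scam) >= threshold, else 0."""
    scam_prob = model.predict_proba(X)[:, 1]
    return (scam_prob >= threshold).astype(int)
```

Keeping the threshold separate from the model makes it easy to trade the scam-class recall (0.77 at 0.5) against precision without retraining.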

Dataset

The FuseChain dataset used to train this model is publicly available on Hugging Face:

FuseChain Multimodal Ethereum Fraud Dataset (https://huggingface.co/datasets/Nileshka/fusechain-data)
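The reported results use a stratified 80/20 train/test split, but the card does not state the seed or tooling, so the exact split cannot be reproduced from this description. With scikit-learn the usual approach is `train_test_split(X, y, test_size=0.2, stratify=y)`; the dependency-free sketch below shows what stratification does: it shuffles and splits each class separately, so the scam/normal ratio is the same in both parts. The seed of 42 and the 0 = normal, 1 = scam label encoding are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(
    labels: list[int], test_fraction: float = 0.2, seed: int = 42
) -> tuple[list[int], list[int]]:
    """Split row indices so each class keeps roughly the same train/test ratio."""
    by_class: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed for repeatability; not the authors' seed
    train: list[int] = []
    test: list[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: 0 = normal, 1 = scam (assumed encoding).
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
scam_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), scam_share)
```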

Citation

If you use this model or the FuseChain framework in your research, please cite:

@misc{fusechain2026,
  title={FuseChain: Ethereum Fraud Detection via Multimodal Signal Fusion},
  author={Fernando, Nileshka},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/datasets/Nileshka/fusechain-data}}
}

Related Resources
