FuseChain: Ethereum Fraud Detection via Multimodal Signal Fusion

Model Summary

FuseChain is a multimodal supervised classification model for detecting fraudulent Ethereum Externally Owned Accounts (EOAs). It integrates on-chain transaction features with off-chain contextual signals from market data, Reddit, and Twitter to classify Ethereum addresses as scam or normal.

The model is an XGBoost classifier trained on a novel address-level dataset of 35,272 Ethereum EOAs. On a stratified held-out test set it achieves an F1-score of 82.5% and an AUC of 96.1%, a 14.7-point F1 improvement over the on-chain-only baseline.

Model Details

Property	Details
Model Type	XGBoost Classifier
Task	Binary Classification (Scam / Normal)
Input	31 address-level multimodal features
Output	Fraud probability score (0 to 1)
Classification Threshold	0.5
Explainability	TreeSHAP (per-prediction feature attribution)
Training Framework	XGBoost 2.x, Scikit-learn
Language	Python 3.10+
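
As a minimal sketch of how the Output and Classification Threshold rows fit together, the snippet below maps a fraud probability to a class label. The function name `label_from_score` and the rule that a score exactly at the threshold counts as Scam are assumptions made for illustration; they are not taken from the model's code.

```python
THRESHOLD = 0.5  # classification threshold listed in the table above


def label_from_score(score: float, threshold: float = THRESHOLD) -> str:
    """Map a fraud probability in [0, 1] to a class label.

    Assumption: a score equal to the threshold is labelled "Scam".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "Scam" if score >= threshold else "Normal"


# Hypothetical scores, for illustration only.
labels = [label_from_score(s) for s in (0.03, 0.5, 0.91)]
print(labels)
```

Raising the threshold above 0.5 would trade Scam recall for Scam precision, which may suit settings where false alarms are costly.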

Performance

Test Set Results (Stratified 80/20 Split)

Metric	Normal	Scam	Overall
Precision	0.96	0.89	0.95
Recall	0.98	0.77	0.95
F1-Score	0.97	0.83	0.95
AUC-ROC	-	-	0.961
Accuracy	-	-	0.95
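
The per-class F1 values in the table follow from precision and recall as their harmonic mean. The short check below recomputes them from the rounded table entries; the helper name `f1_score` is ours, and because the inputs are already rounded to two decimals, the result can differ slightly in the third decimal from the reported 82.5%.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall copied from the table above.
scam_f1 = f1_score(0.89, 0.77)
normal_f1 = f1_score(0.96, 0.98)

print(f"Scam F1:   {scam_f1:.2f}")
print(f"Normal F1: {normal_f1:.2f}")
```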

Ablation Study Results

Configuration	Features	F1	AUC
On-Chain Only	14	0.678	0.919
On-Chain + Market	19	0.721	0.936
On-Chain + Market + Reddit	22	0.802	0.955
On-Chain + Market + Twitter	28	0.825	0.962
On-Chain + Market + Reddit + Twitter	31	0.825	0.961
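
To make the ablation easier to read, the sketch below expresses each configuration's F1 as a gain in percentage points over the on-chain-only baseline. The numbers are copied from the table above; the variable names are illustrative.

```python
# (configuration, feature count, F1) copied from the ablation table above.
ABLATION = [
    ("On-Chain Only", 14, 0.678),
    ("On-Chain + Market", 19, 0.721),
    ("On-Chain + Market + Reddit", 22, 0.802),
    ("On-Chain + Market + Twitter", 28, 0.825),
    ("On-Chain + Market + Reddit + Twitter", 31, 0.825),
]

baseline_f1 = ABLATION[0][2]

# F1 gain over the on-chain-only baseline, in percentage points.
gains = {name: round((f1 - baseline_f1) * 100, 1) for name, _, f1 in ABLATION}

for name, gain in gains.items():
    print(f"{name:<40} +{gain:.1f} points")
```

Read this way, the Twitter features alone recover the full 14.7-point gain on F1, while adding Reddit on top of them does not move F1 further in this table.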

Feature Set

The model was trained on 31 features across four modalities:

Modality	Features	Examples
On-Chain	14	`eth_net_flow_max`, `eth_recv_mean`, `burst_max_tx_5m_mean`, `active_days`
Twitter	9	`twitter_avg_retweets_mean`, `twitter_avg_positive_mean`, `twitter_fraud_mention_ratio_mean`
Market	5	`market_intraday_volatility_mean`, `market_daily_return_mean`
Reddit	3	`reddit_total_fraud_mentions_mean`, `reddit_avg_sentiment_mean`

For the full feature schema, refer to `address_features_metadata.json` in this repository.

Global Modality Contribution (SHAP)

Modality	Contribution
On-Chain	58.6%
Twitter	25.8%
Reddit	10.8%
Market	4.8%
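
The card does not state how these percentages were derived. One common approach, sketched here under that assumption, is to sum the mean absolute SHAP value of every feature within a modality and normalise by the total. The prefix-based `modality_of` mapping and the input values are hypothetical; real attributions would come from TreeSHAP on the trained model.

```python
from collections import defaultdict


def modality_of(feature: str) -> str:
    """Assumption: off-chain features carry a modality prefix; the rest are on-chain."""
    prefixes = (("twitter_", "Twitter"), ("reddit_", "Reddit"), ("market_", "Market"))
    for prefix, modality in prefixes:
        if feature.startswith(prefix):
            return modality
    return "On-Chain"


def modality_shares(mean_abs_shap: dict[str, float]) -> dict[str, float]:
    """Sum mean |SHAP| per modality and normalise to percentages."""
    totals: dict[str, float] = defaultdict(float)
    for feature, value in mean_abs_shap.items():
        totals[modality_of(feature)] += value
    grand_total = sum(totals.values())
    return {modality: 100 * value / grand_total for modality, value in totals.items()}


# Made-up mean |SHAP| values for illustration; not the model's real attributions.
example = {
    "eth_net_flow_max": 0.30,
    "active_days": 0.20,
    "twitter_avg_retweets_mean": 0.25,
    "reddit_total_fraud_mentions_mean": 0.15,
    "market_intraday_volatility_mean": 0.10,
}
shares = modality_shares(example)
print(shares)
```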

Most Discriminative Features per Modality

Modality	Top Feature
On-Chain	`eth_net_flow_max`
Twitter	`twitter_avg_retweets_mean`
Reddit	`reddit_total_fraud_mentions_mean`
Market	`market_intraday_volatility_mean`

Model Hyperparameters

Parameter	Value
n_estimators	200
max_depth	5
learning_rate	0.05
min_child_weight	3
subsample	0.8
colsample_bytree	0.8
Classification Threshold	0.5
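
For readers who want to reproduce the setup, the table can be written as a parameter dictionary. This is a sketch only: it assumes the scikit-learn wrapper `XGBClassifier` was used, and `random_state=42` is an arbitrary seed chosen for illustration, not a documented setting. The classification threshold is applied to the predicted probability after training and is therefore not passed to the classifier.

```python
# Values copied from the hyperparameter table above.
XGB_PARAMS = {
    "n_estimators": 200,
    "max_depth": 5,
    "learning_rate": 0.05,
    "min_child_weight": 3,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}


def build_classifier(random_state: int = 42):
    """Return an untrained XGBClassifier with the listed settings.

    Assumptions: the scikit-learn wrapper is used, and random_state=42 is
    an illustrative seed; neither is stated in the table.
    """
    # Imported lazily so the parameter dict can be used without xgboost installed.
    from xgboost import XGBClassifier

    return XGBClassifier(**XGB_PARAMS, random_state=random_state)
```

Calling `build_classifier().fit(X_train, y_train)` on the 31-feature matrix would then train a model with these settings, where `X_train` and `y_train` are placeholders for the training split.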

Dataset

The FuseChain dataset used to train this model is publicly available on Hugging Face:

FuseChain Multimodal Ethereum Fraud Dataset (https://huggingface.co/datasets/Nileshka/fusechain-data)

Citation

If you use this model or the FuseChain framework in your research, please cite:

@misc{fusechain2026,
  title={FuseChain: Ethereum Fraud Detection via Multimodal Signal Fusion},
  author={Fernando, Nileshka},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/datasets/Nileshka/fusechain-data}}
}

Related Resources

Dataset: FuseChain Multimodal Ethereum Fraud Dataset
Code Repository: FuseChain GitHub
