---
license: mit
task_categories:
- tabular-classification
language:
- en
tags:
- ethereum
- fraud-detection
- blockchain
- multimodal
- xgboost
- shap
- explainable-ai
pretty_name: FuseChain Ethereum Fraud Detection Model
---

# FuseChain: Ethereum Fraud Detection via Multimodal Signal Fusion

## Model Summary

FuseChain is a multimodal supervised classification model for detecting fraudulent Ethereum Externally Owned Accounts (EOAs). It integrates on-chain transaction features with off-chain contextual signals from market data, Reddit, and Twitter to classify Ethereum addresses as scam or normal.

The model is an **XGBoost classifier** trained on a novel address-level dataset of **35,272 Ethereum EOAs**, achieving an **F1-score of 82.5%** and an **AUC of 96.1%** on a stratified held-out test set, a **14.7-point F1 improvement** over an on-chain-only baseline.

---

## Model Details

| Property | Details |
|---|---|
| Model Type | XGBoost Classifier |
| Task | Binary Classification (Scam / Normal) |
| Input | 31 address-level multimodal features |
| Output | Fraud probability score (0 to 1) |
| Classification Threshold | 0.5 |
| Explainability | TreeSHAP (per-prediction feature attribution) |
| Training Framework | XGBoost 2.x, Scikit-learn |
| Language | Python 3.10+ |

---

## Performance

### Test Set Results (Stratified 80/20 Split)

| Metric | Normal | Scam | Overall |
|---|---|---|---|
| Precision | 0.96 | 0.89 | 0.95 |
| Recall | 0.98 | 0.77 | 0.95 |
| F1-Score | 0.97 | 0.83 | 0.95 |
| AUC-ROC | - | - | 0.961 |
| Accuracy | - | - | 95% |

### Ablation Study Results

| Configuration | Features | F1 | AUC |
|---|---|---|---|
| On-Chain Only | 14 | 0.678 | 0.919 |
| On-Chain + Market | 19 | 0.721 | 0.936 |
| On-Chain + Market + Reddit | 22 | 0.802 | 0.955 |
| On-Chain + Market + Twitter | 28 | 0.825 | 0.962 |
| On-Chain + Market + Reddit + Twitter | 31 | 0.825 | 0.961 |

---

## Feature Set

The model was trained on 31 features across four modalities:

| Modality | Features | Examples |
|---|---|---|
| On-Chain | 14 | `eth_net_flow_max`, `eth_recv_mean`, `burst_max_tx_5m_mean`, `active_days` |
| Twitter | 9 | `twitter_avg_retweets_mean`, `twitter_avg_positive_mean`, `twitter_fraud_mention_ratio_mean` |
| Market | 5 | `market_intraday_volatility_mean`, `market_daily_return_mean` |
| Reddit | 3 | `reddit_total_fraud_mentions_mean`, `reddit_avg_sentiment_mean` |

For the full feature schema, refer to `address_features_metadata.json` in this repository.

---

### Global Modality Contribution (SHAP)

| Modality | Contribution |
|---|---|
| On-Chain | 58.6% |
| Twitter | 25.8% |
| Reddit | 10.8% |
| Market | 4.8% |

### Most Discriminative Features per Modality

| Modality | Top Feature |
|---|---|
| On-Chain | `eth_net_flow_max` |
| Twitter | `twitter_avg_retweets_mean` |
| Reddit | `reddit_total_fraud_mentions_mean` |
| Market | `market_intraday_volatility_mean` |

---

## Model Hyperparameters

| Parameter | Value |
|---|---|
| n_estimators | 200 |
| max_depth | 5 |
| learning_rate | 0.05 |
| min_child_weight | 3 |
| subsample | 0.8 |
| colsample_bytree | 0.8 |
| Classification Threshold | 0.5 |

---

## Dataset

The FuseChain dataset used to train this model is publicly available on Hugging Face: [FuseChain Multimodal Ethereum Fraud Dataset](https://huggingface.co/datasets/Nileshka/fusechain-data)

## Citation

If you use this model or the FuseChain framework in your research, please cite:

```bibtex
@misc{fusechain2026,
  title={FuseChain: Ethereum Fraud Detection via Multimodal Signal Fusion},
  author={Fernando, Nileshka},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/datasets/Nileshka/fusechain-data}}
}
```

---

## Related Resources

- **Dataset:** [FuseChain Multimodal Ethereum Fraud Dataset](https://huggingface.co/datasets/Nileshka/fusechain-data)
- **Code Repository:** [FuseChain GitHub](https://github.com/NileshFdo/FuseChain-FYP)
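
## Checking the Reported Numbers

The scam-class F1 in the test-set table and the headline gain over the on-chain-only baseline can be re-derived from the figures reported in this card. The sketch below is illustrative only and is not taken from the FuseChain code base: the function names `f1_score` and `label_from_probability` are hypothetical, and treating a score of exactly 0.5 as scam is an assumption, since the card states only the threshold value.

```python
# Values copied from the tables in this card (rounded as reported).
SCAM_PRECISION = 0.89
SCAM_RECALL = 0.77
F1_ON_CHAIN_ONLY = 0.678
F1_ALL_MODALITIES = 0.825
THRESHOLD = 0.5


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def label_from_probability(p_fraud: float, threshold: float = THRESHOLD) -> str:
    """Map a fraud probability in [0, 1] to a class label.

    Assumption: scores at or above the threshold count as scam; the card
    does not say how a score of exactly 0.5 is handled.
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must lie in [0, 1]")
    return "scam" if p_fraud >= threshold else "normal"


# F1 recomputed from the rounded scam precision and recall.
scam_f1 = f1_score(SCAM_PRECISION, SCAM_RECALL)

# Gain of the full model over the on-chain-only ablation, in F1 points.
f1_gain_points = (F1_ALL_MODALITIES - F1_ON_CHAIN_ONLY) * 100

print(f"scam F1 from rounded P/R: {scam_f1:.3f}")
print(f"F1 gain over on-chain only: {f1_gain_points:.1f} points")
print(label_from_probability(0.73))
```

Because the inputs are already rounded to two decimals, the recomputed scam F1 agrees with the table only to about two decimal places; the exact 82.5% figure comes from the unrounded test-set counts, which are not reproduced here.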