Upload Simple Fusion.zip
Simple Fusion (XLM-RoBERTa + Engineered Features)
This model is a fusion architecture that combines XLM-RoBERTa embeddings with engineered linguistic and domain-specific features to detect misinformation in engineering texts. It is a binary classifier with the following labels:
Class 0: Real engineering documents
Class 1: AI-generated misinformation
Model Components
fusion_simple.pt → PyTorch model weights
scaler.pkl → Scikit-learn scaler for the 12-dimensional engineered feature set
Base encoder: xlm-roberta-base (mean-pooled hidden states)
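The card states that the encoder output is mean-pooled but does not say how padding is handled. The following is a minimal sketch, assuming padded positions are excluded through the attention mask (a common convention, not confirmed by this card). The function name `mean_pool` and the NumPy arrays are illustrative stand-ins for the encoder's tensors.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions.

    hidden_states:  (batch, seq_len, hidden) token embeddings from the encoder
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    # Broadcast the mask over the hidden dimension so padded tokens contribute zero.
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    # Guard against division by zero for a row that is entirely padding.
    counts = np.maximum(mask.sum(axis=1), 1e-9)
    return summed / counts

# Toy batch: one sequence with two real tokens and one padded position.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(hidden, mask)  # shape (1, 2); the padded vector is ignored
```

With xlm-roberta-base the hidden dimension is 768, so each document would be reduced to a single 768-dimensional vector under this assumption.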
Training Details
Fusion Mechanism: Naive concatenation of Transformer embeddings with scaled engineered features.
Engineered Features (12D): counts, readability proxies, punctuation density, engineering/safety keyword ratios, and numeric/standards signals.
Optimizer: AdamW
Sequence length: 256 tokens
Datasets: EMC Dataset
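The fusion step described above can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: the card does not publish the exact 12 features, so the six computed here, the keyword set `SAFETY_KEYWORDS`, the regex `STANDARD_PATTERN`, and the function names are all hypothetical. The scaling is written as plain standardization, which assumes `scaler.pkl` holds a mean/scale pair as a scikit-learn `StandardScaler` would.

```python
import re
import numpy as np

# Hypothetical keyword list; the real one is not given in the model card.
SAFETY_KEYWORDS = {"load", "tolerance", "hazard", "safety", "torque", "voltage"}
# Hypothetical pattern for standards references such as "ISO 898".
STANDARD_PATTERN = re.compile(r"\b(?:ISO|IEC|IEEE|ASTM|EN)\s?\d+")

def engineered_features(text: str) -> np.ndarray:
    """Compute a few illustrative features of the kinds the card lists."""
    words = re.findall(r"[A-Za-z]+", text)
    tokens = text.split()
    n_words = max(len(words), 1)
    n_tokens = max(len(tokens), 1)
    n_chars = max(len(text), 1)
    return np.array([
        len(words),                                                   # word count
        sum(len(w) for w in words) / n_words,                         # mean word length (readability proxy)
        sum(c in ".,;:!?" for c in text) / n_chars,                   # punctuation density
        sum(w.lower() in SAFETY_KEYWORDS for w in words) / n_words,   # keyword ratio
        sum(any(c.isdigit() for c in t) for t in tokens) / n_tokens,  # numeric token ratio
        len(STANDARD_PATTERN.findall(text)),                          # standards references
    ], dtype=np.float32)

def fuse(embedding: np.ndarray, features: np.ndarray,
         mean: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Standardize the features, then concatenate them onto the text embedding."""
    scaled = (features - mean) / scale
    return np.concatenate([embedding, scaled]).astype(np.float32)

feats = engineered_features("Torque must meet ISO 898 limits.")
# Placeholder embedding and identity scaling, just to show the shapes involved.
fused = fuse(np.zeros(768, dtype=np.float32), feats, np.zeros(6), np.ones(6))
```

Here the fused vector has 768 + 6 = 774 entries. With the card's 12 engineered features it would have 780, which would then be the input width of the classification head.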
Intended Uses
Detecting AI-generated misinformation in engineering documents.
Serving as a hybrid baseline for comparing Transformer-only vs. feature-only vs. fusion models.
⚠️ Limitations:
Fusion is naive (simple concatenation); our experiments showed it to be brittle under adversarial attacks.
Requires both raw text and engineered features to run inference.
Not directly plug-and-play like Transformer-only models.
Simple Fusion.zip +3 -0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7bec7b1e140bf97f0a15f78bdd443e1161a5c2cf05795cb62acd5bf7afe9bf43
+size 814466443
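The three added lines are a Git LFS pointer rather than the archive itself. As a hedged sketch, a downloaded copy of the archive could be checked against that pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative, and the local file path is an assumption.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7bec7b1e140bf97f0a15f78bdd443e1161a5c2cf05795cb62acd5bf7afe9bf43
size 814466443
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of an LFS pointer into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and hash."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so a large archive is never fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(POINTER_TEXT)
# Assumed local path: matches_pointer("Simple Fusion.zip", pointer)
```

The size check runs first because it is cheap and rejects truncated downloads before any hashing is done.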