Upload XGBoost.zip
XGBoost + Engineered Features (Misinformation Detection in Engineering)
This model is an XGBoost classifier trained on a 12-dimensional engineered feature set to detect AI-generated misinformation in engineering documents. It does not use Transformer embeddings; it relies entirely on structured, hand-engineered features.
Class 0: Real engineering documents
Class 1: AI-generated misinformation
Model Components
xgb_model.json → Serialized XGBoost model (boosted trees and parameters, JSON format)
scaler.pkl → Serialized scikit-learn StandardScaler used to normalize the 12 features before inference
Training Details
Features (12D):
Document length (characters, words, sentences)
Punctuation density
Readability proxies (Flesch–Kincaid, Gunning Fog)
Engineering/safety keyword ratios
Numeric feature counts (e.g., decimals, scientific notation)
Standard acronym detection (ISO, IEEE, ANSI, etc.)
Framework: XGBoost (tree-based gradient boosting)
Data: EMC dataset, with train/validation/test splits
Evaluation metric: Macro F1
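The card lists the feature groups but not their exact definitions or order, so the following is a hypothetical stdlib sketch of one way to turn a document into a 12-element vector covering those groups (3 length, 1 punctuation, 2 readability, 2 keyword-ratio, 3 numeric, 1 acronym). The keyword lists, the extra acronyms beyond ISO/IEEE/ANSI, the feature order, and the vowel-group syllable heuristic are all assumptions; a real deployment must reproduce the training pipeline's definitions exactly. Readability uses the standard Flesch-Kincaid grade formula and the Gunning Fog index with "complex words" simplified to words of three or more syllables.

```python
import re
import string

# Hypothetical vocabularies: the model card does not publish the real lists.
ENGINEERING_TERMS = {"load", "stress", "tolerance", "voltage", "torque", "beam", "circuit", "specification"}
SAFETY_TERMS = {"safety", "hazard", "failure", "risk", "warning", "compliance"}
STANDARD_ACRONYMS = {"ISO", "IEEE", "ANSI", "ASTM", "IEC", "ASME"}

# Hypothetical feature order; it must match the order the scaler and model were fitted on.
FEATURE_NAMES = (
    "n_chars", "n_words", "n_sentences", "punct_density",
    "flesch_kincaid_grade", "gunning_fog",
    "engineering_ratio", "safety_ratio",
    "n_numbers", "n_decimals", "n_scientific", "n_standard_acronyms",
)

WORD_RE = re.compile(r"\b[A-Za-z]+(?:'[A-Za-z]+)?\b")
SENTENCE_END_RE = re.compile(r"[.!?]+(?=\s|$)")  # ignores the '.' inside "2.5"
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?(?:[eE][+-]?\d+)?\b")
DECIMAL_RE = re.compile(r"\b\d+\.\d+\b")
SCIENTIFIC_RE = re.compile(r"\b\d+(?:\.\d+)?[eE][+-]?\d+\b")


def count_syllables(word):
    """Rough vowel-group heuristic; real readability tools use dictionaries or better rules."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # treat a trailing 'e' as silent
    return max(count, 1)


def extract_features(text):
    """Map one document to a 12-element feature list ordered as FEATURE_NAMES."""
    words = WORD_RE.findall(text)
    n_words = max(len(words), 1)  # used only as a denominator, avoids division by zero
    n_sentences = max(len(SENTENCE_END_RE.findall(text)), 1)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # simplified Gunning Fog definition
    words_per_sentence = len(words) / n_sentences
    lowered = [w.lower() for w in words]
    return [
        float(len(text)),
        float(len(words)),
        float(n_sentences),
        sum(ch in string.punctuation for ch in text) / max(len(text), 1),
        0.39 * words_per_sentence + 11.8 * sum(syllables) / n_words - 15.59,  # Flesch-Kincaid grade
        0.4 * (words_per_sentence + 100.0 * complex_words / n_words),  # Gunning Fog index
        sum(w in ENGINEERING_TERMS for w in lowered) / n_words,
        sum(w in SAFETY_TERMS for w in lowered) / n_words,
        float(len(NUMBER_RE.findall(text))),
        float(len(DECIMAL_RE.findall(text))),
        float(len(SCIENTIFIC_RE.findall(text))),
        float(sum(w in STANDARD_ACRONYMS for w in words)),
    ]
```

Stacking one such list per document gives an (n_samples, 12) matrix, which is then standardized with the StandardScaler from scaler.pkl before prediction.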
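Macro F1 computes F1 separately for each class and takes the unweighted mean, so real documents (class 0) and AI-generated misinformation (class 1) count equally even if one class dominates the split. A small stdlib sketch for binary labels; the function name macro_f1 is hypothetical, and scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores, with F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # class absent from both lists scores 0
    return sum(scores) / len(scores)


# Class 0 has F1 = 0.5 and class 1 has F1 = 2/3, so the macro average is 7/12.
print(macro_f1([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))
```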
Intended Uses
Lightweight detection of AI-generated misinformation in engineering contexts
Situations requiring interpretability and low computational cost
Deployment in resource-constrained environments
⚠️ Limitations:
Brittle under adversarial perturbations (e.g., synonym replacement, semantic cue masking)
Less robust than Transformer-based or fusion models
Accepts only the 12-dimensional engineered feature vector; raw text must first pass through the same feature-extraction and scaling pipeline used in training
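The brittleness under synonym replacement can be illustrated with a single hypothetical keyword-ratio feature: a meaning-preserving word swap drives the feature to zero, so the classifier receives a very different input although the claim is unchanged. The safety vocabulary and synonym map are illustrative assumptions, and the sketch does not run the released model.

```python
import re

# Hypothetical safety vocabulary and synonym map, chosen only to illustrate the failure mode.
SAFETY_TERMS = {"safety", "hazard", "failure", "risk"}
SYNONYMS = {"hazard": "danger", "failure": "breakdown", "risk": "exposure"}

WORD_RE = re.compile(r"\b[A-Za-z]+\b")


def safety_ratio(text):
    """Fraction of word tokens found in the safety vocabulary (one keyword-ratio feature)."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return sum(w in SAFETY_TERMS for w in words) / max(len(words), 1)


def synonym_perturb(text, synonyms=SYNONYMS):
    """Swap whole words for synonyms, keeping an initial capital letter if present."""
    def swap(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word
        return replacement.capitalize() if word[0].isupper() else replacement

    return WORD_RE.sub(swap, text)


original = "Weld failure is a hazard."
perturbed = synonym_perturb(original)
print(perturbed, safety_ratio(original), safety_ratio(perturbed))
# prints: Weld breakdown is a danger. 0.4 0.0
```

Running the same perturbation over a held-out set and comparing Macro F1 before and after is one way to quantify this weakness.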
XGBoost.zip (+3 -0), stored as a Git LFS pointer file:
version https://git-lfs.github.com/spec/v1
oid sha256:e91de14cd53cd1ad621f2d30954fbbdfac21291e914d33810e721f4cc5044202
size 730245