
XGBoost + Engineered Features (Misinformation Detection in Engineering)

This model is an XGBoost classifier that detects AI-generated misinformation in engineering documents using a 12-dimensional set of engineered features. It does not use Transformer embeddings; it relies entirely on these structured features.

Class 0: Real engineering documents

Class 1: AI-generated misinformation

Model Components

xgb_model.json → Serialized XGBoost model (the boosted trees, in XGBoost's JSON format)

scaler.pkl → Scikit-learn StandardScaler for feature normalization

Training Details

Features (12D):

Document length (characters, words, sentences)

Punctuation density

Readability proxies (Flesch–Kincaid, Gunning Fog)

Engineering/safety keyword ratios

Numeric feature counts (e.g., decimals, scientific notation)

Standard acronym detection (ISO, IEEE, ANSI, etc.)
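The card lists the feature groups but not their exact definitions, so the following is only a plausible reconstruction in plain Python. The keyword vocabularies, regular expressions, syllable heuristic, the way the keyword and numeric groups are split into 12 columns, and the names `extract_features` and `FEATURE_NAMES` are all assumptions. The readability proxies use the standard Flesch–Kincaid grade-level and Gunning Fog formulas.

```python
import re

# Hypothetical vocabularies: the actual keyword lists are not published.
ENGINEERING_TERMS = {"load", "stress", "torque", "voltage", "tolerance", "beam", "circuit"}
SAFETY_TERMS = {"safety", "hazard", "failure", "risk", "warning", "compliance"}
STANDARD_ACRONYMS = re.compile(r"\b(?:ISO|IEEE|ANSI|IEC|ASTM|ASME)\b")

FEATURE_NAMES = [
    "n_chars", "n_words", "n_sentences", "punct_density",
    "flesch_kincaid_grade", "gunning_fog",
    "engineering_ratio", "safety_ratio",
    "n_decimals", "n_scientific", "n_numeric_tokens", "n_standard_acronyms",
]


def count_syllables(word):
    """Crude vowel-group heuristic; dictionary-based counters are more accurate."""
    n = len(re.findall(r"[aeiouy]+", word.lower()))
    if word.lower().endswith("e") and n > 1:   # silent final 'e'
        n -= 1
    return max(n, 1)


def extract_features(text):
    """Return 12 floats in FEATURE_NAMES order (hypothetical definitions)."""
    words = re.findall(r"[A-Za-z]+", text)
    # Split on ., ! or ? followed by whitespace or end, so "3.5" is not a boundary.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    n_words = max(len(words), 1)               # guard against empty input
    n_sents = max(len(sentences), 1)
    lower = [w.lower() for w in words]
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)

    # Flesch-Kincaid grade level and Gunning Fog index.
    words_per_sent = n_words / n_sents
    fk_grade = 0.39 * words_per_sent + 11.8 * (sum(syllables) / n_words) - 15.59
    fog = 0.4 * (words_per_sent + 100 * complex_words / n_words)

    punct = sum(1 for ch in text if ch in ".,;:!?()-\"'")
    return [
        float(len(text)),
        float(len(words)),
        float(len(sentences)),
        punct / max(len(text), 1),
        fk_grade,
        fog,
        sum(w in ENGINEERING_TERMS for w in lower) / n_words,
        sum(w in SAFETY_TERMS for w in lower) / n_words,
        float(len(re.findall(r"\b\d+\.\d+\b", text))),                    # plain decimals
        float(len(re.findall(r"\b\d+(?:\.\d+)?[eE][+-]?\d+\b", text))),  # e.g. 1.2e3
        float(len(re.findall(r"\d+(?:\.\d+)?", text))),                   # any number
        float(len(STANDARD_ACRONYMS.findall(text))),
    ]
```

Returning a fixed-order list (rather than a dict) keeps the output directly usable as one row of the 12-column matrix the scaler expects.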

Framework: XGBoost (tree-based gradient boosting)

Data: EMC dataset, split into train/validation/test sets

Evaluation metric: Macro F1
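Macro F1 averages the per-class F1 scores with equal weight, so performance on the smaller class counts as much as performance on the larger one. A dependency-free sketch for the two labels used here follows; the function name `macro_f1` is illustrative, and scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity when both classes appear.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # 0 when class is absent
    return sum(scores) / len(scores)
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], class 0 has F1 = 2/3 and class 1 has F1 = 0.8, giving a macro F1 of about 0.733.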

Intended Uses

Lightweight detection of AI-generated misinformation in engineering contexts

Situations requiring interpretability and low computational cost

Deployment in resource-constrained environments

⚠️ Limitations:

Brittle under adversarial perturbations (e.g., synonym replacement, semantic cue masking)

Less robust than Transformer-based or fusion models

Operates only on the engineered feature set: raw text must first pass through the same feature-extraction and scaling pipeline used in training
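The brittleness to synonym replacement follows from how lexical features are computed: a keyword ratio drops when cue words are paraphrased away, even though the technical content is unchanged, and a tree model only sees the changed number. A minimal illustration, using a hypothetical four-word vocabulary and a hand-picked synonym map (neither comes from the actual model):

```python
import re

ENGINEERING_TERMS = {"load", "stress", "torque", "tolerance"}  # hypothetical vocabulary


def keyword_ratio(text, vocab=ENGINEERING_TERMS):
    """Fraction of word tokens that appear in the keyword vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in vocab for w in words) / max(len(words), 1)


def synonym_swap(text, synonyms):
    """Replace whole words via a synonym map: a simple adversarial perturbation."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: synonyms[m.group(0).lower()], text)


original = "The applied load raised the stress beyond the torque tolerance."
perturbed = synonym_swap(original, {"torque": "twisting moment", "tolerance": "permitted deviation"})
print(round(keyword_ratio(original), 3), round(keyword_ratio(perturbed), 3))  # 0.4 0.167
```

The sentence keeps its meaning, but the ratio falls from 4/10 to 2/12; if a tree split sits between those values, the prediction can change.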
