
🧾 Model Card: PhailomXgboost_dm_model

license: unknown           # TODO: choose a license (e.g., mit, apache-2.0, cc-by-4.0)
library_name: xgboost
tags:
  - xgboost
  - classification
  - tabular-data
  - healthcare
  - NCD
  - diabetes-risk
language:
  - en
  - th
model-index:
  - name: PhailomXgboost_dm_model
    results:
      - task:
          type: tabular-classification
        dataset:
          name: TODO-dataset-name
          type: private
          split: test
        metrics:
          - type: accuracy
            value: TODO
          - type: f1
            value: TODO
          - type: roc_auc
            value: TODO

📌 Model Summary

PhailomXgboost_dm_model is an XGBoost classifier developed for early-stage screening of non-communicable diseases (NCDs), with a focus on diabetes risk prediction using community health screening data. The model outputs three classes: Normal, At-Risk, and Diabetic, making it suitable for cost-effective and rapid community-level health assessments.


🧠 Intended Use & Limitations

Intended use

  • Community-level health screening for diabetes/NCD risk.
  • Educational and research purposes (health data mining, public health informatics).
  • Integration into dashboards or lightweight apps (e.g., Streamlit, Hugging Face Spaces).

Not for

  • Direct clinical diagnosis.
  • Replacement for laboratory tests or medical professionals.

Limitations

  • Performance depends heavily on data quality (missing values, outliers).
  • Potential bias if the dataset is imbalanced across classes.
  • Threshold tuning is required to balance sensitivity and specificity for different contexts.
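
The threshold-tuning point above can be sketched as a small decision rule applied to the model's class probabilities. The class order and both cutoff values below are illustrative assumptions, not settings taken from this model; real cutoffs would be chosen on validation data.

```python
# Assumed class order; check it against model.classes_ before reuse.
CLASSES = ["Normal", "At-Risk", "Diabetic"]


def screen(proba, diabetic_threshold=0.30, at_risk_threshold=0.40):
    """Map class probabilities to a label using per-class cutoffs.

    Classes are checked from most to least severe, so lowering a cutoff
    raises sensitivity for that class at the cost of specificity.
    Both default cutoffs are placeholders, not tuned values.
    """
    p = dict(zip(CLASSES, proba))
    if p["Diabetic"] >= diabetic_threshold:
        return "Diabetic"
    if p["At-Risk"] >= at_risk_threshold:
        return "At-Risk"
    return "Normal"


# A plain argmax would pick Normal here; the rule flags Diabetic instead.
print(screen([0.45, 0.20, 0.35]))
```

Checking the most severe class first is one possible design; it trades extra false positives for fewer missed cases, which may or may not suit a given screening programme.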

🧯 Ethical Considerations

  • Respect data privacy (PDPA/GDPR compliance).
  • Communicate clearly that this model is a screening tool, not a diagnostic system.
  • Regularly validate fairness across subgroups (gender, age, region).
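
One way to run the subgroup check above is to compare recall for the Diabetic class across groups. This is a minimal sketch on toy labels; the function name and the choice of recall as the comparison measure are assumptions, not part of the project code.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups, positive="Diabetic"):
    """Recall for the `positive` class, computed separately per subgroup."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            totals[group] += 1
            hits[group] += int(pred == positive)
    # Groups with no positive cases are left out rather than reported as 0.
    return {group: hits[group] / totals[group] for group in totals}


# Toy data for illustration only; not results from this model.
y_true = ["Diabetic", "Diabetic", "Normal", "Diabetic", "Diabetic"]
y_pred = ["Diabetic", "Normal", "Normal", "Diabetic", "Diabetic"]
groups = ["F", "F", "F", "M", "M"]
print(recall_by_group(y_true, y_pred, groups))
```

A large gap between groups would be a prompt to look at sample sizes and data quality for the weaker group before drawing conclusions.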

🗂️ Data

  • Source: community health screening dataset (private, internal project).

  • Dataset size: approximately 3,418 records, balanced across the three classes (Normal, At-Risk, Diabetic).

  • Example features:

    • Demographics: Age, Age group, Village, Screening date
    • Vitals: Systolic/diastolic blood pressure, Weight, Height, BMI
    • Contextual variables: Household or screening group identifiers

TODO: Fill in exact feature schema, units (e.g., mmHg, kg, cm), and preprocessing methods.
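
BMI is listed alongside weight and height, so it can be derived rather than entered by hand. The sketch below assumes weight in kilograms and height in centimetres, which is a guess consistent with the sample values (68.0, 160.0) in the inference example; the actual units are still to be confirmed.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2, assuming kilograms and centimetres."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


# Sample values taken from the inference example in this card.
print(round(bmi(68.0, 160.0), 1))
```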


πŸ—οΈ Training Procedure

  • Model: XGBoost (tree-based gradient boosting), multi-class classification.

  • Objective: multi:softprob (multi-class probability prediction).

  • Preprocessing:

    • Missing values handled by imputation.
    • One-hot or ordinal encoding for categorical features.
    • Stratified split into training/validation/test.
  • Hyperparameters tuned: max_depth, learning_rate (eta), subsample, colsample_bytree, min_child_weight, n_estimators.

  • Evaluation Metrics: Accuracy, Macro-F1, ROC-AUC (One-vs-Rest).

TODO: Insert actual hyperparameters and results.
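
To make the stratified split concrete, here is a dependency-free sketch of what it does: shuffle each class separately, then cut every class in the same proportions. The 70/15/15 fractions and the function name are assumptions; in practice scikit-learn's `train_test_split` with its `stratify` argument is the more common tool.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy labels for illustration: 40 records per class.
labels = ["Normal"] * 40 + ["At-Risk"] * 40 + ["Diabetic"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```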


📈 Evaluation

Metric          Test Set
Accuracy        TODO
Macro F1        TODO
ROC-AUC (OVR)   TODO

Confusion Matrix (example format)

            Pred:Normal  Pred:At-Risk  Pred:Diabetic
True:Normal      TODO        TODO          TODO
True:At-Risk     TODO        TODO          TODO
True:Diabetic    TODO        TODO          TODO
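
Once predictions exist, the confusion matrix, accuracy, and macro-F1 above can be filled in with a few lines of code. The sketch below uses plain Python and toy labels purely to show how the numbers relate; the class order is an assumption and none of the values are results from this model.

```python
CLASSES = ["Normal", "At-Risk", "Diabetic"]  # assumed row/column order


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def macro_f1(matrix):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["Normal", "Normal", "At-Risk", "At-Risk", "Diabetic", "Diabetic"]
y_pred = ["Normal", "At-Risk", "At-Risk", "At-Risk", "Diabetic", "Normal"]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm), macro_f1(cm))
```

Macro-F1 weights the three classes equally, so a weak class pulls the score down even when overall accuracy looks acceptable.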

🧩 Input Schema

Input columns must match the names and order used by the training pipeline. Example schema from the project context:

expected_columns = [
  "age_group", "record_id", "age", "village_no", "village_name", "screening_date",
  "bp_systolic", "bp_diastolic", "weight", "height",
  # ... add remaining features
]

TODO: Fill with the exact column list and datatypes.
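
Because column order matters, it can help to validate each record before building the input frame. The helper below is a hypothetical sketch that covers only the ten columns named so far in this card; the remaining features would need to be added once the schema is finalized.

```python
# Columns named in this card so far; the full list is still a TODO.
KNOWN_COLUMNS = [
    "age_group", "record_id", "age", "village_no", "village_name",
    "screening_date", "bp_systolic", "bp_diastolic", "weight", "height",
]


def order_record(record, expected=KNOWN_COLUMNS):
    """Return the record's values in `expected` order, or raise on mismatch."""
    missing = [col for col in expected if col not in record]
    unexpected = [col for col in record if col not in expected]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return [record[col] for col in expected]


# Keys arrive in arbitrary order; the helper restores the expected order.
record = {
    "height": 160.0, "weight": 68.0, "bp_diastolic": 90, "bp_systolic": 146,
    "screening_date": "2025-07-01", "village_name": "SampleVillage",
    "village_no": 5, "age": 64, "record_id": 1, "age_group": "60-69",
}
print(order_record(record))
```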


🚀 Inference

1) Load from pickle file

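import pickle

import pandas as pd

# Only unpickle files from a trusted source: pickle.load can run arbitrary code.
with open("PhailomXgboost_dm_model.pkl", "rb") as f:
    model = pickle.load(f)

# expected_columns is the list from the Input Schema section; one row per person.
X = pd.DataFrame([{
    "age_group": "60-69",
    "record_id": 1,
    "age": 64,
    "village_no": 5,
    "village_name": "SampleVillage",
    "screening_date": "2025-07-01",
    "bp_systolic": 146,
    "bp_diastolic": 90,
    "weight": 68.0,
    "height": 160.0,
    # ... include all expected features
}], columns=expected_columns)

# Class probabilities for the single row, then the most probable class label.
proba = model.predict_proba(X)[0]
pred = model.classes_[proba.argmax()]
print(pred, proba)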

2) Export to the XGBoost native JSON format (recommended for Hugging Face)

model.get_booster().save_model("model.json")

⚙️ Environment & Reproducibility

  • Python: TODO
  • xgboost: TODO
  • scikit-learn: TODO
  • pandas/numpy: TODO
  • Random seed: 42

Attach:

  • requirements.txt
  • training script/preprocessing code
  • evaluation reports and figures

🧪 Validation & Monitoring

  • Adjust classification thresholds to the deployment context, since public health screening often favors sensitivity over specificity.
  • Monitor drift when applied to new populations.
  • Revalidate if data collection tools change.
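
Drift monitoring can start with a simple per-feature comparison between the training population and a new one. The sketch below computes the Population Stability Index (PSI), which is one common choice and not necessarily what this project uses; the bin edges and age values are made up for illustration.

```python
import math


def psi(reference, current, edges):
    """Population Stability Index of `current` against `reference`.

    `edges` are ascending bin boundaries; values outside them fall into
    the first or last bin.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Made-up ages: the "current" population is older than the reference one.
reference_ages = [34, 41, 47, 52, 58, 63, 66, 71]
current_ages = [52, 58, 61, 64, 67, 70, 73, 78]
edges = [40, 50, 60, 70]
print(psi(reference_ages, reference_ages, edges))
print(psi(reference_ages, current_ages, edges))
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. What counts as too much drift is a project decision and should be set alongside the revalidation policy.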

📣 Citation

TODO: Add references or project details for citation.
