metadata
language: en
license: mit
tags:
  - classification
  - tabular
  - titanic
  - survival-prediction
  - pytorch
datasets:
  - titanic
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: Fine_Tuning_Dataset
    results:
      - task:
          type: tabular-classification
        dataset:
          name: Titanic
          type: titanic
        metrics:
          - type: accuracy
            value: 0.6111
          - type: f1
            value: 0
          - type: precision
            value: 0
          - type: recall
            value: 0

🚢 Titanic Survival Classifier

A lightweight MLP classifier wrapped in the Hugging Face PreTrainedModel interface, trained to predict passenger survival on the Titanic dataset.

Model description

| Component | Detail |
|---|---|
| Architecture | 4-layer MLP with BatchNorm, GELU, Dropout |
| Hidden dim | 128 |
| Input features | 13 engineered tabular features |
| Output | Binary (survived / not survived) |
| Parameters | ~12,578 |
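
The parameter count can be checked by hand: a linear layer with `n_in` inputs and `n_out` outputs holds `n_in * n_out + n_out` learnable values, and a BatchNorm layer over `n` features adds `2 * n`. The card does not list per-layer widths or say which layers carry BatchNorm, so the layout below (13 -> 128 -> 64 -> 32 -> 2, BatchNorm after the first two hidden layers) is purely an assumption for illustration. It is one configuration that happens to total 12,578, not a confirmed description of the model.

```python
def count_mlp_params(widths, batchnorm_on):
    """Count learnable parameters of an MLP.

    widths       -- layer sizes from input to output, e.g. [13, 128, 64, 32, 2]
    batchnorm_on -- indices into `widths` of hidden layers followed by BatchNorm
    """
    total = 0
    for n_in, n_out in zip(widths, widths[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    for idx in batchnorm_on:
        total += 2 * widths[idx]  # BatchNorm scale (gamma) + shift (beta)
    return total


# Hypothetical layout, NOT confirmed by the model card.
assumed_widths = [13, 128, 64, 32, 2]
n_params = count_mlp_params(assumed_widths, batchnorm_on=[1, 2])
print(n_params)
```

BatchNorm running statistics are buffers rather than learnable parameters, so they are left out of the count.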

Training details

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Scheduler | Cosine annealing |
| Epochs | 30 |
| Batch size | 32 |
| Train / Val / Test split | 80 / 10 / 10 % |
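
To give a feel for what cosine annealing does to the learning rate over the 30 epochs, here is a minimal sketch of the closed-form schedule. It assumes the scheduler is stepped once per epoch, that the cycle length equals the 30 training epochs, and that the minimum learning rate is 0; none of these are stated in the card.

```python
import math


def cosine_lr(epoch, base_lr=0.001, total_epochs=30, eta_min=0.0):
    """Learning rate at `epoch` under cosine annealing (no restarts)."""
    # Goes smoothly from 1 at epoch 0 down to 0 at epoch == total_epochs.
    cos_term = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return eta_min + (base_lr - eta_min) * cos_term


# One value per epoch boundary, from the start (0) to the end (30) of training.
schedule = [cosine_lr(e) for e in range(31)]
for e in (0, 15, 30):
    print(e, f"{schedule[e]:.6f}")
```

The rate starts at the configured 0.001, passes through half that value at the midpoint, and reaches the assumed minimum at the end of training.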

Feature engineering

Features used: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title

Key transformations applied:

  • Title extraction from passenger names (Mr, Mrs, Miss, Master, Rare)
  • Age imputation using median per title group
  • FamilySize = SibSp + Parch + 1; IsAlone flag
  • HasCabin binary flag
  • AgeBand and FareBand discretisation
  • StandardScaler normalisation (params saved in scaler_params.json)
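
The first few transformations in the list can be sketched in plain Python. This is not the author's pipeline: the regular expression, the rule "anything other than Mr, Mrs, Miss, Master becomes Rare", and the overall-median fallback are assumptions, and the helper names are hypothetical. AgeBand, FareBand and the categorical encodings are left out because the card does not give their bin edges or codes.

```python
import re
from statistics import median

COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name):
    """Return the honorific between the comma and the first period, else 'Rare'."""
    match = re.search(r",\s*([^.,]+)\.", name)
    title = match.group(1).strip() if match else ""
    return title if title in COMMON_TITLES else "Rare"


def engineer(row):
    """Add Title, FamilySize, IsAlone and HasCabin to a copy of one passenger row."""
    out = dict(row)
    out["Title"] = extract_title(row["Name"])
    out["FamilySize"] = row["SibSp"] + row["Parch"] + 1
    out["IsAlone"] = int(out["FamilySize"] == 1)
    out["HasCabin"] = int(bool(row.get("Cabin")))
    return out


def impute_age_by_title(rows):
    """Fill missing Age values with the median Age of the same Title group."""
    ages_by_title = {}
    for r in rows:
        if r["Age"] is not None:
            ages_by_title.setdefault(r["Title"], []).append(r["Age"])
    medians = {t: median(ages) for t, ages in ages_by_title.items()}
    # Assumed fallback for titles with no known ages: the overall median.
    fallback = median([a for ages in ages_by_title.values() for a in ages])
    for r in rows:
        if r["Age"] is None:
            r["Age"] = medians.get(r["Title"], fallback)
    return rows


# A few rows in the column layout of the public Titanic CSV.
raw_rows = [
    {"Name": "Braund, Mr. Owen Harris", "Age": 22, "SibSp": 1, "Parch": 0, "Cabin": None},
    {"Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Age": 38,
     "SibSp": 1, "Parch": 0, "Cabin": "C85"},
    {"Name": "Heikkinen, Miss. Laina", "Age": 26, "SibSp": 0, "Parch": 0, "Cabin": None},
    {"Name": "Moran, Mr. James", "Age": None, "SibSp": 0, "Parch": 0, "Cabin": None},
]
rows = impute_age_by_title([engineer(r) for r in raw_rows])
for r in rows:
    print(r["Title"], r["Age"], r["FamilySize"], r["IsAlone"], r["HasCabin"])
```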

Test set performance

| Metric | Score |
|---|---|
| Accuracy | 0.6111 |
| Precision | 0.0 |
| Recall | 0.0 |
| F1-Score | 0.0 |
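
A recall of 0.0 means that no actual survivor in the test split was predicted correctly. One pattern that produces exactly this combination of scores is a classifier that outputs "not survived" for every passenger, in which case accuracy simply equals the share of non-survivors in the split. The sketch below illustrates that arithmetic. The class counts (55 non-survivors, 35 survivors) are hypothetical, chosen only because 55/90 rounds to 0.6111; the card does not report a confusion matrix, and undefined ratios are reported as 0.0 here by assumption.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test split; the counts are NOT taken from the model card.
y_true = [0] * 55 + [1] * 35
y_pred = [0] * 90  # a model that always predicts "not survived"
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

If this is what happened, the reported accuracy reflects the class balance rather than learned signal, so the per-class metrics are the more informative numbers in the table.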

How to use

import json
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel

REPO = "Asimzaman19/Fine_Tuning_Dataset"

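# Load model.
# NOTE: TitanicClassifier (a PreTrainedModel subclass) and its config class are not
# defined in this snippet; define or import them from the training code first.
model = TitanicClassifier.from_pretrained(REPO)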
model.eval()

# Load scaler params
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean  = np.array(sp["mean"])
scale = np.array(sp["scale"])

# Prepare a sample (values must follow the training FEATURES order,
# presumably the order listed under "Feature engineering")
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = ((raw - mean) / scale).astype(np.float32)

with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits
    pred   = logits.argmax(-1).item()
    prob   = torch.softmax(logits, dim=-1)[0, 1].item()

print(f"Survived: {bool(pred)} (prob={prob:.2%})")
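
Because the model receives a bare list of 13 numbers, a value in the wrong position fails silently. A small helper that builds the vector from named fields makes the ordering explicit. This is a sketch with assumptions: the feature order is taken from the "Features used" list in this card, `to_scaled_vector` is a hypothetical name, and the placeholder statistics stand in for the real `mean` and `scale` arrays loaded from `scaler_params.json`.

```python
FEATURES = [
    "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked",
    "HasCabin", "FamilySize", "IsAlone", "AgeBand", "FareBand", "Title",
]


def to_scaled_vector(passenger, mean, scale):
    """Order a dict of already-encoded features and standardise each value."""
    missing = [name for name in FEATURES if name not in passenger]
    if missing:
        raise KeyError(f"missing features: {missing}")
    if not (len(mean) == len(scale) == len(FEATURES)):
        raise ValueError("mean and scale must have one entry per feature")
    return [
        (float(passenger[name]) - m) / s
        for name, m, s in zip(FEATURES, mean, scale)
    ]


# Same numbers as the sample row in the usage snippet, but keyed by name.
passenger = {
    "Pclass": 3, "Sex": 1, "Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25,
    "Embarked": 0, "HasCabin": 0, "FamilySize": 2, "IsAlone": 0,
    "AgeBand": 1, "FareBand": 0, "Title": 0,
}

# Placeholder statistics (identity scaling); use the values from scaler_params.json.
demo_mean = [0.0] * len(FEATURES)
demo_scale = [1.0] * len(FEATURES)
vector = to_scaled_vector(passenger, demo_mean, demo_scale)
print(vector)
```

The resulting list can be wrapped as `torch.tensor([vector], dtype=torch.float32)` before being passed to the model.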

Dataset

The Titanic dataset contains records for 891 passengers, including demographics, ticket class, and fare, with the binary survival label as the target.

Limitations

  • Trained on a small historical dataset (891 rows); performance may not generalise beyond the Titanic domain.
  • Features are hand-engineered; a more robust pipeline would use automated feature selection.

License

MIT