language: en
license: mit
tags:
  - classification
  - tabular
  - titanic
  - survival-prediction
  - pytorch
datasets:
  - titanic
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: Fine_Tuning_Dataset
    results:
      - task:
          type: tabular-classification
        dataset:
          name: Titanic
          type: titanic
        metrics:
          - type: accuracy
            value: 0.6111
          - type: f1
            value: 0
          - type: precision
            value: 0
          - type: recall
            value: 0

🚢 Titanic Survival Classifier

A lightweight MLP classifier wrapped in the Hugging Face PreTrainedModel interface, trained to predict passenger survival on the Titanic dataset.

Model description

Component	Detail
Architecture	4-layer MLP with BatchNorm, GELU, Dropout
Hidden dim	128
Input features	13 engineered tabular features
Output	Binary (survived / not survived)
Parameters	~12,578
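
The parameter count can be sanity-checked by hand. The sketch below assumes hidden widths of 128, 64, and 32, with BatchNorm after the first two hidden layers only. The card states just the 128-wide hidden dimension, so the narrower widths and the BatchNorm placement are assumptions, chosen because they reproduce the stated total. GELU and Dropout contribute no learnable parameters.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(width: int) -> int:
    """Learnable scale and shift of a 1-D BatchNorm layer (running stats excluded)."""
    return 2 * width


# Assumed layout (not confirmed by the card): 13 -> 128 -> 64 -> 32 -> 2
WIDTHS = [13, 128, 64, 32, 2]
# Assumed: BatchNorm follows the first two linear layers only
BATCHNORM_AFTER = {0, 1}


def count_parameters(widths=WIDTHS, batchnorm_after=BATCHNORM_AFTER) -> int:
    """Sum learnable parameters over consecutive linear (+ optional BatchNorm) layers."""
    total = 0
    for i, (n_in, n_out) in enumerate(zip(widths, widths[1:])):
        total += linear_params(n_in, n_out)
        if i in batchnorm_after:
            total += batchnorm_params(n_out)
    return total


print(count_parameters())  # 12578 under the assumed layout
```

Other layouts can give different totals, so the match is suggestive rather than proof of the actual architecture.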

Training details

Setting	Value
Optimizer	AdamW
Learning rate	0.001
Scheduler	Cosine annealing
Epochs	30
Batch size	32
Train / Val / Test split	80 / 10 / 10 %
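
To illustrate how the learning rate evolves under these settings, here is a minimal sketch of the closed-form cosine annealing curve. It assumes the schedule spans all 30 epochs and decays to a floor of 0; the card states neither the period nor the minimum learning rate, so `T_MAX` and `ETA_MIN` are assumptions.

```python
import math

BASE_LR = 0.001  # learning rate from the table above
T_MAX = 30       # assumed: one cosine half-period over all 30 epochs
ETA_MIN = 0.0    # assumed floor; the card does not state one


def cosine_lr(epoch: int, base_lr: float = BASE_LR,
              t_max: int = T_MAX, eta_min: float = ETA_MIN) -> float:
    """Cosine annealing: decays from base_lr at epoch 0 to eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 10, 20, 29):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```

The rate starts at 0.001, passes through half that value at the midpoint, and approaches the floor in the final epochs.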

Feature engineering

Features used: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title

Key transformations applied:

Title extraction from passenger names (Mr, Mrs, Miss, Master, Rare)
Age imputation using median per title group
FamilySize = SibSp + Parch + 1; IsAlone flag
HasCabin binary flag
AgeBand and FareBand discretisation
StandardScaler normalisation (params saved in scaler_params.json)
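
Several of the transformations above can be sketched in plain Python. This is an illustration under assumptions, not the training pipeline itself: the regular expression, the rule that any title outside Mr/Mrs/Miss/Master becomes Rare, and the fallback to the overall median are guesses, and the AgeBand/FareBand bin edges and integer encodings are omitted because the card does not state them. Rows are assumed to follow the Kaggle column names.

```python
import re
from statistics import median

COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name: str) -> str:
    """Return the honorific between the comma and the first period, else 'Rare'."""
    match = re.search(r",\s*([^.,]+)\.", name)
    title = match.group(1).strip() if match else ""
    return title if title in COMMON_TITLES else "Rare"


def engineer(rows):
    """Add Title, FamilySize, IsAlone, HasCabin and fill missing Age per title."""
    out = [dict(r, Title=extract_title(r["Name"])) for r in rows]

    # Collect known ages per title to compute group medians
    ages_by_title = {}
    for r in out:
        if r["Age"] is not None:
            ages_by_title.setdefault(r["Title"], []).append(r["Age"])
    medians = {t: median(v) for t, v in ages_by_title.items()}
    known = [a for v in ages_by_title.values() for a in v]
    fallback = median(known) if known else None  # assumed fallback for unseen titles

    for r in out:
        if r["Age"] is None:
            r["Age"] = medians.get(r["Title"], fallback)
        r["FamilySize"] = r["SibSp"] + r["Parch"] + 1
        r["IsAlone"] = int(r["FamilySize"] == 1)
        r["HasCabin"] = int(bool(r["Cabin"]))
    return out


# A few illustrative rows in the Kaggle column layout
rows = [
    {"Name": "Braund, Mr. Owen Harris", "Age": 22, "SibSp": 1, "Parch": 0, "Cabin": None},
    {"Name": "Cumings, Mrs. John Bradley", "Age": 38, "SibSp": 1, "Parch": 0, "Cabin": "C85"},
    {"Name": "Heikkinen, Miss. Laina", "Age": 26, "SibSp": 0, "Parch": 0, "Cabin": None},
    {"Name": "Moran, Mr. James", "Age": None, "SibSp": 0, "Parch": 0, "Cabin": None},
]
features = engineer(rows)
for r in features:
    print(r["Title"], r["Age"], r["FamilySize"], r["IsAlone"], r["HasCabin"])
```

In the sample, the last row has no recorded age and receives the median of the other "Mr" rows.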

Test set performance

Metric	Score
Accuracy	0.6111
Precision	0.0
Recall	0.0
F1-Score	0.0
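
The pattern in this table (moderate accuracy, zero precision, recall, and F1) is what a classifier produces when it never predicts the positive class correctly. The sketch below reproduces the same numbers with a hypothetical test set of 90 passengers, 55 of whom did not survive, and a model that predicts "not survived" for everyone. The 90/55 split is an illustrative guess consistent with a 10% split of 891 rows and the reported accuracy; it is not taken from the actual evaluation.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (0/0) are reported as 0.0, a common convention
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical test set: 55 non-survivors (0) and 35 survivors (1)
y_true = [0] * 55 + [1] * 35
# Hypothetical model output: always "not survived"
y_pred = [0] * 90

acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.1f} recall={rec:.1f} f1={f1:.1f}")
```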

How to use

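import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel

REPO = "Asimzaman19/Fine_Tuning_Dataset"

# TitanicClassifier is a custom PreTrainedModel subclass. Its definition (and that
# of its PretrainedConfig subclass) is not included in this card and must be
# defined or imported before this point; otherwise the next line raises NameError.
model = TitanicClassifier.from_pretrained(REPO)
model.eval()  # inference mode: disables Dropout, uses BatchNorm running statistics

# Load the StandardScaler parameters saved at training time
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean = np.array(sp["mean"], dtype=np.float32)
scale = np.array(sp["scale"], dtype=np.float32)

# Prepare one sample; values must follow the order of the feature list above
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = (raw - mean) / scale

with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits
    pred = logits.argmax(-1).item()                      # predicted class index
    prob = torch.softmax(logits, dim=-1)[0, 1].item()    # probability of class 1 (survived)

print(f"Survived: {bool(pred)} (prob={prob:.2%})")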
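
Because the input vector is positional, swapping two columns silently corrupts predictions. A small helper can build the row from named values instead. This is a sketch, not part of the repository: `to_scaled_row` is a hypothetical name, the feature order is assumed to be the order listed under "Feature engineering", and the all-zero `mean` and all-one `scale` are placeholders for the values in scaler_params.json.

```python
# Assumed to match the training-time column order
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked",
            "HasCabin", "FamilySize", "IsAlone", "AgeBand", "FareBand", "Title"]


def to_scaled_row(sample: dict, mean: list, scale: list) -> list:
    """Order a named sample by FEATURES and apply (x - mean) / scale per column."""
    missing = [f for f in FEATURES if f not in sample]
    if missing:
        raise KeyError(f"missing features: {missing}")
    if not (len(mean) == len(scale) == len(FEATURES)):
        raise ValueError("mean and scale must have one entry per feature")
    return [(float(sample[f]) - m) / s for f, m, s in zip(FEATURES, mean, scale)]


# Same values as the positional example above, now keyed by name
sample = {"Pclass": 3, "Sex": 1, "Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25,
          "Embarked": 0, "HasCabin": 0, "FamilySize": 2, "IsAlone": 0,
          "AgeBand": 1, "FareBand": 0, "Title": 0}

# Placeholder scaler parameters; real values come from scaler_params.json
mean = [0.0] * len(FEATURES)
scale = [1.0] * len(FEATURES)

row = to_scaled_row(sample, mean, scale)
print(row)
```

The resulting list can be wrapped as `torch.tensor([row], dtype=torch.float32)` and passed to the model in place of `scaled`.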

Dataset

The Titanic dataset contains records for 891 passengers, including demographics, ticket class, and fare, with a binary survival label as the target.

Limitations

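Precision, recall, and F1 are all 0.0 on the test set, meaning the model correctly identified no survivors there. The 0.6111 accuracy is close to the share of non-survivors in the data (549 of 891, about 62%), which is consistent with predicting "not survived" for every passenger.
Trained on a small historical dataset (891 rows); performance may not generalise beyond the Titanic domain.
Features are hand-engineered; a more robust pipeline would use automated feature selection.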

License

MIT