---

language: en
license: mit
tags:
  - classification
  - tabular
  - titanic
  - survival-prediction
  - pytorch
datasets:
  - titanic
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: Fine_Tuning_Dataset
    results:
      - task:
          type: tabular-classification
        dataset:
          name: Titanic
          type: titanic
        metrics:
          - type: accuracy
            value: 0.6111
          - type: f1
            value: 0.0
          - type: precision
            value: 0.0
          - type: recall
            value: 0.0
---


# 🚢 Titanic Survival Classifier

A lightweight MLP classifier wrapped in the Hugging Face `PreTrainedModel` interface,
trained to predict passenger survival on the Titanic dataset.

## Model description

| Component | Detail |
|-----------|--------|
| Architecture | 4-layer MLP with BatchNorm, GELU activations, and Dropout |
| Hidden dim | 128 |
| Input features | 13 engineered tabular features |
| Output | Binary (survived / not survived) |
| Parameters | ~12,578 |
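One way to read the architecture row is as four `Linear` layers, with each of the first three followed by BatchNorm, GELU, and Dropout. The plain PyTorch sketch below assumes tapering widths (128 → 64 → 32, starting from the stated hidden dim) and a dropout rate of 0.2. Neither is given in the card. `TitanicMLPSketch` is a hypothetical name and not the repository's class, so it cannot load the published weights.

```python
import torch
from torch import nn


class TitanicMLPSketch(nn.Module):
    """Hypothetical 4-layer MLP: three Linear -> BatchNorm -> GELU -> Dropout blocks
    plus a final Linear layer producing 2 logits. Widths and dropout are assumptions."""

    def __init__(self, in_features: int = 13, hidden: int = 128,
                 num_classes: int = 2, dropout: float = 0.2) -> None:
        super().__init__()
        widths = [in_features, hidden, hidden // 2, hidden // 4]  # 13 -> 128 -> 64 -> 32
        layers: list[nn.Module] = []
        for d_in, d_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out),
                       nn.GELU(), nn.Dropout(dropout)]
        layers.append(nn.Linear(widths[-1], num_classes))  # 4th Linear layer -> logits
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = TitanicMLPSketch()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

model.eval()  # BatchNorm uses running statistics, Dropout is disabled
with torch.no_grad():
    logits = model(torch.randn(4, 13))  # batch of 4 passengers, 13 features each
print(tuple(logits.shape), n_params)
```

This sketch has 12,642 trainable parameters. That is close to the card's ~12,578 but not equal, so the published model presumably differs in some detail, such as where BatchNorm is applied.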

## Training details

| Setting | Value |
|---------|-------|
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Scheduler | Cosine annealing |
| Epochs | 30 |
| Batch size | 32 |
| Train / Val / Test split | 80 / 10 / 10 % |
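Here is a minimal PyTorch sketch of the training settings in the table: AdamW at 1e-3, `CosineAnnealingLR` over 30 epochs, batch size 32, and an 80 / 10 / 10 `random_split`. Several parts are assumptions. The data are random stand-ins for the 891 × 13 feature matrix, and the small `nn.Sequential` is a placeholder for the real classifier. Stepping the scheduler once per epoch, the seeds, and the rounding of split sizes are also assumed.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in for the scaled feature matrix and labels (not real data)
X = torch.randn(891, 13)
y = torch.randint(0, 2, (891,))
dataset = TensorDataset(X, y)

# 80 / 10 / 10 split; the test split absorbs rounding (assumption)
n_train = int(0.8 * len(dataset))
n_val = int(0.1 * len(dataset))
n_test = len(dataset) - n_train - n_val
train_ds, val_ds, test_ds = random_split(
    dataset, [n_train, n_val, n_test], generator=torch.Generator().manual_seed(42)
)

EPOCHS, BATCH_SIZE, LR = 30, 32, 1e-3
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)

# Placeholder model; substitute the real classifier
model = nn.Sequential(nn.Linear(13, 128), nn.GELU(), nn.Linear(128, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
# Stepped once per epoch, so the LR decays from LR to eta_min (0) over EPOCHS epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()

print(n_train, n_val, n_test, scheduler.get_last_lr()[0])
```

With 891 rows, this rounding gives 712 / 89 / 90 examples.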

## Feature engineering

Features used: `Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title`

Key transformations applied:
- **Title extraction** from passenger names (Mr, Mrs, Miss, Master, Rare)
- **Age imputation** using median per title group
- **FamilySize** = SibSp + Parch + 1; **IsAlone** flag
- **HasCabin** binary flag
- **AgeBand** and **FareBand** discretisation
- **Standardisation** with `StandardScaler` (parameters saved in `scaler_params.json`)
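The transformations listed above can be sketched in dependency-free Python. Some choices here are hypothetical because the card does not state them: the honorific grouping (Mlle and Ms to Miss, Mme to Mrs, anything unlisted to Rare) and the AgeBand/FareBand bin edges. `engineer`, `extract_title`, and `TITLE_MAP` are illustrative names. Encoding of `Sex` and `Embarked` is not shown.

```python
import re
from bisect import bisect_left
from statistics import median

# Hypothetical honorific grouping; unlisted titles fall into "Rare"
TITLE_MAP = {"Mr": "Mr", "Mrs": "Mrs", "Mme": "Mrs", "Miss": "Miss",
             "Mlle": "Miss", "Ms": "Miss", "Master": "Master"}
AGE_EDGES = [16, 32, 48, 64]       # hypothetical AgeBand edges (upper-inclusive)
FARE_EDGES = [7.91, 14.454, 31.0]  # hypothetical FareBand edges (upper-inclusive)


def extract_title(name: str) -> str:
    """Return the title group from a name such as 'Braund, Mr. Owen Harris'."""
    match = re.search(r",\s*([^.]+)\.", name)
    raw = match.group(1).strip() if match else ""
    return TITLE_MAP.get(raw, "Rare")


def engineer(rows: list[dict]) -> list[dict]:
    """Add Title, FamilySize, IsAlone, HasCabin, imputed Age, AgeBand and FareBand
    in place. Assumes at least one passenger has a known age."""
    for r in rows:
        r["Title"] = extract_title(r["Name"])
        r["FamilySize"] = r["SibSp"] + r["Parch"] + 1
        r["IsAlone"] = int(r["FamilySize"] == 1)
        r["HasCabin"] = int(bool(r.get("Cabin")))

    # Median age per title group, from passengers whose age is known
    known: dict[str, list[float]] = {}
    for r in rows:
        if r["Age"] is not None:
            known.setdefault(r["Title"], []).append(r["Age"])
    title_median = {t: median(ages) for t, ages in known.items()}
    fallback = median(a for ages in known.values() for a in ages)  # overall median

    for r in rows:
        if r["Age"] is None:
            r["Age"] = title_median.get(r["Title"], fallback)
        # bisect_left places a value equal to an edge in the lower band
        r["AgeBand"] = bisect_left(AGE_EDGES, r["Age"])
        r["FareBand"] = bisect_left(FARE_EDGES, r["Fare"])
    return rows


rows = engineer([
    {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0,
     "Age": 22.0, "Fare": 7.25, "Cabin": ""},
    {"Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "SibSp": 1,
     "Parch": 0, "Age": 38.0, "Fare": 71.2833, "Cabin": "C85"},
    {"Name": "Moran, Mr. James", "SibSp": 0, "Parch": 0,
     "Age": None, "Fare": 8.4583, "Cabin": ""},
])
for r in rows:
    print(r["Title"], r["Age"], r["FamilySize"], r["IsAlone"],
          r["HasCabin"], r["AgeBand"], r["FareBand"])
```

In this example, the passenger with a missing age receives the median age of the other "Mr" rows.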

## Test set performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.6111 |
| Precision | 0.0 |
| Recall | 0.0 |
| F1-Score | 0.0 |
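Precision, recall, and F1 of 0.0 together with a non-trivial accuracy is the pattern produced by a classifier that never predicts the positive class. The sketch below computes the four metrics from scratch. It uses a hypothetical 90-row test split with 55 non-survivors and 35 survivors, scored against a model that always predicts "not survived". The class counts are illustrative and are not taken from the actual test split. Zero-denominator cases are reported as 0.0, which is also what scikit-learn's `precision_score` reports by default (with a warning).

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (1 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    # 0/0 is reported as 0.0 for each ratio
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical split: 55 non-survivors, 35 survivors
y_true = [0] * 55 + [1] * 35
y_pred = [0] * 90  # always predicts "not survived"
print({k: round(v, 4) for k, v in binary_metrics(y_true, y_pred).items()})
# → {'accuracy': 0.6111, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```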

## How to use

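```python
import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel  # base classes for TitanicClassifier

REPO = "Asimzaman19/Fine_Tuning_Dataset"

# TitanicClassifier is a custom PreTrainedModel subclass, not part of `transformers`.
# Its definition (and config class) is not included in this card: define or import it
# exactly as it was at training time, otherwise from_pretrained() cannot rebuild the model.

# Load model and switch to inference mode (Dropout off, BatchNorm uses running stats)
model = TitanicClassifier.from_pretrained(REPO)
model.eval()

# Load the StandardScaler parameters saved during training
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean = np.array(sp["mean"])
scale = np.array(sp["scale"])

# One passenger; values must follow the feature order listed under
# "Feature engineering" and use the same categorical encodings as training
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = ((raw - mean) / scale).astype(np.float32)

with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits  # model output exposes .logits
    pred = logits.argmax(-1).item()
    prob = torch.softmax(logits, dim=-1)[0, 1].item()  # probability of class 1 (survived)

print(f"Survived: {bool(pred)} (prob={prob:.2%})")
```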
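The usage snippet reads per-feature `mean` and `scale` lists from `scaler_params.json`. Suppose those lists came from scikit-learn's `StandardScaler`, whose `mean_` and `scale_` attributes use the population standard deviation and set the scale of constant features to 1. A numpy-only sketch of producing and consuming such a file could then look like this. `fit_scaler_params` and `apply_scaler` are hypothetical helpers, and the random matrix stands in for real training features.

```python
import json

import numpy as np


def fit_scaler_params(X: np.ndarray) -> dict:
    """Per-feature mean and scale, following StandardScaler's conventions."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0)      # population std (ddof=0), as StandardScaler uses
    scale[scale == 0.0] = 1.0  # constant columns: centred but not rescaled
    return {"mean": mean.tolist(), "scale": scale.tolist()}


def apply_scaler(X: np.ndarray, params: dict) -> np.ndarray:
    """Same transform as the inference snippet: (x - mean) / scale, as float32."""
    mean = np.array(params["mean"])
    scale = np.array(params["scale"])
    return ((X - mean) / scale).astype(np.float32)


rng = np.random.default_rng(0)
X_train = rng.normal(loc=30.0, scale=12.0, size=(712, 13))
X_train[:, 12] = 1.0  # a constant column, to exercise the zero-variance guard

# Round-trip through JSON, as scaler_params.json would be written and then read
params = json.loads(json.dumps(fit_scaler_params(X_train)))
X_scaled = apply_scaler(X_train, params)
print(X_scaled.mean(axis=0).round(3), X_scaled.std(axis=0).round(3))
```

Fitting on the training split alone keeps validation and test statistics out of the normalisation.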

## Dataset

The [Titanic dataset](https://www.kaggle.com/competitions/titanic) contains
information about 891 passengers, including demographics, ticket class, and
fare, with the binary survival label (`Survived`) as the target.

## Limitations

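- Trained on a small historical dataset (891 rows); performance may not
  generalise beyond the Titanic domain.
- The 10 % test split holds only about 90 passengers, so the reported metrics
  are based on few examples and are correspondingly noisy.
- Precision, recall, and F1 of 0.0 mean the model did not correctly identify
  any survivor in the test split. The 0.6111 accuracy is in line with the
  share of passengers who did not survive (549 of 891, about 62 %), which is
  roughly what always predicting "not survived" would score.
- Features are hand-engineered; a more robust pipeline would use automated
  feature selection.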

## License

MIT