---
language: en
license: mit
tags:
- classification
- tabular
- titanic
- survival-prediction
- pytorch
datasets:
- titanic
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Fine_Tuning_Dataset
  results:
  - task:
      type: tabular-classification
    dataset:
      name: Titanic
      type: titanic
    metrics:
    - type: accuracy
      value: 0.6111
    - type: f1
      value: 0.0
    - type: precision
      value: 0.0
    - type: recall
      value: 0.0
---
# 🚢 Titanic Survival Classifier
A lightweight MLP classifier wrapped in the Hugging Face `PreTrainedModel` interface,
trained to predict passenger survival on the Titanic dataset.
## Model description
| Component | Detail |
|-----------|--------|
| Architecture | 4-layer MLP with BatchNorm, GELU, Dropout |
| Hidden dim | 128 |
| Input features | 13 engineered tabular features |
| Output | Binary (survived / not survived) |
| Parameters | ~12,578 |
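
The stated parameter count can be sanity-checked by hand. The sketch below counts trainable parameters for one hypothetical layout (13 → 128 → 64 → 32 → 2, with BatchNorm after the first two linear layers only). These widths and the BatchNorm placement are assumptions, not taken from the model code; they are used here only because they happen to reproduce the figure in the table.

```python
# Hypothetical layer layout; widths and BatchNorm placement are assumptions.
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift; running statistics are not trainable."""
    return 2 * n_features


widths = [13, 128, 64, 32, 2]
bn_after = {128, 64}  # assumed: BatchNorm follows the first two linear layers

total = 0
for n_in, n_out in zip(widths, widths[1:]):
    total += linear_params(n_in, n_out)
    if n_out in bn_after:
        total += batchnorm_params(n_out)

print(total)  # 12578
```

GELU and Dropout add no trainable parameters, so they do not appear in the count.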
## Training details
| Setting | Value |
|---------|-------|
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Scheduler | Cosine annealing |
| Epochs | 30 |
| Batch size | 32 |
| Train / Val / Test split | 80 / 10 / 10 % |
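
To illustrate the scheduler setting, the closed-form cosine annealing curve can be evaluated directly. This is a minimal sketch assuming the schedule is stepped once per epoch over all 30 epochs and decays to a minimum learning rate of 0; neither detail is stated in the table, and `cosine_lr` is a hypothetical helper.

```python
import math

BASE_LR = 0.001   # from the training table
EPOCHS = 30       # from the training table
MIN_LR = 0.0      # assumption: not stated in this card


def cosine_lr(epoch: int, base_lr: float = BASE_LR,
              total: int = EPOCHS, min_lr: float = MIN_LR) -> float:
    """Learning rate at a given epoch under cosine annealing."""
    cosine = math.cos(math.pi * epoch / total)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)


# Start, midpoint, and final epoch of the schedule.
for epoch in (0, 15, 29):
    print(epoch, f"{cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.001, passes through half that value at epoch 15, and approaches 0 by the last epoch.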
## Feature engineering
Features used: `Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title`
Key transformations applied:
- **Title extraction** from passenger names (Mr, Mrs, Miss, Master, Rare)
- **Age imputation** using median per title group
- **FamilySize** = SibSp + Parch + 1; **IsAlone** flag
- **HasCabin** binary flag
- **AgeBand** and **FareBand** discretisation
- StandardScaler normalisation (params saved in `scaler_params.json`)
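
The title and family features listed above can be sketched in plain Python. The regular expression and the helper names are assumptions for illustration; the card does not specify how titles were parsed or which ones were grouped as Rare.

```python
import re
from typing import Tuple

COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}  # everything else maps to "Rare"


def extract_title(name: str) -> str:
    """Pull the honorific between the comma and the first period."""
    match = re.search(r",\s*([^.]+)\.", name)
    title = match.group(1).strip() if match else ""
    return title if title in COMMON_TITLES else "Rare"


def family_features(sibsp: int, parch: int) -> Tuple[int, int]:
    """Return (FamilySize, IsAlone) for one passenger."""
    family_size = sibsp + parch + 1  # the passenger counts as one member
    return family_size, int(family_size == 1)


print(extract_title("Braund, Mr. Owen Harris"))  # Mr
print(family_features(1, 0))                     # (2, 0)
```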
## Test set performance
| Metric | Score |
|--------|-------|
| Accuracy | 0.6111 |
| Precision | 0.0 |
| Recall | 0.0 |
| F1-Score | 0.0 |
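
Zero precision, recall, and F1 alongside 0.6111 accuracy is the pattern produced when a classifier never predicts the positive class correctly. The sketch below recomputes the four metrics from confusion-matrix counts. The counts (90 test passengers, 55 of them non-survivors, every prediction "not survived") are hypothetical values chosen to match the table, not figures reported for this model.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1, returning 0.0 where undefined."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical all-negative predictions on a 90-row test set.
metrics = binary_metrics(tp=0, fp=0, fn=35, tn=55)
print({name: round(value, 4) for name, value in metrics.items()})
```

If the real predictions follow this pattern, inspecting the confusion matrix and the class balance of the training split would be a reasonable next step.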
## How to use
```python
import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel  # base classes for the custom model

REPO = "Asimzaman19/Fine_Tuning_Dataset"

# NOTE: TitanicClassifier (and its config class) must be defined or imported
# before this point; the class definition is not included in this card.

# Load model
model = TitanicClassifier.from_pretrained(REPO)
model.eval()  # inference mode: disables Dropout, uses BatchNorm running stats

# Load scaler params
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean = np.array(sp["mean"])
scale = np.array(sp["scale"])

# Prepare a sample (values must follow the feature order listed under "Feature engineering")
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = ((raw - mean) / scale).astype(np.float32)

with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits
    pred = logits.argmax(-1).item()  # predicted class index: 1 = survived
    prob = torch.softmax(logits, dim=-1)[0, 1].item()  # probability of class 1
print(f"Survived: {bool(pred)} (prob={prob:.2%})")
```
## Dataset
The [Titanic dataset](https://www.kaggle.com/competitions/titanic) contains
information about 891 passengers, including demographics, ticket class, and
fare, with the binary survival label as the target.
## Limitations
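- Precision, recall, and F1 are all 0.0 on the test set, so the model did not
correctly identify any survivor there; the 0.6111 accuracy is consistent with
predicting "not survived" for nearly every passenger.
- Trained on a small historical dataset (891 rows); performance may not
generalise beyond the Titanic domain.
- Features are hand-engineered; a more robust pipeline would use automated
feature selection.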
## License
MIT