---
language: en
license: mit
tags:
- classification
- tabular
- titanic
- survival-prediction
- pytorch
datasets:
- titanic
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Fine_Tuning_Dataset
  results:
  - task:
      type: tabular-classification
    dataset:
      name: Titanic
      type: titanic
    metrics:
    - type: accuracy
      value: 0.6111
    - type: f1
      value: 0.0
    - type: precision
      value: 0.0
    - type: recall
      value: 0.0
---
# 🚢 Titanic Survival Classifier
A lightweight MLP classifier wrapped in the Hugging Face `PreTrainedModel` interface,
trained to predict passenger survival on the Titanic dataset.
## Model description
| Component | Detail |
|-----------|--------|
| Architecture | 4-layer MLP with BatchNorm, GELU, Dropout |
| Hidden dim | 128 |
| Input features | 13 engineered tabular features |
| Output | Binary (survived / not survived) |
| Parameters | ~12,578 |
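
The parameter count can be sanity-checked with simple arithmetic. The sketch below assumes a layout the card does not state: four linear layers of widths 13 → 128 → 64 → 32 → 2, with BatchNorm after the first two hidden layers only. These widths are hypothetical; they are one configuration that reproduces the figure in the table, not necessarily the trained architecture.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift of a BatchNorm1d layer (running stats excluded)."""
    return 2 * n_features


# Hypothetical layer widths (an assumption, not taken from the model card).
widths = [13, 128, 64, 32, 2]
# Assumed: BatchNorm follows only the first two hidden layers.
batchnorm_after = [128, 64]

total = sum(linear_params(a, b) for a, b in zip(widths, widths[1:]))
total += sum(batchnorm_params(n) for n in batchnorm_after)
print(total)  # 12578
```

GELU and Dropout contribute no learnable parameters, so they do not appear in the sum.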
## Training details
| Setting | Value |
|---------|-------|
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Scheduler | Cosine annealing |
| Epochs | 30 |
| Batch size | 32 |
| Train / Val / Test split | 80 / 10 / 10 % |
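
For reference, cosine annealing decays the learning rate along half a cosine wave. The sketch below evaluates the standard closed form with the table's learning rate, assuming a minimum learning rate of 0 and a period equal to the 30 training epochs; neither value is stated in the card.

```python
import math


def cosine_lr(epoch: int, base_lr: float = 1e-3, t_max: int = 30, eta_min: float = 0.0) -> float:
    """Closed form: eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / t_max)) / 2."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


# Learning rate at the start, midpoint and end of training (assumed t_max=30, eta_min=0).
for epoch in (0, 15, 30):
    print(epoch, f"{cosine_lr(epoch):.6f}")
# 0 0.001000
# 15 0.000500
# 30 0.000000
```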
## Feature engineering
Features used: `Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title`

Key transformations applied:
- **Title extraction** from passenger names (Mr, Mrs, Miss, Master, Rare)
- **Age imputation** using median per title group
- **FamilySize** = SibSp + Parch + 1; **IsAlone** flag
- **HasCabin** binary flag
- **AgeBand** and **FareBand** discretisation
- StandardScaler normalisation (params saved in `scaler_params.json`)
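
The row-level transformations above can be sketched in plain Python. The helper names `extract_title` and `engineer`, the regular expression, and the mapping of every uncommon title to `Rare` are assumptions for illustration, not the card's actual pipeline. The banding steps are left out because the card does not give the cut points, and the per-title median imputation is omitted because it needs the full training set.

```python
import re

# Titles kept as their own category; anything else becomes "Rare" (assumed mapping).
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name: str) -> str:
    """Pull the honorific from names formatted like 'Braund, Mr. Owen Harris'."""
    match = re.search(r",\s*([^.,]+)\.", name)
    title = match.group(1).strip() if match else ""
    return title if title in COMMON_TITLES else "Rare"


def engineer(row: dict) -> dict:
    """Add Title, FamilySize, IsAlone and HasCabin to a raw passenger record.

    Missing cabins are expected as None or "" here (not NaN, which is truthy).
    """
    family_size = row["SibSp"] + row["Parch"] + 1
    return {
        **row,
        "Title": extract_title(row["Name"]),
        "FamilySize": family_size,
        "IsAlone": int(family_size == 1),
        "HasCabin": int(bool(row.get("Cabin"))),
    }


sample = {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0, "Cabin": None}
print(engineer(sample))
```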
## Test set performance
| Metric | Score |
|--------|-------|
| Accuracy | 0.6111 |
| Precision | 0.0 |
| Recall | 0.0 |
| F1-Score | 0.0 |
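
This pattern of scores is what a classifier produces when it never predicts the positive class. The sketch below works through the arithmetic with hypothetical confusion-matrix counts (55 true negatives, 35 false negatives). These counts are an assumption chosen to be consistent with the reported accuracy, not figures from the card. Precision is set to 0.0 by the common convention for a zero denominator.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1; a ratio with a zero denominator is set to 0.0."""

    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts: every passenger is predicted "not survived".
scores = binary_metrics(tp=0, fp=0, fn=35, tn=55)
print({name: round(value, 4) for name, value in scores.items()})
# {'accuracy': 0.6111, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```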
## How to use
```python
import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel

# NOTE: TitanicClassifier is the custom PreTrainedModel subclass from the
# training code. It is not part of transformers, so define or import it
# (together with its config class) before running this snippet.

REPO = "Asimzaman19/Fine_Tuning_Dataset"

# Load model
model = TitanicClassifier.from_pretrained(REPO)
model.eval()

# Load scaler params
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean = np.array(sp["mean"])
scale = np.array(sp["scale"])

# Prepare a sample (must match FEATURES order)
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = ((raw - mean) / scale).astype(np.float32)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits
pred = logits.argmax(-1).item()
prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"Survived: {bool(pred)} (prob={prob:.2%})")
```
## Dataset
The [Titanic dataset](https://www.kaggle.com/competitions/titanic) contains
records for 891 passengers, including demographics, ticket class, and fare.
The binary survival label is the prediction target.
## Limitations
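- Recall of 0.0 on the test set means the published checkpoint identified no
survivors correctly there, so the 0.6111 accuracy comes entirely from the
non-survivor class.
- Trained on a small historical dataset (891 rows); performance may not
generalise beyond the Titanic domain.
- Features are hand-engineered; a more robust pipeline would use automated
feature selection.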
## License
MIT