---
language: en
license: mit
tags:
- classification
- tabular
- titanic
- survival-prediction
- pytorch
datasets:
- titanic
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Fine_Tuning_Dataset
  results:
  - task:
      type: tabular-classification
    dataset:
      name: Titanic
      type: titanic
    metrics:
    - type: accuracy
      value: 0.6111
    - type: f1
      value: 0.0
    - type: precision
      value: 0.0
    - type: recall
      value: 0.0
---

# 🚢 Titanic Survival Classifier

A lightweight MLP classifier wrapped in the Hugging Face `PreTrainedModel` interface, trained to predict passenger survival on the Titanic dataset.

## Model description

| Component | Detail |
|-----------|--------|
| Architecture | 4-layer MLP with BatchNorm, GELU, Dropout |
| Hidden dim | 128 |
| Input features | 13 engineered tabular features |
| Output | Binary (survived / not survived) |
| Parameters | ~12,578 |

## Training details

| Setting | Value |
|---------|-------|
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Scheduler | Cosine annealing |
| Epochs | 30 |
| Batch size | 32 |
| Train / Val / Test split | 80 / 10 / 10 % |

## Feature engineering

Features used: `Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title`

Key transformations applied:

- **Title extraction** from passenger names (Mr, Mrs, Miss, Master, Rare)
- **Age imputation** using median per title group
- **FamilySize** = SibSp + Parch + 1; **IsAlone** flag
- **HasCabin** binary flag
- **AgeBand** and **FareBand** discretisation
- StandardScaler normalisation (params saved in `scaler_params.json`)

## Test set performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.6111 |
| Precision | 0.0 |
| Recall | 0.0 |
| F1-Score | 0.0 |

## How to use

```python
import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel

REPO = "Asimzaman19/Fine_Tuning_Dataset"

# NOTE: TitanicClassifier (a PreTrainedModel subclass) and its PretrainedConfig
# subclass are not defined in this snippet; define or import them before this call.

# Load model
model = TitanicClassifier.from_pretrained(REPO)
model.eval()

# Load scaler params
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean = np.array(sp["mean"])
scale = np.array(sp["scale"])

# Prepare a sample (must match FEATURES order)
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = ((raw - mean) / scale).astype(np.float32)

with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits
    pred = logits.argmax(-1).item()
    prob = torch.softmax(logits, dim=-1)[0, 1].item()

print(f"Survived: {bool(pred)} (prob={prob:.2%})")
```

## Dataset

The [Titanic dataset](https://www.kaggle.com/competitions/titanic) contains information about 891 passengers, including demographics, ticket class, and fare, with the binary survival label as the target.

## Limitations

- Trained on a small historical dataset (891 rows); performance may not generalise beyond the Titanic domain.
- Features are hand-engineered; a more robust pipeline would use automated feature selection.
- Precision, recall, and F1 of 0.0 mean the model correctly identified no survivors on the test split; the accuracy of 0.6111 is consistent with predicting "not survived" for every passenger.

## License

MIT
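
## Appendix: feature engineering sketch

The transformations listed under "Feature engineering" can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used to train this model. Several details are assumptions the card does not state: the regular expression for titles, the grouping of every other title under `Rare`, the fallback to the overall median when a title group has no known ages, and `IsAlone` meaning `FamilySize == 1`. `AgeBand` and `FareBand` are left out because the bin edges are not given.

```python
import re
from statistics import median

# Assumption: any title other than these four is collapsed to "Rare".
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}
# Assumption: the title is the text between the comma and the first period,
# e.g. "Braund, Mr. Owen Harris" -> "Mr".
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the passenger's title, collapsing uncommon ones to 'Rare'."""
    match = TITLE_RE.search(name)
    title = match.group(1).strip() if match else "Rare"
    return title if title in COMMON_TITLES else "Rare"


def engineer(rows):
    """Return copies of rows with Title, imputed Age, FamilySize, IsAlone, HasCabin."""
    out = [dict(r, Title=extract_title(r["Name"])) for r in rows]

    # Collect known ages per title so missing ages can use the group median.
    ages_by_title = {}
    for r in out:
        if r["Age"] is not None:
            ages_by_title.setdefault(r["Title"], []).append(r["Age"])
    medians = {t: median(a) for t, a in ages_by_title.items()}
    # Assumption: fall back to the overall median for titles with no known age.
    overall = median(a for ages in ages_by_title.values() for a in ages)

    for r in out:
        if r["Age"] is None:
            r["Age"] = medians.get(r["Title"], overall)
        r["FamilySize"] = r["SibSp"] + r["Parch"] + 1
        r["IsAlone"] = int(r["FamilySize"] == 1)  # assumed definition
        r["HasCabin"] = int(bool(r["Cabin"]))
    return out


# Illustrative rows only; the real pipeline reads the full dataset.
sample = [
    {"Name": "Braund, Mr. Owen Harris", "Age": 22.0,
     "SibSp": 1, "Parch": 0, "Cabin": None},
    {"Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Age": 38.0,
     "SibSp": 1, "Parch": 0, "Cabin": "C85"},
    {"Name": "Moran, Mr. James", "Age": None,
     "SibSp": 0, "Parch": 0, "Cabin": None},
]
features = engineer(sample)
for row in features:
    print(row["Title"], row["Age"], row["FamilySize"], row["IsAlone"], row["HasCabin"])
```

The remaining steps (encoding categorical columns as numbers, banding, and scaling with the saved `mean` and `scale`) would follow before the 13 features are passed to the model in the order listed above.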