Add fine-tuned Titanic classifier with model card
- README.md +130 -0
- config.json +11 -0
- feature_names.json +1 -0
- model.safetensors +3 -0
- scaler_params.json +47 -0
- test_metrics.json +6 -0
README.md
ADDED
@@ -0,0 +1,130 @@
---
language: en
license: mit
tags:
- classification
- tabular
- titanic
- survival-prediction
- pytorch
datasets:
- titanic
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Fine_Tuning_Dataset
  results:
  - task:
      type: tabular-classification
    dataset:
      name: Titanic
      type: titanic
    metrics:
    - type: accuracy
      value: 0.6111
    - type: f1
      value: 0.0
    - type: precision
      value: 0.0
    - type: recall
      value: 0.0
---

# 🚢 Titanic Survival Classifier

A lightweight MLP classifier wrapped in the Hugging Face `PreTrainedModel` interface,
trained to predict passenger survival on the Titanic dataset.

## Model description

| Component | Detail |
|-----------|--------|
| Architecture | 4-layer MLP with BatchNorm, GELU, Dropout |
| Hidden dim | 128 |
| Input features | 13 engineered tabular features |
| Output | Binary (survived / not survived) |
| Parameters | ~12,578 |
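The table gives the first hidden width (128) and a parameter count, but not the remaining layer widths. As a rough sanity check, the sketch below counts learnable parameters for one hypothetical layout: 13 → 128 → 64 → 32 → 2, with BatchNorm after the first two linear layers only. The widths 64 and 32 and the BatchNorm placement are assumptions, not taken from the model code; they are chosen because they reproduce the stated total.

```python
# Hypothetical layer layout; only input_dim=13, hidden_dim=128 and the
# 2-class output come from the model card. Widths 64 and 32 are assumed.
LAYER_DIMS = [13, 128, 64, 32, 2]
BATCHNORM_AFTER = [128, 64]  # assumed: BatchNorm after the first two linear layers


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale (gamma) and shift (beta) of a BatchNorm layer."""
    return 2 * n_features


def count_params(dims, bn_widths) -> int:
    """Sum the parameters of all linear layers and all BatchNorm layers."""
    total = sum(linear_params(a, b) for a, b in zip(dims, dims[1:]))
    total += sum(batchnorm_params(w) for w in bn_widths)
    return total


total_params = count_params(LAYER_DIMS, BATCHNORM_AFTER)
print(total_params)
```

GELU and Dropout add no learnable parameters, so they do not appear in the count. Other layouts may give the same total, so this does not confirm the real architecture.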

## Training details

| Setting | Value |
|---------|-------|
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Scheduler | Cosine annealing |
| Epochs | 30 |
| Batch size | 32 |
| Train / Val / Test split | 80 / 10 / 10 % |
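For reference, cosine annealing lowers the learning rate from its initial value along half a cosine wave. The sketch below evaluates the closed-form schedule, assuming the minimum learning rate is 0 and the period equals the 30 training epochs. Neither value is stated in the table, and the function name is illustrative.

```python
import math

BASE_LR = 0.001   # from the training table
EPOCHS = 30       # from the training table
ETA_MIN = 0.0     # assumption: the minimum learning rate is not stated


def cosine_annealing_lr(epoch: int, base_lr: float = BASE_LR,
                        t_max: int = EPOCHS, eta_min: float = ETA_MIN) -> float:
    """Closed form: eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / t_max)) / 2."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


# Learning rate at the start of each epoch, plus the final value.
schedule = [cosine_annealing_lr(e) for e in range(EPOCHS + 1)]
for epoch in (0, 15, 30):
    print(epoch, f"{schedule[epoch]:.6f}")
```

Under these assumptions the rate starts at 0.001, passes through half that value at the midpoint, and reaches 0 at the end of training.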

## Feature engineering

Features used: `Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, HasCabin, FamilySize, IsAlone, AgeBand, FareBand, Title`

Key transformations applied:
- **Title extraction** from passenger names (Mr, Mrs, Miss, Master, Rare)
- **Age imputation** using median per title group
- **FamilySize** = SibSp + Parch + 1; **IsAlone** flag
- **HasCabin** binary flag
- **AgeBand** and **FareBand** discretisation
- StandardScaler normalisation (params saved in `scaler_params.json`)
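A minimal sketch of some of the transformations listed above, written in plain Python for a single passenger record. The regular expression, the grouping of every uncommon title into `Rare`, and the function names are assumptions for illustration; the card does not show the actual preprocessing code, the integer encodings, or the band edges.

```python
import re

COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name: str) -> str:
    """Pull the honorific between the comma and the first period of a name."""
    match = re.search(r",\s*([^.]+)\.", name)
    title = match.group(1).strip() if match else ""
    # Assumption: every title outside the four common ones is grouped as "Rare".
    return title if title in COMMON_TITLES else "Rare"


def engineer(row: dict) -> dict:
    """Add Title, FamilySize, IsAlone and HasCabin to one passenger record."""
    family_size = row["SibSp"] + row["Parch"] + 1
    return {
        **row,
        "Title": extract_title(row["Name"]),
        "FamilySize": family_size,
        "IsAlone": int(family_size == 1),
        # A missing cabin is assumed to be None or an empty string here.
        "HasCabin": int(bool(row.get("Cabin"))),
    }


passenger = {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0, "Cabin": None}
features = engineer(passenger)
print(features["Title"], features["FamilySize"], features["IsAlone"], features["HasCabin"])
```

Age imputation, banding and scaling are left out because they depend on statistics computed over the whole training split.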

## Test set performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.6111 |
| Precision | 0.0 |
| Recall | 0.0 |
| F1-Score | 0.0 |
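A recall and F1 of exactly 0.0 mean the model produced no true positives on the test set. One reading consistent with the reported numbers is that every test passenger was predicted as not survived, in which case accuracy equals the share of non-survivors. The sketch below reproduces the table under that reading, assuming a hypothetical 90-row test split with 55 non-survivors (55 / 90 ≈ 0.6111). The actual split sizes and predictions are not given, so both are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Ratios with a zero denominator are reported as 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test split: 55 non-survivors (0) and 35 survivors (1).
y_true = [0] * 55 + [1] * 35
# Assumed behaviour: the classifier predicts "not survived" for everyone.
y_pred = [0] * len(y_true)

metrics = {k: round(v, 4) for k, v in binary_metrics(y_true, y_pred).items()}
print(metrics)
```

If this reading holds, the accuracy figure reflects the class balance of the test split rather than any learned signal.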

## How to use

```python
import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import PretrainedConfig, PreTrainedModel  # base classes for TitanicClassifier

REPO = "Asimzaman19/Fine_Tuning_Dataset"

# Load model.
# NOTE: `TitanicClassifier` is not defined in this snippet. Define or import the
# `PreTrainedModel` subclass (and its config class) used in training first.
model = TitanicClassifier.from_pretrained(REPO)
model.eval()

# Load scaler params
params_path = hf_hub_download(REPO, "scaler_params.json")
with open(params_path) as f:
    sp = json.load(f)
mean = np.array(sp["mean"])
scale = np.array(sp["scale"])

# Prepare a sample (column order must match feature_names.json)
raw = np.array([[3, 1, 22, 1, 0, 7.25, 0, 0, 2, 0, 1, 0, 0]], dtype=np.float32)
scaled = ((raw - mean) / scale).astype(np.float32)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(torch.tensor(scaled)).logits
pred = logits.argmax(-1).item()
prob = torch.softmax(logits, dim=-1)[0, 1].item()  # probability of class 1 (survived)

print(f"Survived: {bool(pred)} (prob={prob:.2%})")
```

## Dataset

The [Titanic dataset](https://www.kaggle.com/competitions/titanic) contains
information about 891 passengers including demographics, ticket class, and
fare, with the binary survival label as the target.

## Limitations

- Trained on a small historical dataset (891 rows); performance may not
  generalise beyond the Titanic domain.
- Features are hand-engineered; a more robust pipeline would use automated
  feature selection.

## License

MIT
config.json
ADDED
@@ -0,0 +1,11 @@
{
  "architectures": [
    "TitanicClassifier"
  ],
  "dropout": 0.3,
  "dtype": "float32",
  "hidden_dim": 128,
  "input_dim": 13,
  "model_type": "titanic_mlp",
  "transformers_version": "5.5.1"
}
feature_names.json
ADDED
@@ -0,0 +1 @@
["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "HasCabin", "FamilySize", "IsAlone", "AgeBand", "FareBand", "Title"]
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c76448052b452a154f85d63b3d33839ba4cd0b4e132956706adde0cc45550447
size 53232
scaler_params.json
ADDED
@@ -0,0 +1,47 @@
{
  "mean": [
    2.3047752808988764,
    0.3553370786516854,
    29.256558988745628,
    0.526685393258427,
    0.40308988764044945,
    32.95399021968413,
    0.376056338028169,
    0.23174157303370788,
    1.9297752808988764,
    0.601123595505618,
    2.0168539325842696,
    1.5112359550561798,
    0.75
  ],
  "scale": [
    0.8370491944896091,
    0.4786153353027576,
    13.464659259524808,
    1.1202258108590086,
    0.8238996767285812,
    49.28614526814358,
    0.6419523014908564,
    0.4219448025056973,
    1.635494961693865,
    0.48966725276662787,
    0.8609814043661498,
    1.1267371808289228,
    1.0436404520994895
  ],
  "feature_names": [
    "Pclass",
    "Sex",
    "Age",
    "SibSp",
    "Parch",
    "Fare",
    "Embarked",
    "HasCabin",
    "FamilySize",
    "IsAlone",
    "AgeBand",
    "FareBand",
    "Title"
  ]
}
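The `mean` and `scale` arrays are applied per feature as `(x - mean) / scale`, in the order given by `feature_names`. The sketch below shows this for the first three features, with values copied from the file above. The helper names are illustrative, and the sample values (Pclass 3, Sex 1, Age 22) are taken from the README usage example.

```python
# First three entries of "mean" and "scale" from scaler_params.json.
FEATURES = ["Pclass", "Sex", "Age"]
MEAN = [2.3047752808988764, 0.3553370786516854, 29.256558988745628]
SCALE = [0.8370491944896091, 0.4786153353027576, 13.464659259524808]


def standardize(values, mean, scale):
    """Forward transform: z = (x - mean) / scale, element by element."""
    return [(x - m) / s for x, m, s in zip(values, mean, scale)]


def unstandardize(z_values, mean, scale):
    """Inverse transform: x = z * scale + mean."""
    return [z * s + m for z, m, s in zip(z_values, mean, scale)]


raw = [3.0, 1.0, 22.0]
z = standardize(raw, MEAN, SCALE)
restored = unstandardize(z, MEAN, SCALE)
for name, value in zip(FEATURES, z):
    print(f"{name}: {value:+.3f}")
```

A positive value means the input lies above the training mean for that feature, and a negative value means it lies below.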
test_metrics.json
ADDED
@@ -0,0 +1,6 @@
{
  "accuracy": 0.6111,
  "precision": 0.0,
  "recall": 0.0,
  "f1": 0.0
}