# KeepUp-multilingiual-ensemble-model

This model was trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5673
- Accuracy: 0.7864
## Model description
This is a custom PyTorch ensemble model that combines CNN and BiGRU with attention to classify text using both embeddings and numerical features.
## Model Overview
- Model Name: EnsembleSmallData
- Framework: PyTorch
- Purpose: classification on small datasets with mixed data types (e.g., token embeddings + numeric features)
## Architecture
- CNN branch for local sequence patterns
- BiGRU + attention for temporal context
- Fully connected layers for numeric inputs
- Final fusion + classifier
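The four components above can be sketched as a single `nn.Module`. This is a minimal illustration of the architecture, not the released implementation: all layer sizes, the pooling choices, and the dropout rate are assumptions (the real values live in `config.json`).

```python
import torch
import torch.nn as nn

class EnsembleSmallData(nn.Module):
    """Sketch: CNN branch + BiGRU-with-attention branch over embeddings,
    an MLP over numeric features, then fusion and a classifier head.
    Layer sizes are illustrative, not the released configuration."""

    def __init__(self, embed_dim=64, num_features=8, hidden=32, num_classes=2):
        super().__init__()
        # CNN branch: local sequence patterns
        self.conv = nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1)
        # BiGRU branch: temporal context in both directions
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        # Additive attention over the BiGRU outputs
        self.attn = nn.Linear(2 * hidden, 1)
        # Fully connected branch for numeric inputs (with BatchNorm, as listed below)
        self.num_fc = nn.Sequential(
            nn.Linear(num_features, hidden), nn.BatchNorm1d(hidden), nn.ReLU()
        )
        # Fusion of the three branch outputs + classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.3), nn.Linear(hidden + 2 * hidden + hidden, num_classes)
        )

    def forward(self, embed_x, numeric_x):
        # Conv1d expects (batch, channels, seq_len); max-pool over time
        cnn_out = torch.relu(self.conv(embed_x.transpose(1, 2))).max(dim=2).values
        gru_out, _ = self.gru(embed_x)                       # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(gru_out), dim=1)   # (B, T, 1)
        gru_ctx = (weights * gru_out).sum(dim=1)             # (B, 2*hidden)
        num_out = self.num_fc(numeric_x)                     # (B, hidden)
        fused = torch.cat([cnn_out, gru_ctx, num_out], dim=1)
        return self.classifier(fused)  # raw logits
```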
## Inputs and Outputs
- `embed_x`: tensor of shape `(batch_size, seq_len, embed_dim)`
- `numeric_x`: tensor of shape `(batch_size, num_features)`
- Output: logits (apply `argmax` to get the predicted class)
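The input/output contract above can be exercised with dummy tensors. All shapes here are illustrative assumptions, and random logits stand in for a real forward pass, since loading the actual model requires its weights:

```python
import torch

# Illustrative shapes; the real values come from the model's config.json
batch_size, seq_len, embed_dim, num_features, num_classes = 4, 16, 64, 8, 2

embed_x = torch.randn(batch_size, seq_len, embed_dim)
numeric_x = torch.randn(batch_size, num_features)

# The model maps (embed_x, numeric_x) -> logits of shape (batch_size, num_classes).
# Random logits stand in here for the model's output.
logits = torch.randn(batch_size, num_classes)
preds = logits.argmax(dim=1)  # predicted class index per example
```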
## Configuration
The full model configuration is stored in `config.json`.
## Model Details
- CNN + BiGRU (w/ attention)
- Embedding input + Numeric features
- BatchNorm, Dropout, Fully Connected layers
## Intended Uses & Limitations
### Intended Uses
This model is best suited for:
- **Small-scale classification tasks** with limited data.
- Problems involving **both sequential (embedded)** and **numerical/tabular features**.
- **Binary or multi-class classification**, especially where traditional models struggle due to overfitting or underfitting.
- Scenarios where combining text-like sequences (e.g., tokens, timeseries) and structured features improves performance.
Example domains:
- Healthcare (e.g., patient text + vitals)
- Financial fraud (e.g., transaction embeddings + metadata)
- Product or customer classification
---
### Limitations
- **Limited generalization** to entirely new domains unless retrained.
- Assumes **balanced or moderately imbalanced data**. For high imbalance, add appropriate weighting or sampling techniques.
- Does **not include interpretability** or explainability modules out-of-the-box.
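For the imbalance point above, one common weighting technique is a class-weighted cross-entropy loss. This is a generic sketch, not part of the released training code; the label counts are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical label distribution: class 0 is four times as common as class 1
labels = torch.tensor([0, 0, 0, 0, 1])
counts = torch.bincount(labels, minlength=2).float()

# Inverse-frequency class weights, normalized to average 1
weights = counts.sum() / (len(counts) * counts)

# Errors on the rare class now contribute more to the loss
criterion = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(5, 2)
loss = criterion(logits, labels)
```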
---
> For optimal performance, consider tuning dropout rates, hidden sizes, or the optimizer for your dataset.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
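The optimizer and scheduler settings above can be reproduced in plain PyTorch. A minimal sketch, assuming a stand-in model and an assumed steps-per-epoch count (the real value depends on the dataset size):

```python
import torch

torch.manual_seed(42)  # seed from the hyperparameters above
model = torch.nn.Linear(8, 2)  # stand-in; the real model is EnsembleSmallData

# AdamW with the listed learning rate, betas, and epsilon
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8
)

# Linear decay of the learning rate to zero over all training steps
# (steps_per_epoch = 100 is an assumption, not from the card)
num_epochs, steps_per_epoch = 10, 100
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0,
    total_iters=num_epochs * steps_per_epoch,
)
```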
### Training results
| Training Loss | Epoch | Validation Loss | Accuracy |
|:-------------:|:-----:|:---------------:|:--------:|
| 0.7371        | 1.0   | 0.6684          | 0.5922   |
| 0.67          | 2.0   | 0.6313          | 0.6214   |
| 0.6933        | 3.0   | 0.6107          | 0.6596   |
| 0.6632        | 4.0   | 0.5903          | 0.6990   |
| 0.6478        | 5.0   | 0.5673          | 0.7164   |
| 0.6271        | 6.0   | 0.5584          | 0.7222   |
| 0.6211        | 7.0   | 0.5313          | 0.7514   |
| 0.6100        | 8.0   | 0.5207          | 0.7696   |
| 0.5901        | 9.0   | 0.4803          | 0.7890   |
| 0.5778        | 10.0  | 0.4573          | 0.7891   |
### Framework versions
- Transformers 4.54.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.2