# KeepUp-multilingiual-ensemble-model

This model was trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5673
- Accuracy: 0.7864
## Model description
This is a custom PyTorch ensemble model that combines CNN and BiGRU with attention to classify text using both embeddings and numerical features.
## Model Overview
- Model Name: EnsembleSmallData
- Framework: PyTorch
- Purpose: classification on small datasets with mixed data types (e.g., token embeddings + numeric features)
## Architecture
- CNN branch for local sequence patterns
- BiGRU + attention for temporal context
- Fully connected layers for numeric inputs
- Final fusion + classifier
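The four components above can be sketched as a single `nn.Module`. This is a minimal illustration of the architecture, not the released implementation: all layer sizes, the pooling choices, and the dropout rate are assumptions (the real values live in `config.json`).

```python
import torch
import torch.nn as nn

class EnsembleSmallData(nn.Module):
    """Sketch: CNN branch + BiGRU-with-attention branch over embeddings,
    an MLP over numeric features, then fusion and a classifier head.
    Layer sizes are illustrative, not the released configuration."""

    def __init__(self, embed_dim=64, num_features=8, hidden=32, num_classes=2):
        super().__init__()
        # CNN branch: local sequence patterns
        self.conv = nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1)
        # BiGRU branch: temporal context in both directions
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        # Additive attention over the BiGRU outputs
        self.attn = nn.Linear(2 * hidden, 1)
        # Fully connected branch for numeric inputs (with BatchNorm, as listed below)
        self.num_fc = nn.Sequential(
            nn.Linear(num_features, hidden), nn.BatchNorm1d(hidden), nn.ReLU()
        )
        # Fusion of the three branch outputs + classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.3), nn.Linear(hidden + 2 * hidden + hidden, num_classes)
        )

    def forward(self, embed_x, numeric_x):
        # Conv1d expects (batch, channels, seq_len); max-pool over time
        cnn_out = torch.relu(self.conv(embed_x.transpose(1, 2))).max(dim=2).values
        gru_out, _ = self.gru(embed_x)                       # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(gru_out), dim=1)   # (B, T, 1)
        gru_ctx = (weights * gru_out).sum(dim=1)             # (B, 2*hidden)
        num_out = self.num_fc(numeric_x)                     # (B, hidden)
        fused = torch.cat([cnn_out, gru_ctx, num_out], dim=1)
        return self.classifier(fused)  # raw logits
```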
## Inputs and Outputs
- `embed_x`: tensor of shape `(batch_size, seq_len, embed_dim)`
- `numeric_x`: tensor of shape `(batch_size, num_features)`
- Output: logits (apply `argmax` to get the predicted class)
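The input/output contract above can be exercised with dummy tensors. All shapes here are illustrative assumptions, and random logits stand in for a real forward pass, since loading the actual model requires its weights:

```python
import torch

# Illustrative shapes; the real values come from the model's config.json
batch_size, seq_len, embed_dim, num_features, num_classes = 4, 16, 64, 8, 2

embed_x = torch.randn(batch_size, seq_len, embed_dim)
numeric_x = torch.randn(batch_size, num_features)

# The model maps (embed_x, numeric_x) -> logits of shape (batch_size, num_classes).
# Random logits stand in here for the model's output.
logits = torch.randn(batch_size, num_classes)
preds = logits.argmax(dim=1)  # predicted class index per example
```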
## Configuration
The full model configuration is stored in `config.json`.
## Model Details
- CNN + BiGRU (w/ attention)
- Embedding input + Numeric features
- BatchNorm, Dropout, Fully Connected layers
## Intended Uses & Limitations
### Intended Uses
This model is best suited for:
- **Small-scale classification tasks** with limited data.
- Problems involving **both sequential (embedded)** and **numerical/tabular features**.
- **Binary or multi-class classification**, especially where traditional models struggle due to overfitting or underfitting.
- Scenarios where combining text-like sequences (e.g., tokens, timeseries) and structured features improves performance.
Example domains:
- Healthcare (e.g., patient text + vitals)
- Financial fraud (e.g., transaction embeddings + metadata)
- Product or customer classification
---
### Limitations
- **Limited generalization** to entirely new domains unless retrained.
- Assumes **balanced or moderately imbalanced data**. For high imbalance, add appropriate weighting or sampling techniques.
- Does **not include interpretability** or explainability modules out-of-the-box.
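For the imbalance point above, one common weighting technique is a class-weighted cross-entropy loss. This is a generic sketch, not part of the released training code; the label counts are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical label distribution: class 0 is four times as common as class 1
labels = torch.tensor([0, 0, 0, 0, 1])
counts = torch.bincount(labels, minlength=2).float()

# Inverse-frequency class weights, normalized to average 1
weights = counts.sum() / (len(counts) * counts)

# Errors on the rare class now contribute more to the loss
criterion = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(5, 2)
loss = criterion(logits, labels)
```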
---
> For optimal performance, consider tuning dropout rates, hidden sizes, or the optimizer for your dataset.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
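The optimizer and scheduler settings above can be reproduced in plain PyTorch. A minimal sketch, assuming a stand-in model and an assumed steps-per-epoch count (the real value depends on the dataset size):

```python
import torch

torch.manual_seed(42)  # seed from the hyperparameters above
model = torch.nn.Linear(8, 2)  # stand-in; the real model is EnsembleSmallData

# AdamW with the listed learning rate, betas, and epsilon
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8
)

# Linear decay of the learning rate to zero over all training steps
# (steps_per_epoch = 100 is an assumption, not from the card)
num_epochs, steps_per_epoch = 10, 100
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0,
    total_iters=num_epochs * steps_per_epoch,
)
```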
### Training results
| Training Loss | Epoch | Validation Loss | Accuracy |
|:-------------:|:-----:|:---------------:|:--------:|
| 0.7371        | 1.0   | 0.6684          | 0.5922   |
| 0.67          | 2.0   | 0.6313          | 0.6214   |
| 0.6933        | 3.0   | 0.6107          | 0.6596   |
| 0.6632        | 4.0   | 0.5903          | 0.6990   |
| 0.6478        | 5.0   | 0.5673          | 0.7164   |
| 0.6271        | 6.0   | 0.5584          | 0.7222   |
| 0.6211        | 7.0   | 0.5313          | 0.7514   |
| 0.6100        | 8.0   | 0.5207          | 0.7696   |
| 0.5901        | 9.0   | 0.4803          | 0.7890   |
| 0.5778        | 10.0  | 0.4573          | 0.7891   |
### Framework versions
- Transformers 4.54.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.2