# Model Card

## Model Description
This model is an AutoML tabular classification model trained using AutoGluon on a classmate's dataset hosted on Hugging Face. The task is to predict the Genre of a book based on its physical dimensions and page count.
## Data

- Dataset: Zion's Book tabular dataset from Hugging Face (`its-zion-18/Books-tabular-dataset`)
- Splits: The dataset has `original` and `augmented` splits. The model was trained on the `augmented` split (240 samples) and evaluated on the `original` split (30 samples).
- Features: `Height`, `Width`, `Depth`, `Page Count`
- Target: `Genre` (a categorical variable with 5 classes)
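The schema above can be illustrated with a small hand-built frame. This is only a sketch: the column names come from this card, but every value below is invented for illustration; the real data lives in the `its-zion-18/Books-tabular-dataset` repo.

```python
import pandas as pd

# Invented rows mimicking the dataset schema described above.
df = pd.DataFrame({
    "Height": [19.8, 23.4, 17.1],       # physical dimensions
    "Width": [12.9, 15.6, 10.8],
    "Depth": [2.1, 3.4, 1.2],
    "Page Count": [312, 540, 180],
    "Genre": ["Fantasy", "Science", "Poetry"],  # 5 classes in the real data
})

# Feature/target split as described in this card.
features = ["Height", "Width", "Depth", "Page Count"]
target = "Genre"
```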
## Training

- Framework: AutoGluon Tabular
- Training Data: `df_synth_train` (augmented split, 80% used for training)
- Time Limit: 300 seconds
- Presets: `best_quality`
- Evaluation Metric (during training): accuracy
- AutoML Search: AutoGluon searched over various models and hyperparameters, including tree-based models (LightGBM, XGBoost, ExtraTrees, RandomForest) and neural networks (NeuralNetTorch, NeuralNetFastAI). Stacked ensembling was also used.
- Best Model: The best model identified by AutoGluon was `WeightedEnsemble_L2`.
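A minimal sketch of the training call, assuming the AutoGluon `TabularPredictor` API and the variable name used in this card (`df_synth_train` is assumed to already hold the 80% training portion of the augmented split; the exact keyword arguments here reflect the settings listed above, everything else follows AutoGluon defaults):

```python
from autogluon.tabular import TabularPredictor  # pip install autogluon.tabular

# Sketch of the fit call described above; not the verbatim notebook code.
predictor = TabularPredictor(label="Genre", eval_metric="accuracy").fit(
    train_data=df_synth_train,   # augmented split, 80% for training
    time_limit=300,              # seconds, as stated above
    presets="best_quality",      # enables bagging and stacked ensembling
)

# Per-model scores; WeightedEnsemble_L2 ranked first in this run.
predictor.leaderboard()
```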
## Evaluation

- Evaluation Data: `df_orig` (original split)
- Metrics:
  - Accuracy: 1.0000
  - Weighted F1 Score: 1.0000
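The two reported metrics can be computed with scikit-learn. This is a sketch: the label lists below are invented placeholders standing in for `df_orig["Genre"]` and the predictor's outputs on the original split.

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels standing in for the original-split ground truth
# and the model's predictions (a perfect match, as reported above).
y_true = ["Fantasy", "Science", "Poetry", "Fantasy", "History"]
y_pred = ["Fantasy", "Science", "Poetry", "Fantasy", "History"]

acc = accuracy_score(y_true, y_pred)                    # 1.0 on a perfect match
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # 1.0 here as well
```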
## Hyperparameters and Search Space

AutoGluon's `best_quality` preset explores a wide range of model types with their default hyperparameters, along with some tuning. The leaderboard displays the performance of the individual models and the final ensemble.
## Limitations and Ethical Considerations
- The dataset is small, especially the original split used for final evaluation. The high accuracy on the original split might not generalize to a larger, more diverse dataset.
- The model was trained on synthetic data which may not fully capture the nuances of real-world book dimensions and genres.
- The model's performance is highly dependent on the quality and representativeness of the synthetic data.
## License

MIT
## Hardware and Compute

- The model was trained in a Google Colab environment; the exact CPU and RAM details are those of the notebook runtime.
- Training was subject to a time limit of 300 seconds.
## AI Usage Disclosure
This model was developed with the assistance of an AI agent to generate and execute code for data loading, preprocessing, model training using AutoGluon, and model evaluation.