# Model Card

## Model Description
This model is an AutoML tabular classification model trained using AutoGluon on a classmate's dataset hosted on Hugging Face. The task is to predict the Genre of a book based on its physical dimensions and page count.
## Data

- Dataset: Zion's Book tabular dataset from Hugging Face (`its-zion-18/Books-tabular-dataset`)
- Splits: The dataset has `original` and `augmented` splits. The model was trained on the `augmented` split (240 samples) and evaluated on the `original` split (30 samples).
- Features: `Height`, `Width`, `Depth`, `Page Count`
- Target: `Genre` (a categorical variable with 5 classes)
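The schema above can be illustrated with a small hand-built frame. This is only a sketch: the column names come from this card, but every value below is invented for illustration; the real data lives in the `its-zion-18/Books-tabular-dataset` repo.

```python
import pandas as pd

# Invented rows mimicking the dataset schema described above.
df = pd.DataFrame({
    "Height": [19.8, 23.4, 17.1],       # physical dimensions
    "Width": [12.9, 15.6, 10.8],
    "Depth": [2.1, 3.4, 1.2],
    "Page Count": [312, 540, 180],
    "Genre": ["Fantasy", "Science", "Poetry"],  # 5 classes in the real data
})

# Feature/target split as described in this card.
features = ["Height", "Width", "Depth", "Page Count"]
target = "Genre"
```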
## Training

- Framework: AutoGluon Tabular
- Training Data: `df_synth_train` (augmented split, 80% used for training)
- Time Limit: 300 seconds
- Presets: `best_quality`
- Evaluation Metric (during training): accuracy
- AutoML Search: AutoGluon searched over various models and hyperparameters, including tree-based models (LightGBM, XGBoost, ExtraTrees, RandomForest) and neural networks (NeuralNetTorch, NeuralNetFastAI). Stacked ensembling was also used.
- Best Model: The best model identified by AutoGluon was `WeightedEnsemble_L2`.
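A minimal sketch of the training call, assuming the AutoGluon `TabularPredictor` API and the variable name used in this card (`df_synth_train` is assumed to already hold the 80% training portion of the augmented split; the exact keyword arguments here reflect the settings listed above, everything else follows AutoGluon defaults):

```python
from autogluon.tabular import TabularPredictor  # pip install autogluon.tabular

# Sketch of the fit call described above; not the verbatim notebook code.
predictor = TabularPredictor(label="Genre", eval_metric="accuracy").fit(
    train_data=df_synth_train,   # augmented split, 80% for training
    time_limit=300,              # seconds, as stated above
    presets="best_quality",      # enables bagging and stacked ensembling
)

# Per-model scores; WeightedEnsemble_L2 ranked first in this run.
predictor.leaderboard()
```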
## Evaluation

- Evaluation Data: `df_orig` (original split)
- Metrics:
  - Accuracy: 1.0000
  - Weighted F1 Score: 1.0000
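The two reported metrics can be computed with scikit-learn. This is a sketch: the label lists below are invented placeholders standing in for `df_orig["Genre"]` and the predictor's outputs on the original split.

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels standing in for the original-split ground truth
# and the model's predictions (a perfect match, as reported above).
y_true = ["Fantasy", "Science", "Poetry", "Fantasy", "History"]
y_pred = ["Fantasy", "Science", "Poetry", "Fantasy", "History"]

acc = accuracy_score(y_true, y_pred)                    # 1.0 on a perfect match
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # 1.0 here as well
```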
## Hyperparameters and Search Space

AutoGluon's `best_quality` preset explores a wide range of model types with their default hyperparameters, along with some tuning. The leaderboard displays the performance of the individual models and the final ensemble.
## Limitations and Ethical Considerations
- The dataset is small, especially the original split used for final evaluation. The high accuracy on the original split might not generalize to a larger, more diverse dataset.
- The model was trained on synthetic data which may not fully capture the nuances of real-world book dimensions and genres.
- The model's performance is highly dependent on the quality and representativeness of the synthetic data.
## License

MIT
## Hardware and Compute

- The model was trained in a Google Colab environment; the exact CPU and RAM details are those of the notebook runtime.
- Training was subject to a time limit of 300 seconds.
## AI Usage Disclosure
This model was developed with the assistance of an AI agent to generate and execute code for data loading, preprocessing, model training using AutoGluon, and model evaluation.