# Model Card for {{ model_id | default("Model ID", true) }}
This is a fine-tuned version of the RandomForestEntr_BAG_L1 model for classification. It was fine-tuned on EricCRX/books-tabular-dataset, a dataset of physical measurements of books, and is used here for binary classification of books as softcover or hardcover.
## Model Details

### Model Description
This model is the RandomForestEntr_BAG_L1 model produced by AutoGluon, trained with accuracy as the primary metric; multiclass accuracy and cross-entropy were also tracked. Note that the `_BAG_L1` suffix refers to a bagged ensemble at stack level 1 of AutoGluon's pipeline, not to L1 regularization.
- Developed by: Devin DeCosmo
- Model type: Binary Classifier
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: RandomForestEntr_BAG_L1
## Uses

This model classifies books as softcover or hardcover based on their physical measurements.
### Out-of-Scope Use

This model has only been validated on the book measurements it was trained on. With an expanded dataset, it could be retrained to classify other types of books or handle a wider range of data.
## Bias, Risks, and Limitations

This model is trained on a small dataset of 30 original books plus 300 augmented rows. Such a limited training set leaves the model liable to overfitting, and additional data is required to make it more robust.
### Recommendations

Because of the small dataset, this model should not be expected to generalize well; validate it on new measurements before relying on its predictions.
## How to Get Started with the Model

Use the code below to get started with the model. This code is from the 24-679 lecture on tabular datasets.
```python
import shutil
import zipfile

import autogluon.tabular
import huggingface_hub
import pandas

# MODEL_REPO_ID, download_dir, df_synth_test, and TARGET_COL are defined
# earlier in the lecture notebook this snippet comes from.

# Download the zipped native predictor directory
zip_local_path = huggingface_hub.hf_hub_download(
    repo_id=MODEL_REPO_ID,
    repo_type="model",
    filename="autogluon_predictor_dir.zip",
    local_dir=str(download_dir),
    local_dir_use_symlinks=False,
)

# Unzip to a folder
native_dir = download_dir / "predictor_dir"
if native_dir.exists():
    shutil.rmtree(native_dir)
native_dir.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(zip_local_path, "r") as zf:
    zf.extractall(str(native_dir))

# Load the native predictor
predictor_native = autogluon.tabular.TabularPredictor.load(str(native_dir))

# Inference on the synthetic test split
X_test = df_synth_test.drop(columns=[TARGET_COL])
y_true = df_synth_test[TARGET_COL].reset_index(drop=True)
y_pred = predictor_native.predict(X_test).reset_index(drop=True)

# Combine results
results = pandas.DataFrame({"y_true": y_true, "y_pred": y_pred})
display(results)
```
## Training Details

### Training Data
EricCRX/books-tabular-dataset
This is the training dataset. It consists of 30 original measurements, used for validation, along with 300 synthetic rows used for training.
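The augmentation procedure behind the 300 synthetic rows is not documented in this card. As an illustration only, the sketch below shows one common way 30 original rows could be expanded tenfold by jittering numeric measurements; the column names and noise scale are assumptions, not the dataset's actual schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 30 original book measurements (hypothetical column names for illustration)
originals = pd.DataFrame({
    "height_cm": rng.uniform(15.0, 30.0, 30),
    "width_cm": rng.uniform(10.0, 22.0, 30),
    "thickness_cm": rng.uniform(0.5, 6.0, 30),
})

# Make 10 jittered copies of each original row -> 300 synthetic rows
scale = originals.std().to_numpy()
copies = [
    originals + rng.normal(0.0, 0.02, originals.shape) * scale
    for _ in range(10)
]
synthetic = pd.concat(copies, ignore_index=True)

print(len(originals), len(synthetic))  # 30 300
```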
### Training Procedure

This model was trained with AutoGluon's AutoML process using accuracy as the main metric. A time_limit of 300 seconds was set to bound training time, and the "best_quality" preset was used to improve results.
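The training setup described above can be sketched as an AutoGluon configuration. This is a hedged reconstruction, not the exact training script: `df_train` and the `cover_type` label column are assumptions based on this card.

```python
# Sketch of the AutoGluon training call described above; assumes autogluon is
# installed and df_train holds the 300 synthetic training rows.
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label="cover_type",      # hypothetical target column: 1 = hardcover, 0 = softcover
    eval_metric="accuracy",  # accuracy as the main metric
).fit(
    df_train,                # synthetic training rows
    time_limit=300,          # cap training at 300 seconds
    presets="best_quality",  # preset used to improve results
)
```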
#### Training Hyperparameters
- Training regime: {{ training_regime | default("[More Information Needed]", true)}}
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
maryzhang/hw1-24679-image-dataset: the testing data was the 'original' split, the 30 original samples in this set.
#### Factors

Evaluation considers whether each book is a hardcover (label 1) or a softcover (label 0).
#### Metrics

The test metric was accuracy, to maximize correct classifications. Training time was also considered to ensure the final models were not computationally infeasible.
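Given the `results` DataFrame built in the quick-start code (with `y_true` and `y_pred` columns), accuracy is simply the fraction of matching labels. A minimal sketch with made-up labels, not real model outputs:

```python
import pandas as pd

# Hypothetical labels for illustration (1 = hardcover, 0 = softcover)
results = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    "y_pred": [1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
})

# Accuracy = fraction of rows where prediction matches the true label
accuracy = (results["y_true"] == results["y_pred"]).mean()
print(f"accuracy = {accuracy:.2f}")  # 9 of 10 correct -> 0.90
```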
### Results

After training on the initial dataset, this model reached 97% accuracy in validation, with an individual prediction time of 0.12 seconds, making it both fast and accurate.

This validation score should not be taken as a measure of robustness. Given the small dataset, performance on outside measurements cannot be confirmed; expanding the dataset could reveal issues with, or improvements to, this model.
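Per-prediction latency figures like the 0.12 s reported above are typically obtained by timing repeated single-row calls. A sketch using a stand-in function (`predict_one` and its argument are placeholders, not the model's API):

```python
import time

def predict_one(row):
    # Stand-in for a single-row call to the real predictor
    return 1

# Average over repeated calls to smooth out timer noise
n = 100
start = time.perf_counter()
for _ in range(n):
    predict_one({"height_cm": 21.0})
per_prediction = (time.perf_counter() - start) / n
print(f"mean per-prediction time: {per_prediction:.6f} s")
```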
#### Summary

This model reached high accuracy on the current dataset, but that performance cannot be assumed to hold more broadly because the dataset was very small.