useful_charts_table_text_images_vs_useless_images_classifier

This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the codewithaman/useful_charts_table_text_images_vs_useless_images dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0851
  • Accuracy: 0.9853

To use the model

from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Select the device to run the model on
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Inference on device: {device}")

# Load the image processor and model
# (ViTImageProcessor replaces the deprecated ViTFeatureExtractor)
model_name_or_path = 'codewithaman/useful_charts_table_text_images_vs_useless_images_classifier'
feature_extractor = ViTImageProcessor.from_pretrained(model_name_or_path)
model = ViTForImageClassification.from_pretrained(model_name_or_path).to(device)

# Load a local image as RGB at the model's 224x224 input size
def load_image_from_path(image_path):
    image = Image.open(image_path)
    return image.convert("RGB").resize((224, 224))

# Define the inference function
def classify_image(image):
    # Convert the image into normalized tensors and move them to the device
    inputs = feature_extractor(images=image, return_tensors="pt").to(device)
    
    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Extract the predicted label
    predicted_class = outputs.logits.argmax(-1).item()
    label = model.config.id2label[predicted_class]
    return label

# Example usage
image_path = "path/to/your/image.jpg"  # Replace with your local image path
image = load_image_from_path(image_path)
predicted_label = classify_image(image)

print(f"Predicted label: {predicted_label}")
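
The function above returns only the top label. When a confidence score is also needed, the logits can be passed through a softmax. The following is a minimal sketch of that arithmetic in plain Python; the helper name softmax and the example logits are illustrative assumptions, not real model output. With the actual model, outputs.logits.softmax(-1) performs the same computation on the tensor.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two classes (not real model output)
example_logits = [2.0, -1.0]
probs = softmax(example_logits)
predicted_index = probs.index(max(probs))
print(f"Predicted index: {predicted_index}, confidence: {probs[predicted_index]:.3f}")
```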

Model description

This model is a Vision Transformer (ViT) image classifier fine-tuned to separate two categories: "useful" images containing charts, tables, or text, and "useless" images. It uses google/vit-base-patch16-224-in21k as its backbone, so it starts from weights pre-trained on ImageNet-21k at 224x224 resolution with 16x16 patches. Fine-tuning adapts those general visual features to the task of identifying informative visual content.

The model takes a single image as input and assigns it to one of the two categories. The image processor resizes and normalizes the image into the tensor format ViT expects; the model then splits the image into 16x16 patches and applies self-attention across them to relate different regions of the image. It was tuned for classification accuracy on this two-class task.
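
To make the patch-based input concrete, the sketch below works out how many tokens the backbone sees for one image. The image and patch sizes come from the backbone name (patch16-224); the function name is illustrative, and the extra token is the standard ViT classification ([CLS]) token.

```python
def vit_sequence_length(image_size=224, patch_size=16):
    """Return the number of tokens ViT processes for one square image."""
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + 1                        # +1 for the [CLS] token

# Each 16x16 RGB patch holds 16 * 16 * 3 raw pixel values
values_per_patch = 16 * 16 * 3

print(f"Tokens per image: {vit_sequence_length()}")
print(f"Raw values per patch: {values_per_patch}")
```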

Intended uses & limitations

Intended Uses

  • Image Classification for Educational Content: Useful for identifying visually rich, informative charts and tables, which can assist in content moderation or educational material curation.
  • Content Filtering: Can be used to filter out irrelevant or "useless" images in large datasets where only informational images are desired.
  • Dataset Curation: Helps build cleaner datasets by keeping only images of the desired content type, particularly for educational or training datasets.
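
The content-filtering use case can be sketched as a small loop around any classifier. This is a hedged illustration, not the author's pipeline: filter_useful, fake_classifier, the label strings "useful" and "useless", and the 0.9 threshold are all assumptions. In practice, classify_fn would wrap the model and return the label from model.config.id2label together with a softmax confidence.

```python
def filter_useful(image_paths, classify_fn, keep_label, min_confidence=0.9):
    """Keep only paths whose predicted label matches keep_label with enough confidence."""
    kept = []
    for path in image_paths:
        label, confidence = classify_fn(path)
        if label == keep_label and confidence >= min_confidence:
            kept.append(path)
    return kept

def fake_classifier(path):
    # Stand-in for the real model so the sketch runs without downloads;
    # label names and scores here are made up
    if "chart" in path:
        return ("useful", 0.97)
    return ("useless", 0.88)

paths = ["chart_q1.png", "meme.jpg", "chart_q2.png"]
kept_paths = filter_useful(paths, fake_classifier, keep_label="useful")
print(kept_paths)
```

Raising min_confidence trades recall for precision: fewer borderline images are kept, at the cost of discarding some genuinely useful ones.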

Limitations

  • Generalizability: This model is specifically fine-tuned on images labeled as either useful charts with text or useless images. It may not generalize well to other types of image classification tasks.
  • Resolution and Size Constraints: The model's architecture is designed for images resized to 224x224 pixels, so images of significantly different resolutions may affect performance.
  • Content-specific Accuracy: Since this model is trained on a specific dataset, it may misclassify images that do not closely resemble the training data (e.g., abstract or artistic images).
  • Sensitive Information: This model does not have filters for detecting sensitive or inappropriate content; manual filtering may be required if sensitive content is expected.

Training and evaluation data

The model was trained on the codewithaman/useful_charts_table_text_images_vs_useless_images dataset from the Hugging Face Hub. The dataset contains two main classes:

  • Useful Charts and Tables with Text: Images that contain structured, informative visuals like charts, graphs, and tables, often with textual information relevant for educational or informative purposes.
  • Useless Images: Images that lack informative content or visual structure useful for educational or analytical purposes.

Training images are resized and normalized so that they match the ViT model’s input requirements. Evaluation was carried out on a validation subset, and the loss and accuracy reported above were measured on that subset.
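
As an illustration of the normalization step, the sketch below maps an 8-bit pixel value to the range the model sees. It assumes the image processor rescales by 1/255 and uses a per-channel mean and standard deviation of 0.5, which is the usual configuration for this backbone; the preprocessor_config.json in the model repository holds the values actually used.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Rescale an 8-bit pixel value to [0, 1], then normalize it."""
    scaled = value / 255.0          # 0..255 -> 0.0..1.0
    return (scaled - mean) / std    # assumed mean/std of 0.5 -> -1.0..1.0

for pixel in (0, 128, 255):
    print(pixel, round(normalize_pixel(pixel), 4))
```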

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 4
  • mixed_precision_training: Native AMP
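
A linear schedule decays the learning rate from its initial value to zero over the course of training. The sketch below reproduces that rule in plain Python under two assumptions that the list above does not state: no warmup steps, and a total of about 19,736 optimizer steps, estimated from the training log (roughly 4,934 steps per epoch over 4 epochs). The function name is illustrative.

```python
def linear_lr(step, total_steps, base_lr=2e-4, warmup_steps=0):
    """Learning rate at a given step for linear warmup followed by linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 19736  # estimate, not a value reported in the card

for step in (0, 4934, 9868, 14802, 19736):
    print(f"step {step:>5}: lr = {linear_lr(step, TOTAL_STEPS):.6f}")
```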

Training results

Training Loss Epoch Step Validation Loss Accuracy
0.8814 0.0203 100 0.9432 0.7601
0.4374 0.0405 200 0.4927 0.8864
0.0042 0.0608 300 0.3534 0.9267
0.0093 0.0811 400 0.2335 0.9414
0.125 0.1013 500 0.3630 0.9286
0.4924 0.1216 600 0.2374 0.9469
0.0052 0.1419 700 0.2015 0.9487
0.3738 0.1621 800 0.4200 0.8864
0.4533 0.1824 900 0.2573 0.9286
0.027 0.2027 1000 0.3408 0.9121
0.6685 0.2229 1100 0.3140 0.8260
0.0703 0.2432 1200 0.2425 0.9322
0.9411 0.2635 1300 0.7809 0.8223
0.4378 0.2837 1400 0.6968 0.8223
0.7127 0.3040 1500 0.3294 0.8242
0.9465 0.3243 1600 0.4913 0.8223
0.3834 0.3445 1700 0.2594 0.9048
0.6691 0.3648 1800 0.3537 0.8993
0.3002 0.3851 1900 0.2502 0.9286
0.0473 0.4054 2000 0.2312 0.9322
0.634 0.4256 2100 0.2406 0.9359
0.4471 0.4459 2200 0.2983 0.9377
0.3229 0.4662 2300 0.3601 0.9212
0.4769 0.4864 2400 0.2990 0.9011
0.0135 0.5067 2500 0.3134 0.9029
0.3025 0.5270 2600 0.1748 0.9505
0.0114 0.5472 2700 0.2898 0.9212
0.1636 0.5675 2800 0.2281 0.9396
0.7427 0.5878 2900 0.2334 0.9341
0.0083 0.6080 3000 0.2466 0.9359
0.0041 0.6283 3100 0.2737 0.9432
1.7268 0.6486 3200 0.2626 0.9396
0.0115 0.6688 3300 0.2621 0.9304
0.6196 0.6891 3400 0.3546 0.9267
0.0141 0.7094 3500 0.2064 0.9505
0.006 0.7296 3600 0.2204 0.9487
0.0226 0.7499 3700 0.2544 0.9451
0.0084 0.7702 3800 0.1698 0.9542
0.0035 0.7904 3900 0.2541 0.9304
0.0137 0.8107 4000 0.1235 0.9670
0.9026 0.8310 4100 0.3319 0.9249
0.4531 0.8512 4200 0.2221 0.9414
0.0039 0.8715 4300 0.1823 0.9560
1.3298 0.8918 4400 0.2125 0.9542
0.4403 0.9120 4500 0.4900 0.8938
0.0025 0.9323 4600 0.3010 0.9249
0.0056 0.9526 4700 0.2978 0.9267
0.3642 0.9728 4800 0.2162 0.9451
0.5704 0.9931 4900 0.2459 0.9414
0.1761 1.0134 5000 0.1674 0.9652
0.0023 1.0336 5100 0.1855 0.9542
0.1477 1.0539 5200 0.1516 0.9652
0.0034 1.0742 5300 0.8117 0.7326
0.4936 1.0944 5400 0.2102 0.9377
0.0158 1.1147 5500 0.1886 0.9524
0.0041 1.1350 5600 0.2544 0.9286
0.7993 1.1552 5700 0.2523 0.9304
0.6292 1.1755 5800 0.1681 0.9451
0.0048 1.1958 5900 0.2746 0.9377
0.4908 1.2161 6000 0.3194 0.9359
0.4156 1.2363 6100 0.1320 0.9744
0.0056 1.2566 6200 0.3195 0.8993
0.0013 1.2769 6300 0.1581 0.9615
0.0027 1.2971 6400 0.2660 0.9414
0.1753 1.3174 6500 0.1858 0.9560
0.0013 1.3377 6600 0.2018 0.9615
0.0033 1.3579 6700 0.1475 0.9707
0.0037 1.3782 6800 0.1417 0.9689
1.2775 1.3985 6900 0.1101 0.9670
0.0051 1.4187 7000 0.1292 0.9707
0.4954 1.4390 7100 0.2473 0.9469
0.1533 1.4593 7200 0.1181 0.9707
0.0022 1.4795 7300 0.1512 0.9707
0.005 1.4998 7400 0.1329 0.9670
0.4396 1.5201 7500 0.1219 0.9725
0.0044 1.5403 7600 0.1665 0.9670
0.7054 1.5606 7700 0.1652 0.9670
0.4057 1.5809 7800 0.1683 0.9542
0.011 1.6011 7900 0.3927 0.9286
0.7 1.6214 8000 0.0999 0.9762
0.0026 1.6417 8100 0.1249 0.9744
0.002 1.6619 8200 0.1386 0.9615
0.0041 1.6822 8300 0.1175 0.9670
0.0034 1.7025 8400 0.1160 0.9725
0.0041 1.7227 8500 0.2097 0.9542
0.3303 1.7430 8600 0.1527 0.9597
0.006 1.7633 8700 0.1389 0.9670
0.0012 1.7835 8800 0.1799 0.9597
0.0027 1.8038 8900 0.1717 0.9615
0.4926 1.8241 9000 0.1517 0.9670
0.0023 1.8443 9100 0.1272 0.9744
0.5028 1.8646 9200 0.1444 0.9725
0.0051 1.8849 9300 0.1276 0.9744
0.0019 1.9051 9400 0.1550 0.9689
0.0052 1.9254 9500 0.1958 0.9634
0.0099 1.9457 9600 0.1359 0.9689
0.3494 1.9660 9700 0.1969 0.9542
0.0035 1.9862 9800 0.1671 0.9579
0.0025 2.0065 9900 0.1435 0.9707
0.0006 2.0268 10000 0.1187 0.9799
0.0035 2.0470 10100 0.1303 0.9780
0.7492 2.0673 10200 0.1294 0.9762
0.0154 2.0876 10300 0.1108 0.9762
0.0007 2.1078 10400 0.2675 0.9487
0.0008 2.1281 10500 0.1334 0.9689
0.003 2.1484 10600 0.1583 0.9670
0.4043 2.1686 10700 0.1198 0.9780
0.0016 2.1889 10800 0.1130 0.9799
0.0033 2.2092 10900 0.1102 0.9762
1.0287 2.2294 11000 0.1053 0.9762
0.3159 2.2497 11100 0.1004 0.9780
0.0464 2.2700 11200 0.1181 0.9762
0.002 2.2902 11300 0.2652 0.9560
0.0758 2.3105 11400 0.1413 0.9725
0.0027 2.3308 11500 0.2025 0.9451
0.0011 2.3510 11600 0.1372 0.9725
0.0009 2.3713 11700 0.1458 0.9725
0.4178 2.3916 11800 0.1403 0.9725
0.0028 2.4118 11900 0.1406 0.9725
0.0009 2.4321 12000 0.1295 0.9725
0.002 2.4524 12100 0.1685 0.9670
0.0022 2.4726 12200 0.1151 0.9744
0.0008 2.4929 12300 0.1635 0.9689
0.0035 2.5132 12400 0.1283 0.9744
0.7689 2.5334 12500 0.1551 0.9689
0.0126 2.5537 12600 0.1144 0.9762
0.0028 2.5740 12700 0.0919 0.9835
0.0053 2.5942 12800 0.1132 0.9762
0.0018 2.6145 12900 0.0851 0.9853
0.0014 2.6348 13000 0.1095 0.9780
0.0017 2.6550 13100 0.0878 0.9817
0.0014 2.6753 13200 0.1322 0.9762
0.0015 2.6956 13300 0.1059 0.9799
0.0036 2.7158 13400 0.0927 0.9817
0.0051 2.7361 13500 0.1009 0.9799
0.0028 2.7564 13600 0.1680 0.9670
0.6951 2.7767 13700 0.2497 0.9487
0.0096 2.7969 13800 0.1138 0.9780
0.5063 2.8172 13900 0.1151 0.9744
0.0026 2.8375 14000 0.1179 0.9762
0.0041 2.8577 14100 0.1266 0.9744
0.0019 2.8780 14200 0.0998 0.9780
0.0038 2.8983 14300 0.1290 0.9652
0.0131 2.9185 14400 0.1998 0.9414
0.0037 2.9388 14500 0.1214 0.9634
0.2382 2.9591 14600 0.1097 0.9780
0.0021 2.9793 14700 0.1152 0.9780
0.002 2.9996 14800 0.1001 0.9799
0.0027 3.0199 14900 0.1291 0.9780
0.971 3.0401 15000 0.1617 0.9689
0.0024 3.0604 15100 0.1245 0.9707
0.0172 3.0807 15200 0.1246 0.9725
0.0016 3.1009 15300 0.1628 0.9634
0.0016 3.1212 15400 0.1621 0.9634
0.0005 3.1415 15500 0.1104 0.9762
0.3195 3.1617 15600 0.1447 0.9725
2.3502 3.1820 15700 0.1827 0.9652
0.4252 3.2023 15800 0.1077 0.9762
0.0042 3.2225 15900 0.1431 0.9707
1.0207 3.2428 16000 0.1287 0.9744
0.5064 3.2631 16100 0.1663 0.9689
0.0018 3.2833 16200 0.1327 0.9725
0.0006 3.3036 16300 0.1163 0.9762
0.0039 3.3239 16400 0.1413 0.9725
0.5045 3.3441 16500 0.1572 0.9689
0.0069 3.3644 16600 0.1553 0.9670
0.0058 3.3847 16700 0.1022 0.9780
0.006 3.4049 16800 0.0993 0.9780
0.002 3.4252 16900 0.0954 0.9799
0.0082 3.4455 17000 0.0976 0.9762
0.0029 3.4657 17100 0.0978 0.9780
0.0008 3.4860 17200 0.0973 0.9799
0.0014 3.5063 17300 0.0979 0.9799
0.0008 3.5266 17400 0.1151 0.9744
0.0023 3.5468 17500 0.1093 0.9780
0.0012 3.5671 17600 0.0996 0.9799
0.0016 3.5874 17700 0.0980 0.9817
0.0015 3.6076 17800 0.1052 0.9799
0.0018 3.6279 17900 0.1054 0.9799
0.003 3.6482 18000 0.1052 0.9780
0.002 3.6684 18100 0.1063 0.9799
0.0011 3.6887 18200 0.1195 0.9762
0.4766 3.7090 18300 0.0873 0.9835
0.0026 3.7292 18400 0.0876 0.9835
0.0006 3.7495 18500 0.0942 0.9835
0.0014 3.7698 18600 0.0944 0.9835
0.0013 3.7900 18700 0.0972 0.9817
0.0016 3.8103 18800 0.1044 0.9817
0.0009 3.8306 18900 0.1039 0.9799
0.0008 3.8508 19000 0.0976 0.9817
0.0005 3.8711 19100 0.0969 0.9835
0.0009 3.8914 19200 0.0964 0.9835
0.0005 3.9116 19300 0.1020 0.9799
0.5488 3.9319 19400 0.0986 0.9817
0.0014 3.9522 19500 0.0963 0.9835
0.001 3.9724 19600 0.1037 0.9799
0.0009 3.9927 19700 0.1045 0.9799
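
The evaluation results reported at the top of this card (loss 0.0851, accuracy 0.9853) match the row at step 12900 rather than the final row, which suggests that the checkpoint with the lowest validation loss was kept; that is an inference from the log, not something the card states. The sketch below shows how such a row can be picked out of the log. It parses a six-row excerpt copied from the table; the names LOG_EXCERPT and parse_rows are illustrative.

```python
# Columns: training loss, epoch, step, validation loss, accuracy
LOG_EXCERPT = """\
0.8814 0.0203 100 0.9432 0.7601
0.0137 0.8107 4000 0.1235 0.9670
0.7 1.6214 8000 0.0999 0.9762
0.0018 2.6145 12900 0.0851 0.9853
0.0026 3.7292 18400 0.0876 0.9835
0.0009 3.9927 19700 0.1045 0.9799
"""

def parse_rows(text):
    """Parse whitespace-separated log rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return rows

rows = parse_rows(LOG_EXCERPT)
best = min(rows, key=lambda row: row["val_loss"])
print(f"Lowest validation loss {best['val_loss']} at step {best['step']} "
      f"(accuracy {best['accuracy']})")
```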

Framework versions

  • Transformers 4.46.2
  • PyTorch 2.5.1+cpu
  • Datasets 2.11.0
  • Tokenizers 0.20.3

Model size: 85.8M params (Safetensors, tensor type F32)