useful_charts_table_text_images_vs_useless_images_classifier

This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the codewithaman/useful_charts_table_text_images_vs_useless_images dataset. It achieves the following results on the evaluation set:

Loss: 0.0851
Accuracy: 0.9853

To use the model

from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Select the device to run the model on
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Inference on device: {device}")

# Load the image processor and the model
# (ViTImageProcessor replaces the deprecated ViTFeatureExtractor)
model_name_or_path = 'codewithaman/useful_charts_table_text_images_vs_useless_images_classifier'
image_processor = ViTImageProcessor.from_pretrained(model_name_or_path)
model = ViTForImageClassification.from_pretrained(model_name_or_path).to(device)
model.eval()

# Load a local image as RGB, resized to the model's 224x224 input size
def load_image_from_path(image_path):
    with Image.open(image_path) as image:
        return image.convert("RGB").resize((224, 224))

# Define the inference function
def classify_image(image):
    # Prepare the image for the model
    inputs = image_processor(images=image, return_tensors="pt").to(device)

    # Run the forward pass without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Map the highest-scoring logit to its label
    predicted_class = outputs.logits.argmax(-1).item()
    return model.config.id2label[predicted_class]

# Example usage
image_path = "path/to/your/image.jpg"  # Replace with your local image path
image = load_image_from_path(image_path)
predicted_label = classify_image(image)

print(f"Predicted label: {predicted_label}")
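classify_image above returns only the winning label. When a confidence score is also needed, the logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python, assuming the logits have already been pulled out of the model as a list (for example with outputs.logits[0].tolist()). The helper names and the two label strings are hypothetical placeholders, not the model's actual id2label values.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical label names; read the real ones from model.config.id2label.
example_id2label = {0: "useful", 1: "useless"}
label, confidence = top_prediction([2.0, -1.0], example_id2label)
print(f"{label}: {confidence:.3f}")
```

The same two functions work unchanged on the real logits, as long as id2label is taken from the loaded model's config.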

Model description

This model is a Vision Transformer (ViT) image classifier fine-tuned to separate two classes: "useful charts with text" and "useless images." Its backbone is google/vit-base-patch16-224-in21k, whose weights were pre-trained on ImageNet-21k, so fine-tuning starts from general-purpose visual features rather than from random initialization. Those features help the model distinguish informative visual content from uninformative images.

The model takes an image as input and assigns it to one of the two categories. Its preprocessing step resizes and normalizes each image into the tensor format ViT expects. The model then splits the image into 16x16-pixel patches and applies self-attention across them to relate different regions of the image. Fine-tuning optimized the model for accuracy on this binary distinction, which depends on high-level visual features rather than pixel-level detail.
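The patch-based input described above comes down to simple arithmetic, sketched below. The image size (224) and patch size (16) are taken from the base model's name; the function name is illustrative only.

```python
def vit_sequence_shape(image_size=224, patch_size=16, channels=3):
    """Return (num_patches, values_per_patch, sequence_length) for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # Each patch is flattened into one vector before the linear projection.
    values_per_patch = patch_size * patch_size * channels
    # ViT prepends one [CLS] token, whose final state feeds the classifier head.
    sequence_length = num_patches + 1
    return num_patches, values_per_patch, sequence_length

num_patches, values_per_patch, sequence_length = vit_sequence_shape()
print(num_patches, values_per_patch, sequence_length)
```

For the default arguments this gives a 14x14 grid of patches, so self-attention operates over a sequence of 197 tokens per image.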

Intended uses & limitations

Intended Uses

Image Classification for Educational Content: Identifies informative charts and tables, which can assist in curating educational material or moderating content.
Content Filtering: Filters irrelevant or "useless" images out of large datasets where only informational images are wanted.
Dataset Curation: Helps build cleaner datasets by selecting images of specific content types, particularly for educational or training datasets.
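The filtering use case can be sketched as a small helper that keeps only the paths a classifier marks as useful. This is an illustration under assumptions: filter_paths and fake_classify are hypothetical names, and the label strings "useful" and "useless" are placeholders for whatever model.config.id2label actually contains. The classifier is passed in as a function so the sketch runs without downloading the model.

```python
from typing import Callable, Iterable, List

def filter_paths(paths: Iterable[str],
                 classify: Callable[[str], str],
                 keep_label: str) -> List[str]:
    """Keep only the paths whose predicted label equals keep_label."""
    return [path for path in paths if classify(path) == keep_label]

def fake_classify(path: str) -> str:
    """Stand-in classifier for the demo; replace with a real model call."""
    return "useful" if "chart" in path else "useless"

candidates = ["chart_01.png", "selfie.jpg", "logo.png"]
kept = filter_paths(candidates, fake_classify, keep_label="useful")
print(kept)
```

In real use, classify would load the image at the given path and return the model's predicted label for it.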

Limitations

Generalizability: This model is specifically fine-tuned on images labeled as either useful charts with text or useless images. It may not generalize well to other types of image classification tasks.
Resolution and Size Constraints: Inputs are resized to 224x224 pixels, so fine detail in much larger images (small text, thin chart lines) can be lost and images with extreme aspect ratios are distorted, which may reduce accuracy.
Content-specific Accuracy: Since this model is trained on a specific dataset, it may misclassify images that do not closely resemble the training data (e.g., abstract or artistic images).
Sensitive Information: This model does not have filters for detecting sensitive or inappropriate content; manual filtering may be required if sensitive content is expected.
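One common way to soften the misclassification risk noted above is to accept only confident predictions and route the rest to a person. The sketch below assumes a confidence value between 0 and 1 is already available (for example a softmax probability). The function name, the "needs_review" marker, and the 0.9 threshold are all arbitrary illustrative choices, not values from the model's training.

```python
def triage(label: str, confidence: float, threshold: float = 0.9) -> str:
    """Return the label if the model is confident, else flag for manual review."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return label if confidence >= threshold else "needs_review"

# Hypothetical predictions as (label, confidence) pairs.
decisions = [triage("useful", 0.98), triage("useless", 0.62)]
print(decisions)
```

A suitable threshold depends on the data and on how costly a wrong decision is, so it should be tuned on a held-out sample rather than copied from here.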

Training and evaluation data

The model was trained on the codewithaman/useful_charts_table_text_images_vs_useless_images dataset from the Hugging Face Hub. The dataset contains two main classes:

Useful Charts and Tables with Text: Images that contain structured, informative visuals like charts, graphs, and tables, often with textual information relevant for educational or informative purposes.
Useless Images: Images that lack informative content or visual structure useful for educational or analytical purposes.

Training images were resized and normalized to match the ViT model’s input requirements. Evaluation was run on a validation subset, and the loss and accuracy reported in this card were measured on that subset.
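The normalization step can be illustrated for a single pixel value. This sketch assumes the settings usually shipped with the base checkpoint, namely rescaling by 1/255 followed by a per-channel mean and standard deviation of 0.5. That is an assumption, so check the preprocessor_config.json in the model repository for the values actually used.

```python
def normalize_pixel(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Rescale an 8-bit pixel value to [0, 1], then standardize it."""
    scaled = value / 255.0
    return (scaled - mean) / std

for raw in (0, 127.5, 255):
    print(raw, normalize_pixel(raw))
```

Under these assumed settings, 8-bit pixel values end up in the range [-1, 1].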

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 4
mixed_precision_training: Native AMP
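To make the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step. Two values are assumptions rather than listed hyperparameters: a warmup of 0 steps (none is listed above), and 4934 steps per epoch, which is inferred from the step and epoch columns of the results table (for example, step 19700 at epoch 3.9927).

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 2e-4, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

STEPS_PER_EPOCH = 4934              # inferred from the results table, not stated
TOTAL_STEPS = STEPS_PER_EPOCH * 4   # num_epochs: 4

for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With no warmup, the rate starts at the configured 0.0002 and falls in a straight line to zero at the last step.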

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.8814	0.0203	100	0.9432	0.7601
0.4374	0.0405	200	0.4927	0.8864
0.0042	0.0608	300	0.3534	0.9267
0.0093	0.0811	400	0.2335	0.9414
0.125	0.1013	500	0.3630	0.9286
0.4924	0.1216	600	0.2374	0.9469
0.0052	0.1419	700	0.2015	0.9487
0.3738	0.1621	800	0.4200	0.8864
0.4533	0.1824	900	0.2573	0.9286
0.027	0.2027	1000	0.3408	0.9121
0.6685	0.2229	1100	0.3140	0.8260
0.0703	0.2432	1200	0.2425	0.9322
0.9411	0.2635	1300	0.7809	0.8223
0.4378	0.2837	1400	0.6968	0.8223
0.7127	0.3040	1500	0.3294	0.8242
0.9465	0.3243	1600	0.4913	0.8223
0.3834	0.3445	1700	0.2594	0.9048
0.6691	0.3648	1800	0.3537	0.8993
0.3002	0.3851	1900	0.2502	0.9286
0.0473	0.4054	2000	0.2312	0.9322
0.634	0.4256	2100	0.2406	0.9359
0.4471	0.4459	2200	0.2983	0.9377
0.3229	0.4662	2300	0.3601	0.9212
0.4769	0.4864	2400	0.2990	0.9011
0.0135	0.5067	2500	0.3134	0.9029
0.3025	0.5270	2600	0.1748	0.9505
0.0114	0.5472	2700	0.2898	0.9212
0.1636	0.5675	2800	0.2281	0.9396
0.7427	0.5878	2900	0.2334	0.9341
0.0083	0.6080	3000	0.2466	0.9359
0.0041	0.6283	3100	0.2737	0.9432
1.7268	0.6486	3200	0.2626	0.9396
0.0115	0.6688	3300	0.2621	0.9304
0.6196	0.6891	3400	0.3546	0.9267
0.0141	0.7094	3500	0.2064	0.9505
0.006	0.7296	3600	0.2204	0.9487
0.0226	0.7499	3700	0.2544	0.9451
0.0084	0.7702	3800	0.1698	0.9542
0.0035	0.7904	3900	0.2541	0.9304
0.0137	0.8107	4000	0.1235	0.9670
0.9026	0.8310	4100	0.3319	0.9249
0.4531	0.8512	4200	0.2221	0.9414
0.0039	0.8715	4300	0.1823	0.9560
1.3298	0.8918	4400	0.2125	0.9542
0.4403	0.9120	4500	0.4900	0.8938
0.0025	0.9323	4600	0.3010	0.9249
0.0056	0.9526	4700	0.2978	0.9267
0.3642	0.9728	4800	0.2162	0.9451
0.5704	0.9931	4900	0.2459	0.9414
0.1761	1.0134	5000	0.1674	0.9652
0.0023	1.0336	5100	0.1855	0.9542
0.1477	1.0539	5200	0.1516	0.9652
0.0034	1.0742	5300	0.8117	0.7326
0.4936	1.0944	5400	0.2102	0.9377
0.0158	1.1147	5500	0.1886	0.9524
0.0041	1.1350	5600	0.2544	0.9286
0.7993	1.1552	5700	0.2523	0.9304
0.6292	1.1755	5800	0.1681	0.9451
0.0048	1.1958	5900	0.2746	0.9377
0.4908	1.2161	6000	0.3194	0.9359
0.4156	1.2363	6100	0.1320	0.9744
0.0056	1.2566	6200	0.3195	0.8993
0.0013	1.2769	6300	0.1581	0.9615
0.0027	1.2971	6400	0.2660	0.9414
0.1753	1.3174	6500	0.1858	0.9560
0.0013	1.3377	6600	0.2018	0.9615
0.0033	1.3579	6700	0.1475	0.9707
0.0037	1.3782	6800	0.1417	0.9689
1.2775	1.3985	6900	0.1101	0.9670
0.0051	1.4187	7000	0.1292	0.9707
0.4954	1.4390	7100	0.2473	0.9469
0.1533	1.4593	7200	0.1181	0.9707
0.0022	1.4795	7300	0.1512	0.9707
0.005	1.4998	7400	0.1329	0.9670
0.4396	1.5201	7500	0.1219	0.9725
0.0044	1.5403	7600	0.1665	0.9670
0.7054	1.5606	7700	0.1652	0.9670
0.4057	1.5809	7800	0.1683	0.9542
0.011	1.6011	7900	0.3927	0.9286
0.7	1.6214	8000	0.0999	0.9762
0.0026	1.6417	8100	0.1249	0.9744
0.002	1.6619	8200	0.1386	0.9615
0.0041	1.6822	8300	0.1175	0.9670
0.0034	1.7025	8400	0.1160	0.9725
0.0041	1.7227	8500	0.2097	0.9542
0.3303	1.7430	8600	0.1527	0.9597
0.006	1.7633	8700	0.1389	0.9670
0.0012	1.7835	8800	0.1799	0.9597
0.0027	1.8038	8900	0.1717	0.9615
0.4926	1.8241	9000	0.1517	0.9670
0.0023	1.8443	9100	0.1272	0.9744
0.5028	1.8646	9200	0.1444	0.9725
0.0051	1.8849	9300	0.1276	0.9744
0.0019	1.9051	9400	0.1550	0.9689
0.0052	1.9254	9500	0.1958	0.9634
0.0099	1.9457	9600	0.1359	0.9689
0.3494	1.9660	9700	0.1969	0.9542
0.0035	1.9862	9800	0.1671	0.9579
0.0025	2.0065	9900	0.1435	0.9707
0.0006	2.0268	10000	0.1187	0.9799
0.0035	2.0470	10100	0.1303	0.9780
0.7492	2.0673	10200	0.1294	0.9762
0.0154	2.0876	10300	0.1108	0.9762
0.0007	2.1078	10400	0.2675	0.9487
0.0008	2.1281	10500	0.1334	0.9689
0.003	2.1484	10600	0.1583	0.9670
0.4043	2.1686	10700	0.1198	0.9780
0.0016	2.1889	10800	0.1130	0.9799
0.0033	2.2092	10900	0.1102	0.9762
1.0287	2.2294	11000	0.1053	0.9762
0.3159	2.2497	11100	0.1004	0.9780
0.0464	2.2700	11200	0.1181	0.9762
0.002	2.2902	11300	0.2652	0.9560
0.0758	2.3105	11400	0.1413	0.9725
0.0027	2.3308	11500	0.2025	0.9451
0.0011	2.3510	11600	0.1372	0.9725
0.0009	2.3713	11700	0.1458	0.9725
0.4178	2.3916	11800	0.1403	0.9725
0.0028	2.4118	11900	0.1406	0.9725
0.0009	2.4321	12000	0.1295	0.9725
0.002	2.4524	12100	0.1685	0.9670
0.0022	2.4726	12200	0.1151	0.9744
0.0008	2.4929	12300	0.1635	0.9689
0.0035	2.5132	12400	0.1283	0.9744
0.7689	2.5334	12500	0.1551	0.9689
0.0126	2.5537	12600	0.1144	0.9762
0.0028	2.5740	12700	0.0919	0.9835
0.0053	2.5942	12800	0.1132	0.9762
0.0018	2.6145	12900	0.0851	0.9853
0.0014	2.6348	13000	0.1095	0.9780
0.0017	2.6550	13100	0.0878	0.9817
0.0014	2.6753	13200	0.1322	0.9762
0.0015	2.6956	13300	0.1059	0.9799
0.0036	2.7158	13400	0.0927	0.9817
0.0051	2.7361	13500	0.1009	0.9799
0.0028	2.7564	13600	0.1680	0.9670
0.6951	2.7767	13700	0.2497	0.9487
0.0096	2.7969	13800	0.1138	0.9780
0.5063	2.8172	13900	0.1151	0.9744
0.0026	2.8375	14000	0.1179	0.9762
0.0041	2.8577	14100	0.1266	0.9744
0.0019	2.8780	14200	0.0998	0.9780
0.0038	2.8983	14300	0.1290	0.9652
0.0131	2.9185	14400	0.1998	0.9414
0.0037	2.9388	14500	0.1214	0.9634
0.2382	2.9591	14600	0.1097	0.9780
0.0021	2.9793	14700	0.1152	0.9780
0.002	2.9996	14800	0.1001	0.9799
0.0027	3.0199	14900	0.1291	0.9780
0.971	3.0401	15000	0.1617	0.9689
0.0024	3.0604	15100	0.1245	0.9707
0.0172	3.0807	15200	0.1246	0.9725
0.0016	3.1009	15300	0.1628	0.9634
0.0016	3.1212	15400	0.1621	0.9634
0.0005	3.1415	15500	0.1104	0.9762
0.3195	3.1617	15600	0.1447	0.9725
2.3502	3.1820	15700	0.1827	0.9652
0.4252	3.2023	15800	0.1077	0.9762
0.0042	3.2225	15900	0.1431	0.9707
1.0207	3.2428	16000	0.1287	0.9744
0.5064	3.2631	16100	0.1663	0.9689
0.0018	3.2833	16200	0.1327	0.9725
0.0006	3.3036	16300	0.1163	0.9762
0.0039	3.3239	16400	0.1413	0.9725
0.5045	3.3441	16500	0.1572	0.9689
0.0069	3.3644	16600	0.1553	0.9670
0.0058	3.3847	16700	0.1022	0.9780
0.006	3.4049	16800	0.0993	0.9780
0.002	3.4252	16900	0.0954	0.9799
0.0082	3.4455	17000	0.0976	0.9762
0.0029	3.4657	17100	0.0978	0.9780
0.0008	3.4860	17200	0.0973	0.9799
0.0014	3.5063	17300	0.0979	0.9799
0.0008	3.5266	17400	0.1151	0.9744
0.0023	3.5468	17500	0.1093	0.9780
0.0012	3.5671	17600	0.0996	0.9799
0.0016	3.5874	17700	0.0980	0.9817
0.0015	3.6076	17800	0.1052	0.9799
0.0018	3.6279	17900	0.1054	0.9799
0.003	3.6482	18000	0.1052	0.9780
0.002	3.6684	18100	0.1063	0.9799
0.0011	3.6887	18200	0.1195	0.9762
0.4766	3.7090	18300	0.0873	0.9835
0.0026	3.7292	18400	0.0876	0.9835
0.0006	3.7495	18500	0.0942	0.9835
0.0014	3.7698	18600	0.0944	0.9835
0.0013	3.7900	18700	0.0972	0.9817
0.0016	3.8103	18800	0.1044	0.9817
0.0009	3.8306	18900	0.1039	0.9799
0.0008	3.8508	19000	0.0976	0.9817
0.0005	3.8711	19100	0.0969	0.9835
0.0009	3.8914	19200	0.0964	0.9835
0.0005	3.9116	19300	0.1020	0.9799
0.5488	3.9319	19400	0.0986	0.9817
0.0014	3.9522	19500	0.0963	0.9835
0.001	3.9724	19600	0.1037	0.9799
0.0009	3.9927	19700	0.1045	0.9799
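The reported evaluation results (loss 0.0851, accuracy 0.9853) match the row for step 12900 rather than the final row, which suggests the checkpoint with the lowest validation loss was the one kept. The sketch below shows how that row can be found programmatically. It parses a five-row excerpt copied from the table above, and the EvalRow type and parse_rows helper are illustrative names.

```python
from typing import List, NamedTuple

class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float

# Excerpt of the results table (tab-separated, same column order).
ROWS = """\
0.0028\t2.5740\t12700\t0.0919\t0.9835
0.0018\t2.6145\t12900\t0.0851\t0.9853
0.0017\t2.6550\t13100\t0.0878\t0.9817
0.4766\t3.7090\t18300\t0.0873\t0.9835
0.0009\t3.9927\t19700\t0.1045\t0.9799
"""

def parse_rows(text: str) -> List[EvalRow]:
    """Parse whitespace-separated table rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split()
        rows.append(EvalRow(float(train_loss), float(epoch), int(step),
                            float(val_loss), float(accuracy)))
    return rows

best = min(parse_rows(ROWS), key=lambda row: row.val_loss)
print(best)
```

Pasting the full table into ROWS applies the same selection to every logged evaluation.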

Framework versions

Transformers 4.46.2
PyTorch 2.5.1+cpu
Datasets 2.11.0
Tokenizers 0.20.3

Model size: 85.8M params (Safetensors, tensor type F32)
