---
license: unknown
language:
- en
metrics:
- accuracy
- precision
- f1
- recall
tags:
- art
base_model: google/vit-base-patch16-224
datasets:
- DataScienceProject/Art_Images_Ai_And_Real_
pipeline_tag: image-classification
library_name: transformers
---

### Model Card for Model ID
This model classifies images as either real or fake (AI-generated) using a Vision Transformer (ViT).

Our goal is to identify the source of each image with at least 85% accuracy and at least 80% recall.

### Model Description

This model builds on `google/vit-base-patch16-224`, the `base_model` listed in the metadata. The Vision Transformer (ViT) architecture splits each 224×224 input image into 16×16 patches, embeds each patch as a token, and processes the resulting sequence with self-attention layers instead of a convolutional neural network (CNN).
The model classifies images into two categories: real and fake (AI-generated).
Because self-attention relates every patch to every other patch, the model can learn patterns spread across the whole image that help distinguish the two categories.

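The patch step can be illustrated with a minimal NumPy sketch, assuming a 224×224 RGB image stored as an (H, W, C) array. `patchify` is a hypothetical helper written for illustration: it only cuts the image into flattened 16×16 patches. In the actual model, each patch is then mapped to a 768-dimensional embedding by a learned projection (implemented in Hugging Face `transformers` as a strided convolution), which this sketch leaves out.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C) whose rows
    follow raster order: left to right, then top to bottom.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, p, cols, p, C): separate each patch's grid position from the pixel offset inside it
    grid = image.reshape(rows, patch_size, cols, patch_size, channels)
    # (rows, cols, p, p, C): group the pixels of each patch together
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * cols, patch_size * patch_size * channels)


image = np.random.rand(224, 224, 3).astype(np.float32)
patches = patchify(image)
print(patches.shape)  # (196, 768): a 14 x 14 grid, 16 * 16 * 3 values per patch
```

The 196 patches form a 14 × 14 grid. ViT-Base prepends a learnable [CLS] token, so the encoder processes 197 tokens, and the classification head reads the final [CLS] representation.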
### Direct Use

This model can be used to classify images as real art or fake (AI-generated) art, based on the visual features learned by the Vision Transformer.

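A minimal inference sketch using the Hugging Face `transformers` Auto classes might look like the following. `MODEL_DIR` is a hypothetical placeholder, because this card does not name the published repository, and the label names are read from the model's own `config.id2label` rather than hard-coded. The softmax-and-argmax step lives in a small pure-Python helper, `top_label` (also our own naming), so it can be read on its own.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical placeholder: replace with the actual repository ID or a local checkpoint folder.
MODEL_DIR = "path/to/this-model"


def top_label(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Turn raw classifier logits into (label, probability) using a numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify_image(image_path: str, model_dir: str = MODEL_DIR) -> Tuple[str, float]:
    """Classify one image file; needs torch, transformers and Pillow installed."""
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_dir)
    model = AutoModelForImageClassification.from_pretrained(model_dir)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return top_label(logits.tolist(), model.config.id2label)
```

For example, `label, probability = classify_image("painting.jpg")` (a hypothetical file name) returns the predicted class name and its softmax probability.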
### Out-of-Scope Use

The model may perform poorly on images that are not artwork, or on images whose visual characteristics differ substantially from those in the training dataset.

### Recommendations

Run the training code on a PC with an NVIDIA GPU more powerful than an RTX 3060 and a CPU with at least 6 cores, or use Google Colab.

## How to Get Started with the Model

Prepare the data: organize your images into the appropriate folders (for example, one folder per class) and run the code.

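## Model Architecture

The model uses the ViT-Base architecture of `google/vit-base-patch16-224`: a 12-layer Transformer encoder with hidden size 768 that reads 224×224 images as sequences of 16×16 patches, followed by a linear classification head over the two classes.
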
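As a rough check on model size, the parameter count can be worked out from the ViT-Base hyperparameters (hidden size 768, 12 layers, MLP size 3072, 16×16 patches, 224×224 input). The sketch assumes the layer layout of Hugging Face's `ViTForImageClassification` (biased query/key/value projections, no pooler, one linear classifier); `vit_param_count` is a hypothetical helper, not a library function.

```python
def vit_param_count(image_size=224, patch_size=16, channels=3, hidden=768,
                    layers=12, mlp_dim=3072, num_labels=2):
    """Count trainable parameters of a ViT image classifier from its hyperparameters."""
    num_patches = (image_size // patch_size) ** 2

    def linear(n_in, n_out):
        # weight matrix plus bias vector
        return n_in * n_out + n_out

    layer_norm = 2 * hidden  # scale and shift vectors
    # Patch projection (a linear map on each flattened patch), the [CLS] token,
    # and one learned position embedding per patch plus one for [CLS].
    embeddings = linear(channels * patch_size ** 2, hidden) + hidden + (num_patches + 1) * hidden
    per_layer = (
        3 * linear(hidden, hidden)   # query, key, value projections
        + linear(hidden, hidden)     # attention output projection
        + linear(hidden, mlp_dim)    # MLP expansion
        + linear(mlp_dim, hidden)    # MLP contraction
        + 2 * layer_norm             # LayerNorm before attention and before the MLP
    )
    head = layer_norm + linear(hidden, num_labels)  # final LayerNorm + classifier
    return embeddings + layers * per_layer + head


print(vit_param_count(num_labels=1000))  # 86567656: the base checkpoint's ImageNet-1k head
print(vit_param_count(num_labels=2))     # 85800194: a two-class (real / fake) head
```

Almost all of the roughly 85.8 million parameters sit in the 12 encoder layers (about 7.09 million each); swapping the 1000-class head for a two-class head removes roughly 0.77 million parameters.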
## Training Details

- Dataset: DataScienceProject/Art_Images_Ai_And_Real_

Preprocessing: images are resized, converted to RGB, transformed into tensors, and stored in a custom PyTorch dataset.

#### Training Hyperparameters

    import torch.nn as nn
    import torch.optim as optim

    optimizer = optim.Adam(model.parameters(), lr=0.001)  # model: the ViT being trained
    num_epochs = 10
    criterion = nn.CrossEntropyLoss()

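The hyperparameters above fit a standard PyTorch training loop. This is a sketch of how they might be used, not the original training code: `train` is a hypothetical helper, and a tiny linear model on random data stands in for the ViT so the loop runs anywhere. With a Hugging Face ViT, the forward call returns an output object, so the loss line would use `model(pixel_values=inputs).logits` instead of `model(inputs)`.

```python
from typing import List

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train(model: nn.Module, loader: DataLoader, num_epochs: int = 10, lr: float = 0.001) -> List[float]:
    """Train with Adam and cross-entropy loss; return the mean loss of each epoch."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    epoch_losses = []
    for epoch in range(num_epochs):
        model.train()
        running, seen = 0.0, 0
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)  # raw logits in, class indices as targets
            loss.backward()
            optimizer.step()
            running += loss.item() * labels.size(0)
            seen += labels.size(0)
        epoch_losses.append(running / seen)
        print(f"epoch {epoch + 1}/{num_epochs}: loss {epoch_losses[-1]:.4f}")
    return epoch_losses


# Stand-in data and model (assumption): a 2-class toy problem instead of the art images and ViT.
torch.manual_seed(0)
features = torch.randn(256, 16)
targets = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, targets), batch_size=32, shuffle=True)
losses = train(nn.Linear(16, 2), loader)
```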
## Evaluation

On our dataset, the code takes about 15-20 minutes to run on the following hardware: Intel Core i9-13900 CPU, 32 GB RAM, and an NVIDIA RTX 3080 GPU. Your mileage may vary.

### Testing Data, Factors & Metrics

- precision
- recall
- F1
- confusion matrix
- accuracy

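All of the listed metrics follow from the four cells of the binary confusion matrix. The helper below is a minimal pure-Python sketch; scikit-learn's `sklearn.metrics` module (`confusion_matrix`, `precision_score`, `recall_score`, `f1_score`, `accuracy_score`) computes the same quantities. Which class counts as positive is an assumption passed in as `positive`, since the card does not say whether recall refers to the fake or the real class; the example labels are illustrative.

```python
from typing import Dict, Hashable, Sequence


def binary_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                   positive: Hashable) -> Dict[str, object]:
    """Confusion matrix, accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        # rows: actual (negative, positive); columns: predicted (negative, positive)
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


truth = ["fake", "fake", "fake", "real", "real"]
preds = ["fake", "fake", "real", "fake", "real"]
print(binary_metrics(truth, preds, positive="fake"))
```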
### Results

- test accuracy = 0.92
- precision = 0.893
- recall = 0.957
- F1 = 0.924

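As a consistency check, the reported F1 score matches the harmonic mean of the reported precision $P$ and recall $R$:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.893 \times 0.957}{0.893 + 0.957} = \frac{1.709202}{1.850} \approx 0.924
$$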
#### Summary

This model performed by far the best of the approaches we tried (CNN, ResNet, and CNN + ELA, where ELA is error level analysis).