Update README.md
Browse files
README.md
CHANGED
@@ -11,111 +11,119 @@ tags:
- Diffusors
- GanDetectors
- Cifake
---
- **Funded by [optional]:** [Funding Source(s)]
- **Shared by [optional]:** [Organization/Individual(s) Sharing the Model]
- **Model type:** Vision Transformer (ViT)
- **Language(s) (NLP):** N/A
- **License:** Apache License 2.0
- **Finetuned from model [optional]:** [Base Pre-trained Model]
### Recommendations

Users should be aware of potential biases and limitations when using the model for classification tasks, and additional data sources may be necessary to mitigate biases.
## How to Get Started with the Model

Use the code below to get started with the model:

[Code Snippet for Model Usage]

## Training Details
The model was trained on the CIFake dataset, which contains real and AI-generated synthetic images for training the classification model.

### Training Procedure

#### Preprocessing [optional]

Data preprocessing techniques were applied to the training data, including normalization and data augmentation to improve model generalization.
#### Training Hyperparameters

- **Training regime:** Fine-tuning with a learning rate of 0.0000001
- **Batch Size:** 64
- **Epochs:** 100

#### Speeds, Sizes, Times [optional]

- **Training Time:** 1 hr 36 min
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a separate test set from the CIFake dataset.

#### Factors

The evaluation considered factors such as class imbalance and dataset diversity.

#### Metrics
- Diffusors
- GanDetectors
- Cifake
base_model:
- google/vit-base-patch16-224
---
# AI Guard Vision Model Card

[](LICENSE)
## Overview

**AI Guard Vision** is a Vision Transformer (ViT)-based image classifier whose primary objective is to accurately distinguish real images from AI-generated synthetic ones. It addresses the growing challenge of detecting manipulated or fake visual content, helping preserve trust and integrity in digital media.
## Model Summary

- **Model Type:** Vision Transformer (ViT) – `vit-base-patch16-224`
- **Objective:** Real vs. AI-generated image classification
- **License:** Apache 2.0
- **Fine-tuned From:** `google/vit-base-patch16-224`
- **Training Dataset:** [CIFake Dataset](https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images)
- **Developer:** Aashish Kumar, IIIT Manipur

## Applications & Use Cases
- **Content Moderation:** Identifying AI-generated images across media platforms.
- **Digital Forensics:** Verifying the authenticity of visual content for investigative purposes.
- **Trust Preservation:** Helping maintain the integrity of digital ecosystems by combating misinformation spread through fake images.

## How to Use the Model
```python
from transformers import AutoImageProcessor, ViTForImageClassification
import torch
from PIL import Image
from pillow_heif import register_heif_opener, register_avif_opener

# Register HEIF/AVIF decoders so Pillow can open those formats as well.
register_heif_opener()
register_avif_opener()

def get_prediction(img):
    image = Image.open(img).convert('RGB')
    image_processor = AutoImageProcessor.from_pretrained("AashishKumar/AIvisionGuard-v2")
    model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")
    inputs = image_processor(image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Top-2 classes ranked by raw logit (the model has exactly two labels).
    top2_labels = logits.topk(2).indices.squeeze().tolist()
    top2_scores = logits.topk(2).values.squeeze().tolist()

    response = [{"label": model.config.id2label[label], "score": score}
                for label, score in zip(top2_labels, top2_scores)]
    return response
```
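Note that the `score` values returned here are raw logits rather than probabilities. A minimal usage sketch (the file path is an illustrative placeholder, and the label names come from `model.config.id2label`, so they depend on the checkpoint):

```python
import torch
import torch.nn.functional as F

# "example.jpg" is a hypothetical path; HEIF/AVIF files also work
# thanks to the openers registered above.
preds = get_prediction("example.jpg")
print(preds)

# Softmax over the two logits turns the scores into probabilities.
scores = torch.tensor([p["score"] for p in preds])
print(F.softmax(scores, dim=0))
```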
## Dataset Information

The model was fine-tuned on the **CIFake dataset**, which contains both real and AI-generated synthetic images:

- **Real Images:** Collected from the CIFAR-10 dataset.
- **Fake Images:** Generated using Stable Diffusion 1.4.
- **Training Data:** 100,000 images (50,000 per class).
- **Testing Data:** 20,000 images (10,000 per class).

## Model Architecture
- **Transformer Encoder Layers:** Use self-attention to capture global relationships across image patches.
- **Positional Encodings:** Preserve the spatial structure of the patch sequence.
- **Pretrained Weights:** Pretrained on ImageNet-21k and fine-tuned on ImageNet 2012 for enhanced performance.

### Why Vision Transformer?

- **Scalability and Performance:** Excels at high-level global feature extraction.
- **State-of-the-Art Accuracy:** Leverages transformers to outperform traditional CNN models.
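The backbone described above can be verified directly from the loaded checkpoint's configuration; a minimal sketch (for a standard ViT-Base/16 backbone this prints 12 layers, 12 heads, hidden size 768, 16x16 patches, and 224x224 inputs):

```python
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")
cfg = model.config

# Encoder depth, attention heads, and embedding width of the ViT backbone.
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.hidden_size)
# Patch and input resolution.
print(cfg.patch_size, cfg.image_size)
# The two labels predicted by the fine-tuned classification head.
print(cfg.id2label)
```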
## Training Details

- **Learning Rate:** 0.0000001 (1e-7)
- **Batch Size:** 64
- **Epochs:** 100
- **Training Time:** 1 hr 36 min
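The training script itself is not included in this card; what follows is a minimal fine-tuning sketch consistent with the hyperparameters above, assuming the CIFake images are arranged locally in an `imagefolder` layout such as `cifake/train/REAL` and `cifake/train/FAKE` (paths and layout are illustrative):

```python
import torch
from datasets import load_dataset
from transformers import (AutoImageProcessor, Trainer, TrainingArguments,
                          ViTForImageClassification)

# Illustrative layout: cifake/train/{REAL,FAKE}/*.png
ds = load_dataset("imagefolder", data_dir="cifake")

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,
    ignore_mismatched_sizes=True,  # swap the 1000-class ImageNet head for 2 classes
)

def transform(batch):
    # Resize and normalize PIL images to the 224x224 tensors the ViT expects.
    batch["pixel_values"] = processor(
        [img.convert("RGB") for img in batch["image"]], return_tensors="pt"
    )["pixel_values"]
    return batch

ds = ds.with_transform(transform)

def collate(examples):
    return {
        "pixel_values": torch.stack([e["pixel_values"] for e in examples]),
        "labels": torch.tensor([e["label"] for e in examples]),
    }

args = TrainingArguments(
    output_dir="aivisionguard-ft",
    learning_rate=1e-7,              # as reported above
    per_device_train_batch_size=64,  # batch size 64
    num_train_epochs=100,            # 100 epochs
    remove_unused_columns=False,     # keep the "image" column for the transform
)

trainer = Trainer(model=model, args=args,
                  train_dataset=ds["train"], data_collator=collate)
trainer.train()
```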
## Evaluation Metrics

The model was evaluated using the CIFake test dataset, with the following metrics:
- **Accuracy:** 92%
- **F1 Score:** 0.89
- **Precision:** 0.85
- **Recall:** 0.88

| Model          | Accuracy | F1-Score | Precision | Recall   |
|----------------|----------|----------|-----------|----------|
| Baseline       | 85%      | 0.82     | 0.78      | 0.80     |
| Augmented      | 88%      | 0.85     | 0.83      | 0.84     |
| Fine-tuned ViT | **92%**  | **0.89** | **0.85**  | **0.88** |
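These figures follow the standard binary-classification definitions; a minimal sketch for recomputing them, where `y_true` and `y_pred` are placeholders for the test labels and the model's argmax predictions:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def report(y_true, y_pred):
    # y_true / y_pred: lists of 0/1 labels over the 20,000 CIFake test images.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```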
## System Workflow

- **Frontend:** ReactJS
- **Backend:** Python Flask
- **Database:** PostgreSQL
- **Model:** Deployed via the PyTorch and TensorFlow frameworks
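The deployed service itself is not published in this card; a minimal Flask sketch of how the backend could wrap the `get_prediction` helper from the usage section (the `/predict` route and the `file` form field are illustrative assumptions, not the documented API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Assumed form field name "file"; any PIL-readable upload works.
    uploaded = request.files["file"]
    return jsonify(get_prediction(uploaded))

if __name__ == "__main__":
    app.run(port=5000)
```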
## Strengths and Limitations

### Strengths

- **High Accuracy:** Achieves state-of-the-art performance in distinguishing real and synthetic images.
- **Pretrained on ImageNet-21k:** Allows for efficient transfer learning and robust generalization.

### Limitations

- **Synthetic Image Diversity:** The model may underperform on novel or unseen synthetic images that differ significantly from the training data.
- **Data Bias:** Like all machine learning models, its predictions may reflect biases present in the training data.
## Conclusion and Future Work

This model provides a highly effective tool for detecting AI-generated synthetic images and has promising applications in content moderation, digital forensics, and trust preservation. Future improvements may include:

- **Hybrid Architectures:** Combining transformers with convolutional layers for improved performance.
- **Multimodal Detection:** Incorporating additional modalities (e.g., metadata or contextual information) for more comprehensive detection.