kricko
/

Auditor_Model

adversarial-attacks


kricko committed 29 days ago

Commit

3b78c5c

·

verified ·

1 Parent(s): 3b0d050

Add previous model documentation

Files changed (1)

auditor_prev_model_card.md +40 -0

auditor_prev_model_card.md ADDED

	@@ -0,0 +1,40 @@

+---
+language: en
+datasets:
+- ShreyashDhoot/reward_model
+- BaiqiL/NaturalBench_Images
+- x1101/nsfw-full
+- Subh775/WeaponDetection
+---
+# Adversarial Image Auditor (Previous Iteration)
+This model is a deep learning-based image auditor for AI safety. It evaluates an image together with its aligned text prompt along five axes:
+1. **Adversarial Safety (Binary):** Predicting whether an image is Safe or Unsafe.
+2. **Category Classification:** Assigning each image to one of the `Safe`, `Hate`, `Harassment`, `Sexual`, `Violence`, `Illegal Activity`, or `Sensitive IP` categories.
+3. **Artifact / Seam Quality:** Assessing the quality of image manipulation to detect adversarial seams or diffusion artifacts.
+4. **Relative Adversarial Score:** Predicting a continuous metric of adversarial strength in an image.
+5. **Prompt Faithfulness:** Calculating CLIP-based cosine similarity between the image embedding and text token embeddings.
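The prompt-faithfulness axis above reduces to a cosine similarity between two vectors. The sketch below is a minimal, standard-library illustration of that calculation, not the model's own code. The card does not say how the per-token text embeddings are combined, so mean pooling is an assumption here, and `mean_pool`, `cosine_similarity`, and the toy vectors are hypothetical names and values.

```python
import math

def mean_pool(token_embeddings):
    """Average a list of equal-length token vectors into one vector (assumed pooling)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / len(token_embeddings)
            for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones would come from the encoders.
image_embedding = [0.2, 0.7, 0.1]
text_token_embeddings = [[0.3, 0.6, 0.0], [0.2, 0.6, 0.1]]

faithfulness = cosine_similarity(image_embedding, mean_pool(text_token_embeddings))
print(round(faithfulness, 3))
```

A score near 1 means the image and prompt embeddings point in nearly the same direction; scores near 0 or below suggest the image does not match the prompt.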
+## Architecture
+This previous iteration is an end-to-end auditor that combines convolutional and attention-based components.
+- **Vision Backbone:** Pretrained DenseNet121, modified to output feature grids that form dense 2x2 local spatial maps.
+- **Text Conditioning:** A simple token embedding of the prompt, merged directly with the image features through a simple linear transformation.
+- **Output:** Multiple specialized heads: regression heads for seam quality and relative adversarial score, and classification heads for the binary safety label and the multi-class category.
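To make the classification heads concrete, here is a hedged sketch of how raw head outputs could be turned into labels. It assumes the binary head emits a single logit where larger means "unsafe" and the category head emits seven logits in the order listed earlier in this card; neither detail is confirmed by the card, and `decode_heads` and its output keys are hypothetical.

```python
import math

# Category order is an assumption: it follows the order given in this card.
CATEGORIES = ["Safe", "Hate", "Harassment", "Sexual",
              "Violence", "Illegal Activity", "Sensitive IP"]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softmax(logits):
    """Softmax with max-subtraction to avoid overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_heads(binary_logit, category_logits, threshold=0.5):
    """Map assumed raw head outputs to a readable result dictionary."""
    if len(category_logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    p_unsafe = sigmoid(binary_logit)
    probs = softmax(category_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "unsafe": p_unsafe >= threshold,
        "p_unsafe": p_unsafe,
        "category": CATEGORIES[best],
        "category_confidence": probs[best],
    }

# Hypothetical logits for illustration only.
decoded = decode_heads(2.0, [0.1, 0.3, 0.2, 0.1, 3.5, 0.4, 0.2])
print(decoded["unsafe"], decoded["category"])
```

The regression heads (seam quality, relative adversarial score) would typically be read directly as scalars, so they are left out of this sketch.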
+## Usage
+Run the model through its inference script, `auditor_inference.py`:
+```python
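+# audit_image is provided by the repository's auditor_inference.py script
+from auditor_inference import audit_image
+
+results = audit_image(
+    model_path="auditor_prev_best.pth",  # checkpoint for this previous iteration
+    image_path="example.jpg",            # image to audit
+    prompt="A cute cat",                 # text prompt paired with the image
+)
+print(results)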
+```