kricko committed on
Commit 3b78c5c · verified · 1 Parent(s): 3b0d050

Add previous model documentation

Files changed (1)
  1. auditor_prev_model_card.md +40 -0
auditor_prev_model_card.md ADDED
@@ -0,0 +1,40 @@
---
language: en
datasets:
- ShreyashDhoot/reward_model
- BaiqiL/NaturalBench_Images
- x1101/nsfw-full
- Subh775/WeaponDetection
---

# Adversarial Image Auditor (Previous Iteration)

This model is a deep-learning image auditor for AI safety. It evaluates an image, together with its paired text prompt, along five axes:
1. **Adversarial Safety (Binary):** Predicts whether an image is Safe or Unsafe.
2. **Category Classification:** Assigns each image to one of seven categories: `Safe`, `Hate`, `Harassment`, `Sexual`, `Violence`, `Illegal Activity`, or `Sensitive IP`.
3. **Artifact / Seam Quality:** Scores the quality of image manipulation to detect adversarial seams or diffusion artifacts.
4. **Relative Adversarial Score:** Predicts a continuous measure of adversarial strength in an image.
5. **Prompt Faithfulness:** Computes the CLIP-based cosine similarity between the image embedding and the text token embeddings.

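The prompt-faithfulness axis above rests on cosine similarity, which can be sketched in a few lines. This is a minimal illustration, not the model's actual code: the vectors are toy stand-ins for CLIP embeddings, the function names are hypothetical, and averaging the per-token similarities is an assumption, since the card does not say how they are aggregated.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def prompt_faithfulness(image_embedding, token_embeddings):
    """Mean cosine similarity between the image vector and each token vector.

    Taking the mean is an assumption made for illustration only.
    """
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    sims = [cosine_similarity(image_embedding, t) for t in token_embeddings]
    return sum(sims) / len(sims)


# Toy stand-ins for real CLIP embeddings (hypothetical values).
image_vec = [1.0, 0.0, 0.0]
token_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score = prompt_faithfulness(image_vec, token_vecs)
```

Cosine similarity depends only on direction, so rescaling either embedding leaves the score unchanged.
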
## Architecture

This previous iteration is an end-to-end auditor that combines convolutional and attention-based components.
- **Vision Backbone:** Pretrained DenseNet121, modified to output feature grids that form dense 2x2 local spatial maps.
- **Text Conditioning:** A simple embedding of the tokenized prompt, merged directly with the image features through a simple linear transformation.
- **Output:** Multiple specialized heads: regression heads (seam quality, relative adversarial score) and classification heads (binary, multi-class).

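The output heads listed above mix classification and regression, so their raw values need different post-processing. The sketch below shows one plausible way to decode them. It assumes the binary head emits a single logit where a positive value means Unsafe, the category head emits seven logits in the order the card lists the categories, and the regression heads emit plain scalars. None of these details is confirmed by the card, and `decode_heads` is a hypothetical helper.

```python
import math

# Category order as listed in this card; whether the checkpoint uses the
# same index order is an assumption.
CATEGORIES = [
    "Safe", "Hate", "Harassment", "Sexual",
    "Violence", "Illegal Activity", "Sensitive IP",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_heads(binary_logit, category_logits, seam_quality,
                 relative_score, threshold=0.5):
    """Turn raw head outputs into a readable dict (hypothetical layout)."""
    if len(category_logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    p_unsafe = sigmoid(binary_logit)  # assumes positive logit means Unsafe
    probs = softmax(category_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "unsafe_probability": p_unsafe,
        "is_unsafe": p_unsafe >= threshold,
        "category": CATEGORIES[best],
        "category_probability": probs[best],
        "seam_quality": float(seam_quality),                # regression, passed through
        "relative_adversarial_score": float(relative_score),  # regression, passed through
    }


# Made-up head outputs for illustration.
example = decode_heads(2.0, [0.1, 0.0, 0.2, 0.1, 3.0, 0.0, 0.3], 0.42, 0.77)
```
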
## Usage

Load the model through its inference script, `auditor_inference.py`:

```python
from auditor_inference import audit_image

# Audit a single image against the prompt it was generated from.
results = audit_image(
    model_path="auditor_prev_best.pth",
    image_path="example.jpg",
    prompt="A cute cat",
)

print(results)
```
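
To audit several images with the same prompt, the call above can be wrapped in a small loop. This is a sketch under stated assumptions: `audit_directory` and `IMAGE_SUFFIXES` are hypothetical names, and the audit function is passed in as an argument so that the only thing relied on is the keyword signature shown in the usage example.

```python
from pathlib import Path

# File extensions treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def audit_directory(audit_fn, model_path, image_dir, prompt):
    """Run audit_fn on every image file directly inside image_dir.

    audit_fn must accept the keyword arguments from the usage example:
    model_path, image_path, and prompt. Returns a dict that maps each
    file name to whatever audit_fn returned for it.
    """
    results = {}
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = audit_fn(
                model_path=model_path,
                image_path=str(path),
                prompt=prompt,
            )
    return results


# Intended use with the real script (not run here):
#   from auditor_inference import audit_image
#   report = audit_directory(audit_image, "auditor_prev_best.pth", "images", "A cute cat")
```

Because `audit_image` takes a `model_path`, it may reload the checkpoint on every call; the card does not say, so this loop is best suited to small batches.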