kricko committed 1308e1f (verified) · 1 parent: 4c12567

Update repository landing page

Files changed (1): README.md (added, +41 -0)
 
---
language: en
datasets:
- ShreyashDhoot/reward_model
- BaiqiL/NaturalBench_Images
- x1101/nsfw-full
- Subh775/WeaponDetection
---

# Adversarial Image Auditor

This model is a deep-learning image auditor for AI safety. It evaluates an image, together with the text prompt paired with it, along five distinct axes:

1. **Adversarial Safety (Binary):** Predicts whether the image is Safe or Unsafe.
2. **Category Classification:** Assigns the image to one of four categories: `Safe`, `NSFW`, `Gore`, or `Weapons`.
3. **Artifact / Seam Quality:** Assesses the quality of any image manipulation, detecting adversarial seams or diffusion artifacts.
4. **Relative Adversarial Score:** Predicts a continuous measure of the image's adversarial strength.
5. **Prompt Faithfulness (Contrastive InfoNCE):** Computes a temperature-scaled contrastive probability that the image is faithful to its prompt.
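
The faithfulness axis can be illustrated with a small, dependency-free sketch. It is not the auditor's code: it assumes the score is a softmax over cosine similarities between one image embedding and a set of candidate prompt embeddings, with the paired prompt at index `target`. The names `cosine` and `infonce_faithfulness` are illustrative, and the default temperature of 0.07 is a hypothetical value, since none is stated here.

```python
import math
from typing import Sequence

# Illustrative sketch of an InfoNCE-style faithfulness score; not the auditor's code.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def infonce_faithfulness(
    image_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    target: int,
    temperature: float = 0.07,  # hypothetical default; not stated in the README
) -> float:
    """Softmax probability that text_embs[target] is the image's prompt,
    computed over temperature-scaled cosine similarities (InfoNCE form)."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    return exps[target] / sum(exps)


image = [1.0, 0.0]
prompts = [[1.0, 0.0], [0.0, 1.0]]  # paired prompt first, unrelated prompt second
print(round(infonce_faithfulness(image, prompts, 0, temperature=1.0), 4))  # 0.7311
print(round(infonce_faithfulness(image, prompts, 0), 4))                   # 1.0
```

Lowering the temperature sharpens the softmax, so the same similarity gap that gives about 0.73 at temperature 1.0 gives a probability close to 1 at 0.07.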

## Architecture

The auditor uses contrastive image–text alignment for multimodal safety. Its main components are:

- **Vision Backbone:** A pretrained DenseNet121, modified to expose its feature grid, from which dense 2×2 local spatial maps are built.
- **Text Conditioning:** A simple text tokenizer feeding a Pre-LayerNorm cross-attention block that applies a `key_padding_mask`, so padding tokens are excluded from attention.
- **FiLM Modulation:** Feature-wise linear modulation (FiLM) conditions the adversarial layers directly on diffusion-timestep tokens and on projections of the text features.
- **Output:** Decoupled heads for the safety axes, producing GradCAM-based bounding-box predictions, a continuous InfoNCE faithfulness score, and the safety classifications.
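
Two of these components, FiLM and Pre-LayerNorm cross-attention with a key padding mask, can be sketched without any dependencies. This is not the auditor's implementation: it uses one attention head and a single image query vector, `layer_norm` omits the learnable scale and shift, and FiLM's `gamma`/`beta` are passed in directly, whereas in the model they would be predicted from the timestep tokens and text projections. All function names are hypothetical. The mask follows the PyTorch `nn.MultiheadAttention` convention, where `True` marks a padding position.

```python
import math
from typing import List, Sequence

# Hypothetical single-head sketch; not the auditor's implementation.


def film(x: Sequence[float], gamma: Sequence[float], beta: Sequence[float]) -> List[float]:
    """FiLM: per-channel affine modulation, gamma * x + beta."""
    return [g * v + b for v, g, b in zip(x, gamma, beta)]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm without learnable scale/shift, for brevity."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def masked_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    key_padding_mask: Sequence[bool],
) -> List[float]:
    """Scaled dot-product attention of one query; True in the mask = padding."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [
        -math.inf if pad else scale * sum(q * k for q, k in zip(query, key))
        for key, pad in zip(keys, key_padding_mask)
    ]
    peak = max(scores)  # assumes at least one token is not padding
    weights = [math.exp(s - peak) for s in scores]  # padding -> exp(-inf) == 0.0
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def pre_ln_cross_attention(
    image_feat: Sequence[float],
    text_tokens: Sequence[Sequence[float]],
    key_padding_mask: Sequence[bool],
) -> List[float]:
    """Pre-LayerNorm block: normalize inputs first, attend, then add the residual."""
    q = layer_norm(image_feat)
    kv = [layer_norm(t) for t in text_tokens]
    attended = masked_attention(q, kv, kv, key_padding_mask)
    return [x + a for x, a in zip(image_feat, attended)]


# Usage: modulate an image feature, then let it attend to two real text tokens
# and one padding slot, whose contents are ignored.
text = [[0.5, 1.0], [1.0, 0.2], [9.0, -9.0]]
mask = [False, False, True]
print(pre_ln_cross_attention(film([1.0, 2.0], [2.0, 0.5], [0.0, 1.0]), text, mask))
```

Setting a padded score to negative infinity before the softmax gives that token exactly zero weight, which is why changing the padding slot's contents leaves the output unchanged.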

## Usage

Run the model through its inference script, `auditor_inference.py`:

```python
from auditor_inference import audit_image

# Audit one image against the prompt paired with it.
results = audit_image(
    model_path="auditor_new_best.pth",  # trained auditor checkpoint
    image_path="example.jpg",           # image to audit
    prompt="A cute cat",                # text prompt paired with the image
)

print(results)
```
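
To audit several images, the `audit_image` call can be wrapped in a loop. The sketch below is hypothetical: `audit_folder` and the file-name-to-prompt mapping are illustrative, and the audit function is passed in as a parameter so the loop can be exercised without the checkpoint. It relies only on the three keyword arguments used above and makes no assumption about the structure of each result.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def audit_folder(
    audit_fn: Callable[..., Any],
    model_path: str,
    image_dir: str,
    prompts: Dict[str, str],
) -> Dict[str, Any]:
    """Hypothetical helper: call audit_fn once per (file name, prompt) pair.

    audit_fn should accept the keyword arguments model_path, image_path and
    prompt, like audit_image; each result is stored unchanged.
    """
    results: Dict[str, Any] = {}
    for name, prompt in prompts.items():
        results[name] = audit_fn(
            model_path=model_path,
            image_path=str(Path(image_dir) / name),
            prompt=prompt,
        )
    return results


# Example (directory, file names and prompts are placeholders):
# from auditor_inference import audit_image
# report = audit_folder(audit_image, "auditor_new_best.pth", "images",
#                       {"example.jpg": "A cute cat"})
```

Because `audit_image` receives `model_path` on every call, it may reload the checkpoint each time; check `auditor_inference.py` before auditing large folders.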