AnodHuang committed on
Commit a2b0e14 · verified · 1 Parent(s): 3c91d0e

Update README.md

Files changed (1)
  1. README.md +50 -3
README.md CHANGED
@@ -1,3 +1,50 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - AnodHuang/AMVD_AS
+ base_model:
+ - MIT/ast-finetuned-audioset-10-10-0.4593
+ ---
+ # AST-AMVD-SAD-v1
+ ## Description
+ A fine-tuned audio classification model for detecting AI-generated audio content.
+ ## Authors
+ - Kunyang Huang (huangku@kean.edu)
+ - Bin Hu (binhu.philip@gmail.com)
+ ## Model Details
+ ### Model Description
+ - Architecture: Audio Spectrogram Transformer (AST), fine-tuned from MIT/ast-finetuned-audioset-10-10-0.4593
+ - Input: Audio waveforms converted to mel-spectrogram representations
+ - Output: Four-class classification for audio authenticity detection
+ ### Intended Use
+ **This model is designed to:**
+ - Detect AI-generated audio content
+ - Classify audio into four authenticity classes:
+   - Class 0 (H): Real Human Audio
+   - Class 1 (C): AI Cloned Audio
+   - Class 2 (A): AI Generated Audio
+   - Class 3 (Combined): Mixed Human/AI Audio
+ - Support the following primary use cases:
+   - Content authenticity verification
+   - AI-generated content detection systems
+   - Audio forensics applications
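The four-class output listed above can be turned into a readable prediction with a small decoding step. The following is a minimal sketch, assuming the model emits one raw logit per class in the order Class 0 to Class 3 (the README does not state the output format); `ID2LABEL`, `softmax`, `decode_prediction`, and the example logits are illustrative names and values, not part of the published model.

```python
import math

# Label mapping copied from the class list in the README.
ID2LABEL = {
    0: "H (Real Human Audio)",
    1: "C (AI Cloned Audio)",
    2: "A (AI Generated Audio)",
    3: "Combined (Mixed Human/AI Audio)",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (class_id, label, probability) for the highest-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, ID2LABEL[class_id], probs[class_id]


# Hypothetical logits for illustration only (not real model output).
example_logits = [0.2, 3.1, -1.0, 0.5]
class_id, label, prob = decode_prediction(example_logits)
print(class_id, label, round(prob, 3))
```

Keeping the mapping in one dictionary makes it easy to check against the model's own configuration if that turns out to use a different class order.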
+ ### Training Data
+ - Dataset: AMVD_AS (AnodHuang/AMVD_AS)
+ - Data Composition:
+   - Balanced samples across four categories
+   - Contains both synthetic and genuine human audio samples
+ ## Training Procedure
+ ### Fine-tuning Parameters
+ - Base Model: MIT/ast-finetuned-audioset-10-10-0.4593
+ - Learning Rate: 4e-5 → 1e-5 (linear decay)
+ - Total Training Steps: 25,000
+ - Batch Size: 32
+ - Warmup Steps: 5,000
+ - Weight Decay: 0.01
+ - Gradient Clip Norm: 1.0
+ - Training Duration: ~4.5 hours (A100 GPU)
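The learning-rate figures above can be read as a warmup-then-decay schedule. The sketch below is one plausible interpretation, not the authors' training code: it assumes a linear warmup from 0 to 4e-5 over the first 5,000 steps, followed by a linear decay that reaches 1e-5 at step 25,000. The README gives only the endpoints, the step counts, and the phrase "linear decay", so the warmup shape and the point where decay begins are assumptions, and `learning_rate` is a hypothetical helper.

```python
PEAK_LR = 4e-5        # from the README
FINAL_LR = 1e-5       # from the README
WARMUP_STEPS = 5_000  # from the README
TOTAL_STEPS = 25_000  # from the README


def learning_rate(step):
    """Learning rate at a given optimizer step (0-indexed).

    Assumptions (not stated in the README): warmup is linear from 0 to
    PEAK_LR, and the linear decay to FINAL_LR starts when warmup ends.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    if step >= TOTAL_STEPS:
        return FINAL_LR
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR + (FINAL_LR - PEAK_LR) * progress


# Print the schedule at a few representative steps.
for step in (0, 2_500, 5_000, 15_000, 25_000):
    print(step, learning_rate(step))
```

Under these assumptions, 25,000 steps at a batch size of 32 correspond to 800,000 training samples processed in total.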
+ ## Evaluation
+ ### Validation Performance
+ - Training Loss: 0.0874
+ - Gradient Norm: 0.000075778
+ - LR Stability: 1e-5