Simma7/audio_model

Audio Classification

audio - wav2vec2 - deepfake-detection - synthetic-speech - tts - voice-cloning


Simma7 committed on Apr 5

Commit 7ca74f4 · verified · 1 Parent(s): 1b63c88

Update README.md

Files changed (1)

README.md +5 -12

README.md CHANGED

@@ -1,26 +1,22 @@
-metadata
-library_name: transformers
-base_model: Gustking/wav2vec2-large-xlsr-deepfake-audio-classification
-base_model_relation: finetune
 license: apache-2.0
-language:
-  - en
 pipeline_tag: audio-classification
 tags:
-  - audio
   - wav2vec2
   - deepfake-detection
   - synthetic-speech
   - tts
   - voice-cloning
-datasets:
-  - garystafford/deepfake-audio-detection
 metrics:
   - accuracy
   - f1
   - precision
   - recall
   - roc_auc
 Deepfake Audio Detection Model
 Fine-tuned Wav2Vec2 model for detecting AI-generated speech. Determines if audio was spoken by a human or created by AI text-to-speech/voice cloning software.
@@ -56,7 +52,6 @@ import librosa
 from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
 # Load model and feature extractor
-model_name = "garystafford/wav2vec2-deepfake-voice-detector"
 model = AutoModelForAudioClassification.from_pretrained(model_name)
 feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
@@ -105,8 +100,6 @@ dim=-1: Applies softmax across classes for each sample, not across samples
 Batch Processing Example
 import glob
-audio_files = glob.glob("audio_folder/*.wav")
 for audio_path in audio_files:
     audio, _ = librosa.load(audio_path, sr=16000, mono=True)
     inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt", padding=True)

+---
 license: apache-2.0
+language: en
 pipeline_tag: audio-classification
+library_name: transformers
 tags:
+- audio
   - wav2vec2
   - deepfake-detection
   - synthetic-speech
   - tts
   - voice-cloning
 metrics:
   - accuracy
   - f1
   - precision
   - recall
   - roc_auc
+---
 Deepfake Audio Detection Model
 Fine-tuned Wav2Vec2 model for detecting AI-generated speech. Determines if audio was spoken by a human or created by AI text-to-speech/voice cloning software.
 from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
 # Load model and feature extractor
 model = AutoModelForAudioClassification.from_pretrained(model_name)
 feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
 Batch Processing Example
 import glob
 for audio_path in audio_files:
     audio, _ = librosa.load(audio_path, sr=16000, mono=True)
     inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
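
The hunk context in the diff notes that `dim=-1` applies softmax across classes for each sample, not across samples. A minimal sketch of that behavior follows, using only the standard library and no model download. The logits are made-up values, `softmax_last_dim` is a hypothetical helper rather than anything from the README, and no assumption is made about which class index means "real" or "fake".

```python
import math


def softmax_last_dim(logits):
    """Apply softmax along the last axis of a batch-by-classes nested list.

    Each row (one audio clip) is normalized independently, which mirrors
    what softmax with dim=-1 does for a 2-D tensor of logits.
    """
    result = []
    for row in logits:
        # Subtract the row maximum for numerical stability.
        row_max = max(row)
        exps = [math.exp(value - row_max) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# Hypothetical logits for two clips and two classes (illustrative values only).
logits = [[2.0, 0.5], [-1.0, 1.0]]
probs = softmax_last_dim(logits)

for clip_index, row in enumerate(probs):
    # Each row sums to 1 because normalization is per clip, not per class.
    print(clip_index, [round(p, 3) for p in row], "row sum:", round(sum(row), 6))
```

Because each row is normalized on its own, the probabilities for one clip are unaffected by the other clips in the batch.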