Update app.py
app.py
CHANGED
@@ -7,32 +7,15 @@ import os
 client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
 
 system_prompt = """
-
-
-
-
-
-*
-
-
-
-
-**3. CORE ANALYSIS PROTOCOL (Your "Mental Checklist"):**
-When analyzing any audio, you will meticulously evaluate it against these key acoustic markers. This is your non-negotiable checklist.
-* **Frequency Range & Cutoffs:** Is there a full, natural range of frequencies, or are the very high (>10kHz) and very low (<200Hz) frequencies noticeably absent or attenuated (a key sign of speaker playback)?
-* **Reverberation & Acoustic Space:** Does the voice's echo/reverb match the ambient noise, indicating a single acoustic environment (`live`)? Or is there a "two-room" effect, where the voice has a separate, distinct reverb from the recording environment (`replayed`)?
-* **Harmonic Distortion:** Are there any unnatural, thin, or "buzzy" harmonic frequencies present that are characteristic of a physical speaker cone vibrating?
-* **Transient Response:** Are plosives ('p', 'b') and consonants ('t', 'k') sharp and clear (`live`), or are they slightly smeared, softened, or blunted (`replayed`)?
-* **Presence & Proximity:** Does the voice sound "present" and close to the microphone, with audible breath sounds (`live`), or does it sound distant, flat, and lacking in dynamic intimacy (`replayed`)?
-
-**4. OUTPUT FORMAT:**
-Your response MUST strictly follow this two-part format:
-* **Part 1: Classification:** A single-word label: either `live Human Speech` or `replayed`.
-* **Part 2: Justification:** A concise, two-line reasoning based *exclusively* on your Core Analysis Protocol. The first line should state the primary evidence, and the second should state the secondary or supporting evidence.
-
-**5. CONFIDENCE SCORE (For Testing & Validation):**
-When requested during a testing phase, you will add a third part to your output:
-* **Part 3: Confidence Score:** A percentage indicating your confidence in the classification (e.g., "Confidence: 95%"). This score should be higher when multiple strong acoustic markers point to the same conclusion.
+You are a voice authenticity detection assistant.
+Your job is to analyze a given audio sample and determine whether the speech in the audio was:
+- Spoken directly into a microphone (label: live Human Speech)
+- Replayed from a speaker and recorded through another device (label: replayed)
+- Do not explain your reasoning.
+- Only respond with a *single word label*: either live Human Speech or replayed.
+Please listen to the attached audio file and respond with the correct label.
+Listen to the words carefully and try to distinguish a natural human conversational voice from a voice that was recorded on another device and replayed here.
+There is only a subtle difference that you have to pick up on.
 """
 
 def gemini_voice_classification(audio_path):
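The diff cuts off right where `gemini_voice_classification` begins. A minimal sketch of how that function might be completed with the `google-genai` Python SDK is shown below; note that the `gemini-2.0-flash` model name, the `parse_label` helper, and the abbreviated prompt string are assumptions for illustration and are not part of the original change:

```python
import os

# Abbreviated stand-in for the full system_prompt defined in app.py.
system_prompt = (
    "You are a voice authenticity detection assistant. Respond with a single "
    "word label: either live Human Speech or replayed."
)

def parse_label(response_text):
    """Map the model's free-form reply onto one of the two expected labels.

    Hypothetical helper: the original app.py may return response.text directly.
    """
    text = response_text.strip().lower()
    if "replayed" in text:
        return "replayed"
    if "live" in text:
        return "live Human Speech"
    raise ValueError(f"Unexpected model output: {response_text!r}")

def gemini_voice_classification(audio_path):
    # Imported lazily so this module still loads if the SDK is not installed.
    from google import genai

    client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
    # Upload the audio file, then ask for a single-word classification.
    audio_file = client.files.upload(file=audio_path)
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # assumed model name, not from the diff
        contents=[system_prompt, audio_file],
    )
    return parse_label(response.text)
```

Normalizing the reply in `parse_label` guards against the model adding stray whitespace or casing around the label, which matters if downstream code compares the result against exact strings.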