Spaces: EarthSpeciesProject/NatureLM-Audio

dianekim David committed 6 days ago

Commit abda905 (1 parent: a413555)

Update examples, Help tab, and css (#137)

- Trimmed long audio examples to 10s
- Updated task dropdown list to match core tasks
- Updated Help tab with new prompting tips
- Fixed border around chat component and waveform color

---------

Co-authored-by: David <david@earthspecies.org>
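
The commit message says the long example clips were trimmed to 10 s but does not say how. As a minimal sketch only, assuming a WAV input and Python's standard `wave` module (the helper name `trim_wav` is hypothetical and not part of this repository), trimming could look like the following. The MP3 assets touched by this commit would need an external tool or library instead.

```python
import io
import wave


def trim_wav(src, dst, max_seconds=10.0):
    """Copy at most `max_seconds` of audio from `src` to `dst`.

    `src` and `dst` may be file paths or binary file-like objects.
    Returns the number of frames written. Hypothetical helper for
    illustration; not the method used in the commit.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        max_frames = int(max_seconds * reader.getframerate())
        # Read only the leading frames; shorter files are copied whole.
        frames = reader.readframes(min(max_frames, reader.getnframes()))
        frame_size = reader.getsampwidth() * reader.getnchannels()

    with wave.open(dst, "wb") as writer:
        # The frame count in the header is corrected when the writer closes.
        writer.setparams(params)
        writer.writeframes(frames)

    return len(frames) // frame_size
```

For example, `trim_wav("long_clip.wav", "short_clip.wav")` would keep the first 10 seconds of a hypothetical `long_clip.wav`.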

Files changed (9)

README.md +2 -2
app.py +25 -18
assets/American Crow - Corvus brachyrhynchos.mp3 +2 -2
assets/Lazuli_Bunting_yell-YELLLAZB20160625SM303143.mp3 +2 -2
assets/nri-GreenTreeFrogEvergladesNP.mp3 +2 -2
assets/yell-YELLAMRO20160506SM3.mp3 +2 -2
static/help.html +39 -47
static/onboarding.html +2 -2
static/style.css +8 -2

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: NatureLM Audio Debug Private
 emoji: 🔈
 colorFrom: green
 colorTo: green
@@ -14,7 +14,7 @@ thumbnail: >-
 ---
-# NatureLM-audio Demo
 This is a demo of the NatureLM-audio model. Users can upload an audio file containing animal vocalizations and ask questions about them in a chat interface.

 ---
+title: NatureLM-Audio
 emoji: 🔈
 colorFrom: green
 colorTo: green
 ---
+# NatureLM-audio
 This is a demo of the NatureLM-audio model. Users can upload an audio file containing animal vocalizations and ask questions about them in a chat interface.

app.py CHANGED

@@ -252,29 +252,30 @@ def main() -> tuple[gr.Blocks, gr.themes.Base, str]:
     robin_audio = ASSETS_DIR / "yell-YELLAMRO20160506SM3.mp3"
     whale_audio = ASSETS_DIR / "Humpback Whale - Megaptera novaeangliae.wav"
     crow_audio = ASSETS_DIR / "American Crow - Corvus brachyrhynchos.mp3"
     examples = {
-        "Identifying Focal Species (Lazuli Bunting)": [
             str(laz_audio),
             "What is the common name for the focal species in the audio?",
         ],
-        "Caption the audio (Green Tree Frog)": [
             str(frog_audio),
-            "Caption the audio, using the common name for any animal species.",
         ],
         "Caption the audio (American Robin)": [
             str(robin_audio),
             "Caption the audio, using the scientific name for any animal species.",
         ],
-        "Identifying Focal Species (Megaptera novaeangliae)": [
-            str(whale_audio),
-            "What is the scientific name for the focal species in the audio?",
-        ],
-        "Speaker Count (American Crow)": [
             str(crow_audio),
-            "How many individuals are vocalizing in this audio?",
         ],
-        "Caption the audio (Humpback Whale)": [str(whale_audio), "Caption the audio."],
     }
     gr.set_static_paths(paths=[ASSETS_DIR])
@@ -313,6 +314,7 @@ def main() -> tuple[gr.Blocks, gr.themes.Base, str]:
                         interactive=True,
                         sources=["upload"],
                         type="filepath",
                     )
                     # Validate audio duration and sample rate on upload
                     audio_input.change(
@@ -332,15 +334,19 @@ def main() -> tuple[gr.Blocks, gr.themes.Base, str]:
                     task_dropdown = gr.Dropdown(
                         [
                             "What are the common names for the species in the audio, if any?",
-                            "Caption the audio, using the scientific name for any animal species.",
-                            "Caption the audio, using the common name for any animal species.",
-                            "What is the scientific name for the focal species in the audio?",
-                            "What is the common name for the focal species in the audio?",
-                            "What is the family of the focal species in the audio?",
                             "What is the genus of the focal species in the audio?",
-                            "What is the taxonomic name of the focal species in the audio?",
-                            "What call types are heard from the focal species in the audio?",
-                            "What is the life stage of the focal species in the audio?",
                         ],
                         label="Pre-Loaded Tasks",
                         info="Select a task, or write your own prompt below.",
@@ -504,6 +510,7 @@ def main() -> tuple[gr.Blocks, gr.themes.Base, str]:
                                         gr.Audio(
                                             filepath,
                                             label=label,
                                         )
             with gr.Tab("💡 Help"):

     robin_audio = ASSETS_DIR / "yell-YELLAMRO20160506SM3.mp3"
     whale_audio = ASSETS_DIR / "Humpback Whale - Megaptera novaeangliae.wav"
     crow_audio = ASSETS_DIR / "American Crow - Corvus brachyrhynchos.mp3"
+    walrus_audio = ASSETS_DIR / "Walrus - Odobenus rosmarus.wav"
     examples = {
+        "Species Identification (Lazuli Bunting)": [
             str(laz_audio),
             "What is the common name for the focal species in the audio?",
         ],
+        "Species Detection (Humpback Whale)": [
+            str(whale_audio),
+            "What are the common names for the species in the audio, if any?",
+        ],
+        "Call Type (Green Tree Frog)": [
             str(frog_audio),
+            "What type of call is the frog making in this recording?",
         ],
         "Caption the audio (American Robin)": [
             str(robin_audio),
             "Caption the audio, using the scientific name for any animal species.",
         ],
+        "Multiple Species Identification (American Crow)": [
             str(crow_audio),
+            "List the common names of all species vocalizing in this audio clip.",
         ],
+        "Taxonomy (Walrus)": [str(walrus_audio), "What is the taxonomic name of the focal species in the audio?"],
     }
     gr.set_static_paths(paths=[ASSETS_DIR])
                         interactive=True,
                         sources=["upload"],
                         type="filepath",
+                        waveform_options=gr.WaveformOptions(waveform_progress_color="#3b82f6"),
                     )
                     # Validate audio duration and sample rate on upload
                     audio_input.change(
                     task_dropdown = gr.Dropdown(
                         [
                             "What are the common names for the species in the audio, if any?",
+                            "What species is vocalizing in this audio recording? Common name?",
+                            "Which of these is the focal species in the audio? Options: [add your options here]",
+                            "List the scientific names of all species vocalizing in this audio clip.",
                             "What is the genus of the focal species in the audio?",
+                            "What is the common name of the species vocalizing in this audio recording?"
+                            " Provide your top 3 predictions in ranked order.",
+                            "What type of vocalization or call is this?",
+                            "Is the focal species an adult or juvenile?",
+                            "Caption the audio, using common names for any animal species.",
+                            "Is there a bird vocalizing in this recording? Answer: Yes or No.",
+                            "Based on the sounds, what habitat or environment do you think this was recorded in?",
+                            "How many individual vocalizations can you detect in this audio?",
+                            "First describe what you hear, then identify the species.",
                         ],
                         label="Pre-Loaded Tasks",
                         info="Select a task, or write your own prompt below.",
                                         gr.Audio(
                                             filepath,
                                             label=label,
+                                            waveform_options=gr.WaveformOptions(waveform_progress_color="#3b82f6"),
                                         )
             with gr.Tab("💡 Help"):

assets/American Crow - Corvus brachyrhynchos.mp3 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d0f76bff28d3e3021be495754b28ef3924bc32ff0c657b67bd4ee6bb177a1f8e
-size 2164626

 version https://git-lfs.github.com/spec/v1
+oid sha256:91b67d8df44c265cd8c2aece4078b00f3d67706156e40c0b799b0db42ea3aa09
+size 402834

assets/Lazuli_Bunting_yell-YELLLAZB20160625SM303143.mp3 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6a67960286021e58ffab2d3e4b67b7e20d08b530018c64c6afefe4aae5ff28be
-size 316920

 version https://git-lfs.github.com/spec/v1
+oid sha256:b266ceff4be4c64bc2abf668b1b7de17fdc17bb84e64102d045d885f8a6b989e
+size 244523

assets/nri-GreenTreeFrogEvergladesNP.mp3 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3004b02bd1793db81f5e6ddfe2f805dbd587af3c0d03edbedec2ad23e92660dd
-size 162234

 version https://git-lfs.github.com/spec/v1
+oid sha256:6015cac9794e7ca94c84df11927718395fc4cbef9dfcb997040a7149bb13df24
+size 155036

assets/yell-YELLAMRO20160506SM3.mp3 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7a2700bbe2233505ccf592e9e06a4b196a0feb4d2d7a4773ed5f2f110696a001
-size 598352

 version https://git-lfs.github.com/spec/v1
+oid sha256:57f2595f1db3d29be3eae5908964f1b61f65ed375b14f68f6b69f6f1f78f7f79
+size 165763
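
Each asset above is stored as a Git LFS pointer (`version`, `oid`, `size`) rather than the audio itself, so the diff shows only the hash and byte count changing. The sketch below parses such a pointer and compares the old and new sizes of the American Crow clip, using the values shown in this diff. The helper name `parse_lfs_pointer` is hypothetical and not part of the repository.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size in bytes
    }


old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:d0f76bff28d3e3021be495754b28ef3924bc32ff0c657b67bd4ee6bb177a1f8e
size 2164626
""")
new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:91b67d8df44c265cd8c2aece4078b00f3d67706156e40c0b799b0db42ea3aa09
size 402834
""")

# Fraction of bytes saved by trimming the clip.
reduction = 1 - new["size"] / old["size"]
print(f"{old['size']} -> {new['size']} bytes ({reduction:.1%} smaller)")
```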

static/help.html CHANGED

@@ -23,8 +23,7 @@
     </li>
     <li style="margin-bottom: 8px;">
       <strong>Trim your audio (if needed)</strong> by clicking the scissors
-      icon on the bottom right of the audio panel. Try to keep your audio
-      to 10 seconds or less.
     </li>
     <li style="margin-bottom: 8px;">
       <strong>View the Spectrogram (optional)</strong>. You can easily
@@ -45,41 +44,41 @@
 </div>
 <div class="guide-section">
   <h3>Tips</h3>
-  <b>Prompting Best Practices</b>
   <ul style="margin-top: 12px; padding-left: 20px;
              color: #6b7280; font-size: 14px; line-height: 1.6;">
-    <li>
-      When possible, use scientific or taxonomic names and mention
-      the context if known (geographic area/location, time of day
-      or year, habitat type)
-    </li>
-    <li>Ask one question at a time, and be specific about what
-        you want to know</li>
-    <ul>&#10060; Don't ask:
-      <i>"Analyze this audio and tell me all you know about it."</i>
-    </ul>
-    <ul>&#9989; Do ask:
-      <i>"What species made this sound?"</i>
     </ul>
-    <li>Keep prompts more open-ended and avoid asking Yes/No
-        or very targeted questions</li>
-    <ul>&#10060; Don't ask:
-      <i>"Is there a bottlenose dolphin vocalizing in the audio?
-      Yes or No."</i>
     </ul>
-    <ul>&#9989; Do ask:
-      <i>"What focal species, if any, are heard in the audio?"</i>
     </ul>
-    <li>Giving the model options to choose works well for broader
-        categories (less so for specific species)</li>
-    <ul>&#10060; Don't ask:
-      <i>"Classify the audio into one of the following species:
-      Bottlenose Dolphin, Orca, Great Gray Owl"</i>
-    </ul>
-    <ul>&#9989; Do ask:
-      <i>"Classify the audio into one of the following categories:
-      Cetaceans, Aves, or None."</i>
     </ul>
   </ul>
   <br>
   <b>Audio Files</b>
@@ -97,23 +96,16 @@
   <h3>Learn More</h3>
   <ul style="margin-top: 12px; padding-left: 20px;
              color: #6b7280; font-size: 14px; line-height: 1.6;">
-    <li>Read our
-      <a href="https://huggingface.co/blog/EarthSpeciesProject/nature-lm-audio-ui-demo/"
-         target="_blank">recent blog post</a>
-      with a step-by-step tutorial</li>
     <li>Check out the
       <a href="https://arxiv.org/abs/2411.07186"
-         target="_blank">published paper</a>
-      for a deeper technical dive on NatureLM-audio.</li>
-    <li>Visit the
-      <a href="https://earthspecies.github.io/naturelm-audio-demo/"
-         target="_blank">NatureLM-audio Demo Page</a>
-      for additional context, a demo video, and more examples
-      of the model in action.</li>
-    <li>Sign up for our
-      <a href="https://forms.gle/WjrbmFhKkzmEgwvY7"
-         target="_blank">closed beta waitlist</a>,
-      if you're interested in testing upcoming features like
-      longer audio files and batch processing.</li>
   </ul>
 </div>

     </li>
     <li style="margin-bottom: 8px;">
       <strong>Trim your audio (if needed)</strong> by clicking the scissors
+      icon on the bottom right of the audio panel. Only the first 10 seconds of audio will be analyzed, so trim to the most relevant section of your recording.
     </li>
     <li style="margin-bottom: 8px;">
       <strong>View the Spectrogram (optional)</strong>. You can easily
 </div>
 <div class="guide-section">
   <h3>Tips</h3>
+  <b>Prompting Tips</b> (see full <a href="https://projects.earthspecies.org/naturelm-audio/prompting_guide.html" target="_blank">Prompting Guide</a> for more)
   <ul style="margin-top: 12px; padding-left: 20px;
              color: #6b7280; font-size: 14px; line-height: 1.6;">
+    <li><strong>For Yes/No questions, always include "Answer: Yes or No."</strong> Without this, the model may respond with species names rather than a yes or no answer.</li>
+    <ul>
+      <li>
+        <i>Is an alarm call present in this recording? Answer: Yes or No.</i>
+      </li>
+      <li>
+        <i>Is there a frog or amphibian vocalizing in this recording? Answer: Yes or No.</i>
+      </li>
     </ul>
+    <li><strong>Providing geographic or temporal context</strong> can help narrow identification.</li>
+    <ul>
+      <li>
+        <i>Given the context: '[context]', what is the common name for the focal species in the audio?</i>
+      </li>
+      <li>
+        Replace [context] with whatever metadata you have, e.g. country: BR, coordinates: -23.5, -46.6 or recorded in temperate forest, June.
+      </li>
     </ul>
+    <ul>
     </ul>
+    <li><strong>Giving the model a candidate list</strong> to choose from can improve accuracy. </li>
+    <ul>
+<li>
+      <i>Which of these is the focal species in the audio? Options: [species_choices]</i>
+    </li><li>
+      <i>Replace [species_choices] with a comma-separated list, e.g. Turdus merula, Erithacus rubecula, Fringilla coelebs, Parus major, Phylloscopus collybita.</i>
+    </li>
     </ul>
   </ul>
   <br>
   <b>Audio Files</b>
   <h3>Learn More</h3>
   <ul style="margin-top: 12px; padding-left: 20px;
              color: #6b7280; font-size: 14px; line-height: 1.6;">
+    <li>Visit the <a href="https://projects.earthspecies.org/naturelm-audio/prompting_guide.html"
+target="_blank">NatureLM-audio Project Page</a>
+for more details, examples, and the full Prompting Guide</li>
+    <li>Read our <a href="https://huggingface.co/blog/EarthSpeciesProject/nature-lm-audio-ui-demo/"
+         target="_blank">blog post</a> with a step-by-step tutorial</li>
     <li>Check out the
       <a href="https://arxiv.org/abs/2411.07186"
+         target="_blank">published paper</a> for a deeper technical dive on NatureLM-audio</li>
+    <li>Sign up for our <a href="https://forms.gle/WjrbmFhKkzmEgwvY7"
+         target="_blank">closed beta waitlist</a>, if you're interested in testing upcoming features like longer audio files and batch processing.</li>
   </ul>
 </div>
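
The three prompting tips added to the Help tab above (the Yes/No suffix, the context prefix, and the candidate list) are all simple string templates. A minimal sketch of how a caller might fill them in is shown below. The function names are hypothetical and do not exist in `app.py`; only the prompt wording is taken from the help text.

```python
YES_NO_SUFFIX = "Answer: Yes or No."


def yes_no_prompt(question: str) -> str:
    """Append the 'Answer: Yes or No.' suffix unless it is already present."""
    question = question.strip()
    if question.endswith(YES_NO_SUFFIX):
        return question
    return f"{question} {YES_NO_SUFFIX}"


def context_prompt(context: str) -> str:
    """Prefix the focal-species question with recording metadata."""
    return (
        f"Given the context: '{context}', what is the common name "
        "for the focal species in the audio?"
    )


def options_prompt(species_choices: list[str]) -> str:
    """Ask the model to pick from a comma-separated candidate list."""
    return (
        "Which of these is the focal species in the audio? Options: "
        + ", ".join(species_choices)
    )


print(yes_no_prompt("Is an alarm call present in this recording?"))
print(context_prompt("country: BR, coordinates: -23.5, -46.6"))
print(options_prompt(["Turdus merula", "Erithacus rubecula", "Parus major"]))
```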

static/onboarding.html CHANGED

@@ -8,6 +8,6 @@
             </div>
         </div>
     </div>
-    <a href="https://huggingface.co/blog/EarthSpeciesProject/nature-lm-audio-ui-demo/"
-       target="_blank" class="link-btn">View Tutorial</a>
 </div>

             </div>
         </div>
     </div>
+    <a href="https://projects.earthspecies.org/naturelm-audio/quick_start.html"
+       target="_blank" class="link-btn">Quick Start Guide</a>
 </div>

static/style.css CHANGED

@@ -7,10 +7,16 @@
     margin: 2px 6px;
     align-self: center;
 }
 #spectrogram-plot {
     padding: 12px;
     margin: 12px;
 }
 .banner {
     background: white;
     border: 1px solid #e5e7eb;
@@ -35,7 +41,7 @@
     color: #6b7280;
     line-height: 1.4;
 }
-.link-btn {
     padding: 6px 12px;
     border-radius: 6px;
     font-size: 13px;
@@ -48,7 +54,7 @@
     display: inline-block;
     transition: background 0.2s ease;
 }
-.link-btn:hover {
     background: #2563eb;
 }

     margin: 2px 6px;
     align-self: center;
 }
+#chatbot {
+    border-style: none !important;
+}
 #spectrogram-plot {
     padding: 12px;
     margin: 12px;
 }
+.gradio-style a {
+    padding: 0;
+}
 .banner {
     background: white;
     border: 1px solid #e5e7eb;
     color: #6b7280;
     line-height: 1.4;
 }
+.gradio-style .link-btn {
     padding: 6px 12px;
     border-radius: 6px;
     font-size: 13px;
     display: inline-block;
     transition: background 0.2s ease;
 }
+.gradio-style .link-btn:hover {
     background: #2563eb;
 }