Milad Alizadeh committed
Commit c3156f6 · 0 Parent(s)

hf app (#95)

- Migrated the NatureLM-audio demo app from HuggingFace Spaces into this
  repository (projects/NatureLM-audio-hf-app): the app previously lived as
  a standalone HF Spaces repo with a copy of the NatureLM code. It now
  properly depends on esp-data, esp-research, and naturelm_audio as
  packages.
- Switched from the Gradio SDK to the Docker SDK: this lets us pull our private
  codebases into the HF Space without exposing them. Once open-sourced,
  this can be simplified. Downside: we lose HF's free ZeroGPU tier, so
  we're using paid GPUs (CPU for now while other pieces come together).
- Deploy via `make push-naturelm-app-to-hf`: uses git subtree to push
  just the app directory to HF Spaces.
- NatureLM-audio-v1.5 packaged: moved into src/naturelm_audio/ with a
  build system, necessary because this is the first cross-project
  dependency in the workspace.
- CI: added a deptry check for the HF app and updated the ruff config with
  workspace-aware src paths.
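
The subtree deploy described above can be sketched as a Makefile target. This is a hypothetical fragment: the target body, Space URL, and branch name here are assumptions for illustration, not the repository's actual Makefile contents.

```make
# Hypothetical shape of the deploy target; the Space URL is illustrative.
push-naturelm-app-to-hf:
	git subtree push \
		--prefix=projects/NatureLM-audio-hf-app \
		https://huggingface.co/spaces/EarthSpeciesProject/naturelm-audio-demo \
		main
```

`git subtree push` splits the history of the prefix directory into its own commit stream and pushes only that, which is what lets a single app directory live inside the larger repo while HF Spaces sees a self-contained repository.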

Known gaps

- The app needs a .generate()-like method on the model — currently uses
a mock placeholder.
- Cross-project dependencies highlight the need for a monorepo vs
multi-repo discussion.
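
The mock placeholder mirrors the call interface the app expects from the model: called with a batch of audio paths and queries, returning one list of prediction dicts per audio. A minimal sketch of that contract (the class name here is illustrative; it follows `_MockModel` in app.py, and the real `.generate()`-like method is the missing piece):

```python
class MockNatureLM:
    """Stand-in for the missing .generate()-like API on NatureLM.

    app.py calls the model as model(audios, queries, **kwargs) and expects
    a list of prediction-dict lists, one entry per input audio file.
    """

    def __call__(
        self,
        audios: list[str],
        queries: list[str],
        **kwargs: object,
    ) -> list[list[dict]]:
        # One [{"prediction": ...}] entry per audio, matching app.py's mock.
        return [[{"prediction": "(mock) I don't know yet!"}] for _ in audios]


outputs = MockNatureLM()(["clip.wav"], ["What species is this?"])
print(outputs[0][0]["prediction"])  # → (mock) I don't know yet!
```

Any real implementation only needs to preserve this shape for the app's `get_response` path to work unchanged.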

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

.gitattributes ADDED
@@ -0,0 +1,39 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/*.mp3 filter=lfs diff=lfs merge=lfs -text
+ assets/*.m4a filter=lfs diff=lfs merge=lfs -text
+ assets/*.wav filter=lfs diff=lfs merge=lfs -text
+ assets/*.png filter=lfs diff=lfs merge=lfs -text
Dockerfile ADDED
@@ -0,0 +1,33 @@
+ FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV PYTHONUNBUFFERED=1
+ ENV UV_NO_DEV=1
+ ENV UV_NO_CACHE=1
+ ENV GRADIO_ANALYTICS_ENABLED="False"
+
+ COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
+ RUN apt-get update && apt-get install -y \
+     git \
+     git-lfs \
+     && apt-get clean \
+     && rm -rf /var/lib/apt/lists/* \
+     && git lfs install
+
+ # TODO: Pin esp-research and esp-data revisions
+ # TODO: remove hf-app branch once merged
+ RUN --mount=type=secret,id=GH_TOKEN,mode=0444,required=true \
+     git clone -b hf-app --single-branch --depth 1 https://$(cat /run/secrets/GH_TOKEN)@github.com/earthspecies/esp-research.git /app/esp-research && \
+     git clone --single-branch --depth 1 https://$(cat /run/secrets/GH_TOKEN)@github.com/earthspecies/esp-data.git /app/esp-data
+
+ # esp-research installs esp-data from the gcloud artifact registry, which is not
+ # what we want. Instead we modify esp-research to install esp-data directly from
+ # the clone and then do a sync.
+ WORKDIR /app/esp-research
+ RUN uv add /app/esp-data
+ RUN uv sync --frozen
+
+ WORKDIR /app/esp-research/projects/NatureLM-audio-hf-app
+ EXPOSE 7860
+ CMD ["uv", "run", "app.py"]
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ title: NatureLM Audio Debug Private
+ emoji: 🔈
+ colorFrom: green
+ colorTo: green
+ sdk: docker
+ sdk_version: 6.9.0
+ app_port: 7860
+ pinned: false
+ license: apache-2.0
+ short_description: Analyze your bioacoustic data with NatureLM-audio
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/67e0630403121d657d96b0a4/VwZf6xhy8xz-AIr8rykvB.png
+
+ ---
+
+ # NatureLM-audio Demo
+
+ This is a demo of the NatureLM-audio model. Users can upload an audio file containing animal vocalizations and ask questions about them in a chat interface.
+
+ ## Usage
+
+ - **First Use**: The model will load automatically when you first use it (this may take a few minutes)
+ - **Subsequent Uses**: The model stays loaded for faster responses
+ - **Demo Mode**: If the model fails to load, the app will run in demo mode
+
+ ## Model Loading
+
+ The app uses lazy loading to start quickly. The model is only loaded when you first interact with it, not during app initialization. This prevents timeout issues on HuggingFace Spaces.
app.py ADDED
@@ -0,0 +1,571 @@
+ import os
+ import uuid
+ from pathlib import Path
+
+ import gradio as gr
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import soundfile as sf
+ import spaces
+ import torch
+ import torchaudio
+
+ from esp_research.logging import logger
+ from hub_logger import upload_data
+
+ # from NatureLM.infer import Pipeline
+ # from NatureLM.models.NatureLM import NatureLM
+ from naturelm_audio import NatureLM  # noqa: F401
+
+ APP_DIR = Path(__file__).resolve().parent
+ STATIC_DIR = APP_DIR / "static"
+ ASSETS_DIR = APP_DIR / "assets"
+
+ SAMPLE_RATE = 16000  # Default sample rate for NatureLM-audio
+ MIN_AUDIO_DURATION: float = 0.5  # seconds
+ MAX_HISTORY_TURNS = 3  # Maximum number of conversation turns to include in context (user + assistant pairs)
+
+ DEVICE: str = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # TODO: derive model version from model metadata or config instead of hardcoding
+ MODEL_VERSION = "1.5"
+
+
+ class _MockModel:
+     """Placeholder model that returns dummy predictions."""
+
+     def __call__(
+         self,
+         audios: list[str],
+         queries: list[str],
+         **kwargs: object,
+     ) -> list[list[dict]]:
+         return [[{"prediction": "(mock) I don't know yet!"}] for _ in audios]
+
+
+ # TODO: replace with real model loading
+
+ # model = NatureLM.from_pretrained("EarthSpeciesProject/NatureLM-audio")
+ # model = model.eval().to(DEVICE)
+ # model = Pipeline(model)
+ logger.info("Device: %s", DEVICE)
+ model = _MockModel()
+
+
+ def validate_audio_duration(audio_path: str) -> None:
+     """Validate that the audio file meets the minimum duration requirement.
+
+     Parameters
+     ----------
+     audio_path : str
+         Path to the audio file.
+
+     Raises
+     ------
+     gr.Error
+         If the audio duration is less than `MIN_AUDIO_DURATION`.
+     """
+     info = sf.info(audio_path)
+     duration = info.duration  # info.num_frames / info.sample_rate
+     if duration < MIN_AUDIO_DURATION:
+         raise gr.Error(f"Audio duration must be at least {MIN_AUDIO_DURATION} seconds.")
+
+
+ @spaces.GPU
+ def prompt_lm(
+     audios: list[str],
+     queries: list[str] | str,
+     window_length_seconds: float = 10.0,
+     hop_length_seconds: float = 10.0,
+ ) -> list[list[dict]] | str:
+     """Generate a response using the model.
+
+     Parameters
+     ----------
+     audios : list[str]
+         List of audio file paths.
+     queries : list[str] | str
+         Query or list of queries to process.
+     window_length_seconds : float
+         Length of the window for processing audio.
+     hop_length_seconds : float
+         Hop length for processing audio.
+
+     Returns
+     -------
+     list[list[dict]] | str
+         Nested list of prediction dictionaries for each audio-query pair,
+         or an error message if the model is not loaded.
+     """
+     if model is None:
+         return "❌ Model not loaded. Please check the model configuration."
+
+     with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
+         results: list[list[dict]] = model(
+             audios,
+             queries,
+             window_length_seconds=window_length_seconds,
+             hop_length_seconds=hop_length_seconds,
+             input_sample_rate=None,
+         )
+     return results
+
+
+ def get_response(chatbot_history: list[dict], audio_input: str) -> list[dict]:
+     """Generate response from the model based on user input and audio file.
+
+     Parameters
+     ----------
+     chatbot_history : list[dict]
+         Current chat history with conversation context.
+     audio_input : str
+         Path to the audio file.
+
+     Returns
+     -------
+     list[dict]
+         Updated chat history with the model response appended.
+     """
+     try:
+         # Warn if conversation is getting long
+         num_turns = len(chatbot_history)
+         if num_turns > MAX_HISTORY_TURNS * 2:  # Each turn = user + assistant message
+             gr.Warning(
+                 "⚠️ Long conversations may affect response quality."
+                 " Consider starting a new conversation with the Clear button."
+             )
+
+         # Build conversation context from history
+         conversation_context = []
+         for message in chatbot_history:
+             if message["role"] == "user":
+                 conversation_context.append(f"User: {message['content']}")
+             elif message["role"] == "assistant":
+                 conversation_context.append(f"Assistant: {message['content']}")
+
+         # Get the last user message
+         last_user_message = ""
+         for message in reversed(chatbot_history):
+             if message["role"] == "user":
+                 last_user_message = message["content"]
+                 break
+
+         # Format the full prompt with conversation history
+         if len(conversation_context) > 2:  # More than just the current query
+             # Include previous turns (limit to last MAX_HISTORY_TURNS exchanges)
+             # recent_context = conversation_context[
+             #     -(MAX_HISTORY_TURNS + 1) : -1
+             # ]  # Exclude current message
+             recent_context = conversation_context
+
+             full_prompt = (
+                 "Previous conversation:\n" + "\n".join(recent_context) + "\n\nCurrent question: " + last_user_message
+             )
+         else:
+             full_prompt = last_user_message
+
+         logger.debug("Full prompt with history: %s", full_prompt)
+
+         response = prompt_lm(
+             audios=[audio_input],
+             queries=[full_prompt.strip()],
+             window_length_seconds=100_000,
+             hop_length_seconds=100_000,
+         )
+         # Get the first prediction
+         if isinstance(response, list) and len(response) > 0:
+             response = response[0][0]["prediction"]
+             logger.info("Model response: %s", response)
+         else:
+             response = "No response generated."
+     except Exception as e:
+         logger.exception("Error generating response: %s", e)
+         response = "Error generating response. Please try again."
+
+     # Add model response to chat history
+     chatbot_history.append({"role": "assistant", "content": response})
+
+     return chatbot_history
+
+
+ def plot_spectrogram(audio: torch.Tensor) -> plt.Figure:
+     """Generate a spectrogram from the audio tensor.
+
+     Parameters
+     ----------
+     audio : torch.Tensor
+         Audio tensor.
+
+     Returns
+     -------
+     plt.Figure
+         Matplotlib figure with the spectrogram.
+     """
+     spectrogram = torchaudio.transforms.Spectrogram(n_fft=1024)(audio)
+     spectrogram = spectrogram.numpy()[0].squeeze()
+
+     fig, ax = plt.subplots(figsize=(13, 5))
+
+     ax.imshow(np.log(spectrogram + 1e-4), aspect="auto", origin="lower", cmap="viridis")
+     ax.set_title("Spectrogram")
+
+     # Set x ticks to reflect 0 to audio duration in seconds
+     if audio.dim() > 1:
+         duration = audio.size(1) / SAMPLE_RATE
+     else:
+         duration = audio.size(0) / SAMPLE_RATE
+     ax.set_xlabel("Time")
+     ax.set_xticks([0, spectrogram.shape[1]])
+     ax.set_xticklabels(["0s", f"{duration:.2f}s"])
+
+     ax.set_ylabel("Frequency")
+     ax.set_yticks(
+         [
+             0,
+             spectrogram.shape[0] // 4,
+             spectrogram.shape[0] // 2,
+             3 * spectrogram.shape[0] // 4,
+             spectrogram.shape[0] - 1,
+         ]
+     )
+     # Set y ticks to reflect 0 to the Nyquist frequency (sample_rate / 2)
+     nyquist_freq = SAMPLE_RATE / 2
+     ax.set_yticklabels(
+         [
+             "0 Hz",
+             f"{nyquist_freq / 4:.0f} Hz",
+             f"{nyquist_freq / 2:.0f} Hz",
+             f"{3 * nyquist_freq / 4:.0f} Hz",
+             f"{nyquist_freq:.0f} Hz",
+         ]
+     )
+
+     fig.tight_layout()
+
+     return fig
+
+
+ def make_spectrogram_figure(audio_input: str) -> plt.Figure:
+     """Load the audio file (falling back to silence) and plot its spectrogram."""
+     audio = torch.zeros(1, SAMPLE_RATE)
+     if audio_input:
+         try:
+             audio, _ = torchaudio.load(audio_input)
+         except Exception:
+             logger.exception("Error loading audio file %s", audio_input)
+     return plot_spectrogram(audio)
+
+
+ def add_user_query(chatbot_history: list[dict], chat_input: str) -> list[dict]:
+     """Add user message to chat history.
+
+     Parameters
+     ----------
+     chatbot_history : list[dict]
+         Current chat history.
+     chat_input : str
+         User's input text.
+
+     Returns
+     -------
+     list[dict]
+         Updated chat history with the user message appended.
+     """
+     # Validate input
+     if not chat_input.strip():
+         return chatbot_history
+
+     chatbot_history.append({"role": "user", "content": chat_input.strip()})
+     return chatbot_history
+
+
+ def log_to_hub(chatbot_history: list[dict], audio: str, session_id: str) -> None:
+     """Upload the latest user/assistant exchange to the hub."""
+     if not chatbot_history or len(chatbot_history) < 2:
+         return
+     user_text = chatbot_history[-2]["content"]
+     model_response = chatbot_history[-1]["content"]
+     upload_data(audio, user_text, model_response, session_id, model_version=MODEL_VERSION)
+
+
+ def main() -> tuple[gr.Blocks, gr.themes.Base]:
+     laz_audio = ASSETS_DIR / "Lazuli_Bunting_yell-YELLLAZB20160625SM303143.mp3"
+     frog_audio = ASSETS_DIR / "nri-GreenTreeFrogEvergladesNP.mp3"
+     robin_audio = ASSETS_DIR / "yell-YELLAMRO20160506SM3.mp3"
+     whale_audio = ASSETS_DIR / "Humpback Whale - Megaptera novaeangliae.wav"
+     crow_audio = ASSETS_DIR / "American Crow - Corvus brachyrhynchos.mp3"
+
+     examples = {
+         "Identifying Focal Species (Lazuli Bunting)": [
+             str(laz_audio),
+             "What is the common name for the focal species in the audio?",
+         ],
+         "Caption the audio (Green Tree Frog)": [
+             str(frog_audio),
+             "Caption the audio, using the common name for any animal species.",
+         ],
+         "Caption the audio (American Robin)": [
+             str(robin_audio),
+             "Caption the audio, using the scientific name for any animal species.",
+         ],
+         "Identifying Focal Species (Megaptera novaeangliae)": [
+             str(whale_audio),
+             "What is the scientific name for the focal species in the audio?",
+         ],
+         "Speaker Count (American Crow)": [
+             str(crow_audio),
+             "How many individuals are vocalizing in this audio?",
+         ],
+         "Caption the audio (Humpback Whale)": [str(whale_audio), "Caption the audio."],
+     }
+
+     gr.set_static_paths(paths=[ASSETS_DIR])
+
+     theme = gr.themes.Base(primary_hue="blue", font=[gr.themes.GoogleFont("Noto Sans")])
+
+     with gr.Blocks(
+         title="NatureLM-audio",
+         theme=theme,
+     ) as app:
+         with gr.Row():
+             gr.HTML((STATIC_DIR / "header.html").read_text())
+
+         with gr.Tabs():
+             with gr.Tab("Analyze Audio"):
+                 session_id = gr.State(str(uuid.uuid4()))
+                 # uploaded_audio = gr.State()
+                 # Status indicator
+                 # status_text = gr.Textbox(
+                 #     value=model_manager.get_status(),
+                 #     label="Model Status",
+                 #     interactive=False,
+                 #     visible=True,
+                 # )
+
+                 with gr.Column(visible=True) as onboarding_message:
+                     gr.HTML(
+                         (STATIC_DIR / "onboarding.html").read_text(),
+                         padding=False,
+                     )
+
+                 with gr.Column(visible=True) as upload_section:
+                     audio_input = gr.Audio(
+                         container=True,
+                         interactive=True,
+                         sources=["upload"],
+                     )
+                     # Check that the audio duration exceeds MIN_AUDIO_DURATION,
+                     # raising gr.Error otherwise
+                     audio_input.change(
+                         fn=validate_audio_duration,
+                         inputs=[audio_input],
+                         outputs=[],
+                     )
+
+                 with gr.Accordion(label="Toggle Spectrogram", open=False, visible=False) as spectrogram:
+                     plotter = gr.Plot(
+                         plot_spectrogram(torch.zeros(1, SAMPLE_RATE)),
+                         label="Spectrogram",
+                         visible=False,
+                         elem_id="spectrogram-plot",
+                     )
+                 with gr.Column(visible=False) as tasks:
+                     task_dropdown = gr.Dropdown(
+                         [
+                             "What are the common names for the species in the audio, if any?",
+                             "Caption the audio, using the scientific name for any animal species.",
+                             "Caption the audio, using the common name for any animal species.",
+                             "What is the scientific name for the focal species in the audio?",
+                             "What is the common name for the focal species in the audio?",
+                             "What is the family of the focal species in the audio?",
+                             "What is the genus of the focal species in the audio?",
+                             "What is the taxonomic name of the focal species in the audio?",
+                             "What call types are heard from the focal species in the audio?",
+                             "What is the life stage of the focal species in the audio?",
+                         ],
+                         label="Pre-Loaded Tasks",
+                         info="Select a task, or write your own prompt below.",
+                         allow_custom_value=False,
+                         value=None,
+                     )
+                 with gr.Group(visible=False) as chat:
+                     chatbot = gr.Chatbot(
+                         elem_id="chatbot",
+                         height=250,
+                         label="Chat",
+                         render_markdown=False,
+                         group_consecutive_messages=False,
+                         feedback_options=[
+                             "like",
+                             "dislike",
+                             "wrong species",
+                             "incorrect response",
+                             "other",
+                         ],
+                         resizable=True,
+                     )
+                     with gr.Column():
+                         chat_input = gr.Textbox(
+                             placeholder="Type your message and press Enter to send",
+                             lines=1,
+                             show_label=False,
+                             submit_btn="Send",
+                             container=True,
+                             autofocus=False,
+                             elem_id="chat-input",
+                         )
+
+                 with gr.Column():
+                     gr.Examples(
+                         list(examples.values()),
+                         [audio_input, chat_input],
+                         [audio_input, chat_input],
+                         example_labels=list(examples.keys()),
+                         examples_per_page=20,
+                     )
+
+                 def validate_and_submit(chatbot_history: list[dict], chat_input: str) -> tuple[list[dict], str]:
+                     if not chat_input or not chat_input.strip():
+                         gr.Warning("Please enter a question or message before sending.")
+                         return chatbot_history, chat_input
+
+                     updated_history = add_user_query(chatbot_history, chat_input)
+                     return updated_history, ""
+
+                 clear_button = gr.ClearButton(
+                     components=[chatbot, chat_input, audio_input, plotter],
+                     visible=False,
+                 )
+
+                 # If a task is selected in the dropdown, set chat_input to that value
+                 def set_query(task: str | None) -> dict:
+                     if task:
+                         return gr.update(value=task)
+                     return gr.update(value="")
+
+                 task_dropdown.select(
+                     fn=set_query,
+                     inputs=[task_dropdown],
+                     outputs=[chat_input],
+                 )
+
+                 def start_chat_interface(audio_path: str) -> tuple:
+                     return (
+                         gr.update(visible=False),  # hide onboarding message
+                         gr.update(visible=True),  # show upload section
+                         gr.update(visible=True),  # show spectrogram
+                         gr.update(visible=True),  # show tasks
+                         gr.update(visible=True),  # show chat box
+                         gr.update(visible=True),  # show plotter
+                     )
+
+                 # When audio is added, reveal the interface and set the spectrogram
+                 audio_input.change(
+                     fn=start_chat_interface,
+                     inputs=[audio_input],
+                     outputs=[
+                         onboarding_message,
+                         upload_section,
+                         spectrogram,
+                         tasks,
+                         chat,
+                         plotter,
+                     ],
+                 ).then(
+                     fn=make_spectrogram_figure,
+                     inputs=[audio_input],
+                     outputs=[plotter],
+                 )
+
+                 # When submit is clicked:
+                 # 1. Validate and add the user query to the chat history
+                 # 2. Get a response from the model
+                 # 3. Show the clear button
+                 # 4. Log the exchange to the hub
+                 chat_input.submit(
+                     validate_and_submit,
+                     inputs=[chatbot, chat_input],
+                     outputs=[chatbot, chat_input],
+                 ).then(
+                     get_response,
+                     inputs=[chatbot, audio_input],
+                     outputs=[chatbot],
+                 ).then(
+                     lambda: gr.update(visible=True),  # Show clear button
+                     None,
+                     [clear_button],
+                 ).then(
+                     log_to_hub,
+                     [chatbot, audio_input, session_id],
+                     None,
+                 )
+
+                 clear_button.click(lambda: gr.ClearButton(visible=False), None, [clear_button])
+
+             with gr.Tab("Sample Library"):
+                 with gr.Row():
+                     with gr.Column():
+                         gr.Markdown("### Download Sample Audio")
+                         gr.Markdown(
+                             "Feel free to explore these sample audio files."
+                             " To download, click the button in the"
+                             " top-right corner of each audio file."
+                             " You can also find a large collection of"
+                             " publicly available animal sounds on"
+                             " [Xeno-canto](https://xeno-canto.org/explore/taxonomy)"
+                             " and the [Watkins Marine Mammal Sound Database]"
+                             "(https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm)."
+                         )
+                         samples = [
+                             (
+                                 str(ASSETS_DIR / "Lazuli_Bunting_yell-YELLLAZB20160625SM303143.m4a"),
+                                 "Lazuli Bunting",
+                             ),
+                             (
+                                 str(ASSETS_DIR / "nri-GreenTreeFrogEvergladesNP.mp3"),
+                                 "Green Tree Frog",
+                             ),
+                             (
+                                 str(ASSETS_DIR / "American Crow - Corvus brachyrhynchos.mp3"),
+                                 "American Crow",
+                             ),
+                             (
+                                 str(ASSETS_DIR / "Gray Wolf - Canis lupus italicus.m4a"),
+                                 "Gray Wolf",
+                             ),
+                             (
+                                 str(ASSETS_DIR / "Humpback Whale - Megaptera novaeangliae.wav"),
+                                 "Humpback Whale",
+                             ),
+                             (str(ASSETS_DIR / "Walrus - Odobenus rosmarus.wav"), "Walrus"),
+                         ]
+                         for row_i in range(0, len(samples), 3):
+                             with gr.Row():
+                                 for filepath, label in samples[row_i : row_i + 3]:
+                                     with gr.Column():
+                                         gr.Audio(
+                                             filepath,
+                                             label=label,
+                                         )
+
+             with gr.Tab("💡 Help"):
+                 gr.HTML((STATIC_DIR / "help.html").read_text())
+
+     app.css = (STATIC_DIR / "style.css").read_text()
+
+     return app, theme
+
+
+ # Create and launch the app
+ if __name__ == "__main__":
+     app, theme = main()
+
+     # Docker-based HF Spaces require root_path so Gradio generates correct
+     # URLs behind the reverse proxy (the Gradio SDK sets this automatically).
+     root_path = os.environ.get("GRADIO_ROOT_PATH", "")
+
+     app.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         root_path=root_path,
+         allowed_paths=[str(ASSETS_DIR)],
+     )
assets/484366__spacejoe__bird-3.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d21ce7228fd9fc3277b89fad5d54ff039d45da93c426459212fccbba776a75e
+ size 272820
assets/American Crow - Corvus brachyrhynchos.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0f76bff28d3e3021be495754b28ef3924bc32ff0c657b67bd4ee6bb177a1f8e
+ size 2164626
assets/ESP_logo_white.png ADDED
Git LFS Details
  • SHA256: 08477bf0160a9b9eedaed4e2898b0a708256bb6104e84e57d28c65a37c27a63d
  • Pointer size: 131 Bytes
  • Size of remote file: 150 kB
assets/Eastern Gray Squirrel - Sciurus carolinensis.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65e0d72b3979b371e45852af73037c009c93a30c7d8ea64ab18616f1947d4101
+ size 1447652
assets/Gray Wolf - Canis lupus italicus.m4a ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a1bc16a34573163e262741561278fe610144235ca95c9c4a6172b2b41feb5f52
+ size 125428
assets/Humpback Whale - Megaptera novaeangliae.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d9afb0de912a926ebac971c9ca6923fa03fd64cd029b04a195b69d79c0b7dc7
+ size 272560
assets/Lazuli_Bunting_yell-YELLLAZB20160625SM303143.m4a ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:822d5b59ef6f6a6f1e84465da23e926a7ce0393ac7f0bdcf81cbabe1c52c1112
+ size 333009
assets/Lazuli_Bunting_yell-YELLLAZB20160625SM303143.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a67960286021e58ffab2d3e4b67b7e20d08b530018c64c6afefe4aae5ff28be
+ size 316920
assets/Sample_Audio_Files_NatureLM_audio.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4655b9f1354485fd71480c6030da53b334f8552bf9a9afff4b9320192eb7a7a
+ size 2002662
assets/Walrus - Odobenus rosmarus.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14926a9ee914ba512009e116f1bbb4424cdcae52fafe52068d197e056f04c567
+ size 305710
assets/esp_favicon.png ADDED
Git LFS Details
  • SHA256: c584444dc70faaa19d002aeb7104cf13ef9c226f910a27a542774499256f3810
  • Pointer size: 129 Bytes
  • Size of remote file: 3.57 kB
assets/esp_logo.png ADDED
assets/naturelm-audio-overiew.png ADDED
Git LFS Details
  • SHA256: 0f2d1d4d68e34caf630f1a11859ab3a7d370ea8a64829ea906c8c7aa274a56c0
  • Pointer size: 131 Bytes
  • Size of remote file: 286 kB
assets/nri-GreenTreeFrogEvergladesNP.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3004b02bd1793db81f5e6ddfe2f805dbd587af3c0d03edbedec2ad23e92660dd
+ size 162234
assets/nri-SensationJazz.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4dfada309221e16f7c38b569b6c46e78ecd181b4d3bc7a7114bb2384e24b797f
+ size 134772
assets/nri-StreamMUWO.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d55ce7e299b7d2d9ab50aee2d28233f05662886cbb57e792aa210d39dd73744
+ size 63536
assets/nri-battlesounds.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86d491e1b41cfb9f75ce1a51aea3e06b558aef91fb9a88991de0d89cdffd72ae
+ size 87838
assets/yell-YELLAMRO20160506SM3.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a2700bbe2233505ccf592e9e06a4b196a0feb4d2d7a4773ed5f2f110696a001
+ size 598352
assets/yell-YELLFLBCSACR20075171.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:23371e93ed2dc6c43cfe8ada4125a2f15bcff19946e9efe969c8ca03caa60df8
+ size 390212
assets/yell-YELLWolfvCar20160111T22ms2.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1520267dbb85294fbca670c02404d7e64248cec02b29167def519f2e35194a0d
+ size 638311
hub_logger.py ADDED
@@ -0,0 +1,66 @@
+ import json
+ import os
+ import uuid
+ from pathlib import Path
+
+ from huggingface_hub import HfApi, HfFileSystem
+
+ DATASET_REPO = "EarthSpeciesProject/naturelm-audio-space-logs"
+ SPLIT = "test"
+ TESTING = os.getenv("TESTING", "0") == "1"
+ api = HfApi(token=os.getenv("HF_TOKEN", None))
+ hf_fs = HfFileSystem(token=os.getenv("HF_TOKEN", None))
+
+
+ def upload_data(
+     audio: str | Path,
+     user_text: str,
+     model_response: str,
+     session_id: str = "",
+     model_version: str = "",
+ ) -> None:
+     data_id = str(uuid.uuid4())
+
+     if TESTING:
+         data_id = "test-" + data_id
+         session_id = "test-" + session_id
+
+     # Audio path in the repo
+     suffix = Path(audio).suffix
+     audio_p = f"{SPLIT}/audio/" + session_id + suffix
+
+     # Upload the audio only if it does not already exist
+     if not hf_fs.exists(f"datasets/{DATASET_REPO}/{audio_p}"):
+         api.upload_file(
+             path_or_fileobj=str(audio),
+             path_in_repo=audio_p,
+             repo_id=DATASET_REPO,
+             repo_type="dataset",
+         )
+
+     text = {
+         "user_message": user_text,
+         "model_response": model_response,
+         "file_name": "audio/" + session_id + suffix,  # has to be relative to metadata.jsonl
+         "original_fn": os.path.basename(audio),
+         "id": data_id,
+         "session_id": session_id,
+         "model_version": model_version,
+     }
+
+     # Append to a jsonl file in the repo
+     # APPEND DOESN'T WORK, have to read the existing contents first
+     if hf_fs.exists(f"datasets/{DATASET_REPO}/{SPLIT}/metadata.jsonl"):
+         with hf_fs.open(f"datasets/{DATASET_REPO}/{SPLIT}/metadata.jsonl", "r") as f:
+             lines = f.readlines()
+         lines.append(json.dumps(text) + "\n")
+         with hf_fs.open(f"datasets/{DATASET_REPO}/{SPLIT}/metadata.jsonl", "w") as f:
+             f.writelines(lines)
+     else:
+         with hf_fs.open(f"datasets/{DATASET_REPO}/{SPLIT}/metadata.jsonl", "w") as f:
+             f.write(json.dumps(text) + "\n")
+
+     # Write a separate file instead
+     # with hf_fs.open(f"datasets/{DATASET_REPO}/{data_id}.json", "w") as f:
+     #     json.dump(text, f)
infer.py ADDED
@@ -0,0 +1,347 @@
+ # """Run NatureLM-audio over a set of audio files paths or a directory with audio files."""
+
+ # import argparse
+ # from pathlib import Path
+
+ # import librosa
+ # import numpy as np
+ # import pandas as pd
+ # import torch
+
+ # from NatureLM.config import Config
+ # from NatureLM.models import NatureLM
+ # from NatureLM.processors import NatureLMAudioProcessor
+ # from NatureLM.utils import move_to_device
+
+ # _MAX_LENGTH_SECONDS = 10
+ # _MIN_CHUNK_LENGTH_SECONDS = 0.5
+ # _SAMPLE_RATE = 16000  # Assuming the model uses a sample rate of 16kHz
+ # _AUDIO_FILE_EXTENSIONS = [".wav", ".mp3", ".flac", ".ogg", ".mp4"]  # Add other audio file formats as needed
+ # _DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+ # __root_dir = Path(__file__).parent.parent
+ # _DEFAULT_CONFIG_PATH = __root_dir / "configs" / "inference.yml"
+
+
+ # def load_model_and_config(
+ #     cfg_path: str | Path = _DEFAULT_CONFIG_PATH, device: str = _DEVICE
+ # ) -> tuple[NatureLM, Config]:
+ #     """Load the NatureLM model and configuration.
+ #     Returns:
+ #         tuple: The loaded model and configuration.
+ #     """
+ #     model = NatureLM.from_pretrained("EarthSpeciesProject/NatureLM-audio")
+ #     model = model.to(device).eval()
+ #     model.llama_tokenizer.pad_token_id = model.llama_tokenizer.eos_token_id
+ #     model.llama_model.generation_config.pad_token_id = model.llama_tokenizer.pad_token_id
+
+ #     cfg = Config.from_sources(cfg_path)
+ #     return model, cfg
+
+
+ # def output_template(model_output: str, start_time: float, end_time: float) -> str:
+ #     """Format the output of the model.
+
+ #     Returns
+ #     -------
+ #     str
+ #         Formatted string with timestamps and model output.
+ #     """
+ #     return f"#{start_time:.2f}s - {end_time:.2f}s#: {model_output}\n"
+
+
+ # def sliding_window_inference(
+ #     audio: str | Path | np.ndarray,
+ #     query: str,
+ #     processor: NatureLMAudioProcessor,
+ #     model: NatureLM,
+ #     cfg: Config,
+ #     window_length_seconds: float = 10.0,
+ #     hop_length_seconds: float = 10.0,
+ #     input_sr: int = _SAMPLE_RATE,
+ #     device: str = _DEVICE,
+ # ) -> list[dict[str, any]]:
+ #     """Run inference on a long audio file using sliding window approach.
+
+ #     Args:
+ #         audio (str | Path | np.ndarray): Path to the audio file.
+ #         query (str): Query for the model.
+ #         processor (NatureLMAudioProcessor): Audio processor.
+ #         model (NatureLM): NatureLM model.
+ #         cfg (Config): Model configuration.
+ #         window_length_seconds (float): Length of the sliding window in seconds.
+ #         hop_length_seconds (float): Hop length for the sliding window in seconds.
+ #         input_sr (int): Sample rate of the audio file.
+
+ #     Returns:
+ #         str: The output of the model.
+
+ #     Raises:
+ #         ValueError: If the audio file is too short or if the audio file path is invalid.
+ #     """
+ #     if isinstance(audio, str) or isinstance(audio, Path):
+ #         audio_array, input_sr = librosa.load(str(audio), sr=None, mono=False)
+ #     elif isinstance(audio, np.ndarray):
+ #         audio_array = audio
+ #         print(f"Using provided sample rate: {input_sr}")
+
+ #     audio_array = audio_array.squeeze()
+ #     if audio_array.ndim > 1:
+ #         axis_to_average = int(np.argmin(audio_array.shape))
+ #         audio_array = audio_array.mean(axis=axis_to_average)
+ #         audio_array = audio_array.squeeze()
+
+ #     # Do initial check that the audio is long enough
+ #     if audio_array.shape[-1] < int(_MIN_CHUNK_LENGTH_SECONDS * input_sr):
+ #         raise ValueError(f"Audio is too short. Minimum length is {_MIN_CHUNK_LENGTH_SECONDS} seconds.")
+
+ #     start = 0
+ #     stride = int(hop_length_seconds * input_sr)
+ #     window_length = int(window_length_seconds * input_sr)
+ #     window_id = 0
+
+ #     output = []  # Initialize output list
+ #     while True:
+ #         chunk = audio_array[start : start + window_length]
+ #         if chunk.shape[-1] < int(_MIN_CHUNK_LENGTH_SECONDS * input_sr):
+ #             break
+
+ #         # Resamples, pads, truncates and creates torch Tensor
+ #         audio_tensor, prompt_list = processor([chunk], [query], [input_sr])
+
+ #         input_to_model = {
+ #             "raw_wav": audio_tensor,
+ #             "prompt": prompt_list[0],
+ #             "audio_chunk_sizes": 1,
+ #             "padding_mask": torch.zeros_like(audio_tensor).to(torch.bool),
+ #         }
+ #         input_to_model = move_to_device(input_to_model, device)
+
+ #         # generate
+ #         prediction: str = model.generate(input_to_model, cfg.generate, prompt_list)[0]
+
+ #         # Post-process the prediction
+ #         # prediction = output_template(prediction, start / input_sr, (start + window_length) / input_sr)
+ #         # output += prediction
+ #         output.append(
+ #             {
+ #                 "start_time": start / input_sr,
+ #                 "end_time": (start + window_length) / input_sr,
+ #                 "prediction": prediction,
+ #                 "window_number": window_id,
+ #             }
+ #         )
+
+ #         # Move the window
+ #         start += stride
+
+ #         if start + window_length > audio_array.shape[-1]:
+ #             break
+
+ #     return output
+
+
+ # class Pipeline:
+ #     """Pipeline for running NatureLM-audio inference on a list of audio files or audio arrays"""
+
+ #     def __init__(self, model: NatureLM = None, cfg_path: str | Path = _DEFAULT_CONFIG_PATH) -> None:
+ #         self.cfg_path = cfg_path
+
+ #         # Load model and config
+ #         if model is not None:
+ #             self.cfg = Config.from_sources(cfg_path)
+ #             self.model = model
+ #         else:
+ #             # Download model from hub
+ #             self.model, self.cfg = load_model_and_config(cfg_path)
+
+ #         self.processor = NatureLMAudioProcessor(sample_rate=_SAMPLE_RATE, max_length_seconds=_MAX_LENGTH_SECONDS)
+
+ #     def __call__(
+ #         self,
+ #         audios: list[str | Path | np.ndarray],
+ #         queries: str | list[str],
+ #         window_length_seconds: float = 10.0,
+ #         hop_length_seconds: float = 10.0,
+ #         input_sample_rate: int = _SAMPLE_RATE,
+ #         verbose: bool = False,
+ #     ) -> list[str]:
+ #         """Run inference on a list of audio file paths or a single audio file with a
+ #         single query or a list of queries. If multiple queries are provided,
+ #         we assume that they are in the same order as the audio files. If a single query
+ #         is provided, it will be used for all audio files.
+
+ #         Args:
+ #             audios (list[str | Path | np.ndarray]): List of audio file paths or a single audio
+ #                 file path or audio array(s)
+ #             queries (str | list[str]): Queries for the model.
+ #             window_length_seconds (float): Length of the sliding window in seconds. Defaults to 10.0.
+ #             hop_length_seconds (float): Hop length for the sliding window in seconds. Defaults to 10.0.
+ #             input_sample_rate (int): Sample rate of the audio. Defaults to 16000, which is the model's sample rate.
+ #             verbose (bool): If True, print the output of the model for each audio file.
+ #                 Defaults to False.
+
+ #         Returns:
+ #             list[list[dict]]: List of model outputs for each audio file. Each output is a list of dictionaries
+ #                 containing the start time, end time, and prediction for each chunk of audio.
+
+ #         Raises:
+ #             ValueError: If the number of audio files and queries do not match.
+ #         """
+ #         if isinstance(audios, str) or isinstance(audios, Path):
+ #             audios = [audios]
+
+ #         if isinstance(queries, str):
+ #             queries = [queries] * len(audios)
+
+ #         if len(audios) != len(queries):
+ #             raise ValueError("Number of audio files and queries must match.")
+
+ #         # Run inference
+ #         results = []
+ #         for audio, query in zip(audios, queries, strict=False):
+ #             output = sliding_window_inference(
+ #                 audio,
+ #                 query,
+ #                 self.processor,
+ #                 self.model,
+ #                 self.cfg,
+ #                 window_length_seconds,
+ #                 hop_length_seconds,
+ #                 input_sr=input_sample_rate,
+ #             )
+ #             results.append(output)
+ #             if verbose:
+ #                 print(f"Processed {audio}, model output:\n=======\n{output}\n=======")
+ #         return results
+
+
+ # def parse_args() -> argparse.Namespace:
+ #     parser = argparse.ArgumentParser("Run NatureLM-audio inference")
+ #     parser.add_argument(
+ #         "-a",
+ #         "--audio",
+ #         type=str,
+ #         required=True,
+ #         help="Path to an audio file or a directory containing audio files",
+ #     )
+ #     parser.add_argument("-q", "--query", type=str, required=True, help="Query for the model")
+ #     parser.add_argument(
+ #         "--cfg-path",
+ #         type=str,
+ #         default="configs/inference.yml",
+ #         help="Path to the configuration file for the model",
+ #     )
+ #     parser.add_argument(
+ #         "--output_path",
+ #         type=str,
+ #         default="inference_output.jsonl",
+ #         help="Output path for the results",
+ #     )
+ #     parser.add_argument(
+ #         "--window_length_seconds",
+ #         type=float,
+ #         default=10.0,
+ #         help="Length of the sliding window in seconds",
+ #     )
+ #     parser.add_argument(
+ #         "--hop_length_seconds",
+ #         type=float,
+ #         default=10.0,
+ #         help="Hop length for the sliding window in seconds",
+ #     )
+ #     args = parser.parse_args()
+
+ #     return args
+
+
+ # def main(
+ #     cfg_path: str | Path,
+ #     audio_path: str | Path,
+ #     query: str,
+ #     output_path: str,
+ #     window_length_seconds: float,
+ #     hop_length_seconds: float,
+ # ) -> None:
+ #     """Main function to run the NatureLM-audio inference script.
+ #     It takes command line arguments for audio file path, query, output path,
+ #     window length, and hop length. It processes the audio files and saves the
+ #     results to a CSV file.
+
+ #     Args:
+ #         cfg_path (str | Path): Path to the configuration file.
+ #         audio_path (str | Path): Path to the audio file or directory.
+ #         query (str): Query for the model.
+ #         output_path (str): Path to save the output results.
+ #         window_length_seconds (float): Length of the sliding window in seconds.
+ #         hop_length_seconds (float): Hop length for the sliding window in seconds.
+
+ #     Raises:
+ #         ValueError: If the audio file path is invalid or if the query is empty.
+ #         ValueError: If no audio files are found.
+ #         ValueError: If the audio file extension is not supported.
+ #     """
+
+ #     # Prepare sample
+ #     audio_path = Path(audio_path)
+ #     if audio_path.is_dir():
+ #         audio_paths = []
+ #         print(f"Searching for audio files in {str(audio_path)} with extensions {', '.join(_AUDIO_FILE_EXTENSIONS)}")
+ #         for ext in _AUDIO_FILE_EXTENSIONS:
+ #             audio_paths.extend(list(audio_path.rglob(f"*{ext}")))
+
+ #         print(f"Found {len(audio_paths)} audio files in {str(audio_path)}")
+ #     else:
+ #         # check that the extension is valid
+ #         if not any(audio_path.suffix == ext for ext in _AUDIO_FILE_EXTENSIONS):
+ #             raise ValueError(
+ #                 f"Invalid audio file extension. Supported extensions are: {', '.join(_AUDIO_FILE_EXTENSIONS)}"
+ #             )
+ #         audio_paths = [audio_path]
+
+ #     # check that query is not empty
+ #     if not query:
+ #         raise ValueError("Query cannot be empty")
+ #     if not audio_paths:
+ #         raise ValueError("No audio files found. Please check the path or file extensions.")
+
+ #     # Load model and config
+ #     model, cfg = load_model_and_config(cfg_path)
+
+ #     # Load audio processor
+ #     processor = NatureLMAudioProcessor(sample_rate=_SAMPLE_RATE, max_length_seconds=_MAX_LENGTH_SECONDS)
+
+ #     # Run inference
+ #     results = {"audio_path": [], "output": []}
+ #     for path in audio_paths:
+ #         output = sliding_window_inference(
+ #             path,
+ #             query,
+ #             processor,
+ #             model,
+ #             cfg,
+ #             window_length_seconds,
+ #             hop_length_seconds,
+ #         )
+ #         results["audio_path"].append(str(path))
+ #         results["output"].append(output)
+ #         print(f"Processed {path}, model output:\n=======\n{output}\n=======\n")
+
+ #     # Save results as a csv
+ #     output_path = Path(output_path)
+ #     output_path.parent.mkdir(parents=True, exist_ok=True)
+
+ #     df = pd.DataFrame(results)
+ #     df.to_json(output_path, orient="records", lines=True)
+ #     print(f"Results saved to {output_path}")
+
+
+ # if __name__ == "__main__":
+ #     args = parse_args()
+ #     main(
+ #         cfg_path=args.cfg_path,
+ #         audio_path=args.audio,
+ #         query=args.query,
+ #         output_path=args.output_path,
+ #         window_length_seconds=args.window_length_seconds,
+ #         hop_length_seconds=args.hop_length_seconds,
+ #     )
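Since `infer.py` is committed fully commented out pending the model's `.generate()` hook, the window/hop arithmetic it relies on can still be checked in isolation. `window_bounds` below is a hypothetical helper (not part of the repo) that reproduces just the chunk boundaries the loop in `sliding_window_inference` visits, including the half-second minimum-chunk cutoff:

```python
def window_bounds(
    num_samples: int,
    sr: int,
    window_length_seconds: float = 10.0,
    hop_length_seconds: float = 10.0,
    min_chunk_seconds: float = 0.5,
) -> list[tuple[int, int]]:
    """Sample-index (start, end) pairs visited by the sliding-window loop."""
    window = int(window_length_seconds * sr)
    stride = int(hop_length_seconds * sr)
    min_len = int(min_chunk_seconds * sr)
    bounds: list[tuple[int, int]] = []
    start = 0
    while True:
        # Slicing clamps to the end of the array, like audio_array[start : start + window].
        end = min(start + window, num_samples)
        if end - start < min_len:
            break
        bounds.append((start, end))
        start += stride
        # Like the original loop, stop once the next full window would overrun
        # the signal -- so a trailing partial window is never processed.
        if start + window > num_samples:
            break
    return bounds

sr = 16000
print(window_bounds(25 * sr, sr))  # → [(0, 160000), (160000, 320000)]
```

Note the exit condition means the last 5 s of a 25 s clip never reach the model with the default 10 s window and hop, which may or may not be intended.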
pyproject.toml ADDED
@@ -0,0 +1,22 @@
+ [project]
+ name = "naturelm-audio-hf-app"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.12"
+ dependencies = [
+     "esp-research",
+     "naturelm-audio",
+     "gradio>=6.9.0",
+     "spaces>=0.47.0",
+     "huggingface-hub>=1.5.0",
+     "soundfile>=0.13.1",
+     "torch>=2.7.1",
+     "torchaudio>=2.7.1",
+     "matplotlib>=3.10.8",
+     "numpy>=2.3.5",
+ ]
+
+ [tool.uv.sources]
+ esp-research = { workspace = true }
+ naturelm-audio = { workspace = true }  #TODO
static/header.html ADDED
@@ -0,0 +1,16 @@
+ <div style="display: flex; align-items: center; gap: 12px;">
+     <picture>
+         <source srcset="/gradio_api/file=assets/ESP_logo_white.png"
+                 media="(prefers-color-scheme: dark)">
+         <source srcset="/gradio_api/file=assets/esp_logo.png"
+                 media="(prefers-color-scheme: light)">
+         <img src="/gradio_api/file=assets/esp_logo.png"
+              alt="ESP Logo"
+              style="height: 40px; width: auto;">
+     </picture>
+     <h2 style="margin: 0;">NatureLM-audio<span style="
+         font-size: 0.55em; color: #28a745; background: #e6f4ea;
+         padding: 2px 6px; border-radius: 4px; margin-left: 8px;
+         display: inline-block; vertical-align: top;"
+     >BETA</span></h2>
+ </div>
static/help.html ADDED
@@ -0,0 +1,119 @@
+ <div class="banner">
+     <div style="display: flex; padding: 0px; align-items: center; flex: 1;">
+         <div style="font-size: 20px; margin-right: 12px;"></div>
+         <div style="flex: 1;">
+             <div class="banner-header">Help us improve the model!</div>
+             <div class="banner-text">
+                 Found an issue or have suggestions?
+                 Join us on Discourse to share feedback and questions.
+             </div>
+         </div>
+     </div>
+     <a href="https://earthspeciesproject.discourse.group/t/feedback-for-naturelm-audio-ui-hugging-face-spaces-demo/17"
+        target="_blank" class="link-btn">Share Feedback</a>
+ </div>
+ <div class="guide-section">
+     <h3>Getting Started</h3>
+     <ol style="margin-top: 12px; padding-left: 20px;
+         color: #6b7280; font-size: 14px; line-height: 1.6;">
+         <li style="margin-bottom: 8px;">
+             <strong>Upload your audio</strong> or click on a pre-loaded example.
+             Drag and drop your audio file containing animal vocalizations,
+             or click on an example.
+         </li>
+         <li style="margin-bottom: 8px;">
+             <strong>Trim your audio (if needed)</strong> by clicking the scissors
+             icon on the bottom right of the audio panel. Try to keep your audio
+             to 10 seconds or less.
+         </li>
+         <li style="margin-bottom: 8px;">
+             <strong>View the Spectrogram (optional)</strong>. You can easily
+             view/hide the spectrogram of your audio for closer analysis.
+         </li>
+         <li style="margin-bottom: 8px;">
+             <strong>Select a task or write your own</strong>. Select an option
+             from pre-loaded tasks. This will auto-fill the text box with a prompt,
+             so all you have to do is hit Send. Or, type a custom prompt directly
+             into the chat.
+         </li>
+         <li style="margin-bottom: 0;">
+             <strong>Send and Analyze Audio</strong>. Press "Send" or type Enter
+             to begin processing your audio. Ask follow-up questions or press
+             "Clear" to start a new conversation.
+         </li>
+     </ol>
+ </div>
+ <div class="guide-section">
+     <h3>Tips</h3>
+     <b>Prompting Best Practices</b>
+     <ul style="margin-top: 12px; padding-left: 20px;
+         color: #6b7280; font-size: 14px; line-height: 1.6;">
+         <li>
+             When possible, use scientific or taxonomic names and mention
+             the context if known (geographic area/location, time of day
+             or year, habitat type)
+         </li>
+         <li>Ask one question at a time, and be specific about what
+             you want to know</li>
+         <ul>&#10060; Don't ask:
+             <i>"Analyze this audio and tell me all you know about it."</i>
+         </ul>
+         <ul>&#9989; Do ask:
+             <i>"What species made this sound?"</i>
+         </ul>
+         <li>Keep prompts more open-ended and avoid asking Yes/No
+             or very targeted questions</li>
+         <ul>&#10060; Don't ask:
+             <i>"Is there a bottlenose dolphin vocalizing in the audio?
+             Yes or No."</i>
+         </ul>
+         <ul>&#9989; Do ask:
+             <i>"What focal species, if any, are heard in the audio?"</i>
+         </ul>
+         <li>Giving the model options to choose works well for broader
+             categories (less so for specific species)</li>
+         <ul>&#10060; Don't ask:
+             <i>"Classify the audio into one of the following species:
+             Bottlenose Dolphin, Orca, Great Gray Owl"</i>
+         </ul>
+         <ul>&#9989; Do ask:
+             <i>"Classify the audio into one of the following categories:
+             Cetaceans, Aves, or None."</i>
+         </ul>
+     </ul>
+     <br>
+     <b>Audio Files</b>
+     <ul style="margin-top: 12px; padding-left: 20px;
+         color: #6b7280; font-size: 14px; line-height: 1.6;">
+         <li>Supported formats: .wav, .mp3, .aac, .flac, .ogg, .webm,
+             .midi, .aiff, .wma, .opus, .amr</li>
+         <li>If you are uploading an .mp4, please check that it is not
+             an MPEG-4 Movie file.</li>
+         <li>For best results, use high-quality recordings with minimal
+             background noise.</li>
+     </ul>
+ </div>
+ <div class="guide-section">
+     <h3>Learn More</h3>
+     <ul style="margin-top: 12px; padding-left: 20px;
+         color: #6b7280; font-size: 14px; line-height: 1.6;">
+         <li>Read our
+             <a href="https://huggingface.co/blog/EarthSpeciesProject/nature-lm-audio-ui-demo/"
+                target="_blank">recent blog post</a>
+             with a step-by-step tutorial</li>
+         <li>Check out the
+             <a href="https://arxiv.org/abs/2411.07186"
+                target="_blank">published paper</a>
+             for a deeper technical dive on NatureLM-audio.</li>
+         <li>Visit the
+             <a href="https://earthspecies.github.io/naturelm-audio-demo/"
+                target="_blank">NatureLM-audio Demo Page</a>
+             for additional context, a demo video, and more examples
+             of the model in action.</li>
+         <li>Sign up for our
+             <a href="https://forms.gle/WjrbmFhKkzmEgwvY7"
+                target="_blank">closed beta waitlist</a>,
+             if you're interested in testing upcoming features like
+             longer audio files and batch processing.</li>
+     </ul>
+ </div>
static/onboarding.html ADDED
@@ -0,0 +1,13 @@
+ <div class="banner">
+     <div style="display: flex; padding: 0px; align-items: center; flex: 1;">
+         <div style="font-size: 20px; margin-right: 12px;">&#128075;</div>
+         <div style="flex: 1;">
+             <div class="banner-header">Welcome to NatureLM-audio!</div>
+             <div class="banner-text">
+                 Upload your first audio file or select a pre-loaded example below.
+             </div>
+         </div>
+     </div>
+     <a href="https://huggingface.co/blog/EarthSpeciesProject/nature-lm-audio-ui-demo/"
+        target="_blank" class="link-btn">View Tutorial</a>
+ </div>
static/style.css ADDED
@@ -0,0 +1,87 @@
+ #chat-input textarea {
+     background: white;
+     flex: 1;
+ }
+ #chat-input .submit-button {
+     padding: 10px;
+     margin: 2px 6px;
+     align-self: center;
+ }
+ #spectrogram-plot {
+     padding: 12px;
+     margin: 12px;
+ }
+ .banner {
+     background: white;
+     border: 1px solid #e5e7eb;
+     border-radius: 8px;
+     padding: 16px 20px;
+     display: flex;
+     align-items: center;
+     justify-content: space-between;
+     margin-bottom: 16px;
+     margin-left: 0;
+     margin-right: 0;
+     box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1);
+ }
+ .banner .banner-header {
+     font-size: 16px;
+     font-weight: 600;
+     color: #374151;
+     margin-bottom: 4px;
+ }
+ .banner .banner-text {
+     font-size: 14px;
+     color: #6b7280;
+     line-height: 1.4;
+ }
+ .link-btn {
+     padding: 6px 12px;
+     border-radius: 6px;
+     font-size: 13px;
+     font-weight: 500;
+     cursor: pointer;
+     border: none;
+     background: #3b82f6;
+     color: white;
+     text-decoration: none;
+     display: inline-block;
+     transition: background 0.2s ease;
+ }
+ .link-btn:hover {
+     background: #2563eb;
+ }
+
+ .guide-section {
+     margin-bottom: 32px;
+     border-radius: 8px;
+     padding: 14px;
+     border: 1px solid #e5e7eb;
+ }
+
+ .guide-section h3 {
+     margin-top: 4px;
+     margin-bottom: 16px;
+     border-bottom: 1px solid #e5e7eb;
+     padding-bottom: 12px;
+ }
+ .guide-section h4 {
+     color: #1f2937;
+     margin-top: 4px;
+ }
+ @media (prefers-color-scheme: dark) {
+     #chat-input {
+         background: #1e1e1e;
+     }
+     #chat-input textarea {
+         background: #1e1e1e;
+         color: white;
+     }
+     .banner {
+         background: #1e1e1e;
+         color: white;
+     }
+     .banner .banner-header {
+         color: white;
+     }
+ }