Space: Saving-Willy/saving-willy-dev

vancauwe committed on Apr 22, 2025

Commit 2ae74ed (unverified) · 2 parents: fe38b4f 68f1407

Merge pull request #41 from sdsc-ordes/feat/multipage

Files changed (47)

.streamlit/config.toml +6 -0
README.md +2 -2
docs/{hotdog.md → classifier_hotdog.md} +0 -0
docs/dataset_cleaner.md +3 -0
docs/dataset_download.md +3 -0
docs/dataset_fake_data.md +3 -0
docs/{hf_push_observations.md → dataset_hf_push_observations.md} +1 -1
docs/dataset_requests.md +3 -0
docs/{main.md → home.md} +1 -1
docs/pages.md +12 -0
docs/release_protocol.md +32 -0
docs/{fix_tabrender.md → utils_fix_tabrender.md} +0 -0
docs/{grid_maker.md → utils_grid_maker.md} +0 -0
docs/{metadata_handler.md → utils_metadata_handler.md} +0 -0
mkdocs.yaml +20 -18
requirements.txt +8 -0
src/apptest/demo_input_sidebar.py +2 -0
src/classifier/classifier_image.py +2 -95
docs/index.md → src/dataset/__init__.py +0 -0
src/dataset/cleaner.py +30 -0
src/dataset/data_requests.py +72 -0
src/dataset/download.py +87 -0
src/dataset/fake_data.py +49 -0
src/{hf_push_observations.py → dataset/hf_push_observations.py} +3 -48
src/home.py +84 -0
src/images/design/challenge1.png +3 -0
src/images/design/challenge2.png +3 -0
src/images/design/leaderboard.png +3 -0
src/images/logo/sdsc-horizontal.png +3 -0
src/input/input_handling.py +94 -41
src/main.py +0 -319
src/maps/obs_map.py +4 -68
src/old_main.py +313 -0
src/pages/1_🐋_about.py +46 -0
src/pages/2_🌍_map.py +36 -0
src/pages/3_🤝_data requests.py +73 -0
src/pages/4_🔥_classifiers.py +198 -0
src/pages/5_📐_benchmarking.py +15 -0
src/pages/6_🏆_challenges.py +24 -0
src/pages/7_🌊_gallery.py +17 -0
src/pages/8_🚧_coordinates.py +28 -0
src/pages/📊_logs.py +17 -0
src/utils/metadata_handler.py +2 -1
src/utils/workflow_ui.py +5 -0
src/whale_viewer.py +3 -1
tests/{test_obs_map.py → test_dataset_download.py} +12 -18
tests/test_demo_input_sidebar.py +4 -4

.streamlit/config.toml ADDED

@@ -0,0 +1,6 @@

+[theme]
+primaryColor="#2CA3DF"
+backgroundColor="#0F418C"
+secondaryBackgroundColor="#0A326D"
+textColor="#F5F7FA"
+font="sans serif"

README.md CHANGED

@@ -6,7 +6,7 @@ colorTo: blue
 sdk: streamlit
 sdk_version: 1.39.0
 python_version: "3.10"
-app_file: src/main.py
+app_file: src/home.py
 pinned: false
 license: apache-2.0
 short_description: 'SDSC Hackathon - Project 10. '
@@ -28,7 +28,7 @@ pip install -r requirements.txt
 ```
 ```
-streamlit run src/main.py
+streamlit run src/home.py
 ```

docs/{hotdog.md → classifier_hotdog.md} RENAMED

File without changes

docs/dataset_cleaner.md ADDED

@@ -0,0 +1,3 @@

+ This module provides basic cleaning checks for the downloaded dataset: any row that does not have the expected types is discarded.
+
+ ::: src.dataset.cleaner

docs/dataset_download.md ADDED

@@ -0,0 +1,3 @@

+ This module provides a download function for accessing the Hugging Face dataset.
+
+ ::: src.dataset.download

docs/dataset_fake_data.md ADDED

@@ -0,0 +1,3 @@

+ This module generates fake observations that can be appended to the dataset.
+
+ ::: src.dataset.fake_data

docs/{hf_push_observations.md → dataset_hf_push_observations.md} RENAMED

@@ -1,3 +1,3 @@
 This module writes an observation into a temporary JSON file, in order to add this JSON file to the Saving-Willy Dataset in the Saving-Willy Hugging Face Community.

-::: src.hf_push_observations
+::: src.dataset.hf_push_observations

docs/dataset_requests.md ADDED

@@ -0,0 +1,3 @@

+ This module provides functions for filtering the data by location and time, and for rendering the search options and the search results.
+
+ ::: src.dataset.requests

docs/{main.md → home.md} RENAMED

@@ -7,4 +7,4 @@ The session state is used to retain values from one interaction to the next, sin
 See streamlit [docs](https://docs.streamlit.io/develop/api-reference/caching-and-state/st.session_state).


-::: src.main
+::: src.home

docs/pages.md ADDED

@@ -0,0 +1,12 @@

+The UI is organized as a multipage Streamlit app.
+The pages cover the main functionalities of the code.
+Some pages do not yet have code implemented for them: they represent a concept rather than a functionality. These pages are `About`, `Benchmarking` and `Challenges`, which currently contain only text, markdown and images, and do not require further documentation.
+Pages that have fully implemented code and functionality are the following:
+- Maps
+- Classifiers
+- Gallery
+- Logs
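
The page files added in this commit (`1_🐋_about.py` through `8_🚧_coordinates.py`, plus `📊_logs.py`) follow a `<number>_<emoji>_<label>.py` naming pattern. The sketch below is a hypothetical helper (`parse_page_filename` and `sort_pages` are not part of the code base, and this is not Streamlit's own implementation) that splits such names into their parts and orders numbered pages first; Streamlit's actual ordering and labelling rules may differ.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical helper: parses the `<number>_<emoji>_<label>.py` pattern used by
# the page files in this commit. Not Streamlit's own implementation.
# Emojis are not word characters, so `[^\w\s]+` matches the icon part.
PAGE_RE = re.compile(r"^(?:(\d+)_)?(?:([^\w\s]+)_)?(.+)\.py$")


class PageInfo(NamedTuple):
    order: Optional[int]  # numeric prefix, if any
    icon: Optional[str]   # emoji between the prefix and the label, if any
    label: str            # remaining file name, without the .py suffix


def parse_page_filename(filename: str) -> PageInfo:
    match = PAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a Python page file: {filename!r}")
    number, icon, label = match.groups()
    return PageInfo(int(number) if number else None, icon, label)


def sort_pages(filenames: List[str]) -> List[PageInfo]:
    # Numbered pages first (ascending), then unnumbered pages by label.
    pages = [parse_page_filename(name) for name in filenames]
    return sorted(pages, key=lambda p: (p.order is None, p.order or 0, p.label))


pages = sort_pages(["📊_logs.py", "3_🤝_data requests.py", "1_🐋_about.py", "2_🌍_map.py"])
```

Keeping the number as an explicit prefix makes the intended order visible in the file listing itself, which is why the unnumbered `📊_logs.py` sorts last here.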

docs/release_protocol.md ADDED

@@ -0,0 +1,32 @@

+# Release Protocol
+We use two spaces on Hugging Face: a development space for the interface, and the main space, which showcases the most recent stable release. The main branch is protected and deploys to the main space when a PR is accepted.
+When a PR from the dev branch to the main branch is opened to create a new release, we enforce a strict set of commits.
+Dev to Main PR Checklist:
+1. Open a PR from the dev branch to the main branch.
+2. Commit: in `src/dataset/download.py`, change `dataset_id` to point to the main dataset: `Saving-Willy/main_dataset`.
+3. Commit: in the README, change the header to the following (this avoids merge conflicts):
+```
+---
+title: Saving Willy
+emoji: 🐋
+colorFrom: indigo
+colorTo: blue
+sdk: streamlit
+sdk_version: 1.39.0
+python_version: "3.10"
+app_file: src/home.py
+pinned: false
+license: apache-2.0
+short_description: 'SDSC Hackathon - Project 10. '
+---
+```
+4. Ask for a review.
+5. Merge to main upon approval.
+6. Create a new tag with a major version bump (semantic versioning), i.e. `vX.0.0`.
+7. Create a new release of the code, associated with this tag.
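
Steps 2, 3 and 6 of the checklist are mechanical enough to check with a script. The sketch below is hypothetical (`release_problems` and `next_major_tag` do not exist in the repository) and takes the file contents as strings; it only checks the `dataset_id` value and the `title`/`app_file` header fields, not the whole header.

```python
import re
from typing import Dict, List

# Values taken from the release checklist (steps 2 and 3).
MAIN_DATASET_ID = "Saving-Willy/main_dataset"
EXPECTED_HEADER = {"title": "Saving Willy", "app_file": "src/home.py"}

DATASET_ID_RE = re.compile(r"""^dataset_id\s*=\s*["']([^"']+)["']""", re.MULTILINE)
FRONT_MATTER_RE = re.compile(r"^---\n(.*?)\n---", re.DOTALL)
SEMVER_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def release_problems(download_py: str, readme_md: str) -> List[str]:
    """Return checklist violations; an empty list means steps 2 and 3 look done."""
    problems: List[str] = []
    # Step 2: dataset_id must point at the main dataset.
    m = DATASET_ID_RE.search(download_py)
    if m is None:
        problems.append("no `dataset_id = ...` assignment found")
    elif m.group(1) != MAIN_DATASET_ID:
        problems.append(f"dataset_id is {m.group(1)!r}, expected {MAIN_DATASET_ID!r}")
    # Step 3: the README front matter must carry the release header fields.
    header = FRONT_MATTER_RE.match(readme_md)
    if header is None:
        problems.append("README has no front-matter header")
        return problems
    fields: Dict[str, str] = {}
    for line in header.group(1).splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    for key, expected in EXPECTED_HEADER.items():
        if fields.get(key) != expected:
            problems.append(f"README {key} is {fields.get(key)!r}, expected {expected!r}")
    return problems


def next_major_tag(tag: str) -> str:
    """Step 6: bump a `vX.Y.Z` tag to the next major version, `v(X+1).0.0`."""
    m = SEMVER_TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a vX.Y.Z tag: {tag!r}")
    return f"v{int(m.group(1)) + 1}.0.0"
```

In a pre-merge check, one could read `src/dataset/download.py` and `README.md` from the dev branch and fail the PR if the returned list is not empty.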

docs/{fix_tabrender.md → utils_fix_tabrender.md} RENAMED

File without changes

docs/{grid_maker.md → utils_grid_maker.md} RENAMED

File without changes

docs/{metadata_handler.md → utils_metadata_handler.md} RENAMED

File without changes

mkdocs.yaml CHANGED

@@ -22,32 +22,34 @@ plugins:
 nav:
   - README: index.md
-  #- Quickstart:
-    #- Installation: installation.md
-    #- Usage: usage.md
-  - API:
-    - Main app: main.md
+  - Release Protocol: release_protocol.md
+  - How to contribute:
+    - Dev Notes: dev_notes.md
+  - App:
+    - Main App & Home Page: home.md
+    - Multipages Notes: pages.md
     - Modules:
-      - Data entry handling:
-        - Data input: input_handling.md
-        - Data extraction and validation: input_validator.md
+      - Data Entry Handling:
+        - Data Input: input_handling.md
+        - Data Extraction & Validation: input_validator.md
         - Data Object Class: input_observation.md
-      - Classifiers:
+      - Hugging Face Dataset:
+        - Download: dataset_download.md
+        - Cleaning: dataset_cleaner.md
+        - Push Observations to Dataset: dataset_hf_push_observations.md
+        - Data Requests: dataset_requests.md
+        - Fake data: dataset_fake_data.md
+      - Hugging Face Classifiers:
         - Cetacean Fluke & Fin Recognition: classifier_image.md
-        - (temporary) Hotdog Classifier: hotdog.md
-      - Hugging Face Integration:
-        - Push Observations to Dataset: hf_push_observations.md
+        - (temporary) Hotdog Classifier: classifier_hotdog.md
       - Map of observations: obs_map.md
       - Whale gallery: whale_gallery.md
       - Whale viewer: whale_viewer.md
       - Logging: st_logs.md
       - Utils:
-        - Tab-rendering fix (js): fix_tabrender.md
-        - Metadata handling: metadata_handler.md
-        - Grid maker: grid_maker.md
+        - Tab-rendering fix (js): utils_fix_tabrender.md
+        - Metadata handling: utils_metadata_handler.md
+        - Grid maker: utils_grid_maker.md
     - Development clutter:
       - Demo app: app.md
-  - How to contribute:
-    - Dev Notes: dev_notes.md

requirements.txt CHANGED

@@ -13,6 +13,9 @@ datasets==3.0.2
 ## FSM
 transitions==0.9.2
+# data manipulation
+pandas==2.2.3
 # running ML models
 ## to use ML models hosted on HF
@@ -28,8 +31,13 @@ pillow==10.4.0
 opencv-python-headless==4.5.5.64
 albumentations==1.1.0
+# for env variables
+python-dotenv==1.1.0
 # documentation: mkdocs
 mkdocs~=1.6.0
 mkdocstrings[python]>=0.25.1
 mkdocs-material~=9.5.27
 mkdocs-homepage-copier~=1.0.0

src/apptest/demo_input_sidebar.py CHANGED

@@ -25,6 +25,8 @@ from apptest.demo_elements import show_uploaded_file_info
 if __name__ == "__main__":
+    if "input_author_email" not in st.session_state:
+        st.session_state.input_author_email = ""
     init_input_data_session_states()
     init_input_container_states()
     init_workflow_session_states()

src/classifier/classifier_image.py CHANGED

@@ -7,7 +7,6 @@ g_logger = logging.getLogger(__name__)
 g_logger.setLevel(LOG_LEVEL)
 import whale_viewer as viewer
-from hf_push_observations import push_observations
 from utils.grid_maker import gridder
 from utils.metadata_handler import metadata2md
 from input.input_observation import InputObservation
@@ -107,20 +106,15 @@ def cetacean_show_results_and_review() -> None:
                 print(f"[D] {o:3} pred1: {pred1:30} | {hash}")
                 ix = viewer.WHALE_CLASSES.index(pred1) if pred1 in viewer.WHALE_CLASSES else None
                 selected_class = st.selectbox(f"Species for observation {str(o)}", viewer.WHALE_CLASSES, index=ix)
             _observation.set_selected_class(selected_class)
-            #observation['predicted_class'] = selected_class
-            # this logic is now in the InputObservation class automatially
-            #if selected_class != st.session_state.whale_prediction1[hash]:
-            #    observation['class_overriden'] = selected_class # TODO: this should be boolean!
             # store the elements of the observation that will be transmitted (not image)
             observation = _observation.to_dict()
             st.session_state.public_observations[hash] = observation
-            #st.button(f"Upload observation {str(o)} to THE INTERNET!", on_click=push_observations)
             # TODO: the metadata only fills properly if `validate` was clicked.
-            st.markdown(metadata2md(hash, debug=True))
+            # TODO put condition on the debug
+            st.markdown(metadata2md(hash, debug=False))
             msg = f"[D] full observation after inference: {observation}"
             g_logger.debug(msg)
@@ -163,27 +157,6 @@ def cetacean_show_results():
         with grid[col]:
             st.image(image, use_column_width=True)
-            # # dropdown for selecting/overriding the species prediction
-            # if not st.session_state.classify_whale_done[hash]:
-            #     selected_class = st.sidebar.selectbox("Species", viewer.WHALE_CLASSES,
-            #                                                     index=None, placeholder="Species not yet identified...",
-            #                                                     disabled=True)
-            # else:
-            #     pred1 = st.session_state.whale_prediction1[hash]
-            #     # get index of pred1 from WHALE_CLASSES, none if not present
-            #     print(f"[D] pred1: {pred1}")
-            #     ix = viewer.WHALE_CLASSES.index(pred1) if pred1 in viewer.WHALE_CLASSES else None
-            #     selected_class = st.selectbox(f"Species for observation {str(o)}", viewer.WHALE_CLASSES, index=ix)
-            # observation['predicted_class'] = selected_class
-            # if selected_class != st.session_state.whale_prediction1[hash]:
-            #     observation['class_overriden'] = selected_class # TODO: this should be boolean!
-            # st.session_state.public_observation = observation
-            #st.button(f"Upload observation {str(o)} to THE INTERNET!", on_click=push_observations)
-            #
             st.markdown(metadata2md(hash, debug=True))
             msg = f"[D] full observation after inference: {observation}"
@@ -199,69 +172,3 @@ def cetacean_show_results():
                 viewer.display_whale(whale_classes, i)
         o += 1
         col = (col + 1) % row_size
-# func to do all in one
-def cetacean_classify_show_and_review(cetacean_classifier):
-    """Cetacean classifier using the saving-willy model from Saving Willy Hugging Face space.
-    For each image in the session state, classify the image and display the top 3 predictions.
-    Args:
-        cetacean_classifier ([type]):  saving-willy model from Saving Willy Hugging Face space
-    """
-    raise DeprecationWarning("This function is deprecated. Use individual steps instead")
-    images = st.session_state.images
-    observations = st.session_state.observations
-    hashes = st.session_state.image_hashes
-    batch_size, row_size, page = gridder(hashes)
-    grid = st.columns(row_size)
-    col = 0
-    o=1
-    for hash in hashes:
-        image = images[hash]
-        with grid[col]:
-            st.image(image, use_column_width=True)
-            observation = observations[hash].to_dict()
-            # run classifier model on `image`, and persistently store the output
-            out = cetacean_classifier(image) # get top 3 matches
-            st.session_state.whale_prediction1[hash] = out['predictions'][0]
-            st.session_state.classify_whale_done[hash] = True
-            msg = f"[D]2 classify_whale_done for {hash}: {st.session_state.classify_whale_done[hash]}, whale_prediction1: {st.session_state.whale_prediction1[hash]}"
-            g_logger.info(msg)
-            # dropdown for selecting/overriding the species prediction
-            if not st.session_state.classify_whale_done[hash]:
-                selected_class = st.sidebar.selectbox("Species", viewer.WHALE_CLASSES,
-                                                                index=None, placeholder="Species not yet identified...",
-                                                                disabled=True)
-            else:
-                pred1 = st.session_state.whale_prediction1[hash]
-                # get index of pred1 from WHALE_CLASSES, none if not present
-                print(f"[D] pred1: {pred1}")
-                ix = viewer.WHALE_CLASSES.index(pred1) if pred1 in viewer.WHALE_CLASSES else None
-                selected_class = st.selectbox(f"Species for observation {str(o)}", viewer.WHALE_CLASSES, index=ix)
-            observation['predicted_class'] = selected_class
-            if selected_class != st.session_state.whale_prediction1[hash]:
-                observation['class_overriden'] = selected_class
-            st.session_state.public_observation = observation
-            st.button(f"Upload observation {str(o)} to THE INTERNET!", on_click=push_observations)
-            # TODO: the metadata only fills properly if `validate` was clicked.
-            st.markdown(metadata2md())
-            msg = f"[D] full observation after inference: {observation}"
-            g_logger.debug(msg)
-            print(msg)
-            # TODO: add a link to more info on the model, next to the button.
-            whale_classes = out['predictions'][:]
-            # render images for the top 3 (that is what the model api returns)
-            st.markdown(f"Top 3 Predictions for observation {str(o)}")
-            for i in range(len(whale_classes)):
-                viewer.display_whale(whale_classes, i)
-        o += 1
-        col = (col + 1) % row_size

docs/index.md → src/dataset/__init__.py RENAMED

File without changes

src/dataset/cleaner.py ADDED

@@ -0,0 +1,30 @@

+import pandas as pd
+def clean_lat_long(df) -> pd.DataFrame:
+    """
+    Clean latitude and longitude columns in the DataFrame.
+    Ensure lat and lon are numeric, coerce errors to NaN
+    Args:
+        df (pd.DataFrame): DataFrame containing latitude and longitude columns.
+    Returns:
+        pd.DataFrame: DataFrame with cleaned latitude and longitude columns.
+    """
+    df['lat'] = pd.to_numeric(df['lat'], errors='coerce')
+    df['lon'] = pd.to_numeric(df['lon'], errors='coerce')
+    # Drop rows with NaN in lat or lon
+    df = df.dropna(subset=['lat', 'lon']).reset_index(drop=True)
+    return df
+def clean_date(df) -> pd.DataFrame: # Ensure date is a datetime, coerce errors to NaT
+    """
+    Clean date column in the DataFrame.
+    Args:
+        df (pd.DataFrame): DataFrame containing date column.
+    Returns:
+        pd.DataFrame: DataFrame with cleaned date column.
+    """
+    df['date'] = pd.to_datetime(df['date'], errors='coerce')
+    # Drop rows with NaT in date
+    df = df.dropna(subset=['date']).reset_index(drop=True)
+    return df
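
The committed `clean_lat_long` and `clean_date` assign the converted columns back into the DataFrame they receive before filtering, so the caller's frame is modified as a side effect. A minimal sketch of a hypothetical combined cleaner (`clean_observations` is not in the code base) that works on a copy and drops a row when its coordinates are not numeric or its date cannot be parsed:

```python
import pandas as pd


def clean_observations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical one-pass variant of clean_lat_long + clean_date.

    Works on a copy, so the caller's DataFrame is left untouched.
    """
    out = df.copy()
    # Non-numeric coordinates become NaN instead of raising.
    out["lat"] = pd.to_numeric(out["lat"], errors="coerce")
    out["lon"] = pd.to_numeric(out["lon"], errors="coerce")
    # Unparseable dates become NaT instead of raising.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # Discard any row where one of the checked columns could not be converted.
    return out.dropna(subset=["lat", "lon", "date"]).reset_index(drop=True)


# Usage: one valid row, then a bad latitude, a missing longitude,
# a bad date, and a second valid row.
raw = pd.DataFrame({
    "lat": ["46.5", "abc", 10.0, 20.0, -12.5],
    "lon": [6.6, 7.0, None, 30.0, 100.0],
    "date": ["2024-01-01", "2024-02-01", "2024-03-01", "not a date", "2023-06-15"],
})
clean = clean_observations(raw)
```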

src/dataset/data_requests.py ADDED

@@ -0,0 +1,72 @@

+import streamlit as st
+import pandas as pd
+from dataset.cleaner import clean_lat_long, clean_date
+from dataset.download import get_dataset
+from dataset.fake_data import generate_fake_data
+def data_prep() -> pd.DataFrame:
+    """
+    Prepares the dataset for use in the application.
+    Downloads the dataset and cleans the data (and generates fake data if needed).
+    Returns:
+        pd.DataFrame: A DataFrame containing the cleaned dataset.
+    """
+    df = get_dataset()
+    # uncomment to generate some fake data
+    # df = generate_fake_data(df, 100)
+    df = clean_lat_long(df)
+    df = clean_date(df)
+    return df
+def filter_data(df:pd.DataFrame) -> pd.DataFrame:
+    """
+    Filter the DataFrame based on user-selected ranges for latitude, longitude, and date.
+    Args:
+        df (pd.DataFrame): DataFrame to filter.
+    Returns:
+        pd.DataFrame: Filtered DataFrame.
+    """
+    df_filtered = df[
+    (df['date'] >= pd.to_datetime(st.session_state.date_range[0])) &
+        (df['date'] <= pd.to_datetime(st.session_state.date_range[1])) &
+    (df['lon'] >= st.session_state.lon_range[0]) &
+        (df['lon'] <= st.session_state.lon_range[1]) &
+    (df['lat'] >= st.session_state.lat_range[0]) &
+        (df['lat'] <= st.session_state.lat_range[1])
+    ]
+    return df_filtered
+def show_specie_author(df:pd.DataFrame):
+    """
+    Display a list of species and their corresponding authors with checkboxes.
+    Args:
+        df (pd.DataFrame): DataFrame containing species and author information.
+    """
+    df = df.groupby(['species', 'author_email']).size().reset_index(name='counts')
+    for specie in df["species"].unique():
+        st.subheader(f"Species: {specie}")
+        specie_data = df[df['species'] == specie]
+        for _, row in specie_data.iterrows():
+            key = f"{specie}_{row['author_email']}"
+            label = f"{row['author_email']} ({row['counts']})"
+            st.session_state.checkbox_states[key] = st.checkbox(label, key=key)
+def show_new_data_view(df:pd.DataFrame) -> pd.DataFrame:
+    """
+    Show the new filtered data view on the UI.
+    Filter the dataframe based on the state of the localisation sliders and selected timeframe by the user.
+    Then, show the results of the filtering grouped by species then by authors.
+    Authors are matched to a checkbox component so the user can click it if he/she/they wish to request data from this author.
+    Args:
+        df (pd.DataFrame): DataFrame to filter and display.
+    Returns:
+        pd.DataFrame: Filtered and grouped DataFrame.
+    """
+    df = filter_data(df)
+    df_ordered = show_specie_author(df)
+    return df_ordered
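
`filter_data` reads its bounds from `st.session_state`, and `show_specie_author` renders checkboxes without a `return` statement, so `show_new_data_view` returns `None` despite its annotation. A Streamlit-free sketch with hypothetical names (`filter_by_ranges`, `count_by_species_author`), assuming the cleaned `lat`/`lon`/`date` columns produced by `data_prep`, makes the same inclusive filtering and per-author counts testable:

```python
from datetime import date
from typing import Tuple

import pandas as pd

Range = Tuple[float, float]


def filter_by_ranges(
    df: pd.DataFrame, date_range: Tuple[date, date], lat_range: Range, lon_range: Range
) -> pd.DataFrame:
    """Keep rows inside all three ranges (bounds inclusive, as in filter_data)."""
    start, end = pd.to_datetime(date_range[0]), pd.to_datetime(date_range[1])
    mask = (
        df["date"].between(start, end)
        & df["lat"].between(*lat_range)
        & df["lon"].between(*lon_range)
    )
    return df[mask]


def count_by_species_author(df: pd.DataFrame) -> pd.DataFrame:
    """Number of observations per (species, author_email) pair."""
    return df.groupby(["species", "author_email"]).size().reset_index(name="counts")


# Usage with a toy frame; the last row falls outside the latitude range.
obs = pd.DataFrame({
    "species": ["humpback_whale", "humpback_whale", "blue_whale", "blue_whale"],
    "author_email": ["a@x.org", "a@x.org", "b@y.org", "a@x.org"],
    "lat": [10.0, 12.0, 50.0, -70.0],
    "lon": [5.0, 6.0, 7.0, 8.0],
    "date": pd.to_datetime(["2020-01-01", "2021-06-01", "2022-03-03", "2021-01-01"]),
})
filtered = filter_by_ranges(obs, (date(2020, 1, 1), date(2022, 12, 31)), (-60, 60), (-180, 180))
counts = count_by_species_author(filtered)
```

The UI layer could then loop over `counts` to draw the checkboxes, keeping the data logic separate from widget rendering.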

src/dataset/download.py ADDED

@@ -0,0 +1,87 @@

+import streamlit as st
+import time
+import logging
+import pandas as pd
+from datasets import load_dataset
+from datasets import DatasetDict
+############################################################
+# the dataset of observations (hf dataset in our space)
+dataset_id = "Saving-Willy/temp_dataset"
+data_files = "data/train-00000-of-00001.parquet"
+############################################################
+m_logger = logging.getLogger(__name__)
+# we can set the log level locally for funcs in this module
+#g_m_logger.setLevel(logging.DEBUG)
+m_logger.setLevel(logging.INFO)
+presentation_data_schema = {
+    'lat': 'float',
+    'lon': 'float',
+    'species': 'str',
+    'author_email': 'str',
+    'date' : 'timestamp',
+}
+def try_download_dataset(dataset_id:str, data_files:str) -> dict:
+    """
+    Attempts to download a dataset from Hugging Face, catching any errors that occur.
+    Args:
+        dataset_id (str): The ID of the dataset to download.
+        data_files (str): The data files associated with the dataset.
+    Returns:
+        dict: A dictionary containing the dataset metadata if the download is successful,
+              or an empty dictionary if an error occurs.
+    """
+    m_logger.info(f"Starting to download dataset {dataset_id} from Hugging Face")
+    t1 = time.time()
+    try:
+        metadata:DatasetDict = load_dataset(dataset_id, data_files=data_files)
+        t2 = time.time(); elap = t2 - t1
+    except ValueError as e:
+        t2 = time.time(); elap = t2 - t1
+        msg = f"Error downloading dataset: {e}.  (after {elap:.2f}s)."
+        st.error(msg)
+        m_logger.error(msg)
+        metadata = {}
+    except Exception as e:
+        # catch all (other) exceptions and log them, handle them once isolated
+        t2 = time.time(); elap = t2 - t1
+        msg = f"!!Unknown Error!! downloading dataset: {e}.  (after {elap:.2f}s)."
+        st.error(msg)
+        m_logger.error(msg)
+        metadata = {}
+    msg = f"Downloaded dataset: (after {elap:.2f}s). "
+    m_logger.info(msg)
+    #st.write(msg)
+    return metadata
+def get_dataset() -> pd.DataFrame:
+    """
+    Downloads the dataset from Hugging Face and prepares it for use.
+    If the dataset is not available, it creates an empty DataFrame with the specified schema.
+    Returns:
+        pd.DataFrame: A DataFrame containing the dataset, or an empty DataFrame if the dataset is not available.
+    """
+    # load/download data from huggingface dataset
+    metadata = try_download_dataset(dataset_id, data_files)
+    if not metadata:
+        # create an empty, but compliant dataframe
+        df = pd.DataFrame(columns=presentation_data_schema).astype(presentation_data_schema)
+    else:
+        # make a pandas df that is compliant with folium/streamlit maps
+        df = pd.DataFrame({
+            'lat': metadata["train"]["latitude"],
+            'lon': metadata["train"]["longitude"],
+            'species': metadata["train"]["selected_class"],
+            'author_email': metadata["train"]["author_email"],
+            'date': metadata["train"]["date"],}
+        )
+    return df
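
As far as we can tell, pandas does not recognise `'timestamp'` as a dtype string, so `astype(presentation_data_schema)` would raise a `TypeError` on the empty-DataFrame fallback of `get_dataset` (and in `generate_fake_data`, which uses the same call). The sketch below uses `'datetime64[ns]'` instead. A plain dict of column lists stands in for `metadata["train"]`; `SCHEMA`, `SOURCE_COLUMNS` and `to_presentation_df` are hypothetical names.

```python
from typing import Dict, List, Optional

import pandas as pd

# Variant of presentation_data_schema using "datetime64[ns]" instead of "timestamp".
SCHEMA: Dict[str, str] = {
    "lat": "float64",
    "lon": "float64",
    "species": "object",
    "author_email": "object",
    "date": "datetime64[ns]",
}

# Presentation column -> column in the dataset's "train" split (as in get_dataset).
SOURCE_COLUMNS: Dict[str, str] = {
    "lat": "latitude",
    "lon": "longitude",
    "species": "selected_class",
    "author_email": "author_email",
    "date": "date",
}


def to_presentation_df(train: Optional[Dict[str, List]]) -> pd.DataFrame:
    """Build the map/request DataFrame, or an empty schema-compliant one."""
    if not train:
        # One typed, empty Series per column keeps the dtypes even with zero rows.
        return pd.DataFrame({col: pd.Series(dtype=dtype) for col, dtype in SCHEMA.items()})
    return pd.DataFrame({col: train[src] for col, src in SOURCE_COLUMNS.items()})


empty = to_presentation_df(None)
full = to_presentation_df({
    "latitude": [46.5],
    "longitude": [6.6],
    "selected_class": ["humpback_whale"],
    "author_email": ["a@x.org"],
    "date": ["2024-01-01"],
})
```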

src/dataset/fake_data.py ADDED

@@ -0,0 +1,49 @@

+from typing import Tuple
+import pandas as pd
+import random
+from datetime import datetime, timedelta
+from dataset.download import presentation_data_schema
+from whale_viewer import WHALE_CLASSES
+def generate_fake_data(df:pd.DataFrame, num_fake:int) -> pd.DataFrame:
+    """
+    Generate fake data for the dataset.
+    Args:
+        df (pd.DataFrame): Original DataFrame to append fake data to.
+        num_fake (int): Number of fake observations to generate.
+    Returns:
+        pd.DataFrame: DataFrame with the original and fake data.
+    """
+    # Options for random generation
+    species_options = WHALE_CLASSES
+    email_options = [
+        'dr.marine@oceanic.org', 'whale.research@deepblue.org',
+        'observer@sea.net', 'super@whale.org'
+    ]
+    def random_ocean_coord() -> Tuple[float, float]:
+        """Generate random ocean-friendly coordinates."""
+        lat = random.uniform(-60, 60)  # avoid poles
+        lon = random.uniform(-180, 180)
+        return lat, lon
+    def random_date(start_year:int=2018, end_year:int=2025) -> datetime:
+        """Generate a random date."""
+        start = datetime(start_year, 1, 1)
+        end = datetime(end_year, 1, 1)
+        return start + timedelta(days=random.randint(0, (end - start).days))
+    new_data = []
+    for _ in range(num_fake):
+        lat, lon = random_ocean_coord()
+        species = random.choice(species_options)
+        email = random.choice(email_options)
+        date = random_date()
+        new_data.append([lat, lon, species, email, date])
+    new_df = pd.DataFrame(new_data, columns=presentation_data_schema).astype(presentation_data_schema)
+    df = pd.concat([df, new_df], ignore_index=True)
+    return df
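
`generate_fake_data` draws from the module-level `random` generator, so each call yields different rows. A sketch of a seeded variant (the name `generate_fake_observations` and its `seed` parameter are hypothetical) that keeps the same ranges (latitude in [-60, 60], longitude in [-180, 180], dates from 2018-01-01 to 2025-01-01) but uses its own `random.Random` instance so that runs are reproducible:

```python
import random
from datetime import datetime, timedelta
from typing import Optional, Sequence

import pandas as pd


def generate_fake_observations(
    num_fake: int,
    species: Sequence[str],
    emails: Sequence[str],
    seed: Optional[int] = None,
) -> pd.DataFrame:
    """Hypothetical seeded fake-data generator; same seed -> same rows."""
    rng = random.Random(seed)  # private generator, global random state untouched
    start, end = datetime(2018, 1, 1), datetime(2025, 1, 1)
    span_days = (end - start).days
    rows = []
    for _ in range(num_fake):
        rows.append({
            "lat": rng.uniform(-60, 60),    # avoid the poles, as in the original
            "lon": rng.uniform(-180, 180),
            "species": rng.choice(list(species)),
            "author_email": rng.choice(list(emails)),
            "date": start + timedelta(days=rng.randint(0, span_days)),
        })
    return pd.DataFrame(rows, columns=["lat", "lon", "species", "author_email", "date"])


a = generate_fake_observations(50, ["humpback_whale", "blue_whale"], ["observer@sea.net"], seed=42)
b = generate_fake_observations(50, ["humpback_whale", "blue_whale"], ["observer@sea.net"], seed=42)
```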

src/{hf_push_observations.py → dataset/hf_push_observations.py} RENAMED

@@ -7,6 +7,7 @@ from streamlit.delta_generator import DeltaGenerator
 import streamlit as st
 from huggingface_hub import HfApi, CommitInfo
+from dataset.download import dataset_id
 # get a global var for logger accessor in this module
 LOG_LEVEL = logging.DEBUG
@@ -48,7 +49,7 @@ def push_observation(image_hash:str, api:HfApi, enable_push:False) -> CommitInfo
         rv = api.upload_file(
             path_or_fileobj=f.name,
             path_in_repo=path_in_repo,
-            repo_id="Saving-Willy/temp_dataset",
+            repo_id=dataset_id,
             repo_type="dataset",
         )
         print(rv)
@@ -73,50 +74,4 @@ def push_all_observations(enable_push:bool=False):
     # iterate over the list of observations
     for hash in st.session_state.public_observations.keys():
-        rv = push_observation(hash, api, enable_push=enable_push)
-def push_observations(tab_log:DeltaGenerator=None):
-    """
-    Push the observations to the Hugging Face dataset
-    Args:
-        tab_log (streamlit.container): The container to log messages to. If not provided,
-            log messages are in any case written to the global logger (TODO: test - didn't
-            push any observation since generating the logger)
-    """
-    raise DeprecationWarning("This function is deprecated. Use push_all_observations instead.")
-    # we get the observation from session state: 1 is the dict 2 is the image.
-    # first, lets do an info display (popup)
-    metadata_str = json.dumps(st.session_state.public_observation)
-    st.toast(f"Uploading observations: {metadata_str}", icon="🦭")
-    g_logger.info(f"Uploading observations: {metadata_str}")
-    # get huggingface api
-    token = os.environ.get("HF_TOKEN", None)
-    api = HfApi(token=token)
-    f = tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False)
-    f.write(metadata_str)
-    f.close()
-    st.info(f"temp file: {f.name} with metadata written...")
-    path_in_repo= f"metadata/{st.session_state.public_observation['author_email']}/{st.session_state.public_observation['image_md5']}.json"
-    msg = f"fname: {f.name} | path: {path_in_repo}"
-    print(msg)
-    st.warning(msg)
-    # rv = api.upload_file(
-    #     path_or_fileobj=f.name,
-    #     path_in_repo=path_in_repo,
-    #     repo_id="Saving-Willy/temp_dataset",
-    #     repo_type="dataset",
-    # )
-    # print(rv)
-    # msg = f"observation attempted tx to repo happy walrus: {rv}"
-    g_logger.info(msg)
-    st.info(msg)
+        rv = push_observation(hash, api, enable_push=enable_push)
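
The removed `push_observations` shows the upload steps: serialise the observation to a temporary JSON file, build a `metadata/<author_email>/<image_md5>.json` path, and call `HfApi.upload_file` with `repo_type="dataset"`. A sketch of those steps, split so that the file handling can be tested without network access. The function names are hypothetical, the path pattern is assumed to match what `push_observation` uses, and `huggingface_hub` is only imported when a push is actually enabled.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional, Tuple


def write_observation_json(observation: Dict[str, Any]) -> Tuple[str, str]:
    """Write the observation to a temp JSON file; return (local path, repo path)."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
        json.dump(observation, f)
        tmp_path = f.name  # file is closed but kept (delete=False) for the upload
    # Assumed path pattern, taken from the removed push_observations.
    path_in_repo = f"metadata/{observation['author_email']}/{observation['image_md5']}.json"
    return tmp_path, path_in_repo


def upload_observation(tmp_path: str, path_in_repo: str, repo_id: str,
                       enable_push: bool = False) -> Optional[Any]:
    """Upload to the HF dataset repo, or do nothing (dry run) if enable_push is False."""
    if not enable_push:
        return None
    from huggingface_hub import HfApi  # imported lazily: only needed for real pushes
    api = HfApi(token=os.environ.get("HF_TOKEN"))
    return api.upload_file(
        path_or_fileobj=tmp_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )


obs = {"author_email": "a@x.org", "image_md5": "abc123", "species": "humpback_whale"}
tmp, repo_path = write_observation_json(obs)
```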

src/home.py ADDED

@@ -0,0 +1,84 @@

+import streamlit as st
+import os
+import logging
+st.set_page_config(
+    page_title="Home",
+    page_icon="🐳",
+)
+# get a global var for logger accessor in this module
+LOG_LEVEL = logging.DEBUG
+g_logger = logging.getLogger(__name__)
+g_logger.setLevel(LOG_LEVEL)
+# one toggle for all the extra debug text
+if "MODE_DEV_STATEFUL" not in st.session_state:
+    st.session_state.MODE_DEV_STATEFUL = False
+from utils.st_logs import init_logging_session_states
+init_logging_session_states() # logging init should be early
+# set email state var to exist, to permit persistence across page switches
+if "input_author_email" not in st.session_state:
+    st.session_state.input_author_email = ""
+st.write("""
+         # Welcome! 🐬˚✧˚.⋆🐋
+         # Cetacean Conservation Community
+        """)
+st.sidebar.success("Explore the pages: there are machine learning models, data requests, maps and more!")
+st.sidebar.image(
+    "src/images/logo/sdsc-horizontal.png",
+    width=200
+)
+st.markdown(
+    """
+    ## 💙 Research Data Infrastructure
+    ˖°𓇼🌊⋆🐚🫧 This interface is a Proof of Concept of a Community-driven Research Data Infrastructure (RDI) for the Cetacean Conservation Community.
+    This PoC will happily be made into a production-ready RDI if the community is interested.
+    👤 The intended users of this interface are the researchers and conservationists working on cetacean conservation.
+    In its current state, the interface is designed to be user-friendly, allowing users to upload images of cetaceans and receive species classification results.
+    🤝 We value community contributions and encourage anyone interested to reach out via [the main repository's GitHub issues](https://github.com/sdsc-ordes/saving-willy/issues).
+    🌍 The goal of this RDI is to explore community methods for sharing code and data.
+    ## 💻 Sharing Code
+    Machine learning models are published on Hugging Face 🤗, so they can be used for inference in this UI or by other users.
+    Currently, a demonstration model is available for cetacean species classification.
+    The model is based on the [HappyWhale](https://www.kaggle.com/competitions/happy-whale-and-dolphin) Kaggle competition, using the most recent weights.
+    Since part of the model was not made public, the classifier is purely demonstrative and should not be relied on for inference.
+    🏆 Ideally, through new Kaggle challenges or ongoing development in research groups, new models can be brought to Hugging Face and onto the UI.
+    ## 💎 Sharing Data
+    The dataset is also hosted on Hugging Face 🤗, to share the metadata of images that have been classified by the model.
+    Publishing the metadata is at the researcher's discretion: a researcher can use the model for inference without making the image metadata public afterwards.
+    Of course, we encourage open data. Please note that the original images are never made public in the current state of the RDI.
+    💪 The RDI also explores how to share data after inference, through a simple data request page where researchers can filter the existing metadata from the Hugging Face dataset and then select the records of interest to them.
+    Ideally, the Request button would either start a Discord channel discussion between the parties concerned by the data request, or generate an e-mail to the interested parties. This design is still being worked out.
+"""
+)
+g_logger.info("App started.")
+g_logger.warning(f"[D] Streamlit version: {st.__version__}. Python version: {os.sys.version}")
+#g_logger.debug("debug message")
+#g_logger.info("info message")
+#g_logger.warning("warning message")
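The home page seeds `MODE_DEV_STATEFUL` and `input_author_email` only when they are missing, so a value entered elsewhere in the session is not reset when the user comes back to Home. A minimal sketch of that pattern as a reusable helper, assuming only that the state object supports `in` and item assignment (as `st.session_state` does); the helper name `init_session_defaults` is hypothetical and not part of the repository, and a plain dict stands in for the Streamlit session state.

```python
from typing import Any, Mapping, MutableMapping


def init_session_defaults(state: MutableMapping[str, Any],
                          defaults: Mapping[str, Any]) -> None:
    """Set each default only if the key is absent, so existing values survive reruns."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value


# A plain dict stands in for st.session_state (hypothetical usage).
state = {"input_author_email": "someone@example.org"}
init_session_defaults(state, {"MODE_DEV_STATEFUL": False, "input_author_email": ""})
print(state)  # {'input_author_email': 'someone@example.org', 'MODE_DEV_STATEFUL': False}
```

Checking `key not in state` rather than using `setdefault` keeps the helper identical in spirit to the inline `if ... not in st.session_state:` blocks in `home.py`.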

src/images/design/challenge1.png ADDED

Git LFS Details

SHA256: 1dd2aa78e98b48b2a4e9eba9a8ebc6a2245848c2499949c2b0670ff65d1dff89
Pointer size: 131 Bytes
Size of remote file: 324 kB

src/images/design/challenge2.png ADDED

Git LFS Details

SHA256: 0e85a6600b8ed5037feb0ff811086e03dac8dc5e9b9fd7e3caf9c9b9ac02ccc4
Pointer size: 131 Bytes
Size of remote file: 230 kB

src/images/design/leaderboard.png ADDED

Git LFS Details

SHA256: 1205d84eeb588f3285890f26e65fc44677db75d58481b91da8e6f69806c89bc4
Pointer size: 131 Bytes
Size of remote file: 233 kB

src/images/logo/sdsc-horizontal.png ADDED

Git LFS Details

SHA256: a4a40e28f815045ff6251fbc937edf4423da7e36ad9b0418458f5e1eb767f6e2
Pointer size: 130 Bytes
Size of remote file: 37.4 kB

src/input/input_handling.py CHANGED

@@ -5,7 +5,7 @@ import hashlib
 import os
 import streamlit as st
-from streamlit.delta_generator import DeltaGenerator
 from streamlit.runtime.uploaded_file_manager import UploadedFile
 import cv2
@@ -202,7 +202,13 @@ def metadata_inputs_one_file(file:UploadedFile, image_hash:str, dbg_ix:int=0) ->
         m_logger.warning("[W] `container_metadata_inputs` is None, using sidebar")
     author_email = st.session_state["input_author_email"]
     filename = file.name
     image_datetime_raw = get_image_datetime(file)
@@ -211,6 +217,23 @@ def metadata_inputs_one_file(file:UploadedFile, image_hash:str, dbg_ix:int=0) ->
     msg = f"[D] {filename}: lat, lon from image metadata: {latitude0}, {longitude0}"
     m_logger.debug(msg)
     if spoof_metadata:
         if latitude0 is None: # get some default values if not found in exifdata
             latitude0:float = spoof_metadata.get('latitude', 0) + dbg_ix
@@ -219,20 +242,16 @@ def metadata_inputs_one_file(file:UploadedFile, image_hash:str, dbg_ix:int=0) ->
     image = st.session_state.images.get(image_hash, None)
     # add the UI elements
-    #viewcontainer.title(f"Metadata for {filename}")
     viewcontainer = _viewcontainer.expander(f"Metadata for {file.name}", expanded=True)
-    # TODO: use session state so any changes are persisted within session -- currently I think
-    # we are going to take the defaults over and over again -- if the user adjusts coords, or date, it will get lost
-    # - it is a bit complicated, if no values change, they persist (the widget definition: params, name, key, etc)
-    #   even if the code is re-run. but if the value changes, it is lost.
     # 3. Latitude Entry Box
     latitude = viewcontainer.text_input(
         "Latitude for " + filename,
         latitude0,
-        key=f"input_latitude_{image_hash}")
     if latitude and not is_valid_number(latitude):
         viewcontainer.error("Please enter a valid latitude (numerical only).")
         m_logger.error(f"Invalid latitude entered: {latitude}.")
@@ -240,40 +259,71 @@ def metadata_inputs_one_file(file:UploadedFile, image_hash:str, dbg_ix:int=0) ->
     longitude = viewcontainer.text_input(
         "Longitude for " + filename,
         longitude0,
-        key=f"input_longitude_{image_hash}")
     if longitude and not is_valid_number(longitude):
         viewcontainer.error("Please enter a valid longitude (numerical only).")
         m_logger.error(f"Invalid longitude entered: {longitude}.")
     # 5. Date/time
-    ## first from image metadata
-    if image_datetime_raw is not None:
-        # if we have a timezone let's use it (but only if we also have datetime)
-        time_fmt = '%Y:%m:%d %H:%M:%S'
-        if image_timezone_raw is not None:
-            image_datetime_raw += f" {image_timezone_raw}"
-            time_fmt += ' %z'
-        #
-        dt = datetime.datetime.strptime(image_datetime_raw, time_fmt)
         date_value = dt.date()
         time_value = dt.time()
-        #time_value = datetime.datetime.strptime(image_datetime_raw, '%Y:%m:%d %H:%M:%S').time()
-        #date_value = datetime.datetime.strptime(image_datetime_raw, '%Y:%m:%d %H:%M:%S').date()
     else:
-        # get current time, with user timezone (or is it server timezone?! TODO: test with different zones)
-        dt = datetime.datetime.now().astimezone().replace(microsecond=0)
-        time_value = dt.time()
-        date_value = dt.date()
-        #time_value = datetime.datetime.now().time()  # Default to current time
-        #date_value = datetime.datetime.now().date()
     ## either way, give user the option to enter manually (or correct, e.g. if camera has no rtc clock)
-    date = viewcontainer.date_input("Date for "+filename, value=date_value, key=f"input_date_{image_hash}")
-    time = viewcontainer.time_input("Time for "+filename, time_value, key=f"input_time_{image_hash}")
     tz_str = dt.strftime('%z') # this is numeric, otherwise the info isn't consistent.
     observation = InputObservation(image=image, latitude=latitude, longitude=longitude,
@@ -339,8 +389,15 @@ def _setup_oneoff_inputs() -> None:
     with container_file_uploader:
         # 1. Input the author email
-        author_email = st.text_input("Author Email", spoof_metadata.get('author_email', ""),
-                                                key="input_author_email")
         if author_email and not is_valid_email(author_email):
             st.error("Please enter a valid email address.")
@@ -348,14 +405,10 @@ def _setup_oneoff_inputs() -> None:
         st.file_uploader(
             "Upload one or more images", type=["png", 'jpg', 'jpeg', 'webp'],
             accept_multiple_files=True,
             key="file_uploader_data", on_change=buffer_uploaded_files)
 def setup_input() -> None:
     '''
     Set up the user input handling (files and metadata)
@@ -424,7 +477,7 @@ def add_input_UI_elements() -> None:
     # which are not created in the same order.
     st.divider()
-    st.title("Input image and data")
     # create and style a container for the file uploader/other one-off inputs
     st.markdown('<style>.st-key-container_file_uploader_id { border: 1px solid skyblue; border-radius: 5px; }</style>', unsafe_allow_html=True)

 import os
 import streamlit as st
+#from streamlit.delta_generator import DeltaGenerator
 from streamlit.runtime.uploaded_file_manager import UploadedFile
 import cv2
         m_logger.warning("[W] `container_metadata_inputs` is None, using sidebar")
+    # logic for the precedence of lat/lon values (descending importance)
+    # 1) if something was already entered, take that value (can have arrived from 2 or 3 in previous round)
+    # 2) if file metadata, take that value
+    # 3) if spoof metadata flag is up, take that value
+    # 4) else, empty (None)
+    # - and similarly for date/time
     author_email = st.session_state["input_author_email"]
     filename = file.name
     image_datetime_raw = get_image_datetime(file)
     msg = f"[D] {filename}: lat, lon from image metadata: {latitude0}, {longitude0}"
     m_logger.debug(msg)
+    # let's see if there was a value that was already entered for latitude and/or longitude
+    key_lon=f"input_longitude_{image_hash}"
+    key_lat=f"input_latitude_{image_hash}"
+    present_lat = key_lat in st.session_state
+    present_lon = key_lon in st.session_state
+    latitude_prior = st.session_state.get(key_lat, None)
+    longitude_prior = st.session_state.get(key_lon, None)
+    m_logger.debug(f"[D] {key_lat}: key present? {int(present_lat)} | prior value: {latitude_prior} | metadata value: {latitude0}")
+    m_logger.debug(f"[D] {key_lon}: key present? {int(present_lon)} | prior value: {longitude_prior} | metadata value: {longitude0}")
+    if latitude_prior is not None:
+        latitude0 = latitude_prior
+    if longitude_prior is not None:
+        longitude0 = longitude_prior
     if spoof_metadata:
         if latitude0 is None: # get some default values if not found in exifdata
             latitude0:float = spoof_metadata.get('latitude', 0) + dbg_ix
     image = st.session_state.images.get(image_hash, None)
     # add the UI elements
     viewcontainer = _viewcontainer.expander(f"Metadata for {file.name}", expanded=True)
     # 3. Latitude Entry Box
     latitude = viewcontainer.text_input(
         "Latitude for " + filename,
         latitude0,
+        disabled=st.session_state.get("input_disabled", False),
+        key=f"input_latitude_anchor_{image_hash}",
+    )
     if latitude and not is_valid_number(latitude):
         viewcontainer.error("Please enter a valid latitude (numerical only).")
         m_logger.error(f"Invalid latitude entered: {latitude}.")
     longitude = viewcontainer.text_input(
         "Longitude for " + filename,
         longitude0,
+        disabled=st.session_state.get("input_disabled", False),
+        key=f"input_longitude_anchor_{image_hash}",
+    )
     if longitude and not is_valid_number(longitude):
         viewcontainer.error("Please enter a valid longitude (numerical only).")
         m_logger.error(f"Invalid longitude entered: {longitude}.")
+    # now store the latitude and longitude into the session state (persists across page switches)
+    st.session_state[key_lat] = latitude
+    st.session_state[key_lon] = longitude
     # 5. Date/time
+    ## first from state, if previously set/modified
+    key_date = f"input_date_{image_hash}"
+    key_time = f"input_time_{image_hash}"
+    present_date = key_date in st.session_state
+    present_time = key_time in st.session_state
+    date_prior:datetime.date = st.session_state.get(key_date, None)
+    time_prior:datetime.time = st.session_state.get(key_time, None)
+    m_logger.debug(f"[D] {key_date}: key present? {int(present_date)} | prior value: {date_prior} | metadata value: {image_datetime_raw}")
+    m_logger.debug(f"[D] {key_time}: key present? {int(present_time)} | prior value: {time_prior} | metadata value: {image_datetime_raw}")
+    if date_prior is not None and time_prior is not None:
+        # we should use these values
+        dt = datetime.datetime.combine(date_prior, time_prior)
         date_value = dt.date()
         time_value = dt.time()
     else:
+        ## second from image metadata
+        if image_datetime_raw is not None:
+            # if we have a timezone let's use it (but only if we also have datetime)
+            time_fmt = '%Y:%m:%d %H:%M:%S'
+            if image_timezone_raw is not None:
+                image_datetime_raw += f" {image_timezone_raw}"
+                time_fmt += ' %z'
+            #
+            dt = datetime.datetime.strptime(image_datetime_raw, time_fmt)
+            date_value = dt.date()
+            time_value = dt.time()
+            #time_value = datetime.datetime.strptime(image_datetime_raw, '%Y:%m:%d %H:%M:%S').time()
+            #date_value = datetime.datetime.strptime(image_datetime_raw, '%Y:%m:%d %H:%M:%S').date()
+        else:
+            # get current time, with user timezone (or is it server timezone?! TODO: test with different zones)
+            dt = datetime.datetime.now().astimezone().replace(microsecond=0)
+            time_value = dt.time()
+            date_value = dt.date()
     ## either way, give user the option to enter manually (or correct, e.g. if camera has no rtc clock)
+    date = viewcontainer.date_input(
+        "Date for "+filename, value=date_value,
+        key=f"input_date_anchor_{image_hash}",
+        disabled=st.session_state.get("input_disabled", False), )
+    time = viewcontainer.time_input(
+        "Time for "+filename, time_value,
+        key=f"input_time_anchor_{image_hash}",
+        disabled=st.session_state.get("input_disabled", False),)
+    # now store the date and time into the session state (persists across page switches)
+    st.session_state[key_date] = date
+    st.session_state[key_time] = time
     tz_str = dt.strftime('%z') # this is numeric, otherwise the info isn't consistent.
     observation = InputObservation(image=image, latitude=latitude, longitude=longitude,
     with container_file_uploader:
         # 1. Input the author email
+        text0 = st.session_state.get("input_author_email", "None")
+        #print(f"[D] author email: {text0}")
+        author_email = st.text_input("Author Email",
+                                     value=st.session_state.get("input_author_email", None),
+                                     disabled=st.session_state.get("input_disabled", False),
+        )
+        # store the email in session state
+        st.session_state["input_author_email"] = author_email
         if author_email and not is_valid_email(author_email):
             st.error("Please enter a valid email address.")
         st.file_uploader(
             "Upload one or more images", type=["png", 'jpg', 'jpeg', 'webp'],
             accept_multiple_files=True,
+            disabled=st.session_state.get("input_disabled", False),
             key="file_uploader_data", on_change=buffer_uploaded_files)
 def setup_input() -> None:
     '''
     Set up the user input handling (files and metadata)
     # which are not created in the same order.
     st.divider()
+    st.title("Input your images")
     # create and style a container for the file uploader/other one-off inputs
     st.markdown('<style>.st-key-container_file_uploader_id { border: 1px solid skyblue; border-radius: 5px; }</style>', unsafe_allow_html=True)
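The comment block added to `metadata_inputs_one_file` sets the precedence for latitude/longitude defaults: a value already entered this session, then the image metadata, then the spoof metadata, else empty. The diff renders the widgets under `..._anchor_...` keys and copies their values into the plain `input_latitude_{hash}` / `input_longitude_{hash}` keys, which the in-code comment says is so the values persist across page switches. A minimal sketch of the precedence rule as a pure function, so it can be tested without Streamlit; `resolve_coordinate` is a hypothetical name (the diff writes this logic inline), and the `offset` parameter mirrors the `+ dbg_ix` applied to spoof values.

```python
from typing import Any, Mapping, Optional


def resolve_coordinate(prior: Optional[Any],
                       from_metadata: Optional[float],
                       spoof: Optional[Mapping[str, float]],
                       field: str,
                       offset: float = 0) -> Optional[Any]:
    """Pick a latitude/longitude default in descending order of precedence.

    1) a value already entered (copied into session state on a previous run)
    2) the value read from the image metadata (EXIF)
    3) the spoof/debug metadata, if provided, plus an offset
    4) otherwise None, so the input box starts empty
    """
    if prior is not None:
        return prior
    if from_metadata is not None:
        return from_metadata
    if spoof:  # an empty dict counts as "no spoof data", like `if spoof_metadata:`
        return spoof.get(field, 0) + offset
    return None


# Hypothetical usage: no prior entry and no EXIF data, so the spoof value wins.
resolve_coordinate(None, None, {"latitude": 46.0}, "latitude", offset=2)  # returns 48.0
```

Note that a prior value comes from a `text_input`, so it is a string, while metadata and spoof values are floats; the function passes each through unchanged, as the inline code does.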

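The date/time default in the same function follows a similar order: a previously entered date and time, then the EXIF timestamp (format `%Y:%m:%d %H:%M:%S`, with the UTC offset appended when available), then the current local time. A minimal sketch, assuming the offset arrives as a string such as `+02:00` (Python 3.7+ `strptime` accepts a colon in `%z`); `resolve_datetime` is a hypothetical name. It also illustrates a caveat that appears to apply to the inline version: `dt.time()` drops the tzinfo, so combining the stored date and time yields a naive datetime, and `dt.strftime('%z')` then returns an empty string.

```python
import datetime
from typing import Optional

EXIF_DATETIME_FMT = "%Y:%m:%d %H:%M:%S"


def resolve_datetime(date_prior: Optional[datetime.date],
                     time_prior: Optional[datetime.time],
                     exif_datetime: Optional[str],
                     exif_offset: Optional[str] = None,
                     now: Optional[datetime.datetime] = None) -> datetime.datetime:
    """Return the default observation datetime, in descending order of precedence."""
    # 1) values already entered by the user (both must be present)
    if date_prior is not None and time_prior is not None:
        # combine() with a naive time gives a naive datetime (no UTC offset)
        return datetime.datetime.combine(date_prior, time_prior)
    # 2) EXIF timestamp, with its UTC offset if one was recorded
    if exif_datetime is not None:
        if exif_offset is not None:
            return datetime.datetime.strptime(f"{exif_datetime} {exif_offset}",
                                              EXIF_DATETIME_FMT + " %z")
        return datetime.datetime.strptime(exif_datetime, EXIF_DATETIME_FMT)
    # 3) fall back to the current local time (with offset), truncated to whole seconds
    if now is None:
        now = datetime.datetime.now().astimezone()
    return now.replace(microsecond=0)


dt = resolve_datetime(None, None, "2024:05:01 12:30:00", "+02:00")
print(dt.strftime("%z"))  # +0200
dt2 = resolve_datetime(dt.date(), dt.time(), None)
print(repr(dt2.strftime("%z")))  # '' because dt.time() drops the tzinfo
```

Keeping the offset would need `dt.timetz()` (or storing the offset separately), but whether the widgets round-trip an aware `time` is not something this diff shows.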
src/main.py DELETED

@@ -1,319 +0,0 @@
-import logging
-import os
-import pandas as pd
-import streamlit as st
-import folium
-from streamlit_folium import st_folium
-from transformers import pipeline
-from transformers import AutoModelForImageClassification
-from maps.obs_map import add_obs_map_header
-from classifier.classifier_image import add_classifier_header
-from datasets import disable_caching
-disable_caching()
-import whale_gallery as gallery
-import whale_viewer as viewer
-from input.input_handling import setup_input, check_inputs_are_set
-from input.input_handling import init_input_container_states, add_input_UI_elements, init_input_data_session_states
-from input.input_handling import dbg_show_observation_hashes
-from maps.alps_map import present_alps_map
-from maps.obs_map import present_obs_map
-from utils.st_logs import parse_log_buffer, init_logging_session_states
-from utils.workflow_ui import refresh_progress_display, init_workflow_viz, init_workflow_session_states
-from hf_push_observations import push_all_observations
-from classifier.classifier_image import cetacean_just_classify, cetacean_show_results_and_review, cetacean_show_results, init_classifier_session_states
-from classifier.classifier_hotdog import hotdog_classify
-# setup for the ML model on huggingface (our wrapper)
-os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
-#classifier_revision = '0f9c15e2db4d64e7f622ade518854b488d8d35e6'
-classifier_revision = 'main' # default/latest version
-# and the dataset of observations (hf dataset in our space)
-dataset_id = "Saving-Willy/temp_dataset"
-data_files = "data/train-00000-of-00001.parquet"
-USE_BASIC_MAP = False
-DEV_SIDEBAR_LIB = True
-# one toggle for all the extra debug text
-if "MODE_DEV_STATEFUL" not in st.session_state:
-    st.session_state.MODE_DEV_STATEFUL = False
-# get a global var for logger accessor in this module
-LOG_LEVEL = logging.DEBUG
-g_logger = logging.getLogger(__name__)
-g_logger.setLevel(LOG_LEVEL)
-st.set_page_config(layout="wide")
-# initialise various session state variables
-init_logging_session_states() # logging init should be early
-init_workflow_session_states()
-init_input_data_session_states()
-init_input_container_states()
-init_workflow_viz()
-init_classifier_session_states()
-def main() -> None:
-    """
-    Main entry point to set up the streamlit UI and run the application.
-    The organisation is as follows:
-    1. observation input (new observations) is handled in the sidebar
-    2. the rest of the interface is organised in tabs:
-        - cetacean classifier
-        - hotdog classifier
-        - map to present the observations
-        - table of recent log entries
-        - gallery of whale images
-    The majority of the tabs are instantiated from modules. Currently the two
-    classifiers are still in-line here.
-    """
-    g_logger.info("App started.")
-    g_logger.warning(f"[D] Streamlit version: {st.__version__}. Python version: {os.sys.version}")
-    #g_logger.debug("debug message")
-    #g_logger.info("info message")
-    #g_logger.warning("warning message")
-    # Streamlit app
-    tab_inference, tab_hotdogs, tab_map, tab_coords, tab_log, tab_gallery = \
-        st.tabs(["Cetecean classifier", "Hotdog classifier", "Map", "*:gray[Dev:coordinates]*", "Log", "Beautiful cetaceans"])
-    # put this early so the progress indicator is at the top (also refreshed at end)
-    refresh_progress_display()
-    # create a sidebar, and parse all the input (returned as `observations` object)
-    with st.sidebar:
-        # layout handling
-        add_input_UI_elements()
-        # input elements (file upload, text input, etc)
-        setup_input()
-    with tab_map:
-        # visual structure: a couple of toggles at the top, then the map including a
-        # dropdown for tileset selection.
-        add_obs_map_header()
-        tab_map_ui_cols = st.columns(2)
-        with tab_map_ui_cols[0]:
-            show_db_points = st.toggle("Show Points from DB", True)
-        with tab_map_ui_cols[1]:
-            dbg_show_extra = st.toggle("Show Extra points (test)", False)
-        if show_db_points:
-            # show a nicer map, observations marked, tileset selectable.
-            st_observation = present_obs_map(
-                dataset_id=dataset_id, data_files=data_files,
-                dbg_show_extra=dbg_show_extra)
-        else:
-            # development map.
-            st_observation = present_alps_map()
-    with tab_log:
-        handler = st.session_state['handler']
-        if handler is not None:
-            records = parse_log_buffer(handler.buffer)
-            st.dataframe(records[::-1], use_container_width=True,)
-            st.info(f"Length of records: {len(records)}")
-        else:
-            st.error("⚠️ No log handler found!")
-    with tab_coords:
-        # the goal of this tab is to allow selection of the new observation's location by map click/adjust.
-        st.markdown("Coming later! :construction:")
-        st.markdown(
-            """*The goal is to allow interactive definition for the coordinates of a new
-            observation, by click/drag points on the map.*""")
-        st.write("Click on the map to capture a location.")
-        #m = folium.Map(location=visp_loc, zoom_start=7)
-        mm = folium.Map(location=[39.949610, -75.150282], zoom_start=16)
-        folium.Marker( [39.949610, -75.150282], popup="Liberty Bell", tooltip="Liberty Bell"
-    ).add_to(mm)
-        st_data2 = st_folium(mm, width=725)
-        st.write("below the map...")
-        if st_data2['last_clicked'] is not None:
-            print(st_data2)
-            st.info(st_data2['last_clicked'])
-    with tab_gallery:
-        # here we make a container to allow filtering css properties
-        # specific to the gallery (otherwise we get side effects)
-        tg_cont = st.container(key="swgallery")
-        with tg_cont:
-            gallery.render_whale_gallery(n_cols=4)
-    # state handling re data_entry phases
-    # 0. no data entered yet -> display the file uploader thing
-    # 1. we have some images, but not all the metadata fields are done -> validate button shown, disabled
-    # 2. all data entered -> validate button enabled
-    # 3. validation button pressed, validation done -> enable the inference button.
-    #    - at this point do we also want to disable changes to the metadata selectors?
-    #    anyway, simple first.
-    if st.session_state.workflow_fsm.is_in_state('doing_data_entry'):
-        # can we advance state? - only when all inputs are set for all uploaded files
-        all_inputs_set = check_inputs_are_set(debug=True, empty_ok=False)
-        if all_inputs_set:
-            st.session_state.workflow_fsm.complete_current_state()
-            # -> data_entry_complete
-        else:
-            # button, disabled; no state change yet.
-            st.sidebar.button(":gray[*Validate*]", disabled=True, help="Please fill in all fields.")
-    if st.session_state.workflow_fsm.is_in_state('data_entry_complete'):
-        # can we advance state? - only when the validate button is pressed
-        if st.sidebar.button(":white_check_mark:[**Validate**]"):
-            # create a dictionary with the submitted observation
-            tab_log.info(f"{st.session_state.observations}")
-            df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
-            #df = pd.DataFrame(st.session_state.observations, index=[0])
-            with tab_coords:
-                st.table(df)
-            # there doesn't seem to be any actual validation here?? TODO: find validator function (each element is validated by the input box, but is there something at the whole image level?)
-            # hmm, maybe it should actually just be "I'm done with data entry"
-            st.session_state.workflow_fsm.complete_current_state()
-            # -> data_entry_validated
-    # state handling re inference phases (tab_inference)
-    # 3. validation button pressed, validation done -> enable the inference button.
-    # 4. inference button pressed -> ML started. | let's cut this one out, since it would only
-    #      make sense if we did it as an async action
-    # 5. ML done -> show results, and manual validation options
-    # 6. manual validation done -> enable the upload buttons
-    #
-    with tab_inference:
-        # inside the inference tab, on button press we call the model (on huggingface hub)
-        # which will be run locally.
-        # - the model predicts the top 3 most likely species from the input image
-        # - these species are shown
-        # - the user can override the species prediction using the dropdown
-        # - an observation is uploaded if the user chooses.
-        if st.session_state.MODE_DEV_STATEFUL:
-            dbg_show_observation_hashes()
-        add_classifier_header()
-        # if we are before data_entry_validated, show the button, disabled.
-        if not st.session_state.workflow_fsm.is_in_state_or_beyond('data_entry_validated'):
-            tab_inference.button(":gray[*Identify with cetacean classifier*]", disabled=True,
-                                help="Please validate inputs before proceeding",
-                                key="button_infer_ceteans")
-        if st.session_state.workflow_fsm.is_in_state('data_entry_validated'):
-            # show the button, enabled. If pressed, we start the ML model (And advance state)
-            if tab_inference.button("Identify with cetacean classifier",
-                                    key="button_infer_ceteans"):
-                cetacean_classifier = AutoModelForImageClassification.from_pretrained(
-                    "Saving-Willy/cetacean-classifier",
-                    revision=classifier_revision,
-                    trust_remote_code=True)
-                cetacean_just_classify(cetacean_classifier)
-                st.session_state.workflow_fsm.complete_current_state()
-                # trigger a refresh too (refreshing the prog indicator means the script reruns and
-                # we can enter the next state - visualising the results / review)
-                # ok it doesn't if done programmatically. maybe interacting with the button? check docs.
-                refresh_progress_display()
-                #TODO: validate this doesn't harm performance adversely.
-                st.rerun()
-        elif st.session_state.workflow_fsm.is_in_state('ml_classification_completed'):
-            # show the results, and allow manual validation
-            st.markdown("""### Inference results and manual validation/adjustment """)
-            if st.session_state.MODE_DEV_STATEFUL:
-                s = ""
-                for k, v in st.session_state.whale_prediction1.items():
-                    s += f"* Image {k}: {v}\n"
-                st.markdown(s)
-            # add a button to advance the state
-            if st.button("Confirm species predictions", help="Confirm that all species are selected correctly"):
-                st.session_state.workflow_fsm.complete_current_state()
-                # -> manual_inspection_completed
-                st.rerun()
-            cetacean_show_results_and_review()
-        elif st.session_state.workflow_fsm.is_in_state('manual_inspection_completed'):
-            # show the ML results, and allow the user to upload the observation
-            st.markdown("""### Inference Results (after manual validation) """)
-            if st.button("Upload all observations to THE INTERNET!"):
-                # let this go through to the push_all func, since it just reports to log for now.
-                push_all_observations(enable_push=False)
-                st.session_state.workflow_fsm.complete_current_state()
-                # -> data_uploaded
-                st.rerun()
-            cetacean_show_results()
-        elif st.session_state.workflow_fsm.is_in_state('data_uploaded'):
-            # the data has been sent. Lets show the observations again
-            # but no buttons to upload (or greyed out ok)
-            st.markdown("""### Observation(s) uploaded - thank you!""")
-            cetacean_show_results()
-            st.divider()
-            #df = pd.DataFrame(st.session_state.observations, index=[0])
-            df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
-            st.table(df)
-            # didn't decide what the next state is here - I think we are in the terminal state.
-            #st.session_state.workflow_fsm.complete_current_state()
-    # inside the hotdog tab, on button press we call a 2nd model (totally unrelated at present, just for demo
-    # purposes, a hotdog image classifier) which will be run locally.
-    # - this model predicts if the image is a hotdog or not, and returns probabilities
-    # - the input image is the same as for the cetacean classifier - defined in the sidebar
-    tab_hotdogs.title("Hot Dog? Or Not?")
-    tab_hotdogs.write("""
-                *Run alternative classifer on input images. Here we are using
-                a binary classifier - hotdog or not - from
-                huggingface.co/julien-c/hotdog-not-hotdog.*""")
-    if tab_hotdogs.button("Get Hotdog Prediction"):
-        pipeline_hot_dog = pipeline(task="image-classification", model="julien-c/hotdog-not-hotdog")
-        if st.session_state.image is None:
-            st.info("Please upload an image first.")
-            #st.info(str(observations.to_dict()))
-        else:
-            hotdog_classify(pipeline_hot_dog, tab_hotdogs)
-    # after all other processing, we can show the stage/state
-    refresh_progress_display()
-if __name__ == "__main__":
-    main()
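The deleted `main()` drove the UI through `st.session_state.workflow_fsm`, calling `is_in_state`, `is_in_state_or_beyond` and `complete_current_state`. The FSM class itself is not part of this diff, so the following is only a minimal sketch of a linear state machine with that interface. The class name `LinearWorkflow` is hypothetical, and the state list is inferred from the state names used in the deleted code, so it may be incomplete (for example, there may be an initial state before data entry).

```python
from typing import Sequence


class LinearWorkflow:
    """Hypothetical linear FSM: states are visited strictly in order."""

    def __init__(self, states: Sequence[str]) -> None:
        if not states:
            raise ValueError("need at least one state")
        self.states = list(states)
        self._ix = 0  # index of the current state

    @property
    def current_state(self) -> str:
        return self.states[self._ix]

    def is_in_state(self, name: str) -> bool:
        return self.current_state == name

    def is_in_state_or_beyond(self, name: str) -> bool:
        # list.index raises ValueError for unknown names, which surfaces typos early
        return self._ix >= self.states.index(name)

    def complete_current_state(self) -> None:
        # advance one step; remain in the terminal state once it is reached
        if self._ix < len(self.states) - 1:
            self._ix += 1


STATES = [
    "doing_data_entry",
    "data_entry_complete",
    "data_entry_validated",
    "ml_classification_completed",
    "manual_inspection_completed",
    "data_uploaded",
]

fsm = LinearWorkflow(STATES)
fsm.complete_current_state()  # all inputs set -> data_entry_complete
fsm.complete_current_state()  # "Validate" pressed -> data_entry_validated
print(fsm.current_state)  # data_entry_validated
print(fsm.is_in_state_or_beyond("data_entry_complete"))  # True
```

A strictly linear design matches the step-by-step comments in the deleted code; branching or going back (e.g. re-editing metadata after inference) would need explicit transitions instead of an index.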

src/maps/obs_map.py CHANGED

@@ -1,18 +1,13 @@
 from typing import Tuple
 import logging
-import pandas as pd
-from datasets import load_dataset
-from datasets import DatasetDict, Dataset
-import time
 import streamlit as st
 import folium
 from streamlit_folium import st_folium
 import whale_viewer as viewer
 from utils.fix_tabrender import js_show_zeroheight_iframe
 m_logger = logging.getLogger(__name__)
 # we can set the log level locally for funcs in this module
@@ -66,13 +61,6 @@ _colors = [
 whale2color = {k: v for k, v in zip(viewer.WHALE_CLASSES, _colors)}
-presentation_data_schema = {
-    'lat': 'float',
-    'lon': 'float',
-    'species': 'str',
-}
 def create_map(tile_name:str, location:Tuple[float], zoom_start: int = 7) -> folium.Map:
     """
     Create a folium map with the specified tile layer
@@ -124,48 +112,8 @@ def create_map(tile_name:str, location:Tuple[float], zoom_start: int = 7) -> fol
     #folium.LayerControl().add_to(m)
     return m
-def try_download_dataset(dataset_id:str, data_files:str) -> dict:
-    """
-    Attempts to download a dataset from Hugging Face, catching any errors that occur.
-    Args:
-        dataset_id (str): The ID of the dataset to download.
-        data_files (str): The data files associated with the dataset.
-    Returns:
-        dict: A dictionary containing the dataset metadata if the download is successful,
-              or an empty dictionary if an error occurs.
-    """
-    m_logger.info(f"Starting to download dataset {dataset_id} from Hugging Face")
-    t1 = time.time()
-    try:
-        metadata:DatasetDict = load_dataset(dataset_id, data_files=data_files)
-        t2 = time.time(); elap = t2 - t1
-    except ValueError as e:
-        t2 = time.time(); elap = t2 - t1
-        msg = f"Error downloading dataset: {e}.  (after {elap:.2f}s)."
-        st.error(msg)
-        m_logger.error(msg)
-        metadata = {}
-    except Exception as e:
-        # catch all (other) exceptions and log them, handle them once isolated
-        t2 = time.time(); elap = t2 - t1
-        msg = f"!!Unknown Error!! downloading dataset: {e}.  (after {elap:.2f}s)."
-        st.error(msg)
-        m_logger.error(msg)
-        metadata = {}
-    msg = f"Downloaded dataset: (after {elap:.2f}s). "
-    m_logger.info(msg)
-    st.write(msg)
-    return metadata
-def present_obs_map(dataset_id:str = "Saving-Willy/Happywhale-kaggle",
-                    data_files:str = "data/train-00000-of-00001.parquet",
-                    dbg_show_extra:bool = False) -> dict:
     """
     Render map plus tile selector, with markers for whale observations
@@ -186,20 +134,8 @@ def present_obs_map(dataset_id:str = "Saving-Willy/Happywhale-kaggle",
     """
-    # load/download data from huggingface dataset
-    metadata = try_download_dataset(dataset_id, data_files)
-    if not metadata:
-        # create an empty, but compliant dataframe
-        _df = pd.DataFrame(columns=presentation_data_schema).astype(presentation_data_schema)
-    else:
-        # make a pandas df that is compliant with folium/streamlit maps
-        _df = pd.DataFrame({
-            'lat': metadata["train"]["latitude"],
-            'lon': metadata["train"]["longitude"],
-            'species': metadata["train"]["predicted_class"],}
-        )
     if dbg_show_extra:
         # add a few samples to visualise colours
         _df.loc[len(_df)] = {'lat': 0, 'lon': 0, 'species': 'rough_toothed_dolphin'}

 from typing import Tuple
 import logging
 import streamlit as st
 import folium
 from streamlit_folium import st_folium
 import whale_viewer as viewer
 from utils.fix_tabrender import js_show_zeroheight_iframe
+from dataset.download import get_dataset
 m_logger = logging.getLogger(__name__)
 # we can set the log level locally for funcs in this module
 whale2color = {k: v for k, v in zip(viewer.WHALE_CLASSES, _colors)}
 def create_map(tile_name:str, location:Tuple[float], zoom_start: int = 7) -> folium.Map:
     """
     Create a folium map with the specified tile layer
     #folium.LayerControl().add_to(m)
     return m
+def present_obs_map(dbg_show_extra:bool = False) -> dict:
     """
     Render map plus tile selector, with markers for whale observations
     """
+    _df = get_dataset()
+    m_logger.debug(f"Loaded {len(_df)} observations for the map")
     if dbg_show_extra:
         # add a few samples to visualise colours
         _df.loc[len(_df)] = {'lat': 0, 'lon': 0, 'species': 'rough_toothed_dolphin'}
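In `obs_map.py`, `whale2color` is built with `zip(viewer.WHALE_CLASSES, _colors)`. `zip` stops at the shorter input, so if the palette ever had fewer entries than there are classes, the extra species would get no colour and any lookup for them would raise `KeyError`. Whether that can happen depends on `_colors`, which is not shown in this diff. A minimal defensive sketch using `itertools.cycle` to reuse colours; the function name and the sample lists are hypothetical:

```python
from itertools import cycle


def build_color_map(classes: list[str], palette: list[str]) -> dict[str, str]:
    """Assign a colour to every class, reusing palette entries when there are
    more classes than colours (plain zip() would silently drop the extras)."""
    if not palette:
        raise ValueError("palette must contain at least one colour")
    # zip stops when `classes` is exhausted; cycle() never runs out.
    return {cls: color for cls, color in zip(classes, cycle(palette))}


# Hypothetical inputs; the real WHALE_CLASSES and _colors are defined elsewhere.
classes = ["beluga", "blue_whale", "humpback_whale", "rough_toothed_dolphin"]
palette = ["red", "blue", "green"]

print(len(dict(zip(classes, palette))))                            # 3: one class has no colour
print(build_color_map(classes, palette)["rough_toothed_dolphin"])  # red
```

Reusing colours makes two species share a marker colour, which is usually preferable to a crash; extending the palette is the other option.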

src/old_main.py ADDED

	@@ -0,0 +1,313 @@

+import logging
+import os
+import pandas as pd
+import streamlit as st
+import folium
+from streamlit_folium import st_folium
+# from transformers import pipeline
+# from transformers import AutoModelForImageClassification
+# from maps.obs_map import add_obs_map_header
+# from datasets import disable_caching
+# disable_caching()
+# import whale_gallery as gallery
+# import whale_viewer as viewer
+# from input.input_handling import setup_input, check_inputs_are_set
+# from input.input_handling import init_input_container_states, add_input_UI_elements, init_input_data_session_states
+# from input.input_handling import dbg_show_observation_hashes
+# from maps.alps_map import present_alps_map
+# from maps.obs_map import present_obs_map
+# from utils.st_logs import parse_log_buffer, init_logging_session_states
+# from utils.workflow_ui import refresh_progress_display, init_workflow_viz, init_workflow_session_states
+# from hf_push_observations import push_all_observations
+# from classifier.classifier_image import cetacean_just_classify, cetacean_show_results_and_review, cetacean_show_results, init_classifier_session_states
+# from classifier.classifier_hotdog import hotdog_classify
+# # setup for the ML model on huggingface (our wrapper)
+# os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
+#classifier_revision = '0f9c15e2db4d64e7f622ade518854b488d8d35e6'
+# classifier_revision = 'main' # default/latest version
+# # and the dataset of observations (hf dataset in our space)
+# dataset_id = "Saving-Willy/temp_dataset"
+# data_files = "data/train-00000-of-00001.parquet"
+# USE_BASIC_MAP = False
+# DEV_SIDEBAR_LIB = True
+# # one toggle for all the extra debug text
+# if "MODE_DEV_STATEFUL" not in st.session_state:
+#     st.session_state.MODE_DEV_STATEFUL = False
+# get a global var for logger accessor in this module
+# LOG_LEVEL = logging.DEBUG
+# g_logger = logging.getLogger(__name__)
+# g_logger.setLevel(LOG_LEVEL)
+# st.set_page_config(layout="wide")
+def main() -> None:
+    """
+    Main entry point to set up the streamlit UI and run the application.
+    The organisation is as follows:
+    1. observation input (new observations) is handled in the sidebar
+    2. the rest of the interface is organised in tabs:
+        - cetacean classifier
+        - hotdog classifier
+        - map to present the observations
+        - table of recent log entries
+        - gallery of whale images
+    The majority of the tabs are instantiated from modules. Currently the two
+    classifiers are still in-line here.
+    """
+    # g_logger.info("App started.")
+    # g_logger.warning(f"[D] Streamlit version: {st.__version__}. Python version: {os.sys.version}")
+    #g_logger.debug("debug message")
+    #g_logger.info("info message")
+    #g_logger.warning("warning message")
+    # Streamlit app
+    # tab_inference, tab_hotdogs, tab_map, tab_coords, tab_log, tab_gallery = \
+    #     st.tabs(["Cetacean classifier", "Hotdog classifier", "Map", "*:gray[Dev:coordinates]*", "Log", "Beautiful cetaceans"])
+    # # put this early so the progress indicator is at the top (also refreshed at end)
+    # refresh_progress_display()
+    # # create a sidebar, and parse all the input (returned as `observations` object)
+    # with st.sidebar:
+    #     # layout handling
+    #     add_input_UI_elements()
+    #     # input elements (file upload, text input, etc)
+    #     setup_input()
+    # with tab_map:
+    #     # visual structure: a couple of toggles at the top, then the map including a
+    #     # dropdown for tileset selection.
+    #     add_obs_map_header()
+    #     tab_map_ui_cols = st.columns(2)
+    #     with tab_map_ui_cols[0]:
+    #         show_db_points = st.toggle("Show Points from DB", True)
+    #     with tab_map_ui_cols[1]:
+    #         dbg_show_extra = st.toggle("Show Extra points (test)", False)
+    #     if show_db_points:
+    #         # show a nicer map, observations marked, tileset selectable.
+    #         st_observation = present_obs_map(
+    #             dataset_id=dataset_id, data_files=data_files,
+    #             dbg_show_extra=dbg_show_extra)
+    #     else:
+    #         # development map.
+    #         st_observation = present_alps_map()
+    # with tab_log:
+    #     handler = st.session_state['handler']
+    #     if handler is not None:
+    #         records = parse_log_buffer(handler.buffer)
+    #         st.dataframe(records[::-1], use_container_width=True,)
+    #         st.info(f"Length of records: {len(records)}")
+    #     else:
+    #         st.error("⚠️ No log handler found!")
+    # with tab_coords:
+    #     # the goal of this tab is to allow selection of the new observation's location by map click/adjust.
+    #     st.markdown("Coming later! :construction:")
+    #     st.markdown(
+    #         """*The goal is to allow interactive definition for the coordinates of a new
+    #         observation, by click/drag points on the map.*""")
+    #     st.write("Click on the map to capture a location.")
+    #     #m = folium.Map(location=visp_loc, zoom_start=7)
+    #     mm = folium.Map(location=[39.949610, -75.150282], zoom_start=16)
+    #     folium.Marker( [39.949610, -75.150282], popup="Liberty Bell", tooltip="Liberty Bell"
+    # ).add_to(mm)
+    #     st_data2 = st_folium(mm, width=725)
+    #     st.write("below the map...")
+    #     if st_data2['last_clicked'] is not None:
+    #         print(st_data2)
+    #         st.info(st_data2['last_clicked'])
+    # with tab_gallery:
+        # # here we make a container to allow filtering css properties
+        # # specific to the gallery (otherwise we get side effects)
+        # tg_cont = st.container(key="swgallery")
+        # with tg_cont:
+        #     gallery.render_whale_gallery(n_cols=4)
+    # state handling re data_entry phases
+    # 0. no data entered yet -> display the file uploader thing
+    # 1. we have some images, but not all the metadata fields are done -> validate button shown, disabled
+    # 2. all data entered -> validate button enabled
+    # 3. validation button pressed, validation done -> enable the inference button.
+    #    - at this point do we also want to disable changes to the metadata selectors?
+    #    anyway, simple first.
+    # if st.session_state.workflow_fsm.is_in_state('doing_data_entry'):
+    #     # can we advance state? - only when all inputs are set for all uploaded files
+    #     all_inputs_set = check_inputs_are_set(debug=True, empty_ok=False)
+    #     if all_inputs_set:
+    #         st.session_state.workflow_fsm.complete_current_state()
+    #         # -> data_entry_complete
+    #     else:
+    #         # button, disabled; no state change yet.
+    #         st.sidebar.button(":gray[*Validate*]", disabled=True, help="Please fill in all fields.")
+    # if st.session_state.workflow_fsm.is_in_state('data_entry_complete'):
+    #     # can we advance state? - only when the validate button is pressed
+    #     if st.sidebar.button(":white_check_mark:[**Validate**]"):
+    #         # create a dictionary with the submitted observation
+    #         tab_log.info(f"{st.session_state.observations}")
+    #         df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
+    #         #df = pd.DataFrame(st.session_state.observations, index=[0])
+    #         with tab_coords:
+    #             st.table(df)
+    #         # there doesn't seem to be any actual validation here?? TODO: find validator function (each element is validated by the input box, but is there something at the whole image level?)
+    #         # hmm, maybe it should actually just be "I'm done with data entry"
+    #         st.session_state.workflow_fsm.complete_current_state()
+    #         # -> data_entry_validated
+    # state handling re inference phases (tab_inference)
+    # 3. validation button pressed, validation done -> enable the inference button.
+    # 4. inference button pressed -> ML started. | let's cut this one out, since it would only
+    #      make sense if we did it as an async action
+    # 5. ML done -> show results, and manual validation options
+    # 6. manual validation done -> enable the upload buttons
+    #
+    # with tab_inference:
+    #     # inside the inference tab, on button press we call the model (on huggingface hub)
+    #     # which will be run locally.
+    #     # - the model predicts the top 3 most likely species from the input image
+    #     # - these species are shown
+    #     # - the user can override the species prediction using the dropdown
+    #     # - an observation is uploaded if the user chooses.
+    #     if st.session_state.MODE_DEV_STATEFUL:
+    #         dbg_show_observation_hashes()
+    #     add_classifier_header()
+    #     # if we are before data_entry_validated, show the button, disabled.
+    #     if not st.session_state.workflow_fsm.is_in_state_or_beyond('data_entry_validated'):
+    #         tab_inference.button(":gray[*Identify with cetacean classifier*]", disabled=True,
+    #                             help="Please validate inputs before proceeding",
+    #                             key="button_infer_ceteans")
+    #     if st.session_state.workflow_fsm.is_in_state('data_entry_validated'):
+    #         # show the button, enabled. If pressed, we start the ML model (And advance state)
+    #         if tab_inference.button("Identify with cetacean classifier",
+    #                                 key="button_infer_ceteans"):
+    #             cetacean_classifier = AutoModelForImageClassification.from_pretrained(
+    #                 "Saving-Willy/cetacean-classifier",
+    #                 revision=classifier_revision,
+    #                 trust_remote_code=True)
+    #             cetacean_just_classify(cetacean_classifier)
+    #             st.session_state.workflow_fsm.complete_current_state()
+    #             # trigger a refresh too (refreshing the prog indicator means the script reruns and
+    #             # we can enter the next state - visualising the results / review)
+    #             # ok it doesn't if done programmatically. maybe interacting with the button? check docs.
+    #             refresh_progress_display()
+    #             #TODO: validate this doesn't harm performance adversely.
+    #             st.rerun()
+    #     elif st.session_state.workflow_fsm.is_in_state('ml_classification_completed'):
+    #         # show the results, and allow manual validation
+    #         st.markdown("""### Inference results and manual validation/adjustment """)
+    #         if st.session_state.MODE_DEV_STATEFUL:
+    #             s = ""
+    #             for k, v in st.session_state.whale_prediction1.items():
+    #                 s += f"* Image {k}: {v}\n"
+    #             st.markdown(s)
+    #         # add a button to advance the state
+    #         if st.button("Confirm species predictions", help="Confirm that all species are selected correctly"):
+    #             st.session_state.workflow_fsm.complete_current_state()
+    #             # -> manual_inspection_completed
+    #             st.rerun()
+    #         cetacean_show_results_and_review()
+    #     elif st.session_state.workflow_fsm.is_in_state('manual_inspection_completed'):
+    #         # show the ML results, and allow the user to upload the observation
+    #         st.markdown("""### Inference Results (after manual validation) """)
+    #         if st.button("Upload all observations to THE INTERNET!"):
+    #             # let this go through to the push_all func, since it just reports to log for now.
+    #             push_all_observations(enable_push=False)
+    #             st.session_state.workflow_fsm.complete_current_state()
+    #             # -> data_uploaded
+    #             st.rerun()
+    #         cetacean_show_results()
+    #     elif st.session_state.workflow_fsm.is_in_state('data_uploaded'):
+    #         # the data has been sent. Lets show the observations again
+    #         # but no buttons to upload (or greyed out ok)
+    #         st.markdown("""### Observation(s) uploaded - thank you!""")
+    #         cetacean_show_results()
+    #         st.divider()
+    #         #df = pd.DataFrame(st.session_state.observations, index=[0])
+    #         df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
+    #         st.table(df)
+    #         # didn't decide what the next state is here - I think we are in the terminal state.
+    #         #st.session_state.workflow_fsm.complete_current_state()
+    # # inside the hotdog tab, on button press we call a 2nd model (totally unrelated at present, just for demo
+    # # purposes, a hotdog image classifier) which will be run locally.
+    # # - this model predicts if the image is a hotdog or not, and returns probabilities
+    # # - the input image is the same as for the cetacean classifier - defined in the sidebar
+    # tab_hotdogs.title("Hot Dog? Or Not?")
+    # tab_hotdogs.write("""
+    #             *Run alternative classifier on input images. Here we are using
+    #             a binary classifier - hotdog or not - from
+    #             huggingface.co/julien-c/hotdog-not-hotdog.*""")
+    # if tab_hotdogs.button("Get Hotdog Prediction"):
+    #     pipeline_hot_dog = pipeline(task="image-classification", model="julien-c/hotdog-not-hotdog")
+    #     if st.session_state.image is None:
+    #         st.info("Please upload an image first.")
+    #         #st.info(str(observations.to_dict()))
+    #     else:
+    #         hotdog_classify(pipeline_hot_dog, tab_hotdogs)
+    # # after all other processing, we can show the stage/state
+    # refresh_progress_display()
+if __name__ == "__main__":
+    main()

src/pages/1_🐋_about.py ADDED

	@@ -0,0 +1,46 @@

+import streamlit as st
+st.set_page_config(
+    page_title="About",
+    page_icon="🐋",
+)
+st.markdown(
+    """
+# About
+We created this web app in [a hackathon](https://sdsc-hackathons.ch/projectPage?projectRef=vUt8BfDJXaAs0UfOesXI|XyWLFpqjq3CX3zrM4uz8).
+This interface is a Proof of Concept of a Community-driven Research Data Infrastructure for the Cetacean Conservation Community.
+Please reach out on [the project Github issues](https://github.com/sdsc-ordes/saving-willy/issues) for feedback, suggestions, or if you want to join the project.
+# Open Source Resources
+## UI Code
+- The [space is hosted on Hugging Face](https://huggingface.co/spaces/Saving-Willy/saving-willy-space).
+- The [UI code is available on Github](https://github.com/sdsc-ordes/saving-willy).
+- The [development space](https://huggingface.co/spaces/Saving-Willy/saving-willy-dev) is also hosted publicly on Hugging Face.
+## The Machine Learning Models
+- The [model](https://huggingface.co/Saving-Willy/cetacean-classifier) is hosted on Hugging Face.
+- The [original Kaggle model code](https://github.com/knshnb/kaggle-happywhale-1st-place) is open on Github as well.
+## The Data
+(temporary setup, a more stable database is probably desired.)
+- The dataset is hosted on Hugging Face.
+- The [dataset syncing code](https://github.com/vancauwe/saving-willy-data-sync) is available on Github.
+# Credits and Thanks
+## Developers
+- [Rob Mills](https://github.com/rmm-ch)
+- [Laure Vancauwenberghe](https://github.com/vancauwe)
+## Special Thanks
+- [EDMAKTUB](https://edmaktub.org) for their advice.
+- [Swiss Data Science Center](https://www.datascience.ch) for [the hackathon that started the project](https://sdsc-hackathons.ch/projectPage?projectRef=vUt8BfDJXaAs0UfOesXI|XyWLFpqjq3CX3zrM4uz8).
+- [HappyWhale](https://happywhale.com) for launching [the Kaggle challenge that led to model development](https://www.kaggle.com/competitions/happy-whale-and-dolphin).
+"""
+)

src/pages/2_🌍_map.py ADDED

	@@ -0,0 +1,36 @@

+import streamlit as st
+import logging
+from datasets import disable_caching
+disable_caching()
+st.set_page_config(
+    page_title="Map",
+    page_icon="🌍",
+    layout="wide",
+)
+from maps.obs_map import add_obs_map_header
+from maps.alps_map import present_alps_map
+from maps.obs_map import present_obs_map
+############################################################
+g_logger = logging.getLogger(__name__)
+USE_BASIC_MAP = False
+DEV_SIDEBAR_LIB = True
+############################################################
+# visual structure: a couple of toggles at the top, then the map including a
+# dropdown for tileset selection.
+add_obs_map_header()
+tab_map_ui_cols = st.columns(2)
+with tab_map_ui_cols[0]:
+    show_db_points = st.toggle("Show Points from DB", True)
+with tab_map_ui_cols[1]:
+    dbg_show_extra = st.toggle("Show Extra points (test)", False)
+if show_db_points:
+    # show a nicer map, observations marked, tileset selectable.
+    st_observation = present_obs_map(dbg_show_extra=dbg_show_extra)
+else:
+    # development map.
+    st_observation = present_alps_map()

src/pages/3_🤝_data requests.py ADDED

	@@ -0,0 +1,73 @@

+import streamlit as st
+st.set_page_config(
+    page_title="Requests",
+    page_icon="🤝",
+)
+from dataset.data_requests import data_prep, show_new_data_view
+st.title("Data Requests")
+st.write("This page aims to ensure the findability of data across the community.")
+st.write("You can filter the metadata by longitude, latitude and date. You can select data from multiple actors and for multiple species, and make a grouped request. " \
+"The request for the relevant data will be addressed individually to each owner.")
+# Initialize the default data view
+df = data_prep()
+if 'checkbox_states' not in st.session_state:
+    st.session_state.checkbox_states = {}
+if 'lat_range' not in st.session_state:
+    st.session_state.lat_range = (float(df['lat'].min()), float(df['lat'].max()))
+if 'lon_range' not in st.session_state:
+    st.session_state.lon_range = (float(df['lon'].min()), float(df['lon'].max()))
+if 'date_range' not in st.session_state:
+    st.session_state.date_range = (df['date'].min(), df['date'].max())
+# Request button (rendered above the data view)
+if st.button("REQUEST DATA",
+             type="primary",
+             icon="🐚"):
+    selected = [k for k, v in st.session_state.checkbox_states.items() if v]
+    if selected:
+        st.success(f"Request submitted for: {', '.join(selected)}")
+    else:
+        st.warning("No selections made.")
+# Latitude range filter
+lat_min, lat_max = float(df['lat'].min()), float(df['lat'].max())
+lat_range = st.sidebar.slider(
+    "Latitude range",
+    min_value=lat_min,
+    max_value=lat_max,
+    value=st.session_state.get("lat_range", (lat_min, lat_max))
+)
+st.session_state.lat_range = lat_range
+# Longitude range filter
+lon_min, lon_max = float(df['lon'].min()), float(df['lon'].max())
+lon_range = st.sidebar.slider(
+    "Longitude range",
+    min_value=lon_min,
+    max_value=lon_max,
+    value=st.session_state.get("lon_range", (lon_min, lon_max))
+)
+st.session_state.lon_range = lon_range
+# Date range filter
+date_range = st.sidebar.date_input(
+    "Date range",
+    value=st.session_state.get("date_range", (df['date'].min(), df['date'].max())),
+    min_value=df['date'].min(),
+    max_value=df['date'].max()
+)
+st.session_state.date_range = date_range
+# Show authors per species
+show_new_data_view(df)
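The sidebar collects a latitude range, a longitude range and a date range; the filtering itself happens in `show_new_data_view`, which is not part of this diff. A minimal sketch of inclusive range filtering in plain Python, assuming each observation is a dict with `lat`, `lon` and `date` (a `datetime.date`) keys mirroring the dataframe columns used on the page; `filter_observations` is hypothetical. It also guards against a detail worth checking: when `st.date_input` is given a two-date `value`, it can return a tuple holding only one date while the user is still picking the end of the range.

```python
from datetime import date
from typing import Sequence


def filter_observations(records, lat_range, lon_range, date_range: Sequence[date]):
    """Keep records whose lat, lon and date all fall inside the inclusive ranges.

    A one-element date_range (range still being picked) is treated as a
    single-day range; an empty one disables the date filter.
    """
    lat_lo, lat_hi = lat_range
    lon_lo, lon_hi = lon_range
    if len(date_range) == 0:
        d_lo, d_hi = date.min, date.max
    elif len(date_range) == 1:
        d_lo = d_hi = date_range[0]
    else:
        d_lo, d_hi = date_range
    return [
        r for r in records
        if lat_lo <= r["lat"] <= lat_hi
        and lon_lo <= r["lon"] <= lon_hi
        and d_lo <= r["date"] <= d_hi
    ]


# Hypothetical observations (not project data).
records = [
    {"lat": 46.2, "lon": 6.1, "date": date(2024, 5, 1), "species": "humpback_whale"},
    {"lat": -33.9, "lon": 18.4, "date": date(2023, 1, 15), "species": "blue_whale"},
]

hits = filter_observations(records, (40.0, 50.0), (0.0, 10.0),
                           (date(2024, 1, 1), date(2024, 12, 31)))
print([r["species"] for r in hits])  # ['humpback_whale']
```

Treating a half-picked range as a single day is one choice; using the picked date as a lower bound only would be equally reasonable.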

src/pages/4_🔥_classifiers.py ADDED

	@@ -0,0 +1,198 @@

+import streamlit as st
+import os
+import pandas as pd
+import logging
+st.set_page_config(
+    page_title="ML Models",
+    page_icon="🔥",
+)
+from utils.st_logs import init_logging_session_states
+from transformers import pipeline
+from transformers import AutoModelForImageClassification
+from classifier.classifier_image import add_classifier_header
+from input.input_handling import setup_input, check_inputs_are_set
+from input.input_handling import init_input_container_states, add_input_UI_elements, init_input_data_session_states
+from input.input_handling import dbg_show_observation_hashes
+from utils.workflow_ui import refresh_progress_display, init_workflow_viz, init_workflow_session_states
+from dataset.hf_push_observations import push_all_observations
+from classifier.classifier_image import cetacean_just_classify, cetacean_show_results_and_review, cetacean_show_results, init_classifier_session_states
+from classifier.classifier_hotdog import hotdog_classify
+############################################################
+classifier_name = "Saving-Willy/cetacean-classifier"
+#classifier_revision = '0f9c15e2db4d64e7f622ade518854b488d8d35e6'
+classifier_revision = 'main' # default/latest version
+############################################################
+g_logger = logging.getLogger(__name__)
+# setup for the ML model on huggingface (our wrapper)
+os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
+# one toggle for all the extra debug text
+if "MODE_DEV_STATEFUL" not in st.session_state:
+    st.session_state.MODE_DEV_STATEFUL = False
+############################################################
+# Streamlit app
+tab_inference, tab_hotdogs = \
+    st.tabs(["Cetacean classifier", "Hotdog classifier"])
+# initialise various session state variables
+init_logging_session_states() # logging init should be early
+init_workflow_session_states()
+init_input_data_session_states()
+init_input_container_states()
+init_workflow_viz()
+init_classifier_session_states()
+# put this early so the progress indicator is at the top (also refreshed at end)
+refresh_progress_display()
+# create a sidebar, and parse all the input (returned as `observations` object)
+with st.sidebar:
+    # layout handling
+    add_input_UI_elements()
+    # input elements (file upload, text input, etc)
+    setup_input()
+with tab_inference:
+    if st.session_state.workflow_fsm.is_in_state('doing_data_entry'):
+        # can we advance state? - only when all inputs are set for all uploaded files
+        all_inputs_set = check_inputs_are_set(debug=True, empty_ok=False)
+        if all_inputs_set:
+            st.session_state.workflow_fsm.complete_current_state()
+            # -> data_entry_complete
+        else:
+            # button, disabled; no state change yet.
+            st.sidebar.button(":gray[*Validate*]", disabled=True, help="Please fill in all fields.")
+    if st.session_state.workflow_fsm.is_in_state('data_entry_complete'):
+        # can we advance state? - only when the validate button is pressed
+        if st.sidebar.button(":white_check_mark:[**Validate**]"):
+            # create a dictionary with the submitted observation
+            g_logger.info(f"{st.session_state.observations}")
+            df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
+            # with tab_coords:
+            #     st.table(df)
+            # now disable all the input boxes / widgets
+            st.session_state.input_disabled = True
+            # there doesn't seem to be any actual validation here?? TODO: find validator function (each element is validated by the input box, but is there something at the whole image level?)
+            # hmm, maybe it should actually just be "I'm done with data entry"
+            st.session_state.workflow_fsm.complete_current_state()
+            # -> data_entry_validated
+            st.rerun() # refresh so the input widgets are immediately disabled
+    if st.session_state.MODE_DEV_STATEFUL:
+        dbg_show_observation_hashes()
+    add_classifier_header()
+    # if we are before data_entry_validated, show the button, disabled.
+    if not st.session_state.workflow_fsm.is_in_state_or_beyond('data_entry_validated'):
+        tab_inference.button(":gray[*Identify with cetacean classifier*]", disabled=True,
+                            help="Please validate inputs before proceeding",
+                            key="button_infer_ceteans")
+    if st.session_state.workflow_fsm.is_in_state('data_entry_validated'):
+        # show the button, enabled. If pressed, we start the ML model (And advance state)
+        if tab_inference.button("Identify with cetacean classifier",
+                                key="button_infer_ceteans"):
+            cetacean_classifier = AutoModelForImageClassification.from_pretrained(
+                classifier_name,
+                revision=classifier_revision,
+                trust_remote_code=True)
+            cetacean_just_classify(cetacean_classifier)
+            st.session_state.workflow_fsm.complete_current_state()
+            # trigger a refresh too (refreshing the prog indicator means the script reruns and
+            # we can enter the next state - visualising the results / review)
+            # ok it doesn't if done programmatically. maybe interacting with the button? check docs.
+            refresh_progress_display()
+            #TODO: validate this doesn't harm performance adversely.
+            st.rerun()
+    elif st.session_state.workflow_fsm.is_in_state('ml_classification_completed'):
+        # show the results, and allow manual validation
+        st.markdown("""### Inference results and manual validation/adjustment """)
+        if st.session_state.MODE_DEV_STATEFUL:
+            s = ""
+            for k, v in st.session_state.whale_prediction1.items():
+                s += f"* Image {k}: {v}\n"
+            st.markdown(s)
+        # add a button to advance the state
+        if st.button("I have looked over predictions and confirm correct species", icon="👀",
+                    type="primary",
+                    help="Confirm that all species are selected correctly"):
+            st.session_state.workflow_fsm.complete_current_state()
+            # -> manual_inspection_completed
+            st.rerun()
+        cetacean_show_results_and_review()
+    elif st.session_state.workflow_fsm.is_in_state('manual_inspection_completed'):
+        # show the ML results, and allow the user to upload the observation
+        st.markdown("""### Inference Results (after manual validation) """)
+        if st.button("Upload all observations to THE INTERNET!", icon="⬆️",
+                    type="primary"):
+            # let this go through to the push_all func, since it just reports to log for now.
+            push_all_observations(enable_push=False)
+            st.session_state.workflow_fsm.complete_current_state()
+            # -> data_uploaded
+            st.rerun()
+        cetacean_show_results()
+    elif st.session_state.workflow_fsm.is_in_state('data_uploaded'):
+        # the data has been sent. Lets show the observations again
+        # but no buttons to upload (or greyed out ok)
+        st.markdown("""### Observation(s) uploaded - thank you!""")
+        cetacean_show_results()
+        st.divider()
+        df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
+        st.table(df)
+        # didn't decide what the next state is here - I think we are in the terminal state.
+        #st.session_state.workflow_fsm.complete_current_state()
+with tab_hotdogs:
+    # inside the hotdog tab, on button press we call a 2nd model (totally unrelated at present, just for demo
+    # purposes, a hotdog image classifier) which will be run locally.
+    # - this model predicts if the image is a hotdog or not, and returns probabilities
+    # - the input image is the same as for the cetacean classifier - defined in the sidebar
+    tab_hotdogs.title("Hot Dog? Or Not?")
+    tab_hotdogs.write("""
+                *Run alternative classifier on input images. Here we are using
+                a binary classifier - hotdog or not - from
+                huggingface.co/julien-c/hotdog-not-hotdog.*""")
+    if tab_hotdogs.button("Get Hotdog Prediction"):
+        pipeline_hot_dog = pipeline(task="image-classification", model="julien-c/hotdog-not-hotdog")
+        if st.session_state.image is None:
+            st.info("Please upload an image first.")
+            #st.info(str(observations.to_dict()))
+        else:
+            hotdog_classify(pipeline_hot_dog, tab_hotdogs)
+# after all other processing, we can show the stage/state
+refresh_progress_display()
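The page's control flow relies on `st.session_state.workflow_fsm` exposing `is_in_state`, `is_in_state_or_beyond` and `complete_current_state`; the class behind it is not part of this diff. A minimal linear sketch, with the state order taken from the `# -> ...` comments above. The class name `LinearWorkflowFSM` is hypothetical, and the real machine may have more states (e.g. one before data entry) or non-linear transitions. Because Streamlit reruns the whole script on every interaction, such an object only keeps its position if it is stored in `st.session_state`.

```python
class LinearWorkflowFSM:
    """Hypothetical linear state machine over an ordered list of state names."""

    def __init__(self, states):
        if not states:
            raise ValueError("need at least one state")
        self.states = list(states)
        self._idx = 0

    @property
    def current_state(self):
        return self.states[self._idx]

    def is_in_state(self, state):
        return self.current_state == state

    def is_in_state_or_beyond(self, state):
        # list.index raises ValueError for unknown names, which surfaces typos early.
        return self._idx >= self.states.index(state)

    def complete_current_state(self):
        # Advance one step; a no-op once the terminal state is reached.
        if self._idx < len(self.states) - 1:
            self._idx += 1


STATES = [
    "doing_data_entry",
    "data_entry_complete",
    "data_entry_validated",
    "ml_classification_completed",
    "manual_inspection_completed",
    "data_uploaded",
]

fsm = LinearWorkflowFSM(STATES)
fsm.complete_current_state()  # doing_data_entry -> data_entry_complete
fsm.complete_current_state()  # data_entry_complete -> data_entry_validated
print(fsm.current_state)                                 # data_entry_validated
print(fsm.is_in_state_or_beyond("data_entry_complete"))  # True
print(fsm.is_in_state_or_beyond("data_uploaded"))        # False
```

In a page this would be created once, e.g. only when `"workflow_fsm"` is not yet in `st.session_state`, and then queried on each rerun.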

src/pages/5_📐_benchmarking.py ADDED

	@@ -0,0 +1,15 @@

+import streamlit as st
+st.set_page_config(
+    page_title="Benchmarking",
+    page_icon="📐",
+    layout="wide",
+)
+st.title("Benchmark of ML models")
+st.write("All credit goes to the original leaderboard on Hugging Face: https://huggingface.co/spaces/opencompass/opencompass-llm-leaderboard"
+)
+st.write("This image is only a placeholder to illustrate possible benchmarking.")
+st.image("src/images/design/leaderboard.png", caption="Benchmarking models")

src/pages/6_🏆_challenges.py ADDED

	@@ -0,0 +1,24 @@

+import streamlit as st
+st.set_page_config(
+    page_title="Challenges",
+    page_icon="🏆",
+    layout="wide",
+)
+st.title("Research Challenges (Kaggle)")
+st.write("Working together to innovate is essential. Here are the current and past challenges on Kaggle organized around cetacean conservation.")
+st.link_button("Click here for the full challenge.",
+               url="https://www.kaggle.com/competitions/happy-whale-and-dolphin"
+               )
+st.image("src/images/design/challenge2.png",
+    caption="Ted Cheeseman, Ken Southerland, Walter Reade, and Addison Howard. Happywhale - Whale and Dolphin Identification. https://kaggle.com/competitions/happy-whale-and-dolphin, 2022. Kaggle.")
+st.link_button("Click here for the full challenge.",
+                url="https://www.kaggle.com/competitions/humpback-whale-identification"
+                )
+st.image("src/images/design/challenge1.png",
+    caption="Addison Howard, inversion, Ken Southerland, and Ted Cheeseman. Humpback Whale Identification. https://kaggle.com/competitions/humpback-whale-identification, 2018. Kaggle.")

src/pages/7_🌊_gallery.py ADDED

	@@ -0,0 +1,17 @@

+import streamlit as st
+st.set_page_config(
+    page_title="Gallery",
+    page_icon="🌊",
+    layout="wide",
+)
+from utils.st_logs import parse_log_buffer, init_logging_session_states
+import whale_gallery as gallery
+import whale_viewer as viewer
+# here we make a container to allow filtering css properties
+# specific to the gallery (otherwise we get side effects)
+tg_cont = st.container(key="swgallery")
+with tg_cont:
+    gallery.render_whale_gallery(n_cols=4)
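The gallery wraps its content in `st.container(key="swgallery")` so that styling can target it alone. Recent Streamlit versions give a keyed element a CSS class of the form `st-key-<key>`; assuming that naming holds for the installed version, a small hypothetical helper `scoped_css` can prefix every selector with that class so the rules do not leak onto other elements:

```python
def scoped_css(container_key: str, rules: dict[str, dict[str, str]]) -> str:
    """Build a <style> block whose selectors only match inside one keyed container.

    Assumes Streamlit gives `st.container(key=k)` the CSS class `st-key-<k>`.
    `rules` maps a descendant selector (e.g. "img") to CSS property/value pairs.
    """
    scope = f".st-key-{container_key}"
    blocks = []
    for selector, props in rules.items():
        body = "; ".join(f"{name}: {value}" for name, value in props.items())
        blocks.append(f"{scope} {selector} {{ {body}; }}")
    return "<style>\n" + "\n".join(blocks) + "\n</style>"


# Hypothetical rule, for illustration only.
css = scoped_css("swgallery", {"img": {"border-radius": "8px"}})
print(css)
```

In the page, the string would be emitted with `st.markdown(css, unsafe_allow_html=True)`; the `border-radius` rule is only an illustration, not something the gallery currently sets.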

src/pages/8_🚧_coordinates.py ADDED

	@@ -0,0 +1,28 @@

+import streamlit as st
+import folium
+from streamlit_folium import st_folium
+st.set_page_config(
+    page_title="Coordinates",
+    page_icon="🚧",
+    layout="wide",
+)
+# the goal of this tab is to allow selection of the new observation's location by map click/adjust.
+st.markdown("Coming later! :construction:")
+st.markdown(
+    """*The goal is to allow interactive definition of the coordinates of a new
+    observation by clicking and dragging points on the map.*""")
+st.write("Click on the map to capture a location.")
+#m = folium.Map(location=visp_loc, zoom_start=7)
+mm = folium.Map(location=[39.949610, -75.150282], zoom_start=16)
+folium.Marker( [39.949610, -75.150282], popup="Liberty Bell", tooltip="Liberty Bell"
+).add_to(mm)
+st_data2 = st_folium(mm, width=725)
+st.write("below the map...")
+if st_data2['last_clicked'] is not None:
+    print(st_data2)
+    st.info(st_data2['last_clicked'])
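`st_folium` returns a dict whose `last_clicked` entry is `None` until the user clicks and afterwards holds the click position. Assuming that position is reported as `{"lat": ..., "lng": ...}` (Leaflet's naming), a hypothetical helper `clicked_coordinates` can turn it into a `(latitude, longitude)` pair ready to fill the observation's coordinate inputs. Leaflet can report longitudes outside ±180 when the map has been panned across the antimeridian, so the sketch wraps them back into range:

```python
from typing import Optional, Tuple


def clicked_coordinates(map_state: Optional[dict]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for the last map click, or None if there is none.

    Assumes the st_folium result stores the click as {"last_clicked": {"lat": .., "lng": ..}}.
    """
    if not map_state:
        return None
    click = map_state.get("last_clicked")
    if not click:
        return None
    lat = float(click["lat"])
    # wrap longitudes from a panned-around world copy back into [-180, 180)
    lon = (float(click["lng"]) + 180.0) % 360.0 - 180.0
    if not -90.0 <= lat <= 90.0:
        return None
    return lat, lon
```

In the page this would replace the raw dict display, e.g. `coords = clicked_coordinates(st_data2)` followed by `st.info(...)` only when `coords is not None`.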

src/pages/📊_logs.py ADDED

	@@ -0,0 +1,17 @@

+import streamlit as st
+import os
+st.set_page_config(
+    page_title="Logs",
+    page_icon="📊",
+)
+from utils.st_logs import parse_log_buffer
+handler = st.session_state.get('handler')
+if handler is not None:
+    records = parse_log_buffer(handler.buffer)
+    st.dataframe(records[::-1], use_container_width=True,)
+    st.info(f"Length of records: {len(records)}")
+else:
+    st.error("⚠️ No log handler found!")
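The logs page expects the session's `handler` to expose a `buffer` of log records; `parse_log_buffer` from `utils.st_logs` is not part of this diff. As a hedged sketch only, assuming the handler is a subclass of the standard `logging.handlers.BufferingHandler`, records can be retained and listed newest-first as below. `KeepAllHandler` and `records_newest_first` are hypothetical names, and the buffer grows without bound, so a long-running app might want to cap it:

```python
import logging
from logging.handlers import BufferingHandler
from typing import Optional


class KeepAllHandler(BufferingHandler):
    """A BufferingHandler that never flushes, so every record stays in `self.buffer`."""

    def __init__(self) -> None:
        super().__init__(capacity=0)  # capacity is irrelevant because we never flush

    def shouldFlush(self, record: logging.LogRecord) -> bool:
        return False


def records_newest_first(handler: Optional[BufferingHandler]) -> list:
    """Convert buffered records to table rows, most recent first; [] when there is no handler."""
    if handler is None:
        return []
    rows = [
        {"level": rec.levelname, "logger": rec.name, "message": rec.getMessage()}
        for rec in handler.buffer
    ]
    return rows[::-1]
```

Looking the handler up with `st.session_state.get("handler")` returns `None` rather than raising `KeyError` when this page is opened before the handler exists, and the `None` branch then covers that case.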

src/utils/metadata_handler.py CHANGED

@@ -11,10 +11,11 @@ def metadata2md(image_hash:str, debug:bool=False) -> str:
         str: Markdown-formatted key-value list of metadata
     """
+    print(debug)
     markdown_str = "\n"
     keys_to_print = ["author_email", "latitude", "longitude", "date", "time"]
     if debug:
-        keys_to_print += ["iamge_md5", "selected_class", "top_prediction", "class_overriden"]
+        keys_to_print += ["image_md5", "selected_class", "top_prediction", "class_overriden"]
     observation = st.session_state.public_observations.get(image_hash, {})
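The hunk fixes the `iamge_md5` key (a misspelled key can never match an observation field) and adds a `print(debug)`. The rest of `metadata2md` is not shown, so the following is only a hypothetical sketch of the same idea as a pure function: the key names are copied from the diff (including the `class_overriden` spelling used there), while the `- **key**: value` line format and the `logger.debug` call in place of `print` are assumptions:

```python
import logging
from typing import Mapping

logger = logging.getLogger(__name__)

BASE_KEYS = ("author_email", "latitude", "longitude", "date", "time")
DEBUG_KEYS = ("image_md5", "selected_class", "top_prediction", "class_overriden")


def observation_to_md(observation: Mapping[str, object], debug: bool = False) -> str:
    """Render chosen observation fields as a Markdown bullet list (hypothetical helper)."""
    # a debug-level log line instead of print(), so it only shows when enabled
    logger.debug("observation_to_md called with debug=%s", debug)
    keys = BASE_KEYS + (DEBUG_KEYS if debug else ())
    # keys absent from the observation are skipped
    lines = [f"- **{key}**: {observation[key]}" for key in keys if key in observation]
    return "\n" + "\n".join(lines) + "\n"
```

Keeping the key lists as tuples means the debug branch builds a new sequence instead of extending a shared list in place.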

src/utils/workflow_ui.py CHANGED

@@ -9,6 +9,11 @@ def init_workflow_session_states():
     if "workflow_fsm" not in st.session_state:
         # create and init the state machine
         st.session_state.workflow_fsm = WorkflowFSM(FSM_STATES)
+    if "input_disabled" not in st.session_state:
+        # after workflow reaches some stage, disable chance to change inputs
+        st.session_state.input_disabled = False
 def refresh_progress_display() -> None:
     """

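Each `if key not in st.session_state` block initializes a value only once, so Streamlit reruns keep whatever the workflow has set since. A hedged generalization, assuming `st.session_state` behaves as a mutable mapping: a hypothetical `init_session_defaults` takes zero-argument factories, so a costly default such as `WorkflowFSM(FSM_STATES)` is only constructed when the key is actually missing:

```python
from typing import Any, Callable, Mapping, MutableMapping


def init_session_defaults(
    state: MutableMapping[str, Any],
    factories: Mapping[str, Callable[[], Any]],
) -> None:
    """Create each missing key from its factory; existing values survive reruns untouched."""
    for key, make_default in factories.items():
        if key not in state:
            # the factory runs only when the key is missing, so costly objects are built once
            state[key] = make_default()
```

Usage in the app would look like `init_session_defaults(st.session_state, {"workflow_fsm": lambda: WorkflowFSM(FSM_STATES), "input_disabled": lambda: False})`.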
src/whale_viewer.py CHANGED

@@ -157,4 +157,6 @@ def display_whale(whale_classes:List[str], i:int, viewcontainer:DeltaGenerator=N
     image_path = os.path.join(current_dir, "src/images/references/")
     image = Image.open(image_path + df_whale_img_ref.loc[whale_classes[i], "WHALE_IMAGES"])
-    viewcontainer.image(image, caption=df_whale_img_ref.loc[whale_classes[i], "WHALE_REFERENCES"], use_column_width=True)
+    viewcontainer.image(image,
+                        caption=df_whale_img_ref.loc[whale_classes[i], "WHALE_REFERENCES"],
+                        use_column_width=True)

tests/{test_obs_map.py → test_dataset_download.py} RENAMED

@@ -1,6 +1,6 @@
 import pytest
 from unittest.mock import patch, MagicMock
-from maps.obs_map import try_download_dataset
+from dataset.download import try_download_dataset
 # tests for try_download_dataset
 # - the main aim here is to mock the function load_dataset which makes external HTTP requests,
@@ -9,10 +9,11 @@ from maps.obs_map import try_download_dataset
 #   is the return value, which should have similar form but change according to if an exception was raised or not
 # since this function uses st and m_logger to keep track of the download status, we need to mock them too
-@patch('maps.obs_map.load_dataset')
-@patch('maps.obs_map.st')
-@patch('maps.obs_map.m_logger')
-def test_try_download_dataset_success(mock_logger, mock_st, mock_load_dataset):
+#@patch('maps.obs_map.load_dataset')
+#@patch('maps.obs_map.m_logger')
+@patch('dataset.download.load_dataset')
+@patch('dataset.download.m_logger')
+def test_try_download_dataset_success(mock_logger, mock_load_dataset):
     # Mock the return value of load_dataset
     mock_load_dataset.return_value = {'train': {'latitude': [1], 'longitude': [2], 'predicted_class': ['whale']}}
@@ -25,13 +26,11 @@ def test_try_download_dataset_success(mock_logger, mock_st, mock_load_dataset):
     mock_load_dataset.assert_called_once_with(dataset_id, data_files=data_files)
     assert result == {'train': {'latitude': [1], 'longitude': [2], 'predicted_class': ['whale']}}
     mock_logger.info.assert_called_with("Downloaded dataset: (after 0.00s). ")
-    mock_st.write.assert_called_with("Downloaded dataset: (after 0.00s). ")
-@patch('maps.obs_map.load_dataset', side_effect=ValueError("Download failed"))
-@patch('maps.obs_map.st')
-@patch('maps.obs_map.m_logger')
-def test_try_download_dataset_failure_known(mock_logger, mock_st, mock_load_dataset):
+@patch('dataset.download.load_dataset', side_effect=ValueError("Download failed"))
+@patch('dataset.download.m_logger')
+def test_try_download_dataset_failure_known(mock_logger, mock_load_dataset):
     # testing the case where we've found (can reproduce by removing network connection)
     dataset_id = "test_dataset"
     data_files = "test_file"
@@ -41,15 +40,12 @@ def test_try_download_dataset_failure_known(mock_logger, mock_st, mock_load_data
     mock_logger.info.assert_any_call(f"Starting to download dataset {dataset_id} from Hugging Face")
     mock_load_dataset.assert_called_once_with(dataset_id, data_files=data_files)
     mock_logger.error.assert_called_with("Error downloading dataset: Download failed.  (after 0.00s).")
-    mock_st.error.assert_called_with("Error downloading dataset: Download failed.  (after 0.00s).")
     assert result == {}
     mock_logger.info.assert_called_with("Downloaded dataset: (after 0.00s). ")
-    mock_st.write.assert_called_with("Downloaded dataset: (after 0.00s). ")
-@patch('maps.obs_map.load_dataset', side_effect=Exception("Download engine corrupt"))
-@patch('maps.obs_map.st')
-@patch('maps.obs_map.m_logger')
-def test_try_download_dataset_failure_unknown(mock_logger, mock_st, mock_load_dataset):
+@patch('dataset.download.load_dataset', side_effect=Exception("Download engine corrupt"))
+@patch('dataset.download.m_logger')
+def test_try_download_dataset_failure_unknown(mock_logger, mock_load_dataset):
     # the cases we haven't found, but should still be handled (maybe network error, etc)
     dataset_id = "test_dataset"
     data_files = "test_file"
@@ -59,7 +55,5 @@ def test_try_download_dataset_failure_unknown(mock_logger, mock_st, mock_load_da
     mock_logger.info.assert_any_call(f"Starting to download dataset {dataset_id} from Hugging Face")
     mock_load_dataset.assert_called_once_with(dataset_id, data_files=data_files)
     mock_logger.error.assert_called_with("!!Unknown Error!! downloading dataset: Download engine corrupt.  (after 0.00s).")
-    mock_st.error.assert_called_with("!!Unknown Error!! downloading dataset: Download engine corrupt.  (after 0.00s).")
     assert result == {}
     mock_logger.info.assert_called_with("Downloaded dataset: (after 0.00s). ")
-    mock_st.write.assert_called_with("Downloaded dataset: (after 0.00s). ")

tests/test_demo_input_sidebar.py CHANGED

@@ -262,10 +262,10 @@ def test_two_input_files_realdata(mock_file_rv: MagicMock, mock_uploadedFile_Lis
     # and then there are plenty of visual elements, based on the image hashes.
     for hash in at.session_state.image_hashes:
         # check that each of the 4 inputs is present
-        assert at.sidebar.text_input(key=f"input_latitude_{hash}") is not None
-        assert at.sidebar.text_input(key=f"input_longitude_{hash}") is not None
-        assert at.sidebar.date_input(key=f"input_date_{hash}") is not None
-        assert at.sidebar.time_input(key=f"input_time_{hash}") is not None
+        assert at.sidebar.text_input(key=f"input_latitude_anchor_{hash}") is not None
+        assert at.sidebar.text_input(key=f"input_longitude_anchor_{hash}") is not None
+        assert at.sidebar.date_input(key=f"input_date_anchor_{hash}") is not None
+        assert at.sidebar.time_input(key=f"input_time_anchor_{hash}") is not None
     if 'demo_input_sidebar' in SCRIPT_UNDER_TEST:
         verify_metadata_in_demo_display(at, num_files)
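The test now expects widget keys of the form `input_<field>_anchor_<hash>`, presumably following a key rename in `src/input/input_handling.py` (also changed in this commit, but not shown here). One way to keep the UI and its tests from drifting apart on such renames is a single key builder both can import; `anchor_input_keys` and `INPUT_FIELDS` are hypothetical names for illustration:

```python
# Hypothetical: the per-image metadata inputs checked by the test above.
INPUT_FIELDS = ("latitude", "longitude", "date", "time")


def anchor_input_keys(image_hash: str) -> dict:
    """Map each metadata field to its widget key, e.g. 'input_latitude_anchor_<hash>'."""
    return {field: f"input_{field}_anchor_{image_hash}" for field in INPUT_FIELDS}
```

The test loop could then read `keys = anchor_input_keys(hash)` and assert on `at.sidebar.text_input(key=keys["latitude"])` and so on, so a future rename touches one line.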