Space: nus-project/annotation-dashboard

Gintarė Zokaitytė committed on Jan 30

Commit 1a492df

1 parent: 974c830

Fix task count on update

Files changed (3)

.env.example +2 -0
DEPLOY.md +0 -188
app.py +23 -5

.env.example ADDED

@@ -0,0 +1,2 @@
+LABEL_STUDIO_URL=
+LABEL_STUDIO_API_KEY=
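
For context, here is a minimal sketch of how a dashboard like this might read the two variables at startup. The helper name `load_label_studio_config` and its error message are hypothetical and do not come from `app.py`; the only names taken from the commit are `LABEL_STUDIO_URL` and `LABEL_STUDIO_API_KEY`.

```python
import os


def load_label_studio_config(env=None):
    """Return (url, api_key), raising if either variable is missing or empty.

    Hypothetical helper: app.py may read these values differently.
    """
    env = os.environ if env is None else env
    # Strip whitespace and a trailing slash so later f"{url}/api/..." joins stay clean.
    url = env.get("LABEL_STUDIO_URL", "").strip().rstrip("/")
    api_key = env.get("LABEL_STUDIO_API_KEY", "").strip()

    missing = [
        name
        for name, value in (("LABEL_STUDIO_URL", url), ("LABEL_STUDIO_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return url, api_key
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real environment.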

DEPLOY.md DELETED

@@ -1,188 +0,0 @@
-# Deployment Guide
-Two easy options: **HuggingFace Spaces** or **Streamlit Cloud**
----
-## Option 1: HuggingFace Spaces (Recommended)
-### Step 1: Create Space
-1. Go to https://huggingface.co/new-space
-2. Choose a name (e.g., `annotation-dashboard`)
-3. Select **Streamlit** as the SDK
-4. Choose visibility (Public or Private)
-5. Click **Create Space**
-### Step 2: Upload Files
-Upload these 3 files:
-- ✅ `app.py`
-- ✅ `requirements.txt`
-- ✅ `.streamlit/config.toml`
-**How to upload:**
-- Click **Files** tab → **Add file** → Upload each file
-- Or use Git (see below)
-### Step 3: Add Secrets
-1. Go to **Settings** tab
-2. Scroll to **Repository secrets**
-3. Click **New secret**
-4. Add two secrets:
-```
-Name: LABEL_STUDIO_URL
-Value: https://your-labelstudio-instance.com
-```
-```
-Name: LABEL_STUDIO_API_KEY
-Value: your-api-key-here
-```
-### Step 4: Wait for Build
-- HuggingFace automatically builds your Space
-- Check **Logs** tab if there are issues
-- Dashboard will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME`
-### Using Git (Alternative)
-```bash
-# Clone your Space
-git clone https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
-cd SPACE_NAME
-# Copy files
-cp /path/to/annotation-dashboard/app.py .
-cp /path/to/annotation-dashboard/requirements.txt .
-mkdir -p .streamlit
-cp /path/to/annotation-dashboard/.streamlit/config.toml .streamlit/
-# Push
-git add .
-git commit -m "Deploy dashboard"
-git push
-```
----
-## Option 2: Streamlit Cloud
-### Step 1: Push to GitHub
-Your dashboard needs to be in a GitHub repository.
-```bash
-cd annotation-dashboard
-# Initialize git if needed
-git init
-git add app.py requirements.txt .streamlit/config.toml .gitignore
-git commit -m "Initial dashboard"
-# Create repo on GitHub (via web UI), then:
-git remote add origin https://github.com/YOUR_USERNAME/REPO_NAME.git
-git push -u origin main
-```
-### Step 2: Deploy on Streamlit Cloud
-1. Go to https://share.streamlit.io/
-2. Click **New app**
-3. Connect your GitHub account (if first time)
-4. Select:
-   - **Repository**: Your dashboard repo
-   - **Branch**: `main`
-   - **Main file path**: `app.py`
-5. Click **Deploy**
-### Step 3: Add Secrets
-1. Click **Advanced settings** (before deploying) or **⋮** → **Settings** (after)
-2. Go to **Secrets** section
-3. Add in TOML format:
-```toml
-LABEL_STUDIO_URL = "https://your-labelstudio-instance.com"
-LABEL_STUDIO_API_KEY = "your-api-key-here"
-```
-4. Click **Save**
-### Step 4: Access Dashboard
-Your app will be at: `https://YOUR_USERNAME-REPO_NAME.streamlit.app`
----
-## Comparison
-| Feature | HuggingFace Spaces | Streamlit Cloud |
-|---------|-------------------|-----------------|
-| **Setup** | Easier (upload files) | Requires GitHub repo |
-| **Free tier** | Generous | Limited hours/month |
-| **Custom domain** | Yes (paid) | Yes (paid) |
-| **Cache persistence** | ❌ No (ephemeral storage) | ❌ No (ephemeral storage) |
-| **Community** | ML/AI focused | Data science focused |
-| **Speed** | Fast | Fast |
-**Note**: Cache file (`.cache.pkl`) won't persist on either platform. It rebuilds on each cold start (~30s). For persistent cache, you'd need a database or external storage.
----
-## Get Your Label Studio API Key
-1. Log into Label Studio
-2. Click your profile (top right)
-3. **Account & Settings**
-4. Scroll to **Access Token**
-5. Copy the token
----
-## Troubleshooting
-### "Missing credentials" error
-**Fix**: Check secrets are correctly set
-- HF Spaces: Settings → Repository secrets
-- Streamlit Cloud: App settings → Secrets
-### Dashboard loads slowly
-**Expected**: First load ~30s (fetches all data)
-- Subsequent loads: <5 minutes (cache refresh)
-- Cache doesn't persist on free hosting
-### Build fails
-**Check**:
-1. All 3 files uploaded (`app.py`, `requirements.txt`, `.streamlit/config.toml`)
-2. Check build logs for errors
-3. Verify Python dependencies in `requirements.txt`
-### Can't access Label Studio from cloud
-**Common issue**: Label Studio must be publicly accessible
-- If running locally, cloud can't reach it
-- Use a public URL or cloud-hosted Label Studio instance
----
-## Quick Decision Guide
-**Choose HuggingFace Spaces if:**
-- ✅ You want the easiest setup
-- ✅ You don't have a GitHub repo
-- ✅ You prefer ML-focused platform
-**Choose Streamlit Cloud if:**
-- ✅ Your code is already on GitHub
-- ✅ You prefer Streamlit's native platform
-- ✅ You want tight GitHub integration
-Both are excellent choices! 🚀
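
The deleted guide notes that the `.cache.pkl` file does not survive a cold start, so the app has to cope with a missing cache. A rough sketch of tolerant load and save helpers follows; `load_cache` and `save_cache` are hypothetical names, and the actual code in `app.py` is not shown in this commit.

```python
import pickle
from pathlib import Path


def load_cache(path):
    """Return the cached dict, or an empty dict if the file is absent or unreadable."""
    p = Path(path)
    if not p.exists():
        return {}
    try:
        with p.open("rb") as f:
            data = pickle.load(f)
    except (pickle.UnpicklingError, EOFError, OSError):
        # Common failure modes for a truncated or corrupted file: start fresh.
        return {}
    return data if isinstance(data, dict) else {}


def save_cache(path, cache):
    """Write the cache to a temporary file, then rename it over the target."""
    p = Path(path)
    tmp = p.with_name(p.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(cache, f)
    tmp.replace(p)
```

Writing to a temporary file first means an interrupted save leaves the previous cache file in place rather than a half-written one.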

app.py CHANGED

@@ -36,6 +36,7 @@ def fetch_project_data(proj, url, headers):
     group = "Our Team" if pid in OUR_TEAM_PROJECT_IDS else "Others"
     rows = []
+    submitted_count = 0  # Track submitted (annotated) tasks
     page = 1
     while True:
         resp = requests.get(f"{url}/api/projects/{pid}/tasks", headers=headers, params={"page": page, "page_size": 100}, timeout=30)
@@ -66,6 +67,9 @@ def fetch_project_data(proj, url, headers):
                 )
                 continue
+            # Task has annotations - count as submitted
+            submitted_count += 1
             ann = annots[0]
             date = ann.get("created_at", "")[:10] or None
@@ -95,7 +99,7 @@ def fetch_project_data(proj, url, headers):
             break
         page += 1
-    return pid, task_count, rows
+    return pid, task_count, submitted_count, rows
 @st.cache_data(ttl=300)
@@ -135,10 +139,24 @@ def load_data():
     for proj in projects:
         pid = proj["id"]
         task_count = proj.get("task_number", 0)
+        # Get submitted task count from Label Studio API
+        api_submitted_count = proj.get("num_tasks_with_annotations", 0)
         cache_key = f"project_{pid}"
-        # Use cache if task count unchanged
-        if cache_key in cache and cache[cache_key]["task_count"] == task_count:
+        # Invalidate cache if:
+        # 1. No cache exists for this project
+        # 2. Total task count changed (new tasks added/removed)
+        # 3. Submitted task count changed (new annotations/submissions)
+        use_cache = False
+        if cache_key in cache:
+            cached = cache[cache_key]
+            # Use cache only if BOTH counts match
+            if (cached.get("task_count") == task_count and
+                cached.get("submitted_count") == api_submitted_count):
+                use_cache = True
+        if use_cache:
             all_rows.extend(cache[cache_key]["rows"])
         else:
             projects_to_fetch.append(proj)
@@ -150,9 +168,9 @@ def load_data():
             progress = st.progress(0, text=f"Loading {len(projects_to_fetch)} projects...")
             for i, future in enumerate(futures):
-                pid, task_count, rows = future.result()
+                pid, task_count, submitted_count, rows = future.result()
                 all_rows.extend(rows)
-                cache[f"project_{pid}"] = {"task_count": task_count, "rows": rows}
+                cache[f"project_{pid}"] = {"task_count": task_count, "submitted_count": submitted_count, "rows": rows}
                 progress.progress((i + 1) / len(futures), text=f"Loaded {i + 1}/{len(futures)} projects")
             progress.empty()
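
The cache check introduced in `load_data()` can be isolated as a small pure function, which makes the invalidation rule easier to test. This is a sketch only: `should_use_cache` is a hypothetical name, not a function in `app.py`, and it assumes cache entries have the shape written at the end of the diff (`task_count`, `submitted_count`, `rows`).

```python
def should_use_cache(cache, pid, task_count, api_submitted_count):
    """Return True only if a cached entry exists and both counts still match.

    Hypothetical helper mirroring the rule in the diff:
    - no entry for the project        -> refetch
    - total task count changed        -> refetch
    - submitted task count changed    -> refetch
    """
    cached = cache.get(f"project_{pid}")
    if cached is None:
        return False
    # .get() returns None for entries written before this commit (no
    # "submitted_count" key), so those entries are refetched once.
    return (
        cached.get("task_count") == task_count
        and cached.get("submitted_count") == api_submitted_count
    )


# Example: one project cached with 10 tasks, 4 of them annotated.
cache = {"project_1": {"task_count": 10, "submitted_count": 4, "rows": []}}
print(should_use_cache(cache, 1, 10, 4))  # counts unchanged
print(should_use_cache(cache, 1, 10, 5))  # a new annotation arrived
```

One point to check when reading the diff: the value stored in the cache is the `submitted_count` tallied while paging through tasks, whereas the value compared against it is the API's `num_tasks_with_annotations`. The sketch assumes the two agree; if they ever differ for a project, that project's entry would never match and it would be refetched on every load.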