Gintarė Zokaitytė committed
Commit 1a492df
Parent(s): 974c830

Fix task count on update

- .env.example +2 -0
- DEPLOY.md +0 -188
- app.py +23 -5
.env.example ADDED

```diff
@@ -0,0 +1,2 @@
+LABEL_STUDIO_URL=
+LABEL_STUDIO_API_KEY=
```
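The template leaves both values blank. Below is a minimal sketch of how they might be loaded and validated. It assumes the values are read from the process environment and that the key is sent as a Label Studio legacy token (`Authorization: Token <key>`). The names `load_label_studio_config` and `MissingCredentialsError` are hypothetical, because this commit does not show how `app.py` actually reads its credentials.

```python
import os
from typing import Mapping


class MissingCredentialsError(RuntimeError):
    """Raised when LABEL_STUDIO_URL or LABEL_STUDIO_API_KEY is unset or blank (hypothetical)."""


def load_label_studio_config(env: Mapping[str, str] = os.environ) -> tuple[str, dict[str, str]]:
    """Return (base_url, headers) built from the two .env.example variables.

    Hypothetical helper: assumes a legacy Label Studio token sent as
    'Authorization: Token <key>'.
    """
    # Strip whitespace and any trailing slash so f"{url}/api/..." stays well-formed.
    url = env.get("LABEL_STUDIO_URL", "").strip().rstrip("/")
    key = env.get("LABEL_STUDIO_API_KEY", "").strip()
    missing = [
        name
        for name, value in (("LABEL_STUDIO_URL", url), ("LABEL_STUDIO_API_KEY", key))
        if not value
    ]
    if missing:
        raise MissingCredentialsError(f"Missing credentials: {', '.join(missing)}")
    return url, {"Authorization": f"Token {key}"}


if __name__ == "__main__":
    # Mirrors the unfilled template: both values blank, so both are reported.
    try:
        load_label_studio_config({"LABEL_STUDIO_URL": "", "LABEL_STUDIO_API_KEY": ""})
    except MissingCredentialsError as exc:
        print(exc)
```

Stripping the trailing slash matters because `app.py` builds request URLs as `f"{url}/api/projects/{pid}/tasks"`.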
DEPLOY.md DELETED

```diff
@@ -1,188 +0,0 @@
-# Deployment Guide
-
-Two easy options: **HuggingFace Spaces** or **Streamlit Cloud**
-
----
-
-## Option 1: HuggingFace Spaces (Recommended)
-
-### Step 1: Create Space
-
-1. Go to https://huggingface.co/new-space
-2. Choose a name (e.g., `annotation-dashboard`)
-3. Select **Streamlit** as the SDK
-4. Choose visibility (Public or Private)
-5. Click **Create Space**
-
-### Step 2: Upload Files
-
-Upload these 3 files:
-
-- ✅ `app.py`
-- ✅ `requirements.txt`
-- ✅ `.streamlit/config.toml`
-
-**How to upload:**
-- Click **Files** tab → **Add file** → Upload each file
-- Or use Git (see below)
-
-### Step 3: Add Secrets
-
-1. Go to **Settings** tab
-2. Scroll to **Repository secrets**
-3. Click **New secret**
-4. Add two secrets:
-
-```
-Name: LABEL_STUDIO_URL
-Value: https://your-labelstudio-instance.com
-```
-
-```
-Name: LABEL_STUDIO_API_KEY
-Value: your-api-key-here
-```
-
-### Step 4: Wait for Build
-
-- HuggingFace automatically builds your Space
-- Check **Logs** tab if there are issues
-- Dashboard will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME`
-
-### Using Git (Alternative)
-
-```bash
-# Clone your Space
-git clone https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
-cd SPACE_NAME
-
-# Copy files
-cp /path/to/annotation-dashboard/app.py .
-cp /path/to/annotation-dashboard/requirements.txt .
-mkdir -p .streamlit
-cp /path/to/annotation-dashboard/.streamlit/config.toml .streamlit/
-
-# Push
-git add .
-git commit -m "Deploy dashboard"
-git push
-```
-
----
-
-## Option 2: Streamlit Cloud
-
-### Step 1: Push to GitHub
-
-Your dashboard needs to be in a GitHub repository.
-
-```bash
-cd annotation-dashboard
-
-# Initialize git if needed
-git init
-git add app.py requirements.txt .streamlit/config.toml .gitignore
-git commit -m "Initial dashboard"
-
-# Create repo on GitHub (via web UI), then:
-git remote add origin https://github.com/YOUR_USERNAME/REPO_NAME.git
-git push -u origin main
-```
-
-### Step 2: Deploy on Streamlit Cloud
-
-1. Go to https://share.streamlit.io/
-2. Click **New app**
-3. Connect your GitHub account (if first time)
-4. Select:
-- **Repository**: Your dashboard repo
-- **Branch**: `main`
-- **Main file path**: `app.py`
-5. Click **Deploy**
-
-### Step 3: Add Secrets
-
-1. Click **Advanced settings** (before deploying) or **⋮** → **Settings** (after)
-2. Go to **Secrets** section
-3. Add in TOML format:
-
-```toml
-LABEL_STUDIO_URL = "https://your-labelstudio-instance.com"
-LABEL_STUDIO_API_KEY = "your-api-key-here"
-```
-
-4. Click **Save**
-
-### Step 4: Access Dashboard
-
-Your app will be at: `https://YOUR_USERNAME-REPO_NAME.streamlit.app`
-
----
-
-## Comparison
-
-| Feature | HuggingFace Spaces | Streamlit Cloud |
-|---------|-------------------|-----------------|
-| **Setup** | Easier (upload files) | Requires GitHub repo |
-| **Free tier** | Generous | Limited hours/month |
-| **Custom domain** | Yes (paid) | Yes (paid) |
-| **Cache persistence** | ❌ No (ephemeral storage) | ❌ No (ephemeral storage) |
-| **Community** | ML/AI focused | Data science focused |
-| **Speed** | Fast | Fast |
-
-**Note**: Cache file (`.cache.pkl`) won't persist on either platform. It rebuilds on each cold start (~30s). For persistent cache, you'd need a database or external storage.
-
----
-
-## Get Your Label Studio API Key
-
-1. Log into Label Studio
-2. Click your profile (top right)
-3. **Account & Settings**
-4. Scroll to **Access Token**
-5. Copy the token
-
----
-
-## Troubleshooting
-
-### "Missing credentials" error
-
-**Fix**: Check secrets are correctly set
-- HF Spaces: Settings → Repository secrets
-- Streamlit Cloud: App settings → Secrets
-
-### Dashboard loads slowly
-
-**Expected**: First load ~30s (fetches all data)
-- Subsequent loads: <5 minutes (cache refresh)
-- Cache doesn't persist on free hosting
-
-### Build fails
-
-**Check**:
-1. All 3 files uploaded (`app.py`, `requirements.txt`, `.streamlit/config.toml`)
-2. Check build logs for errors
-3. Verify Python dependencies in `requirements.txt`
-
-### Can't access Label Studio from cloud
-
-**Common issue**: Label Studio must be publicly accessible
-- If running locally, cloud can't reach it
-- Use a public URL or cloud-hosted Label Studio instance
-
----
-
-## Quick Decision Guide
-
-**Choose HuggingFace Spaces if:**
-- ✅ You want the easiest setup
-- ✅ You don't have a GitHub repo
-- ✅ You prefer ML-focused platform
-
-**Choose Streamlit Cloud if:**
-- ✅ Your code is already on GitHub
-- ✅ You prefer Streamlit's native platform
-- ✅ You want tight GitHub integration
-
-Both are excellent choices! 🚀
```
app.py CHANGED

```diff
@@ -36,6 +36,7 @@ def fetch_project_data(proj, url, headers):
 group = "Our Team" if pid in OUR_TEAM_PROJECT_IDS else "Others"
 
 rows = []
+submitted_count = 0 # Track submitted (annotated) tasks
 page = 1
 while True:
 resp = requests.get(f"{url}/api/projects/{pid}/tasks", headers=headers, params={"page": page, "page_size": 100}, timeout=30)
@@ -66,6 +67,9 @@ def fetch_project_data(proj, url, headers):
 )
 continue
 
+# Task has annotations - count as submitted
+submitted_count += 1
+
 ann = annots[0]
 date = ann.get("created_at", "")[:10] or None
 
@@ -95,7 +99,7 @@ def fetch_project_data(proj, url, headers):
 break
 page += 1
 
-return pid, task_count, rows
+return pid, task_count, submitted_count, rows
 
 
 @st.cache_data(ttl=300)
@@ -135,10 +139,24 @@ def load_data():
 for proj in projects:
 pid = proj["id"]
 task_count = proj.get("task_number", 0)
+# Get submitted task count from Label Studio API
+api_submitted_count = proj.get("num_tasks_with_annotations", 0)
+
 cache_key = f"project_{pid}"
 
-#
-
+# Invalidate cache if:
+# 1. No cache exists for this project
+# 2. Total task count changed (new tasks added/removed)
+# 3. Submitted task count changed (new annotations/submissions)
+use_cache = False
+if cache_key in cache:
+cached = cache[cache_key]
+# Use cache only if BOTH counts match
+if (cached.get("task_count") == task_count and
+cached.get("submitted_count") == api_submitted_count):
+use_cache = True
+
+if use_cache:
 all_rows.extend(cache[cache_key]["rows"])
 else:
 projects_to_fetch.append(proj)
@@ -150,9 +168,9 @@ def load_data():
 
 progress = st.progress(0, text=f"Loading {len(projects_to_fetch)} projects...")
 for i, future in enumerate(futures):
-pid, task_count, rows = future.result()
+pid, task_count, submitted_count, rows = future.result()
 all_rows.extend(rows)
-cache[f"project_{pid}"] = {"task_count": task_count, "rows": rows}
+cache[f"project_{pid}"] = {"task_count": task_count, "submitted_count": submitted_count, "rows": rows}
 progress.progress((i + 1) / len(futures), text=f"Loaded {i + 1}/{len(futures)} projects")
 progress.empty()
 
```
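The new cache rule in `load_data()` can be exercised without a Label Studio server. Below is a minimal sketch that assumes two things. First, each project dict carries `task_number` and `num_tasks_with_annotations`, as the diff reads them. Second, each task exposes an `annotations` list. The functions `count_submitted`, `is_cache_valid`, and `split_projects` are hypothetical names, not functions from `app.py`.

```python
from typing import Any, Iterable, Optional


def count_submitted(tasks: Iterable[dict[str, Any]]) -> int:
    """Count tasks with at least one annotation, like `submitted_count += 1` in the diff."""
    return sum(1 for task in tasks if task.get("annotations"))


def is_cache_valid(cached: Optional[dict[str, Any]], task_count: int, api_submitted_count: int) -> bool:
    """Reuse cached rows only when BOTH the task count and the submitted count still match."""
    if cached is None:
        return False
    return (cached.get("task_count") == task_count
            and cached.get("submitted_count") == api_submitted_count)


def split_projects(
    projects: list[dict[str, Any]], cache: dict[str, dict[str, Any]]
) -> tuple[list[Any], list[dict[str, Any]]]:
    """Return (rows served from cache, projects that must be refetched)."""
    cached_rows: list[Any] = []
    to_fetch: list[dict[str, Any]] = []
    for proj in projects:
        key = f"project_{proj['id']}"
        if is_cache_valid(cache.get(key),
                          proj.get("task_number", 0),
                          proj.get("num_tasks_with_annotations", 0)):
            cached_rows.extend(cache[key]["rows"])
        else:
            to_fetch.append(proj)
    return cached_rows, to_fetch


if __name__ == "__main__":
    cache = {"project_7": {"task_count": 3, "submitted_count": 1, "rows": ["cached row"]}}
    # One more task was annotated, so the API count moved from 1 to 2.
    projects = [{"id": 7, "task_number": 3, "num_tasks_with_annotations": 2}]
    rows, to_fetch = split_projects(projects, cache)
    print(rows, [p["id"] for p in to_fetch])  # [] [7]

    # After refetching, store the locally counted value alongside the rows.
    fetched = [{"annotations": [{"id": 1}]}, {"annotations": [{"id": 2}]}, {"annotations": []}]
    cache["project_7"] = {"task_count": 3, "submitted_count": count_submitted(fetched), "rows": ["fresh row"]}
    print(split_projects(projects, cache))  # (['fresh row'], [])
```

Comparing two cheap counts from the project listing avoids paging through every task just to learn that nothing changed.

Two caveats follow from the diff itself:

- **In-place edits go unnoticed.** Editing an existing annotation, or adding a second annotation to an already annotated task, changes neither count. Such changes are not picked up by this check.
- **The two counts may disagree.** The cached `submitted_count` is counted from the fetched tasks, while the check compares it against the API's `num_tasks_with_annotations`. If the two are computed differently (for example, if only one of them counts skipped annotations), the check fails on every load and that project is refetched each time.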