Gintarė Zokaitytė committed on
Commit 1a492df · 1 parent: 974c830

Fix task count on update

Files changed (3)
  1. .env.example +2 -0
  2. DEPLOY.md +0 -188
  3. app.py +23 -5
.env.example ADDED
@@ -0,0 +1,2 @@
+ LABEL_STUDIO_URL=
+ LABEL_STUDIO_API_KEY=
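The new `.env.example` declares the two settings the dashboard needs. The code that reads them is not part of this diff. The sketch below assumes simple `KEY=VALUE` lines and a check that both values are non-empty. `parse_env_file` and `require_credentials` are hypothetical helpers, not functions from `app.py`:

```python
from typing import Dict, Mapping, Tuple

# Variable names taken from .env.example.
REQUIRED_KEYS = ("LABEL_STUDIO_URL", "LABEL_STUDIO_API_KEY")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def require_credentials(source: Mapping[str, str]) -> Tuple[str, str]:
    """Return (url, api_key) or raise if either value is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    # Drop a trailing slash so f"{url}/api/..." does not produce "//api".
    return source["LABEL_STUDIO_URL"].rstrip("/"), source["LABEL_STUDIO_API_KEY"]
```

With the template as committed, both values are empty, so `require_credentials(parse_env_file(text))` raises until a copy of the file is filled in. On a host that injects secrets as environment variables, the same check could be run against `os.environ`, which is also a `Mapping`.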
DEPLOY.md DELETED
@@ -1,188 +0,0 @@
- # Deployment Guide
-
- Two easy options: **HuggingFace Spaces** or **Streamlit Cloud**
-
- ---
-
- ## Option 1: HuggingFace Spaces (Recommended)
-
- ### Step 1: Create Space
-
- 1. Go to https://huggingface.co/new-space
- 2. Choose a name (e.g., `annotation-dashboard`)
- 3. Select **Streamlit** as the SDK
- 4. Choose visibility (Public or Private)
- 5. Click **Create Space**
-
- ### Step 2: Upload Files
-
- Upload these 3 files:
-
- - ✅ `app.py`
- - ✅ `requirements.txt`
- - ✅ `.streamlit/config.toml`
-
- **How to upload:**
- - Click **Files** tab → **Add file** → Upload each file
- - Or use Git (see below)
-
- ### Step 3: Add Secrets
-
- 1. Go to **Settings** tab
- 2. Scroll to **Repository secrets**
- 3. Click **New secret**
- 4. Add two secrets:
-
- ```
- Name: LABEL_STUDIO_URL
- Value: https://your-labelstudio-instance.com
- ```
-
- ```
- Name: LABEL_STUDIO_API_KEY
- Value: your-api-key-here
- ```
-
- ### Step 4: Wait for Build
-
- - HuggingFace automatically builds your Space
- - Check **Logs** tab if there are issues
- - Dashboard will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME`
-
- ### Using Git (Alternative)
-
- ```bash
- # Clone your Space
- git clone https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
- cd SPACE_NAME
-
- # Copy files
- cp /path/to/annotation-dashboard/app.py .
- cp /path/to/annotation-dashboard/requirements.txt .
- mkdir -p .streamlit
- cp /path/to/annotation-dashboard/.streamlit/config.toml .streamlit/
-
- # Push
- git add .
- git commit -m "Deploy dashboard"
- git push
- ```
-
- ---
-
- ## Option 2: Streamlit Cloud
-
- ### Step 1: Push to GitHub
-
- Your dashboard needs to be in a GitHub repository.
-
- ```bash
- cd annotation-dashboard
-
- # Initialize git if needed
- git init
- git add app.py requirements.txt .streamlit/config.toml .gitignore
- git commit -m "Initial dashboard"
-
- # Create repo on GitHub (via web UI), then:
- git remote add origin https://github.com/YOUR_USERNAME/REPO_NAME.git
- git push -u origin main
- ```
-
- ### Step 2: Deploy on Streamlit Cloud
-
- 1. Go to https://share.streamlit.io/
- 2. Click **New app**
- 3. Connect your GitHub account (if first time)
- 4. Select:
- - **Repository**: Your dashboard repo
- - **Branch**: `main`
- - **Main file path**: `app.py`
- 5. Click **Deploy**
-
- ### Step 3: Add Secrets
-
- 1. Click **Advanced settings** (before deploying) or **⋮** → **Settings** (after)
- 2. Go to **Secrets** section
- 3. Add in TOML format:
-
- ```toml
- LABEL_STUDIO_URL = "https://your-labelstudio-instance.com"
- LABEL_STUDIO_API_KEY = "your-api-key-here"
- ```
-
- 4. Click **Save**
-
- ### Step 4: Access Dashboard
-
- Your app will be at: `https://YOUR_USERNAME-REPO_NAME.streamlit.app`
-
- ---
-
- ## Comparison
-
- | Feature | HuggingFace Spaces | Streamlit Cloud |
- |---------|-------------------|-----------------|
- | **Setup** | Easier (upload files) | Requires GitHub repo |
- | **Free tier** | Generous | Limited hours/month |
- | **Custom domain** | Yes (paid) | Yes (paid) |
- | **Cache persistence** | ❌ No (ephemeral storage) | ❌ No (ephemeral storage) |
- | **Community** | ML/AI focused | Data science focused |
- | **Speed** | Fast | Fast |
-
- **Note**: Cache file (`.cache.pkl`) won't persist on either platform. It rebuilds on each cold start (~30s). For persistent cache, you'd need a database or external storage.
-
- ---
-
- ## Get Your Label Studio API Key
-
- 1. Log into Label Studio
- 2. Click your profile (top right)
- 3. **Account & Settings**
- 4. Scroll to **Access Token**
- 5. Copy the token
-
- ---
-
- ## Troubleshooting
-
- ### "Missing credentials" error
-
- **Fix**: Check secrets are correctly set
- - HF Spaces: Settings → Repository secrets
- - Streamlit Cloud: App settings → Secrets
-
- ### Dashboard loads slowly
-
- **Expected**: First load ~30s (fetches all data)
- - Subsequent loads: <5 minutes (cache refresh)
- - Cache doesn't persist on free hosting
-
- ### Build fails
-
- **Check**:
- 1. All 3 files uploaded (`app.py`, `requirements.txt`, `.streamlit/config.toml`)
- 2. Check build logs for errors
- 3. Verify Python dependencies in `requirements.txt`
-
- ### Can't access Label Studio from cloud
-
- **Common issue**: Label Studio must be publicly accessible
- - If running locally, cloud can't reach it
- - Use a public URL or cloud-hosted Label Studio instance
-
- ---
-
- ## Quick Decision Guide
-
- **Choose HuggingFace Spaces if:**
- - ✅ You want the easiest setup
- - ✅ You don't have a GitHub repo
- - ✅ You prefer ML-focused platform
-
- **Choose Streamlit Cloud if:**
- - ✅ Your code is already on GitHub
- - ✅ You prefer Streamlit's native platform
- - ✅ You want tight GitHub integration
-
- Both are excellent choices! 🚀
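The deleted guide notes that the per-project cache lives in `.cache.pkl` and is rebuilt after each cold start. The code that loads and saves it is not in this diff. A minimal sketch, assuming the cache is a plain dict pickled to that file, could look like this; `load_cache` and `save_cache` are hypothetical names:

```python
import os
import pickle
import tempfile
from typing import Any, Dict

CACHE_PATH = ".cache.pkl"  # file name taken from the deleted DEPLOY.md


def load_cache(path: str = CACHE_PATH) -> Dict[str, Any]:
    """Return the pickled cache dict, or an empty dict if the file is missing or unreadable."""
    try:
        with open(path, "rb") as fh:
            data = pickle.load(fh)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        # Cold start on ephemeral storage, or a truncated file: start with an empty cache.
        return {}
    return data if isinstance(data, dict) else {}


def save_cache(cache: Dict[str, Any], path: str = CACHE_PATH) -> None:
    """Dump to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(cache, fh)
        # os.replace is atomic on the same filesystem, so readers never see a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The write-then-replace step is a design choice of this sketch, not something the diff shows. Because `pickle` can execute code on load, this pattern only suits files the app wrote itself.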
app.py CHANGED
@@ -36,6 +36,7 @@ def fetch_project_data(proj, url, headers):
 group = "Our Team" if pid in OUR_TEAM_PROJECT_IDS else "Others"

 rows = []
+ submitted_count = 0 # Track submitted (annotated) tasks
 page = 1
 while True:
 resp = requests.get(f"{url}/api/projects/{pid}/tasks", headers=headers, params={"page": page, "page_size": 100}, timeout=30)
@@ -66,6 +67,9 @@ def fetch_project_data(proj, url, headers):
 )
 continue

+ # Task has annotations - count as submitted
+ submitted_count += 1
+
 ann = annots[0]
 date = ann.get("created_at", "")[:10] or None

@@ -95,7 +99,7 @@ def fetch_project_data(proj, url, headers):
 break
 page += 1

- return pid, task_count, rows
+ return pid, task_count, submitted_count, rows


 @st.cache_data(ttl=300)
@@ -135,10 +139,24 @@ def load_data():
 for proj in projects:
 pid = proj["id"]
 task_count = proj.get("task_number", 0)
+ # Get submitted task count from Label Studio API
+ api_submitted_count = proj.get("num_tasks_with_annotations", 0)
+
 cache_key = f"project_{pid}"

- # Use cache if task count unchanged
- if cache_key in cache and cache[cache_key]["task_count"] == task_count:
+ # Invalidate cache if:
+ # 1. No cache exists for this project
+ # 2. Total task count changed (new tasks added/removed)
+ # 3. Submitted task count changed (new annotations/submissions)
+ use_cache = False
+ if cache_key in cache:
+ cached = cache[cache_key]
+ # Use cache only if BOTH counts match
+ if (cached.get("task_count") == task_count and
+ cached.get("submitted_count") == api_submitted_count):
+ use_cache = True
+
+ if use_cache:
 all_rows.extend(cache[cache_key]["rows"])
 else:
 projects_to_fetch.append(proj)
@@ -150,9 +168,9 @@ def load_data():

 progress = st.progress(0, text=f"Loading {len(projects_to_fetch)} projects...")
 for i, future in enumerate(futures):
- pid, task_count, rows = future.result()
+ pid, task_count, submitted_count, rows = future.result()
 all_rows.extend(rows)
- cache[f"project_{pid}"] = {"task_count": task_count, "rows": rows}
+ cache[f"project_{pid}"] = {"task_count": task_count, "submitted_count": submitted_count, "rows": rows}
 progress.progress((i + 1) / len(futures), text=f"Loaded {i + 1}/{len(futures)} projects")
 progress.empty()

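The commit keys cache reuse on two counts instead of one. Below is a minimal sketch of that rule as standalone functions. It assumes Label Studio project dicts carry `id`, `task_number` and `num_tasks_with_annotations`, as read in `load_data()`. It also assumes a task counts as submitted when its `annotations` list is non-empty, because the real filter before the `continue` is not visible in the diff. `count_submitted` and `can_reuse_cache` are hypothetical names:

```python
from typing import Any, Dict, Mapping, Sequence


def count_submitted(tasks: Sequence[Mapping[str, Any]]) -> int:
    """Count tasks with at least one annotation (assumed definition of "submitted")."""
    return sum(1 for task in tasks if task.get("annotations"))


def can_reuse_cache(cache: Mapping[str, Dict[str, Any]], proj: Mapping[str, Any]) -> bool:
    """Mirror the new use_cache logic: reuse rows only if BOTH stored counts match."""
    cached = cache.get(f"project_{proj['id']}")
    if cached is None:
        # 1. No cache entry for this project yet.
        return False
    # 2. Total task count and 3. submitted count must both be unchanged.
    return (
        cached.get("task_count") == proj.get("task_number", 0)
        and cached.get("submitted_count") == proj.get("num_tasks_with_annotations", 0)
    )
```

Several things follow from the `.get()` lookups in the diff. Entries written before this commit have no `submitted_count`, so they compare as `None` against an integer count and are refetched once. If the count computed during the fetch and the API's `num_tasks_with_annotations` are defined differently, a project never matches and is refetched every time `load_data()` runs, which is safe but slower. Going by the diff alone, editing an existing annotation changes neither count, so it would not invalidate the entry.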