Space: nus-project/annotation-dashboard

Gintarė Zokaitytė committed on Jan 27

Commit

ffe022c

0 Parent(s):

Initial dashboard deployment

Files changed (8)

.gitignore +32 -0
.streamlit/config.toml +11 -0
DEPLOY.md +188 -0
GITHUB_DEPLOY.md +288 -0
README.md +92 -0
README_GITHUB.md +129 -0
app.py +455 -0
requirements.txt +4 -0

.gitignore ADDED

	@@ -0,0 +1,32 @@

+# Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+*.egg
+*.egg-info/
+dist/
+build/
+# Credentials (DO NOT COMMIT)
+.streamlit/secrets.toml
+.env
+# Data cache (speeds up loading)
+.cache.pkl
+*.pkl
+*.cache
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+Thumbs.db
+# Logs
+*.log

.streamlit/config.toml ADDED

	@@ -0,0 +1,11 @@

+[theme]
+primaryColor = "#d4af37"
+backgroundColor = "#ffffff"
+secondaryBackgroundColor = "#f0f2f6"
+textColor = "#262730"
+font = "sans serif"
+[server]
+headless = true
+port = 7860
+enableCORS = false

DEPLOY.md ADDED

	@@ -0,0 +1,188 @@

+# Deployment Guide
+Two easy options: **HuggingFace Spaces** or **Streamlit Cloud**
+---
+## Option 1: HuggingFace Spaces (Recommended)
+### Step 1: Create Space
+1. Go to https://huggingface.co/new-space
+2. Choose a name (e.g., `annotation-dashboard`)
+3. Select **Streamlit** as the SDK
+4. Choose visibility (Public or Private)
+5. Click **Create Space**
+### Step 2: Upload Files
+Upload these 3 files:
+- ✅ `app.py`
+- ✅ `requirements.txt`
+- ✅ `.streamlit/config.toml`
+**How to upload:**
+- Click **Files** tab → **Add file** → Upload each file
+- Or use Git (see below)
+### Step 3: Add Secrets
+1. Go to **Settings** tab
+2. Scroll to **Variables and secrets**
+3. Click **New secret**
+4. Add two secrets:
+```
+Name: LABEL_STUDIO_URL
+Value: https://your-labelstudio-instance.com
+```
+```
+Name: LABEL_STUDIO_API_KEY
+Value: your-api-key-here
+```
+### Step 4: Wait for Build
+- HuggingFace automatically builds your Space
+- Check **Logs** tab if there are issues
+- Dashboard will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME`
+### Using Git (Alternative)
+```bash
+# Clone your Space
+git clone https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
+cd SPACE_NAME
+# Copy files
+cp /path/to/annotation-dashboard/app.py .
+cp /path/to/annotation-dashboard/requirements.txt .
+mkdir -p .streamlit
+cp /path/to/annotation-dashboard/.streamlit/config.toml .streamlit/
+# Push
+git add .
+git commit -m "Deploy dashboard"
+git push
+```
+---
+## Option 2: Streamlit Cloud
+### Step 1: Push to GitHub
+Your dashboard needs to be in a GitHub repository.
+```bash
+cd annotation-dashboard
+# Initialize git if needed
+git init
+git add app.py requirements.txt .streamlit/config.toml .gitignore
+git commit -m "Initial dashboard"
+# Create repo on GitHub (via web UI), then:
+git remote add origin https://github.com/YOUR_USERNAME/REPO_NAME.git
+git branch -M main
+git push -u origin main
+```
+### Step 2: Deploy on Streamlit Cloud
+1. Go to https://share.streamlit.io/
+2. Click **New app**
+3. Connect your GitHub account (if first time)
+4. Select:
+   - **Repository**: Your dashboard repo
+   - **Branch**: `main`
+   - **Main file path**: `app.py`
+5. Click **Deploy**
+### Step 3: Add Secrets
+1. Click **Advanced settings** (before deploying) or **⋮** → **Settings** (after)
+2. Go to **Secrets** section
+3. Add in TOML format:
+```toml
+LABEL_STUDIO_URL = "https://your-labelstudio-instance.com"
+LABEL_STUDIO_API_KEY = "your-api-key-here"
+```
+4. Click **Save**
+### Step 4: Access Dashboard
+Your app will be served from a `*.streamlit.app` URL, shown in the Streamlit Cloud dashboard once the deployment finishes.
+---
+## Comparison
+| Feature | HuggingFace Spaces | Streamlit Cloud |
+|---------|-------------------|-----------------|
+| **Setup** | Easier (upload files) | Requires GitHub repo |
+| **Free tier** | Generous | Limited hours/month |
+| **Custom domain** | Yes (paid) | Yes (paid) |
+| **Cache persistence** | ❌ No (ephemeral storage) | ❌ No (ephemeral storage) |
+| **Community** | ML/AI focused | Data science focused |
+| **Speed** | Fast | Fast |
+**Note**: Cache file (`.cache.pkl`) won't persist on either platform. It rebuilds on each cold start (~30s). For persistent cache, you'd need a database or external storage.
+---
+## Get Your Label Studio API Key
+1. Log into Label Studio
+2. Click your profile (top right)
+3. **Account & Settings**
+4. Scroll to **Access Token**
+5. Copy the token
+---
+## Troubleshooting
+### "Missing credentials" error
+**Fix**: Check secrets are correctly set
+- HF Spaces: Settings → Variables and secrets
+- Streamlit Cloud: App settings → Secrets
+### Dashboard loads slowly
+**Expected**: The first load takes ~30s because it fetches all data
+- Subsequent loads within the 5-minute cache window: <2s
+- Cache doesn't persist on free hosting, so every cold start repeats the full fetch
+### Build fails
+**Check**:
+1. All 3 files uploaded (`app.py`, `requirements.txt`, `.streamlit/config.toml`)
+2. Check build logs for errors
+3. Verify Python dependencies in `requirements.txt`
+### Can't access Label Studio from cloud
+**Common issue**: Label Studio must be publicly accessible
+- If running locally, cloud can't reach it
+- Use a public URL or cloud-hosted Label Studio instance
+---
+## Quick Decision Guide
+**Choose HuggingFace Spaces if:**
+- ✅ You want the easiest setup
+- ✅ You don't have a GitHub repo
+- ✅ You prefer ML-focused platform
+**Choose Streamlit Cloud if:**
+- ✅ Your code is already on GitHub
+- ✅ You prefer Streamlit's native platform
+- ✅ You want tight GitHub integration
+Both are excellent choices! 🚀
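
Review note on the "Missing credentials" troubleshooting entry: the fix depends on the order in which the app looks up its two settings. Below is a minimal sketch of that lookup (platform secrets first, then environment variables), written as a pure function so it can be checked without Streamlit. `resolve_credentials` and its arguments are hypothetical names used for illustration; `app.py` performs the lookup inline in `load_data()`, and this sketch additionally treats an empty secret as missing, which is an assumption.

```python
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    secrets: Optional[Mapping[str, str]],
    env: Mapping[str, str],
) -> Tuple[str, str]:
    """Return (url, key), preferring platform secrets over environment variables.

    Hypothetical helper: it takes plain mappings instead of st.secrets and
    os.environ so the precedence can be tested in isolation.
    """
    secrets = secrets or {}
    url = secrets.get("LABEL_STUDIO_URL") or env.get("LABEL_STUDIO_URL", "")
    key = secrets.get("LABEL_STUDIO_API_KEY") or env.get("LABEL_STUDIO_API_KEY", "")
    url = url.strip().rstrip("/")  # avoid double slashes when building API paths
    if not url or not key:
        raise ValueError(
            "Missing credentials. Set LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY."
        )
    return url, key


# Secrets win over the environment; the trailing slash is stripped.
url, key = resolve_credentials(
    {"LABEL_STUDIO_URL": "https://ls.example.com/", "LABEL_STUDIO_API_KEY": "abc"},
    {"LABEL_STUDIO_URL": "https://other.example.com", "LABEL_STUDIO_API_KEY": "xyz"},
)
print(url, key)  # https://ls.example.com abc
```

If the error persists after setting secrets on the platform, checking that both names are spelled exactly as above is usually the quickest test, since a missing key and a misspelled key look the same to this lookup.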

GITHUB_DEPLOY.md ADDED

	@@ -0,0 +1,288 @@

+# Deploy from GitHub Organization
+Step-by-step guide to deploy the dashboard from your GitHub organization.
+---
+## Step 1: Push to GitHub Organization
+### Option A: Create New Repo via GitHub Web
+1. Go to your organization on GitHub
+2. Click **New repository**
+3. Name it (e.g., `annotation-dashboard`)
+4. Choose visibility (Public or Private)
+5. **Don't** initialize with README (we have files already)
+6. Click **Create repository**
+Then push your code:
+```bash
+cd annotation-dashboard
+# Initialize git if needed
+git init
+# Add files
+git add app.py requirements.txt .streamlit/ .gitignore
+git commit -m "Initial dashboard"
+# Add remote (replace ORG_NAME and REPO_NAME)
+git remote add origin https://github.com/ORG_NAME/REPO_NAME.git
+# Push
+git branch -M main
+git push -u origin main
+```
+### Option B: Use GitHub CLI (faster)
+```bash
+cd annotation-dashboard
+# Login to GitHub (first time only)
+gh auth login
+# Create repo in your org and push
+gh repo create ORG_NAME/annotation-dashboard --source=. --public --push
+# Or private:
+gh repo create ORG_NAME/annotation-dashboard --source=. --private --push
+```
+---
+## Step 2: Deploy to HuggingFace Spaces from GitHub
+### Link GitHub to HuggingFace
+1. Go to https://huggingface.co/new-space
+2. Choose **Import from GitHub**
+3. Connect your GitHub account (first time only)
+4. Select your organization and repository
+5. Click **Import**
+### Add Secrets
+1. Once imported, go to **Settings** → **Variables and secrets**
+2. Add:
+   - `LABEL_STUDIO_URL`
+   - `LABEL_STUDIO_API_KEY`
+### Auto-sync
+Now any push to GitHub automatically updates your HF Space! 🎉
+---
+## Step 3: Deploy to Streamlit Cloud from GitHub
+1. Go to https://share.streamlit.io/
+2. Click **New app**
+3. **Connect GitHub** (allow access to organization)
+4. Select:
+   - **Repository**: `ORG_NAME/REPO_NAME`
+   - **Branch**: `main`
+   - **Main file**: `app.py`
+5. **Advanced settings** → **Secrets** → Add:
+```toml
+LABEL_STUDIO_URL = "https://your-instance.com"
+LABEL_STUDIO_API_KEY = "your-api-key"
+```
+6. Click **Deploy**
+Your app will be at: `https://ORG_NAME-REPO_NAME.streamlit.app`
+---
+## Recommended: Add README for GitHub
+Create a nice README for your GitHub repo:
+```bash
+cat > README.md << EOF
+# Annotation Progress Dashboard
+Live dashboard tracking Lithuanian NER annotation progress.
+## 🚀 Live Demo
+- **HuggingFace**: [link to your space]
+- **Streamlit Cloud**: [link to your app]
+## Features
+- Real-time progress metrics
+- Weekly team statistics
+- Category breakdown
+- Completion projections
+- Fast caching (30s → <2s)
+## Local Development
+\`\`\`bash
+pip install -r requirements.txt
+export LABEL_STUDIO_URL="https://..."
+export LABEL_STUDIO_API_KEY="..."
+streamlit run app.py
+\`\`\`
+## Deployment
+See [DEPLOY.md](DEPLOY.md) for cloud deployment instructions.
+## Tech Stack
+- **Streamlit** - Web framework
+- **Pandas** - Data processing
+- **Plotly** - Interactive charts
+- **Label Studio REST API** (via Requests) - Data source
+EOF
+git add README.md
+git commit -m "Add README"
+git push
+```
+---
+## GitHub Actions (Optional)
+Auto-deploy on every commit with GitHub Actions:
+```bash
+mkdir -p .github/workflows
+cat > .github/workflows/deploy.yml << 'EOF'
+name: Deploy Dashboard
+on:
+  push:
+    branches: [main]
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Deploy to HuggingFace
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: |
+          pip install huggingface-hub
+          huggingface-cli upload YOUR_USERNAME/SPACE_NAME . --repo-type=space
+EOF
+```
+Add `HF_TOKEN` secret in GitHub:
+1. Settings → Secrets and variables → Actions
+2. New repository secret → `HF_TOKEN`
+3. Get token from https://huggingface.co/settings/tokens
+---
+## Team Collaboration
+### Add Team Members
+1. Go to GitHub repo → **Settings** → **Collaborators**
+2. Add team members
+3. They can now push updates
+### Protected Branches
+Require reviews before merging:
+1. **Settings** → **Branches**
+2. **Add rule** for `main`
+3. Enable:
+   - Require pull request reviews
+   - Require status checks
+---
+## Quick Reference
+```bash
+# Clone from organization
+git clone https://github.com/ORG_NAME/REPO_NAME.git
+# Make changes
+git add .
+git commit -m "Update dashboard"
+git push
+# Both HF Spaces and Streamlit Cloud auto-update!
+```
+---
+## Troubleshooting
+### Can't push to organization repo
+**Fix**: Check you have write permissions
+- Ask organization admin to add you
+- Or fork the repo to your personal account
+### GitHub Actions failing
+**Check**:
+1. `HF_TOKEN` secret is set
+2. Token has write permissions
+3. Check Actions logs for details
+### Streamlit Cloud can't access private repo
+**Fix**:
+1. Make repo public, OR
+2. Grant Streamlit access in GitHub:
+   - Settings → Applications → Streamlit
+   - Grant access to organization
+---
+## Best Practices
+✅ **Do**:
+- Use `.gitignore` (already included)
+- Add meaningful commit messages
+- Keep secrets in platform secrets, not code
+- Document changes in commits
+❌ **Don't**:
+- Commit `.cache.pkl` (in `.gitignore`)
+- Commit secrets or `.env` files
+- Force push to `main` branch
+- Commit large test data files
+---
+## Example Workflow
+```bash
+# 1. Create feature branch
+git checkout -b feature/add-new-chart
+# 2. Make changes
+# ... edit app.py ...
+# 3. Test locally
+streamlit run app.py
+# 4. Commit and push
+git add app.py
+git commit -m "Add new chart for entity distribution"
+git push origin feature/add-new-chart
+# 5. Create Pull Request on GitHub
+# 6. Review and merge to main
+# 7. HF Spaces and Streamlit Cloud auto-update! 🎉
+```
+---
+Need help? Check:
+- [DEPLOY.md](DEPLOY.md) - Cloud deployment details
+- [README.md](README.md) - General dashboard info

README.md ADDED

	@@ -0,0 +1,92 @@

+# Annotation Progress Dashboard
+Live dashboard for tracking Lithuanian NER annotation progress.
+## Quick Start
+### Local
+```bash
+pip install -r requirements.txt
+# Set credentials
+export LABEL_STUDIO_URL="https://your-instance.com"
+export LABEL_STUDIO_API_KEY="your-key"
+streamlit run app.py
+```
+### Deploy to HuggingFace Spaces
+1. Create new Space at https://huggingface.co/new-space (choose **Streamlit** SDK)
+2. Upload files:
+   - `app.py`
+   - `requirements.txt`
+   - `.streamlit/config.toml`
+3. Add secrets in Space Settings → Variables and secrets:
+   - `LABEL_STUDIO_URL` = `https://your-instance.com`
+   - `LABEL_STUDIO_API_KEY` = `your-api-key`
+4. Done! Your dashboard will auto-build and deploy.
+## Get Your API Key
+1. Log into Label Studio
+2. Profile → Account & Settings → Access Token
+3. Copy the token
+## Features
+- Real-time progress metrics
+- Weekly team statistics
+- Category breakdown (mokslinis/ziniasklaida)
+- Completion projection based on recent pace
+- Auto-refresh every 5 minutes
+- **Fast loading with smart caching**:
+  - Disk cache (`.cache.pkl`) persists between runs
+  - Only fetches changed projects
+  - Parallel fetching (10 projects at once)
+  - First load: ~30s, subsequent: <2s
+## Caching Explained
+**Cache location**: `.cache.pkl` in the same directory as `app.py`
+**How it works**:
+- First run: Fetches all data from Label Studio (~30 seconds)
+- Saves to disk cache
+- Next runs: Only fetches projects that changed (new tasks added)
+- Shows progress bar when fetching
+**Clear cache**:
+```bash
+rm .cache.pkl
+```
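+Or just wait: the in-memory result expires after 5 minutes, and the next page load re-checks each project's task count and refetches the projects whose count changed. Projects with an unchanged count are still served from `.cache.pkl`, so delete the file to force a full refresh.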
+## Configuration
+Edit `app.py` to customize:
+```python
+GOAL_WORDS = 2_200_000        # Total goal
+CATEGORY_GOAL = 1_100_000     # Per-category goal
+OUR_TEAM_PROJECT_IDS = {...}  # Your team project IDs
+CACHE_FILE = Path(".cache.pkl")  # Cache location
+```
+## Troubleshooting
+**Dashboard loads slowly every time**:
+- Cache file may not be writable
+- Check `.cache.pkl` exists after first load
+- On HF Spaces, cache won't persist (limitation of the platform)
+**"Missing credentials" error**:
+- Check environment variables are set
+- For HF Spaces: verify secrets in Space settings
+That's it!
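
Review note on "Caching Explained": the reuse rule described there can be sketched as a small pure function. This is a simplified restatement of the loop in `load_data()`, not a replacement for it; `split_by_cache` is a hypothetical name, and the sample projects and word counts are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Project = Dict[str, Any]


def split_by_cache(
    projects: List[Project],
    cache: Dict[str, Dict[str, Any]],
) -> Tuple[List[dict], List[Project]]:
    """Split projects into rows served from cache and projects to refetch.

    A project is reused only when its cached task_count equals the
    task_number reported by the project listing.
    """
    cached_rows: List[dict] = []
    to_fetch: List[Project] = []
    for proj in projects:
        entry = cache.get(f"project_{proj['id']}")
        if entry is not None and entry["task_count"] == proj.get("task_number", 0):
            cached_rows.extend(entry["rows"])
        else:
            to_fetch.append(proj)
    return cached_rows, to_fetch


# Project 29 is unchanged, project 30 gained a task, project 31 was never cached.
cache = {
    "project_29": {"task_count": 2, "rows": [{"project_id": 29, "words": 120}]},
    "project_30": {"task_count": 5, "rows": [{"project_id": 30, "words": 300}]},
}
projects = [
    {"id": 29, "task_number": 2},
    {"id": 30, "task_number": 6},
    {"id": 31, "task_number": 1},
]
rows, stale = split_by_cache(projects, cache)
print(rows)                      # [{'project_id': 29, 'words': 120}]
print([p["id"] for p in stale])  # [30, 31]
```

One consequence of keying on task count alone: a project that receives new annotations but no new tasks is treated as unchanged, so its rows keep coming from disk until `.cache.pkl` is deleted.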

README_GITHUB.md ADDED

	@@ -0,0 +1,129 @@

+# Annotation Progress Dashboard
+Live dashboard for tracking Lithuanian NER annotation project progress.
+## 🚀 Quick Deploy
+### Easiest: HuggingFace Spaces
+1. Go to https://huggingface.co/new-space
+2. Choose **Streamlit** SDK
+3. Upload: `app.py`, `requirements.txt`, `.streamlit/config.toml`
+4. Add secrets: `LABEL_STUDIO_URL` and `LABEL_STUDIO_API_KEY`
+5. Done! 🎉
+See [DEPLOY.md](DEPLOY.md) for detailed instructions.
+## ✨ Features
+- **Progress Metrics**: Real-time tracking toward 2.2M word goal
+- **Weekly Stats**: Team member contributions with "before" summary
+- **Category Breakdown**: Split by mokslinis/ziniasklaida + status (Ready/Needs Fixing)
+- **Projections**: Estimated completion date based on recent pace
+- **Fast Loading**: Smart caching (30s first load, <2s after)
+## 📊 Screenshots
+[Add screenshots of your dashboard here]
+## 🏃 Local Development
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Set credentials
+export LABEL_STUDIO_URL="https://your-labelstudio-instance.com"
+export LABEL_STUDIO_API_KEY="your-api-key"
+# Run dashboard
+streamlit run app.py
+```
+Visit http://localhost:7860 (the port set in `.streamlit/config.toml`)
+## ⚙️ Configuration
+Edit `app.py` to customize:
+```python
+GOAL_WORDS = 2_200_000              # Total word goal
+CATEGORY_GOAL = 1_100_000           # Per-category goal
+OUR_TEAM_PROJECT_IDS = {...}        # Your team's project IDs
+TEAM_COLORS = {...}                 # Chart colors per member
+```
+## 🗂️ Project Structure
+```
+annotation-dashboard/
+├── app.py                    # Main dashboard (all-in-one)
+├── requirements.txt          # Dependencies
+├── .streamlit/
+│   └── config.toml          # Theme & settings
+├── .cache.pkl               # Auto-generated cache
+├── .gitignore               # Git ignore rules
+├── DEPLOY.md                # Cloud deployment guide
+├── GITHUB_DEPLOY.md         # GitHub organization setup
+└── README.md                # This file
+```
+## 📚 Documentation
+- **[DEPLOY.md](DEPLOY.md)** - Deploy to HuggingFace Spaces or Streamlit Cloud
+- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Setup with GitHub organization
+## 🔧 Tech Stack
+- **Streamlit** - Web framework
+- **Pandas** - Data processing
+- **Plotly** - Interactive charts
+- **Requests** - API client
+- **Label Studio** - Data source
+## 🚀 Deployment Options
+| Platform | Pros | Setup Time |
+|----------|------|------------|
+| **HuggingFace Spaces** | Easy upload, ML-focused | 5 min |
+| **Streamlit Cloud** | GitHub integration | 10 min |
+| **Local** | Full control | 2 min |
+## 📈 Performance
+- **First load**: ~30 seconds (fetches all data)
+- **Cached load**: <2 seconds (smart caching)
+- **Auto-refresh**: Every 5 minutes
+- **Cache location**: `.cache.pkl` (in `.gitignore`)
+## 🔐 Security
+✅ Secrets stored in platform secrets (not in code)
+✅ `.env` and secrets files in `.gitignore`
+✅ Cache file excluded from git
+✅ No hardcoded credentials
+## 🤝 Contributing
+1. Clone the repo
+2. Create a feature branch: `git checkout -b feature/amazing-feature`
+3. Make changes and test locally
+4. Commit: `git commit -m 'Add amazing feature'`
+5. Push: `git push origin feature/amazing-feature`
+6. Open a Pull Request
+## 📝 License
+[Add your license here]
+## 👥 Team
+[Add team members here]
+## 📧 Contact
+[Add contact info or link to organization]
+---
+**Built with ❤️ for the Lithuanian NER Annotation Project**
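
Review note on the "Projections" feature: the estimate is a straight-line extrapolation of the average pace over a recent window. The sketch below restates that idea with the standard library only. `project_completion` is a hypothetical helper and the sample history is invented; the app itself works on a pandas DataFrame and keeps fractional days, whereas this version rounds down to a calendar date.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def project_completion(
    cumulative: Dict[date, float],
    goal: float,
    lookback_days: int = 14,
) -> Optional[date]:
    """Estimate the completion date from the average pace in the lookback window.

    `cumulative` maps a date to the cumulative word count on that date.
    Returns None when there is too little data or no recent progress.
    """
    if not cumulative:
        return None
    last_date = max(cumulative)
    cutoff = last_date - timedelta(days=lookback_days)
    window = sorted(d for d in cumulative if d >= cutoff)
    if len(window) < 2:
        return None
    first_date = window[0]
    days = (last_date - first_date).days or 1
    rate = (cumulative[last_date] - cumulative[first_date]) / days  # words per day
    if rate <= 0:
        return None
    # Clamp at zero so an already-reached goal projects to the last data date.
    days_left = max(goal - cumulative[last_date], 0) / rate
    return last_date + timedelta(days=days_left)


# Made-up history: 100,000 words in 10 days, 1.1M words still to go.
history = {date(2026, 1, 1): 1_000_000, date(2026, 1, 11): 1_100_000}
print(project_completion(history, goal=2_200_000))  # 2026-05-01
```

Returning `None` for a flat or falling pace is a design choice in this sketch: it keeps the chart from showing a completion date when no pace can be extrapolated.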

app.py ADDED

	@@ -0,0 +1,455 @@

+"""Annotation Progress Dashboard - Simple & Elegant"""
+import re
+import os
+import pickle
+from pathlib import Path
+from concurrent.futures import ThreadPoolExecutor
+import streamlit as st
+import pandas as pd
+import plotly.graph_objects as go
+import requests
+# =============================================================================
+# Configuration
+# =============================================================================
+GOAL_WORDS = 2_200_000
+CATEGORY_GOAL = 1_100_000
+OUR_TEAM_PROJECT_IDS = {29, 30, 31, 32, 33, 37}
+ANNOTATED_STATES = ["Acceptable", "No Rating"]
+GOAL_STATES = ["Acceptable", "No Rating", "ReqAttn (entities)"]
+TEAM_COLORS = {
+    "A.K. (22)": "#0066cc",
+    "J.Š. (23)": "#00cccc",
+    "J.Š. (24)": "#00cc00",
+    "G.Z. (25)": "#ff9900",
+    "L.M. (26)": "#9933ff",
+    "M.M. (27)": "#cc0000",
+}
+# Cache file location (persists between runs)
+CACHE_FILE = Path(".cache.pkl")
+# =============================================================================
+# Setup
+# =============================================================================
+st.set_page_config(page_title="Annotation Progress", page_icon="📊", layout="wide")
+# =============================================================================
+# Data Loading
+# =============================================================================
+def fetch_project_data(proj, url, headers):
+    """Fetch data from one project (for parallel execution)."""
+    pid, name, task_count = proj["id"], proj.get("title", f"Project {proj['id']}"), proj.get("task_number", 0)
+    group = "Our Team" if pid in OUR_TEAM_PROJECT_IDS else "Others"
+    rows = []
+    page = 1
+    while True:
+        resp = requests.get(
+            f"{url}/api/projects/{pid}/tasks",
+            headers=headers,
+            params={"page": page, "page_size": 100},
+            timeout=30
+        )
+        resp.raise_for_status()
+        data = resp.json()
+        tasks = data if isinstance(data, list) else data.get("tasks", [])
+        if not tasks:
+            break
+        for task in tasks:
+            task_data = task.get("data", {})
+            words = task_data.get("words") or len(task_data.get("text", "").split())
+            category = task_data.get("category")
+            annots = [a for a in task.get("annotations", []) if not a.get("was_cancelled")]
+            if not annots:
+                rows.append({
+                    "project_id": pid, "project": name, "project_group": group,
+                    "date": None, "state": "Not Annotated",
+                    "words": int(words), "category": category
+                })
+                continue
+            ann = annots[0]
+            date = ann.get("created_at", "")[:10] or None
+            rating = None
+            for item in ann.get("result", []):
+                if item.get("type") == "choices" and item.get("from_name") == "text_rating":
+                    # Guard against an empty "choices" list, which would raise IndexError
+                    rating = (item.get("value", {}).get("choices") or [None])[0]
+                    break
+            has_entities = any(i.get("type") == "labels" for i in ann.get("result", []))
+            if rating is None:
+                state = "No Rating"
+            elif rating == "Requires Attention":
+                state = f"ReqAttn ({'entities' if has_entities else 'empty'})"
+            elif rating == "Unacceptable":
+                state = f"Unacceptable ({'entities' if has_entities else 'empty'})"
+            else:
+                state = "Acceptable"
+            rows.append({
+                "project_id": pid, "project": name, "project_group": group,
+                "date": date, "state": state,
+                "words": int(words), "category": category
+            })
+        if isinstance(data, list) and len(data) < 100:
+            break
+        if isinstance(data, dict) and not data.get("next"):
+            break
+        page += 1
+    return pid, task_count, rows
+@st.cache_data(ttl=300)
+def load_data():
+    """Load annotation data from Label Studio with disk cache."""
+    try:
+        url = st.secrets.get("LABEL_STUDIO_URL", os.getenv("LABEL_STUDIO_URL", "")).rstrip("/")
+        key = st.secrets.get("LABEL_STUDIO_API_KEY", os.getenv("LABEL_STUDIO_API_KEY", ""))
+    except (KeyError, FileNotFoundError, AttributeError):
+        url = os.getenv("LABEL_STUDIO_URL", "").rstrip("/")
+        key = os.getenv("LABEL_STUDIO_API_KEY", "")
+    if not url or not key:
+        st.error("Missing credentials. Set LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY.")
+        st.stop()
+    headers = {"Authorization": f"Token {key}"}
+    # Fetch all projects
+    resp = requests.get(f"{url}/api/projects", headers=headers, timeout=30)
+    resp.raise_for_status()
+    projects = resp.json().get("results", [])
+    # Load cache
+    cache = {}
+    if CACHE_FILE.exists():
+        try:
+            with open(CACHE_FILE, "rb") as f:
+                cache = pickle.load(f)
+        except Exception:
+            cache = {}
+    # Check which projects need updating
+    projects_to_fetch = []
+    all_rows = []
+    for proj in projects:
+        pid = proj["id"]
+        task_count = proj.get("task_number", 0)
+        cache_key = f"project_{pid}"
+        # Use cache if task count unchanged
+        if cache_key in cache and cache[cache_key]["task_count"] == task_count:
+            all_rows.extend(cache[cache_key]["rows"])
+        else:
+            projects_to_fetch.append(proj)
+    # Fetch updated projects in parallel
+    if projects_to_fetch:
+        with ThreadPoolExecutor(max_workers=10) as executor:
+            futures = [executor.submit(fetch_project_data, proj, url, headers) for proj in projects_to_fetch]
+            progress = st.progress(0, text=f"Loading {len(projects_to_fetch)} projects...")
+            for i, future in enumerate(futures):
+                pid, task_count, rows = future.result()
+                all_rows.extend(rows)
+                cache[f"project_{pid}"] = {"task_count": task_count, "rows": rows}
+                progress.progress((i + 1) / len(futures), text=f"Loaded {i + 1}/{len(futures)} projects")
+            progress.empty()
+        # Save cache
+        try:
+            with open(CACHE_FILE, "wb") as f:
+                pickle.dump(cache, f)
+        except Exception:
+            pass
+    # Create dataframe
+    df = pd.DataFrame(all_rows)
+    df["words"] = df["words"].astype(int)
+    df["date"] = pd.to_datetime(df["date"], errors="coerce")
+    df["is_annotated"] = df["state"].isin(ANNOTATED_STATES)
+    df["is_goal_state"] = df["state"].isin(GOAL_STATES)
+    return df
+# =============================================================================
+# Helper Functions
+# =============================================================================
+def anonymize(name):
+    """Convert '26 [Lukas Malakauskas]' to 'L.M. (26)'"""
+    if name == "Others":
+        return "Others"
+    match = re.match(r"(\d+)\s+\[(.+?)\]", name)
+    if match:
+        num, full = match.groups()
+        parts = full.split()
+        if len(parts) >= 2:
+            return f"{parts[0][0]}.{parts[-1][0]}. ({num})"
+    return name
+# =============================================================================
+# Main App
+# =============================================================================
+st.title("📊 Annotation Progress Dashboard")
+st.markdown("---")
+# Load data
+with st.spinner("Loading..."):
+    df = load_data()
+# Overview metrics
+total = df[df["is_goal_state"]]["words"].sum()
+remaining = GOAL_WORDS - total
+progress = total / GOAL_WORDS * 100
+col1, col2 = st.columns(2)
+col1.metric("Progress toward 2.2M", f"{total:,}", f"{progress:.1f}%")
+col2.metric("Remaining", f"{remaining:,}", f"{100-progress:.1f}%")
+st.markdown("---")
+# Tabs
+tab1, tab2 = st.tabs(["📊 Weekly Stats", "⏱️ Pacing"])
+# ============== TAB 1: Weekly Stats ==============
+with tab1:
+    st.caption("Goal states (Acceptable + No Rating + ReqAttn with entities)")
+    cutoff_date = pd.Timestamp("2025-12-22")
+    # Filter data - use GOAL_STATES to match progress metrics
+    df_week = df[df["is_goal_state"] & df["date"].notna()].copy()
+    df_week["week_start"] = df_week["date"] - pd.to_timedelta(df_week["date"].dt.dayofweek, unit="d")
+    df_week["member"] = df_week.apply(
+        lambda r: anonymize(r["project"]) if r["project_group"] == "Our Team" else "Others",
+        axis=1
+    )
+    # Weekly pivot (all data)
+    weekly_all = df_week.pivot_table(
+        index="week_start", columns="member", values="words", aggfunc="sum", fill_value=0
+    ).astype(int)
+    # Split into before and after cutoff
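+    # Explicit copies: columns are added to these frames below, and assigning
+    # into a filtered slice triggers pandas' SettingWithCopyWarning
+    weekly_before = weekly_all[weekly_all.index < cutoff_date].copy()
+    weekly_after = weekly_all[weekly_all.index >= cutoff_date].copy()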
+    # Ensure consistent columns
+    all_members = set(weekly_all.columns)
+    if "Others" not in all_members:
+        all_members.add("Others")
+    for member in all_members:
+        if member not in weekly_after.columns:
+            weekly_after[member] = 0
+        if member not in weekly_before.columns:
+            weekly_before[member] = 0
+    # Sort columns by total contribution
+    totals = weekly_all.sum().sort_values(ascending=False)
+    weekly_after = weekly_after[totals.index].copy()  # copy before adding "Total"
+    weekly_after["Total"] = weekly_after.sum(axis=1)
+    # Calculate "Before" summary row
+    before_totals = weekly_before[totals.index].sum()
+    before_totals["Total"] = before_totals.sum()
+    # Format weekly data for display
+    display = weekly_after.reset_index()
+    display["Week"] = (
+        display["week_start"].dt.strftime("%Y-%m-%d") + " - " +
+        (display["week_start"] + pd.Timedelta(days=6)).dt.strftime("%Y-%m-%d")
+    )
+    display = display.drop("week_start", axis=1)
+    display = display[["Week"] + list(totals.index) + ["Total"]]
+    # Add "Before" row at the beginning
+    before_row = pd.DataFrame([{"Week": f"Before {cutoff_date.strftime('%Y-%m-%d')}", **before_totals}])
+    display = pd.concat([before_row, display], ignore_index=True)
+    # Add TOTAL row at the end
+    all_totals = weekly_all[totals.index].sum()
+    all_totals["Total"] = all_totals.sum()
+    total_row = pd.DataFrame([{"Week": "TOTAL", **all_totals}])
+    display = pd.concat([display, total_row], ignore_index=True)
+    # Format numbers
+    for col in display.columns:
+        if col != "Week":
+            display[col] = display[col].apply(lambda x: f"{int(x):,}" if pd.notna(x) else "")
+    # Style and show
+    def style_row(row):
+        if row["Week"] == "TOTAL":
+            return ["font-weight: bold; background-color: #f0f0f0;"] * len(row)
+        elif row["Week"].startswith("Before"):
+            return ["font-style: italic; background-color: #f9f9f9;"] * len(row)
+        return [""] * len(row)
+    styled = display.style.apply(style_row, axis=1).set_properties(subset=["Total"], **{"font-weight": "bold"})
+    st.dataframe(styled, hide_index=True, use_container_width=True)
+# ============== TAB 2: Pacing ==============
+with tab2:
+    st.subheader("Category Breakdown")
+    st.caption("Requirement: 1.1M words from each category")
+    # Split by status: Ready vs Needs Fixing
+    df_ready = df[df["is_annotated"]]  # Acceptable + No Rating
+    df_needs_fixing = df[df["state"] == "ReqAttn (entities)"]
+    df_total = df[df["is_goal_state"]]
+    # Calculate by category
+    mok_ready = df_ready[df_ready["category"] == "mokslinis"]["words"].sum()
+    mok_fixing = df_needs_fixing[df_needs_fixing["category"] == "mokslinis"]["words"].sum()
+    mok_total = mok_ready + mok_fixing
+    zin_ready = df_ready[df_ready["category"] == "ziniasklaida"]["words"].sum()
+    zin_fixing = df_needs_fixing[df_needs_fixing["category"] == "ziniasklaida"]["words"].sum()
+    zin_total = zin_ready + zin_fixing
+    total_ready = mok_ready + zin_ready
+    total_fixing = mok_fixing + zin_fixing
+    total_all = total_ready + total_fixing
+    cat_df = pd.DataFrame({
+        "Category": ["mokslinis", "ziniasklaida", "TOTAL"],
+        "Ready": [f"{mok_ready:,}", f"{zin_ready:,}", f"{total_ready:,}"],
+        "Needs Fixing": [f"{mok_fixing:,}", f"{zin_fixing:,}", f"{total_fixing:,}"],
+        "Total": [f"{mok_total:,}", f"{zin_total:,}", f"{total_all:,}"],
+        "Goal": [f"{CATEGORY_GOAL:,}", f"{CATEGORY_GOAL:,}", f"{GOAL_WORDS:,}"],
+        "Progress": [
+            f"{mok_total/CATEGORY_GOAL*100:.1f}%",
+            f"{zin_total/CATEGORY_GOAL*100:.1f}%",
+            f"{total_all/GOAL_WORDS*100:.1f}%"
+        ]
+    })
+    st.dataframe(cat_df, hide_index=True, use_container_width=True)
+    st.markdown("---")
+    st.header("Cumulative Progress & Projection")
+    # Cumulative data
+    df_cum = df[df["is_goal_state"] & df["date"].notna()].copy()
+    df_cum["member"] = df_cum.apply(
+        lambda r: anonymize(r["project"]) if r["project_group"] == "Our Team" else "Others",
+        axis=1
+    )
+    daily = df_cum.groupby(["date", "member"])["words"].sum().reset_index()
+    pivot = daily.pivot_table(index="date", columns="member", values="words", fill_value=0)
+    cumulative = pivot.sort_index().cumsum()
+    cumulative["Total"] = cumulative.sum(axis=1)
+    cumulative = cumulative[cumulative.index >= pd.Timestamp("2025-12-18")]
+    # Projection calculation
+    last_date = cumulative.index[-1]
+    current = cumulative["Total"].iloc[-1]
+    # Calculate rate from last 14 days
+    lookback = cumulative[cumulative.index >= last_date - pd.Timedelta(days=14)]
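+    if len(lookback) >= 2:
+        days = (last_date - lookback.index[0]).days or 1
+        rate = (current - lookback["Total"].iloc[0]) / days
+    else:
+        rate = None
+    if rate is not None and rate > 0:
+        # Clamp at zero so an already-reached goal projects to the last data date
+        days_left = max(GOAL_WORDS - current, 0) / rate
+        completion = last_date + pd.Timedelta(days=days_left)
+        weekly_rate = rate * 7
+    else:
+        # Too little data or no recent progress: skip the projection entirely
+        rate = completion = weekly_rate = days_left = None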
+    # Chart
+    fig = go.Figure()
+    # Goal lines
+    fig.add_hline(y=1_100_000, line_dash="dot", line_color="orange",
+                  annotation_text="Midpoint: 1.1M", annotation_position="top left")
+    fig.add_hline(y=GOAL_WORDS, line_dash="dot", line_color="red",
+                  annotation_text="Goal: 2.2M", annotation_position="top left")
+    # Members
+    members = [c for c in cumulative.columns if c not in ["Total", "Others"]]
+    members = sorted(members, key=lambda x: cumulative[x].iloc[-1], reverse=True)
+    if "Others" in cumulative.columns:
+        fig.add_trace(go.Scatter(
+            x=cumulative.index, y=cumulative["Others"],
+            name=f"Others: {cumulative['Others'].iloc[-1]:,.0f}",
+            mode="lines", line=dict(width=2, color="#7f8c8d")
+        ))
+    for m in members:
+        color = TEAM_COLORS.get(m, "#34495e")
+        fig.add_trace(go.Scatter(
+            x=cumulative.index, y=cumulative[m],
+            name=f"{m}: {cumulative[m].iloc[-1]:,.0f}",
+            mode="lines", line=dict(width=2, color=color)
+        ))
+    # Total
+    fig.add_trace(go.Scatter(
+        x=cumulative.index, y=cumulative["Total"],
+        name=f"Total: {cumulative['Total'].iloc[-1]:,.0f}",
+        mode="lines", line=dict(width=3, color="#d4af37"),
+        fill="tozeroy", fillcolor="rgba(212, 175, 55, 0.1)"
+    ))
+    # Projection
+    if completion:
+        proj_dates = pd.date_range(last_date, completion, freq="D")
+        proj_vals = current + rate * (proj_dates - last_date).days
+        fig.add_trace(go.Scatter(
+            x=proj_dates, y=proj_vals,
+            name=f"Projection ({int(weekly_rate):,}/wk)",
+            mode="lines", line=dict(width=3, color="#d4af37", dash="dot")
+        ))
+        fig.add_trace(go.Scatter(
+            x=[completion], y=[GOAL_WORDS],
+            mode="markers+text", marker=dict(size=14, color="#d4af37", symbol="diamond"),
+            text=[completion.strftime("%b %d")], textposition="top center",
+            showlegend=False
+        ))
+        title = f"Cumulative Progress → Est. {completion.strftime('%B %d, %Y')}"
+    else:
+        title = "Cumulative Progress"
+    fig.update_layout(
+        title=title, xaxis_title="Date", yaxis_title="Cumulative Words",
+        height=600, hovermode="x unified", template="plotly_white"
+    )
+    fig.update_yaxes(tickformat=".2s")
+    st.plotly_chart(fig, use_container_width=True)
+    # Metrics
+    if completion:
+        st.markdown("### Pacing Estimates")
+        c1, c2, c3 = st.columns(3)
+        c1.metric("Per Week Rate", f"{int(weekly_rate):,} words")
+        c2.metric("Weeks Remaining", f"{days_left/7:.1f} weeks")
+        c3.metric("Est. Completion", completion.strftime("%Y-%m-%d"))
+# Footer
+st.markdown("---")
+st.caption(
+    f"Updated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')} | "
+    "Auto-refresh: 5 min | Press 'R' to refresh"
+)
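
Review note on the state logic in `fetch_project_data`: the mapping from a rating plus entity presence to a dashboard state decides which words count toward the goal, so it is worth seeing in isolation. The sketch below extracts that branch into a standalone function. `classify` is a hypothetical name, while the state strings and the two lists are copied from `app.py`.

```python
from typing import Optional

ANNOTATED_STATES = ["Acceptable", "No Rating"]
GOAL_STATES = ["Acceptable", "No Rating", "ReqAttn (entities)"]


def classify(rating: Optional[str], has_entities: bool) -> str:
    """Map a text_rating choice plus entity presence to a dashboard state."""
    suffix = "entities" if has_entities else "empty"
    if rating is None:
        return "No Rating"
    if rating == "Requires Attention":
        return f"ReqAttn ({suffix})"
    if rating == "Unacceptable":
        return f"Unacceptable ({suffix})"
    # Any other rating value falls through to "Acceptable", as in app.py.
    return "Acceptable"


# Print each combination with whether it counts toward the word goal.
for rating, has_entities in [
    (None, True),
    ("Requires Attention", True),
    ("Requires Attention", False),
    ("Unacceptable", True),
    ("Acceptable", False),
]:
    state = classify(rating, has_entities)
    counts = state in GOAL_STATES
    print(f"{rating!s:20} entities={has_entities!s:5} -> {state} (goal: {counts})")
```

Note the fall-through: a rating string the code does not recognize is counted as "Acceptable", so a renamed choice in the labeling config would silently inflate the progress figure.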

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+streamlit>=1.28.0
+pandas>=2.0.0
+plotly>=5.17.0
+requests>=2.31.0