# 📖 Data Use Annotation Tool — Annotator Guide

Welcome! This guide explains how to use the **Data Use Annotation Tool** to review and annotate data/dataset mentions in documents.

---

## 1. Getting Started

### Signing In

1. Open the tool — you'll see a login screen
2. Click **🤗 Sign in with HuggingFace**
3. Authorize with your HuggingFace account
4. You'll be redirected to the tool showing your assigned documents

> **Note:** Only accounts listed in the annotator configuration will see documents. If you see no documents after logging in, contact the admin.

---

## 2. Interface Overview

The tool has two main panels:

| Panel | Purpose |
|-------|---------|
| **Left — PDF Viewer** | Shows the original PDF for the current page |
| **Right — Markdown Annotation** | Shows extracted text with highlighted data mentions |

### Top Bar

- **Title** — "Data Use Annotation Tool"
- **Progress Bar** — Overall annotation progress across all corpora
- **User Badge** — Your HuggingFace username
- **📊 Leaderboard** — See annotation stats for all annotators

### Document Selector

- Dropdown at top-left showing your assigned documents
- Documents are labeled by corpus: **[World Bank]**, **[UNHCR]**, etc.
- Format: `[Corpus] Doc N (X pages)`

---

## 3. Page Navigation

At the bottom of the screen you'll find the page navigator:

```
⏮ ← Prev | Page 3 ● (3 / 11) | Next → ⏭
```

| Button | Action |
|--------|--------|
| **← Prev / Next →** | Move one page at a time |
| **⏮ / ⏭** | Jump to the previous/next page that has data mentions |
| **● (green dot)** | Indicates the current page has AI-detected data mentions |

All pages are shown, including those without mentions. Use the jump buttons to quickly navigate to pages of interest.

---

## 4. Understanding Data Mentions

The AI model pre-detects potential dataset mentions in the text.
Each mention is highlighted with a color based on its **tag**:

| Color | Tag | Meaning |
|-------|-----|---------|
| 🟢 Green | **Named** | A specific, named dataset (e.g. "2022 National Census") |
| 🟡 Amber | **Descriptive** | A described but not formally named dataset (e.g. "a household survey") |
| 🟣 Purple | **Vague** | An unclear or ambiguous data reference |
| ⚪ Gray | **Non-Dataset** | Flagged by the model but not actually a dataset |

A **legend** above the text shows the count of each type on the current page.

---

## 5. Reviewing Existing Mentions (Validation)

Click the **toggle button (‹)** on the right edge to open the **Data Mentions** side panel. For each AI-detected mention you can:

### Validate

1. Click **Validate** on a mention
2. Optionally add notes explaining your decision
3. Choose one of:
   - ✅ **Correct** — The mention is a real dataset
   - ❌ **Wrong** — The mention is not a dataset (false positive)

### Change Tag

- Click the **tag badge** (e.g. "Named") to edit it
- Select the correct tag from the dropdown
- Click **Save** to update

### Delete

- Click **🗑 Delete** to remove a false mention
- Click again to confirm (auto-cancels after 3 seconds)

### Status Indicators

- **"Needs review"** — Not yet validated by you
- **"✓ verified"** / **"✗ rejected"** — Your validation result
- A checkmark appears next to validated mentions

---

## 6. Adding New Annotations

If you spot a dataset mention that the AI missed:

1. **Select the text** — Click and drag to highlight the dataset name in the markdown preview
2. **Click "✍️ Annotate Selection"** — The annotation modal will appear
3. **Choose a Dataset Tag**:
   - **Named Dataset** — A specific named dataset
   - **Descriptive** — A described but unnamed dataset
   - **Vague** — An ambiguous reference
4. **Click "Save Annotation"** — Your annotation is saved

> **Tip:** If no text is selected when you click the button, it will shake to remind you to select text first.
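Conceptually, each saved annotation amounts to a small structured record combining a text selection, one of the tags from section 4, and a review status from section 5. The sketch below is purely illustrative — the tool's actual storage schema is not documented in this guide, and every name in it (`make_annotation`, the field names, the status strings) is a hypothetical stand-in:

```python
# Illustrative sketch only: the tool's real schema is not documented here.
# It models the manual tag choices (section 6) and review states (section 5).

ALLOWED_TAGS = {"Named", "Descriptive", "Vague"}    # manual-annotation tags
STATUSES = {"needs review", "verified", "rejected"}  # validation states

def make_annotation(selected_text, tag, page):
    """Build a hypothetical annotation record for a manual text selection."""
    if not selected_text.strip():
        # Mirrors the UI behavior: the button shakes if nothing is selected
        raise ValueError("select some text first")
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return {
        "text": selected_text,
        "tag": tag,
        "page": page,
        "status": "needs review",  # moves to "verified"/"rejected" on review
    }

record = make_annotation("2022 National Census", "Named", 3)
```

The point of the sketch is the workflow it encodes: a new annotation always starts in the "needs review" state, and only a validation step (section 5) moves it to verified or rejected.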
---

## 7. Page Workflow

For each page, the recommended workflow is:

1. **Read** the markdown text on the right while referencing the PDF on the left
2. **Review** each highlighted mention — validate or reject it in the side panel
3. **Add** any missed mentions using text selection
4. **Move** to the next page (a warning appears if you have unvalidated mentions)

### Unvalidated Mentions Warning

When moving to the next page with unvalidated mentions, you'll see:

> ⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?

You can proceed or go back to finish validating.

---

## 8. Tips & Best Practices

- **Use the PDF** for context — the markdown is extracted text and may have formatting issues
- **Jump buttons (⏮/⏭)** let you quickly skip pages without mentions
- **Pages without mentions** may still contain datasets the AI missed — browse them when possible
- **Validate everything** on a page before moving on — this is the most efficient workflow
- **Be precise** when selecting text for new annotations — select just the dataset name, not the surrounding context

---

## 9. FAQ

**Q: Can I undo a validation?**
A: Click "Validate" again to re-validate with a different verdict.

**Q: What if the markdown text doesn't match the PDF?**
A: This can happen with complex layouts (tables, figures). Annotate based on what you can read. The PDF is the source of truth.

**Q: Why are some pages empty?**
A: Some pages (like cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.

**Q: Who sees my annotations?**
A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.

---

## Need Help?

Contact the project admin if you encounter issues or have questions about specific annotation decisions.