# 📖 Data Use Annotation Tool: Annotator Guide
Welcome! This guide explains how to use the **Data Use Annotation Tool** to review and annotate data/dataset mentions in documents.
---
## 1. Getting Started
### Signing In
1. Open the tool: you'll see a login screen
2. Click **🤗 Sign in with HuggingFace**
3. Authorize with your HuggingFace account
4. You'll be redirected to the tool, which shows your assigned documents
> **Note:** Only accounts listed in the annotator configuration will see documents. If you see no documents after logging in, contact the admin.
---
## 2. Interface Overview
The tool has two main panels:
| Panel | Purpose |
|-------|---------|
| **Left: PDF Viewer** | Shows the original PDF for the current page |
| **Right: Markdown Annotation** | Shows extracted text with highlighted data mentions |
### Top Bar
- **Title**: "Data Use Annotation Tool"
- **Progress Bar**: Overall annotation progress across all corpora
- **User Badge**: Your HuggingFace username
- **📊 Leaderboard**: See annotation stats for all annotators
### Document Selector
- Dropdown at top-left showing your assigned documents
- Documents are labeled by corpus: **[World Bank]**, **[UNHCR]**, etc.
- Format: `[Corpus] Doc N (X pages)`
---
## 3. Page Navigation
At the bottom of the screen you'll find the page navigator:
```
⏮ ← Prev | Page 3 ● (3 / 11) | Next → ⏭
```
| Button | Action |
|--------|--------|
| **← Prev / Next →** | Move one page at a time |
| **⏮ / ⏭** | Jump to the previous/next page that has data mentions |
| **● (green dot)** | Indicates the current page has AI-detected data mentions |
All pages are shown, including those without mentions. Use the jump buttons to quickly navigate to pages of interest.
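
The jump buttons can be pictured as a search for the nearest page, before or after the current one, that has at least one mention. The sketch below is a hypothetical illustration, not the tool's actual code: `jump_target` and the `mention_counts` list (one entry per page, in order) are invented names, and pages are numbered from 1 as in the navigator.

```python
from typing import Optional, Sequence


def jump_target(mention_counts: Sequence[int], current: int, direction: int) -> Optional[int]:
    """Return the 1-based number of the nearest page with mentions, or None.

    mention_counts[i] is the number of AI-detected mentions on page i + 1.
    direction is +1 for the "next" jump button and -1 for "previous".
    None means there is no such page in that direction (how the real UI
    handles that case is an assumption, e.g. disabling the button).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    page = current + direction
    while 1 <= page <= len(mention_counts):
        if mention_counts[page - 1] > 0:
            return page
        page += direction
    return None


# An 11-page document where pages 3, 7 and 10 have mentions.
counts = [0, 0, 2, 0, 0, 0, 1, 0, 0, 4, 0]
print(jump_target(counts, 3, +1))  # 7
print(jump_target(counts, 3, -1))  # None
```

Skipping over pages with a count of zero is what makes these buttons faster than stepping through with Prev/Next.
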
---
## 4. Understanding Data Mentions
The AI model pre-detects potential dataset mentions in the text. Each mention is highlighted with a color based on its **tag**:
| Color | Tag | Meaning |
|-------|-----|---------|
| 🟢 Green | **Named** | A specific, named dataset (e.g. "2022 National Census") |
| 🟡 Amber | **Descriptive** | A described but not formally named dataset (e.g. "a household survey") |
| 🟣 Purple | **Vague** | An unclear or ambiguous data reference |
| ⚪ Gray | **Non-Dataset** | Flagged by the model but not actually a dataset |
A **legend** above the text shows the count of each type on the current page.
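
The legend is essentially a per-tag tally of the mentions on the current page. As a hedged illustration, assuming each mention is a dictionary with a `tag` field (the `TAG_COLORS` and `legend_counts` names are hypothetical, and listing zero-count tags is an assumption about the display), the tally could be computed like this:

```python
from collections import Counter

# Tag-to-color mapping from the table above.
TAG_COLORS = {
    "Named": "green",
    "Descriptive": "amber",
    "Vague": "purple",
    "Non-Dataset": "gray",
}


def legend_counts(page_mentions):
    """Count mentions per tag on one page, including tags with zero mentions."""
    tally = Counter(m["tag"] for m in page_mentions)
    unknown = set(tally) - set(TAG_COLORS)
    if unknown:
        raise ValueError(f"unexpected tag(s): {sorted(unknown)}")
    # Keep the legend order fixed by iterating over TAG_COLORS.
    return {tag: tally[tag] for tag in TAG_COLORS}


mentions = [
    {"text": "2022 National Census", "tag": "Named"},
    {"text": "a household survey", "tag": "Descriptive"},
]
print(legend_counts(mentions))
# {'Named': 1, 'Descriptive': 1, 'Vague': 0, 'Non-Dataset': 0}
```
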
---
## 5. Reviewing Existing Mentions (Validation)
Click the **toggle button (‹)** on the right edge to open the **Data Mentions** side panel. For each AI-detected mention you can:
### Validate
1. Click **Validate** on a mention
2. Optionally add notes explaining your decision
3. Choose one of:
   - ✅ **Correct**: The mention is a real dataset
   - ❌ **Wrong**: The mention is not a dataset (false positive)
### Change Tag
- Click the **tag badge** (e.g. "Named") to edit it
- Select the correct tag from the dropdown
- Click **Save** to update
### Delete
- Click **🗑 Delete** to remove a false mention
- Click again to confirm (auto-cancels after 3 seconds)
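
The two-click delete follows a common confirm-with-timeout pattern. The minimal sketch below assumes a simple state object; `DeleteButton` and its methods are hypothetical, and only the 3-second window comes from this guide. Timestamps are passed in explicitly so the timing logic is easy to follow.

```python
from typing import Optional

CONFIRM_WINDOW_S = 3.0  # the confirmation auto-cancels after 3 seconds


class DeleteButton:
    """Hypothetical two-click delete: the first click arms, a second click within 3 s deletes."""

    def __init__(self) -> None:
        self.armed_at: Optional[float] = None

    def is_armed(self, now: float) -> bool:
        # Armed only if the first click happened less than 3 seconds ago.
        return self.armed_at is not None and now - self.armed_at < CONFIRM_WINDOW_S

    def click(self, now: float) -> bool:
        """Handle a click at time `now` (in seconds); return True if the mention is deleted."""
        if self.is_armed(now):
            self.armed_at = None
            return True
        # First click, or the earlier confirmation already timed out: arm again.
        self.armed_at = now
        return False


button = DeleteButton()
print(button.click(0.0))   # False: first click asks for confirmation
print(button.click(1.5))   # True: confirmed within 3 seconds
button.click(10.0)         # arm again
print(button.click(14.0))  # False: 4 s later the request had auto-cancelled, so this re-arms
```
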
### Status Indicators
- **"Needs review"**: Not yet validated by you
- **"✓ verified"** / **"✗ rejected"**: Your validation result
- A checkmark appears next to validated mentions
---
## 6. Adding New Annotations
If you spot a dataset mention that the AI missed:
1. **Select the text**: Click and drag to highlight the dataset name in the markdown preview
2. **Click "✏️ Annotate Selection"**: The annotation modal will appear
3. **Choose a Dataset Tag**:
   - **Named Dataset**: A specific named dataset
   - **Descriptive**: A described but unnamed dataset
   - **Vague**: An ambiguous reference
4. **Click "Save Annotation"**: Your annotation is saved
> **Tip:** If no text is selected when you click the button, it will shake to remind you to select text first.
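
The steps above reduce to two checks before anything is saved: some text must be selected, and the tag must be one of the three offered in the modal (Non-Dataset is not among them). The sketch below illustrates that logic only. The `NewAnnotation` record, the `make_annotation` function, and the trimming of surrounding whitespace are all hypothetical; the tool's real storage format is not described in this guide.

```python
from dataclasses import dataclass

# The three tags offered in the annotation modal (Non-Dataset is not offered).
MODAL_TAGS = ("Named", "Descriptive", "Vague")


@dataclass(frozen=True)
class NewAnnotation:
    """Hypothetical record for a manually added dataset mention."""
    text: str
    tag: str
    page: int
    annotator: str


def make_annotation(selection: str, tag: str, page: int, annotator: str) -> NewAnnotation:
    # Assumption: stray whitespace around the selection is trimmed.
    text = selection.strip()
    if not text:
        # Corresponds to the case where the button shakes instead of opening the modal.
        raise ValueError("select the dataset text first")
    if tag not in MODAL_TAGS:
        raise ValueError(f"tag must be one of {MODAL_TAGS}")
    return NewAnnotation(text=text, tag=tag, page=page, annotator=annotator)


ann = make_annotation("  2022 National Census ", "Named", page=3, annotator="your-hf-username")
print(ann.text, ann.tag)  # 2022 National Census Named
```
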
---
## 7. Page Workflow
For each page, the recommended workflow is:
1. **Read** the markdown text on the right while referencing the PDF on the left
2. **Review** each highlighted mention: validate or reject it in the side panel
3. **Add** any missed mentions using text selection
4. **Move** to the next page (a warning appears if you have unvalidated mentions)
### Unvalidated Mentions Warning
When moving to the next page with unvalidated mentions, you'll see:
> ⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?
You can proceed or go back to finish validating.
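
The warning amounts to counting the mentions on the page that still lack a verdict, and the same verdict drives the status labels from Section 5. The hedged sketch below assumes each mention's verdict is `None` until you mark it Correct (`True`) or Wrong (`False`). `STATUS_LABELS` and `unvalidated_warning` are illustrative names, not the tool's API.

```python
from typing import List, Optional

# Status labels from Section 5, keyed by verdict.
STATUS_LABELS = {None: "Needs review", True: "✓ verified", False: "✗ rejected"}


def unvalidated_warning(verdicts: List[Optional[bool]]) -> Optional[str]:
    """Return the warning text, or None if every mention on the page has a verdict."""
    pending = sum(1 for v in verdicts if v is None)
    if pending == 0:
        return None
    return (
        f"You have {pending} unverified data mention(s) on this page. "
        "Do you want to proceed?"
    )


print(unvalidated_warning([True, None, False, None]))
# You have 2 unverified data mention(s) on this page. Do you want to proceed?
print(unvalidated_warning([True, False]))  # None
```
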
---
## 8. Tips & Best Practices
- **Use the PDF** for context: the markdown is extracted text and may have formatting issues
- **Jump buttons (⏮/⏭)** let you skip pages without mentions quickly
- **Pages without mentions** may still contain datasets the AI missed, so browse them when possible
- **Validate everything** on a page before moving on for the most efficient workflow
- **Be precise** when selecting text for new annotations: select just the dataset name, not surrounding context
---
## 9. FAQ
**Q: Can I undo a validation?**
A: Click "Validate" again to re-validate with a different verdict.
**Q: What if the markdown text doesn't match the PDF?**
A: This can happen with complex layouts (tables, figures). Annotate based on what you can read in the markdown, and treat the PDF as the source of truth when the two disagree.
**Q: Why are some pages empty?**
A: Some pages (like cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.
**Q: Who sees my annotations?**
A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.
---
## Need Help?
Contact the project admin if you encounter issues or have questions about specific annotation decisions.