# 📊 Data Use Annotation Tool – Annotator Guide
Welcome! This guide explains how to use the Data Use Annotation Tool to review and annotate data/dataset mentions in documents.
## 1. Getting Started
### Signing In
- Open the tool – you'll see a login screen
- Click 🤗 Sign in with HuggingFace
- Authorize with your HuggingFace account
- You'll be redirected to the tool showing your assigned documents
Note: Only accounts listed in the annotator configuration will see documents. If you see no documents after logging in, contact the admin.
## 2. Interface Overview
The tool has two main panels:
| Panel | Purpose |
|---|---|
| Left – PDF Viewer | Shows the original PDF for the current page |
| Right – Markdown Annotation | Shows extracted text with highlighted data mentions |
### Top Bar
- Title – "Data Use Annotation Tool"
- Progress Bar – Overall annotation progress across all corpora
- User Badge – Your HuggingFace username
- 🏆 Leaderboard – See annotation stats for all annotators
### Document Selector
- Dropdown at top-left showing your assigned documents
- Documents are labeled by corpus: [World Bank], [UNHCR], etc.
- Format: `[Corpus] Doc N (X pages)`
## 3. Page Navigation
At the bottom of the screen you'll find the page navigator:
⏮ ← Prev | Page 3 ● (3 / 11) | Next → ⏭
| Button | Action |
|---|---|
| ← Prev / Next → | Move one page at a time |
| ⏮ / ⏭ | Jump to the previous/next page that has data mentions |
| ● (green dot) | Indicates the current page has AI-detected data mentions |
All pages are shown, including those without mentions. Use the jump buttons to quickly navigate to pages of interest.
## 4. Understanding Data Mentions
The AI model pre-detects potential dataset mentions in the text. Each mention is highlighted with a color based on its tag:
| Color | Tag | Meaning |
|---|---|---|
| 🟢 Green | Named | A specific, named dataset (e.g. "2022 National Census") |
| 🟡 Amber | Descriptive | A described but not formally named dataset (e.g. "a household survey") |
| 🟣 Purple | Vague | An unclear or ambiguous data reference |
| ⚪ Gray | Non-Dataset | Flagged by the model but not actually a dataset |
A legend above the text shows the count of each type on the current page.
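The tag choice boils down to a few yes/no questions. As a rough sketch (this helper is invented for this guide, not part of the tool), the decision rule behind the table above looks like:

```python
def choose_tag(is_dataset: bool, has_proper_name: bool, describes_data: bool) -> str:
    """Rough decision rule mirroring the tag table above (illustrative only)."""
    if not is_dataset:
        return "Non-Dataset"   # flagged by the model, but not actually a dataset
    if has_proper_name:
        return "Named"         # e.g. "2022 National Census"
    if describes_data:
        return "Descriptive"   # e.g. "a household survey"
    return "Vague"             # an unclear or ambiguous data reference
```

For example, "2022 National Census" names a specific dataset, so `choose_tag(True, True, True)` yields `"Named"`, while "a household survey" describes one without naming it.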
## 5. Reviewing Existing Mentions (Validation)
Click the toggle button (ℹ) on the right edge to open the Data Mentions side panel. For each AI-detected mention you can:
### Validate
- Click Validate on a mention
- Optionally add notes explaining your decision
- Choose one of:
  - ✓ Correct – The mention is a real dataset
  - ✗ Wrong – The mention is not a dataset (false positive)
### Change Tag
- Click the tag badge (e.g. "Named") to edit it
- Select the correct tag from the dropdown
- Click Save to update
### Delete
- Click 🗑 Delete to remove a false mention
- Click again to confirm (auto-cancels after 3 seconds)
### Status Indicators
- "Needs review" โ Not yet validated by you
- "โ verified" / "โ rejected" โ Your validation result
- A checkmark appears next to validated mentions
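Putting the status indicators together: a mention starts as "Needs review" and moves to a verified or rejected state once you validate it. A minimal sketch of that mapping (field names `validated` and `verdict` are assumptions for illustration, not the tool's actual data model):

```python
from typing import Optional

def status_label(validated: bool, verdict: Optional[str] = None) -> str:
    """Map a mention's validation state to the side-panel label.

    `verdict` is assumed to be "correct" or "wrong" (hypothetical values).
    """
    if not validated:
        return "Needs review"          # not yet validated by you
    if verdict == "correct":
        return "✓ verified"            # you marked the mention as a real dataset
    return "✗ rejected"                # you marked it as a false positive
```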
## 6. Adding New Annotations
If you spot a dataset mention that the AI missed:
- Select the text – Click and drag to highlight the dataset name in the markdown preview
- Click "✏️ Annotate Selection" – The annotation modal will appear
- Choose a Dataset Tag:
  - Named Dataset – A specific named dataset
  - Descriptive – A described but unnamed dataset
  - Vague – An ambiguous reference
- Click "Save Annotation" – Your annotation is saved
Tip: If no text is selected when you click the button, it will shake to remind you to select text first.
## 7. Page Workflow
For each page, the recommended workflow is:
- Read the markdown text on the right while referencing the PDF on the left
- Review each highlighted mention โ validate or reject in the side panel
- Add any missed mentions using text selection
- Move to the next page (a warning appears if you have unvalidated mentions)
### Unvalidated Mentions Warning
When moving to the next page with unvalidated mentions, you'll see:
⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?
You can proceed or go back to finish validating.
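The page-change check is simple: count the mentions you haven't validated yet and warn if any remain. A sketch of that logic, using a hypothetical mention record shape (the tool's real data model may differ):

```python
# Hypothetical per-page mention records (invented for illustration).
mentions = [
    {"text": "2022 National Census", "validated": True},
    {"text": "a household survey",   "validated": False},
    {"text": "recent figures",       "validated": False},
]

def page_change_warning(mentions):
    """Return the warning shown when leaving a page with unvalidated mentions,
    or None when every mention on the page has been validated."""
    unverified = sum(1 for m in mentions if not m["validated"])
    if unverified == 0:
        return None
    return (f"⚠️ You have {unverified} unverified data mention(s) on this page. "
            "Do you want to proceed?")
```

With the sample records above, two mentions are still unvalidated, so the warning fires; once all three are validated, no warning is shown.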
## 8. Tips & Best Practices
- Use the PDF for context – the markdown is extracted text and may have formatting issues
- Jump buttons (⏮/⏭) let you skip pages without mentions quickly
- Pages without mentions may still contain datasets the AI missed – browse them when possible
- Validate everything on a page before moving on for the most efficient workflow
- Be precise when selecting text for new annotations โ select just the dataset name, not surrounding context
## 9. FAQ
Q: Can I undo a validation? A: Click "Validate" again to re-validate with a different verdict.
Q: What if the markdown text doesn't match the PDF? A: This can happen with complex layouts (tables, figures). Annotate based on what you can read. The PDF is the source of truth.
Q: Why are some pages empty? A: Some pages (like cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.
Q: Who sees my annotations? A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.
## Need Help?
Contact the project admin if you encounter issues or have questions about specific annotation decisions.