Panel	Purpose
Left — PDF Viewer	Shows the original PDF for the current page
Right — Markdown Annotation	Shows extracted text with highlighted data mentions

Top Bar

Title — "Data Use Annotation Tool"
Progress Bar — Overall annotation progress across all corpora
User Badge — Your HuggingFace username
📊 Leaderboard — See annotation stats for all annotators

Document Selector

Dropdown at top-left showing your assigned documents
Documents are labeled by corpus: [World Bank], [UNHCR], etc.
Format: [Corpus] Doc N (X pages)
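
As an illustration of the label format, here is a minimal sketch that builds and reads such labels. The helper names `format_label` and `parse_label` are hypothetical and are not part of the tool; the sketch assumes only the `[Corpus] Doc N (X pages)` pattern described above.

```python
import re

# Hypothetical helpers for illustration; the tool's own code is not shown in this guide.
# Pattern for labels such as "[World Bank] Doc 3 (11 pages)".
LABEL_RE = re.compile(r"^\[(?P<corpus>[^\]]+)\] Doc (?P<n>\d+) \((?P<pages>\d+) pages\)$")


def format_label(corpus: str, doc_number: int, page_count: int) -> str:
    """Build a label in the '[Corpus] Doc N (X pages)' format."""
    return f"[{corpus}] Doc {doc_number} ({page_count} pages)"


def parse_label(label: str) -> tuple[str, int, int]:
    """Split a label into (corpus, document number, page count)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"Unrecognized document label: {label!r}")
    return match["corpus"], int(match["n"]), int(match["pages"])


label = format_label("World Bank", 3, 11)
corpus, doc_number, page_count = parse_label(label)
```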

3. Page Navigation

At the bottom of the screen you'll find the page navigator:

⏮  ← Prev  |  Page 3 ●  (3 / 11)  |  Next →  ⏭

Button	Action
← Prev / Next →	Move one page at a time
⏮ / ⏭	Jump to the previous/next page that has data mentions
● (green dot)	Indicates the current page has AI-detected data mentions

All pages are shown, including those without mentions. Use the jump buttons to quickly navigate to pages of interest.
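
The jump behavior can be pictured as a search for the nearest page that has mentions. The sketch below is an assumption about how such a search might work, not the tool's actual code: it uses 0-based page indices, a hypothetical list of booleans (one per page), and returns `None` when there is no page to jump to.

```python
from typing import Optional, Sequence


def next_page_with_mentions(has_mentions: Sequence[bool], current: int) -> Optional[int]:
    """Return the index of the first page after `current` that has mentions, or None."""
    for index in range(current + 1, len(has_mentions)):
        if has_mentions[index]:
            return index
    return None


def prev_page_with_mentions(has_mentions: Sequence[bool], current: int) -> Optional[int]:
    """Return the index of the closest page before `current` that has mentions, or None."""
    for index in range(current - 1, -1, -1):
        if has_mentions[index]:
            return index
    return None


# Hypothetical 5-page document: only pages at index 1 and 4 have mentions.
pages = [False, True, False, False, True]
forward = next_page_with_mentions(pages, 1)
backward = prev_page_with_mentions(pages, 4)
```

The Prev and Next buttons, by contrast, would simply move to `current - 1` or `current + 1` regardless of mentions.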

4. Understanding Data Mentions

The AI model pre-detects potential dataset mentions in the text. Each mention is highlighted with a color based on its tag:

Color	Tag	Meaning
🟢 Green	Named	A specific, named dataset (e.g. "2022 National Census")
🟡 Amber	Descriptive	A described but not formally named dataset (e.g. "a household survey")
🟣 Purple	Vague	An unclear or ambiguous data reference
⚪ Gray	Non-Dataset	Flagged by the model but not actually a dataset

A legend above the text shows the count of each type on the current page.
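
To make the tag scheme concrete, here is a minimal sketch of a tag-to-color mapping and a per-page legend count. The dictionary keys, the `legend_counts` name, and the shape of a mention record (a dict with `text` and `tag`) are assumptions for illustration; the tool's internal data model is not documented here.

```python
from collections import Counter

# Tag-to-color mapping as described in the table above (key names are assumed).
TAG_COLORS = {
    "named": "green",
    "descriptive": "amber",
    "vague": "purple",
    "non-dataset": "gray",
}


def legend_counts(page_mentions: list[dict]) -> dict[str, int]:
    """Count the mentions of each tag on one page, including tags with zero mentions."""
    counts = Counter(mention["tag"] for mention in page_mentions)
    return {tag: counts.get(tag, 0) for tag in TAG_COLORS}


# Hypothetical mentions on a single page.
page_mentions = [
    {"text": "2022 National Census", "tag": "named"},
    {"text": "a household survey", "tag": "descriptive"},
    {"text": "a household survey", "tag": "descriptive"},
]
legend = legend_counts(page_mentions)
```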

5. Reviewing Existing Mentions (Validation)

Click the toggle button (‹) on the right edge to open the Data Mentions side panel. For each AI-detected mention you can:

Validate

Click Validate on a mention
Optionally add notes explaining your decision
Choose one of:
- ✅ Correct — The mention is a real dataset
- ❌ Wrong — The mention is not a dataset (false positive)

Change Tag

Click the tag badge (e.g. "Named") to edit it
Select the correct tag from the dropdown
Click Save to update

Delete

Click 🗑 Delete to remove a false mention
Click again to confirm (auto-cancels after 3 seconds)
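
The two-click delete with a 3-second auto-cancel can be modeled as a small state machine. This is only a sketch of the idea under stated assumptions: the class name `DeleteConfirm` is hypothetical, the clock is injectable so the behavior can be checked without waiting, and a click at exactly 3 seconds is treated as still valid.

```python
import time
from typing import Callable, Optional


class DeleteConfirm:
    """Two-click delete: the first click arms, a second click within the timeout confirms."""

    TIMEOUT_S = 3.0  # auto-cancel window described in the guide

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._armed_at: Optional[float] = None

    def click(self) -> bool:
        """Return True if this click confirms the deletion, False if it only arms it."""
        now = self._clock()
        if self._armed_at is not None and now - self._armed_at <= self.TIMEOUT_S:
            self._armed_at = None  # confirmed; reset for the next use
            return True
        # Either the first click, or the previous arming expired: arm again.
        self._armed_at = now
        return False


# Simulated clock so the example runs instantly.
fake_time = [0.0]
button = DeleteConfirm(clock=lambda: fake_time[0])
first = button.click()       # arms the button
fake_time[0] = 1.0
second = button.click()      # within 3 seconds: confirms
```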

Status Indicators

"Needs review" — Not yet validated by you
"✓ verified" / "✗ rejected" — Your validation result
A checkmark appears next to validated mentions

6. Adding New Annotations

If you spot a dataset mention that the AI missed:

Select the text — Click and drag to highlight the dataset name in the markdown preview
Click "✍️ Annotate Selection" — The annotation modal will appear
Choose a Dataset Tag:
- Named Dataset — A specific named dataset
- Descriptive — A described but unnamed dataset
- Vague — An ambiguous reference
Click "Save Annotation" — Your annotation is saved

Tip: If no text is selected when you click "✍️ Annotate Selection", the button shakes to remind you to select text first.

7. Page Workflow

For each page, the recommended workflow is:

Read the markdown text on the right while referencing the PDF on the left
Review each highlighted mention — validate or reject in the side panel
Add any missed mentions using text selection
Move to the next page (a warning appears if you have unvalidated mentions)

Unvalidated Mentions Warning

When moving to the next page with unvalidated mentions, you'll see:

⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?

You can proceed or go back to finish validating.
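
The check behind this warning amounts to counting the mentions on the page that have no verdict yet. The sketch below is an assumption about that logic, not the tool's implementation: the function name `unvalidated_warning` and the `verdict` field (with `None` meaning "Needs review") are hypothetical.

```python
from typing import Optional


def unvalidated_warning(page_mentions: list[dict]) -> Optional[str]:
    """Return the warning text if any mention lacks a verdict, otherwise None."""
    pending = sum(1 for mention in page_mentions if mention.get("verdict") is None)
    if pending == 0:
        return None
    return (
        f"⚠️ You have {pending} unverified data mention(s) on this page. "
        "Do you want to proceed?"
    )


# Hypothetical page: one validated mention, two still needing review.
page_mentions = [
    {"text": "2022 National Census", "verdict": "correct"},
    {"text": "a household survey", "verdict": None},
    {"text": "administrative records"},
]
message = unvalidated_warning(page_mentions)
```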

8. Tips & Best Practices

Use the PDF for context — the markdown is extracted text and may have formatting issues
Jump buttons (⏮/⏭) let you skip pages without mentions quickly
Pages without mentions may still contain datasets the AI missed — browse them when possible
For the most efficient workflow, validate everything on a page before moving on
Be precise when selecting text for new annotations — select just the dataset name, not surrounding context

9. FAQ

Q: Can I undo a validation? A: Yes. Click "Validate" again on the mention and choose a different verdict.

Q: What if the markdown text doesn't match the PDF? A: This can happen with complex layouts (tables, figures). Annotate based on what you can read in the markdown, and check the PDF when the two disagree: the PDF is the source of truth.

Q: Why are some pages empty? A: Some pages (like cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.

Q: Who sees my annotations? A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.

Need Help?

Contact the project admin if you encounter issues or have questions about specific annotation decisions.