# 📖 Data Use Annotation Tool: Annotator Guide

Welcome! This guide explains how to use the **Data Use Annotation Tool** to review and annotate data/dataset mentions in documents.

---

## 1. Getting Started

### Signing In
1. Open the tool; you'll see a login screen
2. Click **🤗 Sign in with HuggingFace**
3. Authorize with your HuggingFace account
4. You'll be redirected to the tool showing your assigned documents

> **Note:** Only accounts listed in the annotator configuration will see documents. If you see no documents after logging in, contact the admin.

---

## 2. Interface Overview

The tool has two main panels:

| Panel | Purpose |
|-------|---------|
| **Left: PDF Viewer** | Shows the original PDF for the current page |
| **Right: Markdown Annotation** | Shows extracted text with highlighted data mentions |

### Top Bar
- **Title**: "Data Use Annotation Tool"
- **Progress Bar**: Overall annotation progress across all corpora
- **User Badge**: Your HuggingFace username
- **📊 Leaderboard**: See annotation stats for all annotators

### Document Selector
- Dropdown at top-left showing your assigned documents
- Documents are labeled by corpus: **[World Bank]**, **[UNHCR]**, etc.
- Format: `[Corpus] Doc N (X pages)`
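
For anyone post-processing exported document lists, the label format above can be split into its parts with a regular expression. This is only a sketch: the function name `parse_label` and the returned keys are hypothetical, and it assumes labels follow the `[Corpus] Doc N (X pages)` pattern exactly.

```python
import re

# Pattern for labels such as "[World Bank] Doc 3 (11 pages)".
# Assumption: N and X are plain integers; "page" (singular) is also accepted.
LABEL_RE = re.compile(
    r"^\[(?P<corpus>[^\]]+)\] Doc (?P<doc>\d+) \((?P<pages>\d+) pages?\)$"
)

def parse_label(label):
    """Return corpus, document number and page count, or None if no match."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        return None
    return {
        "corpus": match["corpus"],
        "doc": int(match["doc"]),
        "pages": int(match["pages"]),
    }

print(parse_label("[World Bank] Doc 3 (11 pages)"))
```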

---

## 3. Page Navigation

At the bottom of the screen you'll find the page navigator:

```
⏮  ← Prev  |  Page 3 ●  (3 / 11)  |  Next →  ⏭
```

| Button | Action |
|--------|--------|
| **← Prev / Next →** | Move one page at a time |
| **⏮ / ⏭** | Jump to the previous/next page that has data mentions |
| **● (green dot)** | Indicates the current page has AI-detected data mentions |

All pages are shown, including those without mentions. Use the jump buttons to quickly navigate to pages of interest.
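
The jump behavior can be pictured with a small sketch. It is not the tool's actual code: the function name `jump_target` and the idea of passing in a set of page numbers that have mentions are assumptions made for illustration.

```python
def jump_target(current_page, pages_with_mentions, forward=True):
    """Return the nearest page with mentions after (or before) the current one.

    Returns None when there is no such page, in which case a jump
    button would have nowhere to go.
    """
    if forward:
        later = [p for p in pages_with_mentions if p > current_page]
        return min(later) if later else None
    earlier = [p for p in pages_with_mentions if p < current_page]
    return max(earlier) if earlier else None

# Hypothetical document where only pages 1, 3, 7 and 9 have mentions.
pages = {1, 3, 7, 9}
print(jump_target(3, pages))                 # next page with mentions
print(jump_target(3, pages, forward=False))  # previous page with mentions
```

Pages 4 to 6 are skipped by the forward jump in this example, while **Next →** would still visit each of them in turn.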

---

## 4. Understanding Data Mentions

The AI model pre-detects potential dataset mentions in the text. Each mention is highlighted with a color based on its **tag**:

| Color | Tag | Meaning |
|-------|-----|---------|
| 🟢 Green | **Named** | A specific, named dataset (e.g. "2022 National Census") |
| 🟡 Amber | **Descriptive** | A described but not formally named dataset (e.g. "a household survey") |
| 🟣 Purple | **Vague** | An unclear or ambiguous data reference |
| ⚪ Gray | **Non-Dataset** | Flagged by the model but not actually a dataset |

A **legend** above the text shows the count of each type on the current page.
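
As a rough illustration of what the legend reports, the per-tag counts for a page could be computed as below. The mention records and the lowercase tag strings are hypothetical; the tool's internal data format is not described in this guide.

```python
from collections import Counter

# The four tags from the table above, in legend order (spelling is an assumption).
TAGS = ("named", "descriptive", "vague", "non-dataset")

def legend_counts(mentions):
    """Count mentions per tag, reporting zero for tags absent from the page."""
    counts = Counter(m["tag"] for m in mentions)
    return {tag: counts.get(tag, 0) for tag in TAGS}

page_mentions = [
    {"text": "2022 National Census", "tag": "named"},
    {"text": "a household survey", "tag": "descriptive"},
    {"text": "Labour Force Survey", "tag": "named"},
]
print(legend_counts(page_mentions))
```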

---

## 5. Reviewing Existing Mentions (Validation)

Click the **toggle button (‹)** on the right edge to open the **Data Mentions** side panel. For each AI-detected mention you can:

### Validate
1. Click **Validate** on a mention
2. Optionally add notes explaining your decision
3. Choose one of:
   - ✅ **Correct**: The mention is a real dataset
   - ❌ **Wrong**: The mention is not a dataset (false positive)

### Change Tag
- Click the **tag badge** (e.g. "Named") to edit it
- Select the correct tag from the dropdown
- Click **Save** to update

### Delete
- Click **🗑 Delete** to remove a false mention
- Click again to confirm (auto-cancels after 3 seconds)

### Status Indicators
- **"Needs review"**: Not yet validated by you
- **"✓ verified"** / **"✗ rejected"**: Your validation result
- A checkmark appears next to validated mentions

---

## 6. Adding New Annotations

If you spot a dataset mention that the AI missed:

1. **Select the text**: Click and drag to highlight the dataset name in the markdown preview
2. **Click "✏️ Annotate Selection"**: The annotation modal will appear
3. **Choose a Dataset Tag**:
   - **Named Dataset**: A specific named dataset
   - **Descriptive**: A described but unnamed dataset
   - **Vague**: An ambiguous reference
4. **Click "Save Annotation"**: Your annotation is saved

> **Tip:** If no text is selected when you click the button, it will shake to remind you to select text first.

---

## 7. Page Workflow

For each page, the recommended workflow is:

1. **Read** the markdown text on the right while referencing the PDF on the left
2. **Review** each highlighted mention and mark it Correct or Wrong in the side panel
3. **Add** any missed mentions using text selection
4. **Move** to the next page (a warning appears if you have unvalidated mentions)

### Unvalidated Mentions Warning
When moving to the next page with unvalidated mentions, you'll see:
> ⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?

You can proceed or go back to finish validating.
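
The check behind this warning can be sketched as follows. This is an assumption about the logic, not the tool's implementation: the `verdict` field and the function name `unverified_warning` are made up for the example, and only the message wording comes from the guide.

```python
def unverified_warning(mentions):
    """Return the warning text for a page, or None if every mention has a verdict.

    Assumption: each mention is a dict whose "verdict" is None until the
    annotator marks it Correct or Wrong.
    """
    pending = sum(1 for m in mentions if m.get("verdict") is None)
    if pending == 0:
        return None
    return (
        f"You have {pending} unverified data mention(s) on this page. "
        "Do you want to proceed?"
    )

page_mentions = [
    {"text": "2022 National Census", "verdict": "correct"},
    {"text": "a household survey", "verdict": None},
    {"text": "survey data", "verdict": None},
]
print(unverified_warning(page_mentions))
```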

---

## 8. Tips & Best Practices

- **Use the PDF** for context: the markdown is extracted text and may have formatting issues
- **Jump buttons (⏮/⏭)** let you skip pages without mentions quickly
- **Pages without mentions** may still contain datasets the AI missed, so browse them when possible
- **Validate everything** on a page before moving on for the most efficient workflow
- **Be precise** when selecting text for new annotations: select just the dataset name, not surrounding context

---

## 9. FAQ

**Q: Can I undo a validation?**
A: Click "Validate" again to re-validate with a different verdict.

**Q: What if the markdown text doesn't match the PDF?**
A: This can happen with complex layouts (tables, figures). Annotate based on what you can read. The PDF is the source of truth.

**Q: Why are some pages empty?**
A: Some pages (like cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.

**Q: Who sees my annotations?**
A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.

---

## Need Help?

Contact the project admin if you encounter issues or have questions about specific annotation decisions.