title: Distractor Annotation Tool
emoji: 🎯
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: 1.36.0
app_file: app.py
pinned: false
🎯 Distractor Annotation Tool
Collaborative annotation GUI for the MSc NLP research project "Keeping LLMs on Track in Task-Oriented Dialogue".
One-time Setup
1. Create a private HF Dataset repo for shared annotations
Go to huggingface.co/new-dataset, make it private, and note the repo ID (e.g. yourgroup/distractor-annotations).
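The same step can also be scripted instead of using the web form. The following is a minimal sketch, not part of the app: the helper name `ensure_private_dataset_repo` is hypothetical, and it assumes `api` is a `huggingface_hub.HfApi` instance created with a token that has write access.

```python
def ensure_private_dataset_repo(api, repo_id):
    """Create a private dataset repo, or reuse it if it already exists.

    `api` is assumed to be a huggingface_hub.HfApi instance (or any object
    exposing a compatible create_repo method).
    """
    return api.create_repo(
        repo_id=repo_id,
        repo_type="dataset",  # a Dataset repo, not a model or a Space
        private=True,         # keep the shared annotations private
        exist_ok=True,        # do not fail if the repo is already there
    )
```

With `huggingface_hub` installed, this would be called as `ensure_private_dataset_repo(HfApi(token=...), "yourgroup/distractor-annotations")`, where the repo ID is the placeholder from the step above.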
2. Set secrets in your HF Space
In your Space → Settings → Repository secrets, add:
| Secret | Value |
|---|---|
| HF_TOKEN | Your HF token with write access |
| ANNOTATIONS_REPO_ID | e.g. yourgroup/distractor-annotations |
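Spaces expose repository secrets to the running app as environment variables. How `app.py` actually reads them is not shown here, so the following is only a sketch of a config check under that assumption; `missing_secrets` and `load_config` are hypothetical helper names.

```python
import os

# The two secrets named in the table above.
REQUIRED_SECRETS = ("HF_TOKEN", "ANNOTATIONS_REPO_ID")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


def load_config(env=None):
    """Return (token, repo_id), or raise if any secret is missing."""
    env = os.environ if env is None else env
    missing = missing_secrets(env)
    if missing:
        raise RuntimeError("Missing Space secrets: " + ", ".join(missing))
    return env["HF_TOKEN"], env["ANNOTATIONS_REPO_ID"]
```

Passing a plain dict as `env` makes the check easy to exercise locally without touching the real environment.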
3. Set secrets in your GitHub repo
In GitHub → Settings → Secrets and variables → Actions, add the same HF_TOKEN.
4. Update the sync workflow
In .github/workflows/sync_to_hf.yml, replace YOUR_HF_USERNAME and YOUR_SPACE_NAME with your actual values.
5. Import seed data
On first run, go to the Dashboard and click Import Seed Data to populate the shared repo with the group's initial entries.
Workflow
| Page | Purpose |
|---|---|
| Dashboard | Stats overview, seed import, config check |
| Browse | Explore the base nvidia dataset and seed entries |
| Annotate | Create multi-turn distractor entries |
| Annotations | View, edit, and review all group work |
| Test LLM | Send distractors to a live LLM and judge whether it gets distracted |
Annotation Schema
Each annotation follows the nvidia/CantTalkAboutThis schema, extended with:
- distractors_multiturn: rich multi-turn distractor sequences
- _id, _annotator, _review_status, _created_at, _updated_at
- _llm_test_results: logged results from the Test LLM page
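To make the extension fields concrete, here is a rough sketch of how a record might be assembled. The field names come from the list above; everything else is an assumption for illustration: the helper name `new_annotation`, the value types, the `"pending"` initial review status, the UUID-based `_id`, and the ISO 8601 timestamps.

```python
import uuid
from datetime import datetime, timezone


def new_annotation(base_entry, annotator, distractors_multiturn):
    """Return a copy of a base entry with the extension fields attached."""
    now = datetime.now(timezone.utc).isoformat()
    record = dict(base_entry)  # shallow copy; the base fields stay unchanged
    record.update({
        "distractors_multiturn": distractors_multiturn,
        "_id": uuid.uuid4().hex,       # assumed ID format
        "_annotator": annotator,
        "_review_status": "pending",   # assumed initial value
        "_created_at": now,
        "_updated_at": now,            # equals _created_at until first edit
        "_llm_test_results": [],       # filled in later from the Test LLM page
    })
    return record
```

For example, `new_annotation({"domain": "banking"}, "alice", [])` returns the placeholder base entry plus the seven extension keys; the `domain` key here is illustrative and not taken from the dataset card.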
Base Dataset
nvidia/CantTalkAboutThis-Topic-Control-Dataset