Commit: 8cec78f
Parent(s): 8c6a471

Migration to RIET-lab/moral-kg-workshop-listener

Changed files:
- .gitignore +1 -2
- README.md +0 -1
- SETUP.md +0 -116
- SPEC.md +0 -132
- config.yaml +2 -4
- setup.py +0 -221
- setup.sh +0 -16
- utils/__init__.py +0 -145
- utils/dataset_utils.py +0 -351
- utils/log_utils.py +0 -292
- utils/phase1_utils.py +0 -240
- utils/setup_utils.py +0 -200
- utils/user_utils.py +0 -208
- utils/webhook_utils.py +0 -340
- utils/wipe_utils.py +0 -164
- utils/workspace_utils.py +0 -387
- wipe.py +0 -165
- wipe.sh +0 -16
.gitignore
CHANGED

```diff
@@ -1,5 +1,4 @@
 # Env file nor User config file not to be uploaded to the HF Space!
 .env
-users/
 archive/
-*__pycache__*
+*__pycache__*
```
README.md
CHANGED

```diff
@@ -21,7 +21,6 @@ Part of RIET Lab's initiative to improve AI using moral reasoning.
 **Organization**: [https://huggingface.co/RIET-lab](https://huggingface.co/RIET-lab)
 **Workshop Website**: [https://sites.google.com/view/mereworkshop](https://sites.google.com/view/mereworkshop)
 **Argdown Documentation**:[https://argdown.org/guide/](https://argdown.org/guide/)
-**Listener HF Space**: [https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener](https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener)
 
 - Creating your own Argilla Spaces, check the [quickstart guide](http://docs.argilla.io/latest/getting_started/quickstart/) and the [Hugging Face Spaces configuration](http://docs.argilla.io/latest/getting_started/how-to-configure-argilla-on-huggingface/) for more details.
 - Discovering the Argilla UI, sign in with your Hugging Face account!
```
SETUP.md
DELETED

```diff
@@ -1,116 +0,0 @@
-# MERe Workshop Setup Guide
-
-Setup and usage guide for the MERe Workshop dataset annotation process.
-A very important note: much of this infrastructure is to avoid paying for a space - there is NO persistant storage in `moral-kg-workshop`.
-
-## Environment
-
-### Required Environment Variables
-
-```bash
-export ARGILLA_API_URL="your-argilla-url"
-export ARGILLA_API_KEY="your-api-key"
-export HF_TOKEN="your-huggingface-token"
-```
-
-### Optional Environment Variables
-
-```bash
-export SLACK_WEBHOOK_URL="your-slack-webhook-url"
-# For error notifications to slack channel
-# Requires a custom slack app setup with a webhook url.
-# See https://api.slack.com/messaging/webhooks
-```
-
-### Dependencies
-
-Install required Python packages:
-```bash
-pip install -r requirements.txt
-```
-
-## Configuration
-
-See `config.yaml`
-
-## Space Setup
-
-### Complete Setup
-
-Run all setup operations (users, datasets, webhooks):
-```bash
-./setup.sh
-# or
-python setup.py
-```
-
-### Partial Setup
-
-Skip specific operations:
-```bash
-python setup.py --skip-users       # Skip user creation
-python setup.py --skip-workspaces  # Skip workspace creation (breaks dataset allocation)
-python setup.py --skip-datasets    # Skip dataset creation
-python setup.py --skip-webhooks    # Skip webhook creation
-```
-
-### Status Check Only
-
-View current space status without making changes:
-```bash
-python setup.py --status-only
-```
-
-## Wipe Operations
-
-### Complete Wipe
-
-Remove everything (users, workspaces, datasets, webhooks):
-```bash
-./wipe.sh
-# or
-python3 wipe.py
-```
-
-### Selective Wipe
-
-Remove specific components:
-```bash
-python wipe.py --datasets-only  # Only datasets
-python wipe.py --users-only     # Only users
-python wipe.py --webhooks-only  # Only webhooks
-```
-
-### Force Wipe
-
-Skip confirmation prompts:
-```bash
-python wipe.py --force
-```
-
-### Status Check Only
-
-View current space status without making changes:
-```bash
-python wipe.py --status-only
-```
-
-## Troubleshooting
-
-### Debug Mode
-
-For detailed debugging, set log level to DEBUG in `config.yaml`:
-```yaml
-logging:
-  level: "DEBUG"
-```
-
-### Status Commands
-
-```bash
-# Check status during setup
-python3 setup.py --status-only
-
-# Check status during wipe
-python3 wipe.py --status-only
-```
```
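The environment checks that the deleted SETUP.md describes can be sketched as a small validator. This helper is hypothetical (the repo's real `validate_env` lived in the removed `utils/setup_utils.py`); only the variable names come from the guide above.

```python
import os

# Variable names taken from SETUP.md; the helper itself is an illustration.
REQUIRED_VARS = ("ARGILLA_API_URL", "ARGILLA_API_KEY", "HF_TOKEN")
OPTIONAL_VARS = ("SLACK_WEBHOOK_URL",)  # only needed for Slack error notifications

def check_env(env=os.environ) -> list:
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: ARGILLA_API_KEY is absent in this sample environment.
missing = check_env({"ARGILLA_API_URL": "https://example.hf.space", "HF_TOKEN": "hf_xxx"})
```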
SPEC.md
DELETED

```diff
@@ -1,132 +0,0 @@
-# Moral-kg annotation process setup
-Notes on the annotation pipeline / data ETL process
-
-## Annotation Pipeline Architecture
-Annotation occurs in two phases. During **Phase 1** annotators determine which set
-of claims best represents the argument of each paper. During **Phase 2** annotators
-map those claims into an Argument Map.
-
-### Setup
-1. Create users and user-specific workspaces based off of `users.csv` list
-2. Create Phase 1 dataset with records for each user
-   - NOTE: depending on space constraints, we could do one Phase 1 dataset for
-     all users as only Phase 2 is user-response-dependent.
-3. Create webhooks.
-
-### Phase 1 Argilla Dataset
-Creation:
-- At startup
-Records Input:
-- Manual batch input via HF dataset `moral-kg-sample`
-Response Output:
-- Real-time webhook to HF dataset `moral-kg-sample-labels`
-- Real-time webhook to Argilla Phase 2 dataset records
-Updates:
-- Only if moral-kg-sample is updated (this is handled manually)
-Fields:
-- Title (Author, Year) "title_info"
-- Text "text"
-Metadata:
-- Identifier (visible to annotators) "id"
-Questions:
-- TextQuestion "claims"
-  - Users list the claims which best represent the argument in the paper
-  - AI/ML-generated claims are proposed in a list as a suggestion
-Webhooks:
-- Listen if the dataset is ever published (it shouldn't be) and notify admin if
-  it is.
-- Response created/updated/deleted -> update `moral-kg-sample-labels`
-  -> update Argilla Phase 2 records
-
-### Phase 2 Argilla Dataset
-Creation:
-- When the first Phase 1 response is created
-Records Input:
-- Real-time webhook `response.created`/`.updated`/`.deleted` from Phase 1
-Response Output:
-- Real-time webhook to HF dataset `moral-kg-sample-maps`
-Updates:
-- When a Phase 1 response is created/updated/deleted
-Fields:
-- Title (Author, Year) "title_info"
-- Argdown Page "argdown"
-- Text "text"
-Metadata:
-- Identifier (visible to annotators) "id"
-Questions:
-- TextQuestion "argmap"
-  - Users are asked to copy and paste their final Argdown input into this box
-    as the solution.
-Webhooks:
-- Listen if the dataset is ever published (it shouldn't be) and notify admin if
-  it is.
-- Response created/updated/deleted -> update `moral-kg-sample-maps`
-
-## HuggingFace Datasets
-There are three huggingface datasets that will be involved in the annotation
-process: `moral-kg-sample`, `moral-kg-sample-labels`, and `moral-kg-sample-maps`.
-
-### `moral-kg-sample` (private)
-Will store the data associated with each paper in the sample:
-- identifier | str      | The Phil-Papers ID associated with each paper
-- title      | str      | The title of the paper
-- authors    | list:str | The authors attributed to the paper
-- year       | str      | The publication year of the paper
-- text       | str      | The paper content (in plain text or markdown)
-- map        | dict     | The claim:method map that contains each claim
-                          extracted from the text and its associated
-                          extraction method.
-
-### `moral-kg-sample-labels` (private)
-Will store data associated with the claims annotators select for each paper in
-the sample:
-- identifier | str  | The Phil-Papers ID associated with each paper
-- annotator  | str  | The annotator's unique Argilla UUID
-- map        | dict | The claim:method map that contains each claim the
-                      annotator selects as representative of the paper.
-                      Claims not found in the original map are labeled
-                      "annotator"
-
-### `moral-kg-sample-maps` (private)
-Will store data associated with the argument maps annotators create for each
-paper in the sample:
-- identifier | str  | The Phil-Papers ID associated with each paper
-- annotator  | str  | The annotator's unique Argilla UUID
-- argmap     | dict | The argument map (in Argdown format) that
-                      represents the paper argument structure.
-
-## Webhooks
-
-### dataset.published
-- Stretch goal: implement slack notification. For now just log that a dataset
-  was published.
-
-### response.created
-IF data.data.values contains "claims":
-- This means it is phase 1 response
-ELSE IF data.data.values contains "argmap":
-- This means it is phase 2 response
-
-### response.updated
-IF data.record.questions.name contains "claims":
--
-ELSE IF data.record.questions.name contains "argmap":
--
-
-### response.deleted
-IF data.record.questions.name contains "claims":
--
-ELSE IF data.record.questions.name contains "argmap":
--
-
-## Notes, Comments, and Questions
-- I assume that our ultimate moral-kg dataset, that which makes up the entirety
-  of the KG and will be public, will be in a separate HF dataset.
-- There are no user event webhooks so we must either:
-  1. batch create users or
-  2. poll every second during the workshop or
-  3. track OAuth sign-ins
-- Should we put a link to the website pdf alongside its processed text?
-- For Phase 2 argmap building: ideally we are able to extract the user text
-  inputted into the iFrame but I'm not confident we will be able to so this
-  solution suffices for now.
```
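For illustration, the three record schemas the deleted SPEC.md describes can be written out as plain Python dicts. All field values below are made-up placeholders; only the field names and types come from the spec.

```python
# Hypothetical records matching the three HF dataset schemas in SPEC.md.
sample_record = {
    "identifier": "PHIL-0001",             # Phil-Papers ID (placeholder)
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "year": "2020",
    "text": "Plain-text paper content...",
    "map": {"Claim 1": "llm-extracted"},   # claim -> extraction method
}

label_record = {
    "identifier": sample_record["identifier"],
    "annotator": "argilla-uuid-1234",      # annotator's Argilla UUID
    # claims the annotator kept; claims not in the original map get "annotator"
    "map": {"Claim 1": "llm-extracted", "A new claim": "annotator"},
}

map_record = {
    "identifier": sample_record["identifier"],
    "annotator": "argilla-uuid-1234",
    "argmap": {"argdown": "[Claim 1]\n  + <support>"},  # Argdown argument map
}
```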
config.yaml
CHANGED

```diff
@@ -1,4 +1,6 @@
 # moral-kg-workshop config
+#
+# NOTE: See moral-kg-workshop-listener config for updates!
 
 # File Paths Configuration
 paths:
@@ -83,10 +85,6 @@ phase1:
 logging:
   level: "INFO"
   format: "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
-  # External library log levels (set to WARNING/ERROR to reduce verbosity)
-  external_libraries:
-    httpx: "WARNING"
-    argilla.sdk: "WARNING"
 
 # Error Handling Configuration
 error_handling:
```
setup.py
DELETED

```diff
@@ -1,221 +0,0 @@
-#!/usr/bin/env python3
-
-"""
-setup.py
-
-Setup script for the MERe Workshop Argilla Hugging Face space. This is the
-primary annotation pipeline. Creates users, workspaces, datasets, and webhooks.
-"""
-
-import argparse
-import json
-import os
-
-from huggingface_hub import HfApi
-
-from utils import (
-    validate_env,
-    log_operation_success,
-    log_operation_failure,
-    get_status,
-    log_info,
-    log_warning,
-    create_users,
-    create_user_workspaces,
-    create_webhooks,
-    create_phase1_datasets,
-    list_users,
-    get_config,
-)
-
-
-def parse_args():
-    """Parse command line arguments."""
-    parser = argparse.ArgumentParser(
-        description="Setup MERe Workshop Argilla Hugging Face space",
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
-    )
-
-    parser.add_argument(
-        "-u", "--skip-users",
-        action="store_true",
-        help="Skip user creation step"
-    )
-
-    parser.add_argument(
-        "-w", "--skip-workspaces",
-        action="store_true",
-        help="Skip workspace creation and user assignment step"
-    )
-
-    parser.add_argument(
-        "-d", "--skip-datasets",
-        action="store_true",
-        help="Skip dataset creation step"
-    )
-
-    parser.add_argument(
-        "-l", "--skip-listener",
-        action="store_true",
-        help="Skip restarting the listener space (skips webhook creation step)."
-    )
-
-    parser.add_argument(
-        "-s",
-        "--status-only",
-        action="store_true",
-        help="Only show current space status, do not perform setup",
-    )
-
-    return parser.parse_args()
-
-
-def restart_listener():
-    """Start the RIET-lab/moral-kg-workshop-listener space."""
-    try:
-        api = HfApi(token=os.getenv("HF_TOKEN"))
-        api.restart_space(repo_id="RIET-lab/moral-kg-workshop-listener")
-        log_operation_success("restart listener space", "Space restart initiated successfully")
-        return True
-    except Exception as e:
-        log_operation_failure("restart listener space", e)
-        return False
-
-
-def show_space_status():
-    """Display current space status."""
-    status = get_status()
-
-    if "error" in status:
-        log_operation_failure("check space status", status["error"])
-        return False
-
-    print()
-    log_info("=== Current Argilla Space Status ===")
-    log_info(f"Workspaces: {status['workspaces']}")
-    log_info(f"Users: {status['users']}")
-    log_info(f"Datasets: {status['datasets']}")
-    log_info(f"Records: {status['records']}")
-    log_info(f"Webhooks: {status['webhooks']}")
-    print()
-
-    return True
-
-
-def track_user_info(
-    filepath=None
-):
-    """Store Argilla user info to a file or log them if no file is provided."""
-    users = list_users()
-
-    if filepath:
-        try:
-            with open(filepath, 'w', encoding='utf-8') as f:
-                json.dump(users, f, indent=2)
-            log_info(f"User ID map written to {filepath}")
-        except Exception as e:
-            log_operation_failure("map user ids", e)
-    else:
-        log_info(f"User ID map: {users}")
-
-
-def main():
-    """Main setup function."""
-    args = parse_args()
-    config = get_config()
-
-    # Validate environment
-    try:
-        validate_env()
-        log_operation_success("setup validation", "Environment validated")
-    except Exception as e:
-        log_operation_failure("setup validation", e)
-        return 1
-
-    # Show current status
-    if not show_space_status():
-        return 1
-
-    # If status-only mode, exit here
-    if args.status_only:
-        return 0
-
-    # Track overall success
-    operations_success = []
-
-    # Step 1: Create users
-    if not args.skip_users:
-        print()
-        log_info("Creating users...")
-        success = create_users()
-        operations_success.append(success)
-
-        if success:
-            log_info("Success: Users created successfully")
-            # Track user profiles after creation so we can map users to their UUIDs
-            track_user_info(config.get('paths', {}).get('users_info', None))
-        else:
-            log_info("Failed: Could not create users")
-    else:
-        log_info("Skipping user creation")
-
-    # Step 2: Create workspaces
-    if not args.skip_workspaces:
-        print()
-        log_info("Creating workspaces and assigning users...")
-        success = create_user_workspaces()
-        operations_success.append(success)
-
-        if success:
-            log_info("Success: Workspaces created and users assigned successfully")
-        else:
-            log_info("Failed: Could not create workspaces and assign users")
-    else:
-        log_info("Skipping workspace creation and user assignment")
-
-    # Step 3: Create datasets
-    if not args.skip_datasets:
-        print()
-        log_info("Creating datasets...")
-        success = create_phase1_datasets()
-        operations_success.append(success)
-
-        if success:
-            log_info("Success: Datasets created successfully")
-        else:
-            log_info("Failed: Could not create datasets")
-    else:
-        log_info("Skipping dataset creation")
-
-    # # Step 4: Restart listener to create webhooks
-    if not args.skip_listener:
-        print()
-        log_info("Restarting RIET-lab/moral-kg-workshop-listener space...")
-        success = restart_listener()
-        if success:
-            log_info("Success: Listener space restart initiated")
-        else:
-            log_info("Failed: Could not restart listener space")
-        return 0 if success else 1
-
-    # Show final status
-    show_space_status()
-
-    # Overall result
-    if operations_success:
-        successful_count = sum(operations_success)
-        total_count = len(operations_success)
-
-        if successful_count == total_count:
-            log_operation_success("complete setup", "All operations completed successfully", send_to_slack=True)
-            return 0
-        else:
-            log_operation_failure("complete setup", Exception("Some or all operations failed"), send_to_slack=True)
-            return 1
-    else:
-        log_operation_success("complete setup", "No operations were required", send_to_slack=True)
-        return 0
-
-
-if __name__ == "__main__":
-    exit(main())
```
setup.sh
DELETED

```diff
@@ -1,16 +0,0 @@
-#!/bin/bash
-
-# setup.sh
-#
-# Shell wrapper for the MERe Workshop setup process.
-
-set -euo pipefail
-
-# Get the directory where this script is located
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
-
-# Change to the script directory
-cd "$SCRIPT_DIR"
-
-# Run the setup script
-python setup.py "$@"
```
utils/__init__.py
DELETED

```diff
@@ -1,145 +0,0 @@
-"""
-utils package for MERe Workshop annotation pipeline
-
-This package provides utilities for:
-- Configuration management (setup_utils)
-- Logging and notifications (log_utils)
-- Argilla Phase 1 dataset and Hugging Face dataset management (dataset_utils)
-- User management (users_utils.py)
-- Argilla webhook management (webhook_utils.py)
-- Argilla space wiping/status (wipe_utils)
-"""
-
-from .setup_utils import (
-    get_root,
-    get_config,
-    get_client,
-    get_hf_api,
-    validate_env,
-    load_users,
-)
-from .log_utils import (
-    log_error,
-    log_warning,
-    log_info,
-    log_operation_success,
-    log_operation_failure,
-    log_dataset_operation,
-    log_user_operation,
-    log_webhook_operation,
-)
-from .dataset_utils import (
-    create_dataset,
-    delete_datasets,
-    delete_dataset,
-    list_datasets,
-    update_datasets,
-    update_dataset,
-    load_moral_kg_sample,
-)
-from .phase1_utils import (
-    create_phase1_datasets,
-    delete_phase1_datasets,
-    update_phase1_datasets,
-)
-from .user_utils import (
-    create_users,
-    create_user,
-    delete_users,
-    delete_user,
-    list_users,
-)
-from .workspace_utils import (
-    create_workspaces,
-    create_workspace,
-    create_user_workspaces,
-    create_user_workspace,
-    delete_workspaces,
-    delete_workspace,
-    delete_user_workspaces,
-    delete_user_workspace,
-    list_workspaces,
-    list_user_workspaces,
-)
-from .webhook_utils import (
-    create_webhooks,
-    create_webhook,
-    delete_webhooks,
-    delete_webhook,
-    list_webhooks,
-    list_webhook_events,
-    update_webhooks,
-    update_webhook,
-    validate_webhooks,
-    webhook_exists,
-)
-from .wipe_utils import (
-    get_status,
-    wipe_space,
-    wipe_datasets_only,
-    wipe_users_only,
-    wipe_webhooks_only,
-)
-
-__all__ = [
-    "get_root",
-    "get_config",
-    "get_client",
-    "get_hf_api",
-    "validate_env",
-    "load_users",
-
-    "log_error",
-    "log_warning",
-    "log_info",
-    "log_operation_success",
-    "log_operation_failure",
-    "log_dataset_operation",
-    "log_user_operation",
-    "log_webhook_operation",
-
-    "create_phase1_datasets",
-    "create_dataset",
-    "delete_phase1_datasets",
-    "delete_datasets",
-    "delete_dataset",
-    "list_datasets",
-    "update_phase1_datasets",
-    "update_datasets",
-    "update_dataset",
-    "load_moral_kg_sample",
-
-    "create_users",
-    "create_user",
-    "delete_users",
-    "delete_user",
-    "list_users",
-
-    "create_workspaces",
-    "create_workspace",
-    "create_user_workspaces",
-    "create_user_workspace",
-    "delete_workspaces",
-    "delete_workspace",
-    "delete_user_workspaces",
-    "delete_user_workspace",
-    "list_workspaces",
-    "list_user_workspaces",
-
-    "create_webhooks",
-    "create_webhook",
-    "delete_webhooks",
-    "delete_webhook",
-    "list_webhooks",
-    "list_webhook_events",
-    "update_webhooks",
-    "update_webhook",
-    "validate_webhooks",
-    "webhook_exists",
-
-    "get_status",
-    "wipe_space",
-    "wipe_datasets_only",
-    "wipe_users_only",
-    "wipe_webhooks_only",
-]
```
utils/dataset_utils.py
DELETED

```diff
@@ -1,351 +0,0 @@
-"""
-dataset_utils.py
-
-Helper functions for dataset creation and management in the MERe Workshop annotation pipeline.
-Transformed from create-datasets.py script to follow proper helper function paradigm.
-"""
-
-import os
-import warnings
-from typing import Dict, List, Optional
-
-import argilla as rg
-from datasets import load_dataset
-
-from .setup_utils import (
-    get_config,
-    get_client,
-    get_hf_api
-)
-from .log_utils import (
-    log_info,
-    log_operation_success,
-    log_operation_failure,
-    log_dataset_operation
-)
-
-
-# Get config
-_config = get_config()
-
-# Get client
-_client = get_client()
-
-
-def load_moral_kg_sample(
-) -> Optional[List[Dict]]:
-    """Load the moral-kg-sample dataset from HuggingFace."""
-    global config
-
-    dataset_name = _config.get('datasets.sample')
-    if not dataset_name:
-        log_operation_failure("load sample dataset", Exception("Dataset name not configured"))
-        return None
-
-    try:
-        # Setup HF client to ensure authentication
-        get_hf_api()
-
-        dataset = load_dataset(dataset_name, split="train", token=os.getenv("HF_TOKEN"))
-
-        # Convert to list of dictionaries for easier processing
-        records = []
-        for item in dataset:
-            item = dict(item)
-            records.append({
-                'identifier': item.get('identifier'),
-                'title': item.get('title'),
-                'authors': item.get('authors'),
-                'year': item.get('year'),
-                'categories': item.get('categories'),
-                'text': item.get('text'),
-                'map': item.get('map')
-            })
-
-        log_operation_success("load moral-kg-sample dataset", f"Loaded {len(records)} records")
-        return records
-
-    except Exception as e:
-        log_operation_failure("load moral-kg-sample dataset", e)
-        return None
-
-
-def _get_workspace_names(
-) -> List[str]:
-    """Get list of available workspaces."""
-
-    try:
-        global _client
-        workspaces = _client.workspaces
-        workspace_names = [ws.name or "" for ws in workspaces]
-        return workspace_names
-    except Exception as e:
-        log_operation_failure("fetch workspaces", e)
-        return []
-
-
-def _format_title_info(
-    authors: List[str],
-    year: str,
-    title: str
-) -> str:
-    """Format title info as 'Title (Author, Year)'."""
-    # Take first author and add et al. if multiple authors
-    authors_display = authors[0] if authors else "Unknown"
-    if len(authors) > 1:
-        authors_display += " et al."
-
-    return f"{title} ({authors_display}, {year})"
-
-
-def _check_dataset_exists(
-    workspace_name: str,
-    dataset_name: str
-) -> bool:
-    """Check if dataset already exists in workspace."""
-    try:
-        with warnings.catch_warnings():
-            warnings.simplefilter("ignore")
-            workspace = _client.workspaces(workspace_name)
-
-        if workspace:
-            for existing_dataset in workspace.datasets:
-                if existing_dataset.name == dataset_name:
-                    return True
-    except Exception:
-        pass
-    return False
-
-
-def create_dataset(
-    dataset_name: str,
-    workspace_name: Optional[str],
-    settings: rg.Settings,
-    records: Optional[List[Dict]] = None,
-) -> bool:
-    """Create a dataset with given settings in specified workspace."""
-    global _client
-
-    try:
-        dataset = rg.Dataset(
-            name=dataset_name,
-            workspace=workspace_name,
-            settings=settings,
-            client=_client,
-        )
-        dataset.create()
-        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")
```
|
| 138 |
-
|
| 139 |
-
# Add records if provided
|
| 140 |
-
if records:
|
| 141 |
-
dataset.records.log(records)
|
| 142 |
-
log_operation_success("load records into dataset", f"Added {len(records)} records")
|
| 143 |
-
|
| 144 |
-
return True
|
| 145 |
-
|
| 146 |
-
except Exception as e:
|
| 147 |
-
log_operation_failure("create dataset", e)
|
| 148 |
-
return False
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
def delete_datasets(
|
| 152 |
-
dataset_names: Optional[List[str]] = None,
|
| 153 |
-
workspace_name: Optional[str] = None
|
| 154 |
-
) -> bool:
|
| 155 |
-
"""Delete multiple datasets or all datasets if none specified."""
|
| 156 |
-
global _client
|
| 157 |
-
|
| 158 |
-
if dataset_names is None:
|
| 159 |
-
# Delete all datasets from all workspaces or specific workspace
|
| 160 |
-
if workspace_name:
|
| 161 |
-
with warnings.catch_warnings():
|
| 162 |
-
warnings.simplefilter("ignore")
|
| 163 |
-
workspace = _client.workspaces(workspace_name)
|
| 164 |
-
if not workspace:
|
| 165 |
-
log_operation_failure("delete datasets", Exception(f"Workspace {workspace_name} not found"))
|
| 166 |
-
return False
|
| 167 |
-
|
| 168 |
-
datasets = workspace.datasets
|
| 169 |
-
dataset_names = [ds.name for ds in datasets if ds.name]
|
| 170 |
-
|
| 171 |
-
success_count = 0
|
| 172 |
-
for ds_name in dataset_names:
|
| 173 |
-
if delete_dataset(workspace_name, ds_name):
|
| 174 |
-
success_count += 1
|
| 175 |
-
|
| 176 |
-
log_operation_success("delete datasets from workspace",
|
| 177 |
-
f"Deleted {success_count}/{len(dataset_names)} datasets from {workspace_name}")
|
| 178 |
-
|
| 179 |
-
return success_count == len(dataset_names)
|
| 180 |
-
else:
|
| 181 |
-
# Get all datasets from all workspaces
|
| 182 |
-
all_datasets = []
|
| 183 |
-
for ws in _client.workspaces:
|
| 184 |
-
ws_name = ws.name
|
| 185 |
-
if ws_name:
|
| 186 |
-
datasets = ws.datasets
|
| 187 |
-
for ds in datasets:
|
| 188 |
-
if ds.name:
|
| 189 |
-
all_datasets.append((ws_name, ds.name))
|
| 190 |
-
|
| 191 |
-
success_count = 0
|
| 192 |
-
for ws_name, ds_name in all_datasets:
|
| 193 |
-
if delete_dataset(ws_name, ds_name):
|
| 194 |
-
success_count += 1
|
| 195 |
-
|
| 196 |
-
log_operation_success("delete all datasets",
|
| 197 |
-
f"Deleted {success_count}/{len(all_datasets)} datasets")
|
| 198 |
-
|
| 199 |
-
return success_count == len(all_datasets)
|
| 200 |
-
else:
|
| 201 |
-
# Delete specific datasets
|
| 202 |
-
if not workspace_name:
|
| 203 |
-
log_operation_failure("delete datasets", Exception("Workspace name required when specifying dataset names"))
|
| 204 |
-
return False
|
| 205 |
-
|
| 206 |
-
success_count = 0
|
| 207 |
-
for dataset_name in dataset_names:
|
| 208 |
-
if delete_dataset(workspace_name, dataset_name):
|
| 209 |
-
success_count += 1
|
| 210 |
-
|
| 211 |
-
log_operation_success("delete datasets",
|
| 212 |
-
f"Deleted {success_count}/{len(dataset_names)} datasets")
|
| 213 |
-
|
| 214 |
-
return success_count == len(dataset_names)
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
def delete_dataset(
|
| 218 |
-
workspace_name: str,
|
| 219 |
-
dataset_name: str
|
| 220 |
-
) -> bool:
|
| 221 |
-
"""Delete a specific dataset from a workspace."""
|
| 222 |
-
try:
|
| 223 |
-
global _client
|
| 224 |
-
workspace = _client.workspaces(workspace_name)
|
| 225 |
-
|
| 226 |
-
if not workspace:
|
| 227 |
-
log_operation_failure("delete dataset", Exception(f"Workspace {workspace_name} not found"))
|
| 228 |
-
return False
|
| 229 |
-
|
| 230 |
-
# Find the dataset in workspace
|
| 231 |
-
dataset = None
|
| 232 |
-
for ds in workspace.datasets:
|
| 233 |
-
if ds.name == dataset_name:
|
| 234 |
-
dataset = ds
|
| 235 |
-
break
|
| 236 |
-
|
| 237 |
-
if not dataset:
|
| 238 |
-
log_operation_failure("delete dataset", Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
|
| 239 |
-
return False
|
| 240 |
-
|
| 241 |
-
# Delete all records first
|
| 242 |
-
try:
|
| 243 |
-
records = list(dataset.records)
|
| 244 |
-
# Filter out None records to avoid AttributeError
|
| 245 |
-
records = [r for r in records if r is not None]
|
| 246 |
-
|
| 247 |
-
if records:
|
| 248 |
-
dataset.records.delete(records=records)
|
| 249 |
-
log_dataset_operation("deleted records", dataset_name, f"{len(records)} records")
|
| 250 |
-
|
| 251 |
-
else:
|
| 252 |
-
log_info(f"No records found in dataset {dataset_name}")
|
| 253 |
-
except Exception as e:
|
| 254 |
-
if e is AttributeError:
|
| 255 |
-
pass
|
| 256 |
-
|
| 257 |
-
else:
|
| 258 |
-
log_operation_failure("delete dataset records", e)
|
| 259 |
-
|
| 260 |
-
# Delete the dataset
|
| 261 |
-
dataset.delete()
|
| 262 |
-
log_dataset_operation("deleted", dataset_name, f"from workspace {workspace_name}")
|
| 263 |
-
|
| 264 |
-
return True
|
| 265 |
-
|
| 266 |
-
except Exception as e:
|
| 267 |
-
log_operation_failure("delete dataset", e)
|
| 268 |
-
return False
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
def list_datasets(
|
| 272 |
-
) -> Dict[str, List[str]]:
|
| 273 |
-
"""List all datasets grouped by workspace."""
|
| 274 |
-
global _client
|
| 275 |
-
|
| 276 |
-
try:
|
| 277 |
-
workspace_datasets = {}
|
| 278 |
-
|
| 279 |
-
for workspace in _client.workspaces:
|
| 280 |
-
workspace_name = workspace.name or "Unknown"
|
| 281 |
-
datasets = [dataset.name for dataset in workspace.datasets if dataset.name]
|
| 282 |
-
workspace_datasets[workspace_name] = datasets
|
| 283 |
-
|
| 284 |
-
log_dataset_operation("listed", f"workspace {workspace_name}",
|
| 285 |
-
f"Found {len(datasets)} datasets")
|
| 286 |
-
|
| 287 |
-
return workspace_datasets
|
| 288 |
-
|
| 289 |
-
except Exception as e:
|
| 290 |
-
log_operation_failure("list datasets", e)
|
| 291 |
-
return {}
|
| 292 |
-
|
| 293 |
-
|
| 294 |
-
def update_datasets(
|
| 295 |
-
dataset_updates: List[Dict[str, str]],
|
| 296 |
-
new_settings: Optional[rg.Settings] = None
|
| 297 |
-
) -> bool:
|
| 298 |
-
"""Update multiple datasets."""
|
| 299 |
-
success_count = 0
|
| 300 |
-
|
| 301 |
-
for update_info in dataset_updates:
|
| 302 |
-
workspace_name = update_info.get('workspace', '')
|
| 303 |
-
dataset_name = update_info.get('dataset', '')
|
| 304 |
-
new_workspace = update_info.get('new_workspace')
|
| 305 |
-
|
| 306 |
-
if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
|
| 307 |
-
success_count += 1
|
| 308 |
-
|
| 309 |
-
log_operation_success("update datasets",
|
| 310 |
-
f"Updated {success_count}/{len(dataset_updates)} datasets")
|
| 311 |
-
|
| 312 |
-
return success_count == len(dataset_updates)
|
| 313 |
-
|
| 314 |
-
|
| 315 |
-
def update_dataset(
|
| 316 |
-
workspace_name: str,
|
| 317 |
-
dataset_name: str,
|
| 318 |
-
new_settings: Optional[rg.Settings] = None,
|
| 319 |
-
new_workspace: Optional[str] = None
|
| 320 |
-
) -> bool:
|
| 321 |
-
"""Update a specific dataset's settings or move to new workspace."""
|
| 322 |
-
global _client
|
| 323 |
-
|
| 324 |
-
try:
|
| 325 |
-
with warnings.catch_warnings():
|
| 326 |
-
warnings.simplefilter("ignore")
|
| 327 |
-
workspace = _client.workspaces(workspace_name)
|
| 328 |
-
dataset = workspace.datasets(dataset_name) #type: ignore
|
| 329 |
-
|
| 330 |
-
if not dataset:
|
| 331 |
-
log_operation_failure("update dataset",
|
| 332 |
-
Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
|
| 333 |
-
return False
|
| 334 |
-
|
| 335 |
-
# Update settings if provided
|
| 336 |
-
if new_settings:
|
| 337 |
-
# Note: Argilla may not support direct settings updates, this might need to be recreate
|
| 338 |
-
log_operation_success("update dataset settings",
|
| 339 |
-
f"Attempted to update {dataset_name}")
|
| 340 |
-
|
| 341 |
-
# Move to new workspace if provided
|
| 342 |
-
if new_workspace:
|
| 343 |
-
# Note: This typically requires recreating the dataset in the new workspace
|
| 344 |
-
log_operation_success("move dataset workspace",
|
| 345 |
-
f"Attempted to move {dataset_name} to {new_workspace}")
|
| 346 |
-
|
| 347 |
-
return True
|
| 348 |
-
|
| 349 |
-
except Exception as e:
|
| 350 |
-
log_operation_failure("update dataset", e)
|
| 351 |
-
return False
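The `_format_title_info` helper above is pure string formatting with no Argilla dependency. A minimal standalone sketch of the same logic (function name un-prefixed for illustration):

```python
from typing import List

def format_title_info(authors: List[str], year: str, title: str) -> str:
    """Format citation-style title info as 'Title (Author, Year)'."""
    # Use the first author, appending "et al." when there are several
    authors_display = authors[0] if authors else "Unknown"
    if len(authors) > 1:
        authors_display += " et al."
    return f"{title} ({authors_display}, {year})"

print(format_title_info(["Kant", "Mill"], "2024", "On Duty"))
# → On Duty (Kant et al., 2024)
```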
utils/log_utils.py
DELETED
@@ -1,292 +0,0 @@
"""
log_utils.py

Logging and notification utilities for the MERe Workshop annotation pipeline.
Handles error logging, Slack notifications, and webhook data logging.
"""

import logging
import os
import textwrap
from typing import Optional

import requests

from .setup_utils import get_config


# Get config
config = get_config()


def _setup_logging(
) -> logging.Logger:
    """Set up the logging configuration."""
    global config
    log_config = config.get("logging", {})

    # Configure logging
    logging.basicConfig(
        level=getattr(logging, log_config.get("level", "INFO")),
        format=log_config.get(
            "format", "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
        ),
    )

    # Configure external library log levels
    external_libs = log_config.get("external_libraries", {})
    for lib_name, log_level in external_libs.items():
        lib_logger = logging.getLogger(lib_name)
        lib_logger.setLevel(getattr(logging, log_level.upper()))

    return logging.getLogger("mere_workshop")

# Get logger
logger = _setup_logging()


def _can_print(
) -> bool:
    """
    Test whether the print() function can be used.

    Returns True if printing is allowed in the Space or locally,
    False otherwise.
    """
    global config

    return config.get("error_handling.log_data_not_slack", False)


def _can_log(
) -> bool:
    """
    Test whether any log_*() function can be used.

    Returns True if logging is allowed in the Space or locally,
    False otherwise.
    """
    global config

    return (config.get("error_handling.log_data_not_slack", False) and
            config.get("error_handling.force_slack_notifications", False))


def _send_slack_notification(
    message: str,
) -> bool:
    """Send a Slack notification if configured."""
    global config
    global logger

    slack_webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not slack_webhook_url:
        if _can_log():
            logger.warning("SLACK_WEBHOOK_URL not configured, skipping notification")
        elif _can_print():
            print("SLACK_WEBHOOK_URL not configured, skipping notification")

        return False

    try:
        payload = {"text": message}
        response = requests.post(
            slack_webhook_url,
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=10,
        )

        if response.status_code == 200:
            if _can_log():
                logger.info(f"Slack notification sent: {message}")
            elif _can_print():
                print(f"Slack notification sent: {message}")

            return True
        else:
            if _can_log():
                logger.error(f"Failed to send Slack notification. Status code: {response.status_code}")
            elif _can_print():
                print(f"Failed to send Slack notification. Status code: {response.status_code}")

            return False

    except Exception as e:
        if _can_log():
            logger.error(f"Error sending Slack notification: {e}")
        elif _can_print():
            print(f"Error sending Slack notification: {e}")

        return False


def _send_to_slack(
    send_to_slack: bool,
    message: str
) -> bool:
    """Determine whether a log should be sent as a Slack notification."""
    global config
    global logger

    try:
        if (config.get("error_handling.slack_notifications", False) and
                (send_to_slack or
                 config.get("error_handling.force_slack_notifications", False))):
            return _send_slack_notification(message)
        else:
            return True

    except Exception as e:
        logger.error(f"Error sending Slack notification: {e}")
        return False


def log_error(
    error_msg: str,
    exception: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Log errors."""
    global config
    global logger

    if exception:
        # Format the error with an indented description using textwrap
        error_detail = textwrap.indent(str(exception), " ")
        full_msg = (
            f"{error_msg}\n Exception: {type(exception).__name__}\n{error_detail}"
        )
    else:
        full_msg = error_msg

    if _can_log():
        logger.error(full_msg)
        _send_to_slack(send_to_slack, full_msg)
    elif _can_print():
        print(f"[ERROR] {full_msg}")
        _send_to_slack(send_to_slack, full_msg)


def log_warning(
    warning_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log warnings."""
    global config
    global logger

    if _can_log():
        logger.warning(warning_msg)
        _send_to_slack(send_to_slack, warning_msg)
    elif _can_print():
        print(f"[WARNING] {warning_msg}")
        _send_to_slack(send_to_slack, warning_msg)


def log_info(
    info_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log information."""
    global config
    global logger

    if _can_log():
        logger.info(info_msg)
        _send_to_slack(send_to_slack, info_msg)
    elif _can_print():
        print(f"[INFO] {info_msg}")
        _send_to_slack(send_to_slack, info_msg)


def log_operation_success(
    operation: str,
    details: Optional[str] = None,
    send_to_slack: bool = False
) -> None:
    """Log a successful operation."""
    global config

    msg = f"Successfully completed {operation}"
    if details:
        msg += f": {details}"

    log_info(msg)
    _send_to_slack(send_to_slack, msg)


def log_operation_failure(
    operation: str,
    error: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Log a failed operation."""
    global config

    msg = f"Failed to {operation}"

    log_error(msg, error)

    if error:
        error_detail = textwrap.indent(str(error), " ")
        full_msg = (
            f"{msg}\n Exception: {type(error).__name__}\n{error_detail}"
        )
        _send_to_slack(send_to_slack, full_msg)
    else:
        _send_to_slack(send_to_slack, msg)


def log_dataset_operation(
    operation: str,
    dataset_name: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log dataset-related operations."""
    global config
    global logger

    msg = f"Dataset {operation} ({dataset_name})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_user_operation(
    operation: str,
    username: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log user-related operations."""
    global config
    global logger

    msg = f"User {operation} ({username})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_webhook_operation(
    operation: str,
    event: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log webhook-related operations."""
    global config
    global logger

    msg = f"Webhook {operation} ({event})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)
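The error-message layout used by `log_error` and `log_operation_failure` is plain string work: message, exception type, then the exception text indented one space via `textwrap.indent`. A self-contained sketch of that formatting (helper name is illustrative, not part of the module):

```python
import textwrap

def format_error(error_msg: str, exception: Exception) -> str:
    """Render an error the way log_error does: message, type, indented detail."""
    # Indent the exception text one space under the message
    error_detail = textwrap.indent(str(exception), " ")
    return f"{error_msg}\n Exception: {type(exception).__name__}\n{error_detail}"

msg = format_error("Failed to delete dataset", ValueError("bad name"))
print(msg)
```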
utils/phase1_utils.py
DELETED
@@ -1,240 +0,0 @@
"""
phase1_utils.py

Helper functions for Phase 1 dataset creation and management in the MERe Workshop annotation pipeline.
"""

import json
import warnings
from typing import Dict, List, Optional

import argilla as rg

from .setup_utils import (
    get_config,
    get_client,
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation
)
from .dataset_utils import (
    load_moral_kg_sample,
    _get_workspace_names,
    _format_title_info,
    _check_dataset_exists,
    create_dataset,
    delete_dataset,
    update_dataset
)


# Get config and client
_config = get_config()
_client = get_client()


def _create_phase1_settings(
) -> rg.Settings:
    """Create the Phase 1 dataset settings from configuration."""
    global _config
    phase1_config = _config.phase1

    # Build fields from config
    fields = []
    for field_name, field_config in phase1_config.get('fields', {}).items():
        fields.append(rg.TextField(
            name=field_config['name'],
            title=field_config['title'],
            use_markdown=field_config.get('use_markdown', False)
        ))

    # Build metadata from config
    metadata = []
    for meta_name, meta_config in phase1_config.get('metadata', {}).items():
        metadata.append(rg.TermsMetadataProperty(
            name=meta_config['name'],
            title=meta_config['title'],
            visible_for_annotators=meta_config.get('visible_for_annotators', True)
        ))

    # Build questions from config
    questions = []
    for question_name, question_config in phase1_config.get('questions', {}).items():
        if question_config.get('type') == 'TextQuestion':
            questions.append(rg.TextQuestion(
                name=question_config['name'],
                title=question_config['title'],
                description=question_config.get('description', ''),
                required=question_config.get('required', False)
            ))
        else:
            log_operation_failure("add question to Phase 1 dataset",
                                  Exception("Non-TextQuestion types have not been implemented in the process."))

    return rg.Settings(
        guidelines=phase1_config.get('guidelines', ''),
        fields=fields,
        metadata=metadata,
        questions=questions
    )


def _create_phase1_dataset(
    workspace_name: str,
    records: List[Dict]
) -> bool:
    """Create the Phase 1 dataset for a specific workspace."""
    global _client

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    dataset = None

    # Check if the dataset already exists
    if _check_dataset_exists(workspace_name, dataset_name):
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name} (already exists)")
        # Get the existing dataset for record loading
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
                if workspace:
                    for existing_dataset in workspace.datasets:
                        if existing_dataset.name == dataset_name:
                            dataset = existing_dataset
                            break
        except Exception as e:
            log_operation_failure("get existing dataset", e)
            return False
    else:
        # Create a new dataset
        try:
            dataset = rg.Dataset(
                name=dataset_name,
                workspace=workspace_name,
                settings=_create_phase1_settings(),
                client=_client,
            )
            dataset.create()
            log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")
        except Exception as e:
            log_operation_failure("create Phase 1 dataset", e)
            return False

    # Convert records to Argilla format and load them
    try:
        argilla_records = []
        for record in records:
            title_info = _format_title_info(
                record['authors'],
                record['year'],
                record['title']
            ).strip()
            # Parse the map from a JSON string back into a dictionary
            map_data = json.loads(record['map']) if record['map'] else {}
            suggestions = list(map_data.keys())

            argilla_record = rg.Record(
                fields={
                    "title_info": title_info,
                    "text": record['text']
                },
                metadata={
                    "id": record['identifier'],
                    "fields": record['categories']
                },
                suggestions=[
                    rg.Suggestion(
                        question_name="claims",
                        value="\n\n".join(suggestions)
                    )
                ]
            )
            argilla_records.append(argilla_record)

        # Add records to the dataset
        dataset.records.log(argilla_records)
        log_operation_success("load records into dataset", f"Added {len(argilla_records)} records")

        return True

    except Exception as e:
        log_operation_failure("load records into dataset", e)
        return False


def create_phase1_datasets(
) -> bool:
    """Create Phase 1 datasets for all available workspaces."""
    try:
        # Load the client and get workspaces
        workspace_names = _get_workspace_names()

        if not workspace_names:
            log_operation_failure("create datasets", Exception("No workspaces found"))
            return False

        # Load records from HuggingFace
        records = load_moral_kg_sample()
        if not records:
            log_operation_failure("create datasets", Exception("Failed to load sample records"))
            return False

        # Create datasets for each workspace
        success_count = 0
        failed_count = 0
        for workspace_name in workspace_names:
            if _create_phase1_dataset(workspace_name, records):
                success_count += 1
            else:
                failed_count += 1

        # Use transaction-like logging
        log_info(f"Create Phase 1 datasets: {success_count} / {len(workspace_names)} succeeded, {failed_count} failed.")

        return success_count == len(workspace_names)

    except Exception as e:
        log_operation_failure("create datasets for all workspaces", e)
        return False


def delete_phase1_datasets(
) -> bool:
    """Delete all Phase 1 datasets from all workspaces."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if delete_dataset(workspace_name, dataset_name):
            success_count += 1

    log_operation_success("delete Phase 1 datasets",
                          f"Deleted {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)


def update_phase1_datasets(
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update all Phase 1 datasets with new settings or move them to a new workspace."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update Phase 1 datasets",
                          f"Updated {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)
|
|
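The create/delete/update helpers above all share one aggregation pattern: run the operation per workspace, tally successes, and report the batch as successful only when every individual call succeeded. A minimal standalone sketch of that pattern (function name hypothetical, not part of the deleted module):

```python
def apply_to_workspaces(workspace_names, operation):
    """Apply `operation` to each workspace name; the batch 'succeeds'
    only when every individual call returned truthy."""
    success_count = sum(1 for name in workspace_names if operation(name))
    failed_count = len(workspace_names) - success_count
    return success_count == len(workspace_names), success_count, failed_count


# Stand-in operation that fails for one workspace:
ok, succeeded, failed = apply_to_workspaces(
    ["ws-a", "ws-b", "ws-c"],
    lambda name: name != "ws-b",
)
# ok is False, succeeded is 2, failed is 1
```

One design consequence worth noting: because the batch result is a single boolean, a caller cannot tell *which* workspace failed without reading the logs, which is why each helper also logs a per-batch summary line.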
utils/setup_utils.py DELETED
@@ -1,200 +0,0 @@

"""
setup_utils.py

Initialization utilities for the MERe Workshop annotation pipeline.
Handles setup of clients, configuration loading, and environment validation.
"""

import csv
import os
from pathlib import Path
from typing import Any, Dict

import argilla as rg
from huggingface_hub import HfApi
import rootutils
import yaml


# Setup project root
_root = rootutils.setup_root(__file__, indicator=".git", pythonpath=True)


def validate_env() -> bool:
    """Validate that all required environment variables are set."""
    required_vars = [
        "ARGILLA_API_URL",
        "ARGILLA_API_KEY",
        "HF_TOKEN"]

    missing_vars = [var for var in required_vars if not os.getenv(var)]

    if missing_vars:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing_vars)}"
        )

    return True


class Config:
    """Configuration manager for the MERe Workshop application."""

    def __init__(self, config_path: str = "config.yaml"):
        self._config_path = config_path
        self._config = self._load_config()

    def _load_config(self) -> Dict[str, Any] | None:
        """Load configuration from YAML file."""
        if validate_env():
            config_file = _root / self._config_path

            if not config_file.exists():
                raise FileNotFoundError(f"Configuration file not found: {config_file}")

            with open(config_file, "r", encoding="utf-8") as f:
                return yaml.safe_load(f)

    def get(self, key_path: str, default: Any = None) -> Any:
        """Get configuration value using dot notation (e.g., 'datasets.sample')."""
        keys = key_path.split(".")
        value = self._config

        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                return default

        return value

    @property
    def datasets(self) -> Dict[str, str]:
        """Get dataset configuration."""
        return self.get("datasets", {})

    @property
    def webhook_events(self) -> Dict[str, Any]:
        """Get webhook configuration."""
        return self.get("webhooks.events", {})

    @property
    def phase1(self) -> Dict[str, Any]:
        """Get Phase 1 configuration."""
        return self.get("phase1", {})

    @property
    def users_config(self) -> Dict[str, Any]:
        """Get users configuration."""
        return self.get("users", {})

    @property
    def paths(self) -> Dict[str, str]:
        """Get file paths configuration."""
        return self.get("paths", {})


# Global config instance
_config = Config()

# Global Argilla client instance
_client = None

# Global Hugging Face API instance
_hf_api = None


def get_root() -> Path:
    """Get the project root directory."""
    return _root


def get_config() -> Config:
    """Get the configuration manager."""
    return _config


def get_client() -> rg.Argilla:  # type: ignore
    """Get the Argilla client."""
    global _client

    if _client is not None:
        return _client

    if validate_env():
        try:
            _client = rg.Argilla(
                api_url=os.getenv("ARGILLA_API_URL"),
                api_key=os.getenv("ARGILLA_API_KEY"),
            )
            return _client

        except Exception as e:
            if "ArgillaCredentialsError" in str(e):
                print(
                    "\n HINT: Did you wipe/restart the space? If you did, ",
                    "you need to update your Argilla API key!\n"
                )
            raise


def get_hf_api() -> HfApi:  # type: ignore
    """Get the HuggingFace API client."""
    global _hf_api

    if _hf_api is not None:
        return _hf_api

    if validate_env():
        _hf_api = HfApi(token=os.getenv("HF_TOKEN"))

    return _hf_api


def load_users() -> list[Dict[str, str]] | None:
    """Load users from CSV file specified in config."""
    config = get_config()
    csv_path = config.get("paths.users_csv", "users.csv")

    full_path = _root / csv_path
    if not full_path.exists():
        raise FileNotFoundError(f"Users CSV file not found: {full_path}")

    users = []
    with open(full_path, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            user_data = {key.rstrip(): value.rstrip() for key, value in row.items()}
            users.append(user_data)

    return users
|
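The heart of the deleted `Config` class is the dot-notation getter, which walks a nested dict parsed from `config.yaml` and falls back to a default when any segment is missing. A self-contained sketch of that lookup, detached from the class (function name hypothetical):

```python
def get_by_path(config, key_path, default=None):
    """Walk nested dicts via a dotted key path, as Config.get did."""
    value = config
    for key in key_path.split("."):
        if isinstance(value, dict) and key in value:
            value = value[key]
        else:
            return default
    return value


cfg = {"phase1": {"dataset_name": "Phase 1"}, "paths": {"users_csv": "users.csv"}}
# get_by_path(cfg, "phase1.dataset_name")       -> "Phase 1"
# get_by_path(cfg, "phase1.missing", "fallback") -> "fallback"
```

Note the `isinstance(value, dict)` guard: it makes a path that descends *through* a scalar (e.g. `"phase1.dataset_name.x"`) return the default instead of raising.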
utils/user_utils.py DELETED
@@ -1,208 +0,0 @@

"""
user_utils.py

Helper functions for user management in the MERe Workshop annotation pipeline.
Transformed from create-users.py script to follow proper helper function paradigm.
"""

from typing import Dict, List, Optional

import argilla as rg

from .setup_utils import (
    get_config,
    get_client,
    load_users
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_user_operation
)

# Get config
_config = get_config()

# Get client
_client = get_client()


def create_user(user_data: Dict[str, str]) -> bool:
    """Create a single user."""
    global _config
    global _client

    username = user_data['username']

    # Check if user already exists
    try:
        for existing_user in _client.users:
            if existing_user.username == username:
                log_user_operation("created", username, f"role: {existing_user.role} (already exists)")
                log_operation_success("create user", f"{username} (already exists)")
                return True
    except Exception:
        # Continue with creation if check fails
        pass

    try:
        # Create user
        user = rg.User(
            username=username,
            first_name=user_data.get('first_name', ''),
            last_name=user_data.get('last_name', ''),
            role=user_data.get('role', _config.get('users.default_role', 'annotator')),
            password=user_data['password']
        )

        created_user = user.create()
        log_user_operation("created", username, f"role: {user.role}")

        log_operation_success("create user", username)
        return True

    except Exception as e:
        # Check if user already exists
        error_str = str(e).lower()
        if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
            log_user_operation("created", username, "role: annotator (already exists)")
            return True
        else:
            log_operation_failure("create user", e)
            return False


def create_users(users_data: Optional[List[Dict[str, str]]] = None) -> bool:
    """Create all users from the CSV file or provided list."""
    try:
        if users_data is None:
            users_data = load_users()

        if not users_data:
            log_operation_failure("create users", Exception("No users found"))
            return False

        # Create each user
        success_count = 0
        for user_data in users_data:
            if create_user(user_data):
                success_count += 1

        log_operation_success("create users",
                              f"Created {success_count}/{len(users_data)} users successfully")

        return success_count == len(users_data)

    except Exception as e:
        log_operation_failure("create users", e)
        return False


def delete_user(username: str, skip_admin: bool = True) -> bool:
    """Delete a single user."""
    global _client

    try:
        # Find and delete user
        users = _client.users
        user_to_delete = None
        user_found = False

        for user in users:
            if user.username == username:
                user_found = True
                if skip_admin:
                    if user.role not in ["owner", "admin"]:
                        user_to_delete = user
                        break
                    else:
                        log_info(f"SKIPPED OWNER or ADMIN ({user.username})")
                        # Skipping admin/owner is considered success
                        return True
                else:
                    user_to_delete = user
                    break

        if not user_found:
            log_operation_failure("delete user", Exception(f"User {username} not found"))
            return False

        if not user_to_delete:
            log_operation_failure("delete user", Exception(f"User {username} could not be deleted"))
            return False

        # Delete user
        user_to_delete.delete()
        log_user_operation("deleted", username)

        return True

    except Exception as e:
        log_operation_failure("delete user", e)
        return False


def delete_users(usernames: Optional[List[str]] = None) -> bool:
    """Delete all users or specified users."""
    try:
        global _client

        if usernames is None:
            # Delete all users
            users = _client.users
            usernames = [user.username for user in users if user.username]

        if not usernames:
            log_operation_success("delete users", "No users to delete")
            return True

        # Delete each user
        success_count = 0
        for username in usernames:
            if delete_user(username):
                success_count += 1

        log_operation_success("delete users",
                              f"Deleted {success_count}/{len(usernames)} users")

        return success_count == len(usernames)

    except Exception as e:
        log_operation_failure("delete users", e)
        return False


def list_users() -> List[Dict[str, str]]:
    """List all users with their details."""
    try:
        global _client
        users = _client.users
        user_list = []

        for user in users:
            user_info = {
                'username': user.username or '',
                'first_name': user.first_name or '',
                'last_name': user.last_name or '',
                'role': user.role or '',
                'id': str(user.id) if user.id else ''
            }
            user_list.append(user_info)

        log_user_operation("listed all users", f"Found {len(user_list)} users")
        return user_list

    except Exception as e:
        log_operation_failure("list users", e)
        return []
|
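The `skip_admin` guard in `delete_user` is the one piece of non-obvious policy in this module: owner and admin accounts are skipped *and counted as success*, so a full `delete_users()` sweep can report a clean run without touching the space owner. A minimal sketch of that role filter on plain `(username, role)` pairs (function and variable names hypothetical):

```python
def deletable(users, skip_admin=True):
    """Return the usernames the delete sweep would actually remove,
    applying the same owner/admin guard as delete_user."""
    if not skip_admin:
        return [name for name, _role in users]
    return [name for name, role in users if role not in ("owner", "admin")]


roster = [("alice", "owner"), ("bob", "annotator"), ("carol", "admin")]
# deletable(roster)                  -> ["bob"]
# deletable(roster, skip_admin=False) -> ["alice", "bob", "carol"]
```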
utils/webhook_utils.py DELETED
@@ -1,340 +0,0 @@

"""
webhook_utils.py

Helper functions for webhook management in the MERe Workshop annotation pipeline.
Transformed from create-webhooks.py and related scripts to follow proper helper function paradigm.
"""

import os
from typing import List, Optional, Dict

import argilla as rg

from .setup_utils import (
    get_config,
    get_client
)
from .log_utils import (
    log_operation_success,
    log_operation_failure,
    log_webhook_operation
)


# Setup config
_config = get_config()

# Setup client
_client = get_client()


def create_webhook(event: str, description: str) -> Optional[rg.Webhook]:
    """Create a webhook for a specific event."""
    global _client

    webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
    if not webhook_url:
        log_operation_failure("create webhook",
                              Exception(f"ARGILLA_WEBHOOK_URL environment variable not set for {event}"))
        return None

    try:
        webhook = rg.Webhook(
            url=webhook_url,
            events=[event],  # type: ignore
            description=description
        )

        created_webhook = webhook.create()
        log_webhook_operation("created", event, description)
        return created_webhook  # type: ignore

    except Exception as e:
        log_operation_failure("create webhook", e)
        return None


def list_webhook_events() -> List[str]:
    """Return list of webhook events from configuration."""
    global _config
    return _config.get('webhooks.events', [])


def create_webhooks() -> bool:
    """Create webhooks for all configured events."""
    try:
        global _client
        events = list_webhook_events()

        if not events:
            log_operation_failure("create webhooks",
                                  Exception("No webhook events configured"))
            return False

        webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
        if not webhook_url:
            log_operation_failure("create webhooks",
                                  Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
            return False

        # Create webhooks for each event, recreating if they already exist
        success_count = 0
        for event in events:
            # Check if webhook already exists
            if webhook_exists(event):
                log_webhook_operation("already exists", event, "recreating")
                # Delete existing webhook first
                for webhook in _client.webhooks:
                    if webhook.events and event in webhook.events:
                        webhook.delete()
                        log_webhook_operation("deleted existing", event)
                        break

            description = f"Webhook for {event} events to {webhook_url}"
            if create_webhook(event, description):
                success_count += 1

        log_operation_success("create webhooks",
                              f"Created {success_count}/{len(events)} webhooks successfully")

        return success_count == len(events)

    except Exception as e:
        log_operation_failure("create webhooks", e)
        return False


def list_webhooks() -> List[Dict[str, str]]:
    """List all existing webhooks."""
    try:
        global _client
        webhooks = _client.webhooks
        webhook_list = []

        for webhook in webhooks:
            webhook_info = {
                'url': webhook.url or '',
                'events': ', '.join(webhook.events) if webhook.events else '',
                'description': webhook.description or ''
            }
            webhook_list.append(webhook_info)

        log_webhook_operation("listed all webhooks", f"Found {len(webhook_list)} webhooks")
        return webhook_list

    except Exception as e:
        log_operation_failure("list webhooks", e)
        return []


def delete_webhook(webhook_url: str, webhook_events: List[str]) -> bool:
    """Delete a specific webhook by URL and events."""
    try:
        global _client
        # Find webhook by URL and events
        webhook_to_delete = None
        for webhook in _client.webhooks:
            if (webhook.url == webhook_url and
                    webhook.events and
                    set(webhook.events) == set(webhook_events)):
                webhook_to_delete = webhook
                break

        if not webhook_to_delete:
            log_operation_failure("delete webhook",
                                  Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
            return False

        # Delete webhook
        webhook_to_delete.delete()
        log_webhook_operation("deleted", f"{webhook_url} ({', '.join(webhook_events)})")

        return True

    except Exception as e:
        log_operation_failure("delete webhook", e)
        return False


def delete_webhooks(webhook_specs: Optional[List[Dict[str, str]]] = None) -> bool:
    """Delete all webhooks or specified webhooks."""
    try:
        global _client

        if webhook_specs is None:
            # Delete all webhooks
            webhooks = _client.webhooks
            webhook_specs = []
            for webhook in webhooks:
                if webhook.url and webhook.events:
                    webhook_specs.append({
                        'url': webhook.url,
                        'events': ','.join(webhook.events)
                    })

        if not webhook_specs:
            log_operation_success("delete webhooks", "No webhooks to delete")
            return True

        # Delete each webhook
        success_count = 0
        for webhook_spec in webhook_specs:
            webhook_url = webhook_spec.get('url', '')
            webhook_events = webhook_spec.get('events', '').split(',') if webhook_spec.get('events') else []

            if delete_webhook(webhook_url, webhook_events):
                success_count += 1

        log_operation_success("delete webhooks",
                              f"Deleted {success_count}/{len(webhook_specs)} webhooks")

        return success_count == len(webhook_specs)

    except Exception as e:
        log_operation_failure("delete webhooks", e)
        return False


def webhook_exists(event: str) -> bool:
    """Check if a webhook already exists for a specific event."""
    try:
        global _client
        webhooks = _client.webhooks

        for webhook in webhooks:
            if webhook.events and event in webhook.events:
                log_webhook_operation("found existing", event, f"webhook URL: {webhook.url}")
                return True

        return False

    except Exception as e:
        log_operation_failure("check webhook exists", e)
        return False


def validate_webhooks() -> bool:
    """Validate that webhook configuration is correct."""
    try:
        # Check if webhook URL is set
        webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
        if not webhook_url:
            log_operation_failure("validate webhook config", Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
            return False

        # Check if events are configured
        events = list_webhook_events()
        if not events:
            log_operation_failure("validate webhook config", Exception("No webhook events configured"))
            return False

        # Check if Argilla client can be created
        try:
            get_client()
        except Exception as e:
            log_operation_failure("validate webhook config", Exception(f"Cannot create Argilla client: {str(e)}"))
            return False

        log_operation_success("validate webhook config", f"Configuration valid for {len(events)} events")
        return True

    except Exception as e:
        log_operation_failure("validate webhook config", e)
        return False


def update_webhook(
    webhook_url: str,
    webhook_events: List[str],
    new_url: Optional[str] = None,
    new_events: Optional[List[str]] = None,
    new_description: Optional[str] = None,
) -> bool:
    """Update a webhook's properties by recreating it (since Argilla doesn't support direct updates)."""
    try:
        global _client
        # Find webhook
        webhook = None
        for w in _client.webhooks:
            if (w.url == webhook_url and
                    w.events and
                    set(w.events) == set(webhook_events)):
                webhook = w
                break

        if not webhook:
            log_operation_failure("update webhook",
                                  Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
            return False

        # Since Argilla doesn't support direct webhook updates, we need to recreate
        # First delete the existing webhook
        webhook.delete()
        log_webhook_operation("deleted for update", f"{webhook_url} ({', '.join(webhook_events)})")

        # Create new webhook with updated properties
        final_url = new_url if new_url else webhook_url
        final_events = new_events if new_events else webhook_events
        final_description = new_description if new_description else webhook.description

        for event in final_events:
            description = final_description or f"Webhook for {event} events to {final_url}"
            new_webhook = rg.Webhook(
                url=final_url,
                events=[event],  # type: ignore
                description=description
            )
            new_webhook.create()

        updates = []
        if new_url:
            updates.append(f"url: {new_url}")
        if new_events:
            updates.append(f"events: {', '.join(new_events)}")
        if new_description:
            updates.append(f"description: {new_description}")

        log_operation_success("update webhook", f"{webhook_url} - {', '.join(updates)}")

        return True

    except Exception as e:
        log_operation_failure("update webhook", e)
        return False


def update_webhooks(webhook_updates: List[Dict[str, str]]) -> bool:
    """Update multiple webhooks."""
    success_count = 0
    for update_info in webhook_updates:
        webhook_url = update_info.get('url', '')
        webhook_events = update_info.get('events', '').split(',') if update_info.get('events') else []
        new_url = update_info.get('new_url')
        new_events = update_info.get('new_events', '').split(',') if update_info.get('new_events') else None
        new_description = update_info.get('new_description')

        if update_webhook(webhook_url, webhook_events, new_url, new_events, new_description):
            success_count += 1

    log_operation_success("update webhooks",
                          f"Updated {success_count}/{len(webhook_updates)} webhooks")

    return success_count == len(webhook_updates)
|
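Both `delete_webhook` and `update_webhook` identify a webhook by its URL plus the *set* of its events, so event order never matters when matching. A standalone sketch of that matching rule on plain dicts (function name and sample URLs hypothetical):

```python
def find_webhook(webhooks, url, events):
    """Match a webhook by URL and event set, ignoring event order,
    the way delete_webhook/update_webhook locate their target."""
    wanted = set(events)
    for hook in webhooks:
        if hook["url"] == url and set(hook.get("events", [])) == wanted:
            return hook
    return None


hooks = [
    {"url": "https://listener/hook", "events": ["record.created", "record.updated"]},
    {"url": "https://listener/hook", "events": ["response.created"]},
]
# Order-insensitive: passing ["record.updated", "record.created"]
# still matches the first hook; a partial event list matches nothing.
```

Using set equality (rather than subset or membership) means a caller must supply the exact event list the webhook was created with; a partial list silently matches nothing, which is why the deleted module logs a "not found" failure in that case.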
utils/wipe_utils.py
DELETED

@@ -1,164 +0,0 @@
-"""
-wipe_utils.py
-
-Helper functions for wiping/cleaning Argilla space in the MERe Workshop annotation pipeline.
-Transformed from wipe-space.py script to follow proper helper function paradigm.
-"""
-
-from .setup_utils import get_client
-from .dataset_utils import delete_datasets
-from .user_utils import delete_users
-from .webhook_utils import delete_webhooks
-from .workspace_utils import delete_workspaces
-from .log_utils import (
-    log_operation_success,
-    log_operation_failure,
-)
-
-
-# Setup client
-_client = get_client()
-
-
-def wipe_space(
-) -> bool:
-    """Completely wipe the Argilla space - datasets, users, workspaces, and webhooks."""
-    try:
-        # Track success of each operation
-        operations = [
-            ("datasets", delete_datasets),
-            ("webhooks", delete_webhooks),
-            ("users", delete_users),
-            ("workspaces", delete_workspaces)
-        ]
-
-        operations_results = {}
-
-        # Execute each operation and continue even if one fails
-        for operation_name, operation_func in operations:
-            try:
-                success = operation_func()
-                operations_results[operation_name] = success
-                if success:
-                    log_operation_success(f"wipe {operation_name}", "Operation completed successfully")
-                else:
-                    log_operation_failure(f"wipe {operation_name}", Exception("Operation completed with some failures"))
-            except Exception as e:
-                operations_results[operation_name] = False
-                log_operation_failure(f"wipe {operation_name}", e)
-
-        # Calculate summary
-        successful_ops = sum(1 for success in operations_results.values() if success)
-        total_ops = len(operations_results)
-
-        if successful_ops == total_ops:
-            log_operation_success("wipe entire Argilla space", "All components deleted successfully")
-            return True
-        else:
-            failed_ops = [name for name, success in operations_results.items() if not success]
-            log_operation_failure("wipe entire Argilla space",
-                                  Exception(f"{total_ops - successful_ops}/{total_ops} operations failed: {', '.join(failed_ops)}"))
-            # Return True if at least some operations succeeded
-            return successful_ops > 0
-
-    except Exception as e:
-        log_operation_failure("wipe entire Argilla space", e)
-        return False
-
-
-def wipe_datasets_only(
-) -> bool:
-    """Wipe only datasets, keeping users and workspaces."""
-    try:
-        success = delete_datasets()
-
-        if success:
-            log_operation_success("wipe datasets only", "All datasets deleted successfully")
-        else:
-            log_operation_failure("wipe datasets only", Exception("Some datasets could not be deleted"))
-
-        return success
-
-    except Exception as e:
-        log_operation_failure("wipe datasets only", e)
-        return False
-
-
-def wipe_users_only(
-) -> bool:
-    """Wipe only users, keeping datasets and workspaces."""
-    try:
-        success = delete_users()
-
-        if success:
-            log_operation_success("wipe users only", "All users deleted successfully")
-        else:
-            log_operation_failure("wipe users only", Exception("Some users could not be deleted"))
-
-        return success
-
-    except Exception as e:
-        log_operation_failure("wipe users only", e)
-        return False
-
-
-def wipe_webhooks_only(
-) -> bool:
-    """Wipe only webhooks, keeping everything else."""
-    try:
-        success = delete_webhooks()
-
-        if success:
-            log_operation_success("wipe webhooks only", "All webhooks deleted successfully")
-        else:
-            log_operation_failure("wipe webhooks only", Exception("Some webhooks could not be deleted"))
-
-        return success
-
-    except Exception as e:
-        log_operation_failure("wipe webhooks only", e)
-        return False
-
-
-def get_status(
-) -> dict:
-    """Get current status of the Argilla space (counts of datasets, users, etc.)."""
-    try:
-        global _client
-
-        # Count datasets across all workspaces
-        total_datasets = 0
-        total_records = 0
-        for workspace in _client.workspaces:
-            workspace_datasets = workspace.datasets
-            total_datasets += len(workspace_datasets)
-
-            for dataset in workspace_datasets:
-                try:
-                    records = list(dataset.records)
-                    total_records += len(records)
-                except Exception:
-                    # Skip if can't access records
-                    pass
-
-        status = {
-            'workspaces': len(_client.workspaces),
-            'users': len(_client.users),
-            'datasets': total_datasets,
-            'records': total_records,
-            'webhooks': len(_client.webhooks)
-        }
-
-        log_operation_success("get space status", f"Status retrieved: {status}")
-        return status
-
-    except Exception as e:
-        log_operation_failure("get space status", e)
-        return {
-            'workspaces': 0,
-            'users': 0,
-            'datasets': 0,
-            'records': 0,
-            'webhooks': 0,
-            'error': str(e)
-        }
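The removed `wipe_space` aggregated per-operation results, continued past individual failures, and counted a partial wipe as success when at least one step completed. A minimal standalone sketch of that aggregation pattern (the function names and lambda stand-ins here are illustrative, not the original Argilla helpers):

```python
from typing import Callable, Dict, List, Tuple


def run_operations(operations: List[Tuple[str, Callable[[], bool]]]) -> Dict[str, bool]:
    """Run each operation in order, recording success per name; a raised
    exception counts as failure but does not stop the remaining steps."""
    results: Dict[str, bool] = {}
    for name, func in operations:
        try:
            results[name] = func()
        except Exception:
            results[name] = False
    return results


def wipe_succeeded(results: Dict[str, bool]) -> bool:
    """Mirror the removed wipe_space semantics: True if any step succeeded."""
    return any(results.values())


results = run_operations([
    ("datasets", lambda: True),
    ("webhooks", lambda: False),  # simulate one failing step
])
print(wipe_succeeded(results))  # True: partial success still counts
```

This keeps a best-effort teardown from aborting on the first broken component, at the cost of reporting "success" for incomplete wipes.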
utils/workspace_utils.py
DELETED

@@ -1,387 +0,0 @@
-"""
-workspace_utils.py
-
-Helper functions for workspace management in the MERe Workshop annotation pipeline.
-Handles workspace creation, deletion, user assignment, and management operations.
-"""
-
-from typing import Dict, List, Optional
-import warnings
-
-import argilla as rg
-
-from .setup_utils import (
-    get_client,
-    load_users
-)
-from .log_utils import (
-    log_operation_success,
-    log_operation_failure,
-    log_user_operation
-)
-
-
-# Setup client
-_client = get_client()
-
-
-def create_workspace(
-    workspace_name: str,
-) -> bool:
-    """Create a single workspace."""
-    global _client
-
-    # Check if workspace already exists
-    try:
-        with warnings.catch_warnings():
-            warnings.simplefilter("ignore")
-            existing_workspace = _client.workspaces(workspace_name)
-            if existing_workspace:
-                log_operation_success("create workspace", f"{workspace_name} (already exists)")
-                return True
-    except Exception:
-        # Workspace doesn't exist, continue with creation
-        pass
-
-    try:
-        workspace = rg.Workspace(name=workspace_name)
-        workspace.create()
-        log_operation_success("create workspace", workspace_name)
-        return True
-
-    except Exception as e:
-        # Check if workspace already exists
-        error_str = str(e).lower()
-        if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
-            log_operation_success("create workspace", f"{workspace_name} (already exists)")
-            return True
-        else:
-            log_operation_failure("create workspace", e)
-            return False
-
-
-def create_workspaces(
-    workspace_names: List[str]
-) -> bool:
-    """Create multiple workspaces from a list of workspace names."""
-    global _client
-
-    success_count = 0
-    for workspace_name in workspace_names:
-        if create_workspace(workspace_name):
-            success_count += 1
-
-    log_operation_success("create workspaces",
-                          f"Created {success_count}/{len(workspace_names)} workspaces")
-
-    return success_count == len(workspace_names)
-
-
-def create_user_workspace(
-    username: str,
-    workspace_name: str
-) -> bool:
-    """Add a user to a specific workspace."""
-    global _client
-
-
-    try:
-        # Find user
-        user = None
-        for u in _client.users:
-            if u.username == username:
-                user = u
-                break
-
-        if not user:
-            log_operation_failure("add user to workspace", Exception(f"User {username} not found"))
-            return False
-
-        # Find workspace
-        with warnings.catch_warnings():
-            warnings.simplefilter("ignore")
-            workspace = _client.workspaces(workspace_name)
-            if not workspace:
-                log_operation_failure("add user to workspace", Exception(f"Workspace {workspace_name} not found"))
-                return False
-
-        # Check if user is already in workspace
-        try:
-            workspace_users = list(workspace.users)
-            for existing_user in workspace_users:
-                if existing_user.username == username:
-                    log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
-                    return True
-        except Exception:
-            # Continue if check fails
-            pass
-
-        # Add user to workspace
-        workspace.add_user(user)  # type: ignore
-        log_user_operation("added to workspace", username, workspace_name)
-
-        return True
-
-    except Exception as e:
-        # Check if user already in workspace
-        error_str = str(e).lower()
-        if "conflict" in error_str or "already" in error_str:
-            log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
-            return True
-        else:
-            log_operation_failure("add user to workspace", e)
-            return False
-
-
-def create_user_workspaces(
-    user_workspace_map: Optional[Dict[str, List[str]]] = None
-) -> bool:
-    """Create workspaces for users based on mapping or CSV data."""
-
-    if user_workspace_map is None:
-        # Load from CSV and create user workspaces based on usernames
-        users = load_users()
-        if not users:
-            log_operation_failure("create user workspaces", Exception("No users found in CSV"))
-            return False
-
-        success_count = 0
-        total_count = 0
-
-        for user_data in users:
-            username = user_data['username']
-            # Create workspace with username as workspace name
-            total_count += 1
-            if create_workspace(username):
-                # Add user to their workspace
-                if create_user_workspace(username, username):
-                    success_count += 1
-
-        log_operation_success("create user workspaces from CSV",
-                              f"Created {success_count}/{total_count} user workspaces")
-
-        return success_count == total_count
-    else:
-        # Use provided mapping
-        success_count = 0
-        total_count = 0
-
-        for username, workspace_names in user_workspace_map.items():
-            for workspace_name in workspace_names:
-                total_count += 1
-                if create_user_workspace(username, workspace_name):
-                    success_count += 1
-
-        log_operation_success("create user workspaces from mapping",
-                              f"Added users to {success_count}/{total_count} workspaces")
-
-        return success_count == total_count
-
-
-def delete_workspace(
-    workspace_name: str, client: Optional[rg.Argilla] = None
-) -> bool:
-    """Delete a single workspace."""
-    global _client
-
-    try:
-        with warnings.catch_warnings():
-            warnings.simplefilter("ignore")
-            workspace = _client.workspaces(workspace_name)
-            if not workspace:
-                log_operation_failure("delete workspace", Exception(f"Workspace {workspace_name} not found"))
-                return False
-
-        # Check for remaining datasets first
-        try:
-            datasets = list(workspace.datasets)
-            if datasets:
-                dataset_names = [ds.name for ds in datasets if ds.name]
-                log_operation_failure("delete workspace",
-                                      Exception(f"Workspace {workspace_name} still has datasets: {', '.join(dataset_names)}. Delete datasets first."))
-                return False
-        except Exception as e:
-            # If we can't check datasets, try to continue
-            log_operation_failure("check workspace datasets", e)
-
-        # Remove all users from workspace first
-        try:
-            workspace_users = list(workspace.users)
-            for user in workspace_users:
-                try:
-                    workspace.remove_user(user)
-                    log_user_operation("removed from workspace", user.username or f"User-{user.id}", workspace_name)
-                except Exception as e:
-                    log_operation_failure("remove user from workspace", e)
-        except Exception as e:
-            # Continue if user removal fails
-            log_operation_failure("remove users from workspace", e)
-
-        # Delete the workspace
-        workspace.delete()
-        log_operation_success("delete workspace", workspace_name)
-
-        return True
-
-    except Exception as e:
-        # Check if it's a dependency error
-        error_str = str(e).lower()
-        if "has some datasets linked" in error_str or "dependency" in error_str:
-            log_operation_failure("delete workspace",
-                                  Exception(f"Workspace {workspace_name} cannot be deleted due to remaining dependencies"))
-        else:
-            log_operation_failure("delete workspace", e)
-        return False
-
-
-def delete_workspaces(
-    workspace_names: Optional[List[str]] = None
-) -> bool:
-    """Delete multiple workspaces or all workspaces if none specified."""
-    global _client
-    if workspace_names is None:
-        # Delete all workspaces
-        workspaces = _client.workspaces
-        workspace_names = [ws.name for ws in workspaces if ws.name]
-
-    success_count = 0
-    for workspace_name in workspace_names:
-        if delete_workspace(workspace_name):
-            success_count += 1
-
-    log_operation_success("delete workspaces",
-                          f"Deleted {success_count}/{len(workspace_names)} workspaces")
-
-    return success_count == len(workspace_names)
-
-
-def delete_user_workspace(
-    username: str,
-    workspace_name: str,
-    delete_if_empty: bool = True
-) -> bool:
-    """Remove a user from a workspace and optionally delete workspace if empty."""
-    global _client
-
-    try:
-        # Find user
-        user = None
-        for u in _client.users:
-            if u.username == username:
-                user = u
-                break
-
-        if not user:
-            log_operation_failure("remove user from workspace", Exception(f"User {username} not found"))
-            return False
-
-        # Find workspace
-        with warnings.catch_warnings():
-            warnings.simplefilter("ignore")
-            workspace = _client.workspaces(workspace_name)
-            if not workspace:
-                log_operation_failure("remove user from workspace", Exception(f"Workspace {workspace_name} not found"))
-                return False
-
-        # Remove user from workspace
-        workspace.remove_user(user)
-        log_user_operation("removed from workspace", username, workspace_name)
-
-        # Check if workspace is empty and delete if requested
-        if delete_if_empty:
-            remaining_users = workspace.users
-            if not remaining_users:
-                workspace.delete()
-                log_operation_success("delete empty workspace", workspace_name)
-            else:
-                log_operation_success("workspace not empty", f"{workspace_name} still has {len(remaining_users)} users")
-
-        return True
-
-    except Exception as e:
-        log_operation_failure("remove user from workspace", e)
-        return False
-
-
-def delete_user_workspaces(usernames: List[str]) -> bool:
-    """Remove users from all their workspaces and delete empty workspaces."""
-
-    success_count = 0
-    for username in usernames:
-        user_workspaces = list_user_workspaces(username)
-        user_success = True
-
-        for workspace_name in user_workspaces:
-            if not delete_user_workspace(username, workspace_name, delete_if_empty=True):
-                user_success = False
-
-        if user_success:
-            success_count += 1
-
-    log_operation_success("delete user workspaces",
-                          f"Processed {success_count}/{len(usernames)} users")
-
-    return success_count == len(usernames)
-
-
-def list_workspaces(
-) -> List[Dict[str, str]]:
-    """List all workspaces with their details."""
-    global _client
-
-    try:
-        workspaces = _client.workspaces
-        workspace_list = []
-
-        for workspace in workspaces:
-            workspace_info = {
-                'name': workspace.name or '',
-                'id': str(workspace.id) if workspace.id else '',
-                'user_count': str(len(workspace.users))
-            }
-            workspace_list.append(workspace_info)
-
-        log_operation_success("list workspaces", f"Found {len(workspace_list)} workspaces")
-        return workspace_list
-
-    except Exception as e:
-        log_operation_failure("list workspaces", e)
-        return []
-
-
-def list_user_workspaces(
-    username: str,
-) -> List[str]:
-    """Get list of workspaces a user has access to."""
-    global _client
-
-    try:
-        # Find user
-        user = None
-        for u in _client.users:
-            if u.username == username:
-                user = u
-                break
-
-        if not user:
-            log_operation_failure("get user workspaces", Exception(f"User {username} not found"))
-            return []
-
-        # Get workspaces the user has access to
-        workspaces = []
-        for workspace in _client.workspaces:
-            try:
-                # Check if user has access to workspace
-                workspace_users = workspace.users
-                if any(wu.id == user.id for wu in workspace_users):
-                    workspaces.append(workspace.name or '')
-            except Exception:
-                # Skip workspaces we can't access
-                continue
-
-        log_user_operation("listed workspaces", username, f"Found {len(workspaces)} workspaces")
-        return workspaces
-
-    except Exception as e:
-        log_operation_failure("get user workspaces", e)
-        return []
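The removed `create_workspace` made setup idempotent by classifying "already exists" conflicts as success, using substring matching on the exception text. That classification step can be isolated as a small standalone sketch (the function name here is illustrative; only the matched substrings come from the removed helper):

```python
def is_already_exists_error(exc: Exception) -> bool:
    """Classify an exception as an 'already exists' conflict, using the
    same substring checks the removed create_workspace applied."""
    error_str = str(exc).lower()
    return ("conflict" in error_str
            or "already exists" in error_str
            or "not unique" in error_str)


print(is_already_exists_error(Exception("409 Conflict")))        # True
print(is_already_exists_error(Exception("name is not unique")))  # True
print(is_already_exists_error(Exception("connection refused")))  # False
```

Substring matching on error messages is brittle across API versions, which is presumably why the helper also checked for the workspace up front before attempting creation.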
wipe.py
DELETED

@@ -1,165 +0,0 @@
-#!/usr/bin/env python3
-
-"""
-wipe.py
-
-Clean wipe script for the MERe Workshop annotation pipeline.
-Removes users, workspaces, datasets, and webhooks using modular helper functions.
-"""
-
-import sys
-import argparse
-from pathlib import Path
-
-from utils import (
-    validate_env,
-    log_operation_success,
-    log_operation_failure,
-    wipe_space,
-    wipe_datasets_only,
-    wipe_users_only,
-    wipe_webhooks_only,
-    get_status,
-    log_info,
-    log_warning
-)
-
-
-def parse_args():
-    """Parse command line arguments."""
-    parser = argparse.ArgumentParser(
-        description="Wipe MERe Workshop Argilla space",
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
-    )
-
-    parser.add_argument(
-        "-d", "--datasets-only",
-        action="store_true",
-        help="Only wipe datasets, keep users and workspaces",
-    )
-
-    parser.add_argument(
-        "-u", "--users-only",
-        action="store_true",
-        help="Only wipe users, keep datasets and workspaces",
-    )
-
-    parser.add_argument(
-        "-w", "--webhooks-only",
-        action="store_true",
-        help="Only wipe webhooks, keep everything else",
-    )
-
-    parser.add_argument(
-        "-s", "--status-only",
-        action="store_true",
-        help="Only show current space status, do not perform wipe",
-    )
-
-    parser.add_argument("--force", action="store_true", help="Skip confirmation prompt")
-
-    return parser.parse_args()
-
-
-def show_space_status():
-    """Display current space status."""
-    status = get_status()
-
-    if "error" in status:
-        log_operation_failure("check space status", status["error"])
-        return False
-
-    print()
-    log_info("=== Current Argilla Space Status ===")
-    log_info(f"Workspaces: {status['workspaces']}")
-    log_info(f"Users: {status['users']}")
-    log_info(f"Datasets: {status['datasets']}")
-    log_info(f"Records: {status['records']}")
-    log_info(f"Webhooks: {status['webhooks']}")
-    print()
-
-    return True
-
-
-def confirm_wipe(
-    operation_description: str,
-    force: bool = False
-) -> bool:
-    """Confirm wipe operation with user."""
-    if force:
-        return True
-
-    log_warning(f"WARNING: This will {operation_description}")
-    log_warning("This action cannot be undone!")
-
-    log_warning("Are you sure you want to proceed? [y/N]:")
-    response = input().strip().lower()
-    return response in ["y", "yes"]
-
-
-def main():
-    """Main wipe function."""
-    args = parse_args()
-
-    # Validate environment
-    try:
-        validate_env()
-        log_operation_success("wipe validation", "Environment validated")
-    except Exception as e:
-        log_operation_failure("wipe validation", e)
-        return 1
-
-    # Show current status
-    if not show_space_status():
-        return 1
-
-    # If status-only mode, exit here
-    if args.status_only:
-        return 0
-
-    # Determine operation and confirmation message
-    if args.datasets_only:
-        operation = "datasets only"
-        confirmation_msg = "delete ALL DATASETS (keeping users and workspaces)"
-        wipe_function = wipe_datasets_only
-    elif args.users_only:
-        operation = "users only"
-        confirmation_msg = (
-            "delete ALL USERS (keeping datasets, workspaces, and webhooks)"
-        )
-        wipe_function = wipe_users_only
-    elif args.webhooks_only:
-        operation = "webhooks only"
-        confirmation_msg = "delete ALL WEBHOOKS (keeping users and datasets)"
-        wipe_function = wipe_webhooks_only
-    else:
-        operation = "entire space"
-        confirmation_msg = "DELETE EVERYTHING (users, workspaces, datasets, webhooks)"
-        wipe_function = wipe_space
-
-    # Confirm operation
-    if not confirm_wipe(confirmation_msg, args.force):
-        log_info("Wipe operation cancelled")
-        return 0
-
-    # Perform wipe operation
-    print()
-    log_info(f"Wiping {operation}...")
-    success = wipe_function()
-
-    if success:
-        log_operation_success(f"wipe {operation}", "Operation completed successfully")
-    else:
-        log_operation_failure(f"wipe {operation}", Exception("Operation failed"))
-        return 1
-
-    # Show final status
-    if not show_space_status():
-        return 1
-
-    log_operation_success("Wipe operation completed", send_to_slack=True)
-    return 0
-
-
-if __name__ == "__main__":
-    exit(main())
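The removed wipe.py CLI dispatched each mutually exclusive flag to one wipe function, falling through to a full wipe when no flag was given. That flag-to-operation dispatch can be sketched on its own (this returns the operation name only, with no Argilla calls; `pick_operation` is a name chosen for illustration):

```python
import argparse


def pick_operation(argv):
    """Mirror the removed wipe.py flag-to-operation dispatch."""
    parser = argparse.ArgumentParser(description="Wipe MERe Workshop Argilla space")
    parser.add_argument("-d", "--datasets-only", action="store_true")
    parser.add_argument("-u", "--users-only", action="store_true")
    parser.add_argument("-w", "--webhooks-only", action="store_true")
    parser.add_argument("-s", "--status-only", action="store_true")
    parser.add_argument("--force", action="store_true")
    args = parser.parse_args(argv)
    if args.datasets_only:
        return "datasets only"
    elif args.users_only:
        return "users only"
    elif args.webhooks_only:
        return "webhooks only"
    return "entire space"


print(pick_operation(["-d"]))  # datasets only
print(pick_operation([]))      # entire space
```

Note the flags are checked in a fixed priority order rather than declared as an argparse mutually exclusive group, so combining flags silently picks the first match.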
wipe.sh
DELETED

@@ -1,16 +0,0 @@
-#!/bin/bash
-
-# wipe.sh
-#
-# Shell wrapper for the MERe Workshop wipe process.
-
-set -euo pipefail
-
-# Get the directory where this script is located
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
-
-# Change to the script directory
-cd "$SCRIPT_DIR"
-
-# Run the wipe script
-python wipe.py "$@"