| | --- |
| | title: Dadc |
| | emoji: 🏢 |
| | colorFrom: red |
| | colorTo: gray |
| | sdk: gradio |
| | sdk_version: 3.0.17 |
| | app_file: app.py |
| | pinned: false |
| | license: bigscience-bloom-rail-1.0 |
| | --- |
| | |
| | A basic example of dynamic adversarial data collection with a Gradio app. |
| |
|
| | **Instructions for someone to use for their own project:** |
| |
|
| | *Setting up the Space* |
| | 1. Clone this repo and deploy it on your own Hugging Face space. |
| | 2. Add the following secrets to your space: |
| | - `HF_TOKEN`: One of your Hugging Face tokens. |
| | - `DATASET_REPO_URL`: The url to an empty dataset that you created the hub. It |
| | can be a private or public dataset. |
| | - `FORCE_PUSH`: "yes" |
| | When you run this space on mturk and when people visit your space on |
| | huggingface.co, the app will use your token to automatically store new HITs |
| | in your dataset. Setting `FORCE_PUSH` to "yes" ensures that your repo will |
| | force push changes to the dataset during data collection. Otherwise, |
| | accidental manual changes to your dataset could result in your space gettin |
| | merge conflicts as it automatically tries to push the dataset to the hub. For |
| | local development, add these three keys to a `.env` file, and consider setting |
| | `FORCE_PUSH` to "no". |
| | |
| | *Running Data Collection* |
| | 1. On your local repo that you pulled, create a copy of `config.py.example`, |
| | just called `config.py`. Now, put keys from your AWS account in `config.py`. |
| | These keys should be for an AWS account that has the |
| | AmazonMechanicalTurkFullAccess permission. You also need to |
| | create an mturk requestor account associated with your AWS account. |
| | 2. Run `python collect.py` locally. |
| |
|
| | *Profit* |
| | Now, you should be watching hits come into your Hugging Face dataset |
| | automatically! |
| |
|
| | *Tips and Tricks* |
| | - Use caution while doing local development of your space and |
| | simultaneously running it on mturk. Consider setting `FORCE_PUSH` to "no" in |
| | your local `.env` file. |
| | - huggingface spaces have limited computational resources and memory. If you |
| | run too many HITs and/or assignments at once, then you could encounter issues. |
| | You could also encounter issues if you are trying to create a dataset that is |
| | very large. Check the log of your space for any errors that could be happening. |
| |
|
| |
|