# Download Dataset
## Overview
A Python script that downloads a dataset from the Hugging Face Hub with `huggingface_hub` and extracts any `.zip` files it contains.
## Accessing the Anime Images Dataset
To get access to the Anime Images dataset, please contact me through the Issues tab on GitHub: [https://github.com/danhtran2mind/Anime-Super-Resolution/issues](https://github.com/danhtran2mind/Anime-Super-Resolution/issues).
Once you reach out, I will provide:
- A direct link to the dataset.
- Access permissions for the dataset.
- Detailed instructions for downloading.

After receiving the necessary credentials, download the dataset with the following command:
```bash
python scripts/download_datasets.py \
    --dataset_id "<huggingface_dataset_id>" \
    --huggingface_token "<your_huggingface_token>"
```
**Notes**:
- Replace `<huggingface_dataset_id>` with the dataset ID provided.
- Replace `<your_huggingface_token>` with the Hugging Face token I share with you.
- Ensure the required dependencies are installed (Python and the `huggingface_hub` package; see Prerequisites).
- For any issues, refer to the GitHub repository or contact me via the Issues tab.
## Prerequisites
- Python 3.10+
- Install: `pip install huggingface_hub`
- Optional: Hugging Face API token for private datasets
## Usage
```bash
python download_dataset.py --dataset_id <dataset_id> [--huggingface_token <token>] [--output_dir <directory>]
```
### Arguments
| Argument              | Type   | Required | Description                                |
|-----------------------|--------|----------|--------------------------------------------|
| `--dataset_id`        | String | Yes      | Dataset ID (e.g., `ejhf743b/anime-images`) |
| `--huggingface_token` | String | No       | API token for private datasets             |
| `--output_dir`        | String | No       | Save directory (default: `./data`)         |
### Example
```bash
python download_dataset.py --dataset_id ejhf743b/anime-images --output_dir ./my_datasets
```
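For readers who want to see how the three arguments fit together, here is a minimal `argparse` sketch that mirrors the arguments table. It is an illustration, not the script's actual source: the helper name `build_parser` and the help strings are assumptions, and only the flag names, the required status, and the `./data` default come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the arguments table (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Download a dataset from the Hugging Face Hub."
    )
    # Required: the dataset repository ID, e.g. "ejhf743b/anime-images".
    parser.add_argument("--dataset_id", type=str, required=True,
                        help="Dataset ID on the Hugging Face Hub")
    # Optional: only needed for private datasets.
    parser.add_argument("--huggingface_token", type=str, default=None,
                        help="API token for private datasets")
    # Optional: falls back to ./data, as documented in the table.
    parser.add_argument("--output_dir", type=str, default="./data",
                        help="Directory to save the dataset in")
    return parser


# Parse the same arguments as the example command, passed explicitly
# so the snippet runs without real command-line input.
args = build_parser().parse_args(
    ["--dataset_id", "ejhf743b/anime-images", "--output_dir", "./my_datasets"]
)
print(args.dataset_id, args.huggingface_token, args.output_dir)
```

In a real script, `parse_args()` would be called without a list so that it reads `sys.argv`.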
## Functionality
1. Initializes the Hugging Face API client.
2. Creates the output directory if it does not exist.
3. Downloads the dataset to `output_dir` using `snapshot_download`.
4. Extracts each `.zip` file into a `<zip_filename>-raw` subdirectory (the zip's name without the extension), then deletes the zip.
5. Prints the extraction status or any errors.
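The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the script's actual code: the function names `download_dataset` and `extract_zips` are hypothetical, the sketch calls `snapshot_download` directly rather than going through an API client object, and searching subdirectories recursively for zips is an assumption.

```python
import zipfile
from pathlib import Path


def download_dataset(dataset_id, output_dir="./data", token=None):
    """Download a dataset snapshot into output_dir (hypothetical helper)."""
    # Imported lazily so extract_zips() can be used without the package.
    from huggingface_hub import snapshot_download

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    snapshot_download(
        repo_id=dataset_id,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=output_dir,
        token=token,
    )


def extract_zips(output_dir):
    """Extract each .zip into '<zip_filename>-raw', then delete the zip.

    Errors are printed and skipped so one bad archive does not stop the run.
    Returns the list of directories that were extracted successfully.
    """
    extracted = []
    # Assumption: zips are searched recursively; sorted() fixes the list
    # up front so files created during extraction are not revisited.
    for zip_path in sorted(Path(output_dir).rglob("*.zip")):
        target = zip_path.with_name(zip_path.stem + "-raw")
        try:
            with zipfile.ZipFile(zip_path) as archive:
                archive.extractall(target)
            print(f"Extracted {zip_path} to {target}")
            zip_path.unlink()
            print(f"Removed {zip_path}")
            extracted.append(target)
        except (zipfile.BadZipFile, OSError) as exc:
            print(f"Error extracting {zip_path}: {exc}")
    return extracted
```

Calling `download_dataset(...)` followed by `extract_zips(...)` on the same directory would reproduce the download-then-extract flow described in the list.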
## Notes
- Set the `HF_TOKEN` environment variable instead of passing `--huggingface_token` if preferred.
- Only `.zip` archives are extracted; other archive formats are left as downloaded.
- Errors during extraction are logged but do not stop the script.
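As a small illustration of the token note, the sketch below shows one way a command-line token and the `HF_TOKEN` environment variable could be combined. The helper `resolve_token` and the precedence (explicit argument first, environment second) are assumptions, not confirmed behavior of the script; `huggingface_hub` itself also reads `HF_TOKEN` when no token is passed.

```python
import os


def resolve_token(cli_token=None):
    """Return the CLI token if given, else HF_TOKEN, else None (hypothetical helper)."""
    # Assumption: an explicit --huggingface_token wins over the environment.
    return cli_token or os.environ.get("HF_TOKEN")
```

Returning `None` when neither source is set is enough for public datasets, which need no token.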
## Example Output
```text
Extracted ./data/dataset.zip to ./data/dataset-raw
Removed ./data/dataset.zip
```
## License
Provided as-is. Check the dataset's license on the Hugging Face Hub.