# Dataset Upload & Download - Implementation Complete

Dataset upload and download functionality has been implemented for ARKit datasets.

## Implemented Features

### 1. Dataset Upload (`ylff/utils/dataset_upload.py`)

**Functions:**

- `validate_arkit_zip()` - Validates that a zip file contains valid ARKit video/metadata pairs
- `extract_arkit_zip()` - Extracts an ARKit zip file and organizes it into sequence directories
- `process_uploaded_dataset()` - Runs the complete upload processing pipeline

**Features:**

- Validates the zip file format
- Checks for matching video/metadata pairs (same base name)
- Validates the JSON metadata format
- Organizes files into sequence directories
- Reports validation errors and statistics
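
The pairing and organization steps above can be sketched with the standard library's `zipfile` module. This is a minimal illustration, not the code in `dataset_upload.py`: `organize_into_sequences` is a hypothetical name, the video extensions (`.mov`, `.mp4`) are an assumption, and writing each valid pair to `output_dir/<base_name>/` is an assumed directory layout.

```python
import json
import zipfile
from collections import defaultdict
from pathlib import Path, PurePosixPath

# Assumption: ARKit captures use these video containers.
VIDEO_EXTS = {".mov", ".mp4"}


def organize_into_sequences(zip_path: str, output_dir: str) -> list[Path]:
    """Write each valid video/metadata pair to output_dir/<base_name>/ (hypothetical helper)."""
    out = Path(output_dir)
    created: list[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        # Group member files by base name (stem), skipping directory entries.
        groups: dict[str, list[str]] = defaultdict(list)
        for name in zf.namelist():
            if not name.endswith("/"):
                groups[PurePosixPath(name).stem].append(name)

        for stem, names in sorted(groups.items()):
            if stem in {".", ".."}:
                continue  # degenerate base names cannot become directory names
            videos = [n for n in names if PurePosixPath(n).suffix.lower() in VIDEO_EXTS]
            metas = [n for n in names if PurePosixPath(n).suffix.lower() == ".json"]
            if len(videos) != 1 or len(metas) != 1:
                continue  # unmatched or ambiguous pair
            try:
                json.loads(zf.read(metas[0]))
            except ValueError:  # covers JSONDecodeError and bad text encodings
                continue
            seq_dir = out / stem
            seq_dir.mkdir(parents=True, exist_ok=True)
            for member in (videos[0], metas[0]):
                # Flatten any folders inside the zip: keep only the file name.
                (seq_dir / PurePosixPath(member).name).write_bytes(zf.read(member))
            created.append(seq_dir)
    return created
```

Pairs that are unmatched, ambiguous, or carry malformed JSON are skipped here rather than raising, leaving them to be reported by the validation step.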

### 2. Dataset Download (`ylff/utils/dataset_download.py`)

**`S3DatasetDownloader` class:**

- S3 client initialization with credentials
- `list_datasets()` - Lists available datasets in an S3 bucket
- `download_dataset()` - Downloads a dataset from S3 with a progress bar
- `download_and_extract()` - Downloads a dataset and extracts it
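
One way `download_dataset()` could report progress is through the `Callback` parameter of boto3's `download_file`, which is called with the number of bytes transferred in each chunk. The sketch below assumes that approach; `DownloadProgress` and `download_with_progress` are hypothetical names, and the actual class may well use a progress-bar library instead of writing to stderr.

```python
import sys
import threading


class DownloadProgress:
    """Thread-safe byte counter usable as a boto3 transfer Callback (hypothetical)."""

    def __init__(self, total_bytes: int, stream=sys.stderr):
        self.total = total_bytes
        self.seen = 0
        self._stream = stream
        # boto3 may invoke the callback from several transfer threads.
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        with self._lock:
            self.seen += bytes_amount
            pct = 100.0 * self.seen / self.total if self.total else 100.0
            self._stream.write(f"\r{self.seen}/{self.total} bytes ({pct:.1f}%)")
            self._stream.flush()


def download_with_progress(s3_client, bucket: str, key: str, dest: str) -> int:
    """Download s3://bucket/key to dest with progress output; returns the object size."""
    size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    s3_client.download_file(bucket, key, dest, Callback=DownloadProgress(size))
    return size
```

With boto3 installed, this would be called as `download_with_progress(boto3.client("s3"), "my-bucket", "datasets/v1.zip", "v1.zip")`. Taking the client as a parameter also makes the function easy to exercise with a stub.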

**Features:**

- AWS credentials via explicit access keys or boto3's default credentials chain
- Progress bar for downloads
- Automatic extraction of `.zip`, `.tar.gz`, and `.tar` archives
- Error handling and reporting
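
The automatic extraction step can be approximated with `shutil.unpack_archive`, which handles all three formats. In this sketch, `extract_archive` is a hypothetical helper; it maps the file suffix to an unpack format explicitly so that upper-case names such as `DATA.ZIP` still work.

```python
import shutil
from pathlib import Path

# Suffix -> shutil unpack format for the archive types listed above.
FORMATS = {".zip": "zip", ".tar.gz": "gztar", ".tgz": "gztar", ".tar": "tar"}


def extract_archive(archive_path: str, output_dir: str) -> Path:
    """Extract a zip, tar.gz, or tar archive into output_dir and return that directory."""
    name = Path(archive_path).name.lower()
    fmt = next((f for suffix, f in FORMATS.items() if name.endswith(suffix)), None)
    if fmt is None:
        raise ValueError(f"Unsupported archive format: {archive_path}")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive_path, out, format=fmt)
    return out
```

On Python 3.12+, `unpack_archive` also accepts a `filter` argument (for example `filter="data"`) that rejects unsafe tar members, which is worth enabling for archives downloaded from shared buckets.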

## API Endpoints

### `/api/v1/dataset/upload` (POST)

**Request**: Multipart form data

- `file`: Zip file containing ARKit video and metadata pairs
- `output_dir`: Directory to extract the dataset into (default: `"data/uploaded_datasets"`)
- `validate`: Whether to validate ARKit pairs before extraction (default: `true`)

**Response**: `JobResponse` (the upload is processed as an asynchronous job)

**Example:**

```bash
curl -X POST "http://localhost:8000/api/v1/dataset/upload" \
  -F "file=@arkit_dataset.zip" \
  -F "output_dir=data/uploaded_datasets" \
  -F "validate=true"
```

### `/api/v1/dataset/download` (POST)

**Request Model**: `DownloadDatasetRequest`

```json
{
  "bucket_name": "my-datasets-bucket",
  "s3_key": "datasets/arkit_sequences.zip",
  "output_dir": "data/downloaded_datasets",
  "extract": true,
  "aws_access_key_id": null,
  "aws_secret_access_key": null,
  "region_name": "us-east-1"
}
```

**Response**: `DownloadDatasetResponse`

- `success`: Boolean indicating whether the download succeeded
- `output_path`: Path to the downloaded file (if not extracted)
- `output_dir`: Directory where the dataset was extracted (if extracted)
- `file_size`: Size of the downloaded file in bytes
- `error`: Error message if the download failed
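
For client code, the request and response models above can be mirrored as plain dataclasses. This is a hypothetical sketch: the server-side models are likely defined differently, the defaults are borrowed from the CLI options, and the both-or-neither check on the AWS keys is an assumption about sensible input validation rather than documented behavior.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DownloadDatasetRequest:
    """Client-side mirror of the documented request body (hypothetical)."""

    bucket_name: str
    s3_key: str
    output_dir: str = "data/downloaded_datasets"
    extract: bool = True
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    region_name: str = "us-east-1"

    def __post_init__(self) -> None:
        # Assumption: explicit keys only make sense as a pair; leaving both unset
        # lets the server fall back to boto3's default credentials chain.
        if (self.aws_access_key_id is None) != (self.aws_secret_access_key is None):
            raise ValueError("Provide both AWS keys or neither")


@dataclass
class DownloadDatasetResponse:
    """Client-side mirror of the documented response fields (hypothetical)."""

    success: bool
    output_path: Optional[str] = None
    output_dir: Optional[str] = None
    file_size: Optional[int] = None
    error: Optional[str] = None
```

`asdict(DownloadDatasetRequest(...))` produces a dictionary that can be sent as the `json=` body of a `requests.post` call, and `DownloadDatasetResponse(**response.json())` reads the reply back.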

## CLI Commands

### `ylff dataset upload`

```bash
ylff dataset upload arkit_dataset.zip \
  --output-dir data/uploaded_datasets \
  --validate
```

**Options:**

- `zip_path`: Path to the zip file (required, positional)
- `--output-dir`: Directory to extract the dataset into (default: `"data/uploaded_datasets"`)
- `--validate`: Validate ARKit pairs before extraction (default: true)

### `ylff dataset download`

```bash
ylff dataset download my-bucket datasets/arkit.zip \
  --output-dir data/downloaded_datasets \
  --extract \
  --region-name us-east-1
```

**Options:**

- `bucket_name`: S3 bucket name (required, positional)
- `s3_key`: S3 object key (required, positional)
- `--output-dir`: Directory to save the dataset in (default: `"data/downloaded_datasets"`)
- `--extract`: Extract the downloaded archive (default: true)
- `--aws-access-key-id`: AWS access key ID (optional)
- `--aws-secret-access-key`: AWS secret access key (optional)
- `--region-name`: AWS region name (default: `"us-east-1"`)
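
The options above map naturally onto an argument parser. The following `argparse` sketch is hypothetical (the framework behind the real `ylff` CLI is not stated here). It shows one way to make `--validate` and `--extract` default to true while still allowing them to be switched off, using `argparse.BooleanOptionalAction` (Python 3.9+), which also generates `--no-validate` and `--no-extract`; whether the real CLI offers those negated flags is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented `ylff dataset` subcommands."""
    parser = argparse.ArgumentParser(prog="ylff dataset")
    sub = parser.add_subparsers(dest="command", required=True)

    up = sub.add_parser("upload")
    up.add_argument("zip_path")
    up.add_argument("--output-dir", default="data/uploaded_datasets")
    # Adds both --validate and --no-validate; defaults to True.
    up.add_argument("--validate", action=argparse.BooleanOptionalAction, default=True)

    down = sub.add_parser("download")
    down.add_argument("bucket_name")
    down.add_argument("s3_key")
    down.add_argument("--output-dir", default="data/downloaded_datasets")
    down.add_argument("--extract", action=argparse.BooleanOptionalAction, default=True)
    down.add_argument("--aws-access-key-id", default=None)
    down.add_argument("--aws-secret-access-key", default=None)
    down.add_argument("--region-name", default="us-east-1")
    return parser
```

For example, `build_parser().parse_args(["download", "my-bucket", "datasets/arkit.zip", "--no-extract"])` yields `extract=False` with every other option at its documented default.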

## Requirements

### Upload

- No additional dependencies (uses only the Python standard library)

### Download

- `boto3` - AWS SDK for Python

```bash
pip install boto3
```

## Usage Examples

### Upload ARKit Dataset

**CLI:**

```bash
ylff dataset upload my_arkit_data.zip --output-dir data/sequences
```

**API:**

```python
import requests

with open("my_arkit_data.zip", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/v1/dataset/upload",
        files={"file": f},
        data={"output_dir": "data/sequences", "validate": "true"},
    )
response.raise_for_status()  # surface HTTP errors before reading the body
job_id = response.json()["job_id"]  # the upload runs as a background job
```

### Download from S3

**CLI:**

```bash
ylff dataset download my-bucket datasets/v1.zip \
  --output-dir data/downloaded \
  --extract
```

**API:**

```python
import requests

response = requests.post(
    "http://localhost:8000/api/v1/dataset/download",
    json={
        "bucket_name": "my-bucket",
        "s3_key": "datasets/v1.zip",
        "output_dir": "data/downloaded",
        "extract": True,
    },
)
response.raise_for_status()  # surface HTTP errors before reading the body
result = response.json()  # DownloadDatasetResponse fields: success, output_dir, file_size, ...
```
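
Since `DownloadDatasetResponse` sets `output_dir` when the archive was extracted and `output_path` otherwise, a client can turn `result` into a short message as follows. `describe_download` is a hypothetical helper that relies only on the documented response fields.

```python
def describe_download(result: dict) -> str:
    """Summarize a DownloadDatasetResponse payload (hypothetical helper)."""
    if not result.get("success"):
        return f"Download failed: {result.get('error') or 'unknown error'}"
    # output_dir is set when the archive was extracted, output_path otherwise.
    location = result.get("output_dir") or result.get("output_path")
    size = result.get("file_size")
    size_text = f" ({size} bytes)" if size is not None else ""
    return f"Downloaded to {location}{size_text}"
```

Used after the request above, `print(describe_download(result))` reports either the destination or the server's error message.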

## Validation

The upload process validates:

- Zip file format
- Matching video/metadata pairs (same base name)
- Valid JSON metadata format
- File organization

**Validation Report:**

- Total files in the zip
- Number of video files
- Number of metadata files
- Number of valid pairs
- List of invalid pairs
- Number of organized sequences
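
A report with these fields could be computed in a single pass over the zip's member names. The sketch below is illustrative only: `ValidationReport` and `build_report` are hypothetical names, the video extensions are assumed, a later file with the same base name and type overwrites an earlier one, and the organized-sequences count is left out because it comes from the extraction step.

```python
import json
import zipfile
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import PurePosixPath

VIDEO_EXTS = {".mov", ".mp4"}  # assumption: ARKit capture video formats


@dataclass
class ValidationReport:
    total_files: int = 0
    video_files: int = 0
    metadata_files: int = 0
    valid_pairs: int = 0
    invalid_pairs: list[tuple[str, str]] = field(default_factory=list)  # (base_name, reason)


def build_report(zip_path: str) -> ValidationReport:
    report = ValidationReport()
    # base name -> {"video": member name, "meta": member name}
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    with zipfile.ZipFile(zip_path) as zf:
        members = [n for n in zf.namelist() if not n.endswith("/")]
        report.total_files = len(members)
        for name in members:
            path = PurePosixPath(name)
            ext = path.suffix.lower()
            if ext in VIDEO_EXTS:
                report.video_files += 1
                groups[path.stem]["video"] = name
            elif ext == ".json":
                report.metadata_files += 1
                groups[path.stem]["meta"] = name
        for stem, pair in sorted(groups.items()):
            if "video" not in pair:
                report.invalid_pairs.append((stem, "missing video"))
            elif "meta" not in pair:
                report.invalid_pairs.append((stem, "missing metadata"))
            else:
                try:
                    json.loads(zf.read(pair["meta"]))
                    report.valid_pairs += 1
                except ValueError:  # malformed JSON or undecodable bytes
                    report.invalid_pairs.append((stem, "invalid JSON"))
    return report
```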

## AWS Credentials

The download functionality supports multiple credential sources:

1. **Explicit credentials** (via API/CLI parameters)
2. **Environment variables** (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
3. **Credentials file** (`~/.aws/credentials`)
4. **IAM role** (when running on EC2/ECS)

Explicit credentials are passed directly to the S3 client and take precedence. When they are omitted, boto3's default credentials chain resolves sources 2-4, checking them in the order listed.
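
The precedence rule above reduces to choosing which keyword arguments to pass to `boto3.client`. In this sketch, `s3_client_kwargs` and `make_s3_client` are hypothetical names: explicit keys are forwarded only when both are present, and otherwise boto3 is left to resolve credentials through its default chain.

```python
from typing import Optional


def s3_client_kwargs(
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    region_name: str = "us-east-1",
) -> dict:
    """Build keyword arguments for boto3.client("s3", ...) (hypothetical helper)."""
    kwargs = {"region_name": region_name}
    if aws_access_key_id and aws_secret_access_key:
        # Explicit keys passed to the client override the default chain.
        kwargs["aws_access_key_id"] = aws_access_key_id
        kwargs["aws_secret_access_key"] = aws_secret_access_key
    # Otherwise boto3 resolves credentials itself (env vars, ~/.aws/credentials, IAM role, ...).
    return kwargs


def make_s3_client(**options):
    import boto3  # imported lazily so this module loads without boto3 installed

    return boto3.client("s3", **s3_client_kwargs(**options))
```

Keeping the argument logic in a pure function means it can be checked without AWS access, while `make_s3_client(region_name="eu-west-1")` builds a real client when boto3 is available.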

## Next Steps

1. **S3 Upload** - Add the ability to upload datasets to S3
2. **Dataset Listing** - Add an API endpoint to list available datasets in S3 (the downloader already provides `list_datasets()`)
3. **Incremental Downloads** - Support partial dataset downloads
4. **Compression Options** - Make compression configurable for uploads
5. **Metadata Validation** - Add stricter ARKit metadata schema validation

All core functionality is implemented and ready to use.