Dataset Upload & Download - Implementation Complete
Dataset upload and download functionality has been implemented for ARKit datasets.
✅ Implemented Features
1. Dataset Upload (ylff/utils/dataset_upload.py)
Functions:
- ✅ `validate_arkit_zip()` - Validates that a zip file contains valid ARKit video-metadata pairs
- ✅ `extract_arkit_zip()` - Extracts and organizes an ARKit zip file into sequence directories
- ✅ `process_uploaded_dataset()` - Complete upload processing pipeline
Features:
- Validates zip file format
- Checks for matching video-metadata pairs (same base name); see the sketch after this list
- Validates JSON metadata format
- Organizes files into sequence directories
- Reports validation errors and statistics
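As a rough illustration of the pair-matching step, the sketch below groups zip entries by base name and checks that each video has parseable JSON metadata. The function and variable names here are hypothetical (not the actual `validate_arkit_zip()` implementation), and the assumed video extensions (`.mov`, `.mp4`) are a guess.

```python
import json
import zipfile
from pathlib import PurePosixPath

VIDEO_EXTS = {".mov", ".mp4"}  # assumed ARKit capture formats


def find_arkit_pairs(zip_path: str):
    """Group zip entries by base name and split them into valid / invalid pairs."""
    videos, metadata = {}, {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            p = PurePosixPath(name)
            if p.suffix.lower() in VIDEO_EXTS:
                videos[p.stem] = name
            elif p.suffix.lower() == ".json":
                metadata[p.stem] = name

        valid, invalid = [], []
        for stem, video_name in videos.items():
            meta_name = metadata.get(stem)
            if meta_name is None:
                invalid.append(video_name)  # video without matching metadata
                continue
            try:
                json.loads(zf.read(meta_name))  # metadata must parse as JSON
                valid.append((video_name, meta_name))
            except json.JSONDecodeError:
                invalid.append(meta_name)  # metadata exists but is not valid JSON
    return valid, invalid
```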
2. Dataset Download (ylff/utils/dataset_download.py)
S3DatasetDownloader Class:
- ✅ S3 client initialization with credentials
- ✅ `list_datasets()` - Lists available datasets in an S3 bucket
- ✅ `download_dataset()` - Downloads a dataset from S3 with progress reporting
- ✅ `download_and_extract()` - Downloads and extracts a dataset
Features:
- AWS credentials support (access key or credentials chain)
- Progress bar for downloads (see the sketch after this list)
- Automatic extraction (zip, tar.gz, tar)
- Error handling and reporting
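For reference, a download with a progress bar can be built on boto3's `Callback` hook roughly as follows. This is a minimal sketch rather than the actual `S3DatasetDownloader` code; it assumes `tqdm` is available for the progress bar and that credentials are either passed explicitly or resolved by boto3's default chain.

```python
import boto3
from tqdm import tqdm


def download_with_progress(bucket: str, key: str, dest: str,
                           aws_access_key_id: str | None = None,
                           aws_secret_access_key: str | None = None,
                           region_name: str = "us-east-1") -> None:
    """Download a single S3 object, reporting progress in bytes."""
    # Passing None for the keys lets boto3 fall back to its default credentials chain.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=region_name,
    )
    total = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    with tqdm(total=total, unit="B", unit_scale=True, desc=key) as bar:
        # boto3 invokes the callback with the number of bytes transferred per chunk.
        s3.download_file(bucket, key, dest, Callback=bar.update)
```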
API Endpoints
/api/v1/dataset/upload (POST)
Request: Multipart form data
- `file`: Zip file containing ARKit video and metadata pairs
- `output_dir`: Directory to extract the dataset into (default: "data/uploaded_datasets")
- `validate`: Validate ARKit pairs before extraction (default: true)
Response: JobResponse (async job)
Example:
```bash
curl -X POST "http://localhost:8000/api/v1/dataset/upload" \
  -F "file=@arkit_dataset.zip" \
  -F "output_dir=data/uploaded_datasets" \
  -F "validate=true"
```
/api/v1/dataset/download (POST)
Request Model: DownloadDatasetRequest
```json
{
  "bucket_name": "my-datasets-bucket",
  "s3_key": "datasets/arkit_sequences.zip",
  "output_dir": "data/downloaded_datasets",
  "extract": true,
  "aws_access_key_id": null,
  "aws_secret_access_key": null,
  "region_name": "us-east-1"
}
```
Response: DownloadDatasetResponse
- `success`: Boolean indicating whether the download succeeded
- `output_path`: Path to the downloaded file (if not extracted)
- `output_dir`: Directory where the dataset was extracted (if extracted)
- `file_size`: Size of the downloaded file in bytes
- `error`: Error message if the download failed
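Assuming the API layer uses Pydantic models (suggested by the `DownloadDatasetRequest` / `DownloadDatasetResponse` naming, though not confirmed here), the request and response bodies above map onto definitions along these lines. Field names follow the JSON shown; the defaults and optionality are inferred.

```python
from pydantic import BaseModel


class DownloadDatasetRequest(BaseModel):
    bucket_name: str
    s3_key: str
    output_dir: str = "data/downloaded_datasets"
    extract: bool = True
    aws_access_key_id: str | None = None
    aws_secret_access_key: str | None = None
    region_name: str = "us-east-1"


class DownloadDatasetResponse(BaseModel):
    success: bool
    output_path: str | None = None
    output_dir: str | None = None
    file_size: int | None = None
    error: str | None = None
```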
CLI Commands
ylff dataset upload
```bash
ylff dataset upload arkit_dataset.zip \
  --output-dir data/uploaded_datasets \
  --validate
```
Options:
- `zip_path`: Path to the zip file (required)
- `--output-dir`: Directory to extract the dataset into (default: "data/uploaded_datasets")
- `--validate`: Validate ARKit pairs before extraction (default: true)
ylff dataset download
```bash
ylff dataset download my-bucket datasets/arkit.zip \
  --output-dir data/downloaded_datasets \
  --extract \
  --region-name us-east-1
```
Options:
- `bucket_name`: S3 bucket name (required)
- `s3_key`: S3 object key (required)
- `--output-dir`: Directory to save the dataset (default: "data/downloaded_datasets")
- `--extract`: Extract the downloaded archive (default: true)
- `--aws-access-key-id`: AWS access key ID (optional)
- `--aws-secret-access-key`: AWS secret access key (optional)
- `--region-name`: AWS region name (default: "us-east-1")
Requirements
Upload
- No additional dependencies (uses standard library)
Download
- `boto3` - AWS SDK for Python: `pip install boto3`
Usage Examples
Upload ARKit Dataset
CLI:
```bash
ylff dataset upload my_arkit_data.zip --output-dir data/sequences
```
API:
```python
import requests

with open("my_arkit_data.zip", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/v1/dataset/upload",
        files={"file": f},
        data={"output_dir": "data/sequences", "validate": "true"},
    )
job_id = response.json()["job_id"]
```
Download from S3
CLI:
```bash
ylff dataset download my-bucket datasets/v1.zip \
  --output-dir data/downloaded \
  --extract
```
API:
```python
import requests

response = requests.post(
    "http://localhost:8000/api/v1/dataset/download",
    json={
        "bucket_name": "my-bucket",
        "s3_key": "datasets/v1.zip",
        "output_dir": "data/downloaded",
        "extract": True,
    },
)
result = response.json()
```
Validation
The upload process validates:
- ✅ Zip file format
- ✅ Matching video-metadata pairs (same base name)
- ✅ Valid JSON metadata format
- ✅ File organization
Validation Report (example after this list):
- Total files in zip
- Video files count
- Metadata files count
- Valid pairs count
- Invalid pairs list
- Organized sequences count
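The exact report structure isn't reproduced here, but based on the fields above it might look roughly like this (illustrative values, hypothetical key names):

```python
# Hypothetical shape of the validation report; actual key names may differ.
report = {
    "total_files": 12,
    "video_files": 6,
    "metadata_files": 6,
    "valid_pairs": 5,
    "invalid_pairs": ["sequence_03.mov"],  # entries lacking a matching counterpart
    "organized_sequences": 5,
}
```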
AWS Credentials
The download functionality supports multiple credential methods:
- Explicit credentials (via API/CLI parameters)
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- IAM role (when running on EC2/ECS)
- Credentials file (`~/.aws/credentials`)
All methods are supported via boto3's default credentials chain.
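As a sketch of how the two paths differ (assuming the downloader simply forwards the optional keys to `boto3.client`):

```python
import boto3

# Path 1: explicit credentials supplied via the API request or CLI flags.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",    # from aws_access_key_id / --aws-access-key-id
    aws_secret_access_key="...",    # from aws_secret_access_key / --aws-secret-access-key
    region_name="us-east-1",
)

# Path 2: no keys supplied; boto3 resolves credentials from its default chain
# (environment variables, shared credentials file, then IAM role).
s3 = boto3.client("s3", region_name="us-east-1")
```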
Next Steps
- S3 Upload - Add ability to upload datasets to S3
- Dataset Listing - API endpoint to list available datasets in S3
- Incremental Downloads - Support for partial dataset downloads
- Compression Options - Configurable compression for uploads
- Metadata Validation - Enhanced ARKit metadata schema validation
All core functionality is implemented and ready to use!