Dataset Upload & Download - Implementation Complete

Dataset upload and download functionality has been implemented for ARKit datasets.

✅ Implemented Features

1. Dataset Upload (ylff/utils/dataset_upload.py)

Functions:

  • ✅ validate_arkit_zip() - Validate zip file contains valid ARKit video-metadata pairs
  • ✅ extract_arkit_zip() - Extract and organize ARKit zip file into sequence directories
  • ✅ process_uploaded_dataset() - Complete upload processing pipeline

Features:

  • Validates zip file format
  • Checks for matching video-metadata pairs (same base name)
  • Validates JSON metadata format
  • Organizes files into sequence directories
  • Reports validation errors and statistics
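
The function names above come straight from the module; the call signatures and return shapes in this sketch are assumptions for illustration, not the verbatim API:

from pathlib import Path

from ylff.utils.dataset_upload import (
    validate_arkit_zip,
    extract_arkit_zip,
    process_uploaded_dataset,
)

zip_path = Path("arkit_dataset.zip")
output_dir = Path("data/uploaded_datasets")

# Assumed: validation returns a report dict; the key name is illustrative.
report = validate_arkit_zip(zip_path)
if not report.get("errors"):
    # Assumed: extraction organizes pairs into per-sequence directories.
    extract_arkit_zip(zip_path, output_dir)

# Or run validation + extraction as one pipeline call:
result = process_uploaded_dataset(zip_path, output_dir, validate=True)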

2. Dataset Download (ylff/utils/dataset_download.py)

S3DatasetDownloader Class:

  • ✅ S3 client initialization with credentials
  • ✅ list_datasets() - List available datasets in S3 bucket
  • ✅ download_dataset() - Download dataset from S3 with progress
  • ✅ download_and_extract() - Download and extract dataset

Features:

  • AWS credentials support (access key or credentials chain)
  • Progress bar for downloads
  • Automatic extraction (zip, tar.gz, tar)
  • Error handling and reporting
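
A minimal usage sketch of the class; the constructor keywords are assumed to mirror the request-model fields shown later, and the positional arguments are illustrative rather than the verbatim signature:

from ylff.utils.dataset_download import S3DatasetDownloader

# Keyword names assumed to mirror the request model; passing None for
# the keys falls back to boto3's default credential chain.
downloader = S3DatasetDownloader(
    aws_access_key_id=None,
    aws_secret_access_key=None,
    region_name="us-east-1",
)

# List what is available, then fetch and unpack one dataset.
datasets = downloader.list_datasets("my-datasets-bucket")
downloader.download_and_extract(
    "my-datasets-bucket",
    "datasets/arkit_sequences.zip",
    "data/downloaded_datasets",
)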

📋 API Endpoints

/api/v1/dataset/upload (POST)

Request: Multipart form data

  • file: Zip file containing ARKit video and metadata pairs
  • output_dir: Directory to extract dataset (default: "data/uploaded_datasets")
  • validate: Validate ARKit pairs before extraction (default: true)

Response: JobResponse (async job)

Example:

curl -X POST "http://localhost:8000/api/v1/dataset/upload" \
  -F "file=@arkit_dataset.zip" \
  -F "output_dir=data/uploaded_datasets" \
  -F "validate=true"

/api/v1/dataset/download (POST)

Request Model: DownloadDatasetRequest

{
  "bucket_name": "my-datasets-bucket",
  "s3_key": "datasets/arkit_sequences.zip",
  "output_dir": "data/downloaded_datasets",
  "extract": true,
  "aws_access_key_id": null,
  "aws_secret_access_key": null,
  "region_name": "us-east-1"
}

Response: DownloadDatasetResponse

  • success: Boolean
  • output_path: Path to downloaded file (if not extracted)
  • output_dir: Directory where dataset was extracted (if extracted)
  • file_size: Size of downloaded file in bytes
  • error: Error message if download failed
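
For example, a successful download with extraction might return (values illustrative):

{
  "success": true,
  "output_path": null,
  "output_dir": "data/downloaded_datasets",
  "file_size": 104857600,
  "error": null
}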

🔧 CLI Commands

ylff dataset upload

ylff dataset upload arkit_dataset.zip \
    --output-dir data/uploaded_datasets \
    --validate

Options:

  • zip_path: Path to zip file (required)
  • --output-dir: Directory to extract dataset (default: "data/uploaded_datasets")
  • --validate: Validate ARKit pairs before extraction (default: true)

ylff dataset download

ylff dataset download my-bucket datasets/arkit.zip \
    --output-dir data/downloaded_datasets \
    --extract \
    --region-name us-east-1

Options:

  • bucket_name: S3 bucket name (required)
  • s3_key: S3 object key (required)
  • --output-dir: Directory to save dataset (default: "data/downloaded_datasets")
  • --extract: Extract downloaded archive (default: true)
  • --aws-access-key-id: AWS access key ID (optional)
  • --aws-secret-access-key: AWS secret access key (optional)
  • --region-name: AWS region name (default: "us-east-1")

📦 Requirements

Upload

  • No additional dependencies (uses the Python standard library)

Download

  • boto3 - AWS SDK for Python
    pip install boto3
    

🔄 Usage Examples

Upload ARKit Dataset

CLI:

ylff dataset upload my_arkit_data.zip --output-dir data/sequences

API:

import requests

with open("my_arkit_data.zip", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/v1/dataset/upload",
        files={"file": f},
        data={"output_dir": "data/sequences", "validate": "true"}
    )
    job_id = response.json()["job_id"]
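
Because the upload runs as an async job, clients typically poll for completion. Continuing from the snippet above; the status route below is hypothetical and for illustration only, as this document does not define the actual jobs endpoint:

import time

import requests

# NOTE: /api/v1/jobs/{job_id} is a hypothetical route; substitute the
# jobs endpoint your deployment actually exposes.
while True:
    status = requests.get(f"http://localhost:8000/api/v1/jobs/{job_id}").json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)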

Download from S3

CLI:

ylff dataset download my-bucket datasets/v1.zip \
    --output-dir data/downloaded \
    --extract

API:

import requests

response = requests.post(
    "http://localhost:8000/api/v1/dataset/download",
    json={
        "bucket_name": "my-bucket",
        "s3_key": "datasets/v1.zip",
        "output_dir": "data/downloaded",
        "extract": True,
    }
)
result = response.json()
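
The call is synchronous and reports success explicitly, so check it before using the returned paths (field names are those of DownloadDatasetResponse above):

if result["success"]:
    print("Extracted to:", result["output_dir"])
else:
    print("Download failed:", result["error"])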

📊 Validation

The upload process validates:

  • ✅ Zip file format
  • ✅ Matching video-metadata pairs (same base name)
  • ✅ Valid JSON metadata format
  • ✅ File organization

Validation Report:

  • Total files in zip
  • Video files count
  • Metadata files count
  • Valid pairs count
  • Invalid pairs list
  • Organized sequences count
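
As a concrete illustration, such a report might look like the dict below; only the reported quantities come from the list above, the key names and values are assumed:

report = {
    "total_files": 40,
    "video_files": 20,
    "metadata_files": 20,
    "valid_pairs": 19,
    "invalid_pairs": ["seq_07"],  # e.g. a pair whose metadata failed JSON validation
    "organized_sequences": 19,
}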

🔐 AWS Credentials

The download functionality supports multiple credential methods:

  1. Explicit credentials (via API/CLI parameters)
  2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  3. IAM role (when running on EC2/ECS)
  4. Credentials file (~/.aws/credentials)

Explicit credentials take precedence when provided; the remaining methods are resolved automatically through boto3's default credential chain.

🚀 Next Steps

  1. S3 Upload - Add ability to upload datasets to S3
  2. Dataset Listing - API endpoint to list available datasets in S3
  3. Incremental Downloads - Support for partial dataset downloads
  4. Compression Options - Configurable compression for uploads
  5. Metadata Validation - Enhanced ARKit metadata schema validation

All core functionality is implemented and ready to use! 🎉