feat(eda): normalize dataset paths and deduplicate results in summary 95969f7 whats2000 committed on Feb 13
feat(eda): retrieve chunk size for each dataset in batch processing 5e1e99a whats2000 committed on Feb 12
feat(eda): update large file processing to support parallel workers and enhance metadata caching 75aa70e whats2000 committed on Feb 12
feat(eda): categorize datasets into small, dask-ready, and xlarge for improved processing f06cfcb whats2000 committed on Feb 12
feat(eda): adjust worker settings and add emergency mode for handling failed slices for extremely large db122fd whats2000 committed on Feb 12
feat(eda): add adaptive scaling parameters and initial worker configuration for improved resource management d94a334 whats2000 committed on Feb 12
feat(eda): enhance resource utilization by optimizing worker allocation and processing parameters 32516b1 whats2000 committed on Feb 12
feat(metadata): add handling for missing datasets in CELLxGENE metadata and update status reporting 596560a whats2000 committed on Feb 12
feat(eda): add cache validation and retry mechanism for metadata build 6eb2e4a whats2000 committed on Feb 12
fix(eda): optimize gene statistics calculation in distributed EDA ac07329 whats2000 committed on Feb 12
fix(eda): remove undefined 'info' variable reference causing crash 19a6596 whats2000 committed on Feb 12
refactor(slurm): update resource allocation and remove deprecated script 08c5297 whats2000 committed on Feb 12
feat(eda): implement hybrid processing strategy for small and large datasets b8d98f3 whats2000 committed on Feb 11
feat(eda): refactor distributed EDA script for improved performance and memory management 74e20c3 whats2000 committed on Feb 11
fix(eda): optimize memory usage and ensure complete data computation 14cc169 whats2000 committed on Feb 11
feat(eda): migrate to Dask distributed with adaptive scaling and memory limits 450c8b2 whats2000 committed on Feb 11
fix(eda): use recent throughput instead of cumulative average for adaptive scaling d25b7a0 whats2000 committed on Feb 11
feat(eda): add adaptive worker reduction based on throughput monitoring e4396ec whats2000 committed on Feb 11
fix(config): remove mem_per_worker_gib from config files and calculate dynamically in resource_probe script 0c8f912 whats2000 committed on Feb 11
fix(config): clarify max_memory_gib allocation for staged processing 2cae30c whats2000 committed on Feb 11
fix(retry): add size categorization after merge to prevent null categories 05143cc whats2000 committed on Feb 11
fix(eda): include all successfully scanned datasets (ok_retry, ok_h5py) d2cd091 whats2000 committed on Feb 11
feat(recovery): add corrupted file redownload script and documentation cee344c whats2000 committed on Feb 11
feat(retry_failed_cache): implement dataset retry mechanism and merging of results 874e4c6 whats2000 committed on Feb 11
fix(cache): implement two-phase scanning to handle large files serially and prevent OOM 3ec846f whats2000 committed on Feb 11
feat(pipeline): add YAML config, metadata-aware scheduling, and dataset slicing 856e1ba whats2000 committed on Feb 11
Initial commit: distributed EDA pipeline, max non-zero reporting, and notebook dddcc0f whats2000 committed on Feb 11