Commit History

All commits authored by whats2000.

feat(eda): normalize dataset paths and deduplicate results in summary (95969f7)
feat(eda): retrieve chunk size for each dataset in batch processing (5e1e99a)
feat(eda): update large file processing to support parallel workers and enhance metadata caching (75aa70e)
feat(config): fix hard-coded config values and docs (685b361)
feat(eda): categorize datasets into small, dask-ready, and xlarge for improved processing (f06cfcb)
feat(eda): adjust worker settings and add emergency mode for handling failed slices in extremely large datasets (db122fd)
feat(eda): enhance dataset processing for extra-large datasets (2cdb847)
feat(eda): add adaptive scaling parameters and initial worker configuration for improved resource management (d94a334)
feat(eda): enhance resource utilization by optimizing worker allocation and processing parameters (32516b1)
feat(metadata): add handling for missing datasets in CELLxGENE metadata and update status reporting (596560a)
feat(eda): add cache validation and retry mechanism for metadata build (6eb2e4a)
fix(eda): optimize gene statistics calculation in distributed EDA (ac07329)
fix(eda): remove undefined 'info' variable reference causing crash (19a6596)
refactor(slurm): update resource allocation and remove deprecated script (08c5297)
feat(eda): add resume capability and graceful error handling (34c9a87)
feat(eda): implement hybrid processing strategy for small and large datasets (b8d98f3)
feat(eda): refactor distributed EDA script for improved performance and memory management (74e20c3)
fix(eda): optimize memory usage and ensure complete data computation (14cc169)
feat(eda): migrate to Dask distributed with adaptive scaling and memory limits (450c8b2)
fix(eda): use recent throughput instead of cumulative average for adaptive scaling (d25b7a0)
feat(eda): add adaptive worker reduction based on throughput monitoring (e4396ec)
fix(config): remove mem_per_worker_gib from config files and calculate it dynamically in the resource_probe script (0c8f912)
fix(config): clarify max_memory_gib allocation for staged processing (2cae30c)
fix(retry): add size categorization after merge to prevent null categories (05143cc)
fix(retry): include corrupted status in retry logic (8822a1a)
fix(cache): categorize ok_retry and ok_h5py datasets by size (cc36ee1)
fix(eda): include all successfully scanned datasets (ok_retry, ok_h5py) (d2cd091)
feat(recovery): add corrupted-file redownload script and documentation (cee344c)
feat(retry_failed_cache): implement dataset retry mechanism and merging of results (874e4c6)
fix(cache): implement two-phase scanning to handle large files serially and prevent OOM (3ec846f)
fix(cache): use ProcessPoolExecutor for HDF5 thread-safety (655350f)
feat(pipeline): add YAML config, metadata-aware scheduling, and dataset slicing (856e1ba)
fix(eda): prevent BrokenProcessPool cascade failures (790f422)
Initial commit: distributed EDA pipeline, max non-zero reporting, and notebook (dddcc0f)