Commit History

All commits authored by whats2000.

feat(eda): normalize dataset paths and deduplicate results in summary (95969f7)
feat(eda): retrieve chunk size for each dataset in batch processing (5e1e99a)
feat(eda): update large file processing to support parallel workers and enhance metadata caching (75aa70e)
feat(config): fix hard-coded config values and docs (685b361)
feat(eda): categorize datasets into small, dask-ready, and xlarge for improved processing (f06cfcb)
feat(eda): adjust worker settings and add emergency mode for handling failed slices in extremely large datasets (db122fd)
feat(eda): enhance dataset processing for extra-large datasets (2cdb847)
feat(eda): add adaptive scaling parameters and initial worker configuration for improved resource management (d94a334)
feat(eda): enhance resource utilization by optimizing worker allocation and processing parameters (32516b1)
feat(metadata): add handling for missing datasets in CELLxGENE metadata and update status reporting (596560a)
feat(eda): add cache validation and retry mechanism for metadata build (6eb2e4a)
fix(eda): optimize gene statistics calculation in distributed EDA (ac07329)
fix(eda): remove undefined 'info' variable reference causing crash (19a6596)
refactor(slurm): update resource allocation and remove deprecated script (08c5297)
feat(eda): add resume capability and graceful error handling (34c9a87)
feat(eda): implement hybrid processing strategy for small and large datasets (b8d98f3)
feat(eda): refactor distributed EDA script for improved performance and memory management (74e20c3)
fix(eda): optimize memory usage and ensure complete data computation (14cc169)
feat(eda): migrate to Dask distributed with adaptive scaling and memory limits (450c8b2)
fix(eda): use recent throughput instead of cumulative average for adaptive scaling (d25b7a0)
feat(eda): add adaptive worker reduction based on throughput monitoring (e4396ec)
fix(config): remove mem_per_worker_gib from config files and calculate it dynamically in the resource_probe script (0c8f912)
fix(config): clarify max_memory_gib allocation for staged processing (2cae30c)
fix(retry): add size categorization after merge to prevent null categories (05143cc)
fix(retry): include corrupted status in retry logic (8822a1a)
fix(cache): categorize ok_retry and ok_h5py datasets by size (cc36ee1)
fix(eda): include all successfully scanned datasets (ok_retry, ok_h5py) (d2cd091)
feat(recovery): add corrupted-file redownload script and documentation (cee344c)
feat(retry_failed_cache): implement dataset retry mechanism and merging of results (874e4c6)
fix(cache): implement two-phase scanning to handle large files serially and prevent OOM (3ec846f)
fix(cache): use ProcessPoolExecutor for HDF5 thread-safety (655350f)
feat(pipeline): add YAML config, metadata-aware scheduling, and dataset slicing (856e1ba)
fix(eda): prevent BrokenProcessPool cascade failures (790f422)
Initial commit: distributed EDA pipeline, max non-zero reporting, and notebook (dddcc0f)