Commit History

feat(data): add new .h5ad sample file for scRNA-seq dataset
febaca1

whats2000 committed on

fix(gitattributes): add .h5ad file type to LFS tracking
e829063

whats2000 committed on

Merge branch 'main' of https://huggingface.co/NCHC-bio/cell_x_gene_visualization into main
a61de1d

whats2000 committed on

feat(visualization): add detailed gene visualization result
0aa3021

whats2000 committed on

Upload d7476ae2-e320-4703-8304-da5c42627e71__HTAPP-330-SMP-1082_scRNA-seq.h5ad
e0bac17
verified

freshnemo committed on

fix(notebook): fix the notebook code
766b5e7

whats2000 committed on

feat(eda): normalize dataset paths and deduplicate results in summary
95969f7

whats2000 committed on

feat(eda): retrieve chunk size for each dataset in batch processing
5e1e99a

whats2000 committed on

feat(config): update dataset size thresholds for improved processing efficiency
b1d3f22

whats2000 committed on

feat(eda): update large file processing to support parallel workers and enhance metadata caching
75aa70e

whats2000 committed on

feat(config): fix some hard-coded config values and docs
685b361

whats2000 committed on

feat(eda): categorize datasets into small, dask-ready, and xlarge for improved processing
f06cfcb

whats2000 committed on

feat(eda): adjust worker settings and add emergency mode for handling failed slices of extremely large datasets
db122fd

whats2000 committed on

feat(eda): enhance dataset processing for extra-large datasets
2cdb847

whats2000 committed on

fix(eda): correct max_workers and min_workers values for optimal resource allocation
311496c

whats2000 committed on

feat(eda): add adaptive scaling parameters and initial worker configuration for improved resource management
d94a334

whats2000 committed on

feat(eda): enhance resource utilization by optimizing worker allocation and processing parameters
32516b1

whats2000 committed on

feat(eda): optimize resource allocation and processing parameters for enhanced performance
5910420

whats2000 committed on

feat(metadata): add handling for missing datasets in CELLxGENE metadata and update status reporting
596560a

whats2000 committed on

feat(eda): add cache validation and retry mechanism for metadata build
6eb2e4a

whats2000 committed on

fix(eda): optimize gene statistics calculation in distributed EDA
ac07329

whats2000 committed on

feat(eda): update resource specifications for optimized performance
4e03c42

whats2000 committed on

feat(slurm): create logs directory and add to gitignore
cbe0341

whats2000 committed on

fix(eda): remove undefined 'info' variable reference causing crash
19a6596

whats2000 committed on

fix(slurm): correct job time allocation in SLURM script
6379f62

whats2000 committed on

refactor(slurm): update resource allocation and remove deprecated script
08c5297

whats2000 committed on

feat(slurm): add SKIP_CACHE_BUILD option to skip metadata cache building
5d80e52

whats2000 committed on

feat(eda): add resume capability and graceful error handling
34c9a87

whats2000 committed on

feat(eda): implement hybrid processing strategy for small and large datasets
b8d98f3

whats2000 committed on

feat(eda): refactor distributed EDA script for improved performance and memory management
74e20c3

whats2000 committed on

fix(eda): optimize memory usage and ensure complete data computation
14cc169

whats2000 committed on

feat(eda): migrate to Dask distributed with adaptive scaling and memory limits
450c8b2

whats2000 committed on

fix(eda): use recent throughput instead of cumulative average for adaptive scaling
d25b7a0

whats2000 committed on

feat(eda): add adaptive worker reduction based on throughput monitoring
e4396ec

whats2000 committed on

fix(config): remove mem_per_worker_gib from config files and calculate dynamically in resource_probe script
0c8f912

whats2000 committed on

fix(config): clarify max_memory_gib allocation for staged processing
2cae30c

whats2000 committed on

fix(config): increase max_entries to 1T to include 520B entry dataset
2138486

whats2000 committed on

fix(retry): add size categorization after merge to prevent null categories
05143cc

whats2000 committed on

fix(retry): include corrupted status in retry logic
8822a1a

whats2000 committed on

fix(cache): categorize ok_retry and ok_h5py datasets by size
cc36ee1

whats2000 committed on

fix(eda): include all successfully scanned datasets (ok_retry, ok_h5py)
d2cd091

whats2000 committed on

docs(cache): clarify incremental cache behavior and metadata skip option
436909b

whats2000 committed on

feat(cache): add enhanced metadata cache to repository
2581661

whats2000 committed on

feat(recovery): add corrupted file redownload script and documentation
cee344c

whats2000 committed on

feat(retry_failed_cache): implement dataset retry mechanism and merging of results
874e4c6

whats2000 committed on

fix(cache): implement two-phase scanning to handle large files serially and prevent OOM
3ec846f

whats2000 committed on

fix(cache): use ProcessPoolExecutor for HDF5 thread-safety
655350f

whats2000 committed on

feat(pipeline): add YAML config, metadata-aware scheduling, and dataset slicing
856e1ba

whats2000 committed on

fix(eda): prevent BrokenProcessPool cascade failures
790f422

whats2000 committed on

Initial commit: distributed EDA pipeline, max non-zero reporting, and notebook
dddcc0f

whats2000 committed on