NCHC-bio/cell_x_gene_exploratory_data_analysis

Commit History

feat(data): add new .h5ad sample file for scRNA-seq dataset

febaca1

whats2000 committed on Feb 13

fix(gitattributes): add .h5ad file type to LFS tracking

e829063

whats2000 committed on Feb 13

Merge branch 'main' of https://huggingface.co/NCHC-bio/cell_x_gene_visualization into main

a61de1d

whats2000 committed on Feb 13

feat(visualization): add detailed gene visualization result

0aa3021

whats2000 committed on Feb 13

Upload d7476ae2-e320-4703-8304-da5c42627e71__HTAPP-330-SMP-1082_scRNA-seq.h5ad

e0bac17
verified

freshnemo committed on Feb 13

fix(notebook): fix the notebook code

766b5e7

whats2000 committed on Feb 13

feat(eda): normalize dataset paths and deduplicate results in summary

95969f7

whats2000 committed on Feb 13

feat(eda): retrieve chunk size for each dataset in batch processing

5e1e99a

whats2000 committed on Feb 12

feat(config): update dataset size thresholds for improved processing efficiency

b1d3f22

whats2000 committed on Feb 12

feat(eda): update large file processing to support parallel workers and enhance metadata caching

75aa70e

whats2000 committed on Feb 12

feat(config): fix some hard-coded config and docs

685b361

whats2000 committed on Feb 12

feat(eda): categorize datasets into small, dask-ready, and xlarge for improved processing

f06cfcb

whats2000 committed on Feb 12

feat(eda): adjust worker settings and add emergency mode for handling failed slices for extremely large

db122fd

whats2000 committed on Feb 12

feat(eda): enhance dataset processing for extra large

2cdb847

whats2000 committed on Feb 12

fix(eda): correct max_workers and min_workers values for optimal resource allocation

311496c

whats2000 committed on Feb 12

feat(eda): add adaptive scaling parameters and initial worker configuration for improved resource management

d94a334

whats2000 committed on Feb 12

feat(eda): enhance resource utilization by optimizing worker allocation and processing parameters

32516b1

whats2000 committed on Feb 12

feat(eda): optimize resource allocation and processing parameters for enhanced performance

5910420

whats2000 committed on Feb 12

feat(metadata): add handling for missing datasets in CELLxGENE metadata and update status reporting

596560a

whats2000 committed on Feb 12

feat(eda): add cache validation and retry mechanism for metadata build

6eb2e4a

whats2000 committed on Feb 12

fix(eda): optimize gene statistics calculation in distributed EDA

ac07329

whats2000 committed on Feb 12

feat(eda): update resource specifications for optimized performance

4e03c42

whats2000 committed on Feb 12

feat(slurm): create logs directory and add to gitignore

cbe0341

whats2000 committed on Feb 12

fix(eda): remove undefined 'info' variable reference causing crash

19a6596

whats2000 committed on Feb 12

fix(slurm): correct job time allocation in SLURM script

6379f62

whats2000 committed on Feb 12

refactor(slurm): update resource allocation and remove deprecated script

08c5297

whats2000 committed on Feb 12

feat(slurm): add SKIP_CACHE_BUILD option to skip metadata cache building

5d80e52

whats2000 committed on Feb 12

feat(eda): add resume capability and graceful error handling

34c9a87

whats2000 committed on Feb 12

feat(eda): implement hybrid processing strategy for small and large datasets

b8d98f3

whats2000 committed on Feb 11

feat(eda): refactor distributed EDA script for improved performance and memory management

74e20c3

whats2000 committed on Feb 11

fix(eda): optimize memory usage and ensure complete data computation

14cc169

whats2000 committed on Feb 11

feat(eda): migrate to Dask distributed with adaptive scaling and memory limits

450c8b2

whats2000 committed on Feb 11

fix(eda): use recent throughput instead of cumulative average for adaptive scaling

d25b7a0

whats2000 committed on Feb 11

feat(eda): add adaptive worker reduction based on throughput monitoring

e4396ec

whats2000 committed on Feb 11

fix(config): remove mem_per_worker_gib from config files and calculate dynamically in resource_probe script

0c8f912

whats2000 committed on Feb 11

fix(config): clarify max_memory_gib allocation for staged processing

2cae30c

whats2000 committed on Feb 11

fix(config): increase max_entries to 1T to include 520B entry dataset

2138486

whats2000 committed on Feb 11

fix(retry): add size categorization after merge to prevent null categories

05143cc

whats2000 committed on Feb 11

fix(retry): include corrupted status in retry logic

8822a1a

whats2000 committed on Feb 11

fix(cache): categorize ok_retry and ok_h5py datasets by size

cc36ee1

whats2000 committed on Feb 11

fix(eda): include all successfully scanned datasets (ok_retry, ok_h5py)

d2cd091

whats2000 committed on Feb 11

docs(cache): clarify incremental cache behavior and metadata skip option

436909b

whats2000 committed on Feb 11

feat(cache): add enhanced metadata cache to repository

2581661

whats2000 committed on Feb 11

feat(recovery): add corrupted file redownload script and documentation

cee344c

whats2000 committed on Feb 11

feat(retry_failed_cache): implement dataset retry mechanism and merging of results

874e4c6

whats2000 committed on Feb 11

fix(cache): implement two-phase scanning to handle large files serially and prevent OOM

3ec846f

whats2000 committed on Feb 11

fix(cache): use ProcessPoolExecutor for HDF5 thread-safety

655350f

whats2000 committed on Feb 11

feat(pipeline): add YAML config, metadata-aware scheduling, and dataset slicing

856e1ba

whats2000 committed on Feb 11

fix(eda): prevent BrokenProcessPool cascade failures

790f422

whats2000 committed on Feb 11

Initial commit: distributed EDA pipeline, max non-zero reporting, and notebook

dddcc0f

whats2000 committed on Feb 11
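
Note on d25b7a0 ("use recent throughput instead of cumulative average for adaptive scaling"): the repository's actual implementation is not shown in this history, so the following is only a minimal sketch of the general idea. The class name `ThroughputWindow`, its methods, and the window length are hypothetical and chosen for illustration. It shows why a sliding-window rate reacts to a slowdown sooner than a cumulative average, which is diluted by earlier fast progress.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class ThroughputWindow:
    """Hypothetical helper: reports recent and cumulative throughput (items/second)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, items) inside the window
        self._start: Optional[float] = None
        self._last: Optional[float] = None
        self._total = 0

    def record(self, timestamp: float, items: int) -> None:
        """Register `items` completed at `timestamp` (seconds, monotonically increasing)."""
        if self._start is None:
            self._start = timestamp
        self._last = timestamp
        self._total += items
        self._events.append((timestamp, items))
        # Drop events that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def recent(self) -> float:
        """Items per second over the last `window_seconds` only."""
        return sum(n for _, n in self._events) / self.window_seconds

    def cumulative(self) -> float:
        """Items per second averaged over the whole run so far."""
        if self._start is None or self._last is None:
            return 0.0
        elapsed = self._last - self._start
        return self._total / elapsed if elapsed > 0 else 0.0
```

In this sketch, a scaler that compares `recent()` against a threshold would notice a stall within one window, whereas `cumulative()` stays high for as long as the earlier fast phase dominates the average.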