jruffle committed on
Commit 1e555fe · verified · 1 Parent(s): 395bdbc

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +20 -44
  2. metadata.json +15 -0
  3. model.joblib +2 -2
README.md CHANGED
@@ -1,63 +1,39 @@
  ---
- title: Classical Methods (Transcriptome-centric, 2D)
- emoji: 📊
- colorFrom: purple
- colorTo: blue
- sdk: python
  tags:
  - transcriptomics
  - dimensionality-reduction
- - pca
- - umap
+ - classical
+ - TRACERx
+ - UMAP
+ - PCA
  license: mit
  ---

- # Classical Dimensionality Reduction (Transcriptome-centric, 2D)
+ # Classical Models (PCA + UMAP) - transcriptome mode - 2D

- Pre-trained PCA and UMAP models for transcriptomics data compression, part of the TRACERx Datathon 2025 project.
+ Pre-trained PCA and UMAP models for transcriptomic data compression.

- ## Model Details
+ **UMAP models support transform()** - new data can be projected into the same embedding space.

- - **Methods**: PCA and UMAP
- - **Compression Mode**: Transcriptome-centric
- - **Output Dimensions**: 2
- - **Training Data**: TRACERx open dataset (VST-normalized counts)
-
- ## Contents
-
- The model file contains:
- - **PCA**: Principal Component Analysis model
- - **UMAP**: Uniform Manifold Approximation and Projection model (2-4D only)
- - **Scaler**: StandardScaler fitted on TRACERx data
- - **Feature Order**: Gene/sample order for alignment
+ ## Details
+ - **Mode**: transcriptome-centric compression
+ - **Dimensions**: 2
+ - **Training data**: TRACERx lung cancer transcriptomics
+ - **Created**: 2026-01-13T16:56:13.982002
+ - **UMAP transform**: Enabled (low_memory=False)

  ## Usage

- These models are designed to be used with the TRACERx Datathon 2025 analysis pipeline.
- They will be automatically downloaded and cached when needed.
-
  ```python
  import joblib
+ from huggingface_hub import snapshot_download

- # Load the model bundle
- model_data = joblib.load("model.joblib")
+ # Download model
+ local_dir = snapshot_download("jruffle/classical_transcriptome_2d")
+ model = joblib.load(f"{local_dir}/model.joblib")

- # Access components
- pca = model_data['pca']
- scaler = model_data['scaler']
- gene_order = model_data.get('gene_order') # For sample-centric
+ # Model contains: 'pca', 'umap', 'robust_scaler', 'gene_order'

- # Transform new data
- scaled_data = scaler.transform(aligned_data)
- embeddings = pca.transform(scaled_data)
+ # Use UMAP transform on new data:
+ new_embeddings = model['umap'].transform(preprocessed_new_data)
  ```
-
- ## Training Details
-
- - **Input Features**: 1,051 samples
- - **Training Samples**: 20,136 genes
- - **Preprocessing**: StandardScaler normalization
-
- ## Files
-
- - `model.joblib`: Model bundle containing PCA, UMAP, scaler, and feature order
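
The comment in the new Usage snippet lists four bundle keys, while `metadata.json` in the same commit lists six. Below is a minimal sketch for inspecting a loaded bundle before calling anything on it. It assumes only that `joblib.load` returns a dict-like object; `describe_bundle` is a hypothetical helper, not part of the repository.

```python
def describe_bundle(bundle):
    """Summarise each entry of a loaded model bundle (assumed to be a dict).

    For every key, report the Python type name of the stored object and
    whether it exposes a callable ``transform`` method.
    """
    summary = {}
    for key, value in bundle.items():
        summary[key] = {
            "type": type(value).__name__,
            "has_transform": callable(getattr(value, "transform", None)),
        }
    return summary
```

With the real bundle this would be called as `describe_bundle(model)` after the `joblib.load` call shown in the README; an entry such as `'umap'` is only usable for projecting new data if `has_transform` is true.
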
metadata.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "model_type": "classical",
+   "mode": "transcriptome",
+   "dimensions": 2,
+   "created": "2026-01-13T16:56:13.982185",
+   "umap_transform_enabled": true,
+   "keys": [
+     "pca",
+     "robust_scaler",
+     "gene_order",
+     "sample_ids",
+     "umap",
+     "norm_params"
+   ]
+ }
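
The `keys` array above can be used as a sanity check on the loaded bundle. This is a rough sketch, assuming the bundle is a dict and that `keys` is meant to list its top-level entries (the commit does not state this explicitly); `compare_keys` is a hypothetical name.

```python
import json


def compare_keys(metadata_text, bundle):
    """Compare a bundle's keys with the "keys" list in metadata.json.

    Returns a pair of sorted lists: (keys listed in the metadata but
    missing from the bundle, keys in the bundle but not in the metadata).
    """
    expected = set(json.loads(metadata_text).get("keys", []))
    actual = set(bundle.keys())
    return sorted(expected - actual), sorted(actual - expected)
```

Two empty lists mean the bundle and the metadata agree. Anything in the first list is a component the metadata promises but the file lacks.
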
model.joblib CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7a078234fb7a7d98c93ee007defea741d40518906180092294af12199bed19d2
- size 349722542
+ oid sha256:e8958280fa3dfedb395e63983e543a50f580da35f2f325f9d5ef9d81b37a39e8
+ size 264433435
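
A Git LFS pointer records the SHA-256 digest and byte size of the real file, so a downloaded `model.joblib` can be checked against the pointer in this diff. The following is a sketch using only the standard library; `parse_lfs_pointer` and `matches_lfs_pointer` are hypothetical helper names.

```python
import hashlib
import os


def parse_lfs_pointer(pointer_text):
    """Return (sha256_hex, size_in_bytes) parsed from Git LFS pointer text."""
    fields = dict(
        line.strip().split(" ", 1)
        for line in pointer_text.splitlines()
        if line.strip()
    )
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_lfs_pointer(path, pointer_text, chunk_size=1 << 20):
    """Check a local file's size and SHA-256 digest against a pointer."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; avoids hashing a truncated file
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in chunks so a file of several hundred MB is not read at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == expected_digest
```

Passing the three `+`/context lines of the new pointer (without the diff markers) as `pointer_text`, together with the local path of the downloaded file, would confirm that the file corresponds to this commit rather than to parent 395bdbc.
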