statLens

Key features

Self-hosted, no external API calls. Your data never leaves the box.
13 differential expression analysis (DEA) pipelines covering the common scenarios (Count / Continuous × basic / batch / paired / multi-group / time-course / interaction, plus ZINB for zero-inflated counts), or none_of_these if your study sits outside the supported space.
Editable schema in the middle. statLens shows you the 21-field study-design summary it extracted, lets you fix anything wrong, then picks the pipeline.
End-to-end ≈ 25–45 s per request on a single 24 GB-class NVIDIA GPU.
Wheel install + one command to launch: pip install and statlens serve are all you need.
Reproducible — every run is a self-contained folder you can zip and ship.

Table of contents

Prerequisite
Quick start
Where statlens looks for the base model
Screenshots
Hardware
TSV format
Output
Use cases
Pipeline labels
Interpreting results
Headless server
Subcommands
API endpoints
Models
Training (LoRA only)
Troubleshooting
Citation

Prerequisite

You need Qwen3-32B (~64 GB BF16) on local disk. If you don't have it:

huggingface-cli download Qwen/Qwen3-32B --local-dir ~/models/qwen3-32b

Quick start

# 1. Install statLens (~5 min on first run; pulls ~2 GB of CUDA dependencies)
pip install ${HF_ENDPOINT:-https://huggingface.co}/domizzz2025/statLens/resolve/main/statlens-0.1.11-py3-none-any.whl

# 2. Launch
statlens serve

After ~80 s you should see:

══════════════════════════════════════════════════════
  ✅ statLens ready
     open in browser:  http://localhost:7860/
     Ctrl+C to stop.
══════════════════════════════════════════════════════

Open http://localhost:7860, drop a TSV, write a short study-context description, click Classify & run, review the extracted schema, then click Run pipeline →.

Where statlens looks for the base model

When you run statlens serve, it auto-discovers Qwen3-32B in any of:

~/models/qwen3-32b/
/root/autodl-tmp/models/qwen3-32b/ (AutoDL)
/workspace/models/qwen3-32b/ (RunPod / Lambda)
/data/models/qwen3-32b/, /mnt/models/qwen3-32b/
the HuggingFace Hub cache

If yours is elsewhere, point statlens serve at it explicitly:

statlens serve --base-model /path/to/qwen3-32b
# or, persistently:
export STATLENS_BASE_MODEL=/path/to/qwen3-32b
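
To see which location would be picked up on a given machine, the search can be approximated with a short Python sketch. This is not the statLens implementation: the precedence (explicit path, then `STATLENS_BASE_MODEL`, then the listed directories) and the "directory contains `config.json`" test are assumptions, and the Hugging Face Hub cache is not checked.

```python
import os
from pathlib import Path

# Directories copied from the auto-discovery list above (Hub cache omitted).
CANDIDATE_DIRS = [
    "~/models/qwen3-32b",
    "/root/autodl-tmp/models/qwen3-32b",
    "/workspace/models/qwen3-32b",
    "/data/models/qwen3-32b",
    "/mnt/models/qwen3-32b",
]


def find_base_model(cli_path=None, env=None, candidates=CANDIDATE_DIRS):
    """Return the first directory that looks like a model checkpoint, or None.

    Assumed precedence (not confirmed by statLens): explicit path,
    then STATLENS_BASE_MODEL, then the auto-search list.
    """
    env = os.environ if env is None else env
    ordered = [cli_path, env.get("STATLENS_BASE_MODEL"), *candidates]
    for raw in ordered:
        if not raw:
            continue
        path = Path(raw).expanduser()
        # Assumption: a usable checkpoint directory contains config.json.
        if (path / "config.json").is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_base_model() or "no base model found in the listed locations")
```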

Screenshots


Step 1: Upload + describe. Drop a wide-format TSV and write a short study description.
Step 2: Review the extracted schema. statLens shows the 21 fields it inferred from your data; edit anything that looks wrong before continuing.
Step 3: Get plots and tables. The matched DESeq2 / limma pipeline runs and returns 5 plots and the result tables, packaged as a downloadable zip.

Hardware


GPU	NVIDIA, ≥ 22 GB VRAM (RTX 3090 / 4090 / A40 / A100 / 5090 …)
OS	Linux x86_64
RAM	32 GB+
Disk	75 GB free (64 GB Qwen + 1 GB LoRA + working space)

Mac / Windows / AMD: not supported (LLaMA-Factory + bitsandbytes are CUDA-only).

TSV format

Wide format, one row per sample. Required columns:

column	meaning
`sample_id`	unique per row
`subject_id`	repeats for paired or longitudinal samples
a design column	`group` / `condition` / `treatment` / `clinical_group` / `tumor_stage` / `subtype` / `arm` / … (fuzzy-matched)
feature columns	prefixed with exactly one of: `gene_`, `asv_`, `prot_`, `metab_`, `otu_`, `feat_`. Other prefixes are not recognised and the adapter will report `No feature columns found`.

Optional: time-like columns (time_day, collection_day, time_week, …) and batch-like columns (batch, site, run, ms_batch, …).
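
A quick pre-flight check can catch the most common format problems before uploading. The sketch below only mirrors the rules stated in this section; the function name `check_tsv_header` is hypothetical, and the design-column test uses exact names only, whereas statLens fuzzy-matches.

```python
import csv

FEATURE_PREFIXES = ("gene_", "asv_", "prot_", "metab_", "otu_", "feat_")
# Exact names from the table above; statLens itself fuzzy-matches more variants.
DESIGN_NAMES = {"group", "condition", "treatment", "clinical_group",
                "tumor_stage", "subtype", "arm"}


def check_tsv_header(path):
    """Return a list of human-readable problems found in a wide-format TSV."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        columns = reader.fieldnames or []
        sample_ids = [row.get("sample_id") for row in reader]

    problems = []
    for required in ("sample_id", "subject_id"):
        if required not in columns:
            problems.append(f"missing required column: {required}")
    if not DESIGN_NAMES & set(columns):
        problems.append("no exact-match design column (fuzzy matching may still find one)")
    if not any(c.startswith(FEATURE_PREFIXES) for c in columns):
        problems.append("no feature columns found")
    if "sample_id" in columns and len(set(sample_ids)) != len(sample_ids):
        problems.append("sample_id values are not unique")
    return problems
```

An empty list means the header passes these checks; it does not guarantee that the pipeline will accept the file.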

13 demonstration TSVs ship with the package — list them with:

ls $(python3 -c 'import statlens, os; print(os.path.dirname(statlens.__file__))')/data/examples/

Output

Each run lands under ~/.cache/statlens/runs/<run_id>/:

out/
├── statlens_report.md              # human-readable summary + reasoning
├── statlens_report.json            # machine-readable sidecar
└── pipeline_output/
    ├── volcano_plot.png
    ├── PCA_plot.png
    ├── MA_plot.png
    ├── top_DE_genes_heatmap.png
    ├── top20_DE_genes_barplot.png
    ├── results.csv
    ├── significant_genes.csv
    └── run.log
result.zip                          # everything above, packaged

The Download all button in the web UI returns result.zip.
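
When archiving runs, it can help to confirm that a run folder is complete. The sketch below checks for the files in the tree above, assuming `result.zip` sits next to `out/` inside the run folder as the listing suggests; `missing_artifacts` is a hypothetical helper, not part of statLens.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = [
    "out/statlens_report.md",
    "out/statlens_report.json",
    "out/pipeline_output/volcano_plot.png",
    "out/pipeline_output/PCA_plot.png",
    "out/pipeline_output/MA_plot.png",
    "out/pipeline_output/top_DE_genes_heatmap.png",
    "out/pipeline_output/top20_DE_genes_barplot.png",
    "out/pipeline_output/results.csv",
    "out/pipeline_output/significant_genes.csv",
    "out/pipeline_output/run.log",
    "result.zip",
]


def missing_artifacts(run_dir):
    """Return the expected files that are absent from a run folder."""
    run_dir = Path(run_dir).expanduser()
    return [rel for rel in EXPECTED_FILES if not (run_dir / rel).is_file()]
```

For example, `missing_artifacts("~/.cache/statlens/runs/<run_id>")` with a real run ID in place of the placeholder returns an empty list for a complete run.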

Use cases

Three representative scenarios from the 13 supported pipelines:

Scenario	Study	Required columns	Expected label
Bulk RNA-seq case-control	30 patients, 15 case vs 15 control, single sequencing batch, looking for DE genes	`sample_id`, `subject_id`, `group`, `gene_*`	`Count_DESeq2_basic`
Plasma proteomics two-arm	LC-MS/MS, 12 cases vs 12 controls, log2 intensity, single MS run	`sample_id`, `subject_id`, `group`, `prot_*`	`Continuous_limma_basic`
16S microbiome IBD vs Healthy	sparse ASV counts dominated by zeros (>40 %)	`sample_id`, `subject_id`, `condition`, `asv_*`	`Count_DESeq2_ZINB`

A full set of 13 paired demos — one (.tsv + matching .context.txt) per label — lives under examples/. Drop any .tsv into the web UI and paste the matching .context.txt as your study description to reproduce the scenario in one click.
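
For the microbiome scenario, the ">40 % zeros" criterion can be estimated before upload. How statLens measures sparsity is not documented here, so the sketch below makes an assumption: it computes the fraction of zero entries across the whole feature matrix (all columns with the given prefix).

```python
import csv


def zero_fraction(path, prefix="asv_"):
    """Fraction of zero entries across all feature columns starting with `prefix`.

    Assumption: sparsity is measured over the whole matrix, not per feature.
    """
    zeros = 0
    total = 0
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            for name, value in row.items():
                if name and name.startswith(prefix):
                    total += 1
                    if float(value) == 0.0:
                        zeros += 1
    if total == 0:
        raise ValueError(f"no columns with prefix {prefix!r}")
    return zeros / total
```

A value above 0.40 suggests the data match the description of the `Count_DESeq2_ZINB` scenario; the final label is still chosen by statLens.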

Pipeline labels

statLens classifies a study into one of 13 DEA scenarios, or returns none_of_these (a 14th "kill-switch" output) when the design falls outside its training space.

family	label	when
Count (DESeq2)	`Count_DESeq2_basic`	2 groups, no batch / time / pairing
	`Count_DESeq2_with_batch`	2 groups + batch covariate
	`Count_DESeq2_paired_or_repeated`	matched samples within subject
	`Count_DESeq2_multi_group`	≥ 3 independent groups
	`Count_DESeq2_time_course`	single cohort, ≥ 3 time points
	`Count_DESeq2_group_time_interaction`	≥ 2 groups × multiple time points
	`Count_DESeq2_ZINB`	counts dominated by zeros (>40 %), e.g. 16S
Continuous (limma)	`Continuous_limma_basic`	2 groups, no batch / time / pairing
	`Continuous_limma_with_batch`	2 groups + batch covariate
	`Continuous_limma_paired_or_repeated`	pre/post or matched samples
	`Continuous_limma_multi_group`	≥ 3 independent groups
	`Continuous_limma_time_course`	single cohort, ≥ 3 time points
	`Continuous_limma_group_time_interaction`	groups × time within subject
(decline)	`none_of_these`	survival / network inference / single-sample / non-omics — no forced fit
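
The "when" column can be read as a set of decision rules. The sketch below encodes them as a plain function for illustration only: statLens assigns labels with its LLM classifier, not with rules like these, and the order in which overlapping conditions are checked (for example batch plus pairing) is an assumption. All parameter names are hypothetical.

```python
def expected_label(data_type, n_groups, n_timepoints=1, paired=False,
                   has_batch=False, zero_fraction=0.0):
    """Map basic design facts to a label from the table above (illustrative)."""
    if data_type == "count":
        family = "Count_DESeq2"
    elif data_type == "continuous":
        family = "Continuous_limma"
    else:
        return "none_of_these"  # e.g. survival or non-omics data

    # The order of the checks below is an assumption, not statLens behaviour.
    if family == "Count_DESeq2" and zero_fraction > 0.40:
        return "Count_DESeq2_ZINB"
    if n_groups >= 2 and n_timepoints >= 2:
        return f"{family}_group_time_interaction"
    if n_groups == 1 and n_timepoints >= 3:
        return f"{family}_time_course"
    if n_groups < 2:
        return "none_of_these"  # nothing to compare
    if paired:
        return f"{family}_paired_or_repeated"
    if n_groups >= 3:
        return f"{family}_multi_group"
    if has_batch:
        return f"{family}_with_batch"
    return f"{family}_basic"
```

Comparing this rough expectation with the label statLens returns is one way to spot a schema field that was extracted incorrectly.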

Interpreting results

Every successful run produces 5 plots and 2 result tables under pipeline_output/:

File	What it shows
`volcano_plot.png`	Each feature plotted by log2 fold-change (x) vs −log10 adjusted p-value (y). Top-right and top-left points are the significantly up- and down-regulated features.
`MA_plot.png`	Log2 fold-change (y) vs mean expression (x). Diagnostic for fold-change vs abundance bias.
`PCA_plot.png`	First two principal components of the normalized expression matrix, colored by group. Sanity check for class separation.
`top_DE_genes_heatmap.png`	Top 20 most-significant DE features as a heatmap of z-scored expression across samples.
`top20_DE_genes_barplot.png`	Top 20 features by absolute log2 fold-change as a barplot.
`results.csv`	Full DE table — `feature_id`, `log2FoldChange`, `lfcSE`, `stat`, `pvalue`, `padj`.
`significant_genes.csv`	Subset of `results.csv` filtered at `padj < 0.05` (or the family default).

For paired / time-course / interaction designs the results.csv schema is the same; only the underlying model and the contrast definition change. See statlens_report.md produced alongside the run for the exact model formula used.
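
To apply a different significance threshold than the one used for `significant_genes.csv`, `results.csv` can be re-filtered directly. The sketch below relies only on the column names listed in the table above; the function name and the handling of non-numeric `padj` values (skipped) are choices made for this example.

```python
import csv


def significant_features(results_csv, alpha=0.05):
    """Return rows of results.csv with padj < alpha, most significant first."""
    hits = []
    with open(results_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            try:
                padj = float(row["padj"])
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric values such as "NA"
            if padj < alpha:  # NaN compares False and is skipped as well
                hits.append(row)
    return sorted(hits, key=lambda r: float(r["padj"]))


def split_by_direction(rows):
    """Split rows into (up, down) by the sign of log2FoldChange."""
    up = [r for r in rows if float(r["log2FoldChange"]) > 0]
    down = [r for r in rows if float(r["log2FoldChange"]) < 0]
    return up, down
```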

Headless server: reaching `localhost:7860` from elsewhere

If the machine running statlens serve has no browser (AutoDL, RunPod, Lambda, …), use one of these:

	command	works for
Public URL	`cloudflared tunnel --url http://localhost:7860`	any device, any network
SSH tunnel	`ssh -fNL 7860:localhost:7860 user@server` (run on your laptop)	quick local dev
curl only	`curl -X POST http://localhost:7860/api/run -F "context=..." -F "tsv=@data.tsv"`	scripting / no browser

Subcommands

statlens serve                                   # main entry point
statlens download                                # pre-fetch the LoRA only (~1 GB)
statlens info                                    # show GPU / cache / paths
statlens classify --tsv DATA --context CTX --out DIR
                                                 # one-shot CLI mode (no browser)
statlens --version

statlens classify runs both LLM stages back-to-back without a review pause — useful for batch processing.
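
A batch run over a folder of inputs could look like the sketch below. It uses only the `classify` flags shown above, with two assumptions: each `name.tsv` has a sibling `name.context.txt`, and `--context` accepts a path to that file (if it expects the description text itself, pass the file contents instead). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def classify_commands(input_dir, out_root):
    """Build one `statlens classify` command per (.tsv, .context.txt) pair."""
    commands = []
    for tsv in sorted(Path(input_dir).glob("*.tsv")):
        # Assumed naming convention: study.tsv pairs with study.context.txt.
        context = tsv.with_name(tsv.stem + ".context.txt")
        if not context.is_file():
            continue  # skip TSVs without a matching context file
        out_dir = Path(out_root) / tsv.stem
        commands.append(["statlens", "classify",
                         "--tsv", str(tsv),
                         "--context", str(context),
                         "--out", str(out_dir)])
    return commands


def run_all(input_dir, out_root):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in classify_commands(input_dir, out_root):
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes it easy to print and inspect the commands before starting a long batch.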

API endpoints

route	method	purpose
`/`	GET	serve the web UI
`/api/extract`	POST (multipart: `tsv`, `context`)	stage 1 — return a SchemaSummary
`/api/run_pipeline`	POST (form: `run_id`, `schema` JSON)	stage 3 — pick label + run pipeline
`/api/run`	POST (multipart: `tsv`, `context`)	legacy single-shot path (no review)
`/api/artifact/{run_id}/{filename}`	GET	fetch a single PNG/CSV
`/api/zip/{run_id}`	GET	fetch the packaged result
`/api/csv_preview/{run_id}/{filename}`	GET	first N rows of a result CSV as JSON
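
The GET routes are simple enough to script with the Python standard library. The sketch below builds the URLs from the route patterns in the table and downloads the packaged result; it assumes the default address `http://localhost:7860` and a run ID you already have, and it does not cover the POST routes, whose response fields are not documented here.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

BASE_URL = "http://localhost:7860"  # default address printed by `statlens serve`


def artifact_url(run_id, filename, base=BASE_URL):
    """URL of a single PNG/CSV produced by a run."""
    return f"{base}/api/artifact/{quote(run_id)}/{quote(filename)}"


def zip_url(run_id, base=BASE_URL):
    """URL of the packaged result for a run."""
    return f"{base}/api/zip/{quote(run_id)}"


def download_zip(run_id, dest="result.zip", base=BASE_URL):
    """Fetch the packaged result to `dest` (requires a running server)."""
    path, _headers = urlretrieve(zip_url(run_id, base), dest)
    return path
```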

Models

component	source	size	license
base	`Qwen/Qwen3-32B` (BF16)	64 GB	Apache-2.0
LoRA	`domizzz2025/statLens`	1 GB	Apache-2.0

The LoRA is auto-downloaded on first run; the base model is yours to provide.

Training (LoRA only)

The classifier LoRA was fine-tuned on top of Qwen/Qwen3-32B with LLaMA-Factory:


Adapter rank / alpha	32 / 64
Target modules	q / k / v / o / up / down / gate proj
Optimizer · schedule	AdamW · cosine, 3 epochs (~636 steps)
Training data	curated study descriptions covering the 13 DEA scenarios

Loss curves and trainer state live under qwen3_32b_lora_v1/: training_loss.png, training_eval_loss.png, trainer_state.json, trainer_log.jsonl.

Generalization to real-world TSVs with non-canonical column conventions is recovered via the user-editable schema layer at run time.

Troubleshooting

symptom	fix
`Network is unreachable` during `pip install` (mainland China)	`export HF_ENDPOINT=https://hf-mirror.com` and retry
`LocalEntryNotFoundError` when LoRA auto-fetches	same as above — set `HF_ENDPOINT` before `statlens serve`
`no base model found`	put Qwen3-32B in one of the auto-search paths, or pass `--base-model PATH`
`CUDA out of memory` on startup	a previous `statlens serve` is still holding GPU memory: `pkill -9 -f statlens; nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r kill -9` (caution: the second command kills every compute process on the GPU, not only statLens)
`address already in use`	a previous instance is still bound to port 7860; kill it first
LLM never becomes ready	tail `~/.cache/statlens/llm.log` to see the LLaMA-Factory error
schema field looks wrong in the browser	edit it directly; the LLM picks the label from your edits, not the original extraction
`Schema specified <field>=…, but no such column in TSV`	a column-name field in the schema doesn't match your TSV. Either fix the field, or clear it to use auto-detection.
`Schema reference_level=… not in observed levels`	`reference_level` doesn't match any actual group level. Set it to one of the values shown in `group_levels`, or clear it.
`No feature columns found. Expected one of these prefixes: …`	rename your feature columns to start with `gene_` / `asv_` / `prot_` / `metab_` / `otu_` / `feat_`.
`upload exceeds N MB limit`	raise the cap with `STATLENS_MAX_UPLOAD_MB=500 statlens serve` (default 100 MB).

Source · License

Wheel + LoRA + source : https://huggingface.co/domizzz2025/statLens
License : Apache-2.0

Citation

If you use statLens in academic work, please cite:

@software{statlens_2025,
  title  = {statLens: A self-hosted DEA method selector backed by a Qwen3-32B + LoRA classifier},
  author = {statLens contributors},
  year   = {2025},
  url    = {https://huggingface.co/domizzz2025/statLens},
  note   = {Apache-2.0},
}

A peer-reviewed manuscript is in preparation.
