📡 LWM-Spectro: Large Wireless Model for Spectrograms
🚀 Try the interactive demo (Hugging Face Space): https://huggingface.co/spaces/wi-lab/LWM-Spectro
🚀 This model repo provides:
- MoE checkpoint: moe_checkpoint.pth (router + experts + classifier; top-1 gating)
- Per-tech experts: experts/*
- SNR-balanced demo tensors: demo_data.pt / demo_data_moe.pt (float16 spectrograms + precomputed embeddings used by the Space and plotting scripts)
This repo is intentionally drop-in and lightweight for inference, visualization, and reproducing the Space plots.
For the broader research / training workflow (pretraining + task scripts + MoE training), see README_code.md.
📡 What is LWM-Spectro? (the “why”)
Modern wireless learning models often work well on a single protocol / dataset / SNR range, but can break when the communication standard, propagation environment, or label budget changes.
LWM-Spectro is a foundation model for received baseband I/Q signals represented as time–frequency spectrograms. The core goal is to learn a representation that:
- captures physical-layer and channel characteristics from raw received waveforms,
- remains transferable across heterogeneous wireless standards (LTE / WiFi / 5G),
- enables strong downstream performance even in few-shot / limited-label settings.
🧩 What is the key idea? (I/Q → spectrogram → representation)
Received I/Q signals inherently encode both transmitter-side structure (e.g., modulation/coding) and channel-side structure (e.g., multipath, Doppler, SNR).
By converting the received waveform into an STFT spectrogram, we get a structured time–frequency “image” where both modulation signatures and channel dynamics appear as learnable patterns.
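The STFT step can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's preprocessing. The FFT size (128), the number of frames (128, chosen to match the 128×128 demo tensors), the Hann window and the dB scaling are all assumptions. The helper name iq_to_spectrogram is hypothetical.

```python
import numpy as np

def iq_to_spectrogram(iq, n_fft=128, n_frames=128):
    """Hypothetical helper: complex baseband I/Q -> [n_fft, n_frames] log-power STFT in dB.

    n_fft, n_frames, the Hann window, and the dB scaling are illustrative
    assumptions, not the paper's exact STFT settings.
    """
    iq = np.asarray(iq, dtype=np.complex128)
    hop = (len(iq) - n_fft) // (n_frames - 1)  # evenly spaced frame starts
    if hop < 1:
        raise ValueError("signal too short for the requested number of frames")
    window = np.hanning(n_fft)
    # Windowed frames, shape [n_frames, n_fft]
    frames = np.stack([iq[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    # FFT per frame; fftshift moves DC to the center of the frequency axis
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    power_db = 10.0 * np.log10(np.abs(spec) ** 2 + 1e-12)  # small floor avoids log(0)
    return power_db.T.astype(np.float32)  # [freq, time]

# A complex tone at FFT bin +10 appears as a single horizontal line
n = np.arange(128 * 128)
tone = np.exp(2j * np.pi * 10 * n / 128)
spec = iq_to_spectrogram(tone)
print(spec.shape)                      # (128, 128)
print(np.unique(spec.argmax(axis=0)))  # [74]: bin 10 + 64 after fftshift
```

Real captures replace this single line with the modulation and channel patterns described above.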
LWM-Spectro is pretrained on large-scale unlabeled (or weakly labeled) spectrogram data so that downstream tasks can reuse these learned features instead of training from scratch.
🏗️ How is LWM-Spectro built? (high-level)
At a high level, LWM-Spectro combines:
- Transformer encoders over spectrogram patches (foundation-model backbone),
- self-supervised masked spectrogram modeling to learn general spectro‑temporal structure,
- contrastive learning to encourage discriminative embeddings,
- a Mixture-of-Experts (MoE) design with protocol-specialized experts and a lightweight router, so the model can stay protocol-aware without training a separate model per standard.
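This README does not spell out the exact contrastive loss. A common choice is InfoNCE. In InfoNCE, two views of the same sample form a positive pair and every other sample in the batch acts as a negative. The NumPy sketch below assumes that formulation and a temperature of 0.1. Both choices are illustrative and are not taken from the paper.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss: row i of z_a and row i of z_b form a positive pair,
    all other rows in the batch act as negatives. Temperature is an assumption."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                       # [B, B] similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))                # cross-entropy on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # matched views
shuffled = info_nce(z, np.roll(z, 1, axis=0))               # every positive mismatched
print(aligned < shuffled)  # True
```

Minimizing this loss pulls matched views together and pushes other samples apart, which is the "discriminative embeddings" property described above.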
In practice, the released checkpoint is packaged behind an inference wrapper (MoEPredictor) that returns:
- a predicted label (e.g., LTE / WiFi / 5G),
- confidence,
- optional routing weights (which expert(s) were used).
🧑🏫 How are the per‑protocol expert models trained?
LWM-Spectro uses protocol-specialized experts (e.g., WiFi / LTE / 5G) inside an MoE. Conceptually, training is two-stage:
Stage 1 — Expert pretraining (per protocol)
Each expert encoder (E_k) is pretrained on its own protocol’s spectrogram dataset using a self-supervised objective (masked spectrogram modeling) and, optionally, a contrastive objective to shape the embedding space.
Stage 2 — Router (and head) training (multi-protocol)
Experts are frozen, then a lightweight router (R) is trained on a balanced mix of WiFi/LTE/5G to select which expert to use.
In the paper this is described as top‑1 routing (activate one expert per input) for efficiency. The implementation in this repo exposes top‑k routing for experimentation, but the released “drop-in” artifacts are used in a top‑1 style at inference time.
Expert backbone (what an “expert” is)
From the paper configuration, each expert is a Transformer encoder over spectrogram patches:
- Patch/tokenization: non-overlapping 4×4 patches → sequence length 1024 (a 128×128 spectrogram yields 32 × 32 = 1024 patches)
- Backbone: 12-layer Transformer, model dimension d = 128, 8 attention heads
- Masked modeling: mask ratio 70%
(Implementation pointers: expert pretraining scripts live in pretraining/train_lwm_spectro*.py and the backbone definition is in pretraining/pretrained_model.py.)
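A small NumPy sketch can check the patch arithmetic and the 70% masking ratio from the configuration above. The helpers patchify and random_mask are hypothetical names, and the row-major patch order is an assumption. The zero "reconstruction" stands in for a real encoder/decoder. The actual backbone lives in pretraining/pretrained_model.py.

```python
import numpy as np

def patchify(spec, patch=4):
    """Split an [H, W] spectrogram into non-overlapping patch x patch tokens (row-major order)."""
    h, w = spec.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    grid = spec.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return grid.reshape(-1, patch * patch)  # [num_patches, patch * patch]

def random_mask(num_tokens, mask_ratio=0.7, rng=None):
    """Boolean mask with round(mask_ratio * num_tokens) hidden tokens (True = masked)."""
    rng = np.random.default_rng() if rng is None else rng
    n_mask = int(round(mask_ratio * num_tokens))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.choice(num_tokens, size=n_mask, replace=False)] = True
    return mask

spec = np.random.default_rng(0).normal(size=(128, 128)).astype(np.float32)
tokens = patchify(spec)                                   # 32 x 32 grid of 4x4 patches
mask = random_mask(len(tokens), rng=np.random.default_rng(1))
print(tokens.shape, int(mask.sum()))                      # (1024, 16) 717

# Placeholder prediction; a real model would reconstruct the hidden patches
recon = np.zeros_like(tokens)
loss = float(np.mean((recon[mask] - tokens[mask]) ** 2))  # MSE on masked patches only
```

The sketch restricts the loss to hidden patches, following the usual masked-autoencoder convention. This README does not say whether LWM-Spectro also weights visible patches.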
Router + aggregation (how experts are selected)
The MoE combines a small router with the experts:
- Router predicts a probability over experts (WiFi/LTE/5G).
- Top‑1 selects the single most likely expert (fast inference), while a “weighted” mixture is the general form.
Implementation note: the paper describes a small Transformer router, while the repo’s training implementation uses a lightweight CNN router for practicality (see mixture/train_embedding_router.py, RouterNet).
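Top-1 and weighted aggregation differ only in how the router probabilities are used. In this NumPy sketch, the shapes, names and expert order are assumptions. Every expert's embedding is precomputed for clarity. A real top-1 MoE would run only the selected expert, which is where the inference savings come from.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_combine(router_logits, expert_outputs, top1=True):
    """router_logits: [B, K]; expert_outputs: [K, B, D], one embedding per expert.

    top1=True keeps only the argmax expert per input; top1=False mixes all K
    experts by their router probabilities (the general weighted form).
    """
    probs = softmax(router_logits)                              # [B, K]
    if top1:
        idx = probs.argmax(axis=1)                              # chosen expert per input
        return expert_outputs[idx, np.arange(len(idx))], probs  # [B, D]
    return np.einsum("bk,kbd->bd", probs, expert_outputs), probs

# Hypothetical setup: 3 experts (e.g., WiFi/LTE/5G), 2 inputs, 4-dim embeddings
logits = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]])
experts = np.stack([np.full((2, 4), float(k)) for k in range(3)])  # expert k outputs all k's
top1_out, probs = moe_combine(logits, experts, top1=True)
mixed_out, _ = moe_combine(logits, experts, top1=False)
print(top1_out[:, 0])  # [0. 2.]: input 0 -> expert 0, input 1 -> expert 2
```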
What is actually stored in this repo?
- experts/: per‑protocol expert weights (protocol-specialized encoders)
- moe_checkpoint.pth: a packed checkpoint used by MoEPredictor (router + experts + classifier/head)
🎯 What can you do with it? (the “so what”)
LWM-Spectro representations transfer effectively to downstream tasks such as:
- Modulation classification
- Joint SNR / mobility recognition
The repo also includes scripts to reproduce common qualitative sanity checks, e.g., t‑SNE comparisons where raw spectrograms overlap heavily while learned embeddings form more coherent clusters.
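A clustering metric can put a number on the "more coherent clusters" observation. plot/plot_tsne.py offers optional clustering metrics, but this README does not say which ones it computes. The silhouette coefficient is one standard option, sketched below in NumPy under that assumption. You could compute it on raw spectrogram features and then on embeddings, using SNR or modulation as labels, to complement the t-SNE picture. The function needs at least two distinct labels.

```python
import numpy as np

def silhouette_score(x, labels):
    """Mean silhouette coefficient (Euclidean); higher means tighter, better-separated clusters."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))  # [N, N] pairwise distances
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton-cluster convention
            continue
        a = d[i, same].mean()  # mean intra-cluster distance
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
separated = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(20, 1, (50, 2))])
overlapping = rng.normal(0, 1, (100, 2))  # same distribution, arbitrary labels
s_sep = silhouette_score(separated, labels)
s_ovl = silhouette_score(overlapping, labels)
print(round(s_sep, 2), round(s_ovl, 2))
```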
📊 Pretraining data (conceptual summary)
The LWM-Spectro paper pipeline synthesizes diverse I/Q spectrograms across:
- Wireless standards: LTE / WiFi / 5G
- Propagation environments: multiple site-specific city scenarios (ray-tracing–derived multipath)
- Conditions: SNR sweeps + mobility regimes
The intent is to expose the model to realistic variability (protocol + channel + motion) so the learned representation generalizes.
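Adding complex white Gaussian noise at a target SNR illustrates the SNR sweep. This sketch covers only the noise step. It leaves out the ray-tracing multipath and mobility effects described above. Defining SNR as mean signal power over noise power is an assumption. The SNR grid is the same one the demo tensors use (−5 to 25 dB in 5 dB steps). The helper name add_awgn is hypothetical.

```python
import numpy as np

def add_awgn(iq, snr_db, rng=None):
    """Add complex white Gaussian noise so that signal power / noise power = snr_db (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    sig_power = np.mean(np.abs(iq) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    # Split the noise power equally between the I and Q components
    noise = np.sqrt(noise_power / 2) * (rng.normal(size=iq.shape) + 1j * rng.normal(size=iq.shape))
    return iq + noise

rng = np.random.default_rng(0)
clean = np.exp(2j * np.pi * 0.05 * np.arange(100_000))  # unit-power complex tone
measured = {}
for snr_db in (-5, 0, 5, 10, 15, 20, 25):  # same grid as the demo tensors
    noise = add_awgn(clean, snr_db, rng) - clean
    measured[snr_db] = 10 * np.log10(np.mean(np.abs(clean) ** 2) / np.mean(np.abs(noise) ** 2))
print({k: round(float(v), 1) for k, v in measured.items()})
```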
🔍 Demo tensor spec
- Spectrogram shape: 128×128 (stored as float16 tensors under sample["data"])
- Technologies: LTE, WiFi, 5G
- SNR: {SNR-5dB, SNR0dB, SNR5dB, SNR10dB, SNR15dB, SNR20dB, SNR25dB}
- Per-tech sample count: 3,500 (7 SNR levels × 500)
- Per-sample fields: {tech, snr, mod, mob, data, embedding, tech_embedding, moe_embedding, tech_embedding_dim}
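After loading demo_data.pt, you can check the SNR-balanced layout by counting samples per (tech, snr) cell. The sketch assumes each sample dict carries the tech and snr fields listed above. The exact field values are an assumption, for example whether snr is stored as "SNR-5dB" or as a number. For that reason the sketch runs here on synthetic stand-in dicts. The helper name check_balance is hypothetical.

```python
from collections import Counter

def check_balance(samples, per_cell=500):
    """Count samples per (tech, snr) and report any cell whose count differs from per_cell."""
    counts = Counter((s["tech"], s["snr"]) for s in samples)
    unbalanced = {cell: n for cell, n in counts.items() if n != per_cell}
    return counts, unbalanced

# Synthetic stand-in for demo_data.pt (field values are assumptions)
snrs = ["SNR-5dB", "SNR0dB", "SNR5dB", "SNR10dB", "SNR15dB", "SNR20dB", "SNR25dB"]
fake = [{"tech": t, "snr": s} for t in ("LTE", "WiFi", "5G") for s in snrs for _ in range(500)]
counts, unbalanced = check_balance(fake)
print(len(fake), len(counts), unbalanced)  # 10500 21 {}
```

With the real file, pass the list returned by torch.load(demo_path, weights_only=False) instead of fake. The spec above implies 3 technologies × 7 SNR levels = 21 cells of 500 samples each.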
✅ Advantages of using LWM-Spectro
- General-purpose representation: designed to transfer across standards and conditions
- Few-shot friendly: useful when labels are scarce
- Drop-in: no training required to run inference or plots
- Balanced demo: consistent legend ordering and sampling across SNRs for comparable plots
- Hub-native: supports hf_hub_download() for fetching artifacts without cloning
🛠 How to Use
⚡ Quickstart (copy/paste)
A) Minimal Hub inference (artifacts fetched from the Hub without cloning; the repo's mixture/ package must still be importable)
import torch
from huggingface_hub import hf_hub_download
from mixture.train_embedding_router import MoEPredictor
repo_id = "wi-lab/lwm-spectro"
ckpt_path = hf_hub_download(repo_id=repo_id, filename="moe_checkpoint.pth")
predictor = MoEPredictor.from_checkpoint(ckpt_path)
# One spectrogram: [H, W] = [128, 128]
spec = torch.rand(128, 128)
out = predictor.predict(spec, return_probabilities=True, return_routing=True)
print(out["label"], out["confidence"])
print("routing:", out.get("routing"))
B) Use the bundled demo tensors as input
import torch
from huggingface_hub import hf_hub_download
from mixture.train_embedding_router import MoEPredictor
repo_id = "wi-lab/lwm-spectro"
ckpt_path = hf_hub_download(repo_id=repo_id, filename="moe_checkpoint.pth")
demo_path = hf_hub_download(repo_id=repo_id, filename="demo_data.pt")
predictor = MoEPredictor.from_checkpoint(ckpt_path)
samples = torch.load(demo_path, weights_only=False)
# Each sample is a dict; the spectrogram lives under sample["data"] (float16 128x128).
spec0 = samples[0]["data"].float() # [128, 128]
out0 = predictor.predict(spec0, return_probabilities=False, return_routing=True)
print(out0["label"], out0["confidence"], out0.get("routing"))
# Batch inference: [B, H, W]
batch = torch.stack([s["data"].float() for s in samples[:32]], dim=0)
outb = predictor.predict(batch)
print(outb["labels"][:5], outb["confidence"][:5])
1) Install Conda
Use Miniconda/Anaconda (any recent version is fine).
2) Create a new environment
conda create -n lwm_spectro python=3.10 -y
conda activate lwm_spectro
3) Install required packages
pip install -r requirements.txt
4) Get the repo + large files (choose one)
Option A — Clone with Git LFS (recommended if you want everything locally)
git lfs install
git clone https://huggingface.co/wi-lab/lwm-spectro
cd lwm-spectro
git lfs pull
Option B — Download artifacts via Python (no clone)
from huggingface_hub import hf_hub_download
repo_id = "wi-lab/lwm-spectro"
ckpt = hf_hub_download(repo_id=repo_id, filename="moe_checkpoint.pth")
demo = hf_hub_download(repo_id=repo_id, filename="demo_data.pt")
5) Minimal inference sanity check
python hf_minimal_inference.py
This downloads moe_checkpoint.pth (if needed), runs a random 128×128 tensor through the MoE predictor, and prints the predicted label + confidence.
🧩 Predictor API (what goes in / what comes out)
The main inference entry points are:
- mixture.train_embedding_router.MoEPredictor.from_checkpoint(checkpoint_path)
- MoEPredictor.predict(spectrogram, return_probabilities=False, return_routing=False)
Inputs
- spectrogram: torch.Tensor of shape [H, W] or [B, H, W] (the predictor performs per-sample normalization internally)
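The predictor normalizes each input internally, but this README does not document the exact scheme. You may want to inspect comparable inputs outside the predictor. A per-sample z-score is one plausible choice for that. The sketch below assumes it, and the function name and eps value are hypothetical. It shows how [H, W] and [B, H, W] inputs can share one code path.

```python
import numpy as np

def normalize_per_sample(spec, eps=1e-6):
    """Z-score each spectrogram independently; accepts [H, W] or [B, H, W]."""
    x = np.asarray(spec, dtype=np.float32)
    single = x.ndim == 2
    if single:
        x = x[None]  # promote [H, W] -> [1, H, W]
    if x.ndim != 3:
        raise ValueError(f"expected [H, W] or [B, H, W], got shape {x.shape}")
    mean = x.mean(axis=(1, 2), keepdims=True)  # one mean per spectrogram
    std = x.std(axis=(1, 2), keepdims=True)    # one std per spectrogram
    out = (x - mean) / (std + eps)
    return out[0] if single else out

batch = np.random.default_rng(0).normal(5.0, 3.0, size=(4, 128, 128))
normed = normalize_per_sample(batch)  # [4, 128, 128]
one = normalize_per_sample(batch[0])  # [128, 128]
print(normed.shape, one.shape)
```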
Outputs
The returned dict includes:
- predicted_class: int (single) or list[int] (batch)
- confidence: float (single) or list[float] (batch)
- label / labels: human-readable label(s) when a label mapping is available
- probabilities (optional): full softmax distribution if return_probabilities=True
- routing (optional): top‑k expert routing weights if return_routing=True
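To make the output contract concrete, the sketch below turns classifier logits into a dict with the documented keys. It is not MoEPredictor's implementation. The LTE/WiFi/5G class order and the helper name build_output are assumptions. Routing weights are omitted.

```python
import numpy as np

LABELS = ["LTE", "WiFi", "5G"]  # hypothetical class order

def build_output(logits, return_probabilities=False):
    """Turn classifier logits ([C] or [B, C]) into a predict()-style dict."""
    logits = np.asarray(logits, dtype=np.float64)
    single = logits.ndim == 1
    z = np.atleast_2d(logits)
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    if single:
        out = {"predicted_class": int(pred[0]), "confidence": float(conf[0]),
               "label": LABELS[pred[0]]}
    else:
        out = {"predicted_class": pred.tolist(), "confidence": conf.tolist(),
               "labels": [LABELS[i] for i in pred]}
    if return_probabilities:
        out["probabilities"] = probs[0].tolist() if single else probs.tolist()
    return out

one = build_output([0.0, 2.0, 0.0], return_probabilities=True)
many = build_output([[3.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(one["label"], round(one["confidence"], 3))  # WiFi 0.787
print(many["labels"])                             # ['LTE', '5G']
```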
6) Load the bundled demo tensors
import torch
from huggingface_hub import hf_hub_download
repo_id = "wi-lab/lwm-spectro"
local = hf_hub_download(repo_id=repo_id, filename="demo_data.pt")
samples = torch.load(local, weights_only=False)
print(len(samples), samples[0].keys())
7) Reproduce t‑SNE plots (CLI parity with the Space)
python -m plot.plot_tsne --LTE --sampling-mode reservoir --label-field snr
python -m plot.plot_tsne --WiFi --sampling-mode reservoir --label-field modulation
python -m plot.plot_tsne --5G --sampling-mode reservoir --label-field modulation
Each command loads 3,500 samples (7×500) per technology when using the bundled demo tensors.
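The --sampling-mode reservoir option suggests reservoir sampling, but this README does not document plot_tsne.py's exact procedure. Below is a generic per-group version (Algorithm R). It keeps a uniform sample of at most k items per SNR level in a single pass. The function name and grouping key are hypothetical.

```python
import random
from collections import defaultdict

def reservoir_sample_by_group(stream, key, k, seed=0):
    """Keep a uniform random sample of up to k items per group (Algorithm R), in one pass."""
    rng = random.Random(seed)
    reservoirs = defaultdict(list)
    seen = defaultdict(int)
    for item in stream:
        g = key(item)
        seen[g] += 1
        if len(reservoirs[g]) < k:
            reservoirs[g].append(item)       # fill the reservoir first
        else:
            j = rng.randrange(seen[g])       # uniform index in [0, seen - 1]
            if j < k:
                reservoirs[g][j] = item      # replace with probability k / seen
    return dict(reservoirs)

stream = [{"snr": i % 7, "idx": i} for i in range(14_000)]  # 2,000 items per SNR group
sample = reservoir_sample_by_group(stream, key=lambda s: s["snr"], k=500)
print({g: len(v) for g, v in sorted(sample.items())})       # every group -> 500
```

A single pass with O(k) memory per group suits streams whose length is not known in advance.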
8) Explore training / research code (optional)
This repo includes scripts used in the research workflow:
- Pretraining: pretraining/train_lwm_spectro*.py
- Task 1: task1/train_mcs_models.py
- Task 2: task2/train_joint_snr_mobility.py
- MoE: MoE/* and mixture/*
For a structured overview, see README_code.md.
🗂 Repo layout (where to look)
- Inference
  - hf_minimal_inference.py: minimal Hub inference demo (random tensor)
  - mixture/run_moe_inference.py: CLI to run Task 1 + Task 2 MoE checkpoints on real spectrogram pickle inputs
- Artifacts
  - moe_checkpoint.pth: main MoE checkpoint (Git LFS)
  - experts/: per-tech expert weights (Git LFS)
  - demo_data.pt, demo_data_moe.pt: SNR-balanced demo samples (Git LFS)
- Plots
  - plot/plot_tsne.py: raw-vs-embedding t‑SNE with SNR-balanced sampling, plus optional clustering metrics
- Training
  - pretraining/: pretraining scripts for the spectrogram backbone
  - task1/, task2/: downstream training/eval scripts
  - MoE/, mixture/: MoE training, baselines, and inference utilities
Please cite the following paper if you use the LWM model or any modified parts:
@misc{kim2026lwmspectrofoundationmodelwireless,
title={LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms},
author={Namhyun Kim and Sadjad Alikhani and Ahmed Alkhateeb},
year={2026},
eprint={2601.08780},
archivePrefix={arXiv},
primaryClass={cs.IT},
url={https://arxiv.org/abs/2601.08780},
}