📡 LWM-Spectro: Large Wireless Model for Spectrograms

🚀 Try the interactive demo:
Space: https://huggingface.co/spaces/wi-lab/LWM-Spectro

🚀 This model repo provides:

- MoE checkpoint: moe_checkpoint.pth (router + experts + classifier; top-1 gating)
- Per-tech experts: experts/*
- SNR-balanced demo tensors: demo_data.pt / demo_data_moe.pt (float16 spectrograms + precomputed embeddings used by the Space and plotting scripts)

This repo is intentionally drop-in and lightweight for inference, visualization, and reproducing the Space plots.
For the broader research / training workflow (pretraining + task scripts + MoE training), see README_code.md.

📡 What is LWM-Spectro? (the “why”)

Modern wireless learning models often work well on a single protocol / dataset / SNR range, but can break when the communication standard, propagation environment, or label budget changes.

LWM-Spectro is a foundation model for received baseband I/Q signals represented as time–frequency spectrograms. The core goal is to learn a representation that:

- captures physical-layer and channel characteristics from raw received waveforms,
- remains transferable across heterogeneous wireless standards (LTE / WiFi / 5G),
- enables strong downstream performance even in few-shot / limited-label settings.

🧩 What is the key idea? (I/Q → spectrogram → representation)

Received I/Q signals inherently encode both transmitter-side structure (e.g., modulation/coding) and channel-side structure (e.g., multipath, Doppler, SNR).
By converting the received waveform into an STFT spectrogram, we get a structured time–frequency “image” where both modulation signatures and channel dynamics appear as learnable patterns.
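
As a rough illustration of the I/Q → spectrogram step, the sketch below computes a log-magnitude STFT with NumPy. The function name iq_to_spectrogram, the Hann window, n_fft=128, and hop=64 are assumptions chosen so that the toy output is 128×128; they are not the settings used to build the LWM-Spectro datasets.

```python
import numpy as np

def iq_to_spectrogram(iq, n_fft=128, hop=64):
    """Return a log-magnitude STFT of complex I/Q samples, shape [n_fft, num_frames]."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(iq) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames: [num_frames, n_fft]
    frames = np.stack([iq[i * hop : i * hop + n_fft] * window for i in range(num_frames)])
    # Complex baseband has meaningful negative frequencies, so centre DC with fftshift.
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    magnitude_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    return magnitude_db.T  # rows = frequency bins, columns = time frames

# Toy input (not real data): a complex tone at 0.125 cycles/sample plus noise.
rng = np.random.default_rng(0)
n = 128 + 127 * 64  # exactly 128 frames for n_fft=128, hop=64
t = np.arange(n)
noise = 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
iq = np.exp(2j * np.pi * 0.125 * t) + noise

spec = iq_to_spectrogram(iq)
print(spec.shape)  # (128, 128)
```

In this toy example the tone appears as a horizontal line at row 80 (FFT bin 16 shifted by n_fft / 2), which is the kind of time–frequency structure the model is meant to pick up.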

LWM-Spectro is pretrained on large-scale unlabeled (or weakly labeled) spectrogram data so that downstream tasks can reuse these learned features instead of training from scratch.

🏗️ How is LWM-Spectro built? (high-level)

At a high level, LWM-Spectro combines:

- Transformer encoders over spectrogram patches (foundation-model backbone),
- self-supervised masked spectrogram modeling to learn general spectro‑temporal structure,
- contrastive learning to encourage discriminative embeddings,
- a Mixture-of-Experts (MoE) design with protocol-specialized experts and a lightweight router, so the model can stay protocol-aware without training a separate model per standard.

In practice, the released checkpoint is packaged behind an inference wrapper (MoEPredictor) that returns:

- a predicted label (e.g., LTE / WiFi / 5G),
- confidence,
- optional routing weights (which expert(s) were used).

🧑‍🏫 How are the per‑protocol expert models trained?

LWM-Spectro uses protocol-specialized experts (e.g., WiFi / LTE / 5G) inside an MoE. Conceptually, training is two-stage:

Stage 1 — Expert pretraining (per protocol)
Each expert encoder E_k is pretrained on its own protocol’s spectrogram dataset using a self-supervised objective (masked spectrogram modeling) and, optionally, a contrastive objective to shape the embedding space.
Stage 2 — Router (and head) training (multi-protocol)
The experts are frozen, then a lightweight router R is trained on a balanced mix of WiFi/LTE/5G to select which expert to use.
In the paper this is described as top‑1 routing (activate one expert per input) for efficiency. The implementation in this repo exposes top‑k routing for experimentation, but the released “drop-in” artifacts are used in a top‑1 style at inference time.

Expert backbone (what an “expert” is)

From the paper configuration, each expert is a Transformer encoder over spectrogram patches:

Patch/tokenization: non-overlapping 4×4 patches → sequence length 1024 (a 128×128 spectrogram gives 32×32 = 1024 patches)
Backbone: 12-layer Transformer with d = 128 and 8 attention heads
Masked modeling: mask ratio 70%

(Implementation pointers: expert pretraining scripts live in pretraining/train_lwm_spectro*.py and the backbone definition is in pretraining/pretrained_model.py.)
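
To make the token arithmetic concrete, here is a minimal NumPy sketch that cuts a 128×128 spectrogram into non-overlapping 4×4 patches and picks 70% of the tokens to mask. The helper names patchify and random_mask are hypothetical, and uniform random masking is an assumption; the actual logic lives in the pretraining scripts and may differ (for example in how masked tokens are replaced).

```python
import numpy as np

def patchify(spec, patch=4):
    """Split an [H, W] spectrogram into flattened non-overlapping patches."""
    h, w = spec.shape
    gh, gw = h // patch, w // patch
    # [gh, patch, gw, patch] -> [gh, gw, patch, patch] -> [gh * gw, patch * patch]
    blocks = spec.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(gh * gw, patch * patch)

def random_mask(num_tokens, mask_ratio, rng):
    """Boolean mask with round(mask_ratio * num_tokens) tokens set to True."""
    num_masked = int(round(mask_ratio * num_tokens))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return mask

rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 128)).astype(np.float32)  # stand-in spectrogram

tokens = patchify(spec)                     # 32 x 32 patches of 16 values each
mask = random_mask(len(tokens), 0.70, rng)  # True = token hidden from the encoder
print(tokens.shape, int(mask.sum()))        # (1024, 16) 717
```

With a 70% ratio the encoder sees only 307 of the 1024 tokens and has to reconstruct the remaining 717.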

Router + aggregation (how experts are selected)

The MoE combines a small router with the experts:

Router predicts a probability over experts (WiFi/LTE/5G).
Top‑1 selects the single most likely expert (fast inference), while a “weighted” mixture is the general form.

Implementation note: the paper describes a small Transformer router, while the repo’s training implementation uses a lightweight CNN router for practicality (see mixture/train_embedding_router.py, RouterNet).
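
The difference between top‑1 selection and the weighted mixture can be sketched in a few lines of NumPy. Everything here is illustrative: the route helper, the expert ordering, and the renormalization over the selected experts are assumptions, not the behavior of RouterNet or MoEPredictor. The sketch also precomputes all expert outputs for simplicity, whereas a real top‑1 implementation would run only the selected expert.

```python
import numpy as np

EXPERTS = ["LTE", "WiFi", "5G"]  # expert order is an assumption

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def route(router_logits, expert_outputs, top_k=1):
    """Combine expert outputs using the top_k most probable experts."""
    probs = softmax(np.asarray(router_logits, dtype=float))
    chosen = np.argsort(probs)[::-1][:top_k]       # most probable expert first
    weights = probs[chosen] / probs[chosen].sum()  # renormalize over chosen experts (assumption)
    mixed = sum(w * expert_outputs[i] for w, i in zip(weights, chosen))
    routing = {EXPERTS[i]: float(w) for w, i in zip(weights, chosen)}
    return mixed, routing

# Toy router logits and per-expert embeddings (not real model outputs).
logits = [0.1, 2.0, -1.0]
outputs = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]

top1, routing1 = route(logits, outputs, top_k=1)  # hard selection
soft, routing3 = route(logits, outputs, top_k=3)  # weighted mixture of all experts
print(routing1)  # {'WiFi': 1.0}
```

With top_k=1 the result is exactly the selected expert's output; with top_k equal to the number of experts it is the full probability-weighted mixture.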

What is actually stored in this repo?

experts/: per‑protocol expert weights (protocol-specialized encoders)
moe_checkpoint.pth: a packed checkpoint used by MoEPredictor (router + experts + classifier/head)

🎯 What can you do with it? (the “so what”)

LWM-Spectro representations transfer effectively to downstream tasks such as:

Modulation classification
Joint SNR / mobility recognition

The repo also includes scripts to reproduce common qualitative sanity checks, e.g., t‑SNE comparisons where raw spectrograms overlap heavily while learned embeddings form more coherent clusters.

📊 Pretraining data (conceptual summary)

The LWM-Spectro paper pipeline synthesizes diverse I/Q spectrograms across:

Wireless standards: LTE / WiFi / 5G
Propagation environments: multiple site-specific city scenarios (ray-tracing–derived multipath)
Conditions: SNR sweeps + mobility regimes

The intent is to expose the model to realistic variability (protocol + channel + motion) so the learned representation generalizes.

🔍 Demo tensor spec

Spectrogram shape: 128×128 (stored as float16 tensors under sample["data"])
Technologies: LTE, WiFi, 5G
SNR: {SNR-5dB, SNR0dB, SNR5dB, SNR10dB, SNR15dB, SNR20dB, SNR25dB}
Per-tech sample count: 3,500 (7 SNR levels × 500 samples)
Per-sample fields:
{tech, snr, mod, mob, data, embedding, tech_embedding, moe_embedding, tech_embedding_dim}
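
A quick way to check the SNR balance described above is to count samples per (tech, snr) pair. This is a sketch under assumptions: it uses stand-in records instead of the real file, and it assumes the tech and snr fields hold exactly the strings listed in this spec. The helper names are hypothetical.

```python
from collections import Counter

TECHS = ["LTE", "WiFi", "5G"]
SNRS = ["SNR-5dB", "SNR0dB", "SNR5dB", "SNR10dB", "SNR15dB", "SNR20dB", "SNR25dB"]

def count_by_tech_and_snr(samples):
    """Count samples for every (tech, snr) combination."""
    return Counter((s["tech"], s["snr"]) for s in samples)

def is_snr_balanced(samples, per_bin=500):
    """True if every (tech, snr) bin holds exactly per_bin samples."""
    counts = count_by_tech_and_snr(samples)
    return all(counts[(tech, snr)] == per_bin for tech in TECHS for snr in SNRS)

# Stand-in records carrying only the two fields needed here (not the real demo data).
fake = [{"tech": t, "snr": s} for t in TECHS for s in SNRS for _ in range(500)]

print(len(fake) // len(TECHS), is_snr_balanced(fake))  # 3500 True
```

The same two helpers can be pointed at the list returned by torch.load on the demo file, provided the field values match the assumed strings.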

✅ Advantages of using LWM-Spectro

General-purpose representation: designed to transfer across standards and conditions
Few-shot friendly: useful when labels are scarce
Drop-in: no training required to run inference or plots
Balanced demo: consistent legend ordering and sampling across SNRs for comparable plots
Hub-native: supports hf_hub_download() for fetching artifacts without cloning

🛠 How to Use

⚡ Quickstart (copy/paste)

A) Minimal Hub inference (no clone)

import torch
from huggingface_hub import hf_hub_download

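# MoEPredictor is defined in this repo's mixture/ package, so the repo code must be
# importable (e.g. run from the repo root); only the weights are fetched from the Hub.
from mixture.train_embedding_router import MoEPredictor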

repo_id = "wi-lab/lwm-spectro"
ckpt_path = hf_hub_download(repo_id=repo_id, filename="moe_checkpoint.pth")

predictor = MoEPredictor.from_checkpoint(ckpt_path)

# One spectrogram: [H, W] = [128, 128]
spec = torch.rand(128, 128)
out = predictor.predict(spec, return_probabilities=True, return_routing=True)
print(out["label"], out["confidence"])
print("routing:", out.get("routing"))

B) Use the bundled demo tensors as input

import torch
from huggingface_hub import hf_hub_download

from mixture.train_embedding_router import MoEPredictor

repo_id = "wi-lab/lwm-spectro"
ckpt_path = hf_hub_download(repo_id=repo_id, filename="moe_checkpoint.pth")
demo_path = hf_hub_download(repo_id=repo_id, filename="demo_data.pt")

predictor = MoEPredictor.from_checkpoint(ckpt_path)
samples = torch.load(demo_path, weights_only=False)

# Each sample is a dict; the spectrogram lives under sample["data"] (float16 128x128).
spec0 = samples[0]["data"].float()  # [128, 128]
out0 = predictor.predict(spec0, return_probabilities=False, return_routing=True)
print(out0["label"], out0["confidence"], out0.get("routing"))

# Batch inference: [B, H, W]
batch = torch.stack([s["data"].float() for s in samples[:32]], dim=0)
outb = predictor.predict(batch)
print(outb["labels"][:5], outb["confidence"][:5])

1) Install Conda

Use Miniconda/Anaconda (any recent version is fine).

2) Create a new environment

conda create -n lwm_spectro python=3.10 -y
conda activate lwm_spectro

3) Install required packages

pip install -r requirements.txt

4) Get the repo + large files (choose one)

Option A — Clone with Git LFS (recommended if you want everything locally)

git lfs install
git clone https://huggingface.co/wi-lab/lwm-spectro
cd lwm-spectro
git lfs pull

Option B — Download artifacts via Python (no clone)

from huggingface_hub import hf_hub_download

repo_id = "wi-lab/lwm-spectro"
ckpt = hf_hub_download(repo_id=repo_id, filename="moe_checkpoint.pth")
demo = hf_hub_download(repo_id=repo_id, filename="demo_data.pt")

5) Minimal inference sanity check

python hf_minimal_inference.py

This downloads moe_checkpoint.pth (if needed), runs a random 128×128 tensor through the MoE predictor, and prints the predicted label + confidence.

🧩 Predictor API (what goes in / what comes out)

The main inference entry point is:

mixture.train_embedding_router.MoEPredictor.from_checkpoint(checkpoint_path)
MoEPredictor.predict(spectrogram, return_probabilities=False, return_routing=False)

Inputs

spectrogram: torch.Tensor of shape [H, W] or [B, H, W]
(the predictor performs per-sample normalization internally)
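
The exact normalization is not spelled out here. One common choice, shown purely as an assumption, is to standardize each spectrogram to zero mean and unit variance over its own H×W values. The helper normalize_per_sample below is hypothetical and is not taken from MoEPredictor.

```python
import numpy as np

def normalize_per_sample(x, eps=1e-6):
    """Standardize each spectrogram independently; accepts [H, W] or [B, H, W]."""
    x = np.asarray(x, dtype=np.float32)
    # Statistics are taken over the last two axes, so every sample in a batch
    # gets its own mean and standard deviation.
    mean = x.mean(axis=(-2, -1), keepdims=True)
    std = x.std(axis=(-2, -1), keepdims=True)
    return (x - mean) / (std + eps)

# Two toy "spectrograms" that differ only by scale and offset.
base = np.arange(16.0).reshape(4, 4)
batch = np.stack([base, 100.0 * base + 7.0])

normed = normalize_per_sample(batch)
print(normed.shape)  # (2, 4, 4)
```

Under this scheme both toy inputs map to the same normalized array, which is why per-sample normalization makes a model less sensitive to absolute receive power.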

Outputs

The returned dict includes:

- predicted_class: int (single) or list[int] (batch)
- confidence: float (single) or list[float] (batch)
- label / labels: human-readable label(s) when a label mapping is available ("label" for a single input, "labels" for a batch)
- probabilities (optional): full softmax distribution if return_probabilities=True
- routing (optional): top‑k expert routing weights if return_routing=True
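
Because single and batch calls return differently shaped values, downstream code often wants one uniform view. The helper below is a hypothetical convenience wrapper, not part of the repo, and the example dicts are hand-written to mimic the layout documented above instead of coming from a real predictor call.

```python
def iter_predictions(out):
    """Yield (predicted_class, label, confidence) for single or batch output."""
    classes = out["predicted_class"]
    if isinstance(classes, (list, tuple)):  # batch input
        labels = out.get("labels") or [None] * len(classes)
        yield from zip(classes, labels, out["confidence"])
    else:                                   # single input
        yield classes, out.get("label"), out["confidence"]

# Hand-written stand-ins for predictor output (class indices and values are made up).
single = {"predicted_class": 1, "confidence": 0.93, "label": "WiFi"}
batch = {"predicted_class": [0, 2], "confidence": [0.8, 0.6], "labels": ["LTE", "5G"]}

for cls, label, conf in iter_predictions(batch):
    print(cls, label, conf)
```

If the real predictor returns tensors or NumPy arrays instead of Python lists, the isinstance check would need to be widened accordingly.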

6) Load the bundled demo tensors

import torch
from huggingface_hub import hf_hub_download

repo_id = "wi-lab/lwm-spectro"
local = hf_hub_download(repo_id=repo_id, filename="demo_data.pt")
samples = torch.load(local, weights_only=False)
print(len(samples), samples[0].keys())

7) Reproduce t‑SNE plots (CLI parity with the Space)

python -m plot.plot_tsne --LTE  --sampling-mode reservoir --label-field snr
python -m plot.plot_tsne --WiFi --sampling-mode reservoir --label-field modulation
python -m plot.plot_tsne --5G   --sampling-mode reservoir --label-field modulation

Each command loads 3,500 samples per technology (7 SNR levels × 500 samples) when using the bundled demo tensors.
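
The --sampling-mode reservoir flag presumably refers to reservoir sampling, which draws a fixed-size uniform sample from a stream in a single pass. How plot/plot_tsne.py implements it is not shown here, so the following is only a sketch of the standard technique (Algorithm R), with a hypothetical function name.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Return k items drawn uniformly without replacement from an iterable."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)   # uniform index in [0, i], inclusive
            if j < k:
                reservoir[j] = item  # keep the new item with probability k / (i + 1)
    return reservoir

# Example: pick 500 of 3,500 sample indices without holding a shuffled copy.
picked = reservoir_sample(range(3500), 500, seed=1)
print(len(picked))  # 500
```

The appeal for plotting is that the stream length does not need to be known in advance and memory use stays at k items.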

8) Explore training / research code (optional)

This repo includes scripts used in the research workflow:

Pretraining: pretraining/train_lwm_spectro*.py
Task 1: task1/train_mcs_models.py
Task 2: task2/train_joint_snr_mobility.py
MoE: MoE/* and mixture/*

For a structured overview, see README_code.md.

🗂 Repo layout (where to look)

Inference
- hf_minimal_inference.py: minimal Hub inference demo (random tensor)
- mixture/run_moe_inference.py: CLI to run Task1+Task2 MoE checkpoints on real spectrogram pickle inputs
Artifacts
- moe_checkpoint.pth: main MoE checkpoint (Git LFS)
- experts/: per-tech expert weights (Git LFS)
- demo_data.pt, demo_data_moe.pt: SNR-balanced demo samples (Git LFS)
Plots
- plot/plot_tsne.py: raw-vs-embedding t‑SNE with SNR-balanced sampling, plus optional clustering metrics
Training
- pretraining/: pretraining scripts for the spectrogram backbone
- task1/, task2/: downstream training/eval scripts
- MoE/, mixture/: MoE training, baselines, and inference utilities

If you use LWM-Spectro or any modified part of it, please cite the following paper:

@misc{kim2026lwmspectrofoundationmodelwireless,
      title={LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms}, 
      author={Namhyun Kim and Sadjad Alikhani and Ahmed Alkhateeb},
      year={2026},
      eprint={2601.08780},
      archivePrefix={arXiv},
      primaryClass={cs.IT},
      url={https://arxiv.org/abs/2601.08780}, 
}

Paper: LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms (arXiv 2601.08780, published Jan 13)