
Mixture of Experts (MoE) for Wireless Spectrogram Classification

This directory contains the implementation of a Mixture of Experts (MoE) model for wireless spectrogram classification, along with baseline models for comparison.

Models

1. MoE (Mixture of Experts)

  • Architecture: Router network + multiple expert LWM backbones + shared classifier
  • Experts: 6 experts (LTE/WiFi/5G × base/mobility)
  • Routing: Top-k expert selection (k=2 by default); see the sketch after this list
  • Already Trained: ✅ MoE/runs/embedding_router/moe_checkpoint.pth
  • Test Performance: F1 = 91.08%, Accuracy = 91.12%
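
The routing step is the core of the model: the router scores all experts per sample, keeps the top k, and mixes their outputs before the shared classifier. Below is a minimal PyTorch sketch of that mechanism; the class name, dimensions, and stand-in experts are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE: router + experts + shared classifier."""

    def __init__(self, embed_dim=256, num_experts=6, k=2, num_classes=10):
        super().__init__()
        self.k = k
        self.router = nn.Linear(embed_dim, num_experts)
        # Cheap stand-ins for the LWM expert backbones
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU())
             for _ in range(num_experts)]
        )
        self.classifier = nn.Linear(embed_dim, num_classes)  # shared head

    def forward(self, x):  # x: (batch, embed_dim)
        gate_logits = self.router(x)                      # (batch, num_experts)
        top_vals, top_idx = gate_logits.topk(self.k, -1)  # k experts per sample
        weights = F.softmax(top_vals, dim=-1)             # renormalize over the k
        mixed = torch.zeros_like(x)
        for slot in range(self.k):
            for e in top_idx[:, slot].unique().tolist():
                sel = top_idx[:, slot] == e               # samples routed to expert e
                mixed[sel] += weights[sel, slot].unsqueeze(-1) * self.experts[e](x[sel])
        return self.classifier(mixed)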

2. Single CNN Baseline

  • Architecture: 4-layer CNN + classifier (sketched after this list)
  • Purpose: Simple baseline without expert specialization
  • Status: Needs training
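
A "4-layer CNN + classifier" baseline of this kind typically looks like the sketch below; the channel widths, kernel sizes, and pooling head are assumptions, not the script's exact definition.

import torch.nn as nn

class SingleCNN(nn.Module):
    """Hedged sketch of a 4-conv-layer baseline."""

    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(
            block(in_channels, 16), block(16, 32), block(32, 64), block(64, 128)
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))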

3. ImageNet Pretrained Baseline

  • Architecture: ResNet18 (pretrained on ImageNet) + classifier
  • Purpose: Transfer learning baseline
  • Status: Needs training

Quick Start

1. Train Baseline Models

Train both baseline models at once:

cd /workspace/lwm-spectro
./MoE/train_baselines.sh

Or train individually:

Single CNN Baseline:

python MoE/train_embedding_router.py \
    --baseline single \
    --data-root spectrograms \
    --mobilities pedestrian vehicular \
    --task-epochs 25 \
    --batch-size 256 \
    --output-dir MoE/runs/baseline_cnn

ImageNet Pretrained Baseline:

python MoE/train_embedding_router.py \
    --baseline imagenet \
    --data-root spectrograms \
    --mobilities pedestrian vehicular \
    --task-epochs 25 \
    --batch-size 256 \
    --output-dir MoE/runs/baseline_imagenet

Options:

  • --freeze-backbone: Freeze the ResNet18 weights (train only the projection + classifier); see the sketch after this list
  • --patience 10: Early stopping patience
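
For reference, the ImageNet baseline plus the --freeze-backbone behavior amount to roughly the following; the 1x1 projection and the head are assumptions about the script's internals, so treat this as a sketch rather than the actual code.

import torch.nn as nn
from torchvision import models

class ImageNetBaseline(nn.Module):
    def __init__(self, num_classes=10, freeze_backbone=False):
        super().__init__()
        # Project 1-channel spectrograms to the 3 channels ResNet18 expects
        # (the exact form of the script's "projection" is an assumption).
        self.proj = nn.Conv2d(1, 3, kernel_size=1)
        self.backbone = models.resnet18(
            weights=models.ResNet18_Weights.IMAGENET1K_V1
        )
        if freeze_backbone:
            for p in self.backbone.parameters():
                p.requires_grad = False  # projection + new head stay trainable
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, x):
        return self.backbone(self.proj(x))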

2. Compare Models

After training baselines, compare all three models:

python MoE/compare_baselines.py \
    --moe-checkpoint MoE/runs/embedding_router/moe_checkpoint.pth \
    --single-checkpoint MoE/runs/baseline_cnn/moe_checkpoint.pth \
    --imagenet-checkpoint MoE/runs/baseline_imagenet/moe_checkpoint.pth \
    --data-root spectrograms \
    --output-dir MoE/runs/comparison

This will generate:

  • comparison_results.json: Detailed metrics (accuracy, F1, FPS, latency, parameters)
  • comparison_chart.png: Visual comparison charts
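
To consume comparison_results.json programmatically, something like the snippet below works; the exact JSON schema is an assumption, so adapt the keys to what compare_baselines.py actually writes.

import json

with open("MoE/runs/comparison/comparison_results.json") as f:
    results = json.load(f)

# Assumed layout: one entry per model mapping to its metric dict.
for model_name, metrics in results.items():
    print(model_name, metrics)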

3. Run Inference on New Spectrograms

Use the trained MoE model to classify spectrograms:

python MoE/run_moe_inference.py \
    --task1-checkpoint MoE/runs/task1_moe/moe_checkpoint.pth \
    --task2-checkpoint MoE/runs/embedding_router/moe_checkpoint.pth \
    --input spectrograms/city_1_losangeles/LTE/QPSK/rate3-4/SNR10dB/pedestrian/fft128/specs_0000.pkl \
    --index 0 \
    --show-routing
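
The --input/--index pair implies each specs_*.pkl file holds several spectrogram frames. To inspect one outside the script, a sketch like this can help; the pickle layout (an indexable collection) is an assumption.

import pickle

path = ("spectrograms/city_1_losangeles/LTE/QPSK/rate3-4/"
        "SNR10dB/pedestrian/fft128/specs_0000.pkl")
with open(path, "rb") as f:
    specs = pickle.load(f)  # assumed: an indexable collection of spectrograms

sample = specs[0]  # what --index 0 would select under this assumption
print(type(sample), getattr(sample, "shape", None))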

File Structure

MoE/
├── train_embedding_router.py   # Main training script (MoE + baselines)
├── train_baselines.sh          # Batch script to train all baselines
├── compare_baselines.py        # Compare MoE vs. baselines
├── run_moe_inference.py        # Inference script for new samples
├── README.md                   # This file
└── runs/
    ├── embedding_router/       # MoE model (already trained)
    ├── baseline_cnn/           # Single CNN baseline (training needed)
    ├── baseline_imagenet/      # ImageNet baseline (training needed)
    └── comparison/             # Comparison results

Training Details

MoE Training (Already Complete)

  • Task: SNR + Mobility classification
  • Experts: 6 LWM-based experts
  • Router warmup: 5 epochs
  • Joint training: 25 epochs
  • Data: LTE/WiFi/5G, pedestrian/vehicular
  • Performance: 91.08% F1 score

Baseline Training

  • Task: Same as MoE (SNR + Mobility)
  • Epochs: 25 (with early stopping)
  • Learning rate: 2e-3 with ReduceLROnPlateau
  • Batch size: 256
  • Data augmentation: Per-sample normalization
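
In code, the per-sample normalization and the learning-rate schedule above look roughly like this; the optimizer choice and the scheduler's mode are assumptions beyond the stated 2e-3 base rate.

import torch
from torch import nn

def per_sample_normalize(spec: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Standardize each spectrogram with its own mean/std, rather than with
    # dataset-level statistics.
    return (spec - spec.mean()) / (spec.std() + eps)

def make_optimizer(model: nn.Module):
    opt = torch.optim.Adam(model.parameters(), lr=2e-3)  # Adam is an assumption
    # Step with the validation metric each epoch, e.g. sched.step(val_f1);
    # mode="max" assumes a higher-is-better metric such as F1.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="max")
    return opt, sched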

Fine-Tuning & Limited-Data Experiments

To evaluate a data-limited setting while reusing the pretrained MoE weights:

python MoE/train_embedding_router.py \
    --resume-checkpoint MoE/runs/embedding_router/moe_checkpoint.pth \
    --data-root spectrograms \
    --cities city_4_phoenix \
    --max-samples-per-class 50 \
    --task-epochs 5 \
    --batch-size 128 \
    --num-workers 2 \
    --seed 42 \
    --output-dir MoE/runs/embedding_router_finetune_50

The router/classifier weights are restored from the checkpoint and the warm-up stage is skipped automatically, so only the joint fine-tuning epochs run on the sampled subset.
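
Conceptually, resuming boils down to restoring the saved state and zeroing out the warm-up stage. A sketch (the checkpoint's key layout is an assumption):

import torch
from torch import nn

def resume_from(moe: nn.Module, path: str) -> int:
    """Restore weights; return the number of warm-up epochs left to run."""
    state = torch.load(path, map_location="cpu")
    moe.load_state_dict(state.get("model", state))  # "model" key is an assumption
    return 0  # warm-up skipped; only joint fine-tuning epochs remain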

Retrain baselines from scratch on the identical subset (match the filters and --seed):

COMMON_FLAGS="--data-root spectrograms --cities city_4_phoenix --max-samples-per-class 50 --task-epochs 25 --batch-size 128 --num-workers 2 --seed 42 --output-dir"

python MoE/train_embedding_router.py \
    --baseline single \
    $COMMON_FLAGS MoE/runs/baseline_cnn_50

python MoE/train_embedding_router.py \
    --baseline imagenet \
    $COMMON_FLAGS MoE/runs/baseline_imagenet_50

Then compare using MoE/compare_baselines.py with the new checkpoints.

Expected Results

Model                  Accuracy   F1 Score   FPS    Parameters
MoE                    91.12%     91.08%     ~TBD   ~TBD
Single CNN             ~TBD       ~TBD       ~TBD   ~200K
ImageNet (ResNet18)    ~TBD       ~TBD       ~TBD   ~11M

Customization

Training on Different Tasks

Modulation Classification:

python MoE/train_embedding_router.py \
    --baseline single \
    --task modulation \
    --output-dir MoE/runs/baseline_cnn_modulation

Using Different Communication Standards

python MoE/train_embedding_router.py \
    --baseline imagenet \
    --comm-types LTE WiFi \
    --output-dir MoE/runs/baseline_lte_wifi

Custom Data Splits

python MoE/train_embedding_router.py \
    --baseline single \
    --train-ratio 0.7 \
    --val-ratio 0.15 \
    --max-samples-per-class 1000
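
The split flags behave like a standard seeded shuffle-and-slice; the sketch below shows the idea, though the script's actual splitting logic may differ.

import random

def split_indices(n: int, train_ratio=0.7, val_ratio=0.15, seed=42):
    # Deterministic shuffle, then slice into train/val/test by ratio.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr, n_val = int(n * train_ratio), int(n * val_ratio)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]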

Requirements

  • Python 3.8+
  • PyTorch 1.12+
  • torchvision (for ImageNet baseline)
  • scikit-learn (for metrics)
  • matplotlib (for plotting)
  • tqdm (optional, for progress bars)

Notes

  • The MoE model is already trained; retraining is only needed if you want to experiment
  • Baseline models need to be trained before comparison
  • All models use the same test set for fair comparison
  • Training takes ~1-2 hours per baseline on a modern GPU
  • Inference is fast (~1ms/sample on GPU)

Citation

If you use this code, please cite:

[Your paper citation here]