Tempo-SNN v2: Complexity-Aware Physics-Based RL PIM Controller

A fully refactored, literature-grounded routing framework for heterogeneous computing (PIM/ReRAM, CPU, GPU) with RL-based task scheduling and polyhedral compilation-aware complexity profiling.

Structure

.
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── profiler.py          # Task complexity profiler (M-1..M-5 fixes)
│   ├── physics.py           # ReRAM/STT-MRAM physics model (RC thermal, Arrhenius)
│   ├── rl_env.py            # RL environment (state normalization, retention)
│   ├── rl_agent.py          # Dueling DQN + Noisy Nets + SAC variant
│   ├── controller.py        # Seamless PIM/CPU/GPU controller
│   ├── router_static.py     # Universal parser + hybrid decision tree
│   ├── polyhedral.py        # Polyhedral AI estimator + aware router
│   ├── baselines.py         # READYS, EdgeSched-DQN, threshold baselines
│   ├── training.py          # Training loop + sample efficiency metrics
│   ├── plots.py             # Monitoring + interpretability plots
│   └── benchmarks/
│       └── mlperf_tiny.py   # MLPerf Tiny model stubs + harness
├── tests/
│   └── test_profiler.py     # 62 comprehensive tests
├── scripts/
│   └── train.py             # CLI entry point
└── test_minimal.py          # Quick smoke test

Critical Bug Fixes

| Bug | Location | Fix |
|---|---|---|
| B-1: DemoSNN dimension mismatch | DemoSNN.__init__ | FC input corrected for 16×16→pool→8×8 |
| B-2: Profiler masks all errors | TaskComplexityProfiler._analyze_layers | Warning on failure + targeted shape fallback |
| B-3: Broken boolean in PolyhedralAwareRouter.route | router_static.py | Explicit None guards, proper fallback to COMPLEXITY_LIBRARY[0] |
| B-4: SNN init_hidden=True + manual init_leaky() | Forward pass | Removed in DemoSNN; init_hidden handles state internally |
| B-5: LR scheduler not saved/loaded | Agent.save/load | Added scheduler.state_dict() to checkpoint |
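
The shape arithmetic behind B-1 can be checked by hand. The sketch below is illustrative only: the helper names, the 2×2 pooling window, and the channel count are assumptions, not values taken from DemoSNN.

```python
def pooled_size(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output side length of an unpadded pooling layer (floor mode)."""
    return (size - kernel) // stride + 1


def fc_in_features(channels: int, height: int, width: int) -> int:
    """Inputs the first fully connected layer must accept after one pooling layer."""
    return channels * pooled_size(height) * pooled_size(width)


# A 16x16 input with one 2x2 pool gives an 8x8 feature map.
# The channel count (8) is an assumed value for illustration only.
print(pooled_size(16))            # 8
print(fc_in_features(8, 16, 16))  # 512
```

If the FC layer were sized for the unpooled 16×16 map instead, it would expect four times as many inputs, which is the kind of mismatch B-1 describes.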

Modeling Fixes (M-1..M-5)

| Issue | Fix |
|---|---|
| M-1: Activation memory = max(single tensor) | ActivationMemoryTracker with live-range peak summation |
| M-2: SNN FLOPs ignore neuronal dynamics | lif_flops = num_lif_neurons * timesteps * 4 added per-layer |
| M-3: Alias collisions silent overwrite | ValueError on duplicate alias during library build |
| M-4: timesteps normalized inconsistently | Single MAX_TIMESTEPS_REF = 100 used everywhere |
| M-5: PIM always applies sparsity skip | pim_supports_sparse flag; only skips if hardware supports it |
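
To see why M-1 matters, compare the largest single tensor with the peak sum of tensors that are alive at the same time. The following is a minimal sketch, not the actual ActivationMemoryTracker: the `(first_step, last_step, nbytes)` representation and the function names are assumptions. The `lif_flops` formula is the one quoted for M-2.

```python
from typing import List, Tuple


def peak_live_bytes(tensors: List[Tuple[int, int, int]]) -> int:
    """Peak sum of bytes over all tensors alive at the same step.

    Each tensor is (first_step, last_step, nbytes), bounds inclusive.
    """
    if not tensors:
        return 0
    last_step = max(end for _, end, _ in tensors)
    peak = 0
    for step in range(last_step + 1):
        live = sum(n for start, end, n in tensors if start <= step <= end)
        peak = max(peak, live)
    return peak


def lif_flops(num_lif_neurons: int, timesteps: int, ops_per_update: int = 4) -> int:
    """Per-layer LIF cost as stated in M-2: neurons * timesteps * 4."""
    return num_lif_neurons * timesteps * ops_per_update


# Three activations with overlapping live ranges (made-up sizes).
acts = [(0, 1, 4096), (1, 2, 8192), (2, 3, 2048)]
print(max(n for _, _, n in acts))  # 8192, the old "max single tensor" estimate
print(peak_live_bytes(acts))       # 12288, the peak when ranges overlap
print(lif_flops(128, 25))          # 12800
```

The old estimate undercounts whenever a layer's input must stay resident while its output is written, which is the common case.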

Performance Optimizations (O-1..O-5)

| Opt | Implementation |
|---|---|
| O-1: CPU→GPU tensor transfer | PrioritizedReplayBuffer stores on CPU, batches directly to device tensors |
| O-2: Profile caching | _profile_cache keyed by (id(model), input_shape, timesteps) |
| O-3: CosineAnnealingLR | Replaces brittle StepLR; decays over full training horizon |
| O-4: RunningMeanStd | NormalizationStats: Welford-style online normalization (OpenAI Baselines) |
| O-5: N-step returns (3-step) | NStepBuffer with discounted multi-step returns in store_transition |
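
O-4 and O-5 are both small, self-contained techniques. The sketch below shows a per-sample Welford update and a discounted n-step return in plain Python. The class and function names here are illustrative and are not claimed to match NormalizationStats or NStepBuffer.

```python
import math
from typing import Sequence


class RunningMeanStd:
    """Online mean/variance via Welford's algorithm (one sample at a time)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        """Population variance of the samples seen so far."""
        return self.m2 / self.count if self.count > 0 else 0.0

    def normalize(self, x: float, eps: float = 1e-8) -> float:
        return (x - self.mean) / math.sqrt(self.var + eps)


def n_step_return(rewards: Sequence[float], gamma: float,
                  bootstrap_value: float = 0.0) -> float:
    """r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# 3-step return with gamma = 0.5 and a bootstrap value of 8:
# 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75
print(n_step_return([1.0, 1.0, 1.0], gamma=0.5, bootstrap_value=8.0))  # 2.75
```

With 3-step returns the bootstrap term is discounted by gamma cubed, so the stored transition pairs the first state with the state three steps later.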

Literature-Grounded Additions

Category 1: Polyhedral Compilation

  • PolyhedralAIEstimator with loop fusion and cache tiling models (PolyMage/Pluto-inspired)
  • PolyhedralAwareRouter computes post-compile AI and may change routing decision
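
One way to see how post-compile arithmetic intensity (AI) can flip a routing decision is a toy fusion model: fusing a chain of stages keeps intermediates in cache, so only the first input and last output touch memory. This is a simplified sketch, not the PolyhedralAIEstimator itself. The function names, the 2.0 FLOP/byte threshold, and the PIM/GPU rule are all assumptions.

```python
from typing import List


def unfused_ai(stage_flops: List[float], tensor_bytes: List[float]) -> float:
    """AI when every stage reads its input from and writes its output to memory.

    tensor_bytes[i] is the input of stage i and tensor_bytes[i + 1] its output,
    so len(tensor_bytes) == len(stage_flops) + 1.
    """
    traffic = sum(tensor_bytes[i] + tensor_bytes[i + 1]
                  for i in range(len(stage_flops)))
    return sum(stage_flops) / traffic


def fused_ai(stage_flops: List[float], tensor_bytes: List[float]) -> float:
    """AI when all stages are fused and intermediates never leave cache."""
    traffic = tensor_bytes[0] + tensor_bytes[-1]
    return sum(stage_flops) / traffic


def route_by_ai(ai: float, threshold: float = 2.0) -> str:
    """Hypothetical rule: memory-bound work goes to PIM, compute-bound to GPU."""
    return "PIM" if ai < threshold else "GPU"


flops = [1000.0, 1000.0]       # two stages (made-up numbers)
sizes = [400.0, 400.0, 400.0]  # input, intermediate, output in bytes

print(unfused_ai(flops, sizes), route_by_ai(unfused_ai(flops, sizes)))  # 1.25 PIM
print(fused_ai(flops, sizes), route_by_ai(fused_ai(flops, sizes)))      # 2.5 GPU
```

The same kernel looks memory-bound before fusion and compute-bound after it, which is why the router re-evaluates its decision on the post-compile estimate.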

Category 2: STT-MRAM Thermal Reliability

  • RC thermal network (2-node: junction + case) with Zhang et al. parameters
  • Arrhenius retention time τ = τ₀·exp(Ea_ret/kT); emergency migration if τ < 1 ms
  • Temperature-dependent endurance N_end(T) = N₀·exp(−(Ea_end/k)·(1/T − 1/T_ref))
  • Read disturb tracking β(T) per read cycle
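
The retention check and the two-node thermal network above can be sketched in a few lines. The parameter values below (τ₀, Ea_ret, thermal resistances and capacitances) are placeholders chosen for illustration, not the Zhang et al. parameters used in physics.py, and the function names are assumptions.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def retention_time(temp_k: float, tau0_s: float = 1e-9,
                   ea_ret_ev: float = 0.6) -> float:
    """Arrhenius retention time: tau = tau0 * exp(Ea_ret / (k * T))."""
    return tau0_s * math.exp(ea_ret_ev / (K_B_EV * temp_k))


def needs_emergency_migration(temp_k: float, threshold_s: float = 1e-3) -> bool:
    """True when retention drops below the 1 ms threshold."""
    return retention_time(temp_k) < threshold_s


def rc_step(t_j: float, t_c: float, power_w: float, t_amb: float, dt: float,
            r_jc: float = 10.0, r_ca: float = 20.0,
            c_j: float = 0.01, c_c: float = 0.5) -> tuple:
    """One explicit-Euler step of a 2-node (junction + case) RC network.

    Heat flows from the junction to the case through r_jc and from the case
    to ambient through r_ca; c_j and c_c are the node heat capacities.
    """
    q_jc = (t_j - t_c) / r_jc    # junction -> case heat flow (W)
    q_ca = (t_c - t_amb) / r_ca  # case -> ambient heat flow (W)
    return (t_j + dt * (power_w - q_jc) / c_j,
            t_c + dt * (q_jc - q_ca) / c_c)


# Heat the device with 1 W from a 300 K start and watch retention fall.
t_j = t_c = 300.0
for _ in range(20000):  # 200 s of simulated time at dt = 10 ms
    t_j, t_c = rc_step(t_j, t_c, power_w=1.0, t_amb=300.0, dt=0.01)
print(round(t_j, 2), round(t_c, 2))
print(retention_time(300.0) > retention_time(t_j))
print(needs_emergency_migration(t_j))
```

With these placeholder values the junction settles near `t_amb + P·(r_jc + r_ca)`. Retention shrinks exponentially as temperature rises, which is what makes the 1 ms emergency threshold reachable under sustained load.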

Category 3: Thermal-Aware Scheduling

  • Retention failure penalty in reward: -1.0 if retention < inference duration
  • DVFS-style frequency scaling placeholder in SAC variant
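
The retention penalty can be read as a simple reward-shaping term. The sketch below assumes the penalty is added to a base reward. Whether rl_env.py adds it or replaces the reward is not stated here, and the function names are hypothetical.

```python
def retention_penalty(retention_s: float, inference_s: float,
                      penalty: float = -1.0) -> float:
    """-1.0 when stored weights would decay before the inference finishes."""
    return penalty if retention_s < inference_s else 0.0


def shaped_reward(base_reward: float, retention_s: float,
                  inference_s: float) -> float:
    """Assumed additive combination of the base reward and the penalty."""
    return base_reward + retention_penalty(retention_s, inference_s)


print(shaped_reward(0.5, retention_s=10.0, inference_s=0.02))   # 0.5
print(shaped_reward(0.5, retention_s=0.001, inference_s=0.02))  # -0.5
```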

Category 4: RL-Based Scheduling

  • Noisy Nets (NoisyLinear) for parametric exploration (Fortunato et al. 2017)
  • 3-step n-step returns for sample efficiency
  • Double Q-learning in all DQN variants (target net + policy net argmax)
  • SAC agent variant for continuous action space ablation
  • READYS baseline (deadline slack / execution time greedy)
  • EdgeSched-DQN flat baseline (no dueling/PER/hierarchy)
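
The Double Q-learning target mentioned above separates action selection (policy net) from action evaluation (target net). A framework-free sketch follows, with plain lists standing in for the networks' Q-value outputs. The function name and the example numbers are illustrative.

```python
from typing import Sequence


def double_q_target(reward: float, gamma: float, done: bool,
                    q_policy_next: Sequence[float],
                    q_target_next: Sequence[float]) -> float:
    """Double DQN target: argmax from the policy net, value from the target net."""
    if done:
        return reward
    best_action = max(range(len(q_policy_next)), key=lambda a: q_policy_next[a])
    return reward + gamma * q_target_next[best_action]


q_policy = [1.0, 3.0, 2.0]  # policy net prefers action 1
q_target = [0.5, 1.5, 4.0]  # target net would overestimate action 2

# gamma = 0.5 is chosen only to keep the arithmetic exact.
print(double_q_target(1.0, 0.5, False, q_policy, q_target))  # 1.75
print(1.0 + 0.5 * max(q_target))                             # 3.0 (vanilla DQN)
```

With 3-step returns, `reward` would be the discounted sum of three rewards and `gamma` would be replaced by gamma cubed.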

Category 5: MLPerf Tiny Benchmarking

  • DS-CNN (Keyword Spotting), MobileNetV1 (Visual Wake Words)
  • FC Autoencoder (Anomaly Detection), ResNet-like (Image Classification)
  • PIMAccuracyModel degrades accuracy by fault density × V_th deviation × temperature
  • Harness runs 5–100 inferences, reports accuracy/latency/energy per task
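
A multiplicative degradation model of the kind described for PIMAccuracyModel might look like the sketch below. The functional form, the reference temperature, and the `sensitivity` constant are all assumptions made for illustration; the actual model in mlperf_tiny.py may differ.

```python
def degraded_accuracy(base_acc: float, fault_density: float, vth_dev: float,
                      temp_k: float, t_ref_k: float = 300.0,
                      sensitivity: float = 1.0) -> float:
    """Scale accuracy down by a loss proportional to the product of
    fault density, V_th deviation, and relative temperature.

    The loss fraction is clamped to [0, 1] so accuracy stays in [0, base_acc].
    """
    temp_factor = max(temp_k / t_ref_k, 0.0)
    loss = sensitivity * fault_density * vth_dev * temp_factor
    return base_acc * (1.0 - min(max(loss, 0.0), 1.0))


# No faults means no degradation; hotter devices degrade more.
print(degraded_accuracy(0.9, 0.0, 0.1, 300.0))  # 0.9
print(degraded_accuracy(0.9, 0.5, 0.2, 350.0)
      < degraded_accuracy(0.9, 0.5, 0.2, 300.0))  # True
```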

Test Results

62 tests cover the profiler, physics, RL env/agent, controller, router, baselines, MLPerf Tiny, and edge cases, and all pass. The physics and profiler tests are fast (under 5 s); the RL tests take about 120 s.

Run:

cd tempo-snn-v2
python test_minimal.py        # 6 core checks (<1s)
python tests/test_profiler.py # 62 tests (physics + profiler fast; RL ~120s)

Usage

from router_static import ComplexityRouter, HardwareState

router = ComplexityRouter()
report = router.route("FFT", hw=HardwareState.from_temperature(T=45.0))
# report.target -> "GPU", report.tier_used -> "TIER2_SKLEARN"

References

Key citations baked into the code:

  • Pluto (Bondhugula et al., PLDI 2008): affine scheduling baseline
  • PolyMage (Mullapudi et al., ASPLOS 2015): polyhedral image pipeline optimization
  • Zhang et al. (IEEE Trans. Nanotech 2018): STT-MRAM compact thermal model
  • Mnih et al. (Nature 2015): DQN foundation
  • Wang et al. (ICML 2016): Dueling network architectures
  • Fortunato et al. (2017): Noisy Networks for exploration
  • Banbury et al. (arXiv 2021): MLPerf Tiny Benchmark
  • Grinsztajn et al. (IEEE Cluster 2021): READYS heterogeneous scheduling