DFTS2: Simulating Deep Feature Transmission Over Packet Loss Channels
In edge-cloud collaborative intelligence (CI), an unreliable transmission channel exists in the information path of the AI model performing the inference. It is important to be able to simulate the performance of the CI system across an imperfect channel in order to understand system behavior and develop appropriate error control strategies. In this paper we present a simulation framework called DFTS2, which enables researchers to define the components of the CI system in TensorFlow 2, select a packet-based channel model with various parameters, and simulate system behavior under various channel conditions and error/loss control strategies. Using DFTS2, we also present the most comprehensive study to date of the packet loss concealment methods for collaborative image classification models.
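The simulation loop described above (packetize features, pass them through a lossy channel, conceal the losses) can be illustrated with a minimal sketch. This is not the DFTS2 API: the function names, the independent (i.i.d.) loss model, and zero-fill concealment are all assumptions chosen for illustration, and it uses plain Python lists rather than TensorFlow tensors.

```python
import random

def packetize(features, packet_size):
    """Split a flat feature list into consecutive fixed-size packets."""
    return [features[i:i + packet_size] for i in range(0, len(features), packet_size)]

def transmit(packets, loss_prob, rng):
    """Drop each packet independently with probability loss_prob (None marks a loss)."""
    return [None if rng.random() < loss_prob else p for p in packets]

def conceal_zero_fill(received, packet_sizes):
    """Rebuild the feature list, replacing every lost packet with zeros."""
    out = []
    for pkt, size in zip(received, packet_sizes):
        out.extend(pkt if pkt is not None else [0.0] * size)
    return out

def simulate(features, packet_size=4, loss_prob=0.3, seed=0):
    """Hypothetical end-to-end run: packetize, lose packets, conceal."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    packets = packetize(features, packet_size)
    received = transmit(packets, loss_prob, rng)
    return conceal_zero_fill(received, [len(p) for p in packets])
```

Calling `simulate(features, loss_prob=0.0)` returns the features unchanged, while `loss_prob=1.0` returns all zeros. Swapping `transmit` for a bursty channel model or `conceal_zero_fill` for an interpolation-based method would be the natural way to compare concealment strategies in a sketch like this.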
Automated Attacker Synthesis for Distributed Protocols
Distributed protocols should be robust to both benign malfunction (e.g. packet loss or delay) and attacks (e.g. message replay) from internal or external adversaries. In this paper we take a formal approach to the automated synthesis of attackers, i.e. adversarial processes that can cause the protocol to malfunction. Specifically, given a formal threat model capturing the distributed protocol model and network topology, as well as the placement, goals, and interface (inputs and outputs) of potential attackers, we automatically synthesize an attacker. We formalize four attacker synthesis problems - across attackers that always succeed versus those that sometimes fail, and attackers that attack forever versus those that do not - and we propose algorithmic solutions to two of them. We report on a prototype implementation called KORG and its application to TCP as a case-study. Our experiments show that KORG can automatically generate well-known attacks for TCP within seconds or minutes.
PCA-Driven Adaptive Sensor Triage for Edge AI Inference
Multi-channel sensor networks in industrial IoT often exceed available bandwidth. We propose PCA-Triage, a streaming algorithm that converts incremental PCA loadings into proportional per-channel sampling rates under a bandwidth budget. PCA-Triage runs in O(wdk) time with zero trainable parameters (0.67 ms per decision). We evaluate on 7 benchmarks (8--82 channels) against 9 baselines. PCA-Triage is the best unsupervised method on 3 of 6 datasets at 50% bandwidth, winning 5 of 6 against every baseline with large effect sizes (r = 0.71--0.91). On TEP, it achieves F1 = 0.961 +/- 0.001 -- within 0.1% of full-data performance -- while maintaining F1 > 0.90 at 30% budget. Targeted extensions push F1 to 0.970. The algorithm is robust to packet loss and sensor noise (3.7--4.8% degradation under combined worst-case).
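As a rough illustration of turning PCA loadings into proportional per-channel sampling rates under a bandwidth budget, consider the sketch below. The importance score (absolute loadings weighted by explained variance) and the simple cap at a rate of 1.0 are assumptions made for this example; the abstract does not specify how PCA-Triage computes them.

```python
def sampling_rates(loadings, explained_variance, budget):
    """
    loadings: k rows (one per principal component), each with d per-channel loadings.
    explained_variance: k non-negative weights, one per component.
    budget: fraction of the full data rate that may be used (0..1).
    Returns d per-channel sampling rates in [0, 1].
    """
    d = len(loadings[0])
    # Assumed importance score: variance-weighted sum of absolute loadings.
    importance = [
        sum(ev * abs(row[c]) for row, ev in zip(loadings, explained_variance))
        for c in range(d)
    ]
    total = sum(importance)
    if total == 0:
        return [budget] * d  # no signal at all: fall back to uniform sampling
    # Proportional allocation; a channel can never exceed its full rate of 1.0.
    return [min(1.0, budget * d * imp / total) for imp in importance]
```

With one component, `sampling_rates([[1.0, 1.0, 2.0, 0.0]], [1.0], 0.5)` gives `[0.5, 0.5, 1.0, 0.0]`, which spends exactly half of the total rate. When the cap at 1.0 is hit, this simple version leaves some budget unused; redistributing the remainder is left out for brevity.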
Blockchain Communication Vulnerabilities
Blockchains are diverse in the way they handle communications between their nodes to disseminate information, mitigate attacks, and agree on the next block. While security vulnerabilities have been identified, they rely on an attack custom-made for a specific blockchain communication protocol. To our knowledge, the vulnerabilities of multiple blockchain communication protocols to adversarial conditions have never been compared. In this paper, we compare empirically the vulnerabilities of the communication protocols of five modern in-production blockchains, Algorand, Aptos, Avalanche, Redbelly and Solana, when attacked in five different ways. We conclude that Algorand is vulnerable to packet loss attacks, Aptos is vulnerable to targeted load attacks and leader isolation attacks, Avalanche is vulnerable to transient failure attacks, Redbelly's performance is impacted by packet loss attacks and Solana is vulnerable to stopping attacks and leader isolation attacks. Our system is open source.
CSI-BERT2: A BERT-inspired Framework for Efficient CSI Prediction and Classification in Wireless Communication and Sensing
Channel state information (CSI) is a fundamental component in both wireless communication and sensing systems, enabling critical functions such as radio resource optimization and environmental perception. In wireless sensing, data scarcity and packet loss hinder efficient model training, while in wireless communication, high-dimensional CSI matrices and short coherence times caused by high mobility present challenges in CSI estimation. To address these issues, we propose a unified framework named CSI-BERT2 for CSI prediction and classification tasks. Building on CSI-BERT, we introduce a two-stage training method that first uses a mask language model (MLM) to enable the model to learn general feature extraction from scarce datasets in an unsupervised manner, followed by fine-tuning for specific downstream tasks. Specifically, we extend MLM into a mask prediction model (MPM), which efficiently addresses the CSI prediction task. We also introduce an adaptive re-weighting layer (ARL) to enhance subcarrier representation and a multi-layer perceptron (MLP) based temporal embedding module to mitigate permutation invariance issues in time-series CSI data. This significantly improves the CSI classification performance of the original CSI-BERT model. Extensive experiments on both real-world collected and simulated datasets demonstrate that CSI-BERT2 achieves state-of-the-art performance across all tasks. Our results further show that CSI-BERT2 generalizes effectively across varying sampling rates and robustly handles discontinuous CSI sequences caused by packet loss, challenges that conventional methods fail to address.
EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning
Distributed Mean Estimation (DME) is a central building block in federated learning, where clients send local gradients to a parameter server for averaging and updating the model. Due to communication constraints, clients often use lossy compression techniques to compress the gradients, resulting in estimation inaccuracies. DME is more challenging when clients have diverse network conditions, such as constrained communication budgets and packet losses. In such settings, DME techniques often incur a significant increase in the estimation error leading to degraded learning performance. In this work, we propose a robust DME technique named EDEN that naturally handles heterogeneous communication budgets and packet losses. We derive appealing theoretical guarantees for EDEN and evaluate it empirically. Our results demonstrate that EDEN consistently improves over state-of-the-art DME techniques.
STCTS: Generative Semantic Compression for Ultra-Low Bitrate Speech via Explicit Text-Prosody-Timbre Decomposition
Voice communication in bandwidth-constrained environments--maritime, satellite, and tactical networks--remains prohibitively expensive. Traditional codecs struggle below 1 kbps, while existing semantic approaches (STT-TTS) sacrifice prosody and speaker identity. We present STCTS, a generative semantic compression framework enabling natural voice communication at 80 bps. STCTS explicitly decomposes speech into linguistic content, prosodic expression, and speaker timbre, applying tailored compression: context-aware text encoding (70 bps), sparse prosody transmission via TTS interpolation (<14 bps at 0.1-1 Hz), and amortized speaker embedding. Evaluations on LibriSpeech demonstrate a 75x bitrate reduction versus Opus (6 kbps) and 12x versus EnCodec (1 kbps), while maintaining perceptual quality (NISQA MOS > 4.26), graceful degradation under packet loss and noise resilience. We also discover a bimodal quality distribution with prosody sampling rate: sparse and dense updates both achieve high quality, while mid-range rates degrade due to perceptual discontinuities--guiding optimal configuration design. Beyond efficiency, our modular architecture supports privacy-preserving encryption, human-interpretable transmission, and flexible deployment on edge devices, offering a robust solution for ultra-low bandwidth scenarios.
CHESTNUT: A QoS Dataset for Mobile Edge Environments
Quality of Service (QoS) is an important metric to measure the performance of network services. Nowadays, it is widely used in mobile edge environments to evaluate the quality of service when mobile devices request services from edge servers. QoS usually involves multiple dimensions, such as bandwidth, latency, jitter, and data packet loss rate. However, most existing QoS datasets, such as the common WS-Dream dataset, focus mainly on static QoS metrics of network services and ignore dynamic attributes such as time and geographic location. This means they do not record the mobile device's location at the time of the service request or the chronological order in which the requests were made. Yet these dynamic attributes are crucial for understanding and predicting the actual performance of network services, as QoS performance typically fluctuates with time and geographic location. To this end, we propose a novel dataset that accurately records temporal and geographic location information on quality of service during the collection process, aiming to provide more accurate and reliable data to support future QoS prediction in mobile edge environments.
Eloquent: A More Robust Transmission Scheme for LLM Token Streaming
To render each generated token in real-time for users, the Large Language Model (LLM) server generates tokens one by one and streams each token (or group of a few tokens) through the network to the user right after generation, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience could suffer greatly from stalls since one packet loss could block the rendering of later tokens even if the packets containing them arrive on time. With a measurement study, we show that current applications suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM Chatbots that differs from previous multimedia and text applications, we propose a novel transmission scheme, called Eloquent, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and, in the meantime, is independently rendered when received, avoiding the aforementioned stalls caused by missing packets. Through simulation under various networks, we show Eloquent reduces stall ratio (proportion of token rendering wait time) by 71.0% compared to the retransmission method commonly used by real chatbot applications and by 31.6% compared to the baseline packet duplication scheme. By tailoring Eloquent to fit the token-by-token generation of LLM, we enable the Chatbots to respond like an eloquent speaker for users to better enjoy pervasive AI.
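The core idea, that every outgoing packet carries the new tokens plus all tokens not yet acknowledged, can be sketched as follows. The packet layout `(start_index, payload)`, the cumulative ACK, and the class names are assumptions for illustration; the abstract does not describe Eloquent's wire format or ACK scheme.

```python
class Sender:
    def __init__(self):
        self.tokens = []  # every token generated so far
        self.acked = 0    # number of leading tokens the receiver has acknowledged

    def next_packet(self, new_tokens):
        """Build a packet with all unacknowledged tokens plus the new ones."""
        self.tokens.extend(new_tokens)
        return (self.acked, self.tokens[self.acked:])  # (start index, payload)

    def on_ack(self, count):
        """Record a cumulative ACK; stale (smaller) ACKs are ignored."""
        self.acked = max(self.acked, count)


class Receiver:
    def __init__(self):
        self.rendered = []  # tokens already shown to the user, in order

    def on_packet(self, packet):
        """Render any tokens not yet shown and return the cumulative ACK."""
        start, payload = packet
        have = len(self.rendered)
        if start <= have:
            # Skip the part of the payload that was already rendered.
            self.rendered.extend(payload[have - start:])
        return len(self.rendered)
```

If the first packet carrying `["Hello"]` is lost, the second packet still starts at index 0 and carries `["Hello", " world"]`, so the receiver can render both tokens on arrival without waiting for a retransmission. The cost is redundancy: payloads grow while ACKs are outstanding.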
Performance evaluation of conditional handover in 5G systems under fading scenario
To enhance handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has emerged as a promising solution. Unlike A3-based handover, where handover execution is certain after receiving the handover command from the serving access network, in CHO handover execution is conditional on the RSRP measurements from both the current and target access networks, as well as on mobility parameters such as preparation and execution offsets. Analytic evaluation of conditional handover performance is unprecedented in the literature. In this work, the handover performance of CHO is evaluated in terms of handover latency, handover packet loss and handover failure probability. A Markov model that accounts for the effect of different mobility parameters (e.g., execution offset, preparation offset, time-to-preparation and time-to-execution), UE velocity and channel fading characteristics is proposed to characterize handover failure. Results obtained from the analytic model have been validated against extensive simulation results. Our study reveals that the optimal configuration of O_{exec}, O_{prep}, T_{exec} and T_{prep} is conditional on the underlying UE velocity and fading characteristics. This study will help mobile operators choose appropriate thresholds for the mobility parameters under different channel conditions and UE velocities.
Design and implementation of intelligent packet filtering in IoT microcontroller-based devices
Internet of Things (IoT) devices are increasingly pervasive and essential components in enabling new applications and services. However, their widespread use also exposes them to exploitable vulnerabilities and flaws that can lead to significant losses. In this context, ensuring robust cybersecurity measures is essential to protect IoT devices from malicious attacks. However, the current solutions that provide flexible policy specifications and higher security levels for IoT devices are scarce. To address this gap, we introduce T800, a low-resource packet filter that utilizes machine learning (ML) algorithms to classify packets in IoT devices. We present a detailed performance benchmarking framework and demonstrate T800's effectiveness on the ESP32 system-on-chip microcontroller and ESP-IDF framework. Our evaluation shows that T800 is an efficient solution that increases device computational capacity by excluding unsolicited malicious traffic from the processing pipeline. Additionally, T800 is adaptable to different systems and provides a well-documented performance evaluation strategy for security ML-based mechanisms on ESP32-based IoT systems. Our research contributes to improving the cybersecurity of resource-constrained IoT devices and provides a scalable, efficient solution that can be used to enhance the security of IoT systems.
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human, but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, this poses significant challenges to latency, because the MLLM inference takes up most of the response time, leaving very little time for video streaming. Due to network uncertainty and instability, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we propose Artic, an AI-oriented Real-time Communication framework, exploring the network requirement shift from "humans watching video" to "AI understanding video". To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming that recognizes the importance of each video region for chat and allocates bitrate almost exclusively to chat-important regions. To avoid packet retransmission, we propose Loss-Resilient Adaptive Frame Rate that leverages previous frames to substitute for lost/delayed frames while avoiding bitrate waste. To evaluate the impact of video streaming quality on MLLM accuracy, we build the first benchmark, named Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss some open questions and ongoing solutions for AI Video Chat.
Radio Map Estimation -- An Open Dataset with Directive Transmitter Antennas and Initial Experiments
Over the last years, several works have explored the application of deep learning algorithms to determine the large-scale signal fading (also referred to as "path loss") between transmitter and receiver pairs in urban communication networks. The central idea is to replace costly measurement campaigns, inaccurate statistical models or computationally expensive ray-tracing simulations by machine learning models which, once trained, produce accurate predictions almost instantly. Although the topic has attracted attention from many researchers, there are few open benchmark datasets and codebases that would allow everyone to test and compare the developed methods and algorithms. We take a step towards filling this gap by releasing a publicly available dataset of simulated path loss radio maps together with realistic city maps from real-world locations and aerial images from open datasources. Initial experiments regarding model architectures, input feature design and estimation of radio maps from aerial images are presented and the code is made available.
Cross-Layer Protocols for Multimedia Communications over Wireless Networks
In the last few years, the Internet throughput, usage and reliability have increased almost exponentially. The introduction of broadband wireless mobile ad hoc networks (MANETs) and cellular networks together with increased computational power have opened the door for a new breed of applications to be created, namely real-time multimedia applications. Delivering real-time multimedia traffic over a complex network like the Internet is a particularly challenging task since these applications have strict quality-of-service (QoS) requirements on bandwidth, delay, and delay jitter. Traditional Internet protocol (IP)-based best effort service is not able to meet these stringent requirements. The time-varying nature of wireless channels and resource constrained wireless devices make the problem even more difficult. To improve perceived media quality by end users over wireless Internet, QoS support can be addressed in different layers, including the application layer, transport layer and link layer. Cross-layer design is a well-known approach to achieve this adaptation. In cross-layer design, the challenges from the physical wireless medium and the QoS demands from the applications are taken into account so that the rate, power, and coding at the physical (PHY) layer can be adapted to meet the requirements of the applications given the current channel and network conditions. A number of propositions for cross-layer designs exist in the literature. In this chapter, an extensive review has been made of these cross-layer architectures that combine the application-layer, transport-layer and link-layer controls. In particular, issues such as channel estimation techniques, adaptive controls at the application and link layers for energy efficiency, priority-based scheduling, transmission rate control at the transport layer, and adaptive automatic repeat request (ARQ) are discussed in detail.
LLMcap: Large Language Model for Unsupervised PCAP Failure Detection
The integration of advanced technologies into telecommunication networks complicates troubleshooting, posing challenges for manual error identification in Packet Capture (PCAP) data. This manual approach, requiring substantial resources, becomes impractical at larger scales. Machine learning (ML) methods offer alternatives, but the scarcity of labeled data limits accuracy. In this study, we propose a self-supervised, large language model-based (LLMcap) method for PCAP failure detection. LLMcap leverages language-learning abilities and employs masked language modeling to learn grammar, context, and structure. Tested rigorously on various PCAPs, it demonstrates high accuracy despite the absence of labeled data during training, presenting a promising solution for efficient network analysis. Index Terms: Network troubleshooting, Packet Capture Analysis, Self-Supervised Learning, Large Language Model, Network Quality of Service, Network Performance.
Differentiated Services: an Experimental vs. Simulated Case Study
This paper aims to provide a proof of concept of the accuracy of simulations for advanced networking study. The particular target technology is the Differentiated Services (DiffServ) architecture. The method has been to apply experimental activities conducted in a real network to a simulation environment, to gather the same performance parameters and to compare results. A worthy re-engineering of the DiffServ module of the deployed software program has been carried out and significant contributions have been made to overcome the encountered limitations and to enrich its modeling capabilities. Final results give useful suggestions for a more critical approach to simulations targeted at advanced networking study.
RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks
Spiking Neural Networks (SNNs), as one of the biology-inspired models, have received much attention recently. They can significantly reduce energy consumption since they quantize the real-valued membrane potentials to 0/1 spikes to transmit information, so the multiplications of activations and weights can be replaced by additions when implemented on hardware. However, this quantization mechanism will inevitably introduce quantization error, thus causing catastrophic information loss. To address the quantization error problem, we propose a regularizing membrane potential loss (RMP-Loss) to adjust the distribution, which is directly related to quantization error, to a range close to the spikes. Our method is extremely simple to implement and makes training an SNN straightforward. Furthermore, it is shown to consistently outperform previous state-of-the-art methods over different network architectures and datasets.
Free-space model for a balloon-based quantum network
Long-distance communication is one of the main bottlenecks in the development of quantum communication networks. Free-space communication is a way to circumvent exponential fiber loss and to allow longer communication distances. Satellite nodes are the main devices currently studied for free-space communication, but they come with downsides such as high cost and low availability. In this work, we study an alternative to satellites, namely aerial platforms such as high-altitude balloons. We provide a loss model to simulate the channel efficiency of balloon-to-ground, ground-to-balloon, and balloon-to-balloon communication channels, considering a large set of hardware parameters. We perform a parameter exploration to exhibit important trade-offs in these channels, as well as simulations of different quantum key distribution network architectures including balloon nodes. We demonstrate that balloons are a realistic alternative to satellites for free-space communications in national network architectures.
Challenging the Need for Packet Spraying in Large-Scale Distributed Training
Large-scale distributed training in production datacenters constitutes a challenging workload bottlenecked by network communication. In response, both major industry players (e.g., Ultra Ethernet Consortium) and parts of academia have surprisingly, and almost unanimously, agreed that packet spraying is necessary to improve the performance of large-scale distributed training workloads. In this paper, we challenge this prevailing belief and pose the question: How close can a singlepath transport approach an optimal multipath transport? We demonstrate that singlepath transport (from a NIC's perspective) is sufficient and can perform nearly as well as an ideal multipath transport with packet spraying, particularly in the context of distributed training in leaf-spine topologies. Our assertion is based on four key observations about workloads driven by collective communication patterns: (i) flows within a collective start almost simultaneously, (ii) flow sizes are nearly equal, (iii) the completion time of a collective is more crucial than individual flow completion times, and (iv) flows can be split upon arrival. We analytically prove that singlepath transport, using minimal flow splitting (at the application layer), is equivalent to an ideal multipath transport with packet spraying in terms of maximum congestion. Our preliminary evaluations support our claims. This paper suggests an alternative agenda for developing next-generation transport protocols tailored for large-scale distributed training.
Learned Best-Effort LLM Serving
Many applications must provide low-latency LLM service to users or risk unacceptable user experience. However, over-provisioning resources to serve fluctuating request patterns is often prohibitively expensive. In this work, we present a best-effort serving system that employs deep reinforcement learning to adjust service quality based on the task distribution and system load. Our best-effort system can maintain availability with over 10x higher client request rates, serves above 96% of peak performance 4.1x more often, and serves above 98% of peak performance 2.3x more often than static serving on unpredictable workloads. Our learned router is robust to shifts in both the arrival and task distribution. Compared to static serving, learned best-effort serving allows for cost-efficient serving through increased hardware utility. Additionally, we argue that learned best-effort LLM serving is applicable in a wide variety of settings and gives application developers great flexibility to meet their specific needs.
Rate limits in quantum networks with lossy repeaters
The derivation of ultimate limits to communication over certain quantum repeater networks has provided extremely valuable benchmarks for assessing near-term quantum communication protocols. However, these bounds are usually derived in the limit of ideal devices and leave questions about the performance of practical implementations unanswered. To address this challenge, we quantify how the presence of loss in repeater stations affects the maximum attainable rates for quantum communication over linear repeater chains and more complex quantum networks. Extending the framework of node splitting, we model the loss introduced at the repeater stations and then prove the corresponding limits. In the linear chain scenario we show that, by increasing the number of repeater stations, the maximum rate cannot overcome a quantity which solely depends on the loss of a single station. We introduce a way of adapting the standard machinery for obtaining bounds to this realistic scenario. The difference is that whilst ultimate limits for any strategy can be derived given a fixed channel, when the repeaters introduce additional decoherence, then the effective overall channel is itself a function of the chosen repeater strategy (e.g., one-way versus two-way classical communication). Classes of repeater strategies can be analysed using additional modelling and the subsequent bounds can be interpreted as the optimal rate within that class.
Outdoor-to-Indoor 28 GHz Wireless Measurements in Manhattan: Path Loss, Environmental Effects, and 90% Coverage
Outdoor-to-indoor (OtI) signal propagation further challenges the already tight link budgets at millimeter-wave (mmWave). To gain insight into OtI mmWave scenarios at 28 GHz, we conducted an extensive measurement campaign consisting of over 2,200 link measurements. In total, 43 OtI scenarios were measured in West Harlem, New York City, covering seven highly diverse buildings. The measured OtI path gain can vary by up to 40 dB for a given link distance, and the empirical path gain model for all data shows an average of 30 dB excess loss over free space at distances beyond 50 m, with an RMS fitting error of 11.7 dB. The type of glass is found to be the single dominant feature for OtI loss, with a 20 dB observed difference between empirical path gain models for scenarios with low-loss and high-loss glass. The presence of scaffolding, tree foliage, or elevated subway tracks, as well as differences in floor height, are each found to have an impact of 5-10 dB. We show that for urban buildings with high-loss glass, OtI coverage can support 500 Mbps for 90% of indoor user equipment (UEs) with a base station (BS) antenna placed up to 49 m away. For buildings with low-loss glass, such as our case study covering multiple classrooms of a public school, data rates over 2.5/1.2 Gbps are possible from a BS 68/175 m away from the school building, when a line-of-sight path is available. We expect these results to be useful for the deployment of mmWave networks in dense urban environments as well as the development of relevant scheduling and beam management algorithms.
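To put the "30 dB excess loss over free space" figure in context, the sketch below computes the standard free-space path loss at 28 GHz and adds a fixed excess term. This is only a back-of-the-envelope illustration, not the paper's fitted empirical path gain model; the function names and the choice to treat the excess loss as a constant are assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)

def oti_path_gain_db(distance_m, freq_hz=28e9, excess_loss_db=30.0):
    """Hypothetical path gain: negative of (free-space loss + a constant excess loss)."""
    return -(free_space_path_loss_db(distance_m, freq_hz) + excess_loss_db)
```

At 50 m and 28 GHz the free-space loss alone is roughly 95 dB, so a 30 dB average excess corresponds to a path gain of about -125 dB. The reported 11.7 dB RMS fitting error and the 20 dB spread between glass types show how far individual links can sit from such an average.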
PLUME: Building a Network-Native Foundation Model for Wireless Traces via Protocol-Aware Tokenization
Foundation models succeed when they learn in the native structure of a modality, whether morphology-respecting tokens in language or pixels in vision. Wireless packet traces deserve the same treatment: meaning emerges from layered headers, typed fields, timing gaps, and cross-packet state machines, not flat strings. We present Plume (Protocol Language Understanding Model for Exchanges), a compact 140M-parameter foundation model for 802.11 traces that learns from structured PDML dissections. A protocol-aware tokenizer splits along the dissector field tree, emits gap tokens for timing, and normalizes identifiers, yielding 6.2x shorter sequences than BPE with higher per token information density. Trained on a curated corpus, Plume achieves 74-97% next-packet token accuracy across five real-world failure categories and AUROC >= 0.99 for zero-shot anomaly detection. On the same prediction task, frontier LLMs (Claude Opus 4.6, GPT-5.4) score comparably despite receiving identical protocol context, yet Plume does so with > 600x fewer parameters, fitting on a single GPU at effectively zero marginal cost vs. cloud API pricing, enabling on-prem, privacy-preserving root cause analysis.
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Mixture-of-Experts (MoE) models lack explicit constraints to ensure the router's decisions align well with the experts' capabilities, which ultimately limits model performance. To address this, we propose expert-router coupling (ERC) loss, a lightweight auxiliary loss that tightly couples the router's decisions with expert capabilities. Our approach treats each expert's router embedding as a proxy token for the tokens assigned to that expert, and feeds perturbed router embeddings through the experts to obtain internal activations. The ERC loss enforces two constraints on these activations: (1) Each expert must exhibit higher activation for its own proxy token than for the proxy tokens of any other expert. (2) Each proxy token must elicit stronger activation from its corresponding expert than from any other expert. These constraints jointly ensure that each router embedding faithfully represents its corresponding expert's capability, while each expert specializes in processing the tokens actually routed to it. The ERC loss is computationally efficient, operating only on n^2 activations, where n is the number of experts. This represents a fixed cost independent of batch size, unlike prior coupling methods that scale with the number of tokens (often millions per batch). Through pre-training MoE-LLMs ranging from 3B to 15B parameters and extensive analysis on trillions of tokens, we demonstrate the effectiveness of the ERC loss. Moreover, the ERC loss offers flexible control and quantitative tracking of expert specialization levels during training, providing valuable insights into MoEs.
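The two constraints can be made concrete on an n x n activation matrix, where entry `acts[i][j]` is the activation of expert `i` on the proxy token of expert `j`. The hinge-style penalty and the `margin` parameter below are assumptions for illustration only; the abstract does not give the actual form of the ERC loss.

```python
def erc_style_penalty(acts, margin=0.0):
    """
    acts[i][j]: activation of expert i on the proxy token of expert j.
    Returns a hinge-style penalty that is zero when both constraints hold
    with the given margin. Touches only the n * n entries of acts.
    """
    n = len(acts)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # (1) Expert i should respond more to its own proxy token than to token j.
            penalty += max(0.0, acts[i][j] - acts[i][i] + margin)
            # (2) Proxy token j should excite its own expert j more than expert i.
            penalty += max(0.0, acts[i][j] - acts[j][j] + margin)
    return penalty
```

For `[[2.0, 1.0], [0.5, 3.0]]` the diagonal dominates every row and column, so the penalty is 0. For `[[1.0, 2.0], [0.0, 1.0]]` the off-diagonal entry 2.0 violates both constraints and the penalty is 2.0. Because the loop runs over expert pairs only, its cost does not depend on how many tokens are in the batch, which matches the fixed-cost property described above.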
NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models
In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is to process diverse data packets including both ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they often rely on different datasets for performance evaluation. This inconsistency results in substantial manual data processing efforts and unfair comparisons. Moreover, some data processing methods may cause data leakage due to improper separation of training and testing data. To address these issues, we introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight State-Of-The-Art (SOTA) classification models (including two foundation models) and two generative models using our benchmark. The results show that foundation models significantly outperform the traditional deep learning methods in traffic classification. We believe NetBench will facilitate fair comparisons among various approaches and advance the development of foundation models for network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.
