# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "altair",
#     "asimpy",
#     "marimo",
#     "polars==1.24.0",
# ]
# ///
import marimo
__generated_with = "0.20.4"
app = marimo.App(width="medium")
@app.cell(hide_code=True)
def _():
    import marimo as mo
    import random
    import statistics
    import altair as alt
    import polars as pl
    from asimpy import Environment, Process, Queue, PriorityQueue
    return Environment, Process, Queue, PriorityQueue, alt, mo, pl, random, statistics
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
# The Convoy Effect

## *One Slow Job Ruins Everyone's Day*

A single server processes jobs that arrive randomly (Poisson process). Most jobs are quick (exponential service with small mean), but a rare few are very slow (exponential service with large mean). This *hyperexponential* service distribution has high variance. This tutorial compares the performance of two scheduling disciplines in this situation:

- FIFO (First In, First Out): jobs are served in the order they arrive.
- SJF (Shortest Job First): the server always picks the shortest queued job next.

The surprising result is that SJF dramatically outperforms FIFO: not just for the small jobs that directly benefit from skipping ahead, but also for mean sojourn time across *all* jobs. The improvement is most visible at the tail (95th and 99th percentiles) because FIFO creates a *convoy effect*: one long job blocks many short jobs behind it, inflating everyone's wait.
### The Convoy Metaphor
Picture a one-lane road with one slow truck and many fast cars. Every car behind the truck must drive at truck speed; no overtaking allowed. The truck is the long job; the cars are the short jobs stuck behind it in FIFO order. SJF is like a passing lane: fast cars jump ahead of the truck and reach their destination much sooner. The truck itself arrives at the same time either way, but the total delay experienced by all vehicles plummets.
### Why FIFO Hurts with High Variance

In FIFO, the server's current job is chosen at arrival time, not at decision time. When a slow job begins service, every subsequent arrival must join the queue and wait. The mean sojourn time (queueing wait plus the job's own service) for an arriving customer under FIFO is:

$$W_{\text{FIFO}} = \frac{\lambda \overline{s^2}}{2(1-\rho)} + \frac{1}{\mu}$$

where $\overline{s^2}$ is the second moment of service time and $1/\mu$ is the mean service time. High variance inflates $\overline{s^2}$ without changing $\rho$, directly worsening wait time.
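
Plugging in this notebook's default parameters (arrival rate 0.7; 90% short jobs served at rate 4, 10% long jobs at rate 0.2) gives a quick sanity check against the simulation output:

```python
# Pollaczek-Khinchine prediction for FIFO with this notebook's defaults
lam = 0.7                     # arrival rate
p_long = 0.10                 # probability that a job is long
mu_short, mu_long = 4.0, 0.2  # exponential service rates

# Moments of the hyperexponential service time
# (for an exponential with rate mu: E[S] = 1/mu, E[S^2] = 2/mu^2)
mean_s = (1 - p_long) / mu_short + p_long / mu_long
second_moment = (1 - p_long) * 2 / mu_short**2 + p_long * 2 / mu_long**2

rho = lam * mean_s
w_fifo = lam * second_moment / (2 * (1 - rho)) + mean_s
print(f"E[S] = {mean_s:.3f}, E[S^2] = {second_moment:.3f}, rho = {rho:.3f}")
print(f"Predicted FIFO mean sojourn time: {w_fifo:.2f}")
```

Expect a mean sojourn time of about 4.4 time units, even though the mean service time is only 0.725.
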
### SJF Minimises Mean Sojourn Time

For a single server with non-preemptive SJF and any service-time distribution, a job of service time $x$ waits, on average:

$$W_{\text{SJF}}(x) = \frac{\lambda \overline{s^2}}{2\left(1-\rho(x)\right)^2}, \qquad \rho(x) = \lambda \int_0^x t \, dF(t)$$

Here $\rho(x)$ counts only the load contributed by jobs no larger than $x$; the building blocks of this expression are discussed in "Understanding the Math" at the end of this lesson. Because a short job waits mostly behind work smaller than itself, $\rho(x)$ is small for short jobs and their waits shrink dramatically; averaging over all job sizes yields a lower mean sojourn time than FIFO. Short jobs that would otherwise be blocked by a long job are promoted ahead, reducing the total waiting work in the system.
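
The exact SJF average requires integrating $W_{\text{SJF}}(x)$ over the job-size distribution. A rougher two-class sketch, treating short and long jobs as two non-preemptive priority classes and applying the standard M/G/1 priority waiting-time formula, gives a feel for the gain (an approximation, not the exact SJF value):

```python
# Two-class non-preemptive priority approximation of SJF
lam = 0.7
p_long = 0.10
mu_short, mu_long = 4.0, 0.2

mean_s = (1 - p_long) / mu_short + p_long / mu_long
second_moment = (1 - p_long) * 2 / mu_short**2 + p_long * 2 / mu_long**2

w0 = lam * second_moment / 2               # mean residual work seen on arrival
rho_short = lam * (1 - p_long) / mu_short  # load from short jobs only
rho_total = lam * mean_s                   # total load

w_short = w0 / (1 - rho_short)                     # class-1 (short) wait
w_long = w0 / ((1 - rho_short) * (1 - rho_total))  # class-2 (long) wait
w_sjf = (1 - p_long) * w_short + p_long * w_long + mean_s
print(f"Approximate SJF mean sojourn time: {w_sjf:.2f}")
```

Roughly 3.1 versus FIFO's roughly 4.4 under the same load, before even looking at the tail percentiles.
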
## Practical Relevance
Operating system CPU schedulers use time-quanta and priority aging to approximate SJF without knowing job sizes in advance. Database query planners estimate query cost and reorder execution to minimize blocking. The phenomenon reappears as *head-of-line blocking* in HTTP/1.1 (one slow response stalls a connection), motivating HTTP/2 multiplexing and HTTP/3's QUIC stream independence.
    """)
    return
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
## Implementation

Jobs are placed in a `PriorityQueue` for SJF (tupled as `(service_time, job_id)` so shorter jobs sort earlier) or a plain `Queue()` for FIFO (tupled as `(job_id, service_time)` to preserve arrival order). The same hyperexponential service-time generator (90% short, 10% long) is used in both runs.
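
The tuple trick works because Python compares tuples element by element, so `(service_time, job_id)` orders a heap by service time first. A standalone sketch using the standard library's `heapq` (assuming `asimpy`'s `PriorityQueue` orders items the same way):

```python
import heapq

# (service_time, job_id) tuples: the smallest service time pops first,
# with job_id as a tiebreaker.
jobs = [(1.5, 0), (0.2, 1), (3.0, 2), (0.4, 3)]
heapq.heapify(jobs)
order = [heapq.heappop(jobs)[1] for _ in range(4)]
print(order)  # job ids in service order: [1, 3, 0, 2]
```
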
    """)
    return
@app.cell(hide_code=True)
def _(mo):
    sim_time_slider = mo.ui.slider(
        start=1_000,  # a zero-length run would leave no samples for the statistics below
        stop=100_000,
        step=1_000,
        value=20_000,
        label="Simulation time",
    )
    arrival_rate_slider = mo.ui.slider(
        start=0.1,
        stop=1.5,
        step=0.05,
        value=0.7,
        label="Arrival rate",
    )
    seed_input = mo.ui.number(
        value=192,
        step=1,
        label="Random seed",
    )
    run_button = mo.ui.run_button(label="Run simulation")
    mo.vstack([
        sim_time_slider,
        arrival_rate_slider,
        seed_input,
        run_button,
    ])
    return arrival_rate_slider, seed_input, sim_time_slider
@app.cell
def _(arrival_rate_slider, seed_input, sim_time_slider):
    SIM_TIME = int(sim_time_slider.value)
    ARRIVAL_RATE = float(arrival_rate_slider.value)
    SEED = int(seed_input.value)
    SHORT_RATE = 4.0  # short jobs: exponential with mean 0.25
    LONG_RATE = 0.2   # long jobs: exponential with mean 5.0
    LONG_PROB = 0.10  # 10% of jobs are long
    return ARRIVAL_RATE, LONG_PROB, LONG_RATE, SEED, SHORT_RATE, SIM_TIME
@app.cell
def _(LONG_PROB, LONG_RATE, SHORT_RATE, random):
    def service_time():
        # Hyperexponential mix: mostly short jobs, occasionally a very long one.
        if random.random() < LONG_PROB:
            return random.expovariate(LONG_RATE)
        return random.expovariate(SHORT_RATE)
    return (service_time,)
@app.cell
def _(ARRIVAL_RATE, Process, random, service_time):
    class JobSource(Process):
        def init(self, job_queue, arrivals, sjf):
            self.job_queue = job_queue
            self.arrivals = arrivals
            self.sjf = sjf
            self._jid = 0

        async def run(self):
            while True:
                await self.timeout(random.expovariate(ARRIVAL_RATE))
                jid = self._jid
                self._jid += 1
                svc = service_time()
                self.arrivals[jid] = (self.now, svc)
                if self.sjf:
                    await self.job_queue.put((svc, jid))
                else:
                    await self.job_queue.put((jid, svc))
    return (JobSource,)
@app.cell
def _(Process):
    class Server(Process):
        def init(self, job_queue, arrivals, sojourn_times, sjf):
            self.job_queue = job_queue
            self.arrivals = arrivals
            self.sojourn_times = sojourn_times
            self.sjf = sjf

        async def run(self):
            while True:
                item = await self.job_queue.get()
                if self.sjf:
                    svc, jid = item
                else:
                    jid, svc = item
                await self.timeout(svc)
                arrival_time, _ = self.arrivals[jid]
                self.sojourn_times.append(self.now - arrival_time)
    return (Server,)
@app.cell
def _(Environment, JobSource, Queue, PriorityQueue, SIM_TIME, Server, statistics):
    def simulate(sjf):
        arrivals = {}
        sojourn_times = []
        env = Environment()
        q = PriorityQueue(env) if sjf else Queue(env)
        JobSource(env, q, arrivals, sjf)
        Server(env, q, arrivals, sojourn_times, sjf)
        env.run(until=SIM_TIME)
        ordered = sorted(sojourn_times)  # sort once for both tail percentiles
        return {
            "mean": statistics.mean(sojourn_times),
            "median": statistics.median(sojourn_times),
            "p95": ordered[int(0.95 * len(ordered))],
            "p99": ordered[int(0.99 * len(ordered))],
            "n": len(sojourn_times),
        }
    return (simulate,)
@app.cell
def _(LONG_PROB, LONG_RATE, SEED, SHORT_RATE, pl, random, simulate):
    def run_scenarios():
        fifo = simulate(sjf=False)
        random.seed(SEED)  # replay the identical arrival/service sequence for SJF
        sjf_res = simulate(sjf=True)
        rows = [
            {
                "metric": m,
                "fifo": fifo[m],
                "sjf": sjf_res[m],
                "improvement": fifo[m] / sjf_res[m],
            }
            for m in ("mean", "median", "p95", "p99")
        ]
        return pl.DataFrame(rows)

    random.seed(SEED)
    df = run_scenarios()
    mean_svc = (1 - LONG_PROB) / SHORT_RATE + LONG_PROB / LONG_RATE
    return df, mean_svc
@app.cell(hide_code=True)
def _(ARRIVAL_RATE, LONG_PROB, LONG_RATE, SHORT_RATE, mean_svc, mo):
    mo.md(f"""
## Summary Statistics

Arrival rate: {ARRIVAL_RATE}, estimated mean service: {mean_svc:.3f}

Short jobs: {100 * (1 - LONG_PROB):.0f}% (mean {1 / SHORT_RATE:.2f}),
long jobs: {100 * LONG_PROB:.0f}% (mean {1 / LONG_RATE:.1f})

> **Note:** SJF is optimal for mean sojourn time but requires knowing job sizes in advance.
    """)
    return
@app.cell
def _(df):
    df
    return
@app.cell
def _(alt, df, pl):
    df_plot = df.filter(pl.col("metric") != "n").unpivot(
        on=["fifo", "sjf"],
        index="metric",
        variable_name="policy",
        value_name="sojourn_time",
    )
    chart = (
        alt.Chart(df_plot)
        .mark_bar()
        .encode(
            x=alt.X("metric:N", title="Metric"),
            y=alt.Y("sojourn_time:Q", title="Sojourn time"),
            color=alt.Color("policy:N", title="Policy"),
            xOffset="policy:N",
            tooltip=["metric:N", "policy:N", "sojourn_time:Q"],
        )
        .properties(title="Convoy Effect: FIFO vs. Shortest Job First")
    )
    chart
    return
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
## Understanding the Math

### The second moment

For a random variable $S$ representing service time, the second moment is $E[S^2]$. Recall from your statistics course that variance is $\text{Var}(S) = E[S^2] - (E[S])^2$, which rearranges to:

$$E[S^2] = \text{Var}(S) + (E[S])^2$$
This means high variance inflates $E[S^2]$ even if the mean $E[S]$ stays fixed: doubling the standard deviation of service times quadruples the variance term, with no change at all to the average service time.
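
A quick numerical illustration; the third distribution rescales this notebook's 90%/10% hyperexponential mix so that all three share the same mean of 1:

```python
# Three service-time distributions with E[S] = 1 but very different E[S^2]
e_s2_det = 0 + 1**2  # deterministic (Var = 0):       E[S^2] = 1
e_s2_exp = 1 + 1**2  # exponential, mean 1 (Var = 1): E[S^2] = 2

# Hyperexponential 90%/10% mix, rescaled to mean 1
# (for an exponential with mean m: E[S^2] = 2 * m^2)
p = 0.9
m_short, m_long = 0.25 / 0.725, 5.0 / 0.725
mean = p * m_short + (1 - p) * m_long                      # = 1.0
e_s2_hyper = p * 2 * m_short**2 + (1 - p) * 2 * m_long**2  # ~ 9.73
print(e_s2_det, e_s2_exp, round(e_s2_hyper, 2))
```

Same mean, yet the high-variance mix has nearly five times the second moment of a plain exponential.
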
### Why variance of service time hurts

Imagine a FIFO server handling jobs that are either 0.1 minutes or 10 minutes long, with 90% being short and 10% being long. The mean service time is $0.9 \times 0.1 + 0.1 \times 10 = 1.09$ minutes, so utilization $\rho = \lambda / \mu$ might be modest. But when a 10-minute job starts, every job arriving during those 10 minutes must join the queue and wait. The larger $E[S^2]$, the more average work sits ahead of each arriving job.
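
Checking the numbers for this two-point example:

```python
# Moments of the 0.1-minute / 10-minute job mix
mean_s = 0.9 * 0.1 + 0.1 * 10      # 1.09 minutes
e_s2 = 0.9 * 0.1**2 + 0.1 * 10**2  # 10.009, dominated by the long jobs
variance = e_s2 - mean_s**2        # ~8.82
print(mean_s, e_s2, round(variance, 2))
```

The long jobs contribute almost all of $E[S^2]$ despite being only 10% of arrivals.
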
### The Pollaczek–Khinchine formula

The mean time a job spends waiting (not counting its own service time) in a FIFO single-server queue is:

$$W_q = \frac{\lambda \cdot E[S^2]}{2(1 - \rho)}$$
Here $\lambda$ is the arrival rate, $E[S^2]$ is the second moment of service time, and $\rho = \lambda \cdot E[S]$ is the server utilization. Both $\lambda$ and $E[S^2]$ appear in the numerator, so more variance means more waiting even at the same $\rho$. The $(1-\rho)$ denominator is the familiar blow-up term from M/M/1.
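
Applied to the two-point example above, with a hypothetical arrival rate of 0.5 jobs per minute (chosen here purely for illustration; it is not one of the notebook's parameters):

```python
# P-K waiting time for the 0.1/10-minute mix at lam = 0.5 jobs/minute
lam = 0.5
mean_s = 0.9 * 0.1 + 0.1 * 10      # E[S]   = 1.09
e_s2 = 0.9 * 0.1**2 + 0.1 * 10**2  # E[S^2] = 10.009
rho = lam * mean_s                 # utilization ~ 0.545
w_q = lam * e_s2 / (2 * (1 - rho))
print(f"Mean wait in queue: {w_q:.2f} minutes")
```

The average job queues for about 5.5 minutes, roughly five times its own mean service time, at barely half utilization.
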
    """)
    return
if __name__ == "__main__":
    app.run()