# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "altair",
#     "asimpy",
#     "marimo",
#     "polars==1.24.0",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App(width="medium")


@app.cell(hide_code=True)
def _():
    import marimo as mo
    import random
    import statistics

    import altair as alt
    import polars as pl

    from asimpy import Environment, Process

    return Environment, Process, alt, mo, pl, random, statistics


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # The Inspector's Paradox

    ## *Why the Bus Is Always Late*

    Buses arrive at a stop with some average headway (gap between buses) of $\mu$ minutes. A passenger arrives at a uniformly random time and waits for the next bus. How long do they wait? The naive answer is $\mu / 2$: on average you land in the middle of a gap. The correct answer is never shorter than that, and it is longer whenever headways vary, sometimes much longer.

    The expected wait is not $\mu/2$ but:

    $$E[\text{wait}] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$

    where $\sigma^2 = \text{Var}[\text{headway}]$. The second term is always non-negative, so higher variance always means longer expected waits, even when the mean headway is unchanged.

    ### Three Bus Schedules with Mean Headway $\mu = 10$

    | Schedule    | $\sigma^2$ | Predicted wait | Naive wait |
    |-------------|-----------|----------------|-----------|
    | Regular     | 0         | 5.0            | 5.0       |
    | Exponential | 100       | 10.0           | 5.0       |
    | Clustered   | 64        | 8.2            | 5.0       |

    For exponentially distributed headways, $\sigma^2 = \mu^2$, so:

    $$E[\text{wait}] = \frac{\mu}{2} + \frac{\mu^2}{2\mu} = \mu$$

    A passenger waits on average for an *entire* mean headway — twice the naive expectation.
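
    The table values follow directly from the formula. A minimal sketch that recomputes them, assuming a hypothetical helper named `predicted_wait` (it is not part of the notebook's code):

    ```python
    def predicted_wait(mu: float, var: float) -> float:
        # Expected wait for a uniformly random arrival: mu/2 + var/(2*mu).
        return mu / 2.0 + var / (2.0 * mu)


    mu = 10.0
    # Headway variances for the three schedules in the table above.
    schedules = {"regular": 0.0, "exponential": mu**2, "clustered": 64.0}
    for name, var in schedules.items():
        print(f"{name:12s} predicted wait = {predicted_wait(mu, var):.1f}")
    ```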

    ## Why This Happens: Length-Biased Sampling

    A passenger arriving at a random time is more likely to land inside a *long* gap than a short one, because long gaps occupy more time on the clock. This is called *length-biased sampling*. The interval containing your arrival is not a random headway: it is drawn from the length-biased distribution with density:

    $$f^*(h) = \frac{h \cdot f(h)}{\mu}$$

    The mean of this biased distribution is $\mu + \sigma^2/\mu$, and you arrive uniformly within it, giving expected wait $(\mu + \sigma^2/\mu)/2$.
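
    The mean of the biased distribution follows in one line from $E[H^2] = \sigma^2 + \mu^2$. Here $H$ denotes a headway drawn from $f$ and $H^*$ the headway that contains the arrival (notation introduced only for this derivation):

    $$E[H^*] = \int_0^\infty h \, f^*(h) \, dh = \frac{1}{\mu} \int_0^\infty h^2 f(h) \, dh = \frac{E[H^2]}{\mu} = \frac{\sigma^2 + \mu^2}{\mu} = \mu + \frac{\sigma^2}{\mu}$$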

    The same phenomenon explains why the average class size experienced by a student exceeds the average class size reported by the university: a large class is counted once in the university's average but is experienced by every student enrolled in it.
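
    As a small illustration of the class-size effect, the sketch below compares the per-class average with the per-student average. The class sizes are made up for the example and are not data from any real university:

    ```python
    # Hypothetical class sizes, chosen only for illustration.
    class_sizes = [10, 10, 100]

    # Average reported by the university: each class counts once.
    reported = sum(class_sizes) / len(class_sizes)

    # Average experienced by students: each class counts once per enrolled student.
    experienced = sum(s * s for s in class_sizes) / sum(class_sizes)

    print(f"reported = {reported}, experienced = {experienced}")
    ```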

    ## Why "Inspector's Paradox"?

    The name comes from quality control, where an inspector arrives at a random time to sample a production process and systematically encounters longer-than-average intervals. The same effect is widely known as the *inspection paradox*. It feels paradoxical because you would expect a random arrival to see the average gap, but length-biased sampling means the gap a random observer lands in is, on average, longer than the true mean whenever the headways have any variance at all.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Implementation

    A `BusService` process generates buses under three headway distributions (regular, exponential, clustered bimodal) and records their arrival times. After the simulation, passenger wait times are estimated by sampling $N$ uniformly random arrival times and finding the next bus for each, without needing explicit `Passenger` processes.
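
    The lookup step can be sketched on a tiny hand-written timetable. This is an illustration only: the helper name `wait_for_next_bus` and the arrival times are assumptions, and the sketch uses the standard library's `bisect` module to find the next bus.

    ```python
    from bisect import bisect_right


    def wait_for_next_bus(bus_arrivals: list[float], t: float) -> float:
        # bus_arrivals must be sorted; bisect_right returns the index of the
        # first arrival strictly later than t.
        i = bisect_right(bus_arrivals, t)
        if i == len(bus_arrivals):
            raise ValueError("no bus arrives after time t")
        return bus_arrivals[i] - t


    # Hypothetical timetable with alternating 2- and 18-minute headways.
    timetable = [2.0, 20.0, 22.0, 40.0]
    print(wait_for_next_bus(timetable, 5.0))   # next bus is the one at t=20
    print(wait_for_next_bus(timetable, 21.0))  # next bus is the one at t=22
    ```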
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    sim_time_slider = mo.ui.slider(
        # Start above zero: a zero-length run produces no buses to sample from.
        start=1_000,
        stop=100_000,
        step=1_000,
        value=20_000,
        label="Simulation time",
    )

    mean_headway_slider = mo.ui.slider(
        start=5.0,
        stop=30.0,
        step=1.0,
        value=10.0,
        label="Mean headway",
    )

    seed_input = mo.ui.number(
        value=192,
        step=1,
        label="Random seed",
    )

    run_button = mo.ui.run_button(label="Run simulation")

    mo.vstack([
        sim_time_slider,
        mean_headway_slider,
        seed_input,
        run_button,
    ])
    return mean_headway_slider, seed_input, sim_time_slider


@app.cell
def _(mean_headway_slider, seed_input, sim_time_slider):
    SIM_TIME = int(sim_time_slider.value)
    MEAN_HEADWAY = float(mean_headway_slider.value)
    SEED = int(seed_input.value)
    N_PASSENGERS = 20_000
    return MEAN_HEADWAY, N_PASSENGERS, SEED, SIM_TIME


@app.cell
def _(MEAN_HEADWAY, Process, random):
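    class BusService(Process):
        """Generate bus arrivals under one headway distribution and record each arrival time."""

        def init(self, mode, bus_arrivals):
            self.mode = mode  # "regular", "exponential", or "clustered"
            self.bus_arrivals = bus_arrivals  # shared list that collects arrival times

        async def run(self):
            while True:
                if self.mode == "regular":
                    # Constant headway: variance 0.
                    headway = MEAN_HEADWAY
                elif self.mode == "exponential":
                    # expovariate takes the rate, so the mean headway is MEAN_HEADWAY.
                    headway = random.expovariate(1.0 / MEAN_HEADWAY)
                elif self.mode == "clustered":
                    # Short or long gap with equal probability: mean MEAN_HEADWAY,
                    # variance (0.8 * MEAN_HEADWAY) ** 2.
                    short = random.random() < 0.5
                    headway = MEAN_HEADWAY * (0.2 if short else 1.8)
                else:
                    raise ValueError(f"Unknown mode: {self.mode}")
                await self.timeout(headway)
                self.bus_arrivals.append(self.now)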

    return (BusService,)


@app.cell
def _(BusService, Environment, SIM_TIME):
    def collect_buses(mode):
        bus_arrivals = []
        env = Environment()
        BusService(env, mode, bus_arrivals)
        env.run(until=SIM_TIME)
        return bus_arrivals

    return (collect_buses,)


@app.cell
def _(N_PASSENGERS, random, statistics):
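    def expected_wait(bus_arrivals, n=N_PASSENGERS):
        """Estimate the mean wait by sampling n uniformly random passenger arrival times."""
        from bisect import bisect_right  # local import keeps this cell self-contained

        if not bus_arrivals:
            return 0.0
        # Sample only from the first 95% of the horizon so every passenger has a later bus.
        max_t = bus_arrivals[-1]
        waits = []
        for _ in range(n):
            t = random.uniform(0.0, max_t * 0.95)
            # bus_arrivals is sorted, so binary search finds the first bus after t.
            i = bisect_right(bus_arrivals, t)
            if i < len(bus_arrivals):
                waits.append(bus_arrivals[i] - t)
        return statistics.mean(waits) if waits else 0.0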

    return (expected_wait,)


@app.cell
def _(statistics):
    def headway_variance(bus_arrivals):
        headways = [b - a for a, b in zip(bus_arrivals, bus_arrivals[1:])]
        return statistics.variance(headways) if len(headways) > 1 else 0.0

    return (headway_variance,)


@app.cell(hide_code=True)
def _(MEAN_HEADWAY, mo):
    mu = MEAN_HEADWAY
    naive = MEAN_HEADWAY / 2.0
    var_exp = mu ** 2
    var_clustered = 0.5 * (mu * 0.2 - mu) ** 2 + 0.5 * (mu * 1.8 - mu) ** 2
    mo.md(f"""
    ## Results

    Mean headway: {MEAN_HEADWAY} → naive expected wait = {naive:.1f}

    - **Exponential** (Var = {var_exp:.1f}): predicted = {mu / 2 + var_exp / (2 * mu):.1f} (the full mean headway)
    - **Clustered** (Var = {var_clustered:.1f}): predicted = {mu / 2 + var_clustered / (2 * mu):.1f}
    """)
    return (naive,)


@app.cell
def _(MEAN_HEADWAY, collect_buses, expected_wait, headway_variance, naive, pl):
    def run_models():
        rows = []
        for mode in ["regular", "exponential", "clustered"]:
            buses = collect_buses(mode)
            var_h = headway_variance(buses)
            mean_w = expected_wait(buses)
            rows.append({
                "mode": mode,
                "var_headway": round(var_h, 4),
                "mean_wait": round(mean_w, 4),
                "predicted": round(MEAN_HEADWAY / 2.0 + var_h / (2.0 * MEAN_HEADWAY), 4),
                "ratio": round(mean_w / naive, 4),
            })
        return pl.DataFrame(rows)

    return (run_models,)


@app.cell
def _(SEED, random, run_models):
    random.seed(SEED)
    df = run_models()
    df
    return (df,)


@app.cell
def _(alt, df, naive, pl):
    chart = (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=alt.X("mode:N", title="Bus schedule type"),
            y=alt.Y("mean_wait:Q", title="Mean passenger wait"),
            color=alt.Color("mode:N", legend=None),
            tooltip=["mode:N", "mean_wait:Q", "ratio:Q"],
        )
        .properties(title="Inspector's Paradox: Mean Wait by Schedule Type")
    )
    naive_line = (
        alt.Chart(pl.DataFrame({"naive": [naive]}))
        .mark_rule(strokeDash=[4, 4], color="gray")
        .encode(y="naive:Q")
    )
    (chart + naive_line)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Understanding the Math

    ### Length-biased sampling

    Suppose buses run on an irregular schedule where gaps between buses are either 2 minutes or 18 minutes, each with probability 1/2. The mean gap is $\mu = (2 + 18)/2 = 10$ minutes. Now ask: if you arrive at a completely random moment, which gap are you most likely to land inside?

    A 2-minute gap occupies only 2 minutes on the clock, but an 18-minute gap occupies 18. Out of every 20 minutes of clock time on average, 2 minutes belong to a short gap and 18 to a long one. So a random arrival lands in a short gap with probability $2/(2+18) = 1/10$ and in a long gap with probability $18/20 = 9/10$. The expected gap length you experience is:

    $$E[\text{gap experienced}] = \frac{1}{10} \cdot 2 + \frac{9}{10} \cdot 18 = 0.2 + 16.2 = 16.4 \text{ minutes}$$

    That is far above the mean gap of 10 minutes. You are disproportionately likely to land inside a long gap simply because it takes up more time.
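
    The arithmetic above can be checked exactly with Python's `fractions` module. The variable names are chosen here for illustration:

    ```python
    from fractions import Fraction

    gaps = [Fraction(2), Fraction(18)]        # the two possible gap lengths, in minutes
    probs = [Fraction(1, 2), Fraction(1, 2)]  # each gap type is equally frequent

    mean_gap = sum(p * g for p, g in zip(probs, gaps))
    # Length-biased weights: probability that a random instant falls in each gap type.
    biased = [p * g / mean_gap for p, g in zip(probs, gaps)]
    experienced_gap = sum(w * g for w, g in zip(biased, gaps))

    print(mean_gap, biased, float(experienced_gap))
    ```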

    ### The wait formula

    Once you are inside a gap, you arrive uniformly within it, so on average you land in the middle. Your expected wait is therefore half the expected length of the gap you land in; in the example above, that is $16.4 / 2 = 8.2$ minutes. The full formula is:

    $$E[\text{wait}] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$

    Here $\mu$ is the mean gap and $\sigma^2 = \text{Var}[\text{gap}]$ is the variance of gap lengths. The first term, $\mu/2$, is what you would get if every gap were exactly $\mu$ (deterministic buses, where you land in the middle of a gap on average). The second term, $\sigma^2/(2\mu)$, is the extra waiting from length-biased sampling. It is never negative and is strictly positive whenever $\sigma^2 > 0$, so irregular buses always make you wait longer than regular buses with the same mean headway.

    ### Why variance matters

    The variance $\sigma^2$ measures how spread out the gap sizes are. A perfectly regular bus schedule has $\sigma^2 = 0$ and gives the naive answer $\mu/2$. A schedule with exponentially distributed headways has $\sigma^2 = \mu^2$, which doubles the expected wait to $\mu$. The more irregular the buses, the higher the penalty.

    ### Connecting to expected values

    The formula arises from a standard result: the expected length of the gap containing a random arrival is $\mu + \sigma^2/\mu$. You can think of this as the mean gap plus a correction term proportional to the variance divided by the mean. Dividing by 2 (uniform arrival within the gap) gives the wait formula above.
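
    A quick Monte Carlo check of this result for exponential gaps, using only the standard library. The seed, sample sizes, and variable names are arbitrary choices made for this sketch. For exponential gaps $\sigma^2 = \mu^2$, so the mean length of the gap containing an arrival should come out near $2\mu$.

    ```python
    import random
    from bisect import bisect_right
    from itertools import accumulate

    rng = random.Random(0)  # fixed seed so the run is repeatable
    mu = 10.0

    # Exponential gaps with mean mu, and the bus arrival time that ends each gap.
    gaps = [rng.expovariate(1.0 / mu) for _ in range(100_000)]
    arrivals = list(accumulate(gaps))

    containing = []
    for _ in range(50_000):
        t = rng.uniform(0.0, arrivals[-1])
        # Gap i runs from arrivals[i - 1] (or 0 for the first gap) to arrivals[i].
        i = min(bisect_right(arrivals, t), len(gaps) - 1)
        containing.append(gaps[i])

    mean_gap = sum(gaps) / len(gaps)
    mean_containing = sum(containing) / len(containing)
    print(f"mean gap = {mean_gap:.2f}, mean gap containing an arrival = {mean_containing:.2f}")
    ```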
    """)
    return


if __name__ == "__main__":
    app.run()