HomesteaderLabs committed
Commit 6ef157d · verified · 1 Parent(s): 2d24284

Add Field Notes writeup + README link

Files changed (1)
  1. FIELD_NOTES.md +131 -0
FIELD_NOTES.md ADDED
@@ -0,0 +1,131 @@
# No signal, no GPU, no second chances: building an AI for the woods

Picture the worst possible place to run a machine learning model. No cell signal.
No cloud to phone home to. No GPU, just a Raspberry Pi and a little accelerator
chip the size of a stamp, running off a battery I've been babying since sunrise.
And the user? The user is standing in a damp forest about to decide whether to put
something in their mouth.

That's foraging. That's the actual deployment target. I'm not shipping a chatbot
that gets to be wrong in an interesting way. I'm shipping a thing that, if it's
confidently wrong, helps somebody have the last bad afternoon of their life.

So when people ask why my models are so small, I laugh a little. Small isn't the
compromise here. Small is the only thing that survives contact with the woods. The
whole project is a study in constraints, and honestly, the constraints did most of
the design work for me. I just had to stop fighting them.

Here's the stack: a domain router plus three expert classifiers, all
`tf_efficientnet_lite2`, about nine million parameters each. Four models, roughly
0.04 billion parameters total, which in the year of our lord 2026 is a rounding
error. They run offline on the device and they run on a plain CPU, no GPU required.
That's not me being humble. That's me being a forager with a dead phone two miles
from the trailhead. The "edge" everybody name-drops is, for me, just Tuesday.

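For scale, here is the back-of-envelope arithmetic behind "roughly 0.04 billion." The parameter count comes from the text; the bytes-per-weight figures are an assumption added for illustration, since the post does not say what precision the weights are stored in.

```python
# Back-of-envelope sizing for the four-model stack described above.
PARAMS_PER_MODEL = 9_000_000   # "about nine million" per the text
NUM_MODELS = 4                 # one router + three experts

total_params = PARAMS_PER_MODEL * NUM_MODELS
total_billions = total_params / 1e9

# Assumption (not from the post): bytes per weight for two common formats.
BYTES_FP32 = 4
BYTES_INT8 = 1
fp32_mb = total_params * BYTES_FP32 / 1e6
int8_mb = total_params * BYTES_INT8 / 1e6

print(f"{total_params:,} parameters = {total_billions:.3f} B")
print(f"~{fp32_mb:.0f} MB as float32, ~{int8_mb:.0f} MB as int8")
```

Either way, the whole stack fits comfortably in the memory of a small single-board computer, which is the point of the paragraph above.
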
## Constraint number one: the model is not allowed to vote

My first real architecture was clever and I was proud of it, which is usually the
tell that something is about to go wrong. The router would decide a photo was a
"plant," then I'd run two expert models on it and take whichever was more
confident. Max confidence wins. Democratic. Elegant.

It also tried to feed people poison hemlock.

Here's the failure, and it's a beautiful one. A deadly plant (say, poison hemlock)
gets routed to "plant." My high-value forageables expert has never seen hemlock in
its life, but it has seen ramps, and hemlock and ramps are cousins that have killed
people who confused them. So the high-value model looks at a deadly plant and
announces, at 0.9 confidence, that it's ramps. Delicious ramps. Meanwhile my
medicinals expert, the one that actually knows hemlock is death, correctly flags it
as deadly at lower confidence. Max confidence wins. The confident idiot
out-votes the cautious expert. In my tests this leaked deadly-as-edible about six
percent of the time, and not one of my per-model benchmarks caught it, because no
single model was ever wrong. The *system* was wrong.

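The failure is easy to reproduce in a few lines. This is a minimal sketch, not the project's code: the `Verdict` type and `max_confidence_vote` function are hypothetical names, and the 0.62 for the medicinals expert is a made-up stand-in for "lower confidence."

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    expert: str
    label: str
    safety: str        # "edible" or "deadly"
    confidence: float

def max_confidence_vote(verdicts):
    """The original fusion rule: whichever expert is most confident wins."""
    return max(verdicts, key=lambda v: v.confidence)

# One photo of poison hemlock, seen by two experts.
hemlock_photo = [
    # Never trained on hemlock, so it maps the photo to the nearest thing it knows.
    Verdict("high_value", "ramps", "edible", 0.90),
    # Actually knows hemlock, but is less sure (0.62 is an illustrative value).
    Verdict("medicinals", "poison hemlock", "deadly", 0.62),
]

winner = max_confidence_vote(hemlock_photo)
print(winner.expert, winner.label, winner.safety)
```

Each expert behaves sensibly given what it was trained on, so a per-model benchmark passes. Only the fusion step turns the pair into a deadly-as-edible answer.
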
So I killed the vote. Now the router picks exactly one expert and that expert owns
the call, full stop. The mushroom expert never sees a plant, so it can never call a
plant anything. On top of that, any "deadly" verdict vetoes a more-confident "safe"
one, because in this domain a false reassurance is the only error that actually
matters. Deadly-as-edible dropped from around six percent to half a percent. The
constraint (keep it small, keep it dumb, keep the experts in their lanes) turned
out to be the safety feature.

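A sketch of the replacement rule, under stated assumptions: the post does not say exactly where the veto is applied, so here it runs over the chosen expert's shortlist of candidates, and the `veto_floor` cutoff is a hypothetical parameter added so that a vanishingly small deadly probability does not veto everything. The router and experts are stubs.

```python
from typing import Callable, Dict, List, NamedTuple

class Verdict(NamedTuple):
    label: str
    safety: str        # "edible" or "deadly"
    confidence: float

Expert = Callable[[object], List[Verdict]]

def apply_deadly_veto(candidates: List[Verdict], veto_floor: float = 0.2) -> Verdict:
    """A deadly candidate above the floor outranks any edible one, however confident."""
    deadly = [v for v in candidates
              if v.safety == "deadly" and v.confidence >= veto_floor]
    pool = deadly if deadly else candidates
    return max(pool, key=lambda v: v.confidence)

def classify(image, router: Callable[[object], str],
             experts: Dict[str, Expert]) -> Verdict:
    domain = router(image)                 # the router picks exactly one domain
    candidates = experts[domain](image)    # only that expert ever sees the image
    return apply_deadly_veto(candidates)

# Hypothetical stubs standing in for real models.
def fake_router(image):
    return "plant"

def fake_plant_expert(image):
    return [Verdict("ramps", "edible", 0.90),
            Verdict("poison hemlock", "deadly", 0.40)]

def fake_mushroom_expert(image):
    raise AssertionError("the mushroom expert must never see a plant")

result = classify("photo.jpg", fake_router,
                  {"plant": fake_plant_expert, "mushroom": fake_mushroom_expert})
print(result.label, result.safety)
```

The mushroom stub raising on any call is the "stay in your lane" constraint expressed as a test: if routing ever fans out to more than one expert, the sketch fails loudly.
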
## Constraint number two: being safe is really easy if you don't mind being useless

Now the push and pull. Once you've been spooked by a near-miss, the temptation is to
crank every safety dial to the max. Refuse by default. When in doubt, abstain. Very
responsible. Very noble.

It's also how you build a tool nobody uses. I pushed the confidence gate up and
watched the thing turn into a coward. It started abstaining on blackberries.
Blackberries. At the safe-but-useless end of the curve it would refuse on roughly
half of perfectly edible finds, which is a fantastic way to teach your users that the
gadget is a paperweight and their own guessing is faster. And here's the part that
took me an embarrassing while to accept: you cannot gate your way to zero. I mapped
the whole curve. Loosen the gate and you get a decisive, useful tool that
occasionally calls something dangerous safe. Tighten it and you get a safe tool that
cries "I don't know" at a raspberry. There is no magic threshold that gives you both,
because the residual risk isn't low confidence; it's the model being confidently,
specifically wrong, and no confidence knob catches that.

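The shape of that curve can be shown with a toy sweep. Everything here is illustrative: the seven samples are invented to make the point and are not the project's data, and `sweep` is a hypothetical helper. The one row to watch is the deadly plant predicted edible at 0.97.

```python
# Each sample: (confidence, predicted_safety, true_safety). Invented data.
samples = [
    (0.97, "edible", "deadly"),   # confidently, specifically wrong
    (0.55, "edible", "deadly"),   # wrong but unsure: a gate can catch this one
    (0.95, "edible", "edible"),
    (0.80, "edible", "edible"),
    (0.70, "edible", "edible"),
    (0.60, "edible", "edible"),
    (0.90, "deadly", "deadly"),
]

def sweep(samples, gate):
    """Return (deadly-as-edible leaks, fraction of edible finds refused) at a gate."""
    answered = [s for s in samples if s[0] >= gate]
    leaks = sum(1 for _, pred, truth in answered
                if pred == "edible" and truth == "deadly")
    edible = [s for s in samples if s[2] == "edible"]
    refused = sum(1 for conf, _, _ in edible if conf < gate)
    return leaks, refused / len(edible)

for gate in (0.50, 0.75, 0.96):
    leaks, refused = sweep(samples, gate)
    print(f"gate={gate:.2f}  leaks={leaks}  edible refused={refused:.0%}")
```

Raising the gate from 0.50 to 0.75 removes the unsure mistake, at the cost of refusing half the edible finds. Raising it to 0.96 refuses every edible find and the 0.97 mistake still gets through.
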
Tightening the screw was a dead end. So instead of asking the model to be more sure,
I started asking it for more evidence.

The move we landed on (and I'll be straight, this lives in a test harness right now,
not in the shipped app yet) is to stop treating one photo as the whole story. Show
the model the same subject from a couple of angles, or even just a few augmented crops
of the one shot, and fuse the results before it commits. In our harness this was
close to a free lunch: the multi-angle version drove deadly-as-edible toward zero on
the domains we tested while *lifting* accuracy on the safe stuff, not crushing it.
Even better is making the second photo a targeted ask: only when the top guess is a
safe-looking thing that has a deadly twin. The app turns to you and says "okay,
photograph the stem before I sign off on this." The friction is the feature. The
moment of "hang on, show me more" is exactly the moment a careful forager would slow
down anyway. We're not nagging with a banner nobody reads; we're building the caution
into the interaction. That part's still being wired in, and I'll write it up properly
when it is.

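One way the fuse-then-ask flow could look, as a sketch only. The post does not say how views are fused, so averaging per-class probabilities is an assumption here, and `DEADLY_TWINS`, `MIN_VIEWS_FOR_RISKY`, `fuse_views`, and `decide` are all hypothetical names.

```python
# Assumption: safe-looking labels that have a deadly lookalike, mapped to the
# plant part worth a second photo. The single entry is illustrative.
DEADLY_TWINS = {"ramps": "stem"}
MIN_VIEWS_FOR_RISKY = 2

def fuse_views(view_probs):
    """Average per-class probabilities across views (mean fusion, an assumption)."""
    labels = view_probs[0].keys()
    n = len(view_probs)
    return {label: sum(v.get(label, 0.0) for v in view_probs) / n
            for label in labels}

def decide(view_probs):
    fused = fuse_views(view_probs)
    top = max(fused, key=fused.get)
    part = DEADLY_TWINS.get(top)
    # Targeted ask: only interrupt when the top guess has a deadly twin
    # and the evidence so far is a single view.
    if part is not None and len(view_probs) < MIN_VIEWS_FOR_RISKY:
        return f"photograph the {part}"
    return f"report {top}"

first_shot = {"ramps": 0.7, "poison hemlock": 0.3}
stem_shot = {"ramps": 0.2, "poison hemlock": 0.8}

print(decide([first_shot]))              # one view, risky top guess
print(decide([first_shot, stem_shot]))   # second view changes the answer
```

With one view the sketch asks for the stem instead of answering. With the second view fused in, the top label flips, which is the case the extra friction exists to catch.
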
## The one-line bug that told me my safety net was broken

A short detour into humility, because this is a field notes post and field notes
should include the faceplants.

For ages I believed my out-of-distribution detector didn't work. This is the piece
that's supposed to notice when you point the camera at, I don't know, a car, or your
own shoe, and say "that's not food, I'm not playing." I was using an energy score for
it, and every time I measured the thing it scored about 0.25 AUROC, which for the
non-nerds means it was worse than a coin flip. Inverted. Useless. I shelved it and
moved on, mildly betrayed.

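Why an AUROC well below 0.5 reads as "inverted" rather than "random": AUROC is the probability that a randomly chosen in-distribution sample scores higher than a randomly chosen out-of-distribution one. A small sketch with invented scores, using a hypothetical pairwise `auroc` helper:

```python
def auroc(in_dist_scores, ood_scores):
    """Fraction of (in-dist, OOD) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in in_dist_scores:
        for n in ood_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(in_dist_scores) * len(ood_scores))

# Invented scores where the detector ranks most pairs the wrong way round.
in_dist = [1.0, 2.0]
ood = [1.5, 3.0]

original = auroc(in_dist, ood)
flipped = auroc([-s for s in in_dist], [-s for s in ood])
print(original, flipped)
```

Negating every score turns an AUROC of `a` into `1 - a`, so a detector near 0.25 carries real signal pointing the wrong way. A pure sign flip would only recover 0.75, so this sketch illustrates the symptom and does not reproduce the specific bug or the numbers in the post.
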
Then I actually read my own code. The energy formula subtracts the max value for
numerical stability and is supposed to add it right back. Mine subtracted it and
forgot to add it back. One term. That single missing piece didn't just make the
number wrong, it flipped its meaning, so my detector was confidently pointing the
wrong direction the entire time. Fixed the one line, re-ran it, and the same detector
jumped to around 0.90 AUROC on my hardest domain. It had been working all along. I
had been reading the dial upside down.

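The post does not show the code, so this is a reconstruction of the class of bug and not the author's actual line. It assumes the score is the log-sum-exp of the logits (the negative free energy at temperature 1), where a higher value means "looks in-distribution." The function names and logit values are made up for illustration.

```python
import math

def energy_score(logits):
    """Stable log-sum-exp: subtract the max, then add it back."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def energy_score_buggy(logits):
    """Same computation with the max never added back."""
    m = max(logits)
    return math.log(sum(math.exp(x - m) for x in logits))

peaked = [12.0, 1.0, 0.5]   # one logit dominates: typical of familiar input
flat = [1.1, 1.0, 0.9]      # nothing stands out: typical of unfamiliar input

print("correct:", energy_score(peaked), energy_score(flat))
print("buggy:  ", energy_score_buggy(peaked), energy_score_buggy(flat))
```

With the max added back, the peaked logits score far higher than the flat ones. Without it, the score collapses to a value between 0 and `log(num_classes)` that is largest when the logits are flat, so the ranking reverses. That is how one missing term can flip the meaning of the score instead of merely shifting it.
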
The lesson I keep taped to the inside of my skull now: when a detector says it's
failing, check that you're not holding it backwards before you believe it. SHOCKING,
I know.

## The actual thesis

Here's why I think the constraints were a gift and not a punishment. When you've got
a giant model and infinite compute, you can paper over hard calls. You can be a
little wrong everywhere and call it nuance. Out here, on a stamp-sized chip in a
forest, there's nowhere to hide. Every tradeoff has to be made out loud: do I want
decisive or do I want safe, and where exactly do I put the line, and who gets hurt at
each setting. The edge didn't limit the engineering. It made the engineering honest.

Which, now that I say it out loud, is the whole reason I'm doing any of this. When
everything else goes digital and frictionless and confident, the stuff that keeps you
alive is still physical, still slow, still asks you to look twice. I built a machine
that's good enough to be worth beating. The goal was never to replace the forager. It
was to keep them sharp.

Or so I tell myself, two miles from the trailhead, photographing a stem.

What would you have done at the safety-versus-useful fork: held the line on refusing,
or chased the second photo? And what's the dumbest one-line bug that ever cost you a
week?