Spaces:

build-small-hackathon
/

Forager-Field-Notes

Running

App Files Files Community

HomesteaderLabs commited on 1 day ago

Commit

6ef157d

verified ·

1 Parent(s): 2d24284

Add Field Notes writeup + README link

Browse files

Files changed (1) hide show

FIELD_NOTES.md +131 -0

FIELD_NOTES.md ADDED Viewed

	@@ -0,0 +1,131 @@

+# No signal, no GPU, no second chances: building an AI for the woods
+Picture the worst possible place to run a machine learning model. No cell signal.
+No cloud to phone home to. No GPU, just a Raspberry Pi and a little accelerator
+chip the size of a stamp, running off a battery I've been babying since sunrise.
+And the user? The user is standing in a damp forest about to decide whether to put
+something in their mouth.
+That's foraging. That's the actual deployment target. I'm not shipping a chatbot
+that gets to be wrong in an interesting way. I'm shipping a thing that, if it's
+confidently wrong, helps somebody have the last bad afternoon of their life.
+So when people ask why my models are so small, I laugh a little. Small isn't the
+compromise here. Small is the only thing that survives contact with the woods. The
+whole project is a study in constraints, and honestly, the constraints did most of
+the design work for me. I just had to stop fighting them.
+Here's the stack: a domain router plus three expert classifiers, all
+`tf_efficientnet_lite2`, about nine million parameters each. Four models, roughly
+0.04 billion parameters total, which in the year of our lord 2026 is a rounding
+error. They run offline on the device and they run on a plain CPU, no GPU required.
+That's not me being humble. That's me being a forager with a dead phone two miles
+from the trailhead. The "edge" everybody name-drops is, for me, just Tuesday.
+## Constraint number one: the model is not allowed to vote
+My first real architecture was clever and I was proud of it, which is usually the
+tell that something is about to go wrong. The router would decide a photo was a
+"plant," then I'd run two expert models on it and take whichever was more
+confident. Max confidence wins. Democratic. Elegant.
+It also tried to feed people poison hemlock.
+Here's the failure, and it's a beautiful one. A deadly plant — say, poison hemlock —
+gets routed to "plant." My high-value forageables expert has never seen hemlock in
+its life, but it has seen ramps, and hemlock and ramps are cousins that have killed
+people who confused them. So the high-value model looks at a deadly plant and
+announces, at 0.9 confidence, that it's ramps. Delicious ramps. Meanwhile my
+medicinals expert, the one that actually knows hemlock is death, correctly flags it
+as deadly at lower confidence. Max confidence wins. The confident idiot
+out-votes the cautious expert. In my tests this leaked deadly-as-edible about six
+percent of the time, and not one of my per-model benchmarks caught it, because no
+single model was ever wrong. The *system* was wrong.
+So I killed the vote. Now the router picks exactly one expert and that expert owns
+the call, full stop. The mushroom expert never sees a plant, so it can never call a
+plant anything. On top of that, any "deadly" verdict vetoes a more-confident "safe"
+one, because in this domain a false reassurance is the only error that actually
+matters. Deadly-as-edible dropped from around six percent to half a percent. The
+constraint — keep it small, keep it dumb, keep the experts in their lanes — turned
+out to be the safety feature.
+## Constraint number two: being safe is really easy if you don't mind being useless
+Now the push and pull. Once you've been spooked by a near-miss, the temptation is to
+crank every safety dial to the max. Refuse by default. When in doubt, abstain. Very
+responsible. Very noble.
+It's also how you build a tool nobody uses. I pushed the confidence gate up and
+watched the thing turn into a coward. It started abstaining on blackberries.
+Blackberries. At the safe-but-useless end of the curve it would refuse on roughly
+half of perfectly edible finds, which is a fantastic way to teach your users that the
+gadget is a paperweight and their own guessing is faster. And here's the part that
+took me an embarrassing while to accept: you cannot gate your way to zero. I mapped
+the whole curve. Loosen the gate and you get a decisive, useful tool that
+occasionally calls something dangerous safe. Tighten it and you get a safe tool that
+cries "I don't know" at a raspberry. There is no magic threshold that gives you both,
+because the residual risk isn't low confidence — it's the model being confidently,
+specifically wrong, and no confidence knob catches that.
+Tightening the screw was a dead end. So instead of asking the model to be more sure,
+I started asking it for more evidence.
+The move we landed on — and I'll be straight, this lives in a test harness right now,
+not in the shipped app yet — is to stop treating one photo as the whole story. Show
+the model the same subject from a couple of angles, or even just a few augmented crops
+of the one shot, and fuse the results before it commits. In our harness this was
+close to a free lunch: the multi-angle version drove deadly-as-edible toward zero on
+the domains we tested while *lifting* accuracy on the safe stuff, not crushing it.
+Even better is making the second photo a targeted ask — only when the top guess is a
+safe-looking thing that has a deadly twin. The app turns to you and says "okay,
+photograph the stem before I sign off on this." The friction is the feature. The
+moment of "hang on, show me more" is exactly the moment a careful forager would slow
+down anyway. We're not nagging with a banner nobody reads; we're building the caution
+into the interaction. That part's still being wired in, and I'll write it up properly
+when it is.
+## The one-line bug that told me my safety net was broken
+A short detour into humility, because this is a field notes post and field notes
+should include the faceplants.
+For ages I believed my out-of-distribution detector didn't work. This is the piece
+that's supposed to notice when you point the camera at, I don't know, a car, or your
+own shoe, and say "that's not food, I'm not playing." I was using an energy score for
+it, and every time I measured the thing it scored about 0.25 AUROC, which for the
+non-nerds means it was worse than a coin flip. Inverted. Useless. I shelved it and
+moved on, mildly betrayed.
+Then I actually read my own code. The energy formula subtracts the max value for
+numerical stability and is supposed to add it right back. Mine subtracted it and
+forgot to add it back. One term. That single missing piece didn't just make the
+number wrong, it flipped its meaning, so my detector was confidently pointing the
+wrong direction the entire time. Fixed the one line, re-ran it, and the same detector
+jumped to around 0.90 AUROC on my hardest domain. It had been working all along. I
+had been reading the dial upside down.
+The lesson I keep taped to the inside of my skull now: when a detector says it's
+failing, check that you're not holding it backwards before you believe it. SHOCKING,
+I know.
+## The actual thesis
+Here's why I think the constraints were a gift and not a punishment. When you've got
+a giant model and infinite compute, you can paper over hard calls. You can be a
+little wrong everywhere and call it nuance. Out here, on a stamp-sized chip in a
+forest, there's nowhere to hide. Every tradeoff has to be made out loud: do I want
+decisive or do I want safe, and where exactly do I put the line, and who gets hurt at
+each setting. The edge didn't limit the engineering. It made the engineering honest.
+Which, now that I say it out loud, is the whole reason I'm doing any of this. When
+everything else goes digital and frictionless and confident, the stuff that keeps you
+alive is still physical, still slow, still asks you to look twice. I built a machine
+that's good enough to be worth beating. The goal was never to replace the forager. It
+was to keep them sharp.
+Or so I tell myself, two miles from the trailhead, photographing a stem.
+What would you have done at the safety-versus-useful fork — held the line on refusing,
+or chased the second photo? And what's the dumbest one-line bug that ever cost you a
+week?