| # No signal, no GPU, no second chances: building an AI for the woods |
|
|
| Picture the worst possible place to run a machine learning model. No cell signal. |
| No cloud to phone home to. No GPU, just a Raspberry Pi and a little accelerator |
| chip the size of a stamp, running off a battery I've been babying since sunrise. |
| And the user? The user is standing in a damp forest about to decide whether to put |
| something in their mouth. |
|
|
| That's foraging. That's the actual deployment target. I'm not shipping a chatbot |
| that gets to be wrong in an interesting way. I'm shipping a thing that, if it's |
| confidently wrong, helps somebody have the last bad afternoon of their life. |
|
|
| So when people ask why my models are so small, I laugh a little. Small isn't the |
| compromise here. Small is the only thing that survives contact with the woods. The |
| whole project is a study in constraints, and honestly, the constraints did most of |
| the design work for me. I just had to stop fighting them. |
|
|
| Here's the stack: a domain router plus three expert classifiers, all |
| `tf_efficientnet_lite2`, about nine million parameters each. Four models, roughly |
| 0.04 billion parameters total, which in the year of our lord 2026 is a rounding |
| error. They run offline on the device and they run on a plain CPU, no GPU required. |
| That's not me being humble. That's me being a forager with a dead phone two miles |
| from the trailhead. The "edge" everybody name-drops is, for me, just Tuesday. |
|
|
| ## Constraint number one: the model is not allowed to vote |
|
|
| My first real architecture was clever and I was proud of it, which is usually the |
| tell that something is about to go wrong. The router would decide a photo was a |
| "plant," then I'd run two expert models on it and take whichever was more |
| confident. Max confidence wins. Democratic. Elegant. |
|
|
| It also tried to feed people poison hemlock. |
|
|
| Here's the failure, and it's a beautiful one. A deadly plant (say, poison hemlock) |
| gets routed to "plant." My high-value forageables expert has never seen hemlock in |
| its life, but it has seen ramps, and hemlock and ramps are cousins that have killed |
| people who confused them. So the high-value model looks at a deadly plant and |
| announces, at 0.9 confidence, that it's ramps. Delicious ramps. Meanwhile my |
| medicinals expert, the one that actually knows hemlock is death, correctly flags it |
| as deadly at lower confidence. Max confidence wins. The confident idiot |
| out-votes the cautious expert. In my tests this leaked deadly-as-edible about six |
| percent of the time, and not one of my per-model benchmarks caught it, because no |
| single model was ever wrong. The *system* was wrong. |
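
A toy reconstruction of that failure, to make the mechanics concrete. Treat it as a sketch: the `Verdict` class, the expert names, and the 0.62 are invented for illustration (only the 0.9 comes from the story above), and none of this is the actual harness code.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    expert: str        # which expert produced this call (hypothetical names)
    label: str         # the class the expert picked
    confidence: float  # the expert's top-1 confidence
    deadly: bool       # whether the picked class is a deadly one


def max_confidence_vote(verdicts):
    """The flawed rule: the most confident expert wins, whatever it says."""
    return max(verdicts, key=lambda v: v.confidence)


# Made-up outputs for a single photo of poison hemlock.
verdicts = [
    Verdict("high_value", "ramps", 0.90, deadly=False),           # never trained on hemlock
    Verdict("medicinals", "poison hemlock", 0.62, deadly=True),   # right, but less sure
]

winner = max_confidence_vote(verdicts)
print(winner.expert, winner.label, winner.deadly)
```

Each expert is behaving exactly as trained, which is why a per-model benchmark has nothing to flag. The bad answer only appears in the line that combines them.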
|
|
| So I killed the vote. Now the router picks exactly one expert and that expert owns |
| the call, full stop. The mushroom expert never sees a plant, so it can never call a |
| plant anything. On top of that, any "deadly" verdict vetoes a more-confident "safe" |
| one, because in this domain a false reassurance is the only error that actually |
| matters. Deadly-as-edible dropped from around six percent to half a percent. The |
| constraint (keep it small, keep it dumb, keep the experts in their lanes) turned |
| out to be the safety feature. |
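
Roughly what "one owner plus a deadly veto" could look like in code. This is a sketch under assumptions, not the shipped logic: the `decide` function, the stub experts, and the `DEADLY_FLOOR` value are all hypothetical, and I'm assuming the veto is applied across the classes of the one expert that owns the call.

```python
DEADLY_FLOOR = 0.20  # hypothetical: probability at which a deadly class vetoes


def decide(domain, photo, experts, deadly_labels, floor=DEADLY_FLOOR):
    """Route to exactly one expert, then let any plausible deadly class veto."""
    expert = experts[domain]   # single owner: no other expert is consulted
    probs = expert(photo)      # assumed shape: {label: probability}
    deadly = {l: p for l, p in probs.items() if l in deadly_labels and p >= floor}
    if deadly:
        # A deadly class above the floor beats a more confident safe class.
        label = max(deadly, key=deadly.get)
        return {"expert": domain, "label": label, "verdict": "deadly"}
    label = max(probs, key=probs.get)
    return {"expert": domain, "label": label, "verdict": "safe"}


# Stub experts standing in for the real classifiers.
def plant_expert(photo):
    return {"ramps": 0.70, "poison hemlock": 0.25, "other": 0.05}


def mushroom_expert(photo):
    raise AssertionError("the mushroom expert must never see a plant")


experts = {"plant": plant_expert, "mushroom": mushroom_expert}
result = decide("plant", photo=None, experts=experts,
                deadly_labels={"poison hemlock"})
print(result)
```

The asymmetry is deliberate: a safe label needs to win outright, while a deadly label only needs to be plausible.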
|
|
| ## Constraint number two: being safe is really easy if you don't mind being useless |
|
|
| Now the push and pull. Once you've been spooked by a near-miss, the temptation is to |
| crank every safety dial to the max. Refuse by default. When in doubt, abstain. Very |
| responsible. Very noble. |
|
|
| It's also how you build a tool nobody uses. I pushed the confidence gate up and |
| watched the thing turn into a coward. It started abstaining on blackberries. |
| Blackberries. At the safe-but-useless end of the curve it would refuse on roughly |
| half of perfectly edible finds, which is a fantastic way to teach your users that the |
| gadget is a paperweight and their own guessing is faster. And here's the part that |
| took me an embarrassing while to accept: you cannot gate your way to zero. I mapped |
| the whole curve. Loosen the gate and you get a decisive, useful tool that |
| occasionally calls something dangerous safe. Tighten it and you get a safe tool that |
| cries "I don't know" at a raspberry. There is no magic threshold that gives you both, |
| because the residual risk isn't low confidence; it's the model being confidently, |
| specifically wrong, and no confidence knob catches that. |
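
Here's the shape of that curve on a handful of made-up predictions. The numbers are invented purely to show the mechanism, not taken from my tests, and `sweep` is a hypothetical helper: each sample is a "safe" call with a confidence, plus the ground truth of whether the thing was actually deadly.

```python
# (confidence of the "safe" call, was it truly deadly?) -- illustrative values only
samples = [
    (0.97, True),   # confidently, specifically wrong
    (0.55, True),   # wrong, but at least unsure about it
    (0.95, False),
    (0.80, False),
    (0.70, False),
    (0.60, False),
]


def sweep(samples, gate):
    """Return (deadly calls that pass the gate, fraction of edible finds refused)."""
    leaks = sum(1 for conf, deadly in samples if deadly and conf >= gate)
    edible = [conf for conf, deadly in samples if not deadly]
    refused = sum(1 for conf in edible if conf < gate)
    return leaks, refused / len(edible)


results = {gate: sweep(samples, gate) for gate in (0.50, 0.75, 0.96)}
for gate, (leaks, refused) in results.items():
    print(f"gate={gate:.2f}  deadly-as-safe={leaks}  edible refused={refused:.0%}")
```

Tightening the gate removes the hesitant mistake, then starts eating the edible finds, and the 0.97 mistake sails through every setting short of refusing everything.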
|
|
| Tightening the screw was a dead end. So instead of asking the model to be more sure, |
| I started asking it for more evidence. |
|
|
| The move we landed on (and I'll be straight, this lives in a test harness right now, |
| not in the shipped app yet) is to stop treating one photo as the whole story. Show |
| the model the same subject from a couple of angles, or even just a few augmented crops |
| of the one shot, and fuse the results before it commits. In our harness this was |
| close to a free lunch: the multi-angle version drove deadly-as-edible toward zero on |
| the domains we tested while *lifting* accuracy on the safe stuff, not crushing it. |
| Even better is making the second photo a targeted ask: only when the top guess is a |
| safe-looking thing that has a deadly twin. The app turns to you and says "okay, |
| photograph the stem before I sign off on this." The friction is the feature. The |
| moment of "hang on, show me more" is exactly the moment a careful forager would slow |
| down anyway. We're not nagging with a banner nobody reads; we're building the caution |
| into the interaction. That part's still being wired in, and I'll write it up properly |
| when it is. |
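
Until then, a minimal sketch of the idea. Everything in it is an assumption for illustration: fusing by averaging per-view probabilities is just one simple option, and the `DEADLY_TWINS` table, the `next_step` function, and the two-view minimum are hypothetical names and values, not the harness code.

```python
# Hypothetical lookup: safe-looking label -> (deadly twin, body part to photograph)
DEADLY_TWINS = {"wild carrot": ("poison hemlock", "stem")}


def fuse(views):
    """Average class probabilities across views (one simple fusion rule)."""
    labels = views[0].keys()
    return {label: sum(v[label] for v in views) / len(views) for label in labels}


def next_step(views, min_views=2):
    """Either answer, or ask for a specific extra photo before committing."""
    fused = fuse(views)
    top = max(fused, key=fused.get)
    if top in DEADLY_TWINS and len(views) < min_views:
        _twin, part = DEADLY_TWINS[top]
        return ("ask", f"photograph the {part} before I sign off on this")
    return ("answer", top)


first_shot = {"wild carrot": 0.8, "poison hemlock": 0.2}
stem_shot = {"wild carrot": 0.1, "poison hemlock": 0.9}

print(next_step([first_shot]))             # one view, safe-looking top guess
print(next_step([first_shot, stem_shot]))  # stem evidence changes the call
```

The extra photo is only requested when the top guess is a safe label with an entry in the twin table, so most finds never hit the friction.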
|
|
| ## The one-line bug that told me my safety net was broken |
|
|
| A short detour into humility, because this is a field notes post and field notes |
| should include the faceplants. |
|
|
| For ages I believed my out-of-distribution detector didn't work. This is the piece |
| that's supposed to notice when you point the camera at, I don't know, a car, or your |
| own shoe, and say "that's not food, I'm not playing." I was using an energy score for |
| it, and every time I measured the thing it scored about 0.25 AUROC, which for the |
| non-nerds means it was worse than a coin flip (that would be 0.5). Inverted. Useless. I shelved it and |
| moved on, mildly betrayed. |
|
|
| Then I actually read my own code. The energy formula subtracts the max value for |
| numerical stability and is supposed to add it right back. Mine subtracted it and |
| forgot to add it back. One term. That single missing piece didn't just make the |
| number wrong, it flipped its meaning, so my detector was confidently pointing the |
| wrong direction the entire time. Fixed the one line, re-ran it, and the same detector |
| jumped to around 0.90 AUROC on my hardest domain. It had been working all along. I |
| had been reading the dial upside down. |
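
For the curious, here is the bug in miniature. This is a reconstruction rather than the original file: it assumes the usual energy score, the negative log-sum-exp of the logits at temperature 1, and the function names and logits are made up for the demo.

```python
import math


def logsumexp(logits):
    """Stable log-sum-exp: subtract the max, then add it back."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def logsumexp_buggy(logits):
    """Same thing with the one missing term: the max is never added back."""
    m = max(logits)
    return math.log(sum(math.exp(x - m) for x in logits))


def energy(logits, lse=logsumexp):
    """Energy score: lower should mean 'looks like the training data'."""
    return -lse(logits)


in_dist = [9.0, 1.0, 0.0]   # peaked, large logits: a confident, familiar input
ood = [1.0, 0.8, 0.9]       # flat, small logits: a shoe, a car, whatever

print("correct:", energy(in_dist), energy(ood))
print("buggy:  ", energy(in_dist, logsumexp_buggy), energy(ood, logsumexp_buggy))
```

With the max dropped, what's left works out to the negative log of the top softmax probability, so peaked inputs score near zero and flat ones score higher. In this toy case that reverses the ordering of the two inputs, which is the upside-down dial.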
|
|
| The lesson I keep taped to the inside of my skull now: when a detector says it's |
| failing, check that you're not holding it backwards before you believe it. SHOCKING, |
| I know. |
|
|
| ## The actual thesis |
|
|
| Here's why I think the constraints were a gift and not a punishment. When you've got |
| a giant model and infinite compute, you can paper over hard calls. You can be a |
| little wrong everywhere and call it nuance. Out here, on a stamp-sized chip in a |
| forest, there's nowhere to hide. Every tradeoff has to be made out loud: do I want |
| decisive or do I want safe, and where exactly do I put the line, and who gets hurt at |
| each setting. The edge didn't limit the engineering. It made the engineering honest. |
|
|
| Which, now that I say it out loud, is the whole reason I'm doing any of this. When |
| everything else goes digital and frictionless and confident, the stuff that keeps you |
| alive is still physical, still slow, still asks you to look twice. I built a machine |
| that's good enough to be worth beating. The goal was never to replace the forager. It |
| was to keep them sharp. |
|
|
| Or so I tell myself, two miles from the trailhead, photographing a stem. |
|
|
| What would you have done at the safety-versus-useful fork: held the line on refusing, |
| or chased the second photo? And what's the dumbest one-line bug that ever cost you a |
| week? |
|
|