Detection-as-code over open tables
Detection-as-code stops being a slogan the moment your detections are versioned code running over open tables instead of rules trapped inside a vendor's box.
For most of my career, “detection-as-code” was an aspiration we approximated. We kept detections in version control, sure, but they still executed inside a SIEM that owned the data, billed by the gigabyte, and made anything resembling a real test cycle expensive enough that nobody did it honestly. You wrote a rule, you shipped it, and you found out in production whether it was any good. The code was under version control; the engineering discipline mostly wasn’t.
When the SIEM is a lakehouse, with telemetry living in open tables you query directly rather than in a proprietary store, the gap between “detection-as-code” and actual software engineering finally closes. Three things become possible that were previously too expensive to bother with, and all three are just normal engineering practice that security never got to have.
Backtesting becomes a replay, not a budget request. The single most useful question about a new detection, “would this have fired on the last year of our actual traffic, and how often, and on what,” is one most teams can’t afford to ask, because replaying a year of telemetry through a per-gigabyte SIEM is a line item. Over open tables it’s just a query against data you already retain. You can measure a rule’s true-positive behavior and its false-positive volume before it ever reaches an analyst’s queue. That one capability changes the economics of tuning from “ship and pray” to “measure and decide,” which is the whole game.
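To make the replay idea concrete, here is a minimal sketch of a backtest as a plain query. Everything in it is an assumption for illustration: an in-memory SQLite table stands in for a retained open table, and the `auth_events` schema, the failed-login rule, and the `backtest` helper are hypothetical names, not any particular product's API.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for a retained open table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_events (ts TEXT, username TEXT, src_ip TEXT, outcome TEXT)"
)
history = [
    ("2024-01-01T10:00:00", "alice", "10.0.0.1", "failure"),
    ("2024-01-01T10:00:05", "alice", "10.0.0.1", "failure"),
    ("2024-01-01T10:00:09", "alice", "10.0.0.1", "failure"),
    ("2024-01-01T10:01:00", "alice", "10.0.0.1", "success"),
    ("2024-01-01T11:00:00", "bob", "10.0.0.2", "failure"),
]
# Five failures for bob on the following day.
history += [
    ("2024-01-02T09:00:0%d" % i, "bob", "10.0.0.2", "failure") for i in range(5)
]
conn.executemany("INSERT INTO auth_events VALUES (?, ?, ?, ?)", history)

# Hypothetical detection: a user with at least N failed logins in one day.
RULE_SQL = """
SELECT substr(ts, 1, 10) AS day, username, COUNT(*) AS failures
FROM auth_events
WHERE outcome = 'failure'
GROUP BY substr(ts, 1, 10), username
HAVING COUNT(*) >= ?
ORDER BY day, username
"""


def backtest(conn, threshold):
    """Replay the rule over retained history; return one row per would-be alert."""
    return conn.execute(RULE_SQL, (threshold,)).fetchall()


# Compare alert volume at different thresholds before anything ships.
for threshold in (3, 5):
    hits = backtest(conn, threshold)
    print(f"threshold={threshold}: {len(hits)} alerts -> {hits}")
```

The point of the sketch is the shape, not the engine: the rule is a parameterized query, the history is data you already hold, and tuning the threshold is a loop you can run as many times as you like.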
Detection CI becomes real. Once your detections are code and your historical telemetry is queryable, you can build the thing detection engineering has always deserved and rarely had: a real test suite. Every rule carries its known-true and known-false fixtures. A change to a rule runs against them in CI. A pull request that would blow up false positives fails the build instead of paging someone next week. I built a version of this for a global detection team years ago: 250 versioned rules with gated rollout across more than 900 companies, millions of users, and over ten million devices, bolted onto a traditional stack. It earned its keep the day the gate caught a buggy rule that would have fired so many alerts it would have shut down the SOC’s ticketing system. A bad detection isn’t just noise; at volume it’s a denial of service you run against your own analysts. On a lakehouse the gate stops being a bolt-on and becomes the native shape of the work.
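A gate like the one described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the system I built: the `Rule` dataclass, the `gate` function, the alert budget, and the PowerShell example are all hypothetical, and events are plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable

Event = dict


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]
    known_true: list = field(default_factory=list)   # events that must fire
    known_false: list = field(default_factory=list)  # events that must not fire


def check_fixtures(rule):
    """Return (kind, event) pairs for every fixture the rule gets wrong."""
    failures = []
    for ev in rule.known_true:
        if not rule.matches(ev):
            failures.append(("missed", ev))
    for ev in rule.known_false:
        if rule.matches(ev):
            failures.append(("false_alarm", ev))
    return failures


def gate(rule, history, max_alerts):
    """Fail the build on any fixture failure or on alert volume over budget."""
    reasons = [f"{kind}: {ev}" for kind, ev in check_fixtures(rule)]
    alerts = sum(1 for ev in history if rule.matches(ev))
    if alerts > max_alerts:
        reasons.append(f"volume: {alerts} alerts exceeds budget {max_alerts}")
    return (not reasons, reasons)


encoded = {"process": "powershell.exe", "cmdline": "-enc SQBFAFgA"}
benign = {"process": "powershell.exe", "cmdline": "Get-Date"}
history = [benign] * 100 + [encoded]  # hypothetical replay sample

good_rule = Rule(
    "encoded-powershell",
    lambda ev: ev["process"] == "powershell.exe" and "-enc" in ev["cmdline"],
    known_true=[encoded],
    known_false=[benign],
)
# The buggy edit drops the command-line condition and fires on every PowerShell run.
buggy_rule = Rule(
    "encoded-powershell",
    lambda ev: ev["process"] == "powershell.exe",
    known_true=[encoded],
    known_false=[benign],
)

for rule in (good_rule, buggy_rule):
    passed, reasons = gate(rule, history, max_alerts=5)
    print(rule.name, "PASS" if passed else "FAIL", reasons)
```

In a real pipeline the `gate` result would map to the CI job's exit status, and `history` would come from a backtest over retained telemetry rather than a hand-built list.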
Failed detections become test cases. This is the part I find most interesting, because it connects to how I think about improving models generally. When a detection misses something, or fires on something it shouldn’t, the instinct is to tweak the rule and move on. But that missed event is the most valuable test fixture you’ll ever get, a real example of the exact failure you’re trying to prevent. Capture it, add it to the rule’s fixture set, and now your detection can never silently regress on that case again. It’s the same move I use to fix model behavior in the Discipline Patch: a failure isn’t a grade, it’s a diagnosis, and the diagnosis becomes the corrective example. Detection engineering and model engineering turn out to be the same loop wearing different hats, and a lakehouse is what lets the detection side actually run it.
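The capture step is small enough to sketch. The names here (`capture_failure`, `run_fixtures`, the fixture dictionary layout, and the two rule versions) are hypothetical, and the missed event is invented for illustration; the assumption is that fixtures live as JSON next to the rule in version control.

```python
import json


def run_fixtures(matches, fixtures):
    """Return (expectation, event) pairs for every fixture the rule gets wrong."""
    wrong = []
    for ev in fixtures["known_true"]:
        if not matches(ev):
            wrong.append(("should_fire", ev))
    for ev in fixtures["known_false"]:
        if matches(ev):
            wrong.append(("should_not_fire", ev))
    return wrong


def capture_failure(fixtures, event, should_fire):
    """Turn a production miss or false alarm into a permanent fixture."""
    key = "known_true" if should_fire else "known_false"
    if event not in fixtures[key]:
        fixtures[key].append(event)
    return fixtures


fixtures = {
    "known_true": [{"process": "powershell.exe", "cmdline": "-enc SQBFAFgA"}],
    "known_false": [{"process": "powershell.exe", "cmdline": "Get-Date"}],
}


def rule_v1(ev):
    # Original rule: case-sensitive match on the short flag only.
    return ev["process"] == "powershell.exe" and "-enc" in ev["cmdline"]


def rule_v2(ev):
    # Fixed rule: case-insensitive, so the long-form flag is caught too.
    return ev["process"] == "powershell.exe" and "-enc" in ev["cmdline"].lower()


# Hypothetical miss found in production: the long-form flag slipped past v1.
missed = {"process": "powershell.exe", "cmdline": "-EncodedCommand SQBFAFgA"}
capture_failure(fixtures, missed, should_fire=True)

print("v1 failures:", run_fixtures(rule_v1, fixtures))
print("v2 failures:", run_fixtures(rule_v2, fixtures))

# The fixture set is plain data, so it can be committed alongside the rule.
serialized = json.dumps(fixtures, indent=2, sort_keys=True)
```

Once the missed event is in the fixture set, any future edit that reintroduces the miss fails the same check that `rule_v1` fails here.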
There’s a fair question lurking here, which is whether any of this needs AI at all, and the honest answer is no, not the parts above. Versioned rules, cheap backtesting, and detection CI are good engineering that stands on its own. But they’re also the substrate that makes agentic detection trustworthy. An agent that drafts detections is only as safe as the test harness it has to clear, and a lakehouse is what makes that harness cheap, fast, and run against real history instead of a handful of synthetic samples. The agent gets the leverage; the open table and the CI gate are what keep it honest. This pairs with the way I’ve argued the graph should be the architecture for agentic systems: get the data layer right and open, and the intelligence layer gets a lot more boring to govern.
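One way to picture that arrangement: agent-drafted rules are treated as untrusted candidates, and only those that clear the harness get promoted. The sketch below assumes drafts arrive as plain Python predicates; `clears_harness`, `promote`, the `bytes_out` field, and the three drafts are hypothetical stand-ins, not the output of any real agent.

```python
from typing import Callable

Event = dict
Matcher = Callable[[Event], bool]


def clears_harness(matches, known_true, known_false, history, max_alerts):
    """A draft passes only if it fires on every known-true fixture, stays
    silent on every known-false fixture, and fits the alert budget on replay."""
    if not all(matches(ev) for ev in known_true):
        return False
    if any(matches(ev) for ev in known_false):
        return False
    return sum(1 for ev in history if matches(ev)) <= max_alerts


def promote(candidates, **harness):
    """Return the names of drafts that clear the deterministic gate."""
    return [name for name, m in candidates.items() if clears_harness(m, **harness)]


# Hypothetical agent drafts for a data-exfiltration detection.
candidates = {
    "draft_a": lambda ev: ev["bytes_out"] > 1_000_000,
    "draft_b": lambda ev: ev["bytes_out"] > 0,           # fires on everything
    "draft_c": lambda ev: ev["bytes_out"] > 10_000_000,  # misses the known case
}

harness = {
    "known_true": [{"bytes_out": 5_000_000}],
    "known_false": [{"bytes_out": 1_200}],
    "history": [{"bytes_out": 1_000 + i} for i in range(50)]
    + [{"bytes_out": 2_000_000}],
    "max_alerts": 3,
}

print(promote(candidates, **harness))
```

Nothing in the gate knows or cares that an agent wrote the drafts, which is the design choice worth keeping: the judgment about what ships stays in deterministic code.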
I’ll admit a small frustration writing this. The most interesting implementation of this idea, an open, agentic SIEM built natively on a lakehouse, is in private preview right now, and I’m on the outside of the invite list looking in. But you don’t need the product to adopt the practice. The lesson is portable and available today: stop treating your detections as configuration trapped in someone else’s box, and start treating them as what they actually are, software, tested over your own data, improved by its own failures.
Axioms applied in this essay
This article tested 5 of the StoneyTECH engineering axioms. Each verdict is the result of applying that axiom in this specific argument.
- #2 Push work down toward determinism (held)
The open table and the CI gate form the deterministic layer keeping the agentic layer honest.
- #3 Probe → measure → refine → scale (held)
Backtesting over retained telemetry turns tuning from ship-and-pray into measure-and-decide before an analyst ever sees the rule.
- #9 TDD per deliverable (held)
Known-true and known-false fixtures per rule is TDD applied to detections, and failed detections become the next fixtures.
- #12 The model is the smallest lever; reach for it last (held)
The piece concedes the best parts need no AI at all; the agent comes last, on top of the substrate.
- #16 Don't comment without building. Don't curate without proving. (held)
The claims ride on a platform actually built and shipped (250 rules, 900+ tenants); the untested product gets flagged as untested.
