On system tests

Monday, February 9th, 2026
Zeke Gabrielse, Founder of Keygen

Keygen's test suite takes just over 2 hours to run on a single core. It's not great, I'll admit that, but the alternatives seem worse to me. The bulk of that time is spent on authorization tests and system tests.

The former could probably be sped up using fixtures, but authorization tests depend on a lot of unique states (and modeling those correctly in fixtures may be just as expensive as using factories).

But the latter cannot really be sped up, since system tests do exactly what their name implies: they test the entire system.

There's a lot of negative sentiment toward system tests these days. They're slow, they're brittle, etc., etc. The Rails community has recently joined in on this negativity, thanks to people like DHH dismissing system tests as unnecessary.

One of the arguments against system tests is that they don't (or they shouldn't?) test business logic. But I disagree. Keygen is chock-full of business logic inside of middleware, inside of core extensions, etc., etc., and many of these parts don't interact with each other until the system is run in its entirety. Unfortunately, that interaction is often where the subtle bugs surface.

Keygen's extensive system tests are how I'm able to confidently ship a stable business-critical product as a single person, and how I've been able to do that for coming up on 10 years now.

Every bug report I get first gets at least one system test written for it, both to verify the bug and find the root cause, and to catch future regressions before they reach customers again.

I'll admit that I've gone "against the grain" quite a bit with Keygen's test suite (among other things). Not a lot of people these days use Gherkin and Cucumber, but I've had a lot of success with them. It's easy to write useful system tests in Gherkin, and even though I might've started out abusing system tests to also act as unit tests, I don't regret the choice I made 9 years ago to use Cucumber.

You can unit test everything, yet even if you do, there are no guarantees that everything is wired together properly until you actually test it. You either do this automatically via system tests, or you do it manually (or your customers do it for you). The Rails application stack can be complex, and making sure the path from request to middleware to controller to response works correctly is vital.
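
To make that concrete, here's a minimal sketch of the request-to-middleware-to-controller-to-response path, using nothing but plain Ruby objects that follow the Rack calling convention. Everything here is hypothetical (the `AccountResolver` middleware, the `LicensesEndpoint`, the `app.account` env key, the `system_request` helper); it is not Keygen's code, just an illustration of where wiring bugs hide. Each class could pass its own unit tests in isolation, but only a test that goes through the whole stack proves they agree on that env key.

```ruby
require "json"

# Hypothetical middleware: resolves the current account from a request header
# and stashes it in the Rack env for anything further down the stack.
class AccountResolver
  def initialize(app)
    @app = app
  end

  def call(env)
    env["app.account"] = env["HTTP_X_ACCOUNT"]
    @app.call(env)
  end
end

# Hypothetical endpoint: assumes the middleware above has already run.
class LicensesEndpoint
  HEADERS = { "content-type" => "application/json" }.freeze

  def call(env)
    account = env["app.account"]
    return [401, HEADERS, [JSON.generate("error" => "unauthorized")]] if account.nil?

    [200, HEADERS, [JSON.generate("account" => account)]]
  end
end

# The "system": the stack wired together the way it would actually run.
def build_stack
  AccountResolver.new(LicensesEndpoint.new)
end

# A tiny system-test helper: send a request through the whole stack and
# return the status plus the joined response body.
def system_request(env = {})
  status, _headers, body = build_stack.call(env)
  [status, body.join]
end

status, body = system_request("HTTP_X_ACCOUNT" => "acme")
puts status # 200
puts body   # {"account":"acme"}
```

If someone renamed the env key in only one of the two classes, both sets of unit tests could stay green (assuming each stubs its own side), while `system_request` would start returning a 401.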

I'll admit, I probably have an easier time than most do, because Keygen is a JSON API. I'm not sure I would write as many system tests for an HTML-rendered application, simply due to brittleness. A JSON API is, by design, unlikely to change, so system tests are usually written once and then not touched again, whereas system tests for an HTML app might need to change as the UI evolves.

But maybe that's a tooling issue more than anything else. I don't have enough experience writing system tests for HTML to be able to speak to that, at least confidently. But it seems like a lot of the flak system tests get is because they're brittle and they suck to write.

(If we're being honest, I'd much rather use Rails solely as a backend with React for the frontend, and test each app separately.)

Even with those problems, before you dismiss system tests because somebody told you to, consider what they might bring to the table. I've caught many a bug that unit tests alone would not have caught — but my production systems, i.e. my customers, would have. Relying on "smoke tests" to catch these will eventually fail you.

Customers value stable software (though how much largely depends on what you offer), and system tests help me build stable software.

If the objection isn't brittleness, but rather that system tests are slow, then I'd argue that time is easily parallelized, at least to an extent. That single-core time of ~2 hours that I originally reported runs in ~10 minutes when parallelized across 32 cores.^[1]
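
As a rough back-of-the-envelope check on "to an extent", here's the arithmetic on those numbers. This assumes "just over 2 hours" is about 120 minutes and treats the rounded figures as exact; the `speedup` and `efficiency` helpers are just names I've made up for the sketch.

```ruby
# Speedup: how many times faster the parallel run is than the serial run.
def speedup(serial_minutes, parallel_minutes)
  serial_minutes.to_f / parallel_minutes
end

# Efficiency: the fraction of the ideal (linear) speedup actually achieved.
def efficiency(serial_minutes, parallel_minutes, cores)
  speedup(serial_minutes, parallel_minutes) / cores
end

serial = 120 # minutes on a single core (approximate)

puts speedup(serial, 10)        # 12.0
puts efficiency(serial, 10, 32) # 0.375
puts serial / 32.0              # 3.75 (minutes, if scaling were perfectly linear)
```

So 32 cores buy roughly a 12x speedup rather than 32x, which is the "to an extent" part: boot time, setup, and uneven test files all eat into the ideal.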

I could probably parallelize even more by splitting unit and system tests into their own runners, instead of running them serially, and by chunking larger test files. But that's a challenge for another day.
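
For what it's worth, one common way to approach that kind of splitting is to balance test files across runners by their recorded duration, longest first. The sketch below is not something Keygen does today; the `balance` helper, the file names, and the timings are all made up for illustration.

```ruby
# Greedy "longest first" partitioning: sort files by duration (descending),
# then always hand the next file to whichever runner has the least work so far.
def balance(durations, runners)
  buckets = Array.new(runners) { { files: [], total: 0.0 } }

  durations.sort_by { |_file, seconds| -seconds }.each do |file, seconds|
    lightest = buckets.min_by { |bucket| bucket[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end

  buckets
end

# Hypothetical per-file timings in seconds, e.g. from a previous CI run.
durations = {
  "features/licenses.feature" => 600,
  "features/machines.feature" => 300,
  "features/policies.feature" => 200,
  "features/users.feature"    => 100,
}

balance(durations, 2).each_with_index do |bucket, index|
  puts "runner #{index}: #{bucket[:total]}s #{bucket[:files].inspect}"
end
```

With these made-up timings, one runner gets the single 600-second file and the other gets the remaining three, so both finish in about 600 seconds. It also shows why chunking larger files matters: no amount of balancing can make the total shorter than the slowest single file.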

At the end of the day, you can unit test everything, but without system tests, how do you know it all works together?

[1]: I very recently moved from free GitHub runners (fair source, ftw!) to paid Depot runners, and cut CI time from ~80 minutes on 2 cores to the above-mentioned 10-minute mark on 32 cores. Worth it? Not entirely sure yet, because I haven't seen what a month of usage costs. But I do like the boost to iteration speed, which was starting to feel bottlenecked by having to wait on CI.