Move Fast and Break Things (Without Breaking Things)
Monday, June 27th 2022
tl;dr: We've open sourced a new gem called request_migrations for versioning REST APIs, inspired by Stripe's API versioning system.
There's almost nothing worse than accidentally making a breaking API change and not realizing it. But some things do come close.
Like feeling like you can't undo past mistakes in your API's design.
If you've ever been involved with designing and maintaining an API service, you probably know how both of these feel.
(It's not good.)
But I would argue that the latter is actually worse.
At least you feel like you can fix the first one.
Feeling like you can't undo past mistakes—frankly—sucks.
It leaves your API's design in a bad state, which sucks for developer morale, and it can leave your API interface a confusing mess, which sucks for customers.
It also leaves you feeling like you have to absolutely nail an API's design the first time, even if you don't know what the perfect interface is yet.
Sure, you may have customers that use it (they use everything after all!), but that doesn't mean it's ideal, or that it's even good.
I've been working on Keygen for over 6 years now, and I've made some design mistakes.
From choosing bad names for things, to just poor API design.
Sometimes it may feel like you'll be stuck with your mistakes forever.
I know I sure did.
But it's not forever — there is a way to fix things.
A way that doesn't require throwing everything out and retrying with a new API version.
But first, some context.
The great mistake
Mid last year, I happily released a redesign of our distribution API.
Version 1 of the distribution API started as a small proof-of-concept written in Go.
Essentially, it was a micro-service that glued together Keygen's licensing API and AWS S3, put onto its own domain. It stored artifacts gated by license validation.
It was pretty hacky, but it proved that people would pay for the concept.
Version 2 of the distribution API was a complete rewrite, written in Ruby instead of Go, and completely integrated with our Rails-based licensing API.
At the time, I thought it was some of my greatest work.
The code was beautiful.
After months of development and testing, I deployed it.
And customers started to use it.
But months passed, usage grew, and I started to see some major design mistakes. And depending on their use case, so did my customers.
What were the mistakes? Well, all of them boiled down to this:
I assumed that a release object would only have a single artifact.
Instead of treating a release like a bucket of artifacts, I chose to treat a release as an artifact. What should have been two models, ended up being one.
My original thinking was kind-of-sort-of modeling a file-system, where each release was a "file" with a unique filename. But it just wasn't the right thinking.
This bad thinking meant that there wasn't just a single "v1" release — there were many, all with different filenames, filetypes and platforms, and on various architectures.
Because of this, you couldn't have multiple versions of the same artifact. For example, you could only have 1 artifact called stable.yml: the latest version.
This made trivial things a challenge, such as displaying all v1 releases on a page.
And some things impossible, such as accessing previous versions of an artifact.
It also made a typical draft to published flow super tough, since every release's published status had to be managed separately. There wasn't a quick "publish all."
The design just didn't stand up to the real world.
I had made a big mistake, and it only took me a few months to realize it. Just enough time for customers to depend on my mistakes!
I should have caught onto the mistake earlier, while I was writing tests, but I didn't.
And this mistake introduced so many other compounding issues that I almost wanted to throw the whole thing out and restart.
It sucked, and I felt absolutely stuck.
I thought —
"Should I go ahead and scrap v2 and start working on a v3?"
"What would my customers think?"
After all, some of them just upgraded from the old version.
"Am I prepared to maintain multiple versions?"
No, I'm not.
But the changes were just too big to do otherwise.
At the time, I happened to stumble upon an old GitHub issue I had opened in early 2019, titled "Come up with a good API versioning strategy."
It linked to some great articles by Stripe and Intercom on API versioning.
I studied these, and others like them, and started to see a light at the end of the tunnel.
So I pulled up my sleeves and started brainstorming how I could fix the mess I made.
Fixing the mistake
It took what felt like a couple months to fully think through how I was going to go about fixing my mistake. More than once I felt like burning the whole thing down.
I took notes from Stripe, Intercom, and many others.
And I settled on the following plan:
- Create a real artifact model. Before, it was kind of an ephemeral pseudo-model that housed a pointer to S3. But now it would be a true model, managed through an API resource like everything else is.
- Move filename, filesize, filetype, platform and arch from the release model to the new artifact model. This removed most attributes from the release model, leaving behind a version.
- Update the release's artifact relationship from a has-one to a has-many. Remember, releases should represent a versioned bucket of artifacts, not a single artifact.
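As a rough picture of the target shape, here's a minimal sketch in plain Ruby. The Release and Artifact structs below are hypothetical stand-ins, not Keygen's actual models, and the second artifact's values are made-up example data.

```ruby
# Hypothetical stand-ins for the two models (not Keygen's real classes).
# The file-level attributes live on the artifact; the release keeps its
# version and owns a list of artifacts.
Artifact = Struct.new(:filename, :filesize, :filetype, :platform, :arch, keyword_init: true)
Release  = Struct.new(:version, :artifacts, keyword_init: true)

release = Release.new(
  version: '2.0.0',
  artifacts: [
    # Example values only
    Artifact.new(filename: 'Product-2.0.0.dmg', filesize: 209_715_200, filetype: 'dmg', platform: 'darwin', arch: 'amd64'),
    Artifact.new(filename: 'Product-2.0.0.exe', filesize: 104_857_600, filetype: 'exe', platform: 'win32', arch: 'amd64'),
  ],
)

# One release, many artifacts: listing everything shipped for a version
# is now a lookup on a single object.
filenames = release.artifacts.map(&:filename)
```

With this shape, a second release can carry its own copy of the same filename, which is what the single-model design could not express.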
Now, this may seem like a simple changeset. But I can assure you, it was not.
Doing these without breaking existing API contracts was incredibly tough.
And the changes that ensued were felt throughout nearly the entire codebase, requiring a redesign of nearly all other distribution-related endpoints.
And not only that, it called for a complete redesign and major version of our CLI, a new electron-builder integration, as well as a new major version of our Go SDK.
It was a lot of code to write for just one guy.
I essentially rewrote the entire distribution API, and all of its tooling, all while maintaining the old API contracts.
To illustrate the changes — here's the diff for the release object:
      {
        "data": {
          "id": "30c64dcd-a74d-4f0d-8479-8745172a4817",
          "type": "releases",
          "attributes": {
            "name": "Example Release 2.0",
            "description": "This is an example description for the release.",
    -       "signature": "qqcVWX402un4PEoa+E1VMBfPaBJ1RSxwiGVwrFpGfbI7dulfIqUlovvm1X96m3G2Sjl8gXUDEr8gLEAbJuQQCQ",
    -       "checksum": "lQ8T/qtGvsbqsDaXBqBMh6h2AGL8mTGI4XgLvLDYZA3EumRH8gIMjJ2l5lsO5L0LIvYVqWNXPVzTEp03H4yfZA",
    -       "filename": "Product-2.0.0.dmg",
    -       "filetype": "dmg",
    -       "filesize": 209715200,
    -       "platform": "darwin",
            "channel": "stable",
            "status": "PUBLISHED",
            "version": "2.0.0",
            "semver": {
              "major": 2,
              "minor": 0,
              "patch": 0,
              "prerelease": null,
              "build": null
            },
            "metadata": {},
            "created": "2021-05-14T19:54:16.289Z",
            "updated": "2021-05-19T13:30:56.698Z",
            "yanked": null
          },
          "relationships": {
            "account": {
              "links": {
                "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2"
              },
              "data": {
                "type": "accounts",
                "id": "59e8b93a-3b09-4d07-94be-3ee2de040de2"
              }
            },
            "product": {
              "links": {
                "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/30c64dcd-a74d-4f0d-8479-8745172a4817/product"
              },
              "data": {
                "type": "products",
                "id": "652da162-cd35-4814-bd28-910a0df0dfad"
              }
            },
            "constraints": {
              "links": {
                "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/30c64dcd-a74d-4f0d-8479-8745172a4817/constraints"
              }
            },
    -       "artifact": {
    -         "links": {
    -           "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/30c64dcd-a74d-4f0d-8479-8745172a4817/artifact"
    -         },
    -         "data": {
    -           "type": "artifacts",
    -           "id": "51c4d24e-a292-4b36-b8e6-1fcb1b335f1d"
    -         }
    -       }
    +       "artifacts": {
    +         "links": {
    +           "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/30c64dcd-a74d-4f0d-8479-8745172a4817/artifacts"
    +         }
    +       },
    +       "upgrade": {
    +         "links": {
    +           "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/30c64dcd-a74d-4f0d-8479-8745172a4817/upgrade"
    +         }
    +       }
          },
          "links": {
            "self": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/30c64dcd-a74d-4f0d-8479-8745172a4817"
          }
        }
      }
And here's a diff of the artifact object:
      {
        "data": {
          "id": "0dad8516-f071-4573-bcea-d774e81c4a37",
          "type": "artifacts",
          "attributes": {
    -       "key": "install.sh",
    +       "filename": "install.sh",
    +       "filetype": "sh",
    +       "filesize": 3097,
    +       "platform": null,
    +       "arch": null,
    +       "signature": "q73uw0RZ3MDooFeIUYP2iMYSWsHdL2MIhnq74IiGVEVXx0Qxeuh6eWDvlbkZ15RmjxRlTeJFjTwOubF9Hdc9Aw",
    +       "checksum": "l7PETeny2BRIC4T7tC1w0dLOeR0ghWtDJZw3GIuIK9LEdSKRZKda7iWJVkH9KhDSroPunsAAJ1T14UB88MFiBg",
            "status": "UPLOADED",
            "metadata": {},
            "created": "2022-05-30T13:28:01.592Z",
            "updated": "2022-05-30T13:28:31.786Z"
          },
          "relationships": {
            "account": {
              "links": {
                "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2"
              },
              "data": {
                "type": "accounts",
                "id": "59e8b93a-3b09-4d07-94be-3ee2de040de2"
              }
            },
            "release": {
              "links": {
                "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/8157c656-c60f-4b82-b93f-3b3ed73abf80"
              },
              "data": {
                "type": "releases",
                "id": "8157c656-c60f-4b82-b93f-3b3ed73abf80"
              }
            }
          },
          "links": {
            "related": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/releases/8157c656-c60f-4b82-b93f-3b3ed73abf80/artifacts/0dad8516-f071-4573-bcea-d774e81c4a37",
            "self": "/v1/accounts/59e8b93a-3b09-4d07-94be-3ee2de040de2/artifacts/0dad8516-f071-4573-bcea-d774e81c4a37"
          }
        }
      }
Lots of API endpoints changed, too. Some were removed (hidden), some were added, and others now behaved differently.
The upgrade flow changed completely, severing bad habits I picked up from copying Electron's way of serving artifacts for an update server. (Well, Squirrel's way.)
So how did I do it? Well, I wrote a gem to help me.
Introducing request_migrations
Inspired by how Stripe and friends do API versioning, I wrote an internal gem. It's been running great in the real world, and I wanted to extract it for others to use.
Today, I'm open sourcing it as request_migrations.
It works by applying migrations to a request or a response object, transforming data from the app's current version, to some target version that the client asks for.
Here's an illustration —
Essentially, it's a simple data transformation pipeline.
Migrations should be small and single purpose. They should act upon an input shape, some data, a request, or a response, and transform it to an expected output shape.
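To make the pipeline idea concrete, here's a minimal sketch in plain Ruby. It is not how request_migrations is implemented internally. The MIGRATIONS table, the RENAME_KEY helper, the migrate method, and the yanked_at rename are all made up for illustration, and the sketch assumes migrations are applied from the newest version down to the version the client asked for.

```ruby
require 'rubygems' # Gem::Version gives us correct version comparison

# Hypothetical helper: builds a step that renames one top-level key.
RENAME_KEY = ->(from, to) do
  ->(payload) { payload.to_h { |key, value| [key == from ? to : key, value] } }
end

# Hypothetical table: each older version lists the steps that rewrite a
# payload from the next-newer shape into that version's shape.
MIGRATIONS = {
  '1.1' => [RENAME_KEY.call(:yanked_at, :yanked)],   # made-up change
  '1.0' => [RENAME_KEY.call(:artifacts, :artifact)], # has-many back to has-one
}

# Walk from the newest version down to the target, applying every step
# along the way. A target newer than every key applies nothing.
def migrate(payload, target:, migrations: MIGRATIONS)
  migrations
    .select { |version, _| Gem::Version.new(version) >= Gem::Version.new(target) }
    .sort_by { |version, _| Gem::Version.new(version) }
    .reverse
    .flat_map { |_, steps| steps }
    .reduce(payload) { |data, step| step.call(data) }
end

current = { id: '30c64dcd', yanked_at: nil, artifacts: { links: { related: '/artifacts' } } }

v1_0 = migrate(current, target: '1.0')
v1_0.keys # => [:id, :yanked, :artifact]
```

Each step only knows about one small change, so a client pinned to an old version gets the sum of every step between the current shape and the one it expects.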
It took 12 migrations to fix my earlier mistakes.
To illustrate some of these, let's look at a couple real-world migrations — from Keygen.
This one transforms a release's now has-many relationship back into a has-one:
    class ArtifactHasManyToHasOneForReleaseMigration < BaseMigration
      description %(transforms a release's artifacts from a has-many to has-one relationship)

      # Match on singular objects
      migrate if: -> body { body in data: { ** } } do |body|
        case body
        # Match on the release object's shape
        in data: { type: 'releases', id: release_id, relationships: { account: { data: { type: 'accounts', id: account_id } }, artifacts: { ** } } }
          artifact = Artifact.select(:id).find_by(release_id:, account_id:)

          body[:data][:relationships].tap do |rels|
            # Define the has-one relationship
            rels[:artifact] = {
              data: artifact.present? ? { type: 'artifacts', id: artifact.id } : nil,
              links: {
                related: v1_account_release_artifact_path(account_id, release_id),
              },
            }

            # Remove the has-many relationship
            rels.delete(:artifacts)
          end
        else
        end
      end

      # Match on successful responses of a release's CRU_ endpoints
      response if: -> res { res.status < 400 && res.request.params in controller: 'api/v1/releases' | 'api/v1/products/relationships/releases', action: 'show' | 'create' | 'update' } do |res|
        body = JSON.parse(res.body, symbolize_names: true)

        # Transform the response body
        migrate!(body)

        res.body = JSON.generate(body)
      end
    end
(Ruby's new pattern matching really knocks it out of the park here!)
The migration matches on the releases controller, for the CRU_ actions in particular.
From there, it matches on a specific shape of data — a release.
Then it transforms that data from the current API version, v1.1, to the old API version v1.0.
My codebase, for the most part, no longer needs to care about v1.0. This small compatibility layer by request_migrations abstracts all of that away.
Let's take a look at another real-world migration.
Remember how I moved attributes from the release object to the artifact object?
Well, I have a contract with clients using v1.0 that says those attributes belong to a release object, not the artifact object.
To maintain that contract, this migration handles copying a release's artifact attributes back onto the release object.
    class CopyArtifactAttributesToReleaseMigration < BaseMigration
      description %(copies artifact attributes onto a release)

      migrate if: -> body { body in data: { ** } } do |body|
        case body
        in data: { type: 'releases', id: release_id, attributes: { ** }, relationships: { account: { data: { type: 'accounts', id: account_id } } } }
          artifact = Artifact.find_by(release_id:, account_id:)

          body[:data][:attributes].tap do |attrs|
            attrs.merge!(
              platform: artifact&.platform,
              filetype: artifact&.filetype,
              filename: artifact&.filename,
              filesize: artifact&.filesize,
              signature: artifact&.signature,
              checksum: artifact&.checksum,
            )
          end
        else
        end
      end

      response if: -> res { res.status < 400 && res.request.params in controller: 'api/v1/releases' | 'api/v1/products/relationships/releases', action: 'show' | 'create' | 'update' } do |res|
        body = JSON.parse(res.body, symbolize_names: true)

        migrate!(body)

        res.body = JSON.generate(body)
      end
    end
I have a rule that migrations can only operate on a particular shape of data.
Even though it's possible to lump everything into a single migration, it really helps to keep things nice and organized. It reduces cognitive overhead.
That means both of these migrations have accompanying migrations for their index actions, operating on an array of objects instead of an object.
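For a feel of what an index-shaped migration deals with, here's a minimal sketch in plain Ruby, with no Rails and no gem involved. The migrate_release_index! method and the artifact_ids_by_release lookup table are hypothetical; the lookup stands in for whatever query a real migration would run.

```ruby
# Hypothetical sketch: the has-many to has-one rewrite, applied to a
# collection body where data is an array instead of a single object.
def migrate_release_index!(body, artifact_ids_by_release)
  case body
  in data: [*releases]
    releases.each do |release|
      case release
      in type: 'releases', id: release_id, relationships: { artifacts: Hash }
        rels        = release[:relationships]
        artifact_id = artifact_ids_by_release[release_id]

        # Define the has-one relationship (nil when the release has no artifact)
        rels[:artifact] = { data: artifact_id && { type: 'artifacts', id: artifact_id } }

        # Remove the has-many relationship
        rels.delete(:artifacts)
      else
        # Leave items of any other shape untouched
      end
    end
  else
    # Not a collection body, so there is nothing to do here
  end

  body
end

body = {
  data: [
    { type: 'releases', id: 'r1', relationships: { artifacts: { links: {} } } },
    { type: 'releases', id: 'r2', relationships: { artifacts: { links: {} } } },
  ],
}

migrate_release_index!(body, { 'r1' => 'a1' })
```

Keeping the array case in its own migration means the singular migration never has to branch on whether data is a hash or an array.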
Each migration is then assigned to a specific API version via an initializer:
    RequestMigrations.configure do |config|
      config.current_version = '1.3'
      config.versions = {
        '1.2' => [
          # ...
        ],
        '1.1' => [
          # ...
        ],
        '1.0' => [
          ArtifactHasManyToHasOneForReleaseMigration,
          CopyArtifactAttributesToReleaseMigration,
          # ...
        ],
      }
    end
The rest, more or less, is automagic.
Try it for yourself
If you maintain a Rails API, please check out request_migrations and let me know what you think! Make some breaking changes.
Now if you'll excuse me —
I'm going to go fix the other mistakes I made over the last 6 years.
Until next time.