How to Build a Webhook System in Rails Using Sidekiq
Wednesday, Jun 16th 2021
It's 2021, and we're in the heyday of SaaS. As the SaaS economy has grown, so has the need for these services to communicate events to each other asynchronously. Complex business logic can often span across both internal and external services, and one way external communication is accomplished is with a design pattern called "webhooks."
If you've ever integrated with an API service such as Stripe, you're probably well aware of what a webhook is. For Stripe, events like customer.created and invoice.* are table-stakes for many businesses. (At least they are for ours.)
Many other services, and even our own API, send webhooks for important events. There are even entire businesses built on the concept of webhooks.
Suffice it to say — webhooks are here to stay. They've gone from nice-to-have to must-have for any sort of API-based SaaS.
Speaking of SaaS —
There are a number of webhooks-as-a-service options out there, such as HostedHooks and YC-backed Svix. But much to my dismay, none of these existed back when I started Keygen, so I figured I'd share the knowledge on how I've built mine over the years.
It's actually kind of funny: back in 2017, I had a similar idea scribbled down in my "notebook of startup ideas", but never got around to building it, as I've been too busy with Keygen over the past 5 years. But I'm glad others have run with the idea.
But running a webhook system yourself has its, ehm… quirks? The quirks are usually due to misbehaved webhook endpoints, but we'll touch more on that later. It seems like every month I'm adjusting things to handle new quirks.
Another thing that can take some time to get right is a retry cadence that works for both you and your users, which we'll also cover in a bit more detail later on.
Anyways —
These days, you may want to spend some time and ask yourself the build-vs-buy question. Using a third party for webhooks may save you time (and money).
But today, we're going to build. And we can move fast when we're on Rails.
Defining our webhook resources
In its simplest form, a webhook system is built on top of the pub/sub design pattern:
Some event e happens in our service, and we want to notify all subscribers s.
In the context of a webhook, an event e will consist of the following information:
- An event: this will be an event name, for example payment.successful, license.expired, or user.updated.
- A payload: this will be the data we'll be sending to subscribers. Typically, it's a snapshot of the affected resource. For example, the user.updated event may send a snapshot of the user after it was updated.
And a subscriber s will consist of the following information:
- A url: this will be the URL that the webhook is delivered to.
Since we don't know how big the set of s is, we don't want to run these notifications inline with our normal application code. Why? Because the bigger s is, the slower our app will be, and we also have no control over delivery speed, other than setting an upper bound on webhook execution time with a timeout.
So rather than run inline, we'll need some sort of queueing and background job system so that we can asynchronously process notifications to each subscriber.
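To make the fan-out idea concrete, here's a minimal plain-Ruby sketch of "enqueue one notification per subscriber, deliver later." The subscriber URLs, the broadcast and drain helpers, and the in-memory Queue are all stand-ins I made up for illustration; the real system below uses database tables and Sidekiq instead.

```ruby
require 'json'

# Hypothetical in-memory stand-ins for the subscribers s.
SUBSCRIBER_URLS = [
  'https://a.example/webhooks',
  'https://b.example/webhooks',
].freeze

# Fan an event out into one queued notification per subscriber. Nothing is
# delivered here; we only enqueue, so the caller returns quickly no matter
# how many subscribers there are.
def broadcast(queue, event, payload)
  SUBSCRIBER_URLS.each do |url|
    queue << { url: url, body: { event: event, payload: payload }.to_json }
  end
end

# A background worker would pop notifications off the queue and deliver them.
# Here we just collect them so we can look at what would be sent.
def drain(queue)
  jobs = []
  jobs << queue.pop until queue.empty?
  jobs
end

queue = Queue.new
broadcast(queue, 'user.updated', { id: 1 })
drain(queue).each { |job| puts "would POST #{job[:body]} to #{job[:url]}" }
```

The point of the split is that broadcast stays cheap: its cost grows with the number of subscribers, but never with how slow those subscribers are to respond.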
If you've ever used Rails, you're probably familiar with the ubiquitous Sidekiq gem.
Sidekiq is a powerful background job library that we'll be using to queue up webhook events and process them asynchronously. We'll also be leaning on Sidekiq to handle the bulk of our retry logic (more on this later).
The webhook system we're going to build today is actually modeled after Keygen's webhook system. As of writing, Keygen has processed nearly 250 million webhook events using Sidekiq. (Thanks for your work, Mike!)
Building out our models
To kick things off, let's create a new Rails 6 API application:
$ rails new . --api --database postgresql \
    --skip-active-storage \
    --skip-action-cable
Then we'll want to create our first table, a webhook endpoint. These are going to represent the subscribers s in our webhook system.
Let's generate a new migration:
$ rails g migration CreateWebhookEndpoints
Next, we'll want to define the schema, which only consists of a url:
class CreateWebhookEndpoints < ActiveRecord::Migration[5.2]
  def change
    create_table :webhook_endpoints do |t|
      t.string :url, null: false
      t.timestamps null: false
    end
  end
end
Now before we run our migration, let's also go ahead and define the other table in our system, the webhook event. As the name implies, these will represent the events e in our webhook system. Let's generate another migration:
$ rails g migration CreateWebhookEvents
Then let's define the schema, which consists of a reference to a webhook endpoint, as well as the event and payload attributes we discussed earlier:
class CreateWebhookEvents < ActiveRecord::Migration[5.2]
  def change
    create_table :webhook_events do |t|
      t.integer :webhook_endpoint_id, null: false, index: true
      t.string :event, null: false
      # jsonb rather than text, so that the payload round-trips as a hash.
      t.jsonb :payload, null: false
      t.timestamps null: false
    end
  end
end
Now let's go ahead and run those migrations:
$ rails db:migrate
Next, we'll want to define the 2 models. We'll start with the webhook endpoint:
class WebhookEndpoint < ApplicationRecord
  has_many :webhook_events, inverse_of: :webhook_endpoint

  validates :url, presence: true
end
Lastly, we'll define the webhook event model:
class WebhookEvent < ApplicationRecord
  belongs_to :webhook_endpoint, inverse_of: :webhook_events

  validates :event, presence: true
  validates :payload, presence: true
end
Okay, phew —
Let's go ahead and test out what we've got in the console:
$ rails c
> WebhookEndpoint.create!(url: 'https://functions.ecorp.example/webhooks')
# => #<WebhookEndpoint
#   id: 1,
#   url: "https://functions.ecorp.example/webhooks",
#   created_at: "2021-06-14 22:14:53.587473000 +0000",
#   updated_at: "2021-06-14 22:14:53.587473000 +0000"
# >
> WebhookEvent.create!(
    webhook_endpoint: _,
    event: 'events.test',
    payload: { test: 1 }
  )
# => #<WebhookEvent
#   id: 1,
#   webhook_endpoint_id: 1,
#   event: "events.test",
#   payload: { "test" => 1 },
#   created_at: "2021-06-14 22:17:06.908392000 +0000",
#   updated_at: "2021-06-14 22:17:06.908392000 +0000"
# >
Building our webhook worker
Now that we have an event queued up, we need to process it. We've talked a lot about Sidekiq, so let's write our webhook worker.
Installing dependencies
To start, let's go ahead and add Sidekiq and Redis to our Gemfile:
+gem 'sidekiq'
+gem 'redis'
We're also going to need an HTTP library so that we can send webhook events to webhook endpoints. Let's add the popular http.rb gem.
+gem 'http'
And we'll run Bundler to install everything:
$ bundle
Defining our webhook worker
Next up we'll want to create a new app/workers directory for our worker classes to live in, to keep our Sidekiq workers distinct from any ActiveJobs.
$ mkdir app/workers
$ touch app/workers/webhook_worker.rb
Now, let's define the base logic for our webhook worker. It's going to accept a webhook event ID as an input parameter, use that to look up the event and its endpoint, then POST the event payload to the endpoint's URL. If it fails, it'll retry.
require 'http'

class WebhookWorker
  include Sidekiq::Worker

  def perform(webhook_event_id)
    webhook_event = WebhookEvent.find_by(id: webhook_event_id)
    return if webhook_event.nil?

    webhook_endpoint = webhook_event.webhook_endpoint
    return if webhook_endpoint.nil?

    # Send the webhook request with a 30 second timeout.
    response = HTTP.timeout(30)
                   .headers(
                     'User-Agent' => 'rails_webhook_system/1.0',
                     'Content-Type' => 'application/json',
                   )
                   .post(
                     webhook_endpoint.url,
                     body: {
                       event: webhook_event.event,
                       payload: webhook_event.payload,
                     }.to_json
                   )

    # Raise a failed request error and let Sidekiq handle retrying.
    raise FailedRequestError unless response.status.success?
  end

  private

  # General failed request error that we're going to use to signal
  # Sidekiq to retry our webhook worker.
  class FailedRequestError < StandardError; end
end
As you can see, our webhook system is actually relatively simple. And it's going to stay that way apart from a little subscription management and some special-case error handling. But overall, you may be surprised at how "simple" this system is.
Since we're leaning on Sidekiq to handle queueing, processing and retries, all we really have to do is handle delivery. We're attempting delivery by sending an HTTP POST request to the endpoint's URL with a JSON-encoded payload:
{
  "event": "events.test",
  "payload": {
    "test": 1
  }
}
A quick note: since we're raising an exception here, this is going to get a bit… noisy. I have some log line filters in place to ignore the verbose stack traces from FailedRequestError, so keep that idea tucked away. You can also silence the stack trace by overriding FailedRequestError#backtrace, e.g. def backtrace = nil.
This whole thing could probably be done without relying on errors for control flow, but I can't be bothered. It works as-is, and ignoring noisy log lines is cheap.
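For the curious, here's a small standalone sketch of that backtrace override, outside of Rails and Sidekiq. The attempt_delivery helper is hypothetical and only exists to raise and capture the error; I've also used a regular method body rather than the endless-method form so that it runs on Rubies older than 3.0.

```ruby
# A failed request error whose backtrace is always nil, so anything that
# prints error.backtrace has nothing verbose to print.
class FailedRequestError < StandardError
  def backtrace
    nil
  end
end

# Hypothetical helper: raise the error the way the worker would, then hand
# the rescued exception back so we can inspect it.
def attempt_delivery
  raise FailedRequestError, 'endpoint responded with a non-2xx status'
rescue FailedRequestError => e
  e
end

error = attempt_delivery
puts "#{error.class}: #{error.message} (backtrace: #{error.backtrace.inspect})"
```

The error still carries its class and message, which is usually all we want in the logs for an expected failure like this.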
Delivering our first webhook
Let's go ahead and attempt to deliver our first webhook event. For now, we'll run jobs inline from the console, rather than queue them:
$ rails c
> WebhookWorker.new.perform(WebhookEvent.last.id)
# => Traceback (most recent call last):
#         2: from (irb):1
#         1: from app/workers/webhook_worker.rb:24:in `perform'
# HTTP::TimeoutError (execution expired)
Unless you happen to have a server running at the endpoint you entered, you're probably going to get a timeout error after about 30 seconds.
Which brings us to our first error case: timeouts.
Handling delivery timeouts
First things first: let's go ahead and update our worker class to rescue from that error class so that we can handle it more gracefully:
 def perform(webhook_event_id)
   ...
+rescue HTTP::TimeoutError
+
 end
But what does "gracefully" mean here? Well, let's think about this from a developer's perspective — what do they want to know? They probably want to know their webhook endpoint just timed out, right?
Ideally, we'd send them an alert. But we won't go that deep today. At the very least, we need a way to store an error on the webhook event model.
But rather than just store errors, why don't we go ahead and store the entire response?
Providing visibility into what response the endpoint actually sent will offer a nice developer experience when they're tasked with debugging a webhook integration. Rather than dig through logs, they can look at their recent failed webhook events.
Let's go ahead and generate a new migration for that:
$ rails g migration AddResponseToWebhookEvents
And we'll add a new jsonb column to our webhook events called response:
class AddResponseToWebhookEvents < ActiveRecord::Migration[5.2]
  def change
    add_column :webhook_events, :response, :jsonb, default: {}
  end
end
Let's run the migration with rails db:migrate, and then adjust our worker to store the response object, both for typical responses and for our newly discovered timeout errors:
def perform(webhook_event_id)
  webhook_event = WebhookEvent.find_by(id: webhook_event_id)
  return if webhook_event.nil?

  webhook_endpoint = webhook_event.webhook_endpoint
  return if webhook_endpoint.nil?

  # Send the webhook request with a 30 second timeout.
  response = HTTP.timeout(30)
                 .headers(
                   'User-Agent' => 'rails_webhook_system/1.0',
                   'Content-Type' => 'application/json',
                 )
                 .post(
                   webhook_endpoint.url,
                   body: {
                     event: webhook_event.event,
                     payload: webhook_event.payload,
                   }.to_json
                 )

  # Store the webhook response.
  webhook_event.update(response: {
    headers: response.headers.to_h,
    code: response.code.to_i,
    body: response.body.to_s,
  })

  # Raise a failed request error and let Sidekiq handle retrying.
  raise FailedRequestError unless response.status.success?
rescue HTTP::TimeoutError
  # This error means the webhook endpoint timed out. We can either
  # raise a failed request error to trigger a retry, or leave it
  # as-is and consider timeouts terminal. We'll do the latter.
  webhook_event.update(response: { error: 'TIMEOUT_ERROR' })
end
Okay, now that we've (hopefully) handled timeouts, let's try delivering the event again:
$ rails c
> WebhookWorker.new.perform(WebhookEvent.last.id)
# => true
> WebhookEvent.last.response
# => { "error" => "TIMEOUT_ERROR" }
Nice! As you can see, our response column now says there was a timeout error. We obviously don't want to continue using ecorp.example as our test server, so let's update our endpoint to something else, say, a Rails server on port 3000.
In a new terminal pane, let's start up our Rails server:
$ rails s
Then let's update our endpoint:
$ rails c
> WebhookEndpoint.last.update!(url: 'http://localhost:3000/webhooks')
# => true
Lastly, let's kick off a new webhook worker:
$ rails c
> WebhookWorker.new.perform(WebhookEvent.last.id)
# => Traceback (most recent call last):
#         2: from (irb):17
#         1: from app/workers/webhook_worker.rb:76:in `perform'
# WebhookWorker::FailedRequestError (WebhookWorker::FailedRequestError)
> WebhookEvent.last.response
# => { "body" => "...", "code" => 404, "headers" => { ... } }
As we can see, the worker is (correctly) raising the failed request error for the 404 response, which will signal Sidekiq to automatically retry the job. And you should also see a line in your server logs indicating the 404:
ActionController::RoutingError (No route matches [POST] "/webhooks"):
Now before we go any further, we need to adjust our webhook server to return a 204 response for the /webhooks route. To keep things simple, we'll add an inline proc handler that returns a basic Rack response.
Let's edit our config/routes.rb file with a new route:
 Rails.application.routes.draw do
-  # For details on the DSL available within this file, see https://guides.rubyonrails.org/routing.html
+  post '/webhooks', to: proc { [204, {}, []] }
 end
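If the proc looks a bit magical, it may help to see it outside of Rails. A Rack endpoint is anything that responds to call and returns a three-element array of status, headers and body. Here's a tiny sketch; the WEBHOOK_HANDLER constant name is mine, and the empty hash stands in for the request environment that Rack would normally pass.

```ruby
# The same kind of handler as in the route above: a proc that returns a
# Rack response triple of [status, headers, body].
WEBHOOK_HANDLER = proc { |_env| [204, {}, []] }

# Rack calls the handler with the request environment hash. An empty hash is
# enough here because the handler ignores it.
status, headers, body = WEBHOOK_HANDLER.call({})
puts "status=#{status} headers=#{headers.inspect} body=#{body.inspect}"
```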
And then let's attempt to deliver the webhook one more time:
$ rails c
> WebhookWorker.new.perform(WebhookEvent.last.id)
# => nil
> WebhookEvent.last.response
# => { "body" => "", "code" => 204, "headers" => { ... } }
Note the lack of a failed request error, and our status code is 204. We should also see a log line in our Rails server logs indicating the request was sent:
Started POST "/webhooks" for ::1 at 2021-06-15 10:04:34 -0500
Broadcasting our webhook events
Now, creating webhook events and delivering them inline from the console is kind of cumbersome. We've also been sending the same event every time. Let's write a service object that helps streamline the process of broadcasting new events to our webhook endpoints.
Let's create a new services directory, and a file for our new service object:
$ mkdir app/services
$ touch app/services/broadcast_webhook_service.rb
Our service object is going to accept an event and a payload, and it will create a new webhook event for each endpoint and queue it up for delivery:
class BroadcastWebhookService
  def self.call(event:, payload:)
    new(event: event, payload: payload).call
  end

  def call
    WebhookEndpoint.find_each do |webhook_endpoint|
      webhook_event = WebhookEvent.create!(
        webhook_endpoint: webhook_endpoint,
        event: event,
        payload: payload,
      )

      WebhookWorker.perform_async(webhook_event.id)
    end
  end

  private

  attr_reader :event, :payload

  def initialize(event:, payload:)
    @event = event
    @payload = payload
  end
end
In the real world, this service may only want to broadcast events to endpoints belonging to specific users, for example if your app is multi-tenant. But right now, we aren't going to worry about that sort of detail.
Let's test the service out in our console:
$ rails c
> WebhookEvent.delete_all
# => 1
> BroadcastWebhookService.call(event: 'events.test', payload: { test: 2 })
# => nil
> WebhookEvent.last
# => #<WebhookEvent
#   id: 2,
#   webhook_endpoint_id: 1,
#   event: "events.test",
#   payload: { "test" => 2 },
#   created_at: "2021-06-15 15:43:21.767801000 +0000",
#   updated_at: "2021-06-15 15:43:21.767801000 +0000",
#   response: {}
# >
Curious: response is empty, and we don't see any request logs for our Rails server. What gives? Well, we queued up the webhook worker, but we're no longer processing it inline. We'll need to boot up a Sidekiq process and let our workers work.
In a new terminal pane:
$ sidekiq
Then we can go back to our console session and check the last event:
$ rails c
> WebhookEvent.last
# => #<WebhookEvent
#   id: 2,
#   webhook_endpoint_id: 1,
#   event: "events.test",
#   payload: { "test" => 2 },
#   created_at: "2021-06-15 15:48:32.801960000 +0000",
#   updated_at: "2021-06-15 15:48:32.810783000 +0000",
#   response: {
#     "body" => "",
#     "code" => 204,
#     "headers" => { ... }
#   }
# >
The event was delivered successfully, as indicated by the 204 response.
Subscribing to certain events
As your list of event types grows, your users are likely to only care about a handful of them. For example, our API service sends over 60 different event types, but we've found that each customer on average only listens to about 5 of them (which 5 depends on the licensing and billing model for the given business).
Allowing users to subscribe to only the events they need lets us:
- Reduce our costs by not queueing up and delivering superfluous webhooks.
- Reduce their costs by not spamming them with webhooks they don't need.
It's a win-win.
Let's update our webhook endpoints to be able to subscribe to certain events. To start, let's add a new subscriptions column to our webhook endpoints:
$ rails g migration AddSubscriptionsToWebhookEndpoints
Then we'll throw this into the migration file:
class AddSubscriptionsToWebhookEndpoints < ActiveRecord::Migration[5.2]
  def change
    add_column :webhook_endpoints, :subscriptions, :jsonb, default: ['*']
  end
end
Note the default * wildcard value. We're going to use that to signal that the endpoint will subscribe to all event types. Rare for production, useful for development.
Let's run the migration:
$ rails db:migrate
And then let's update our webhook endpoint model to require at least 1 subscription, and also add a helper method to the model for checking if a given endpoint is subscribed to an event type. We'll use this throughout our webhook system.
 class WebhookEndpoint < ApplicationRecord
   has_many :webhook_events, inverse_of: :webhook_endpoint

+  validates :subscriptions, length: { minimum: 1 }, presence: true
   validates :url, presence: true
+
+  def subscribed?(event)
+    (subscriptions & ['*', event]).any?
+  end
 end
We can test this out by updating our endpoint's subscriptions:
$ rails c
> WebhookEndpoint.last.subscriptions
# => ["*"]
> WebhookEndpoint.last.subscribed?('events.noop')
# => true
> WebhookEndpoint.last.subscribed?('events.test')
# => true
> WebhookEndpoint.last.update!(subscriptions: ['events.test'])
# => true
> WebhookEndpoint.last.subscribed?('events.noop')
# => false
> WebhookEndpoint.last.subscribed?('events.test')
# => true
Next, we'll adjust our service object to skip over endpoints that are not subscribed to the current event being broadcast:
 def call
   WebhookEndpoint.find_each do |webhook_endpoint|
+    next unless
+      webhook_endpoint.subscribed?(event)
+
     webhook_event = WebhookEvent.create!(
       webhook_endpoint: webhook_endpoint,
       event: event,
       payload: payload,
     )

     WebhookWorker.perform_async(webhook_event.id)
   end
 end
Lastly, we'll also want to update our worker to do the same, just in case a webhook is in the process of being delivered but the user has since unsubscribed from that event.
For example, if a given event is particularly noisy, it may cause performance issues for the end-user's webhook server and they may want to retroactively unsubscribe from the noisy event. (For instance, our license.validation.* events can get pretty noisy depending on the licensing integration.)
Let's adjust our webhook worker to skip over event types that the webhook endpoint is no longer subscribed to:
 def perform(webhook_event_id)
   webhook_event = WebhookEvent.find_by(id: webhook_event_id)
   return if webhook_event.nil?

   webhook_endpoint = webhook_event.webhook_endpoint
   return if webhook_endpoint.nil?
+
+  return unless
+    webhook_endpoint.subscribed?(webhook_event.event)

   # Send the webhook request with a 30 second timeout.
   response = HTTP.timeout(30)
                  .headers(
                    'User-Agent' => 'rails_webhook_system/1.0',
                    'Content-Type' => 'application/json',
                  )
                  .post(
                    webhook_endpoint.url,
                    body: {
                      event: webhook_event.event,
                      payload: webhook_event.payload,
                    }.to_json
                  )

   # Store the webhook response.
   webhook_event.update(response: {
     headers: response.headers.to_h,
     code: response.code.to_i,
     body: response.body.to_s,
   })

   # Raise a failed request error and let Sidekiq handle retrying.
   raise FailedRequestError unless response.status.success?
 rescue HTTP::TimeoutError
   # This error means the webhook endpoint timed out. We can either
   # raise a failed request error to trigger a retry, or leave it
   # as-is and consider timeouts terminal. We'll do the latter.
   webhook_event.update(response: { error: 'TIMEOUT_ERROR' })
 end
We can once again test this out by queueing up a new event that our endpoint isn't subscribed to. Let's do this from the console:
$ rails c
> WebhookEvent.count
# => 2
> BroadcastWebhookService.call(event: 'events.noop', payload: { test: 3 })
# => nil
> WebhookEvent.count
# => 2
Improving our retry cadence
Right now, we're using Sidekiq's default retry cadence, which is an exponential backoff. This is great for normal background jobs, but in our case, potentially retrying seconds after a failed webhook usually just exacerbates the problem. Sidekiq's default retry cadence is roughly retry_count ** 4 seconds (plus a small constant and a little randomness), which starts small and eventually gets large.
For example, if a webhook server is timing out because of too many requests, retrying all of them in quick succession until the backoff grows large enough isn't going to help the situation. We can alleviate that risk by adding in a little bit of "jitter" into the retry cadence and increasing the exponent from 4 to 5:
 class WebhookWorker
   include Sidekiq::Worker
+
+  sidekiq_retry_in do |retry_count|
+    # Exponential backoff, with a random 30-second to 10-minute "jitter"
+    # added in to help spread out any webhook "bursts."
+    jitter = rand(30.seconds..10.minutes).to_i
+
+    (retry_count ** 5) + jitter
+  end

   def perform(webhook_event_id)
     ...
   end
 end
This should do a couple things:
- Reduce occurrences of instantaneous retries, reducing the chance of us exacerbating any issues with the webhook server.
- Help space out sudden large bursts of failing webhooks by using a random jitter between 30 seconds and 10 minutes.
One other thing we can do is limit the number of times a webhook will retry:
 class WebhookWorker
   include Sidekiq::Worker

+  sidekiq_options retry: 10, dead: false
   sidekiq_retry_in do |retry_count|
     # Exponential backoff, with a random 30-second to 10-minute "jitter"
     # added in to help spread out any webhook "bursts."
     jitter = rand(30.seconds..10.minutes).to_i

     (retry_count ** 5) + jitter
   end

   def perform(webhook_event_id)
     ...
   end
 end
Here we've set the maximum number of retries to 10, and we've also told Sidekiq to not store these failed webhooks in its set of "dead" jobs. We don't care about dead webhook jobs. With our retry exponent of 5 and a maximum retry limit of 10, retries should occur over approximately 3 days:
$ rails c
> include ActionView::Helpers::DateHelper
> total = 0.0
> 10.times { |i| total += ((i + 1) ** 5) + rand(30.seconds..10.minutes) }
> distance_of_time_in_words(total)
# => "3 days"
The exponent and retry limit can be increased to spread the retries out over a longer duration. (The values can also be decreased, of course.)
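If you want to play with the numbers outside of a Rails console, here's a rough plain-Ruby sketch of the same arithmetic. The retry_delay and total_wait helpers are made up for illustration, and jitter is pinned to its minimum and maximum so the output is repeatable. One assumption worth calling out: as far as I understand it, Sidekiq passes a retry_count of 0 for the first retry, so the sketch prints the total for counting from 1 (as in the console session) and from 0.

```ruby
# Minimum and maximum jitter, in seconds (30 seconds to 10 minutes).
JITTER_RANGE = (30..600).freeze

SECONDS_PER_DAY = 86_400.0

# Delay in seconds before a given retry, mirroring the sidekiq_retry_in block
# but without ActiveSupport durations.
def retry_delay(retry_count, jitter: rand(JITTER_RANGE))
  (retry_count**5) + jitter
end

# Total seconds spent waiting across a list of retry counts, holding jitter
# fixed so the result is deterministic.
def total_wait(retry_counts, jitter:)
  retry_counts.sum { |count| retry_delay(count, jitter: jitter) }
end

# Counting from 1 (as in the console session) and from 0 (assumed to be how
# Sidekiq numbers its retries).
[(1..10).to_a, (0..9).to_a].each do |counts|
  low  = total_wait(counts, jitter: JITTER_RANGE.min) / SECONDS_PER_DAY
  high = total_wait(counts, jitter: JITTER_RANGE.max) / SECONDS_PER_DAY
  puts format('retry_count %d..%d: %.2f to %.2f days', counts.first, counts.last, low, high)
end
```

Counting from 1, the exponential part alone adds up to 220,825 seconds, or about 2.6 days, which is where the rounded "3 days" comes from. Counting from 0 it's 120,825 seconds, or about 1.4 days, so if your retries wrap up sooner than expected, that off-by-one is the first thing I'd check.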
Disabling our webhook endpoints
Another great feature to have is being able to disable webhook endpoints. This could come in handy for us when we want to disable a problem endpoint, but not delete it (so that the problem can be resolved by the endpoint owner.) This feature can also come in handy for our users, by allowing them to keep certain webhook endpoints on-hand, but only have them enabled when they need them.
To accomplish this, we'll want to add a new enabled column to our webhook endpoints. (There's a relatively wide changeset here; we're touching a lot of files.)
$ rails g migration AddEnabledToWebhookEndpoints
And then within the migration, we'll want to have this:
class AddEnabledToWebhookEndpoints < ActiveRecord::Migration[5.2]
  def change
    add_column :webhook_endpoints, :enabled, :boolean, default: true
    # add_column doesn't take an index option, so add the index separately.
    add_index :webhook_endpoints, :enabled
  end
end
And we'll want to run the migration:
$ rails db:migrate
Next, we'll want to update the webhook endpoint model to have an enabled scope, and we'll also add a bang-method to disable! a webhook endpoint:
 class WebhookEndpoint < ApplicationRecord
   has_many :webhook_events, inverse_of: :webhook_endpoint

   validates :subscriptions, length: { minimum: 1 }, presence: true
   validates :url, presence: true
+
+  scope :enabled, -> { where(enabled: true) }

   def subscribed?(event)
     (subscriptions & ['*', event]).any?
   end
+
+  def disable!
+    update!(enabled: false)
+  end
 end
Next, the broadcast webhook service needs to utilize the new enabled scope so that we only broadcast events to endpoints that are enabled:
 def call
-  WebhookEndpoint.find_each do |webhook_endpoint|
+  WebhookEndpoint.enabled.find_each do |webhook_endpoint|
     ...
   end
 end
Finally, we'll update the webhook worker to bail early if the endpoint is disabled:
 def perform(webhook_event_id)
   ...

   return unless
-    webhook_endpoint.subscribed?(webhook_event.event)
+    webhook_endpoint.subscribed?(webhook_event.event) &&
+    webhook_endpoint.enabled?

   ...
 end
If we go ahead and disable our endpoint and then broadcast a new event, we shouldn't see a webhook delivery occur:
$ rails c
> WebhookEndpoint.last.update!(enabled: false)
# => true
> WebhookEvent.count
# => 2
> BroadcastWebhookService.call(event: 'events.test', payload: { test: 4 })
# => nil
> WebhookEvent.count
# => 2
If we had a failed webhook event being retried, and then we disabled the event's endpoint, the worker should stop attempting to retry. We aren't going to test that scenario, but it should be covered by the changeset here.
Improving our error handling
When it comes to sending webhooks, there are a plethora of errors that can occur: DNS issues, TLS issues, an ngrok tunnel no longer being active, and various types of timeouts and connection errors.
For now, we're going to handle the first 2: DNS and TLS issues.
Let's adjust our worker to rescue from a couple error classes:
- OpenSSL::SSL::SSLError: this means the TLS connection failed, often due to an expired cert. In my experience, these are often short-lived and resolve within the 3 day delivery window that we've configured for retries.
- HTTP::ConnectionError: this is a general "catch-all" from http.rb. From my experience it usually means DNS, but it's kind of nuanced.
def perform(webhook_event_id)
  ...
rescue OpenSSL::SSL::SSLError
  # Since TLS issues may be due to an expired cert, we'll continue retrying
  # since the issue may get resolved within the 3 day retry window. This
  # may be a good place to send an alert to the endpoint owner.
  webhook_event.update(response: { error: 'TLS_ERROR' })

  # Signal the webhook for retry.
  raise FailedRequestError
rescue HTTP::ConnectionError
  # This error usually means DNS issues. To save us the bandwidth,
  # we're going to disable the endpoint. This would also be a good
  # location to send an alert to the endpoint owner.
  webhook_event.update(response: { error: 'CONNECTION_ERROR' })

  # Disable the problem endpoint.
  webhook_endpoint.disable!
rescue HTTP::TimeoutError
  # This error means the webhook endpoint timed out. We can either
  # raise a failed request error to trigger a retry, or leave it
  # as-is and consider timeouts terminal. We'll do the latter.
  webhook_event.update(response: { error: 'TIMEOUT_ERROR' })
end
How we handle each of these errors ends up being pretty arbitrary. I just chose these to exemplify how to handle a few different scenarios:
- Retrying the webhook after an error occurs
- Disabling an endpoint after a fatal error
- Not retrying after an error
Using pattern matching for special cases
I mentioned ngrok earlier and I wanted to share some information on how you could use Ruby's new pattern matching to handle certain response patterns differently. We'll be applying this to ngrok specifically, but you could use the same logic to match against other types of responses as well.
Take an example — when an ngrok user creates a tunnel to a local server, and then adds that URL as a webhook endpoint, often times the tunnel session will be killed at the end of the day, but the webhook endpoint will still be enabled. I've found this to be a common occurrence with various localhost tunnel services.
One way we can handle these ngrok tunnels is by using pattern matching to match against certain response codes and response bodies from ngrok endpoints.
There are 3 scenarios for ngrok endpoints that we're going to cover today:
- When an ngrok URL no longer exists. This will return a 404 response code. We'll handle this by completely deleting the endpoint, since non-stable URLs are randomly generated and cannot be recreated.
- When an ngrok URL is active, but the server being tunneled to is no longer running. This will return a 502 and usually occurs when the developer kills their local server at the end of a work day, but forgets to kill the ngrok session. We'll keep retrying this one, since the ngrok process records the events which can be replayed later on.
- When a "stable" ngrok URL is valid but there is no active tunnel session. This will return a 504. We'll automatically disable this endpoint.
Let's modify our webhook worker to be able to handle these scenarios:
def perform(webhook_event_id)
  ...

  # Exit early if the webhook was successful.
  return if response.status.success?

  # Handle response errors.
  case webhook_event
  in webhook_endpoint: { url: /\.ngrok\.io/ }, response: { code: 404, body: /tunnel .+?\.ngrok\.io not found/i }
    # Automatically delete dead ngrok tunnel endpoints. This error likely
    # means that the developer forgot to remove their temporary ngrok
    # webhook endpoint, seeing as it no longer exists.
    webhook_endpoint.destroy!
  in webhook_endpoint: { url: /\.ngrok\.io/ }, response: { code: 502 }
    # The bad gateway error usually means that the tunnel is still open
    # but the local server is no longer responding for any number of
    # reasons. We're going to automatically retry.
    raise FailedRequestError
  in webhook_endpoint: { url: /\.ngrok\.io/ }, response: { code: 504 }
    # Automatically disable these since the endpoint is likely an ngrok
    # "stable" URL, but it's not currently running. To save bandwidth,
    # we do not want to automatically retry.
    webhook_endpoint.disable!
  else
    # Raise a failed request error and let Sidekiq handle retrying.
    raise FailedRequestError
  end

  ...
end
Now to actually be able to pattern match against our webhook event model, we'll need to add a deconstruct_keys method:
 class WebhookEvent < ApplicationRecord
   belongs_to :webhook_endpoint, inverse_of: :webhook_events

   validates :event, presence: true
   validates :payload, presence: true
+
+  def deconstruct_keys(keys)
+    {
+      webhook_endpoint: { url: webhook_endpoint.url },
+      event: event,
+      payload: payload,
+      response: response.symbolize_keys,
+    }
+  end
 end
This will allow us to match against the hash pattern we define. In our case, we're surfacing the webhook_endpoint, event, payload and the response object.
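To see how deconstruct_keys and the hash patterns fit together without a database, here's a small plain-Ruby sketch. FakeWebhookEvent and action_for are hypothetical stand-ins for the model and the worker's case statement: instead of destroying or disabling records, action_for just returns a symbol naming what the worker would do. It needs Ruby 2.7 or newer (2.7 will print an "experimental" warning).

```ruby
# Hypothetical plain-Ruby stand-in for the WebhookEvent model.
class FakeWebhookEvent
  def initialize(url:, response:)
    @url = url
    @response = response
  end

  # Expose the same shape as the model's deconstruct_keys, so hash patterns
  # can match against the endpoint URL and the stored response.
  def deconstruct_keys(_keys)
    {
      webhook_endpoint: { url: @url },
      response: @response.transform_keys(&:to_sym),
    }
  end
end

# Decide what to do with a failed delivery. Returns a symbol rather than
# touching any records, so the branching is easy to follow.
def action_for(webhook_event)
  case webhook_event
  in {
    webhook_endpoint: { url: /\.ngrok\.io/ },
    response: { code: 404, body: /tunnel .+?\.ngrok\.io not found/i }
  }
    :delete_endpoint
  in { webhook_endpoint: { url: /\.ngrok\.io/ }, response: { code: 502 } }
    :retry
  in { webhook_endpoint: { url: /\.ngrok\.io/ }, response: { code: 504 } }
    :disable_endpoint
  else
    :retry
  end
end

dead_tunnel = FakeWebhookEvent.new(
  url: 'https://349df8f512ea.ngrok.io/webhooks',
  response: { 'code' => 404, 'body' => 'Tunnel 349df8f512ea.ngrok.io not found' },
)
puts action_for(dead_tunnel)
```

Each value in a hash pattern is compared with ===, which is why a regular expression can match the URL and body while a plain integer matches the status code.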
We can test these scenarios by creating an ngrok tunnel to our local Rails server:
$ ngrok http 3000
Then we can update our webhook endpoint to use the generated ngrok URL:
$ rails c
> WebhookEndpoint.last.update!(
    url: 'https://349df8f512ea.ngrok.io/webhooks'
  )
# => true
Next, we can queue up a new webhook event to send:
$ rails c
> BroadcastWebhookService.call(event: 'events.test', payload: { test: 5 })
# => nil
> WebhookEvent.last.response
# => { "body" => "", "code" => 204, "headers" => { ... } }
Looking at our ngrok logs, we see a 204 status code for that webhook. Now, let's kill our ngrok process and send another event:
$ rails c
> WebhookEndpoint.count
# => 1
> BroadcastWebhookService.call(event: 'events.test', payload: { test: 6 })
# => nil
> WebhookEndpoint.count
# => 0
Using pattern matching, our worker (correctly) determined that the bad ngrok webhook endpoint should be deleted, since it's now returning a 404.
Similarly, we can test the other scenarios, for example by keeping the ngrok session active but killing the local Rails server process. But I'll leave that as an exercise for the curious reader.
Caveats and summary
Today, we've covered how to build a webhook system using Rails and Sidekiq. We've learned how we can rely on Sidekiq to do the heavy-lifting for us, and then we broke out Ruby's new pattern matching syntax for some special-case response handling.
So what's next? Here are things to try and look out for:
- Assert that webhook endpoints are non-malicious (one big thing to assert is that they are not pointing to the system itself!)
- Assert that webhook endpoints use TLS.
- Assert that webhook jobs are unique (see Sidekiq Enterprise's unique jobs feature, or the sidekiq-unique-jobs gem.)
- Assert that the webhook event response is not too large. (Some endpoints will try to send you the entire Internet to store.)
- Add the ability to manually retry events.
- Add better error handling.
- Add logging (!)
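As a starting point for the first two items, here's a rough plain-Ruby sketch of a URL check that could back a model validation. The acceptable_webhook_url? helper and the BLOCKED_HOSTS list are hypothetical, and this is nowhere near complete protection: it only inspects the URL as written and doesn't resolve DNS, so a hostname that points at an internal address would still slip through.

```ruby
require 'uri'
require 'ipaddr'

# Hostnames we never want to deliver to. Hypothetical and deliberately short.
BLOCKED_HOSTS = %w[localhost].freeze

# Returns true when the URL uses TLS and doesn't obviously point back at
# ourselves or at an internal network.
def acceptable_webhook_url?(url)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTPS) && uri.host

  host = uri.hostname.downcase
  return false if BLOCKED_HOSTS.include?(host)

  # If the host is a literal IP address, reject internal ranges. Regular
  # hostnames aren't valid IP addresses, so they fall through as nil.
  ip = begin
    IPAddr.new(host)
  rescue IPAddr::InvalidAddressError
    nil
  end
  return true if ip.nil?

  !(ip.loopback? || ip.private? || ip.link_local?)
rescue URI::InvalidURIError
  false
end

puts acceptable_webhook_url?('https://functions.ecorp.example/webhooks')
puts acceptable_webhook_url?('http://localhost:3000/webhooks')
```

In the app itself, a check like this would probably live in a custom validation on WebhookEndpoint, alongside the existing presence validation for url.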
You can view the full example app on GitHub.
Until next time.
If you find any errors in my code, or if you can think of ways to improve things, ping me via Twitter.