Running Rails jobs in background threads
It's good practice for Rails developers to execute slow tasks in a background job, rather than running them during the HTTP request/response cycle. The idea is to send the browser an HTML page as soon as possible, without making it wait while the server grinds away doing something else (like sending emails).
The most common approach to running these longer "jobs" is to hand them off to a separate process, which executes them independently of the web server. There are plenty of great libraries out there that provide the "background job" functionality for you.
There's only one problem – if your app runs on Heroku you'll need to start up a worker dyno (at a cost of $35 per month) to run these background jobs. When all you want to do is to send a few emails in response to people taking actions in your app, this can seem like overkill.
In Agile Planner, I'm sending notification emails in background threads instead.
Threading in Rails
I think a bit of background on how Rails behaves with threads wouldn't go amiss at this point.
Aaron Patterson has written a great post on the config.threadsafe! configuration setting, which explains how threads work with different types of web server. I recommend you read it.
Once you've read Aaron's post you'll understand why (if you're running on a threaded web server such as Puma) enabling threading in your Rails app allows a single process to handle multiple requests concurrently.
You'll also understand that if you're running on a multi-process web server (e.g. Unicorn), each process will only handle one request at a time (even if you enable threading) due to the way that Unicorn's multi-process model works. A single Unicorn "worker" process accepts one request at a time from the listening socket that it shares with the other workers. While it's serving that request it won't pick up another one, but once it has finished responding to the web browser it is free to accept the next incoming request.
None of this has much of an impact on the code I'm about to show you; I mention it simply because if you're running on Unicorn (as I am) you needn't give too much thought to questions of whether or not Rails is threadsafe in order to be able to put threads to good use yourself.
Running tasks in a thread
Threading in Ruby is pretty simple. You can just do this (the new thread will execute immediately):
Thread.new do
  puts "I'm in a thread!"
end
To deliver an email to a list of users from a background thread, we just need to use ActionMailer from within a thread. A naive example might look like this:
Thread.new do
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
end
There are a couple of problems with this code.
Cleaning up the database connections
Each thread opens its own database connection, which you should close when the thread terminates in order to avoid leaving dangling connections to your database.
To update the previous example, we could write:
Thread.new do
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
  ActiveRecord::Base.connection.close
end
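One caveat with code shaped like the example above: if a mailer raises an exception, the line that closes the connection is never reached. Here's a minimal sketch of one way to guard against that with `ensure`. So that it runs without Rails, it uses a hypothetical `FakeConnection` class and a hypothetical `in_background` helper (neither exists in Rails or in Agile Planner); in a real app the `ensure` branch would hold the `ActiveRecord::Base.connection.close` call instead.

```ruby
# FakeConnection is a hypothetical stand-in for a database connection,
# used only so that this sketch runs without Rails.
class FakeConnection
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Hypothetical helper: runs the block in a new thread and closes the
# connection afterwards, even if the block raises.
def in_background(connection)
  Thread.new do
    # Silence Ruby's default "thread terminated with exception" report,
    # purely to keep this sketch's output quiet.
    Thread.current.report_on_exception = false
    begin
      yield
    ensure
      connection.close
    end
  end
end

connection = FakeConnection.new
thread = in_background(connection) { raise "mail server unavailable" }

# Thread#join re-raises the thread's exception in the calling thread.
error_message = nil
begin
  thread.join
rescue RuntimeError => e
  error_message = e.message
end
```

The `join` is only there so the sketch can observe the outcome; in a web request you wouldn't wait for the thread.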
Avoiding race conditions
We've just seen that ActiveRecord opens a new connection to the database for every thread. When you're using a relational database you'll usually find that data written via one connection won't become visible to the other connections until the transaction is committed.
In our example, the code in the thread won't be able to find any users that have been added in the main thread until the main thread commits its transaction. Sometimes the transaction will be committed before the thread tries to access the database, but sometimes the thread will get there first. This could mean that some people won't receive our email!
A cheeky workaround to this problem is to load any ActiveRecord objects that you need in your thread before you enter the thread. You can do this by writing some Ruby that will force the query to run before you enter the thread:
# Note the call to to_a, which forces ActiveRecord to
# execute the query outside of the thread.
#
# The list_of_users variable is available to the thread.
list_of_users = User.where(my: 'query').to_a
Thread.new do
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
  ActiveRecord::Base.connection.close
end
Consider performance when calling to_a on an ActiveRecord scope like this; it loads all the objects returned by the query into memory. If you're only going to be calling it on a small number of records it's fine, but if you had thousands of users to email this approach could have a negative impact on your app's performance. That's why I describe this approach as "cheeky".
An alternative solution is to commit the data in the main thread, before you call the code that spawns the thread. You can do this by wrapping your code that updates the database with ActiveRecord's transaction method:
AnyActiveRecordModel.transaction do
  # Modify data that the thread needs here...
  User.create!(user_attributes)
end
Thread.new do
  list_of_users = User.where(my: 'query')
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
  ActiveRecord::Base.connection.close
end
Putting it all together
This code isn't something that I'd want to drop directly into a controller action. It's too long, and it's not very obvious at first glance what it does. I've got a few emails that need sending in the background like this, so I've bundled them all up into a class called MailFlinger.
Here's a snippet showing how the "Alice created a new board" emails get sent by Agile Planner:
class MailFlinger < Struct.new(:current_user)
  def board_created(board)
    background do
      recipients = board.account.users.rejecting(current_user)
      recipients.each do |recipient|
        AccountMailer.new_board(board, current_user, recipient).deliver
      end
    end
  end

  def background(&block)
    Thread.new do
      yield
      ActiveRecord::Base.connection.close
    end
  end
end
A Hexagonal aside
If you're wondering why none of my example code shows emails getting sent from within a Rails controller, it's because I've been using some of the ideas in Hexagonal Rails to keep things tidy. I've already written about Hexagonal Rails, explaining how I arrived at controller actions that look like this:
class BoardsController < ApplicationController
  def create
    builder = BoardBuilder.new(current_account)
    builder.add_subscriber(CreateResponse.new(self))
    builder.create_board(name: board_params['name'])
  end
end
The BoardBuilder is responsible for creating a new board. If the database is updated successfully, BoardBuilder emits a board_created event (by calling #board_created on all its subscribers). The CreateResponse class knows about setting the flash message and sending an HTTP redirect to the browser. It also knows how to cope with a board_create_failed event, re-rendering the form with an error message, etc.
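To make the publish/subscribe idea concrete, here's a minimal sketch in plain Ruby. It is not BoardBuilder's real implementation (which isn't shown in this post): `SketchBoardBuilder`, `RecordingSubscriber`, the hash standing in for a saved board, and the `respond_to?` check are all assumptions made for illustration.

```ruby
# A hypothetical publisher. The names and behaviour are assumptions,
# not the real BoardBuilder from Agile Planner.
class SketchBoardBuilder
  def initialize
    @subscribers = []
  end

  def add_subscriber(subscriber)
    @subscribers << subscriber
  end

  def create_board(attributes)
    board = { name: attributes[:name] } # stand-in for saving a record
    if board[:name].to_s.empty?
      publish(:board_create_failed, board)
    else
      publish(:board_created, board)
    end
  end

  private

  # Calls the event method on every subscriber that implements it, so a
  # subscriber can ignore events it doesn't care about.
  def publish(event, *args)
    @subscribers.each do |subscriber|
      subscriber.public_send(event, *args) if subscriber.respond_to?(event)
    end
  end
end

# A hypothetical subscriber that simply records the events it hears.
class RecordingSubscriber
  attr_reader :events

  def initialize
    @events = []
  end

  def board_created(board)
    @events << [:board_created, board[:name]]
  end

  def board_create_failed(board)
    @events << [:board_create_failed, board[:name]]
  end
end

subscriber = RecordingSubscriber.new
builder = SketchBoardBuilder.new
builder.add_subscriber(subscriber)
builder.create_board(name: "Sprint 1") # publishes board_created
builder.create_board(name: "")         # publishes board_create_failed
```

The builder knows nothing about flash messages, redirects or emails; each subscriber decides for itself how to react to an event.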
So by now you've probably worked out how MailFlinger fits in. It's a subscriber that knows what to do when it hears that a board has been created. I use it like this:
class BoardsController < ApplicationController
  def create
    builder = BoardBuilder.new(current_account)
    builder.add_subscriber(CreateResponse.new(self))
    builder.add_subscriber(MailFlinger.new(current_user))
    builder.create_board(name: board_params['name'])
  end
end
If you'd like to read more about that approach, check out Refactoring with Hexagonal Rails.
It became especially useful once I started building the API, which is implemented using a different set of controllers that are built from the same domain objects.
In the API controllers I just need to choose to add the appropriate subscribers to each domain object. There are some differences - the API code doesn't need to set any flash messages, for example.
I'll be extending MailFlinger again later when Agile Planner starts sending more emails in response to user activity.
When to use threads for background jobs
There is one more potential issue with this approach, and you'll need to decide whether or not the technique is suitable on a case-by-case basis.
When Ruby creates a new thread, that thread can either shut itself down when it finishes its work, or be killed automatically when the main Ruby process finishes. Unless you go out of your way to prevent it, Ruby will kill your thread when the process exits. In other words, when your web server gets restarted all your threads will get killed.
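Here's a small plain-Ruby sketch of that behaviour. The `sleep` call is just a stand-in for a slow email delivery, and the `delivered` array is a hypothetical record of what got sent. The only thing that guarantees the work finishes is an explicit call to `Thread#join`.

```ruby
delivered = []

worker = Thread.new do
  3.times do |i|
    sleep 0.01 # stand-in for a slow email delivery
    delivered << "email #{i}"
  end
end

# If a script ended here without this join, Ruby would exit and kill the
# worker, most likely before it had delivered everything.
worker.join
```

Joining isn't something you'd do during a web request (it would make the request wait, which defeats the point), but it shows why a thread that is still busy when the process exits simply loses its remaining work.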
Paul Brannan pointed out (in the comments) that we can use Ruby's at_exit method to define a block of code that Ruby will execute when it wants to shut down. This means that you could write an at_exit handler that would delay Ruby's exit until your job had finished.
However… jkburges noticed that Paul's technique creates a memory leak, so it's not a good idea if you've got a long running process and/or will be creating a lot of threads.
If you like the idea of running jobs in threads instead of in a separate process, I recommend you take a quick look at Sucker Punch. It's an in-memory job runner that uses a separate thread for each "worker". I haven't used it myself, but it looks good. It can also be used with the Active Job API, which is a good idea if you're using Rails; you'd be able to switch to a more robust job runner if/when your application required it.